7.3 Preparing Data

In order to perform empirical calibration, it is necessary to have some quantitative data that relate directly to the Itrax XRF data. These quantitative data are unlikely to be at the same resolution as the Itrax XRF data. For example, cores are typically scanned at a step size of between 0.2 and 1 mm, whereas sub-sampling for WD-XRF or ICP is typically done at 5 to 10 mm resolution. In addition, while Itrax scans are contiguous, sub-samples may not be; for example, a 10 mm sub-sample might be taken every 80 mm. In this example we have sub-sampled and analysed the 60 samples defined in the previous section. They were freeze-dried, digested in aqua regia by microwave digestion (Roje 2010), filtered and analysed by ICP-OES.
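To make the resolution mismatch concrete, the short R sketch below counts how many scans fall inside each sub-sample for a hypothetical core scanned at 1 mm with 10 mm sub-samples taken every 80 mm. All values are illustrative assumptions, not the data used in this chapter, and each interval is treated as half-open (top <= position < bot).

```r
# Illustrative only: hypothetical scan and sub-sampling parameters,
# not the CD166_19 core itself.
scan_step_mm <- 1                       # Itrax step size (mm)
scan_positions <- seq(0, 400, by = scan_step_mm)

sample_width_mm <- 10                   # thickness of each sub-sample (mm)
sample_interval_mm <- 80                # spacing between sub-sample tops (mm)
tops <- seq(0, 320, by = sample_interval_mm)
bots <- tops + sample_width_mm

# Number of scans inside each sub-sample, using top <= position < bot
scans_per_sample <- mapply(
  function(t, b) sum(scan_positions >= t & scan_positions < b),
  tops, bots
)

# Fraction of all scans that end up matched to a sub-sample
matched <- sum(scans_per_sample) / length(scan_positions)

data.frame(top = tops, bot = bots, n_scans = scans_per_sample)
```

Here each of the five sub-samples contains 10 scans, so only 50 of the 401 scans (about 12 %) can be paired with quantitative data; the rest of the scan is used only indirectly, through the calibration model.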

icp <- read_csv("calibration_samples/calibration_data.csv") %>%
  # water content as a percentage of the wet sample mass
  mutate(water_content = (((wet-tare)-(dry-tare))/(wet-tare))*100) %>%
  # drop the raw weighing columns once they have been used
  select(-c("wet", "tare", "dry")) %>%
  # remove negative values
  mutate(across(any_of(elementsList), function(x){replace(x, which(x<0), NA)})) %>%
  # adjust for dilution
  mutate(across(any_of(elementsList), function(x){(x*`Dilution`)/`weight`})) %>%
  # adjust for water content
  mutate(across(any_of(elementsList), function(x){x*(1-water_content/100)})) %>%
  # remove Inf or NaN values
  mutate(across(any_of(elementsList), function(x){replace(x, is.infinite(x) | is.nan(x), NA)})) %>%
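  select(-c("weight", "Dilution")) %>%
  # drop the blank and MESS-4 reference material rows
  filter(!top %in% c("BLANK", "MESS-4")) %>%
  # top was read as text because of those rows; convert depths to numbers
  mutate(top = as.numeric(top), bot = as.numeric(bot))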
## Rows: 65 Columns: 39
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): top, SampleID
## dbl (37): bot, weight, tare, wet, dry, Dilution, Ag, Al, B, Ba, Bi, Ca, Cd, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(icp)
## Rows: 60
## Columns: 35
## $ top           <dbl> 330, 430, 440, 460, 670, 770, 800, 840, 930, 1020, 1090,…
## $ bot           <dbl> 340, 440, 450, 470, 680, 780, 810, 850, 940, 1030, 1100,…
## $ SampleID      <chr> "B14", "A29", "B01", "B03", "A18", "A20", "B16", "A08", …
## $ Ag            <dbl> 5.415270, 4.785250, 4.594999, 4.342017, 5.529114, 4.8258…
## $ Al            <dbl> 6069.841, 6420.848, 8793.844, 7130.261, 5629.675, 6557.3…
## $ B             <dbl> NA, 0.6380333, NA, NA, 2.7645570, NA, NA, NA, NA, NA, 0.…
## $ Ba            <dbl> 101.53632, 97.93811, 107.32606, 74.81630, 54.25443, 82.0…
## $ Bi            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ Ca            <dbl> 170233.08, 152129.46, 146189.91, 140256.85, 162182.39, 1…
## $ Cd            <dbl> 0.3384544, 0.0000000, 0.3282142, 0.0000000, 0.3455696, 0…
## $ Co            <dbl> 5.415270, 5.742299, 5.579642, 4.342017, 5.529114, 7.5834…
## $ Cr            <dbl> 12.522813, 22.012148, 32.164996, 13.694055, 11.403797, 1…
## $ Cu            <dbl> 16.584265, 15.631815, 15.754284, 12.024048, 18.660759, 2…
## $ Fe            <dbl> 6195.408, 7816.546, 9227.415, 8653.307, 7051.694, 8217.7…
## $ Ga            <dbl> 34.52235, 26.47838, 32.49321, 23.38009, 44.57848, 56.876…
## $ In            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.9032826, N…
## $ K             <dbl> 1504.768, 1744.064, 2508.213, 2244.489, 1444.827, 1454.6…
## $ Li            <dbl> 9.476723, 12.122632, 13.456784, 12.692051, 11.749367, 11…
## $ Mg            <dbl> 2504.901, 3840.003, 4673.771, 4586.506, 2908.659, 2888.9…
## $ Mn            <dbl> 281.5941, 274.6733, 272.0896, 213.4269, 199.0481, 372.62…
## $ Na            <dbl> 5742.556, 5555.675, 5723.400, 5228.123, 5576.111, 5440.8…
## $ Ni            <dbl> 9.815177, 13.398699, 14.441427, 11.022044, 11.749367, 15…
## $ Pb            <dbl> NA, NA, NA, NA, NA, 23.095198, NA, NA, NA, NA, 16.424885…
## $ Sr            <dbl> 732.4153, 632.9290, 596.0371, 495.3240, 677.3165, 641.83…
## $ Tl            <dbl> 9.4767230, 4.4662329, 1.9692855, 7.6820307, 4.8379747, N…
## $ Zn            <dbl> 67.35242, 74.33088, 85.66392, 59.11824, 33.52025, 41.709…
## $ Si            <dbl> 504.2970, 188.5388, 340.0300, 527.0541, 286.8228, 264.04…
## $ Ti            <dbl> 422.3911, 342.9429, 374.1642, 293.2532, 526.9937, 649.07…
## $ V             <dbl> 15.90736, 16.26985, 20.34928, 18.03607, 17.27848, 21.026…
## $ Y             <dbl> 6.430633, 6.061316, 6.564285, 4.676019, 5.529114, 6.2046…
## $ Zr            <dbl> 9.476723, 9.251483, 10.502856, 8.016032, 12.094937, 14.8…
## $ S             <dbl> 572.3264, 768.5111, 666.9314, 708.7508, 832.4772, 680.10…
## $ P             <dbl> 166.5196, 182.1585, 220.5600, 202.0708, 196.9747, 186.14…
## $ Mo            <dbl> NA, NA, 2.6257140, NA, NA, NA, NA, NA, NA, NA, NA, 0.602…
## $ water_content <dbl> 30.58300, 30.35867, 31.20629, 28.42351, 30.85152, 29.921…
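Because several corrections are chained in the icp chunk above, it can help to check them by hand. The base R sketch below applies the same formulas, in the same order, to three invented sub-samples; the values, the sample names and the choice of Fe are hypothetical, and the units of the result depend on the units of `Dilution` and `weight`, which are not stated here.

```r
# Hypothetical weights and readings for three sub-samples (illustrative values only)
toy <- data.frame(
  SampleID = c("X01", "X02", "X03"),
  tare = 10, wet = 20, dry = 17,
  Dilution = 50,
  weight = c(0.5, 0.5, 0),   # X03: zero weight, to show the Inf clean-up
  Fe = c(2, -0.1, 2)         # X02: negative reading, to show the NA clean-up
)

# Percentage of the wet sample mass lost on drying
toy$water_content <- (((toy$wet - toy$tare) - (toy$dry - toy$tare)) /
                        (toy$wet - toy$tare)) * 100

fe <- toy$Fe
fe[which(fe < 0)] <- NA                    # negative readings become NA
fe <- (fe * toy$Dilution) / toy$weight     # scale by dilution and sample weight
fe <- fe * (1 - toy$water_content / 100)   # apply the water-content factor
fe[is.infinite(fe) | is.nan(fe)] <- NA     # Inf / NaN (e.g. zero weight) become NA
toy$Fe_corrected <- fe

toy[, c("SampleID", "water_content", "Fe_corrected")]
```

For X01 the water content is (10 - 7) / 10 × 100 = 30 %, the dilution step gives 2 × 50 / 0.5 = 200, and the water-content factor gives 200 × 0.7 = 140; X02 and X03 end up as NA.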

The function itraxR::itrax_reduce() can be used to reduce the Itrax data to the resolution of some other data set. Below it is used to summarise our Itrax XRF data (CD166_19_xrf) over the same depth intervals (top and bot) as the ICP-OES sub-samples in icp. Note that text-based and identifier columns (in this case label, filename and uid) must be removed before this step: if the reducing function cannot handle a data type (e.g. passing characters to mean()), errors will occur. The example uses the default reducing function, mean(). To obtain the standard deviation of each interval instead, the reducing function can be changed to sd(); if both are wanted, the two results can be combined.

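xrf <- CD166_19_xrf %>%
  # keep only scans that passed quality control
  filter(qc == TRUE) %>%
  # drop text and identifier columns that mean() cannot summarise
  select(-c(label, filename, uid)) %>%
  # summarise the scans within each ICP-OES sub-sample interval
  itrax_reduce(names = icp$SampleID,
               breaks_lower = icp$top,
               breaks_upper = icp$bot) %>%
  rename(SampleID = resample_names) %>%
  # add the sub-sample depths back; rows are in the same order as icp
  mutate(top = icp$top, 
         bot = icp$bot)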

glimpse(xrf)
## Rows: 60
## Columns: 58
## $ SampleID         <chr> "B14", "A29", "B01", "B03", "A18", "A20", "B16", "A08…
## $ depth            <dbl> 334.500, 434.500, 444.500, 464.500, 674.500, 774.500,…
## $ MSE              <dbl> 1.707000, 1.585000, 1.562000, 1.499000, 1.644000, 1.5…
## $ cps              <dbl> 46735.90, 38494.00, 36829.50, 34929.50, 44762.00, 450…
## $ validity         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Al               <dbl> 79.5000, 71.5000, 78.1000, 57.0000, 77.0000, 87.5000,…
## $ Si               <dbl> 347.4, 929.3, 1338.6, 1043.9, 394.1, 424.5, 375.0, 41…
## $ P                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ S                <dbl> 54.500000, 7.600000, 6.000000, 10.600000, 89.200000, …
## $ Cl               <dbl> 1839.4000, 755.2000, 492.3000, 677.4000, 1781.5000, 1…
## $ Ar               <dbl> 651.4000, 543.7000, 446.4000, 398.8000, 530.5000, 503…
## $ K                <dbl> 2539.800, 3059.000, 3143.800, 3182.900, 2910.600, 288…
## $ Ca               <dbl> 234113.7, 157804.5, 97585.4, 115428.4, 202283.5, 2034…
## $ Sc               <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 10.100000, 0.…
## $ Ti               <dbl> 1723.80, 1936.00, 4692.80, 1964.60, 2398.60, 2549.20,…
## $ V                <dbl> 15.5000, 59.0000, 116.6000, 77.9000, 26.2000, 56.6000…
## $ Cr               <dbl> 309.600, 260.200, 329.800, 278.600, 343.100, 314.900,…
## $ Mn               <dbl> 743.800, 580.300, 794.100, 796.100, 597.400, 910.300,…
## $ Fe               <dbl> 32624.9, 43792.7, 74938.8, 44888.9, 40525.2, 45688.0,…
## $ Ni               <dbl> 98.3, 65.7, 77.2, 92.3, 86.3, 108.4, 123.6, 114.2, 17…
## $ Cu               <dbl> 191.6000, 90.3000, 94.3000, 80.6000, 171.4000, 202.80…
## $ Zn               <dbl> 145.4000, 163.5000, 171.5000, 177.8000, 161.8000, 179…
## $ Ga               <dbl> 9.50000, 80.80000, 110.20000, 72.10000, 19.60000, 23.…
## $ Ge               <dbl> 73.60000, 103.10000, 107.70000, 104.20000, 95.50000, …
## $ Br               <dbl> 494.0000, 287.1000, 204.8000, 260.3000, 483.4000, 431…
## $ Rb               <dbl> 196.6000, 295.8000, 404.3000, 375.9000, 201.0000, 221…
## $ Sr               <dbl> 11406.200, 6618.700, 4505.500, 5548.200, 10746.500, 1…
## $ Y                <dbl> 109.9000, 97.8000, 108.0000, 52.5000, 156.0000, 126.7…
## $ Zr               <dbl> 208.0, 413.4, 2082.7, 456.9, 257.6, 320.7, 294.2, 301…
## $ Pd               <dbl> 79.90000, 43.00000, 25.80000, 31.40000, 72.00000, 63.…
## $ Cd               <dbl> 111.50000, 88.60000, 44.80000, 56.30000, 83.50000, 88…
## $ I                <dbl> 57.10000, 54.80000, 27.50000, 28.90000, 70.60000, 61.…
## $ Cs               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ Ba               <dbl> 29.7000, 44.3000, 111.6000, 72.4000, 31.7000, 53.1000…
## $ Nd               <dbl> 34.30000, 20.90000, 46.10000, 32.50000, 27.50000, 31.…
## $ Sm               <dbl> 6.9000, 47.8000, 82.8000, 50.0000, 15.6000, 18.5000, …
## $ Yb               <dbl> 345.3000, 276.6000, 205.2000, 178.5000, 285.1000, 264…
## $ Ta               <dbl> 727.9000, 659.0000, 603.8000, 630.3000, 746.6000, 688…
## $ W                <dbl> 2288.200, 2044.000, 2001.000, 2056.400, 2153.300, 209…
## $ Pb               <dbl> 63.70000, 64.70000, 109.20000, 66.60000, 55.00000, 87…
## $ Bi               <dbl> 162.0000, 123.9000, 136.1000, 124.1000, 148.2000, 139…
## $ `Mo inc`         <dbl> 26754.90, 23616.20, 21588.20, 23656.80, 26309.50, 256…
## $ `Mo coh`         <dbl> 10207.900, 9466.900, 8911.000, 9174.500, 9985.400, 98…
## $ position         <dbl> 366.0400, 466.0400, 476.0400, 496.0400, 706.0400, 806…
## $ `sample surface` <dbl> 6.498, 6.000, 6.005, 6.080, 6.466, 6.484, 6.476, 6.51…
## $ `E-gain`         <dbl> 0.010267, 0.010267, 0.010267, 0.010267, 0.010267, 0.0…
## $ `E-offset`       <dbl> -0.009931, -0.009931, -0.009931, -0.009931, -0.009931…
## $ `F-slope`        <dbl> 0.0099, 0.0099, 0.0099, 0.0099, 0.0099, 0.0099, 0.009…
## $ `F-offset`       <dbl> 0.077115, 0.077115, 0.077115, 0.077115, 0.077115, 0.0…
## $ `Fe a*2`         <dbl> 158.6000, 74.1000, 132.1000, 167.6000, 158.4000, 118.…
## $ `Fe a+b`         <dbl> 82.2000, 139.4000, 111.5000, 146.1000, 136.0000, 146.…
## $ S1               <dbl> 306.600, 311.300, 475.100, 303.500, 318.900, 370.900,…
## $ S2               <dbl> 219.8000, 199.8000, 299.0000, 209.5000, 244.8000, 263…
## $ S3               <dbl> 252.700, 298.400, 430.000, 337.400, 261.100, 340.600,…
## $ Dt               <dbl> 0.10210000, 0.09900000, 0.09790000, 0.10170000, 0.102…
## $ qc               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ top              <dbl> 330, 430, 440, 460, 670, 770, 800, 840, 930, 1020, 10…
## $ bot              <dbl> 340, 440, 450, 470, 680, 780, 810, 850, 940, 1030, 11…
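The points above about text columns and the choice of reducing function can be illustrated without itraxR. The base R sketch below uses a hypothetical helper, `reduce_intervals()`, which is not the itraxR implementation: it keeps only numeric columns, summarises them over each half-open [top, bot) depth interval with a user-supplied function, and then combines a mean pass and an sd pass into one table. The toy data and the `depth_col` argument are assumptions made for illustration.

```r
# Hypothetical helper, NOT part of itraxR: summarise every numeric column of
# `df` over each [top, bot) interval of `depth_col` using `fun`.
# `fun` must accept an `na.rm` argument (mean() and sd() both do).
reduce_intervals <- function(df, tops, bots, fun = mean, depth_col = "depth") {
  # keep numeric columns only, so text columns never reach `fun`
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  rows <- lapply(seq_along(tops), function(i) {
    in_interval <- which(df[[depth_col]] >= tops[i] & df[[depth_col]] < bots[i])
    vapply(df[in_interval, numeric_cols, drop = FALSE],
           function(x) fun(x, na.rm = TRUE), numeric(1))
  })
  out <- as.data.frame(do.call(rbind, rows))
  out$top <- tops
  out$bot <- bots
  out
}

# Invented scan data: 1 mm steps with one text column
scans <- data.frame(
  depth = 0:19,
  label = "toy",                                  # dropped automatically
  Fe = c(rep(100, 10), seq(200, 290, by = 10))
)
tops <- c(0, 10)
bots <- c(10, 20)

fe_mean <- reduce_intervals(scans, tops, bots, mean)
fe_sd   <- reduce_intervals(scans, tops, bots, sd)

# One row per interval, with _mean and _sd versions of each summarised column
combined <- merge(fe_mean, fe_sd, by = c("top", "bot"),
                  suffixes = c("_mean", "_sd"))
combined
```

With the real data, the equivalent idea would be to run itrax_reduce() twice, once with the default and once with sd() as the reducing function, and join the two results by SampleID; check ?itraxR::itrax_reduce for the exact name of the argument that sets the reducing function.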

It is worth reading ?itraxR::itrax_reduce, as the behaviour can and should be modified depending on your exact use case. For example, in the situation above, where none of the samples are contiguous, it might be wise to include edges = c(">=", "<=") in the call to itraxR::itrax_reduce() so that scans lying exactly on either boundary of a sub-sample are captured. For contiguous samples this is probably undesirable, because a scan lying on a boundary shared by two adjacent sub-samples would be counted in both ("double-counting").
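The effect of the boundary rule can be seen on toy data. The base R sketch below does not call itraxR; it uses a hypothetical helper, `count_scans()`, to count how many 1 mm scans fall in each sub-sample when the upper boundary is excluded (top <= position < bot) or included (top <= position <= bot, the rule expressed by edges = c(">=", "<=")). The positions and intervals are invented for illustration.

```r
# Hypothetical helper: number of scan positions inside each interval.
# closed_upper = FALSE -> top <= position <  bot
# closed_upper = TRUE  -> top <= position <= bot
count_scans <- function(positions, tops, bots, closed_upper) {
  vapply(seq_along(tops), function(i) {
    upper_ok <- if (closed_upper) positions <= bots[i] else positions < bots[i]
    sum(positions >= tops[i] & upper_ok)
  }, integer(1))
}

positions <- 0:40   # toy scan positions at 1 mm spacing

# Discrete 10 mm sub-samples with a gap between them
discrete_half   <- count_scans(positions, c(0, 30), c(10, 40), closed_upper = FALSE)
discrete_closed <- count_scans(positions, c(0, 30), c(10, 40), closed_upper = TRUE)

# Contiguous 10 mm sub-samples sharing boundaries at 10, 20 and 30 mm
contig_tops <- c(0, 10, 20, 30)
contig_bots <- contig_tops + 10
contig_half   <- count_scans(positions, contig_tops, contig_bots, closed_upper = FALSE)
contig_closed <- count_scans(positions, contig_tops, contig_bots, closed_upper = TRUE)

discrete_half     # 10 10
discrete_closed   # 11 11
sum(contig_half)  # 40
sum(contig_closed)  # 44
```

For the discrete sub-samples, the closed rule picks up the scan on the lower boundary of each gap. For the contiguous sub-samples it assigns 44 scans although only 41 positions exist, because the scans at 10, 20 and 30 mm are each counted in two sub-samples.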

References

Roje, V. 2010. “Multi-Elemental Analysis of Marine Sediment Reference Material MESS-3: One-Step Microwave Digestion and Determination by High Resolution Inductively Coupled Plasma-Mass Spectrometry (HR-ICP-MS).” Chemical Papers 64: 41–50. https://doi.org/10.2478/s11696-010-0022-x.