7.3 Preparing Data

To perform empirical calibration it is necessary to have some quantitative data that relates directly to the Itrax XRF data. The quantitative data is unlikely to be at the same resolution as the Itrax XRF data. For example, cores are typically scanned at intervals of between 0.2 and 1 mm, whereas sub-samples for conventional XRF or ICP are typically between 5 and 10 mm thick. In addition, where Itrax scans are contiguous, sub-samples may not be; for example, a 10 mm sub-sample might be taken every 80 mm. In this example we have sub-sampled and analysed the 60 samples defined in the previous section. Here they have been freeze-dried, lightly milled using agate stoneware, pressed into loose powder pellets and analysed with a Niton XL3t GOLDD+ ED-XRF using the “TestAllGeo” mode.

load("calibration_samples/CD166_hhxrf.RData")
glimpse(hhxrf)

## Rows: 60
## Columns: 38
## $ top      <dbl> 330, 430, 440, 460, 670, 770, 800, 840, 930, 1020, 1090, 1110…
## $ bot      <dbl> 340, 440, 450, 470, 680, 780, 810, 850, 940, 1030, 1100, 1120…
## $ SampleID <chr> "cd166-33", "cd166-43", "cd166-44", "cd166-46", "cd166-67", "…
## $ Mg       (err) NA(NA), NA(NA), NA(NA), NA(NA), 30000(10000), NA(NA), NA(NA),…
## $ Al       (err) 14000(2000), 22000(2000), 25000(2000), 28000(3000), 19000(200…
## $ Si       (err) 45000(1000), 67000(1000), 75000(1000), 93000(1000), 56000(100…
## $ P        (err) NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), 300(2…
## $ S        (err) 890(80), 1200(80), 1110(80), 1600(90), 1020(80), 1160(80), 11…
## $ Cl       (err) 11100(100), 5210(90), 13100(100), 10700(100), 5870(90), 3180(…
## $ K        (err) 4700(200), 8200(200), 8900(200), 9000(200), 5700(200), 6100(2…
## $ Ca       (err) 2.48(2)e5, 2.14(1)e5, 1.97(1)e5, 1.90(1)e5, 2.32(1)e5, 2.24(1…
## $ Sc       (err) NA(NA), NA(NA), 260(90), NA(NA), NA(NA), NA(NA), 300(100), NA…
## $ Ti       (err) 900(100), 1360(90), 990(40), 1540(90), 1400(100), 2000(100), …
## $ V        (err) 60(20), 70(20), 60(20), 50(20), 70(20), 70(30), 70(20), 80(30…
## $ Cr       (err) 30(20), NA(NA), NA(NA), 30(20), 30(20), NA(NA), NA(NA), 50(20…
## $ Mn       (err) 250(40), 320(30), 280(30), 270(40), 240(30), 420(40), 380(30)…
## $ Fe       (err) 9200(200), 11900(100), 12400(100), 12800(200), 10400(100), 13…
## $ Co       (err) NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), 50(30…
## $ Ni       (err) NA(NA), NA(NA), NA(NA), 50(20), NA(NA), 30(20), 40(10), NA(NA…
## $ Cu       (err) 30(10), 28(8), 28(6), 21(9), 25(7), 37(8), 38(7), 33(8), 50(1…
## $ Zn       (err) 51(8), 67(6), 60(5), 33(6), 21(4), 25(5), 23(4), 21(5), 45(8)…
## $ As       (err) NA(NA), NA(NA), 16(5), NA(NA), NA(NA), 4(2), 6(3), NA(NA), 6(…
## $ Se       (err) NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), 2(1), NA(NA),…
## $ Rb       (err) 20(2), 28(2), 30(1), 29(2), 18(1), 19(1), 17(1), 17(1), 14(2)…
## $ Sr       (err) 950(10), 824(7), 720(5), 646(7), 862(6), 854(7), 794(5), 845(…
## $ Zr       (err) 42(4), 48(3), 43(2), 53(3), 48(2), 54(3), 52(2), 58(3), 129(4…
## $ Nb       (err) 6(1), 5(1), 5.2(8), 4(1), 5.4(9), 8(1), 6.9(9), 9(1), 20(10),…
## $ Mo       (err) 6(2), 3(2), 3(1), NA(NA), 4(1), 3(2), 3(1), 3(1), 5(2), 3(2),…
## $ Ag       (err) 5(3), 5(3), NA(NA), NA(NA), NA(NA), 6(3), NA(NA), NA(NA), NA(…
## $ Cd       (err) 10(6), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA)…
## $ Sn       (err) 13(7), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA)…
## $ Ba       (err) 170(30), 130(30), 150(30), 220(30), 150(30), 140(30), 180(30)…
## $ W        (err) NA(NA), NA(NA), NA(NA), NA(NA), 20(10), 30(20), NA(NA), NA(NA…
## $ Au       (err) NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), 4(3),…
## $ Hg       (err) NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA…
## $ Pb       (err) 6(3), 13(3), 227(6), 10(3), 6(2), 10(2), 47(3), 10(2), 8(3), …
## $ Th       (err) 5(2), 4(2), 10(4), 4(2), 4(1), 13(5), 12(4), 13(5), NA(NA), 4…
## $ U        (err) NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA), NA(NA…
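Before reducing the scan data, it can be useful to check the geometry of the sub-sample intervals: that each top lies above its bottom, how thick each sub-sample is, and whether neighbouring sub-samples touch. The following is a minimal sketch in base R. It assumes a small stand-in data frame (called samples here, a hypothetical name) built from the first five rows of hhxrf shown above, rather than the full object.

```r
# Hypothetical stand-in for the first five rows of hhxrf (depths in mm).
samples <- data.frame(
  SampleID = c("cd166-33", "cd166-43", "cd166-44", "cd166-46", "cd166-67"),
  top = c(330, 430, 440, 460, 670),
  bot = c(340, 440, 450, 470, 680)
)

# Every interval should have its top at a shallower depth than its bottom.
stopifnot(all(samples$top < samples$bot))

# Thickness of each sub-sample.
samples$thickness <- samples$bot - samples$top

# Distance from the bottom of each sub-sample to the top of the next one;
# a value of 0 means the two sub-samples share an edge (they are contiguous).
samples$gap_below <- c(samples$top[-1] - samples$bot[-nrow(samples)], NA)

print(samples)
```

Any row with a gap_below of zero marks a pair of sub-samples that share an edge, which matters when deciding how interval edges should be treated during the reduction.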

The function itraxR::itrax_reduce() can be used to reduce the Itrax data to match the resolution of some other data. Note the requirement to remove text-based columns (in this case label, filename and uid) from the data before this step is performed; if the reducing function cannot handle a data type (e.g. passing characters to mean()), errors will occur. If we wanted the standard deviation for each chunk alongside the mean, the reduction could be repeated with the default reducing function (mean()) changed to sd(). The example below reduces our Itrax XRF data (CD166_19_xrf) using the depth intervals of the conventional XRF data (in hhxrf):

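xrf <- CD166_19_xrf %>%
  # keep only the measurements that passed quality control
  filter(qc == TRUE) %>%
  # drop text-based columns that the reducing function cannot handle
  select(-c(label, filename, uid)) %>%
  # summarise the scan data within each sub-sample interval
  itrax_reduce(names = hhxrf$SampleID,
               breaks_lower = hhxrf$top,
               breaks_upper = hhxrf$bot) %>%
  rename(SampleID = resample_names) %>%
  # carry over the interval limits from the conventional XRF data
  mutate(top = hhxrf$top,
         bot = hhxrf$bot)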

glimpse(xrf)

## Rows: 60
## Columns: 58
## $ SampleID         <chr> "cd166-33", "cd166-43", "cd166-44", "cd166-46", "cd16…
## $ depth            <dbl> 334.5, 434.5, 444.5, 464.5, 674.5, 774.5, 804.5, 844.…
## $ MSE              <dbl> 1.707, 1.585, 1.562, 1.499, 1.644, 1.581, 1.669, 1.65…
## $ cps              <dbl> 46735.9, 38494.0, 36829.5, 34929.5, 44762.0, 45024.2,…
## $ validity         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Al               <dbl> 79.5, 71.5, 78.1, 57.0, 77.0, 87.5, 71.1, 75.0, 71.0,…
## $ Si               <dbl> 347.4, 929.3, 1338.6, 1043.9, 394.1, 424.5, 375.0, 41…
## $ P                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ S                <dbl> 54.5, 7.6, 6.0, 10.6, 89.2, 28.3, 17.8, 20.1, 12.1, 2…
## $ Cl               <dbl> 1839.4, 755.2, 492.3, 677.4, 1781.5, 1499.4, 1565.9, …
## $ Ar               <dbl> 651.4, 543.7, 446.4, 398.8, 530.5, 503.7, 525.3, 552.…
## $ K                <dbl> 2539.8, 3059.0, 3143.8, 3182.9, 2910.6, 2885.7, 2597.…
## $ Ca               <dbl> 234113.7, 157804.5, 97585.4, 115428.4, 202283.5, 2034…
## $ Sc               <dbl> 0.0, 0.0, 0.0, 0.0, 10.1, 0.0, 0.0, 0.0, 0.0, 4.9, 9.…
## $ Ti               <dbl> 1723.8, 1936.0, 4692.8, 1964.6, 2398.6, 2549.2, 2194.…
## $ V                <dbl> 15.5, 59.0, 116.6, 77.9, 26.2, 56.6, 39.0, 52.1, 82.2…
## $ Cr               <dbl> 309.6, 260.2, 329.8, 278.6, 343.1, 314.9, 281.7, 347.…
## $ Mn               <dbl> 743.8, 580.3, 794.1, 796.1, 597.4, 910.3, 738.9, 723.…
## $ Fe               <dbl> 32624.9, 43792.7, 74938.8, 44888.9, 40525.2, 45688.0,…
## $ Ni               <dbl> 98.3, 65.7, 77.2, 92.3, 86.3, 108.4, 123.6, 114.2, 17…
## $ Cu               <dbl> 191.6, 90.3, 94.3, 80.6, 171.4, 202.8, 172.6, 185.2, …
## $ Zn               <dbl> 145.4, 163.5, 171.5, 177.8, 161.8, 179.2, 142.6, 176.…
## $ Ga               <dbl> 9.5, 80.8, 110.2, 72.1, 19.6, 23.1, 10.0, 27.2, 13.9,…
## $ Ge               <dbl> 73.6, 103.1, 107.7, 104.2, 95.5, 105.4, 68.3, 115.3, …
## $ Br               <dbl> 494.0, 287.1, 204.8, 260.3, 483.4, 431.7, 422.9, 409.…
## $ Rb               <dbl> 196.6, 295.8, 404.3, 375.9, 201.0, 221.6, 182.8, 188.…
## $ Sr               <dbl> 11406.2, 6618.7, 4505.5, 5548.2, 10746.5, 10305.0, 10…
## $ Y                <dbl> 109.9, 97.8, 108.0, 52.5, 156.0, 126.7, 159.1, 114.1,…
## $ Zr               <dbl> 208.0, 413.4, 2082.7, 456.9, 257.6, 320.7, 294.2, 301…
## $ Pd               <dbl> 79.9, 43.0, 25.8, 31.4, 72.0, 63.6, 64.2, 61.6, 41.7,…
## $ Cd               <dbl> 111.5, 88.6, 44.8, 56.3, 83.5, 88.5, 108.0, 109.8, 72…
## $ I                <dbl> 57.1, 54.8, 27.5, 28.9, 70.6, 61.9, 69.2, 74.6, 57.5,…
## $ Cs               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ Ba               <dbl> 29.7, 44.3, 111.6, 72.4, 31.7, 53.1, 10.6, 28.2, 18.9…
## $ Nd               <dbl> 34.3, 20.9, 46.1, 32.5, 27.5, 31.2, 28.2, 29.6, 37.0,…
## $ Sm               <dbl> 6.9, 47.8, 82.8, 50.0, 15.6, 18.5, 15.8, 16.7, 28.7, …
## $ Yb               <dbl> 345.3, 276.6, 205.2, 178.5, 285.1, 264.9, 262.9, 277.…
## $ Ta               <dbl> 727.9, 659.0, 603.8, 630.3, 746.6, 688.6, 706.4, 674.…
## $ W                <dbl> 2288.2, 2044.0, 2001.0, 2056.4, 2153.3, 2090.1, 2129.…
## $ Pb               <dbl> 63.7, 64.7, 109.2, 66.6, 55.0, 87.7, 40.4, 217.0, 49.…
## $ Bi               <dbl> 162.0, 123.9, 136.1, 124.1, 148.2, 139.9, 126.7, 147.…
## $ `Mo inc`         <dbl> 26754.9, 23616.2, 21588.2, 23656.8, 26309.5, 25681.1,…
## $ `Mo coh`         <dbl> 10207.9, 9466.9, 8911.0, 9174.5, 9985.4, 9881.3, 1007…
## $ position         <dbl> 366.04, 466.04, 476.04, 496.04, 706.04, 806.04, 836.0…
## $ `sample surface` <dbl> 6.498, 6.000, 6.005, 6.080, 6.466, 6.484, 6.476, 6.51…
## $ `E-gain`         <dbl> 0.010267, 0.010267, 0.010267, 0.010267, 0.010267, 0.0…
## $ `E-offset`       <dbl> -0.009931, -0.009931, -0.009931, -0.009931, -0.009931…
## $ `F-slope`        <dbl> 0.0099, 0.0099, 0.0099, 0.0099, 0.0099, 0.0099, 0.009…
## $ `F-offset`       <dbl> 0.077115, 0.077115, 0.077115, 0.077115, 0.077115, 0.0…
## $ `Fe a*2`         <dbl> 158.6, 74.1, 132.1, 167.6, 158.4, 118.7, 125.7, 105.0…
## $ `Fe a+b`         <dbl> 82.2, 139.4, 111.5, 146.1, 136.0, 146.1, 133.9, 106.2…
## $ S1               <dbl> 306.6, 311.3, 475.1, 303.5, 318.9, 370.9, 274.8, 380.…
## $ S2               <dbl> 219.8, 199.8, 299.0, 209.5, 244.8, 263.5, 231.1, 249.…
## $ S3               <dbl> 252.7, 298.4, 430.0, 337.4, 261.1, 340.6, 312.0, 380.…
## $ Dt               <dbl> 0.1021, 0.0990, 0.0979, 0.1017, 0.1022, 0.1008, 0.101…
## $ qc               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ top              <dbl> 330, 430, 440, 460, 670, 770, 800, 840, 930, 1020, 10…
## $ bot              <dbl> 340, 440, 450, 470, 680, 780, 810, 850, 940, 1030, 11…
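To make the reduction step concrete, the sketch below reproduces the idea in base R on synthetic data: the measurements falling within each interval are summarised once with mean() and once with sd(). It is an illustration of the principle only, not the internals of itraxR::itrax_reduce(). The objects scan and intervals and the helper reduce_interval() are hypothetical names, and the values are invented.

```r
# Synthetic scan at 1 mm steps (invented values, not real Itrax data).
scan <- data.frame(
  depth = 0:29,
  Fe    = c(rep(10, 10), rep(20, 10), 1:10)
)

# Hypothetical sub-sample intervals (mm); note that they are not contiguous.
intervals <- data.frame(
  name = c("a", "b"),
  top  = c(0, 20),
  bot  = c(10, 30)
)

# Summarise Fe for one interval, including the top edge but not the bottom.
reduce_interval <- function(data, top, bot, fun = mean) {
  in_chunk <- data$depth >= top & data$depth < bot
  fun(data$Fe[in_chunk])
}

# Apply the helper to every interval, first with mean() and then with sd().
intervals$Fe_mean <- mapply(reduce_interval,
                            top = intervals$top, bot = intervals$bot,
                            MoreArgs = list(data = scan, fun = mean))
intervals$Fe_sd   <- mapply(reduce_interval,
                            top = intervals$top, bot = intervals$bot,
                            MoreArgs = list(data = scan, fun = sd))

print(intervals)
```

Keeping the mean and the standard deviation side by side gives a rough indication of how variable the scan data is within each sub-sample, which can help when judging how well a single conventional measurement represents that interval.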

It is worth reading the documentation (?itraxR::itrax_reduce), as the behaviour can and should be modified depending on your exact use case. For example, in the situation above, where most of the samples are not contiguous, it might be wise to modify the parameters of itraxR::itrax_reduce() to include edges = c(">=", "<=") so that both “edges” of each sub-sample are captured. This might not be appropriate for contiguous samples, where a measurement lying on a shared edge would be “double-counted”.
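The double-counting issue can be illustrated without itraxR. The sketch below is a base R illustration under stated assumptions: depth is a hypothetical scan at 1 mm steps, and tops and bots describe two hypothetical contiguous sub-samples. For each scan position it counts how many intervals would claim it under the two edge conventions.

```r
# Hypothetical scan positions at 1 mm steps.
depth <- 0:30

# Two hypothetical contiguous sub-samples: 0-10 mm and 10-20 mm.
tops <- c(0, 10)
bots <- c(10, 20)

# Number of intervals each position falls into when the bottom edge is excluded.
half_open <- sapply(depth, function(d) sum(d >= tops & d < bots))

# Number of intervals each position falls into when both edges are included.
closed <- sapply(depth, function(d) sum(d >= tops & d <= bots))

# Positions assigned to more than one interval under each convention.
depth[half_open > 1]
depth[closed > 1]
```

With the bottom edge excluded no position is claimed twice, whereas with both edges included the position at 10 mm contributes to both sub-samples. For sub-samples separated by a gap the two conventions differ only in whether the measurement at the bottom edge is included.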