7.2 Preparing Data

To perform an empirical calibration, you need quantitative data that relates directly to the Itrax XRF data. The quantitative data is unlikely to be at the same resolution as the Itrax XRF data. For example, cores are typically scanned at a resolution of between 0.2 and 1 mm, whereas sub-samples for WD-XRF or ICP are typically taken at between 5 and 10 mm. In addition, whereas Itrax scans are contiguous, sub-samples may not be; for example, a 10 mm sub-sample might be taken every 80 mm.

The function itraxR::itrax_reduce() can be used to reduce the data to match the resolution of some other data. The example below uses hypothetical determinations of Ti on sub-samples of the core sequence that are 10 mm in size, taken every 50 mm. Here is the hypothetical data:

glimpse(icp)
## Rows: 84
## Columns: 4
## $ SampleID <chr> "ICP_ID_1", "ICP_ID_2", "ICP_ID_3", "ICP_ID_4", "ICP_ID_5", "…
## $ top      <dbl> 1, 51, 101, 151, 201, 251, 301, 351, 401, 451, 501, 551, 601,…
## $ bot      <dbl> 11, 61, 111, 161, 211, 261, 311, 361, 411, 461, 511, 561, 611…
## $ Ti       <dbl> 0.26570435, -0.02524118, 1.11716923, 0.03451199, -0.49316392,…
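A table with this shape can be built in a few lines. The following is a minimal sketch in base R, not the code used to produce the data above: the object name icp_sketch is hypothetical, and the Ti values are random placeholders standing in for real determinations.

```r
# Hypothetical sub-sampling scheme: 84 samples, each 10 mm thick, one every 50 mm.
n_samples <- 84
top <- seq(from = 1, by = 50, length.out = n_samples)  # upper depth of each sub-sample (mm)

set.seed(1)  # placeholder values only, fixed so the sketch is reproducible
icp_sketch <- data.frame(
  SampleID = paste0("ICP_ID_", seq_len(n_samples)),
  top      = top,
  bot      = top + 10,          # lower depth of each sub-sample (mm)
  Ti       = rnorm(n_samples),  # stand-in for real Ti determinations
  stringsAsFactors = FALSE
)

head(icp_sketch, 3)
```

The important columns are the identifier and the two depth limits; these are what the reduction step uses to define each chunk.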

And here is the use of itrax_reduce() to reduce our Itrax XRF data (CD166_19_xrf) to the sample names and depth limits defined in the icp data.

xrf_icp <- CD166_19_xrf %>%
  drop_na() %>%
  select(-c(label, filename)) %>%
  itrax_reduce(names = icp$SampleID,
               breaks_lower = icp$top,
               breaks_upper = icp$bot) %>%
  select(resample_names, Ti) %>%
  rename(Ti_XRF = Ti,
         SampleID = resample_names) %>%
  inner_join(., icp, by = "SampleID") %>%
  select(SampleID, Ti, Ti_XRF, everything())

xrf_icp %>% glimpse()
## Rows: 84
## Columns: 5
## $ SampleID <chr> "ICP_ID_1", "ICP_ID_2", "ICP_ID_3", "ICP_ID_4", "ICP_ID_5", "…
## $ Ti       <dbl> 0.26570435, -0.02524118, 1.11716923, 0.03451199, -0.49316392,…
## $ Ti_XRF   <dbl> 2101.300, 2401.300, 3410.100, 5984.286, 2571.200, 1873.500, 1…
## $ top      <dbl> 1, 51, 101, 151, 201, 251, 301, 351, 401, 451, 501, 551, 601,…
## $ bot      <dbl> 11, 61, 111, 161, 211, 261, 311, 361, 411, 461, 511, 561, 611…
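To make the reduction step less of a black box, the idea can be sketched in base R on a small synthetic scan. This is an illustration of the general technique (apply a function to all measurements falling inside each depth interval), not the itraxR implementation; reduce_sketch(), scan and samples are hypothetical names, and the interval convention (top inclusive, bottom exclusive) is an assumption of the sketch.

```r
# Synthetic scan: one measurement every 1 mm from 0 to 120 mm depth.
scan <- data.frame(depth = 0:120, Ti = 0:120 * 10)

# Hypothetical sub-sample intervals (mm).
samples <- data.frame(SampleID = c("S1", "S2"),
                      top = c(1, 51),
                      bot = c(11, 61),
                      stringsAsFactors = FALSE)

# For each interval, apply `fun` to the measurements with top <= depth < bot.
reduce_sketch <- function(scan, samples, column, fun = mean) {
  vapply(seq_len(nrow(samples)), function(i) {
    in_chunk <- scan$depth >= samples$top[i] & scan$depth < samples$bot[i]
    fun(scan[[column]][in_chunk])
  }, numeric(1))
}

samples$Ti_XRF <- reduce_sketch(scan, samples, "Ti")
samples
```

Each sub-sample ends up with a single value summarising the ten 1 mm measurements it spans, which is what allows it to be joined row-for-row to the quantitative data.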

Note the requirement to remove text-based columns (in this case filename and label) from the data before this step is performed: if the reducing function cannot handle a data type (e.g. passing characters to mean()), errors will occur. If we wanted to add the standard deviation alongside the mean for each chunk, this can be done by changing the reducing function from the default (mean()) to sd() and joining the result to the existing table, for example:

CD166_19_xrf %>%
  drop_na() %>%
  select(-c(label, filename)) %>%
  itrax_reduce(names = icp$SampleID,
               breaks_lower = icp$top,
               breaks_upper = icp$bot,
               fun = sd) %>%
  select(resample_names, Ti) %>%
  rename(Ti_XRF_sd = Ti,
         SampleID = resample_names) %>%
  inner_join(., xrf_icp, by = "SampleID") %>%
  select(SampleID, Ti, Ti_XRF, Ti_XRF_sd, everything()) %>%
  glimpse()
## Rows: 84
## Columns: 6
## $ SampleID  <chr> "ICP_ID_1", "ICP_ID_2", "ICP_ID_3", "ICP_ID_4", "ICP_ID_5", …
## $ Ti        <dbl> 0.26570435, -0.02524118, 1.11716923, 0.03451199, -0.49316392…
## $ Ti_XRF    <dbl> 2101.300, 2401.300, 3410.100, 5984.286, 2571.200, 1873.500, …
## $ Ti_XRF_sd <dbl> 331.36789, 90.61770, 92.66601, 1646.19417, 213.96511, 231.13…
## $ top       <dbl> 1, 51, 101, 151, 201, 251, 301, 351, 401, 451, 501, 551, 601…
## $ bot       <dbl> 11, 61, 111, 161, 211, 261, 311, 361, 411, 461, 511, 561, 61…
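One thing to watch when summarising with sd() is the number of measurements in each chunk: base R's sd() returns NA for a single value. The sketch below, in base R on synthetic data with hypothetical names (scan, samples, summary_sketch), reports the count alongside the mean and standard deviation so that thin or coarsely scanned sub-samples are easy to spot. It assumes the same top-inclusive, bottom-exclusive interval convention and is not taken from itraxR.

```r
# Synthetic scan at 5 mm steps, deliberately coarse relative to the sub-samples.
scan <- data.frame(depth = seq(0, 100, by = 5),
                   Ti    = seq(0, 100, by = 5)^2)

# Two hypothetical sub-samples: one 10 mm thick, one only 5 mm thick.
samples <- data.frame(SampleID = c("S1", "S2"),
                      top = c(10, 50),
                      bot = c(20, 55),
                      stringsAsFactors = FALSE)

# Count, mean and standard deviation of the measurements in each interval.
summary_sketch <- do.call(rbind, lapply(seq_len(nrow(samples)), function(i) {
  x <- scan$Ti[scan$depth >= samples$top[i] & scan$depth < samples$bot[i]]
  data.frame(SampleID = samples$SampleID[i],
             n        = length(x),
             Ti_mean  = mean(x),
             Ti_sd    = sd(x),   # NA when the interval holds a single measurement
             stringsAsFactors = FALSE)
}))

summary_sketch
```

Here the second sub-sample contains only one measurement, so its standard deviation is NA even though its mean is defined.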

It is worth reading the help page (?itraxR::itrax_reduce), as the behaviour can and should be modified depending on your exact use case. For example, in the situation above, where none of the samples are contiguous, it might be wise to set the edges parameter of itraxR::itrax_reduce() to edges = c(">=", "<=") so that measurements lying exactly on the “edges” of the sub-samples are captured. This might not be appropriate for contiguous samples, where a measurement on a shared boundary would otherwise be counted in both neighbouring sub-samples (“double-counting”).
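The double-counting issue can be shown with a small base R sketch on synthetic data. The helper count_in() and the objects scan and contiguous are hypothetical and exist only for this illustration; the sketch simply counts how many measurements fall in each interval under the two edge conventions.

```r
# Synthetic scan at 1 mm steps, and three contiguous 10 mm sub-samples.
scan <- data.frame(depth = 0:30)
contiguous <- data.frame(top = c(0, 10, 20), bot = c(10, 20, 30))

# Count measurements per interval; the lower edge is always inclusive,
# the upper edge is inclusive only when upper_inclusive = TRUE.
count_in <- function(depth, top, bot, upper_inclusive) {
  vapply(seq_along(top), function(i) {
    upper_ok <- if (upper_inclusive) depth <= bot[i] else depth < bot[i]
    sum(depth >= top[i] & upper_ok)
  }, integer(1))
}

half_open <- count_in(scan$depth, contiguous$top, contiguous$bot, upper_inclusive = FALSE)
closed    <- count_in(scan$depth, contiguous$top, contiguous$bot, upper_inclusive = TRUE)

half_open  # 10 10 10: each measurement belongs to at most one sub-sample
closed     # 11 11 11: depths 10 and 20 are counted in two sub-samples each
```

With both edges inclusive, the 31 measurements contribute 33 counts, because the two shared boundaries are each used twice. For non-contiguous sub-samples there are no shared boundaries, so including both edges carries no such penalty.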