5.1 Ratios and Normalisation

These data are compositional: they represent changes in the relative proportions of all components of the matrix, both measured and unmeasured. As such, the data will likely need transforming before certain types of multivariate analysis. As previously mentioned, these data are dimensionless and so do not represent a quantity, although they are directly related to the absolute amount of a particular element in the matrix. Ratios of elements are often used to represent changes in composition; this is sometimes referred to as normalisation, or normalising one element against another.

For example, particular element ratios can be used to make environmental inferences. Log-ratios are usually preferable here as they are more resilient to matrix effects. In this context, the Fe/Ca log-ratio is used to infer a relative measure of productivity (that is, it can identify the hemipelagic sediments in the sequence). Note that wrapping the log-ratio in abs() means that switching the numerator and denominator has no effect on the result, because log(Fe/Ca) = -log(Ca/Fe).

CD166_19_xrf %>%
  mutate(`log(Fe/Ca)` = abs(log(Fe/Ca))) %>%
  filter(qc == TRUE) %>%
  
  ggplot(aes(x = depth, y = `log(Fe/Ca)`)) +
  geom_line() + 
  scale_x_reverse()
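
The effect of abs() on a log-ratio can be checked with a few made-up numbers. The sketch below uses hypothetical peak areas (not values from CD166_19_xrf) and base R only; it is intended as an illustration of the symmetry rather than part of the analysis itself.

```r
# Toy peak areas (hypothetical values, not taken from CD166_19_xrf)
Fe <- c(1200, 850, 430)
Ca <- c(300, 900, 2100)

log_ratio      <- log(Fe / Ca)  # sign depends on which element is the numerator
log_ratio_flip <- log(Ca / Fe)  # same magnitude, opposite sign

# the two orderings are mirror images of each other ...
isTRUE(all.equal(log_ratio, -log_ratio_flip))

# ... so taking the absolute value makes the ordering irrelevant
isTRUE(all.equal(abs(log_ratio), abs(log_ratio_flip)))
```

One consequence worth bearing in mind is that abs() also discards the sign of the log-ratio, so the result no longer shows which of the two elements is the more abundant at a given depth.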

Z-scores are sometimes useful when variables don’t have meaningful units. They express each value as the number of standard deviations from the mean, so give a useful measure of the magnitude of change. For example:

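CD166_19_xrf %>%
  # keep only acceptable observations, so they alone define the mean and sd
  filter(qc == TRUE) %>%
  mutate(`log(Fe/Ca)` = abs(log(Fe/Ca))) %>%
  # standardise: (value - mean) / standard deviation
  mutate(zscore = (`log(Fe/Ca)` - mean(`log(Fe/Ca)`, na.rm = TRUE))/sd(`log(Fe/Ca)`, na.rm = TRUE)) %>%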
  
  ggplot(aes(x = depth, y = zscore)) +
  geom_line() + 
  scale_x_reverse() +
  ylab("log(Fe/Ca) [z-score]")
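
As a quick sanity check on the z-score arithmetic, the manual calculation can be compared with base R's scale(), which centres and scales in the same way. This is a minimal sketch on a hypothetical vector of log-ratio values (including one NA), not on the core data.

```r
# Hypothetical log-ratio values; the NA stands in for a missing observation
x <- c(0.2, 1.5, NA, 0.9, 2.4)

# manual z-score, as in the pipeline above
z_manual <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# scale() returns a one-column matrix, so drop it back to a plain vector
z_scale <- as.numeric(scale(x))

# both approaches should agree
isTRUE(all.equal(z_manual, z_scale))

# by construction the z-scores have mean 0 and standard deviation 1
mean(z_manual, na.rm = TRUE)
sd(z_manual, na.rm = TRUE)
```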

It is trivial to calculate element ratios, to the extent that they can often simply be calculated where they are required rather than saved to memory. For example, if a plot of the Rayleigh (coherent) scatter divided by the Compton (incoherent) scatter was desired, there is no need to save the computed value to a new variable (e.g. coh_inc <- df$Mo.coh/df$Mo.inc); simply define the calculation during plotting. To calculate ratios for all elements at once, use mutate() with across(any_of(elementsList)), where elementsList is a list of chemical elements extracted from data(PeriodicTable).

CD166_19_xrf %>% 
  # divide every element column by the incoherent (Compton) scatter
  mutate(across(any_of(elementsList), ~ .x / `Mo inc`))
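
To see what this does to the columns, the same operation can be run on a small made-up table. In the sketch below, toy_xrf and the shortened elementsList are hypothetical stand-ins for the real objects, and the division is written with an explicit function inside across(); it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-ins for the real objects used in the text
elementsList <- c("Fe", "Ca", "Ti")   # "Ti" is deliberately absent from the data
toy_xrf <- tibble(
  depth    = c(10, 20, 30),
  Fe       = c(1200, 850, 430),
  Ca       = c(300, 900, 2100),
  `Mo inc` = c(100, 50, 200)
)

# divide each element column that is present by the incoherent scatter;
# any_of() silently skips names in elementsList that are not in the data
toy_ratios <- toy_xrf %>%
  mutate(across(any_of(elementsList), ~ .x / `Mo inc`))

toy_ratios
```

The element columns are overwritten in place, while depth and `Mo inc` are left untouched. If the original peak areas are still needed later, the .names argument of across() can be used to write the ratios to new columns instead.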

This can be integrated into the workflow from the data tidying chapter, for example:

CD166_19_xrf %>% 
  # transform
  mutate(across(any_of(elementsList), ~ .x / `Mo inc`)) %>%
  
  # identify acceptable observations
  filter(validity == TRUE) %>%
  
  # identify acceptable variables
  select(any_of(myElements), depth, label) %>% 
  
  # pivot
  tidyr::pivot_longer(!c("depth", "label"), names_to = "elements", values_to = "peakarea") %>% 
  mutate(elements = factor(elements, levels = c(elementsList, "coh/inc"))) %>%
  
  # plot
  ggplot(aes(x = peakarea, y = depth)) +
    tidypaleo::geom_lineh(aes(color = label)) +
    scale_y_reverse() +
    scale_x_continuous(n.breaks = 2) +
    tidypaleo::facet_geochem_gridh(vars(elements)) +
    labs(x = "peak area / Mo. inc.", y = "Depth [mm]") +
    tidypaleo::theme_paleo() +
    theme(legend.position = "none")

Note that where zero values are encountered in the denominator, the division will produce Inf values (or NaN where the numerator is also zero). These can cause issues when plotting, so they should be converted to NA beforehand. For example:

ggarrange(
  
  # first panel: Inf values left in place
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    ggplot(aes(x = depth, y = `Fe/Sc`)) + 
    geom_line() + 
    scale_x_reverse() + 
    ggtitle("w/ Inf."),
  
  # second panel: Inf values converted to NA before plotting
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    mutate(across(where(is.double), ~ na_if(.x, Inf))) %>% # convert all Inf to NA
    ggplot(aes(x = depth, y = `Fe/Sc`)) + 
    geom_line() + 
    scale_x_reverse() + 
    ggtitle("na_if(., Inf)")
  
)
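
The behaviour of division by zero can be seen on a handful of made-up numbers. The sketch below uses hypothetical values and base R only, and shows one possible way of blanking every non-finite result in a single step.

```r
# Hypothetical numerators and denominators
num <- c(5, 0, -3, 4)
den <- c(0, 0,  0, 2)

ratio <- num / den
ratio
# positive / 0 gives Inf, 0 / 0 gives NaN, negative / 0 gives -Inf

# is.finite() is FALSE for Inf, -Inf, NaN and NA alike,
# so this replaces every non-finite result with NA
ratio_clean <- replace(ratio, !is.finite(ratio), NA)
ratio_clean
```

A comparison against Inf alone would leave any -Inf values in place. Peak areas should not normally be negative, so this may rarely matter in practice, but testing with is.finite() covers both cases.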