5.1 Ratios and Normalisation

These data are compositional: they represent changes in the relative proportions of all components of the matrix, measured and unmeasured. As such, the data will likely need transforming for certain types of multivariate analysis. As previously mentioned, these data are dimensionless; they do not represent a quantity, but are directly related to the absolute amount of a particular element in the matrix. Ratios of elements are often used to represent changes in composition; this is sometimes referred to as normalisation, or normalising one element against another.

It is trivial to calculate element ratios, so they can often be calculated where they are required rather than saved to memory. For example, if a plot of the coherent (Rayleigh) scatter divided by the incoherent (Compton) scatter was desired, there is no need to save the computed value to a new variable (e.g. coh_inc <- df$Mo.coh/df$Mo.inc); simply define the calculation during plotting. To calculate ratios for all elements at once, use mutate() with across(any_of(elementsList)), where elementsList is a list of chemical elements extracted from data(PeriodicTable).

CD166_19_xrf %>% 
  mutate(across(any_of(elementsList)) /`Mo inc`)
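
The call above divides the whole set of selected columns at once. As a minimal sketch of the same normalisation written with a lambda inside across(), the example below uses a small invented data frame; toy_xrf and its values are hypothetical and not taken from CD166_19_xrf, and elementsList is shortened to three symbols for illustration.

```r
library(dplyr)

# hypothetical peak areas; the values are invented for illustration
toy_xrf <- tibble(
  depth    = c(10, 20, 30),
  Fe       = c(1000, 2000, 1500),
  Ti       = c(100, 50, 75),
  `Mo inc` = c(500, 400, 250)
)

# stand-in for the full list of element symbols
elementsList <- c("Fe", "Ti", "Ca")

# divide every listed element that exists in the data by the incoherent scatter;
# any_of() silently skips "Ca" because it is absent from toy_xrf
toy_norm <- toy_xrf %>%
  mutate(across(any_of(elementsList), ~ .x / `Mo inc`))

toy_norm
```

Only the element columns are changed; depth and `Mo inc` are left as they were, so the denominator remains available for later steps.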

This can be integrated into the workflow from the data tidying chapter, for example:

CD166_19_xrf %>% 
  # transform
  mutate(across(any_of(elementsList)) /`Mo inc`) %>%
  
  # identify acceptable observations
  filter(validity == TRUE) %>%
  
  # identify acceptable variables
  select(any_of(myElements), depth, label) %>% 
  
  # pivot
  tidyr::pivot_longer(!c("depth", "label"), names_to = "elements", values_to = "peakarea") %>% 
  mutate(elements = factor(elements, levels = c(elementsList, "coh/inc"))) %>%
  
  # plot
  ggplot(aes(x = peakarea, y = depth)) +
    tidypaleo::geom_lineh(aes(color = label)) +
    scale_y_reverse() +
    scale_x_continuous(n.breaks = 2) +
    tidypaleo::facet_geochem_gridh(vars(elements)) +
    labs(x = "peak area / Mo. inc.", y = "Depth [mm]") +
    tidypaleo::theme_paleo() +
    theme(legend.position = "none")

Note that where the denominator is zero, the division returns Inf (or NaN where the numerator is also zero). Inf values can cause issues when plotting, and should be converted to NA beforehand. For example:

ggarrange(
  
  # ratio plotted as computed: division by zero leaves Inf in the data
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    ggplot(aes(x = depth, y = `Fe/Sc`)) + 
    geom_line() + 
    scale_x_reverse() + 
    ggtitle("w/ Inf."),
  
  # same ratio with Inf converted to NA before plotting
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    mutate(across(where(is.double), ~na_if(.x, Inf))) %>% # convert all Inf to NA
    ggplot(aes(x = depth, y = `Fe/Sc`)) + 
    geom_line() + 
    scale_x_reverse() + 
    ggtitle("na_if(., Inf)")
  
)
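
The na_if() call only replaces positive Inf. As a hedged sketch of a slightly broader clean-up, the example below uses is.finite(), which is FALSE for Inf, -Inf and NaN alike, so a 0/0 division is caught as well. The data frame toy_xrf and its values are invented for illustration and are not taken from CD166_19_xrf.

```r
library(dplyr)

# hypothetical peak areas with zeros in the denominator
toy_xrf <- tibble(
  depth = c(10, 20, 30, 40),
  Fe    = c(1000, 2000, 0, 1500),
  Sc    = c(10, 0, 0, 5)
)

toy_ratio <- toy_xrf %>%
  # 2000/0 gives Inf and 0/0 gives NaN
  mutate(`Fe/Sc` = Fe / Sc) %>%
  # keep finite values, replace everything else in double columns with NA
  mutate(across(where(is.double), ~ if_else(is.finite(.x), .x, NA_real_)))

toy_ratio$`Fe/Sc`
```

Here the second and third observations become NA while the first and fourth keep their ratios. Whether NaN should be treated the same way as Inf depends on the analysis; this is one possible choice rather than a required step.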