5.1 Ratios and Normalisation
These data are compositional: they represent changes in the relative proportions of all components of the matrix, measured and unmeasured. As such, the data will likely need transforming for certain types of multivariate analysis. As previously mentioned, these data are dimensionless; they do not represent a quantity in themselves, but they are directly related to the absolute amount of a particular element in the matrix. Ratios of elements are often used to represent changes in composition; this is sometimes referred to as normalisation, or normalising one element against another.
Element ratios are trivial to calculate, so they can often be computed where they are required rather than saved to memory. For example, if a plot of the coherent (Rayleigh) scatter divided by the incoherent (Compton) scatter is desired, there is no need to save the computed value to a new variable (e.g. coh_inc <- df$Mo.coh/df$Mo.inc); simply define the calculation during plotting. To calculate ratios for all elements at once, divide every element column by the chosen denominator inside mutate(), where elementsList is a list of chemical elements:

    CD166_19_xrf %>%
      mutate(across(any_of(elementsList), ~ .x / `Mo inc`))
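As a minimal sketch of what this division does, the example below applies the same pattern to a small made-up table. The data frame toy_xrf, its values, and the shortened elementsList are assumptions for illustration only; they are not taken from CD166_19_xrf.

```r
library(dplyr)

# Hypothetical peak areas for three depths; values are invented for illustration.
toy_xrf <- tibble(
  depth    = c(10, 20, 30),
  Fe       = c(1000, 1500, 900),
  Ca       = c(200, 400, 300),
  `Mo inc` = c(100, 150, 90)
)

# Shortened stand-in for the full list of chemical elements.
elementsList <- c("Al", "Ca", "Fe")

# Divide every element column that is present by the incoherent scatter.
# any_of() silently skips names (here "Al") that are not columns in the data.
toy_ratios <- toy_xrf %>%
  mutate(across(any_of(elementsList), ~ .x / `Mo inc`))

print(toy_ratios)
```

Only the columns named in elementsList are changed; depth and `Mo inc` itself pass through untouched, which is why the denominator remains available for later steps in a pipeline.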
This can be integrated into the workflow from the data tidying chapter, for example:
    CD166_19_xrf %>%
      # transform
      mutate(across(any_of(elementsList), ~ .x / `Mo inc`)) %>%
      # identify acceptable observations
      filter(validity == TRUE) %>%
      # identify acceptable variables
      select(any_of(myElements), depth, label) %>%
      # pivot
      tidyr::pivot_longer(!c("depth", "label"),
                          names_to = "elements",
                          values_to = "peakarea") %>%
      mutate(elements = factor(elements, levels = c(elementsList, "coh/inc"))) %>%
      # plot
      ggplot(aes(x = peakarea, y = depth)) +
      tidypaleo::geom_lineh(aes(color = label)) +
      scale_y_reverse() +
      scale_x_continuous(n.breaks = 2) +
      facet_geochem_gridh(vars(elements)) +
      labs(x = "peak area / Mo. inc.", y = "Depth [mm]") +
      tidypaleo::theme_paleo() +
      theme(legend.position = "none")
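For the single-ratio case mentioned earlier, the calculation can be written directly inside the plotting call. The sketch below assumes a small invented table, toy_scatter, with columns named `Mo coh` and `Mo inc`; the values are hypothetical and serve only to show that no intermediate variable is needed.

```r
library(ggplot2)

# Hypothetical scatter peak areas; values are invented for illustration.
toy_scatter <- data.frame(
  depth    = c(10, 20, 30, 40),
  `Mo coh` = c(50, 60, 45, 70),
  `Mo inc` = c(100, 150, 90, 175),
  check.names = FALSE  # keep the space in the column names
)

# The ratio is written inside aes(), so nothing is added to the data frame.
p <- ggplot(toy_scatter, aes(x = `Mo coh` / `Mo inc`, y = depth)) +
  geom_path() +         # connect observations in depth order
  scale_y_reverse() +   # depth increases downwards
  labs(x = "coh/inc", y = "Depth [mm]")

print(p)
```

Because the division happens at plotting time, toy_scatter keeps its original three columns.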
Note that where zero values are encountered in the denominator, the division returns Inf values. These can cause problems when plotting, and should be converted to NA before plotting. For example:
    ggarrange(
      # first plot: Inf values left in place
      CD166_19_xrf %>%
        mutate(`Fe/Sc` = Fe / Sc) %>%
        ggplot(aes(x = depth, y = `Fe/Sc`)) +
        geom_line() +
        scale_x_reverse() +
        ggtitle("w/ Inf."),
      # second plot: Inf values converted to NA
      CD166_19_xrf %>%
        mutate(`Fe/Sc` = Fe / Sc) %>%
        # convert all Inf to NA; only double columns can hold Inf
        mutate(across(where(is.double), ~ na_if(.x, Inf))) %>%
        ggplot(aes(x = depth, y = `Fe/Sc`)) +
        geom_line() +
        scale_x_reverse() +
        ggtitle("na_if(., Inf)")
    )
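The behaviour of division by zero can be checked on a few made-up numbers. The sketch below uses base R only; the vectors fe and sc are hypothetical values, not readings from CD166_19_xrf. It also shows an alternative cleaning step based on is.finite(), which is an assumption about what may be convenient rather than the approach used above: unlike a comparison against Inf alone, it also catches -Inf and the NaN produced by 0/0.

```r
# Hypothetical peak areas, including zeros in the denominator.
fe <- c(1200, 800, 0, 950)
sc <- c(4, 0, 0, 5)

ratio <- fe / sc
print(ratio)  # 300 Inf NaN 190

# is.finite() is FALSE for Inf, -Inf, NaN and NA, so one test covers every case.
ratio_clean <- ifelse(is.finite(ratio), ratio, NA_real_)
print(ratio_clean)  # 300 NA NA 190
```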