These data are compositional: they represent changes in the relative proportions of all components of the matrix, measured and unmeasured. As such, the data will likely need transforming for certain types of multivariate analysis. As previously mentioned, these data are dimensionless and so do not represent a quantity, but they are directly related to the absolute amount of a particular element in the matrix. Ratios of elements are often used to represent changes in composition; this is sometimes referred to as normalisation, or normalising one element against another.
For example, particular element ratios can be used to make some environmental inference. Log-ratios are usually preferable here as they are resilient to matrix effects. In this context, the Ca/Ti log-ratio is used to infer a relative measure of productivity (that is, it can identify the hemipelagic sediments in the sequence). Note that the use of abs() ensures that switching the numerator and denominator has no effect, because log(a/b) = -log(b/a).
CD166_19_xrf %>%
  mutate(`log(Fe/Ca)` = abs(log(Fe/Ca))) %>%
  filter(qc == TRUE) %>%
  ggplot(aes(x = depth, y = `log(Fe/Ca)`)) +
  geom_line() +
  scale_x_reverse()
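The symmetry that abs() provides can be checked on a few toy values. This is a minimal sketch in base R; the vectors fe and ca are made-up peak areas for illustration, not values from CD166_19_xrf.

```r
# Toy peak areas (hypothetical values, not taken from the core data)
fe <- c(1200, 850, 430)
ca <- c(300, 900, 2100)

log_fe_ca <- log(fe / ca)  # positive where Fe > Ca, negative where Fe < Ca
log_ca_fe <- log(ca / fe)  # same magnitudes, opposite signs

# The two log-ratios are mirror images of each other ...
all.equal(log_fe_ca, -log_ca_fe)

# ... so the absolute value is the same whichever element is the numerator
all.equal(abs(log_fe_ca), abs(log_ca_fe))
```

One consequence worth keeping in mind is that abs() also discards the sign, so the result no longer shows which of the two elements dominates at a given depth.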
Sometimes z-scores are useful when the data don’t have meaningful units. They express each value as a number of standard deviations from the mean, so give a useful quantity for the magnitude of change. For example:
CD166_19_xrf %>%
  mutate(`log(Fe/Ca)` = abs(log(Fe/Ca))) %>%
  mutate(zscore = (`log(Fe/Ca)` - mean(`log(Fe/Ca)`, na.rm = TRUE))/sd(`log(Fe/Ca)`, na.rm = TRUE)) %>%
  filter(qc == TRUE) %>%
  ggplot(aes(x = depth, y = zscore)) +
  geom_line() +
  scale_x_reverse() +
  ylab("log(Fe/Ca) [z-score]")
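The z-score arithmetic used above can be verified on a short vector. This is a sketch in base R with made-up values; it compares the manual calculation against the built-in scale() function, which should give the same result.

```r
# Hypothetical log-ratio values with one missing observation
x <- c(0.2, 1.5, NA, 0.9, 2.4)

# Manual z-score, written the same way as in the pipeline above
z_manual <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# scale() centres and scales in one call; it returns a one-column matrix,
# so as.numeric() turns it back into a plain vector
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)

mean(z_manual, na.rm = TRUE)  # effectively zero
sd(z_manual, na.rm = TRUE)    # one
```

Because the mean and standard deviation are computed from whatever rows are present at that point, the position of a filter() step relative to the z-score calculation changes the result.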
It is trivial to calculate element ratios, to the extent that they can often be calculated where they are required rather than saved to memory. For example, if a plot of the Rayleigh (coherent) divided by the Compton (incoherent) scatter was desired, there is no need to save the computed value to a new variable (e.g. coh_inc <- df$Mo.coh/df$Mo.inc); simply define the calculation during plotting. To calculate ratios for all elements at once, use across() inside mutate().
elementsList is a list of chemical elements extracted from
CD166_19_xrf %>%
  mutate(across(any_of(elementsList)) / `Mo inc`)
This can be integrated into the workflow from the data tidying chapter, for example:
CD166_19_xrf %>%
  # transform
  mutate(across(any_of(elementsList)) / `Mo inc`) %>%
  # identify acceptable observations
  filter(validity == TRUE) %>%
  # identify acceptable variables
  select(any_of(myElements), depth, label) %>%
  # pivot
  tidyr::pivot_longer(!c("depth", "label"), names_to = "elements", values_to = "peakarea") %>%
  mutate(elements = factor(elements, levels = c(elementsList, "coh/inc"))) %>%
  # plot
  ggplot(aes(x = peakarea, y = depth)) +
  tidypaleo::geom_lineh(aes(color = label)) +
  scale_y_reverse() +
  scale_x_continuous(n.breaks = 2) +
  facet_geochem_gridh(vars(elements)) +
  labs(x = "peak area / Mo. inc.", y = "Depth [mm]") +
  tidypaleo::theme_paleo() +
  theme(legend.position = "none")
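To see what the normalisation step does to each column, the same operation can be written out in base R on a toy table. This is a sketch only: the table toy, its values and the short element list are hypothetical stand-ins for CD166_19_xrf and elementsList.

```r
# Toy table; the values and the element list are hypothetical
toy <- data.frame(
  depth    = c(10, 20, 30),
  Fe       = c(1000, 2000, 1500),
  Ca       = c(500, 400, 900),
  `Mo inc` = c(100, 200, 300),
  check.names = FALSE  # keep the space in "Mo inc"
)
elements <- c("Fe", "Ca", "Ti")  # "Ti" is deliberately absent from toy

# Like any_of(): use only the element columns that actually exist
present <- intersect(elements, names(toy))

# Divide every element column by the incoherent scatter, row by row
normalised <- toy
normalised[present] <- toy[present] / toy[["Mo inc"]]
normalised
```

Only the element columns change; depth and `Mo inc` are left as they were, and element names that are missing from the table are skipped silently rather than raising an error.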
Note that where the denominator is zero, the division returns Inf (or NaN where the numerator is also zero). These values can cause issues when plotting, and should be converted to NA beforehand. For example:
ggarrange(
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    ggplot(aes(x = depth, y = `Fe/Sc`)) +
    geom_line() +
    scale_x_reverse() +
    ggtitle("w/ Inf."),
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    mutate_if(is.numeric, list(~na_if(., Inf))) %>% # convert all Inf to NA
    ggplot(aes(x = depth, y = `Fe/Sc`)) +
    geom_line() +
    scale_x_reverse() +
    ggtitle("na_if(., Inf)")
)
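Note that na_if(., Inf) only replaces positive infinity. If NaN (from 0/0) or -Inf could also occur, one possible alternative is to test with is.finite(), as in this base R sketch on made-up values. The negative value is included only to show the -Inf case.

```r
# Toy numerators and denominators (hypothetical values)
fe <- c(10, 0, -4, 8)
sc <- c(2, 0, 0, 0)

ratio <- fe / sc
ratio  # 5 NaN -Inf Inf

# is.finite() is FALSE for Inf, -Inf, NaN and NA, so one test covers them all
ratio[!is.finite(ratio)] <- NA
ratio  # 5 NA NA NA
```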