## 5.1 Ratios and Normalisation

These data are compositional: they represent changes in the relative proportions of all components of the matrix, measured and unmeasured. As such, the data will likely need transforming for certain types of multivariate analysis. As previously mentioned, these data are dimensionless and so do not represent a quantity, but they are directly related to the absolute amount of a particular element in the matrix. Ratios of elements are often used to represent changes in composition; this is sometimes referred to as normalisation, or normalising one element against another.

For example, particular element ratios can be used to make environmental inferences. Log-ratios are usually preferable here as they are resilient to matrix effects. In this context, the Ca/Ti log-ratio is used to infer a relative measure of productivity (that is, it can identify the hemipelagic sediments in the sequence); the code below applies the same approach to Fe/Ca. Note that the use of `abs()` ensures that switching the denominator and numerator has no effect.

```
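CD166_19_xrf %>%
  # absolute log-ratio, so Fe/Ca and Ca/Fe give the same result
  mutate(`log(Fe/Ca)` = abs(log(Fe/Ca))) %>%
  # keep only observations flagged as acceptable
  filter(qc == TRUE) %>%
  ggplot(aes(x = depth, y = `log(Fe/Ca)`)) +
  geom_line() +
  scale_x_reverse()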
```
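The symmetry that `abs()` provides can be checked on a few made-up numbers. This is only a sketch in base R: `fe` and `ca` are hypothetical peak areas, not values taken from the core.

```r
# Hypothetical peak areas for two elements (illustrative values only)
fe <- c(1200, 850, 400)
ca <- c(300, 850, 1600)

# The signed log-ratio changes sign when numerator and denominator are swapped
log(fe / ca)
log(ca / fe)

# The absolute log-ratio is the same either way
same_either_way <- all.equal(abs(log(fe / ca)), abs(log(ca / fe)))
same_either_way
```

One consequence worth keeping in mind is that the sign, which shows which of the two elements dominates, is discarded by `abs()`.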

z-scores are sometimes useful when the data do not have meaningful units. They express each value as the number of standard deviations from the mean, and so give a useful measure of the magnitude of change. For example:

```
CD166_19_xrf %>%
  # filter first, so the mean and sd use acceptable observations only
  filter(qc == TRUE) %>%
  mutate(`log(Fe/Ca)` = abs(log(Fe/Ca))) %>%
  # z-score: (value - mean) / standard deviation
  mutate(zscore = (`log(Fe/Ca)` - mean(`log(Fe/Ca)`, na.rm = TRUE))/sd(`log(Fe/Ca)`, na.rm = TRUE)) %>%
  ggplot(aes(x = depth, y = zscore)) +
  geom_line() +
  scale_x_reverse() +
  ylab("log(Fe/Ca) [z-score]")
```
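The z-score arithmetic can be illustrated on a short vector. The sketch below uses base R only, and `x` is a hypothetical set of log-ratio values with one missing observation. It also compares the manual calculation against base R's `scale()`, which performs the same centring and scaling.

```r
# Hypothetical log-ratio values, including one missing observation
x <- c(0.2, 1.4, NA, 0.9, 2.1)

# z-score: distance from the mean, in units of standard deviation
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# scale() centres and scales in the same way; it returns a one-column matrix,
# so as.numeric() is used to drop the matrix attributes
z_scale <- as.numeric(scale(x))

all.equal(z, z_scale)

# After the transformation the mean is (approximately) 0 and the sd is 1
mean(z, na.rm = TRUE)
sd(z, na.rm = TRUE)
```

The missing observation stays `NA` in the result rather than being dropped, so the z-scores remain aligned with the original depths.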

Element ratios are trivial to calculate, to the extent that they can often simply be calculated where they are required rather than saved to memory. For example, if a plot of the Rayleigh (coherent) scatter divided by the Compton (incoherent) scatter is desired, there is no need to save the computed value to a new variable (e.g. `coh_inc <- df$Mo.coh/df$Mo.inc`); simply define the calculation during plotting. To calculate ratios for all elements at once, use `across()` with `any_of(elementsList)` inside `mutate()`, where `elementsList` is a list of chemical elements extracted from `data(PeriodicTable)`.

```
CD166_19_xrf %>%
  # divide every element column by the incoherent (Compton) scatter
  mutate(across(any_of(elementsList), ~ .x / `Mo inc`))
```
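To see what this does to each column, the same pattern can be run on a small made-up table. This is a sketch rather than the book's data: `toy_xrf` and `toy_elements` are hypothetical names, and the division is written here as a lambda inside `across()`.

```r
library(dplyr)

# Hypothetical stand-in for the scan data (illustrative values only)
toy_xrf <- tibble(
  depth    = c(10, 20, 30),
  Fe       = c(1000, 2000, 1500),
  Ca       = c(500, 400, 900),
  `Mo inc` = c(100, 200, 300)
)

# Hypothetical element list; "Ti" is deliberately absent from toy_xrf
toy_elements <- c("Fe", "Ca", "Ti")

# Divide each element column that exists by the `Mo inc` column, row by row
toy_ratios <- toy_xrf %>%
  mutate(across(any_of(toy_elements), ~ .x / `Mo inc`))

toy_ratios
```

Because `any_of()` skips names that are missing from the data, the absent `Ti` does not raise an error, and the `depth` and `Mo inc` columns are left unchanged.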

This can be integrated into the workflow from the data tidying chapter, for example:

```
CD166_19_xrf %>%
  # transform
  mutate(across(any_of(elementsList), ~ .x / `Mo inc`)) %>%
  # identify acceptable observations
  filter(validity == TRUE) %>%
  # identify acceptable variables
  select(any_of(myElements), depth, label) %>%
  # pivot
  tidyr::pivot_longer(!c("depth", "label"), names_to = "elements", values_to = "peakarea") %>%
  mutate(elements = factor(elements, levels = c(elementsList, "coh/inc"))) %>%
  # plot
  ggplot(aes(x = peakarea, y = depth)) +
  tidypaleo::geom_lineh(aes(color = label)) +
  scale_y_reverse() +
  scale_x_continuous(n.breaks = 2) +
  tidypaleo::facet_geochem_gridh(vars(elements)) +
  labs(x = "peak area / Mo. inc.", y = "Depth [mm]") +
  tidypaleo::theme_paleo() +
  theme(legend.position = "none")
```

Note that where zero values are encountered in the denominator, the division will produce `Inf` values. This can cause issues when plotting, and these values should be converted to `NA` beforehand. For example:

```
ggarrange(
  # left-hand panel: Inf values left in place
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    ggplot(aes(x = depth, y = `Fe/Sc`)) +
    geom_line() +
    scale_x_reverse() +
    ggtitle("w/ Inf."),
  # right-hand panel: Inf values replaced before plotting
  CD166_19_xrf %>%
    mutate(`Fe/Sc` = Fe/Sc) %>%
    mutate(across(where(is.numeric), ~ na_if(.x, Inf))) %>% # convert all Inf to NA
    ggplot(aes(x = depth, y = `Fe/Sc`)) +
    geom_line() +
    scale_x_reverse() +
    ggtitle("na_if(., Inf)")
)
```
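Division and log-ratios can also produce `NaN` (from `0/0`) and `-Inf` (from the log of zero), neither of which is matched by comparing against `Inf`. The sketch below, in base R with hypothetical counts, shows one way to catch every non-finite value in a single step; it is offered as an alternative under those assumptions rather than as the method used above.

```r
# Hypothetical counts with zeros in both numerator and denominator
num <- c(120, 0, 75, 0)
den <- c(40, 0, 0, 25)

# Plain ratio: a finite value, NaN (0/0), Inf (75/0) and 0 (0/25)
ratio <- num / den
ratio

# Log-ratio: the zero ratio becomes -Inf
log_ratio <- log(ratio)
log_ratio

# is.finite() is FALSE for Inf, -Inf, NaN and NA, so one test covers them all
log_ratio_clean <- replace(log_ratio, !is.finite(log_ratio), NA)
log_ratio_clean
```

Only the first value survives the cleaning here, which also makes it easy to count how many observations a given denominator removes.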