Provenance studies rely on the identification of probable sources, such that the variability between two sources is greater than the internal variability of a single source (the so-called provenance postulate; Weigand, Harbottle, and Sayre 1977). This assumes that a unique signature can be identified for each source on the basis of several criteria.
nexus is designed for chemical fingerprinting and source tracking of ancient materials. It provides tools for exploration and analysis of compositional data in the framework of Aitchison (1986). If you are unfamiliar with the concepts and challenges of compositional data analysis, the following publications are a good place to start:
- Egozcue, J. J., Gozzi, C., Buccianti, A. & Pawlowsky-Glahn, V. (2024). Exploring Geochemical Data Using Compositional Techniques: A Practical Guide. Journal of Geochemical Exploration, 258: 107385. DOI: 10.1016/j.gexplo.2024.107385.
- Greenacre, M. & Wood, J. R. (2024). A Comprehensive Workflow for Compositional Data Analysis in Archaeometry, with Code in R. Archaeological and Anthropological Sciences.
- Grunsky, E., Greenacre, M. & Kjarsgaard, B. (2024). GeoCoDA: Recognizing and Validating Structural Processes in Geochemical Data. A Workflow on Compositional Data Analysis in Lithogeochemistry. Applied Computing and Geosciences, 22: 100149. DOI: 10.1016/j.acags.2023.100149.
Get started
## Install extra packages (if needed)
# install.packages("folio")
library(nexus)
#> Loading required package: dimensio
nexus provides a set of S4 classes that represent different special types of matrix. The most basic class represents a compositional data matrix, i.e. quantitative (positive) descriptions of the parts of some whole, carrying relative, rather than absolute, information (Aitchison 1986; Greenacre 2021).
It assumes that you keep your data tidy: each variable must be saved in its own column and each observation (sample) must be saved in its own row.
This class is easy to use, as it inherits from the base matrix class:
## Mineral compositions of rock specimens
## Data from Aitchison 1986
data("hongite")
head(hongite)
#> A B C D E
#> H1 48.8 31.7 3.8 6.4 9.3
#> H2 48.2 23.8 9.0 9.2 9.8
#> H3 37.0 9.1 34.2 9.5 10.2
#> H4 50.9 23.8 7.2 10.1 8.0
#> H5 44.2 38.3 2.9 7.7 6.9
#> H6 52.3 26.2 4.2 12.5 4.8
## Coerce to compositional data
coda <- as_composition(hongite)
head(coda)
#> <CompositionMatrix: 6 x 5>
#> A B C D E
#> H1 0.488 0.317 0.038 0.064 0.093
#> H2 0.482 0.238 0.090 0.092 0.098
#> H3 0.370 0.091 0.342 0.095 0.102
#> H4 0.509 0.238 0.072 0.101 0.080
#> H5 0.442 0.383 0.029 0.077 0.069
#> H6 0.523 0.262 0.042 0.125 0.048
A CompositionMatrix represents a closed composition matrix: each row of the matrix sums to 1 (only relative changes are relevant in compositional data analysis).
The original row sums are kept internally, so that the source data can be restored:
## Coerce to count data
counts <- as_amounts(coda)
all.equal(hongite, as.data.frame(counts))
#> [1] TRUE
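The closure operation and its reversal can be sketched in base R. This is only an illustration of the idea on made-up values; the object names (amounts, totals, comp, restored) are hypothetical and this is not how nexus stores the row sums internally.

```r
## Toy amounts (made-up values): three samples, three parts
amounts <- matrix(
  c(48.8, 31.7, 19.5,
    10.0, 30.0, 60.0,
     2.0,  2.0,  4.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("A", "B", "C"))
)

## Closure: divide each row by its total so that it sums to 1
totals <- rowSums(amounts)
comp <- amounts / totals
rowSums(comp)

## Keeping the totals makes the operation reversible
restored <- comp * totals
all.equal(restored, amounts)
```

Without the stored totals, only the relative information would survive the closure, which is why the row sums need to be kept if the source data are to be restored.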
The parts argument of the function as_composition() is used to define the columns to be used as the compositional part. If parts is NULL (the default), all non-integer numeric columns (i.e. of type double) are used. In the case of a data.frame coercion, additional columns are removed.
## Create a data.frame
X <- data.frame(
type = c("A", "A", "B", "A", "B", "C", "C", "C", "B"),
Ca = c(7.72, 7.32, 3.11, 7.19, 7.41, 5, 4.18, 1, 4.51),
Fe = c(6.12, 5.88, 5.12, 6.18, 6.02, 7.14, 5.25, 5.28, 5.72),
Na = c(0.97, 1.59, 1.25, 0.86, 0.76, 0.51, 0.75, 0.52, 0.56)
)
## Coerce to a compositional matrix
## (the 'type' column will be removed)
Y <- as_composition(X)
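The default column selection described above (keep columns of type double, drop the rest) can be mimicked in base R. This is a minimal sketch of the stated rule, not the package's actual implementation; the toy data frame and the names is_part and parts are hypothetical.

```r
## Toy data.frame with a character, an integer and two double columns
toy <- data.frame(
  type = c("A", "B", "C"),
  n = c(1L, 2L, 3L),
  Ca = c(7.72, 3.11, 5),
  Fe = c(6.12, 5.12, 7.14)
)

## Flag the columns of type double (integer and character columns are FALSE)
is_part <- vapply(toy, is.double, logical(1))
names(toy)[is_part]

## Keep only the flagged columns and close each row to 1
parts <- as.matrix(toy[, is_part, drop = FALSE])
parts / rowSums(parts)
```

Note that a numeric column written with whole numbers (such as 5 above) is still of type double in R unless it is created with the L suffix or as.integer().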
Working with (reference) groups
Provenance studies typically rely on two approaches, which can be used together:
- Identification of groups among the artifacts being studied, based on mineralogical or geochemical criteria (clustering).
- Comparison with so-called reference groups, i.e. known geological sources or archaeological contexts (classification).
When coercing a data.frame to a CompositionMatrix object, nexus lets you specify whether an observation belongs to a specific group (or not):
## Data from Wood and Liu 2023
data("bronze", package = "folio")
## Use the third column (dynasties) for grouping
coda <- as_composition(bronze, groups = 3)
groups(x) and groups(x) <- value retrieve or set the groups of an existing CompositionMatrix.
Missing values (NA) or empty strings can be used to specify that a sample does not belong to any group.
Once groups have been defined, they can be used by further methods (e.g. plotting). Note that for better readability, you can select only some of the parts (e.g. major elements):
## Compositional bar plot
barplot(coda, select = is_element_major(coda), order_rows = "Cu", space = 0)
Log-ratio transformations
The package provides the following transformations, together with their inverses: centered log-ratio (CLR, Aitchison 1986), additive log-ratio (ALR, Aitchison 1986), isometric log-ratio (ILR, Egozcue et al. 2003) and pivot log-ratio (PLR, Hron et al. 2017).
## CLR
clr <- transform_clr(coda)
## Back transform
back <- transform_inverse(clr)
all.equal(back, coda)
#> [1] TRUE
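To make the round trip above concrete, the CLR transformation and its inverse can be written in a few lines of base R. The helper names clr() and clr_inv() are hypothetical and are not nexus functions; this is an unweighted sketch on a made-up composition.

```r
## CLR: log of each part minus the row mean of the logs
## (i.e. the log of each part over the row's geometric mean)
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}

## Inverse CLR: exponentiate, then close each row to 1
clr_inv <- function(z) {
  ez <- exp(z)
  ez / rowSums(ez)
}

## Toy closed composition (made-up values): two samples, three parts
comp <- matrix(c(0.5, 0.3, 0.2,
                 0.1, 0.6, 0.3), nrow = 2, byrow = TRUE)

z <- clr(comp)
rowSums(z)                   # zero, up to floating-point error
all.equal(clr_inv(z), comp)  # the round trip recovers the composition
```

Because the CLR coordinates of each sample sum to zero, the transformed data have one redundant dimension, which matters for methods that need a full-rank covariance matrix.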
Multivariate Analysis
## Assume that about a third of the samples do not belong to any group
grp <- groups(coda)
grp[sample(length(grp), size = 100)] <- NA
## Set groups
groups(coda) <- grp
## Retrieve groups
# groups(coda)
Log-Ratio Analysis
## CLR
clr <- transform_clr(coda, weights = TRUE)
## PCA
lra <- pca(clr)
## Visualize results
viz_individuals(lra, color = c("#004488", "#DDAA33", "#BB5566"))
viz_hull(x = lra, border = c("#004488", "#DDAA33", "#BB5566"))
viz_variables(lra)
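Log-ratio analysis amounts to a principal component analysis of CLR-transformed data. The following base R sketch shows the unweighted version with stats::prcomp() on simulated data; it does not reproduce the weighted CLR (weights = TRUE) used above, and all object names and the simulated values are hypothetical.

```r
## Simulated positive data (toy values, not real measurements)
set.seed(42)
raw <- matrix(rlnorm(40), ncol = 4,
              dimnames = list(NULL, c("Cu", "Sn", "Pb", "Zn")))

## Close each row to 1, then apply the (unweighted) CLR
comp <- raw / rowSums(raw)
z <- log(comp) - rowMeans(log(comp))

## PCA of the CLR coordinates, without rescaling the variables
pc <- prcomp(z, center = TRUE, scale. = FALSE)

## CLR rows sum to zero, so the last component carries no variance
round(pc$sdev, 6)

## Sample scores and part loadings on the first two components
scores <- pc$x[, 1:2]
loadings <- pc$rotation[, 1:2]
```

The scores and loadings correspond to what the individuals and variables plots display, respectively.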
MANOVA
The log-transformed data can be assigned to a new column, allowing us to keep working with the data in the context of the original data.frame:
## ILR
ilr <- transform_ilr(coda)
bronze$ilr <- ilr
## MANOVA
fit <- manova(ilr ~ groups(ilr), data = bronze)
summary(fit)
#> Df Pillai approx F num Df den Df Pr(>F)
#> groups(ilr) 2 0.45616 11.017 14 522 < 2.2e-16 ***
#> Residuals 266
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The MANOVA results suggest that there are statistically significant differences between groups.
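ILR coordinates are used here because they have full rank (D - 1 coordinates for D parts), whereas the CLR covariance matrix is singular. A minimal base R sketch of this idea follows. It assumes a normalized Helmert basis, which may differ from the basis nexus uses, and it runs on simulated data with arbitrary group labels; every object name is hypothetical.

```r
## Simulated compositions (toy data): 30 samples, 4 parts
set.seed(1)
D <- 4
raw <- matrix(rlnorm(30 * D), ncol = D)
comp <- raw / rowSums(raw)

## CLR coordinates
z_clr <- log(comp) - rowMeans(log(comp))

## Orthonormal basis of the CLR subspace (assumption: Helmert contrasts,
## with each column scaled to unit length)
V <- contr.helmert(D)
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")

## ILR coordinates: D - 1 columns
z_ilr <- z_clr %*% V

## Isometry: distances between samples are the same in CLR and ILR space
all.equal(as.vector(dist(z_ilr)), as.vector(dist(z_clr)))

## MANOVA on the ILR coordinates with arbitrary toy groups
grp <- factor(rep(c("a", "b", "c"), each = 10))
fit <- manova(z_ilr ~ grp)
summary(fit)
```

Since the toy groups are assigned arbitrarily, the test statistic here carries no meaning; the sketch only shows the mechanics.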
Linear Discriminant Analysis
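## LDA (assumed call, as discr is not defined above)
discr <- MASS::lda(groups(ilr) ~ ilr, data = bronze)
## Back transform results
transform_inverse(discr$means, origin = ilr)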
#> Cu Sn Pb Zn Au
#> Shang 0.8479508 0.09551869 0.05401428 7.745021e-05 1.183186e-05
#> Western Zhou 0.8619662 0.10585629 0.02860148 8.775183e-05 2.704117e-05
#> Eastern Zhou 0.7631232 0.10283111 0.12774647 5.505982e-05 2.803980e-05
#> Ag As Sb
#> Shang 0.0006701310 0.001512276 0.0002445183
#> Western Zhou 0.0007847595 0.002181024 0.0004954170
#> Eastern Zhou 0.0013095426 0.003868748 0.0010378138
References
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. London, UK; New York, USA: Chapman and Hall.
Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G. and Barceló-Vidal, C. (2003). Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology, 35(3): 279-300. DOI: 10.1023/A:1023818214614.
Greenacre, M. (2021). Compositional Data Analysis. Annual Review of Statistics and Its Application, 8(1): 271-299. DOI: 10.1146/annurev-statistics-042720-124436.
Hron, K., Filzmoser, P., de Caritat, P., Fišerová, E. and Gardlo, A. (2017). Weighted Pivot Coordinates for Compositional Data and Their Application to Geochemical Mapping. Mathematical Geosciences, 49(6): 797-814. DOI: 10.1007/s11004-017-9684-z.
Weigand, P. C., Harbottle, G. and Sayre, E. (1977). Turquoise Sources and Source Analysis: Mesoamerica and the Southwestern U.S.A. In J. Ericson & T. K. Earle (Eds.), Exchange Systems in Prehistory, 15-34. New York, NY: Academic Press.