# Loading readxl for read_excel()
library(readxl)
# Importing filtered genotypic data
matrix <- read.table("data/FilteredBarley.txt", sep = "\t", header = TRUE,
                     row.names = 1, check.names = FALSE)
# The SNP matrix must have individuals in rows and markers in columns for the
# functions used later
matrix <- t(matrix)
# Importing metadata
metadata <- read_excel("data/BarleyMetadata.xlsx")
# Ensuring IDs match
metadata <- metadata[metadata$Individual %in% rownames(matrix),]
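The %in% filter above keeps the right individuals but leaves the metadata in its own row order. The sketch below shows one way to also put the metadata in the same order as the rows of the genotype matrix, so that any grouping vector taken from it lines up with the genotypes. The objects geno_toy, meta_toy and meta_aligned are hypothetical toy data, not part of the barley dataset.

```r
# Hypothetical toy data standing in for the genotype matrix and the metadata
geno_toy <- matrix(0, nrow = 3, ncol = 2,
                   dimnames = list(c("B3", "B1", "B2"), c("snp1", "snp2")))
meta_toy <- data.frame(Individual = c("B1", "B2", "B3", "B4"),
                       countryOfOriginCode = c("CHN", "ETH", "TUR", "ETH"),
                       stringsAsFactors = FALSE)

# Keep only individuals present in the genotype matrix, in matrix row order
meta_aligned <- meta_toy[match(rownames(geno_toy), meta_toy$Individual), ]

# Both checks should pass before grouping individuals by metadata
stopifnot(!anyNA(meta_aligned$Individual))
stopifnot(identical(meta_aligned$Individual, rownames(geno_toy)))
```

If the second check fails on real data, the IDs in the two files differ in spelling or some genotyped individuals have no metadata row.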
15 Module 4.2: Genetic Diversity
SNP data provides us with a genome-wide view of variation within individuals and populations. Calculating certain diversity parameters from this data helps us better understand the genetic diversity held within a population and between its subpopulations. These can later be used for diversity-based plant breeding.
15.1 Diversity parameters
We can easily calculate the most relevant diversity parameters using the genDivSNPReady() function from our package.
genDivSNPReady(geno, plots = FALSE, interactive = TRUE): returns diversity parameters calculated with the snpReady package.
geno: our genotype matrix
plots: defaults to FALSE; if TRUE, a graphical output of the results is produced
interactive: defaults to TRUE, which produces a dynamic table; if FALSE, the output remains a data frame
The function returns a list object with two data frames, one with the diversity parameters for each marker, and one with the diversity parameters for each accession. If plots = TRUE, a third object is generated with the different plots.
# Obtaining genetic diversity parameters
SNPReadyParams <- genDivSNPReady(matrix, plots = TRUE)
# Printing marker diversity parameters
SNPReadyParams$markers
# Printing individuals' diversity parameters
SNPReadyParams$accessions
# Diversity plots
SNPReadyParams$plots
15.2 By population
These parameters can also be calculated by population in order to compare diversity between populations. We will use the HeBySubgroups() function from our package.
HeBySubgroups(geno, subgroups, plot = FALSE, interactive = TRUE): returns expected heterozygosity (He) by group, including an optional plot.
geno: our genotype matrix
subgroups: a vector with our factor information
plot: defaults to FALSE; if TRUE, a graphical output of the results is produced
interactive: defaults to TRUE, which produces a dynamic table; if FALSE, the output remains a data frame
# Defining our populations from country information
popSet <- as.factor(metadata$countryOfOriginCode[metadata$Individual %in%
                                                   rownames(matrix)])
# Calculating parameters by population
He <- HeBySubgroups(matrix, popSet, plot = TRUE)
# Plotting results
He$plot
# Printing results
He$df
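To make the quantity being compared more concrete, here is a minimal base R sketch of expected heterozygosity per group. It assumes biallelic SNPs coded as counts of one allele (0, 1 or 2) with individuals in rows, and uses He = 2p(1 - p) averaged over markers. This is an illustration on hypothetical toy data (geno_toy, groups, expected_he), not necessarily how HeBySubgroups() computes its values internally.

```r
# Hypothetical toy genotypes: 6 individuals (rows) x 3 biallelic SNPs (columns),
# coded as the number of copies of one allele (0, 1 or 2)
geno_toy <- matrix(
  c(0, 1, 2,
    1, 1, 2,
    1, 0, 2,
    2, 2, 0,
    2, 2, 1,
    2, 2, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("ind", 1:6), paste0("snp", 1:3))
)
groups <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean expected heterozygosity over markers: He = 2 * p * (1 - p) per marker
expected_he <- function(geno) {
  p <- colMeans(geno, na.rm = TRUE) / 2  # allele frequency per marker
  mean(2 * p * (1 - p))
}

# Split row indices by group and compute He within each group
he_by_group <- sapply(
  split(seq_len(nrow(geno_toy)), groups),
  function(rows) expected_he(geno_toy[rows, , drop = FALSE])
)
he_by_group
```

In this toy example group A is polymorphic at two markers and group B at only one, so A ends up with the higher He.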
15.3 AMOVA
AMOVA, or Analysis of Molecular Variance, can be run from a genetic distance matrix to partition genetic variation into components within and between populations. It helps us understand the structure of variation in our sample. We will be using the genDistPop() and AMOVA() functions from our package for this. They use frameworks from adegenet and poppr to carry out the AMOVA.
genDistPop(geno, subgroups, method = 1, PCoA = FALSE): returns a genetic distance matrix and an optional Principal Coordinate Analysis of that distance matrix.
geno: our genotype matrix
subgroups: a vector with our factor information
method: defaults to 1 (Nei's distance); allows values 1-5 (Nei, Edwards, Reynolds, Rogers, Prevosti)
PCoA: defaults to FALSE; if TRUE, performs a principal coordinates analysis of a Euclidean distance matrix
AMOVA(): runs the AMOVA.
geno: our genotype matrix
subgroups: a vector with our factor information
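As a rough illustration of what the default method measures, the sketch below computes Nei's (1972) standard genetic distance between two populations from their allele frequencies. The formula is stated here as an assumption about what method = 1 corresponds to; the function nei_distance and the frequencies are hypothetical, and the package's own implementation may differ in details such as the handling of missing data.

```r
# Hypothetical allele frequencies for two populations at two biallelic loci.
# Each vector lists the frequencies of all alleles, locus by locus.
pop1 <- c(L1.a = 0.5, L1.b = 0.5, L2.a = 1.0, L2.b = 0.0)
pop2 <- c(L1.a = 0.5, L1.b = 0.5, L2.a = 0.0, L2.b = 1.0)

# Nei's standard distance: D = -ln( sum(x * y) / sqrt(sum(x^2) * sum(y^2)) )
nei_distance <- function(x, y) {
  -log(sum(x * y) / sqrt(sum(x^2) * sum(y^2)))
}

d_between <- nei_distance(pop1, pop2)  # populations fixed for different alleles at L2
d_self    <- nei_distance(pop1, pop1)  # identical frequencies give distance 0
c(between = d_between, self = d_self)
```

Populations with identical allele frequencies have distance 0, and the distance grows as the frequencies diverge.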
# Calculating our genetic distance matrix and PCoA
genDist <- genDistPop(matrix, popSet, PCoA = TRUE)
Converting data from a genind to a genpop object...
...done.
# Printing results
genDist
$genDist
          CHN       ETH       TUR
CHN 0.0000000
ETH 0.1785492 0.0000000
TUR 0.1075913 0.1388646 0.0000000
$PCoA
Duality diagramm
class: pco dudi
$call: dudi.pco(d = genDist, scannf = FALSE, nf = 3)
$nf: 2 axis-components saved
$rank: 2
eigen values: 0.005458 0.001513
  vector length mode    content
1 $cw    2      numeric column weights
2 $lw    3      numeric row weights
3 $eig   2      numeric eigen values

  data.frame nrow ncol content
1 $tab       3    2    modified array
2 $li        3    2    row coordinates
3 $l1        3    2    row normed scores
4 $co        2    2    column coordinates
5 $c1        2    2    column normed scores
other elements: NULL
$PCoAPlot
# PCoA
genDist$PCoAPlot
# Running AMOVA
amovaResult <- AMOVA(matrix, popSet)
Replaced 248144 missing values.
No missing values detected.
# Printing results
amovaResult
$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
$results
                 Df    Sum Sq    Mean Sq
Between samples   2  92405.09 46202.5455
Within samples  485 346712.00   714.8701
Total           487 439117.09   901.6778
$componentsofcovariance
                                Sigma         %
Variations  Between samples  291.6988  28.97952
Variations  Within samples   714.8701  71.02048
Total variations            1006.5689 100.00000
$statphi
                        Phi
Phi-samples-total 0.2897952
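The Phi statistic is the share of the total variance that lies between populations, so it can be checked by hand from the variance components printed above. The short sketch below only re-does that arithmetic with the numbers from this output; the variable names are our own.

```r
# Variance components copied from the $componentsofcovariance table
sigma_between <- 291.6988
sigma_within  <- 714.8701

# Total variance and the proportion found between populations
sigma_total <- sigma_between + sigma_within
phi <- sigma_between / sigma_total

# Percentage of variation attributed to each level
pct <- 100 * c(between = sigma_between, within = sigma_within) / sigma_total

round(phi, 4)
round(pct, 2)
```

About 29% of the variation in this sample is between the three countries of origin and about 71% is among individuals within them. Note that the within-population component equals the within-samples mean square in the $results table.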