15 Module 4.2: Genetic Diversity

SNP data provides us with a genome-wide view of variation within individuals and populations. Calculating diversity parameters from this data helps us better understand the genetic diversity held within a population and between its subpopulations. These parameters can later inform diversity-based plant breeding.

15.1 Diversity parameters

We can calculate the most relevant diversity parameters using the genDivSNPReady() function from our package.

genDivSNPReady(geno, plots = FALSE, interactive = TRUE): returns diversity parameters calculated with the snpReady package.

geno: our genotype matrix
plots: defaults to FALSE; if TRUE, a graphical output of the results is produced
interactive: defaults to TRUE, which produces a dynamic table; if FALSE, the output remains a data frame

The function returns a list object with two data frames, one with the diversity parameters for each marker, and one with the diversity parameters for each accession. If plots = TRUE, a third object is generated with the different plots.
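To make the per-marker parameters more concrete, the sketch below computes a few common ones by hand (allele frequency, minor allele frequency, expected and observed heterozygosity) on a small hypothetical matrix. It assumes genotypes coded as 0/1/2 counts of one allele; the object names (`toyGeno`, `markerStats`) are illustrative, and the exact set of parameters reported by genDivSNPReady() may differ.

```r
# Hypothetical toy genotype matrix: 4 individuals (rows) x 3 markers (columns),
# coded as the number of copies of the alternate allele (0, 1 or 2)
toyGeno <- matrix(c(0, 1, 2, 1,
                    0, 0, 0, 2,
                    2, 2, 1, 2),
                  nrow = 4,
                  dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))

# Alternate-allele frequency per marker (each individual carries 2 alleles)
p <- colMeans(toyGeno, na.rm = TRUE) / 2

# Minor allele frequency: the smaller of the two allele frequencies
maf <- pmin(p, 1 - p)

# Expected heterozygosity under Hardy-Weinberg proportions
he <- 2 * p * (1 - p)

# Observed heterozygosity: proportion of heterozygous (coded 1) genotypes
ho <- colMeans(toyGeno == 1, na.rm = TRUE)

markerStats <- data.frame(p = p, MAF = maf, He = he, Ho = ho)
markerStats
```

A marker where He is much larger than Ho, such as snp2 in this toy matrix, is what we would expect to see frequently in a largely inbred crop.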

# Importing filtered genotypic data
matrix <- read.table("data/FilteredBarley.txt", sep = "\t", header = TRUE, 
                     row.names = 1, check.names = FALSE)
# The SNP matrix must have individuals in rows and markers in columns for the 
# downstream functions
matrix <- t(matrix)

# Importing metadata (read_excel() comes from the readxl package)
library(readxl)
metadata <- read_excel("data/BarleyMetadata.xlsx")
# Ensuring IDs match
metadata <- metadata[metadata$Individual %in% rownames(matrix),]

# Obtaining genetic diversity parameters
SNPReadyParams <- genDivSNPReady(matrix, plots = TRUE)

# Printing marker diversity parameters
SNPReadyParams$markers

# Printing diversity parameters for each accession
SNPReadyParams$accessions

# Diversity plots
SNPReadyParams$plots
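One point worth checking before grouping individuals: filtering the metadata with `%in%` keeps the matching rows but does not reorder them, so the metadata rows are not guaranteed to follow the row order of the genotype matrix. The sketch below shows one way to align the two with `match()`. The toy objects (`toyGeno`, `toyMeta`) are hypothetical stand-ins that only reuse the column names from the metadata above.

```r
# Hypothetical toy objects standing in for the genotype matrix and metadata
toyGeno <- matrix(0, nrow = 3, ncol = 2,
                  dimnames = list(c("acc3", "acc1", "acc2"), c("snp1", "snp2")))
toyMeta <- data.frame(Individual = c("acc1", "acc2", "acc3", "acc4"),
                      countryOfOriginCode = c("CHN", "ETH", "TUR", "CHN"))

# For each genotype row name, find its row position in the metadata
idx <- match(rownames(toyGeno), toyMeta$Individual)

# Stop early if any genotyped individual has no metadata entry
stopifnot(!anyNA(idx))

# Reorder (and subset) the metadata to follow the genotype matrix
toyMetaAligned <- toyMeta[idx, ]

# TRUE when both objects list the individuals in the same order
identical(as.character(toyMetaAligned$Individual), rownames(toyGeno))
```

Any grouping vector taken from the aligned metadata then lines up row by row with the genotype matrix.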

15.2 By population

These parameters can also be calculated by population to compare diversity among populations. We will use the HeBySubgroups() function from our package.

HeBySubgroups(geno, subgroups, plot = FALSE, interactive = TRUE): returns expected heterozygosity (He) by groups, including an optional plot

geno: our genotype matrix
subgroups: a vector with our factor information
plot: defaults to FALSE; if TRUE, a graphical output of the results is produced
interactive: defaults to TRUE, which produces a dynamic table; if FALSE, the output remains a data frame

# Defining our populations from country information
popSet <- as.factor(metadata$countryOfOriginCode[metadata$Individual 
                                                 %in% rownames(matrix)])

# Calculating parameters by population
He <- HeBySubgroups(matrix, popSet, plot = TRUE)

# Plotting results
He$plot

# Printing results
He$df
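As a rough illustration of what expected heterozygosity by group means, the sketch below computes He per marker inside each group and averages it over markers. It uses hypothetical toy data and assumes 0/1/2 genotype coding; it is not the internal implementation of HeBySubgroups(), which may compute or summarise He differently.

```r
# Hypothetical toy data: 6 individuals x 2 markers, coded 0/1/2
toyGeno <- matrix(c(0, 0, 1, 2, 2, 2,
                    1, 1, 1, 0, 0, 0),
                  nrow = 6,
                  dimnames = list(paste0("ind", 1:6), c("snp1", "snp2")))
toyGroups <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean expected heterozygosity (2pq) across markers for one group
heOneGroup <- function(g) {
  p <- colMeans(g, na.rm = TRUE) / 2
  mean(2 * p * (1 - p))
}

# Split row indices by group and apply the function to each sub-matrix
heByGroup <- sapply(split(seq_len(nrow(toyGeno)), toyGroups),
                    function(rows) heOneGroup(toyGeno[rows, , drop = FALSE]))
heByGroup
```

In this toy case group B is fixed for one allele at both markers, so its He is zero, while group A still segregates at both.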

15.3 AMOVA

AMOVA, or Analysis of Molecular Variance, is run from a genetic distance matrix to partition genetic variation into components within and between populations. It helps us understand how variation is structured in our sample. We will use the genDistPop() and AMOVA() functions from our package, which rely on adegenet and poppr to carry out the analysis.

genDistPop(geno, subgroups, method = 1, PCoA = FALSE): returns a genetic distance matrix and optional Principal Coordinate Analysis from the distance matrix.

geno: our genotype matrix
subgroups: a vector with our factor information
method: defaults to 1 (Nei’s distance), allows for values 1-5 (Nei, Edwards, Reynolds, Rogers, Prevosti)
PCoA: defaults to FALSE, if TRUE, performs a principal coordinates analysis of a Euclidean distance matrix
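For intuition about the default method, the sketch below computes Nei's standard genetic distance between two populations directly from allele frequencies. It assumes biallelic markers and follows the usual formula D = -ln(I), where I is the normalised identity between the two sets of allele frequencies. The function name `neiDistance` and the frequencies are hypothetical; this is not how genDistPop() is implemented internally.

```r
# Nei's standard distance from alternate-allele frequencies at biallelic loci
# pA, pB: numeric vectors with one frequency per locus for populations A and B
neiDistance <- function(pA, pB) {
  fA <- rbind(pA, 1 - pA)  # frequencies of both alleles at every locus
  fB <- rbind(pB, 1 - pB)
  identityAB <- sum(fA * fB) / sqrt(sum(fA^2) * sum(fB^2))
  -log(identityAB)
}

# Hypothetical allele frequencies at three loci
popA <- c(0.10, 0.50, 0.80)
popB <- c(0.90, 0.50, 0.70)

neiDistance(popA, popA)  # identical populations give a distance of 0
neiDistance(popA, popB)  # differing frequencies give a positive distance
```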

AMOVA(geno, subgroups): runs an Analysis of Molecular Variance using the subgroups as populations.

geno: our genotype matrix
subgroups: a vector with our factor information

# Calculating our genetic distance matrix and PCoA
genDist <- genDistPop(matrix, popSet, PCoA = TRUE)


 Converting data from a genind to a genpop object... 

...done.

# Printing results
genDist

$genDist
          CHN       ETH       TUR
CHN 0.0000000                    
ETH 0.1785492 0.0000000          
TUR 0.1075913 0.1388646 0.0000000

$PCoA
Duality diagramm
class: pco dudi
$call: dudi.pco(d = genDist, scannf = FALSE, nf = 3)

$nf: 2 axis-components saved
$rank: 2
eigen values: 0.005458 0.001513
  vector length mode    content       
1 $cw    2      numeric column weights
2 $lw    3      numeric row weights   
3 $eig   2      numeric eigen values  

  data.frame nrow ncol content             
1 $tab       3    2    modified array      
2 $li        3    2    row coordinates     
3 $l1        3    2    row normed scores   
4 $co        2    2    column coordinates  
5 $c1        2    2    column normed scores
other elements: NULL

$PCoAPlot

# PCoA
genDist$PCoAPlot
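The same ordination idea can be tried with base R alone. The sketch below re-enters the distance matrix printed above and runs classical multidimensional scaling with `cmdscale()`, which is a swapped-in alternative to the ade4 `dudi.pco()` call shown in the output, not the code used by genDistPop(). Object names such as `distMat` and `pcoa` are illustrative.

```r
# Distance matrix copied from the $genDist output above
pops <- c("CHN", "ETH", "TUR")
distMat <- matrix(0, nrow = 3, ncol = 3, dimnames = list(pops, pops))
distMat["ETH", "CHN"] <- 0.1785492
distMat["TUR", "CHN"] <- 0.1075913
distMat["TUR", "ETH"] <- 0.1388646
d <- as.dist(distMat)  # as.dist() reads the lower triangle

# Classical (metric) multidimensional scaling, i.e. PCoA, on two axes
pcoa <- cmdscale(d, k = 2, eig = TRUE)

pcoa$points  # coordinates of each population on the two axes
pcoa$eig     # eigenvalues, one per population (the last is ~0)

# Simple scatter plot of the populations on the first two axes
plot(pcoa$points, type = "n", xlab = "Axis 1", ylab = "Axis 2")
text(pcoa$points, labels = rownames(pcoa$points))
```

With only three populations, two axes capture all of the structure in the distance matrix. The eigenvalues will not be on the same scale as those printed by ade4, which appears to weight rows uniformly by 1/n.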

# Running AMOVA
amovaResult <- AMOVA(matrix, popSet)


 Replaced 248144 missing values.


 No missing values detected.

# Printing results
amovaResult

$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

$results
                 Df    Sum Sq    Mean Sq
Between samples   2  92405.09 46202.5455
Within samples  485 346712.00   714.8701
Total           487 439117.09   901.6778

$componentsofcovariance
                                Sigma         %
Variations  Between samples  291.6988  28.97952
Variations  Within samples   714.8701  71.02048
Total variations            1006.5689 100.00000

$statphi
                        Phi
Phi-samples-total 0.2897952
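The tables in this output are linked by simple arithmetic, which the sketch below reproduces from the printed values: each mean square is the sum of squares divided by its degrees of freedom, and the Phi statistic is the share of the total variance component attributed to differences between populations. The variable names are illustrative.

```r
# Values copied from the $results table above
sumSq <- c(between = 92405.09, within = 346712.00)
degFree <- c(between = 2, within = 485)
meanSq <- sumSq / degFree  # reproduces the Mean Sq column

# Variance components copied from $componentsofcovariance
sigma <- c(between = 291.6988, within = 714.8701)

# Percentage of the total variance explained by each level
pctVariance <- 100 * sigma / sum(sigma)

# Phi: between-population component over the total
phi <- sigma[["between"]] / sum(sigma)

round(pctVariance, 2)
round(phi, 4)
```

Here roughly 29% of the molecular variance lies between the three countries of origin and about 71% within them.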