assignPOP is an R package for population assignment within a machine-learning framework. It uses supervised machine-learning methods to evaluate the discriminatory power of data collected from source populations, and it can analyze large genetic, non-genetic, or integrated (genetic plus non-genetic) data sets. The framework is designed to address the upward bias in estimated assignment accuracy discussed in previous studies [1, 2].



Conceptual Framework

The ability to assign individuals of unknown origin to source populations relies on a robust baseline (training data) collected from those populations. However, noise in the training data or a lack of distinct features can lower assignment accuracy. It is therefore important to evaluate the baseline data via cross-validation before using it to predict the source populations of unknown individuals. The diagrams below illustrate how assignPOP fits into this assignment framework, and the workflow for evaluating baseline data and predicting the source populations of unknown individuals.


Analytical Workflow
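
The two stages of this workflow (evaluate the baseline first, then predict unknowns) can be sketched in base R. This is only an illustration under stated assumptions, not assignPOP's implementation: the simulated data, the feature names f1 to f3, the helper functions fit_centroids() and predict_pop(), and the nearest-centroid classifier are all hypothetical stand-ins chosen to keep the example self-contained.

```r
set.seed(42)

# Hypothetical baseline: two source populations, three numeric features.
baseline <- data.frame(
  pop = factor(rep(c("pop_A", "pop_B"), each = 30)),
  f1  = c(rnorm(30, 0), rnorm(30, 2)),
  f2  = c(rnorm(30, 0), rnorm(30, 1)),
  f3  = rnorm(60)
)
# Hypothetical individuals of unknown origin.
unknown <- data.frame(f1 = c(-0.2, 2.3), f2 = c(0.1, 1.2), f3 = c(0.5, -0.4))
features <- c("f1", "f2", "f3")

# Stand-in classifier: per-population feature means (rows = populations).
fit_centroids <- function(d) {
  sapply(features, function(f) tapply(d[[f]], d$pop, mean))
}

# Assign each row of newdata to the population with the nearest centroid.
predict_pop <- function(cents, newdata) {
  x <- as.matrix(newdata[, features])
  unname(apply(x, 1, function(row) {
    rownames(cents)[which.min(colSums((t(cents) - row)^2))]
  }))
}

# Stage 1: evaluate the baseline with stratified 5-fold cross-validation.
k <- 5
fold <- ave(seq_len(nrow(baseline)), baseline$pop, FUN = function(i) {
  sample(rep(seq_len(k), length.out = length(i)))
})
cv_correct <- logical(nrow(baseline))
for (i in seq_len(k)) {
  train <- baseline[fold != i, ]
  test  <- baseline[fold == i, ]
  pred  <- predict_pop(fit_centroids(train), test)
  cv_correct[fold == i] <- pred == as.character(test$pop)
}
cv_accuracy <- mean(cv_correct)

# Stage 2: train on the full baseline and predict the unknown individuals.
assigned <- predict_pop(fit_centroids(baseline), unknown)
print(cv_accuracy)
print(assigned)
```

In practice, stage 2 is only worth running if the cross-validated accuracy from stage 1 is acceptable for the question at hand.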


Resampling Cross-validation Workflow

The diagram below illustrates the resampling cross-validation workflow, in which individuals from each population are divided into training and test sets, and the assignment test is repeated by resampling the training individuals. Multiple proportions of training individuals and multiple proportions of training loci (when analyzing genetic data) can be specified in a single analysis (assign.MC() or assign.kfold()). For example, the diagram shows the top 10% or 50% of loci ranked by FST, or all loci, being used as training loci. This helps evaluate whether using the top 10% of high-FST loci, the top 50%, or all loci yields similar assignment accuracy. For more details, see Perform population assignment on the Data Analysis page.
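
The resampling idea can be sketched in base R as a grid of training-individual proportions by training-locus proportions. This is a hedged illustration rather than what assign.MC() does internally. The simulated numeric "loci", the helpers rank_loci(), assign_nearest() and mc_accuracy(), the nearest-centroid classifier, and the standardized mean difference used in place of FST are all assumptions made for the example. The key point is that loci are ranked using training individuals only, so the test individuals never influence locus selection.

```r
set.seed(1)

# Hypothetical baseline: 2 populations x 40 individuals, 20 numeric "loci".
# Only the first 4 loci differ between the populations.
n_per_pop <- 40
n_loci <- 20
pop <- factor(rep(c("pop_A", "pop_B"), each = n_per_pop))
x <- matrix(rnorm(2 * n_per_pop * n_loci), ncol = n_loci)
x[pop == "pop_B", 1:4] <- x[pop == "pop_B", 1:4] + 1.5

# Rank loci by a standardized mean difference (a simple stand-in for FST),
# computed only from the individuals passed in.
rank_loci <- function(x, pop) {
  m_a <- colMeans(x[pop == "pop_A", , drop = FALSE])
  m_b <- colMeans(x[pop == "pop_B", , drop = FALSE])
  order(abs(m_a - m_b) / apply(x, 2, sd), decreasing = TRUE)
}

# Stand-in classifier: assign to the population with the nearest training mean.
assign_nearest <- function(train_x, train_pop, test_x) {
  c_a <- colMeans(train_x[train_pop == "pop_A", , drop = FALSE])
  c_b <- colMeans(train_x[train_pop == "pop_B", , drop = FALSE])
  d_a <- rowSums(sweep(test_x, 2, c_a)^2)
  d_b <- rowSums(sweep(test_x, 2, c_b)^2)
  ifelse(d_a <= d_b, "pop_A", "pop_B")
}

# Mean test accuracy over repeated random training/test splits.
mc_accuracy <- function(train_prop, loci_prop, iterations = 30) {
  acc <- numeric(iterations)
  for (i in seq_len(iterations)) {
    # Draw the same proportion of training individuals from each population.
    train_idx <- unlist(lapply(split(seq_along(pop), pop), function(idx) {
      sample(idx, size = round(train_prop * length(idx)))
    }))
    test_idx <- setdiff(seq_along(pop), train_idx)
    # Rank loci on training individuals only, then keep the top fraction.
    ranked <- rank_loci(x[train_idx, , drop = FALSE], pop[train_idx])
    keep <- ranked[seq_len(max(1, round(loci_prop * n_loci)))]
    pred <- assign_nearest(x[train_idx, keep, drop = FALSE], pop[train_idx],
                           x[test_idx, keep, drop = FALSE])
    acc[i] <- mean(pred == as.character(pop[test_idx]))
  }
  mean(acc)
}

# Evaluate every combination of training proportions in one pass.
grid <- expand.grid(train_prop = c(0.5, 0.7, 0.9), loci_prop = c(0.1, 0.5, 1))
grid$accuracy <- mapply(mc_accuracy, grid$train_prop, grid$loci_prop)
print(grid)
```

Comparing the accuracy column across rows with different loci_prop values shows whether a small subset of highly ranked loci performs about as well as the full set.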


Package Citation

Chen, K-Y, Marschall, E.A., Sovic, M.G., Fries, A.C., Gibbs, H.L., Ludsin, S.A. (2018). assignPOP: An R package for population assignment using genetic, non-genetic, or integrated data in a machine-learning framework. Methods in Ecology and Evolution. 9:439–446. https://doi.org/10.1111/2041-210X.12897


References


  1. Anderson, E. C. (2010). Assessing the Power of Informative Subsets of Loci for Population Assignment: Standard Methods Are Upwardly Biased. Molecular Ecology Resources 10(4): 701–710.

  2. Waples, R. S. (2010). High-Grading Bias: Subtle Problems with Assessing Power of Selected Subsets of Loci for Population Assignment. Molecular Ecology 19(13): 2599–2601.