crossword manual
output_folder : output folder for all figures and files, relative to the working directory. output_folder = "crossword_output_01"
iterations : number of iterations. Each iteration will produce a separate subdirectory in the output folder named for the iteration number. For every iteration the QTN are reselected based on the parameters given below unless a QTN file is supplied. iterations = 2
input_folder : input folder containing all files that will be used as an input, such as ‘chr_siz.txt’ below. The ‘system.file()’ function should be replaced when not using the example test data. If input data is spread across directories, a common directory can be specified and paths relative to that directory can be defined on a per file basis. input_folder=system.file("extdata",package="crossword")
im_type : the format of output images; the following can be used: svg (default), pdf, jpeg, png, tiff, bmp. PNG is generally the best choice for Windows. im_type="svg"
by_chromosomes : for graphing haplotypes, if TRUE, the graphs will be sorted by chromosomes, if FALSE, they will be sorted by individual. by_chromosomes = TRUE
heterozygous : for graphing haplotypes, if TRUE, homologous chromosomes will have different colors. This is applicable when crossing highly heterozygous parents. heterozygous = FALSE
gff : a gene annotation GFF file generally acquired from phytozome.org or a comparable data repository. Required only if you want recombination frequency to correlate with gene density or to select QTN based on gene density (see biased_selection below). Otherwise, supply a LOC file (see below), which will run much faster. gff = "peanut.gff"
chr_stat : a file containing the chromosome lengths in bp. If not specified (‘NA’), the lengths are derived from the GFF file using the last gene on each chromosome. If no GFF file is supplied, each chromosome gets a length of 100 Mb. See default example file. chr_stat = "chr_siz.txt"
chr_length : a file containing the chromosome lengths in cM. If not specified (‘NA’), all chromosomes get a length of 100 cM. See default example file. chr_length = "chr_len.txt"
window_size : Sliding window sizes in bp for defining variable recombination frequencies. window_size = 100000
input : the file containing parental genotypes, in VCF or HapMap format (ex: "parental_genotypes.hapmap" ). Names of accessions in this file MUST match those used as founders or parents in the parameter arguments used below. input = "peanut.vcf"
outcross : the frequency of outcrossing ("0" for full selfing, "1" for obligate outcrossing). outcross=0.1 [only used if Clevel is >0 in "advance" function]
input_loci : the file containing genetic distances between loci. If provided (ex: "parental_genotypes.loc"), it will be used instead of gff. The genetic position of every variant in the parental genotypes file should be supplied. (Otherwise, variants will be filtered to only the supplied set.) For uniform recombination, this LOC file would just be a simple conversion of the physical coordinate scales to the genetic distance via a constant bp/cM coefficient (ex., at 1,000,000bp per 1cM, 3,598,383 -> 3.598). input_loci = NA
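The uniform-recombination conversion described for input_loci can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not crossword code: the function name `bp_to_cm` and the sample positions are hypothetical, and the layout of an actual LOC file is not shown here.

```python
def bp_to_cm(position_bp: int, bp_per_cm: float = 1_000_000) -> float:
    """Convert a physical position (bp) to a genetic position (cM) at a constant rate."""
    if bp_per_cm <= 0:
        raise ValueError("bp_per_cm must be positive")
    return position_bp / bp_per_cm


# Hypothetical variant positions on one chromosome (bp).
positions_bp = [125_000, 3_598_383, 98_400_000]

# At 1,000,000 bp per cM, 3,598,383 bp maps to 3.598 cM, as in the text above.
positions_cm = [round(bp_to_cm(p), 3) for p in positions_bp]
print(positions_cm)
```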
homo : If TRUE, only homozygous loci will be kept for parental genotypes. Since heterozygous calls in a highly selfing crop are generally miscalls, this value can be set to ignore those. If using known heterozygous parents, this value should be set to FALSE. homo = TRUE
phenotyping_method : can be "QTN_random", "high_low_parents" or "QTN_supplied". 1) "QTN_random" - QTN are selected at random and alleles are assigned effects at random. (These can still be set to be biased toward genic regions, see below.) 2) "high_low_parents" - QTN are selected at random positions but allelic effects are assigned such that high parent alleles are always positive and low parent negative. Only loci polymorphic between the parents can be used. 3) "QTN_supplied" - a file must be given containing QTN information. See "qtn_effects.efc" for example. Note, VCF and supplied file must use the same SNP names. phenotyping_method = "QTN_random"
qtn : number of QTN. qtn = 20
h2 : expected heritability. See heritability_mode below. Heritability is applied on a per individual basis. If selection or phenotyping is performed at the family level, the "line-mean" heritability will be much larger. For very large families, the "line-mean" heritability will approach 1 regardless of the value of this parameter. h2 = 0.6
heritability_mode : heritability mode that will be used, "absolute" or "average". The supplied heritability is used to calculate the relative proportion of residual variance in a theoretical recombinant inbred population assuming free recombination between QTN, complete homozygosity, and very large population size. 1) "absolute" - all QTN are assumed to be segregating in the theoretical population. 2) "average" - The average number of expected QTN and effect sizes are evaluated for all possible crosses between founders. See crossword manuscript for more details. heritability_mode = "average"
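The relationship between h2, residual variance, and "line-mean" heritability can be illustrated with the textbook definitions. The Python sketch below is not taken from crossword: both function names are hypothetical, the genetic variance is passed in as a given number, and the line-mean formula assumes that all members of a family share the same genotype (fully inbred lines).

```python
import math


def residual_variance(genetic_variance: float, h2: float) -> float:
    """Residual variance Ve chosen so that Vg / (Vg + Ve) equals h2."""
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    return genetic_variance * (1 - h2) / h2


def line_mean_h2(h2: float, family_size: int) -> float:
    """Heritability of a family mean; assumes identical genotypes within a family."""
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    return h2 / (h2 + (1 - h2) / family_size)


ve = residual_variance(genetic_variance=20.0, h2=0.6)
print(f"residual variance: {ve:.3f}")
for n in (1, 5, 50, 1000):
    print(f"family size {n}: line-mean h2 = {line_mean_h2(0.6, n):.4f}")
```

With a family size of 1 the line-mean value equals h2 itself, and it climbs toward 1 as the family grows, which is the behavior described for the h2 parameter.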
effect_distribution : effect distributions, either "equal" or "gamma". effect_distribution = "equal"
min_qtn_freq : minimum QTN frequency for randomly selected QTN, based on parental genotypes. min_qtn_freq = 0
highest_P : highest parent in case of "high_low_parents" phenotyping mode. highest_P=NA
lowest_P : lowest parent in case of "high_low_parents" phenotyping mode. lowest_P=NA
high_to_low_percentage : the percentage of QTN for which the lower parent has the positive effect allele. high_to_low_percentage=0
input_effects : user-supplied file only used with "QTN_supplied" method. input_effects = "qtn_effects.efc"
biased_selection: If TRUE, selected QTNs will be biased for the regions with high gene density. biased_selection = TRUE
dominant : specifies that all loci are dominant. If set to ‘FALSE’, additive genetic variance is assumed. Partial dominance is not supported. dominant = FALSE
population : A collection of individuals that are defined by their relatedness to one another and by their haplotype structure relative to a set of founder lines. Such populations have four levels: 1) “population” - the total population; 2) “cross” – cross=1 if only one cross was done to create population, otherwise 1 through n if there are multiple crosses that share a parent; 3) “family” – set of individuals that were selfed at a particular stage of advancement; 4) “individual” – each individual in a population.
genotypes : A list of individuals and their genotypes that are connected through a shared ID with individuals in a population.
phenotypes : A list of individuals and their phenotypes that are connected through a shared ID with individuals in a population.
### cross
Description: performs a cross between parents. The cross can be one parent by one parent or one parent by a list of parents.
Returns: population
Parameters:
P1: the first parent to be crossed.
P2: the second parent/parents to be crossed.
N: number of individuals per cross
Example: pop1 = cross(P1="grg",P2=list("tr","tg"),N=10) #note: “list” function must be used.
### advance
Description: advances the haplotypes to successive generations. The population size is taken from the supplied population and remains constant during advancement.
Returns: population
Parameters:
population: population to be advanced
F: number of generations to advance to. NOTE: This value must be set to 2 or greater. In effect, the initial result of a cross is considered the F1, and the F parameter is advancing the population to the nth generation.
level: This parameter defines how a genotype is advanced. "Individual" will assure each individual is represented in the next generation. Use “individual” for single seed descent. The other values allow for population drift through random selection at the specified level: “family”, “cross” or “population” for bulk advancement within the families, crosses or the whole population, respectively.
Clevel: This parameter is used to define outcrossing barriers. “Individual” for a selfing population; “family”, “cross” or “population” for outcrosses across the families, crosses or the whole population, respectively. Outcrossing can occur at all levels lower than the supplied level as well. For example, some outcrossing will occur at “family” level if clevel = “population”.
Example: pop2 = advance(pop = pop1,F=5,level="individual",clevel="individual")
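To give a sense of what the F parameter means for a strictly selfed population, the expected heterozygosity per generation can be computed directly. The Python sketch below is not part of crossword: it assumes homozygous parents, no outcrossing, and no selection, and considers only loci that differ between the two parents. The function name is hypothetical.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of heterozygous loci at generation F<n> under pure selfing.

    The F1 is heterozygous at every locus that differs between the parents,
    and each round of selfing halves that fraction.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (the F1) or greater")
    return 0.5 ** (generation - 1)


for f in range(1, 7):
    print(f"F{f}: {expected_heterozygosity(f):.4f}")
```

Under these assumptions, advancing to F=5 as in the example leaves about 6% of the segregating loci heterozygous in each individual.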
### create_families
Description: creates families from each individual in a population. Individuals are selfed and the resulting progeny can be selected or advanced further at the "family" level (or any other level).
Returns: population
Parameters:
pop: population to create families from
S: number of individuals per family
is_clone: create families as clones. In contrast with dh function below, this option will preserve heterozygosity.
Example: pop3 = create_families(pop=pop2,S=5)
### select
Description: performs selection on a population based on either supplied phenotypes or default calculations. Four files containing phenotypes and true genetic values for individuals and level averages are produced as side-effects of running the function.
Returns: population. Resultant population will be sorted such that highest elements at each level will be "1" and lowest elements will be "n" ("n" being the total elements at that level). Crosses and families are sorted by the average value across all individuals in that particular cross or family.
Parameters:
pop: population to select from
N: absolute number (ex. “5”) or percentage (ex. “10%”) of selected individuals/families/crosses. Percentage sign must be used for percent.
level: the level at which to evaluate for selection and retain: individuals, families, crosses, or across entire population.
method: the selection method: "top", "bottom" or "random". Will be based on numerical value of phenotype.
phenotype: an optional input phenotype. If NA, the phenotypes will be calculated. See "genomic_prediction".
Example: pop6 = select(pop = pop3,N = 3, level = "individual",method="top")
Note: to get the side-effect files containing phenotypes for all lines, select 100% of the individuals. Example: popY = select(pop = popX, N = 100%, level = "individual", method="top")
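The N and method parameters of "select" can be illustrated outside of crossword with a small Python sketch. Everything here is hypothetical: the function names, the phenotype dictionary, and in particular the choice to round percentages up, which is an assumption rather than documented crossword behavior.

```python
import math
import random


def resolve_n(n, total):
    """Turn an absolute count (5) or a percentage string ("10%") into a count."""
    if isinstance(n, str) and n.strip().endswith("%"):
        percent = float(n.strip()[:-1])
        count = math.ceil(percent * total / 100)  # rounding up is an assumption
    else:
        count = int(n)
    return max(0, min(count, total))


def select_units(phenotypes, n, method="top", seed=None):
    """Return the IDs of the selected units, ranked best first for "top"."""
    if method not in ("top", "bottom", "random"):
        raise ValueError(f"unknown method: {method}")
    count = resolve_n(n, len(phenotypes))
    ids = list(phenotypes)
    if method == "random":
        return random.Random(seed).sample(ids, count)
    ranked = sorted(ids, key=lambda i: phenotypes[i], reverse=(method == "top"))
    return ranked[:count]


# Hypothetical phenotypes keyed by individual ID.
phenotypes = {"a": 3.0, "b": 9.5, "c": 1.2, "d": 7.1}
print(select_units(phenotypes, 2, method="top"))
print(select_units(phenotypes, "50%", method="bottom"))
```

The ranked order of the returned IDs mirrors the sorting described under Returns, where the highest element at a level becomes "1".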
### dh
Description: performs doubled haploidization of each individual in a population.
Returns: population
Parameters:
pop: the population to be doubled-haploidized.
Example: pop10 = dh(pop=pop1)
### mas
Description: performs selection based on marker-assisted selection (MAS).
Returns: population
Parameters:
pop: population to select from
marker: a list of markers containing the marker id, the allele and frequency threshold, ex. [("Aradu.A01_8315233","A",0.5),("Araip.B10_125281670","C",0.5)]. If the threshold is 0.5, heterozygous or better will be selected.
level: the level to select from: “individual”, “family”, or “cross”. For example, if “family” is used, the allele frequency is assessed across each family, and only families having the appropriate threshold will be selected. In this example, all individuals within a family will be selected regardless of their individual status. Such behavior replicates pooled genotyping decisions.
Example: pop3_b = mas(pop = pop3,marker = [("Aradu.A01_8315233","A",0.5),("Araip.B10_125281670","C",0.5)], level = "family")
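The family-level threshold logic can be sketched in Python as follows. This is an illustration, not crossword's implementation: the helper names and the family genotypes are hypothetical, genotypes are written as two-letter diploid calls, and requiring every marker to pass is an assumption about how multiple markers are combined.

```python
def allele_frequency(genotypes, allele):
    """Frequency of `allele` across diploid genotype strings such as "AG"."""
    calls = "".join(genotypes)
    return calls.count(allele) / len(calls) if calls else 0.0


def passes_markers(genotypes_by_marker, markers):
    """True if every (marker_id, allele, threshold) criterion is met (assumed AND logic)."""
    return all(
        allele_frequency(genotypes_by_marker[marker_id], allele) >= threshold
        for marker_id, allele, threshold in markers
    )


markers = [("Aradu.A01_8315233", "A", 0.5), ("Araip.B10_125281670", "C", 0.5)]

# Hypothetical genotypes of four individuals per family at each marker.
families = {
    "fam1": {
        "Aradu.A01_8315233": ["AA", "AG", "GG", "AA"],
        "Araip.B10_125281670": ["CC", "CT", "CC", "TT"],
    },
    "fam2": {
        "Aradu.A01_8315233": ["GG", "AG", "GG", "GG"],
        "Araip.B10_125281670": ["CC", "CC", "CC", "CC"],
    },
}

selected = [fam for fam, geno in families.items() if passes_markers(geno, markers)]
print(selected)
```

Note that fam1 is kept as a whole even though one of its members is GG at the first marker, which is the pooled-genotyping behavior described for the level parameter.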
### create_population
Description: creates a population from the selected parents. This function simply bundles individuals into a population that can then be used by other functions.
Returns: population
Parameters:
P: a list of parents.
Example: pop10 = create_population(P=list("tg","tr","grg"))
### pick_individual
Description: picks an individual from a selection based on the selection criteria. As opposed to the "select" function, the pick_individual function allows granular selections and is useful for populations that have resulted from selection. In such cases, each level will be sorted based on its average phenotype, see example. See "select" function. Generally, this function would only be used for picking parents in a second round of crossing.
Returns: an individual
Parameters:
pop: population name
cross: the cross to be selected.
family: the family to be selected.
individual: the individual to be selected.
Example: topInd_Fam2 = pick_individual(pop=pop10,cross=NA,family=2,individual=1) # this would select the top individual from the second best family in the population.
### haplotypes_out
Description: exports a population to an output file. This is useful for stopping a simulation and restarting it with its original haplotype structure intact. For example, this function would be used if one wanted to use an external genomic prediction algorithm prior to selection.
Returns: NA
Parameters:
pop: the name of the population to be exported.
output: the name of the output file. The file is written to the output directory supplied in the header.
Example: haplotypes_out(pop = pop2,output = "haplo1_out")
### haplotypes_in
Description: imports a population from an input file.
Returns: population
Parameters:
input: the input population file as produced by haplotypes_out.
Example: pop5 = haplotypes_in(input = "haplo1_out")
### draw_haplotypes
Description: creates a graph of each chromosome based on parental contributions. The graph will be exported to the directory supplied in the header. See header for additional parameters controlling output.
Returns: NA
Parameters:
haplotypes: the population to be drawn.
Example: draw_haplotypes(haplotypes=pop2)
### draw_population
Description: creates a pie graph of each individual based on parental contributions. See header for additional parameters controlling output.
Returns: NA
Parameters:
pop: the population to be drawn.
Example: draw_population(pop = pop3)
### get_phenotype
Description: calculates the phenotypes of all individuals in a population. Unlike "select" function, "get_phenotype" does not sort at any level.
Returns: phenotype object that contains both phenotypes with environmental variation and true genetic values
Parameters:
pop: population
Example: pheno2 = get_phenotype(pop = pop2)
THIS FUNCTION IS BEING DEPRECATED. 'select' will produce this output for all lines as a side-effect if run as follows:
popY = select(pop = popX, N = 100%, level = "individual", method="top")
### phenotype_out
Description: exports phenotypes to an output file. Automatically exports phenotypic value and true genetic value. See header for additional parameters controlling output.
Returns: NA
Parameters:
pheno: the phenotypes variable to be exported.
file: output file.
Example: phenotype_out(pheno = pheno3, file = "pheno3_predicted")
### genotypes_out
Description: exports genotypes to an output file in HapMap format.
Returns: NA
Parameters:
genotypes: the genotypes variable to be exported.
output: an output file.
level: the level at which the population will be genotyped: individuals, families, crosses, or population. If the major allele frequency is >90%, it is called homozygous, else heterozygous.
pop: the input population.
A_parent: the parent AA. For input in r/qtl.
B_parent: the parent BB. For input in r/qtl.
Example: genotypes_out(genotypes=ms3,output="pop4_marker_set")
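The 90% rule used when genotyping above the individual level can be illustrated with a short Python sketch. It is not crossword's code: the function name is hypothetical, alleles are given as single letters pooled across all members of the unit, and returning "NN" for an empty pool is an assumption.

```python
from collections import Counter


def call_pooled_genotype(alleles, threshold=0.9):
    """Call a pooled genotype: homozygous if the major allele frequency exceeds the threshold."""
    counts = Counter(alleles)
    if not counts:
        return "NN"  # assumed placeholder for a missing call
    ranked = counts.most_common(2)
    major, major_count = ranked[0]
    if major_count / len(alleles) > threshold:
        return major + major
    # A frequency at or below the threshold implies a second allele is present.
    minor = ranked[1][0]
    return "".join(sorted(major + minor))


print(call_pooled_genotype("A" * 19 + "C"))      # major allele at 95%
print(call_pooled_genotype("A" * 8 + "C" * 2))   # major allele at 80%
```

Because the comparison is strict, a pool with the major allele at exactly 90% is called heterozygous in this sketch.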
### combine_populations
Description: combines a list of populations
Returns: population
Parameters:
pops: list of populations
Example: pop9 = combine_populations(list(pop1,pop2))
### genomic_prediction
Description: calculates the phenotypes of a population based on genotypes and phenotypes of a training population. Phenotypic predictions can then be supplied to the "select" function, which will use predictions for selection instead of simulating them directly.
Returns: phenotypes
Parameters:
train_geno: genotypes to use for training
train_pheno: phenotypes to use for training
predict_geno_pop: population to be predicted
method: Currently, only “rrBLUP” is available. “rrBLUP” is the ridge regression BLUP model implemented in the package “rrBLUP”.
level: Level at which to train the model. Generally, "family" or "individual" will be used, but “cross” or “population” are also possible.
pop: the population that was used to create the training sets.
Example: pheno3 = genomic_prediction(train_geno=ms2,train_pheno=pheno2,predict_geno_pop=ms3,method="rrBLUP", level = "cross", pop = pop3)
### get_genotypes
Description: retrieves the genotypes of a population either for all polymorphisms or a subset. Resultant genotypes can be given to "genotypes_out", or they can be used later in the simulation to get genotypes for the same set of markers in another population in the same simulation.
Returns: genotypes
Parameters:
pop: population
pre-selected_markers: a genotype variable or a file of pre-selected markers to be selected from the population.
Example: ms1 = get_genotypes(pop = pop3)
### create_marker_set
Description: creates a marker subset from a genotype object by selecting N random markers that meet a minor allele frequency (MAF) cutoff.
Returns: genotypes
Parameters:
genotypes: genotypes
N: Number of markers to be selected
MAF: minor allele frequency cutoff
no_qtn: if TRUE, the pre-selected QTN are filtered out of the marker set
biased_selection: if TRUE, selection is biased toward regions of high gene density.
Example: ms2 = create_marker_set(pop = pop3, N=200, MAF=0.1)
ms3 = get_genotypes(pop = pop3, ms2)
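The idea of drawing N random markers above a MAF cutoff can be sketched in Python. This is a simplified stand-in, not crossword's algorithm: the function names and toy genotypes are hypothetical, markers are assumed to be biallelic with no missing calls, the MAF cutoff is treated as inclusive, and gene-density bias is not modeled.

```python
import random


def minor_allele_frequency(calls):
    """MAF from diploid genotype strings such as "AA" or "AG" (biallelic, no missing data)."""
    alleles = "".join(calls)
    counts = {a: alleles.count(a) for a in set(alleles)}
    if len(counts) < 2:
        return 0.0  # monomorphic marker
    return min(counts.values()) / len(alleles)


def pick_marker_set(genotypes, n, maf=0.0, exclude=(), seed=None):
    """Randomly pick up to n marker IDs with MAF >= maf, skipping IDs in `exclude`."""
    eligible = sorted(
        marker
        for marker, calls in genotypes.items()
        if marker not in exclude and minor_allele_frequency(calls) >= maf
    )
    rng = random.Random(seed)
    return rng.sample(eligible, min(n, len(eligible)))


# Hypothetical genotypes of four individuals at four markers.
genotypes = {
    "m1": ["AA", "AA", "AA", "AA"],
    "m2": ["AA", "GG", "AG", "GG"],
    "m3": ["CC", "CT", "CC", "CC"],
    "m4": ["TT", "GG", "TT", "GG"],
}

print(pick_marker_set(genotypes, n=5, maf=0.2, seed=1))
print(pick_marker_set(genotypes, n=5, maf=0.2, exclude={"m4"}, seed=1))
```

The `exclude` argument plays the role that no_qtn plays in the function above, removing known QTN from the candidate markers.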
Description: this tool calculates the sequence length of each entry in a FASTA file and produces a tab-delimited file usable by crossword.
Returns: NA
Parameters:
fa: the genome file in fasta format
output: an output file
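For readers who want to check such a file by hand, the computation can be reproduced in a few lines of Python. This is an independent sketch, not the tool itself: the function names are hypothetical, and the two-column "name, length" layout is an assumption about what crossword expects, so compare against the default example file before relying on it.

```python
def fasta_lengths(lines):
    """Return {sequence_name: length} for FASTA content given as an iterable of lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # keep only the ID before any whitespace
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def write_lengths(lengths, path):
    """Write one tab-delimited 'name<TAB>length' row per sequence (assumed layout)."""
    with open(path, "w") as handle:
        for name, length in lengths.items():
            handle.write(f"{name}\t{length}\n")


# Hypothetical two-chromosome FASTA content.
example = [">chr1 first chromosome", "ACGT", "AC", ">chr2", "GGG"]
print(fasta_lengths(example))
```

For a real genome, pass an open file handle instead of the list, for example `fasta_lengths(open("genome.fa"))`, where the file name is a placeholder.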
Description: the tool converts vcf files to hapmap format.
Returns: NA
Parameters:
input: an input vcf file
output: name of output hapmap file
Description: this tool creates simulated genomes based on genotypes. If the user supplies genotypes for which a subset of variants have been chosen as markers, ONLY those markers will be converted in the resultant genomes. Therefore, generally, only variant data from resequencing should be used in simulations that create full genome sequences.
Returns: NA
Parameters:
fa: the genome file
genotype: genotypes to be simulated
output: an output directory
haploid_only: only produce a haploid consensus sequence if TRUE; otherwise the two sets of chromosomes will be simulated. FALSE
max_total_size_in_gb: maximum total disk space available to the process, in gigabytes
Description: this tool creates simulated paired-end FASTQ files based on genotypes. If the user supplies genotypes for which a subset of variants have been chosen as markers, ONLY those markers will be converted in the resultant genomes. Therefore, generally, only variant data from resequencing should be used in simulations that create full genome sequences.
Returns: NA
Parameters:
single_reads: TRUE if single-end reads need to be simulated. By default, paired-end reads are simulated.
input: reference genome
genotypes: genotypes to be simulated
read_len: read length
fold: total sequencing depth. The value is divided by the number of individuals in the genotypes file; for example, a value of 100 on a file of 50 individuals gives 2X sequencing depth per individual.
art_binary_location: the location of ART software if the user wants to change the default one
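The arithmetic behind the fold parameter can be sketched in Python. The per-individual depth follows the description above; the read-count estimate uses the standard coverage relation (depth × genome size ÷ bases per fragment) and is only an approximation, not necessarily how ART or crossword counts reads. Function names and the genome size are hypothetical.

```python
import math


def depth_per_individual(fold: float, n_individuals: int) -> float:
    """Split the total fold coverage evenly across individuals."""
    if n_individuals < 1:
        raise ValueError("n_individuals must be at least 1")
    return fold / n_individuals


def fragments_needed(depth: float, genome_size_bp: int, read_len: int, paired: bool = True) -> int:
    """Approximate number of read pairs (or single reads) needed to reach `depth`."""
    bases_per_fragment = read_len * (2 if paired else 1)
    return math.ceil(depth * genome_size_bp / bases_per_fragment)


depth = depth_per_individual(fold=100, n_individuals=50)
print(f"depth per individual: {depth}X")
print(fragments_needed(depth, genome_size_bp=1_000_000, read_len=100))
```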
Description: this tool filters the input HapMap or VCF file down to the selected parents and writes the result in HapMap format.
Returns: NA
Parameters:
input: an input vcf/hapmap file
output: an output hapmap file
selected_parents_file: a file containing a simple list of one parent per line.