User Preference
Nadia Tahiri, PhD edited this page Nov 14, 2023
·
8 revisions
The aPhyloGeo software can be embedded in other applications and applied to other data by providing a YAML file. This file holds the full set of parameters in one place, for example:
file_name: './datasets/example/geo.csv'
specimen: 'id'
names: ['id', 'ALLSKY_SFC_SW_DWN', 'T2M', 'PRECTOTCORR', 'QV2M', 'WS10M']
bootstrap_threshold: 0
dist_threshold: 60
window_size: 200
step_size: 100
bootstrap_amount: 100
data_names: ['ALLSKY_SFC_SW_DWN_newick', 'T2M_newick', 'QV2M_newick', 'PRECTOTCORR_newick', 'WS10M_newick']
reference_gene_dir: './datasets/example'
reference_gene_file: 'sequences.fasta'
makeDebugFiles: True
alignment_method: '1' # 1:pairwiseAligner, 2:MUSCLE, 3:CLUSTALW, 4:MAFFT
distance_method: '1' # 1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPY)
fit_method: '1' # 1:Wider Fit by elongating with Gap (starAlignment), 2:Narrow-fit prevent elongation with gap when possible
tree_type: '1' # 1: BioPython consensus tree, 2: FastTree application
rate_similarity: 90
method_similarity: '1' # 1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity
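The coded options in this file are strings such as '1' or '2'. As a minimal sketch, assuming the file has already been parsed into a dictionary (for instance with PyYAML's `yaml.safe_load`), the codes can be translated back into readable labels and checked for unknown values. The names `OPTION_LABELS` and `describe_options` are hypothetical helpers for illustration and are not part of aPhyloGeo; the labels are copied from the comments in the example and from the option list on this page.

```python
# Hypothetical helper, not part of aPhyloGeo: maps option codes to labels.
# Labels come from the comments in the example YAML file; the '0' code for
# distance_method comes from the option list on this page.
OPTION_LABELS = {
    "alignment_method": {
        "1": "pairwiseAligner", "2": "MUSCLE", "3": "CLUSTALW", "4": "MAFFT",
    },
    "distance_method": {
        "0": "all distances",
        "1": "Least-Square distance",
        "2": "Robinson-Foulds distance",
        "3": "Euclidean distance",
    },
    "fit_method": {"1": "Wider Fit", "2": "Narrow-fit"},
    "tree_type": {"1": "BioPython consensus tree", "2": "FastTree application"},
    "method_similarity": {
        "1": "Hamming distance", "2": "Levenshtein distance",
        "3": "Damerau-Levenshtein distance", "4": "Jaro similarity",
        "5": "Jaro-Winkler similarity", "6": "Smith-Waterman similarity",
        "7": "Jaccard similarity", "8": "Sørensen-Dice similarity",
    },
}


def describe_options(config):
    """Return {option: label} for every coded option present in config.

    Raises ValueError if a coded option holds a value that is not listed.
    Options without codes (window_size, file_name, ...) are ignored.
    """
    described = {}
    for key, labels in OPTION_LABELS.items():
        if key not in config:
            continue
        code = str(config[key])  # accept both '1' and 1
        if code not in labels:
            raise ValueError(f"{key}: unknown code {code!r}")
        described[key] = labels[code]
    return described


# Values taken from the example file above.
example_config = {
    "alignment_method": "1",
    "distance_method": "1",
    "fit_method": "1",
    "tree_type": "1",
    "method_similarity": "1",
    "window_size": 200,
}
print(describe_options(example_config))
```

Checking the codes before a run is a design choice of this sketch: a mistyped value is reported immediately instead of surfacing later in the analysis.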
There are 11 main options accessible to the user in the YAML configuration file:
- Bootstrap Threshold: Threshold on the number of replicates generated for each sub-MSA (each position of the sliding window)
- Distance Threshold: Threshold on the distance between the genetic tree and the climatic tree for each sub-MSA (each position of the sliding window)
- Window Length: Size of the sliding window
- Step: Sliding window advancement step
- Distance Choice: Distance selection
  - '0' for all distances (options '1', '2', and '3')
  - '1' for Least Square (LS) distance (version 1.0)
  - '2' for Robinson and Foulds (RF) distance (normalized by $2n-6$, where $n$ is the number of leaves on each tree)
  - '3' for Euclidean distance
- Distance Threshold: LS distance threshold at which the results are most significant
- Alignment Method: Algorithm selection for sequence alignment
  - '1' for pairwiseAligner
  - '2' for MUSCLE
  - '3' for CLUSTALW
  - '4' for MAFFT
- Fit Method: Gap elongation selection
  - '1' for Wider Fit, which elongates sequences with gaps (starAlignment)
  - '2' for Narrow-fit, which prevents elongation with gaps when possible
- Tree Inference Method: The choice of inference method
  - '1' for BioPython consensus tree
  - '2' for FastTree application
- Rate Similarity: The similarity rate between sequences, used to filter out sub-MSAs with a high similarity value
- Method Similarity: The choice of similarity method
  - '1' for Hamming distance
  - '2' for Levenshtein distance
  - '3' for Damerau-Levenshtein distance
  - '4' for Jaro similarity
  - '5' for Jaro-Winkler similarity
  - '6' for Smith–Waterman similarity
  - '7' for Jaccard similarity
  - '8' for Sørensen-Dice similarity
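Three of the options above are easier to picture with numbers. The following is a minimal sketch, not aPhyloGeo's own code, of how a window length and step could cut an alignment into sub-MSAs, how an RF distance is normalized by $2n-6$, and how a Hamming-based similarity percentage could be compared with the Rate Similarity value. The function names are hypothetical, and two points are assumptions: only full-length windows are kept, and similarity is expressed as the percentage of identical positions.

```python
def sliding_windows(alignment_length, window_size, step_size):
    """Return (start, end) column ranges for each full sliding-window position.

    Assumption: a trailing window shorter than window_size is dropped.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    last_start = alignment_length - window_size
    return [(start, start + window_size)
            for start in range(0, last_start + 1, step_size)]


def normalized_rf(rf_distance, n_leaves):
    """Divide an RF distance by 2n - 6, its maximum for unrooted binary trees."""
    if n_leaves < 4:
        raise ValueError("normalization by 2n - 6 needs at least 4 leaves")
    return rf_distance / (2 * n_leaves - 6)


def hamming_similarity(seq_a, seq_b):
    """Percentage of identical positions between two equal-length sequences.

    Assumption: similarity is reported as 100 * (1 - Hamming distance / length).
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * (1 - mismatches / len(seq_a))


# window_size: 200 and step_size: 100, as in the example configuration,
# applied to a hypothetical alignment of 500 columns.
print(sliding_windows(500, 200, 100))
# A hypothetical RF distance of 4 between two trees with 5 leaves.
print(normalized_rf(4, 5))
# Two hypothetical 10-base sequences that differ at one position.
print(hamming_similarity("ACGTACGTAC", "ACGTACGTAA"))
```

With a step smaller than the window length, as in the example configuration, consecutive windows overlap, so each region of the alignment is examined in more than one sub-MSA.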
Please email us at: [email protected] for any questions or feedback.