| get_seeds_local {rCRUX} | R Documentation |
get_seeds_local is a local interpretation of get_seeds_remote() that avoids
querying NCBI's primer BLAST
tool. Although it is slower than remotely generating blast seeds, it is not
subject to the arbitrary throttling of jobs that require significant memory.
It creates a 'get_seeds_local' directory at output_directory_path if one
doesn't yet exist, then creates a subdirectory inside output_directory_path
named after metabarcode_name. It creates four permenant files inside that
directory. Several temporary files are generated during the run for keeping
track of primers being blasted and needing to be blasted, and raw blast output.
One represents the unfiltered output and another represents the output after
filtering with user modifiable parameters and with appended taxonomy. Also
generated is a summary of unique taxonomic ranks after filtering, and a
fasta of the primers used for blast.
get_seeds_local(
forward_primer_seq,
reverse_primer_seq,
output_directory_path,
metabarcode_name,
accession_taxa_sql_path,
blast_db_path,
mismatch = 6,
minimum_length = 5,
maximum_length = 500,
primer_specificity_database = "nt",
num_fprimers_to_blast = 50,
num_rprimers_to_blast = 50,
max_to_blast = 2,
align = "10000000",
...,
return_table = TRUE
)
forward_primer_seq |
which which turns degenerate primers into into a list of all possible non degenerate primers and converts the primer(s) into to a fasta file to be past to run_primer_blastn. (e.g. forward_primer_seq <- "TAGAACAGGCTCCTCTAG" or forward_primer_seq <- c("TAGAACAGGCTCCTCTAG", "GGWACWGGWTGAACWGTWTAYCCYCC") |
reverse_primer_seq |
which which turns degenerate primers into into a list of all possible non degenerate primers and converts the primer(s) into to a fasta file to be past to run_primer_blastn. (e.g reverse_primer_seq <- "TTAGATACCCCACTATGC" or reverse_primer_seq <- c("TTAGATACCCCACTATGC", "TANACYTCNGGRTGNCCRAARAAYCA") |
output_directory_path |
the parent directory to place the data in (e.g. "/path/to/output/12S_V5F1_local_111122_e300_111122"). |
metabarcode_name |
used to name the files. get_seeds_local appends metabarcode_name to the beginning of each of the files it generates (e.g. metabarcode_name <- "12S_V5F1"). |
accession_taxa_sql_path |
the path to sql created by taxonomizr (e.g. accession_taxa_sql_path <- "/my/accessionTaxa.sql") |
blast_db_path |
a directory containing one or more blast-formatted database. For multiple blast databases, separate them with a space and add an extra set of quotes. (e.g blast_db_path <- "/my/ncbi_nt/nt" or blast_db_path <- '"/my/ncbi_nt/nt /my/ncbi_ref_euk_rep_genomes/ref_euk_rep_genomes"') |
mismatch |
the highest acceptable mismatch value per hit. get_seeds_local removes each row with a mismatch greater than the specified value. The default is mismatch = 6 |
minimum_length |
get_seeds_local removes each row that has a value less than minimum_length in the product_length column. The default is minimum_length = 5 |
maximum_length |
get_seeds_local removes each row that has a value greater than maximum_length in the product_length column The default is maximum_length = 500 |
num_fprimers_to_blast |
is the maximum number of possible forward primers to blast. This is relevant for degenerate primers, all possible primers from a degenerate sequence are enumerated, and the user can choose a number to be randomly sampled and used for primer blast. The default is num_fprimers_to_blast = 50 |
num_rprimers_to_blast |
is the maximum number of possible reverse primers to blast. This is relevant for degenerate primers, all possible primers from a degenerate sequence are enumerated, and the user can choose a number to be randomly sampled and used for primer blast. The default is num_rprimers_to_blast = 50 |
max_to_blast |
is the number of primers to blast simultaneously. The default is max_to_blast = 2. - Increasing this number will decrease overall run time, but increase the amount of RAM required. |
align |
is the maximum number of subject hits to return per query blasted. The default is align = '10000000'. - to few alignments will result in no matching pairs of forward and reverse primers. To many alignments can result in an error due to RAM limitations. |
task |
(passed to |
word_size |
(passed to |
evalue |
(passed to |
coverage |
(passed to |
perID |
(passed to |
reward |
(passed to |
ncbi_bin |
passed to |
get_seeds_local passes the forward and reverse primer sequence for a given
PCR product to run_primer_blastn(). In the case of a non degenerate
primer set only two primers will be passed to run_primer_blast. In the case
of a degenerate primer set, get_seeds_local will get all possible versions of
the degenerate primer(s) (using primerTree's enumerate_primers() function),
randomly sample a user defined number of forward and reverse primers, and
generate a fasta file. The selected primers are subset and passed to
run_primer_blastn which queries each primer against a blast formatted database
using the task "blastn_short".
Output is cashed after each sucessful run of run_primer_blastn, so if a run is interrupted the user can resubmit the command and pick up where they left off. The user can modify parameters for the run with the exception of num_fprimers_to_blast and num_rprimers_to_blast.
The returned blast hits for each
sequence are matched and checked to see if they generate plausible amplicon
(e.g. amplify the same accession and are in the correct orientation to produce
a PCR product). These hits are written to a file with the suffix
_unfiltered_get_seeds_local_output.csv. These hits are further filtered for
length and number of mismatches. Taxonomy is appended to these filtered hits
using get_taxonomizr_from_accession(). The results are written to
to a file with the suffix _filtered_get_seeds_local_output_with_taxonomy.csv.
The number of unique instances for each rank in the taxonomic path for the
filtered hits are tallied (NAs are counted once per rank) and written to a
file with the suffix _filtered_get_seeds_local_unique_taxonomic_rank_counts.txt
Note:
Information about the blastn parameters can be found in run_primer_blast, and
by accessing blastn -help. Default parameters are optimized to provide
results similar to that generated through remote blast via primer-blast as
implemented in iterative_primer_search().
# Non degenerate primer example: 12S_V5F1 (Riaz et al. 2011)
forward_primer_seq = "TAGAACAGGCTCCTCTAG"
reverse_primer_seq = "TTAGATACCCCACTATGC"
output_directory_path <- "/my/directory/12S_V5F1_local_111122_species_750"
metabarcode_name <- "12S_V5F1"
accession_taxa_sql_path <- "/my/directory/accessionTaxa.sql"
blast_db_path <- "/my/directory/ncbi_nt/nt"
get_seeds_local(forward_primer_seq,
reverse_primer_seq,
output_directory_path,
metabarcode_name,
accession_taxa_sql_path,
blast_db_path,
minimum_length = 80,
maximum_length = 150)
# adjusting the minimum_length and maximum_length parameters reduces the number of total hits by removing reads that could result from off target amplification
# Degenerate primer example - mlCOIintF/jgHC02198 (Leray et al. 2013)
# Note: this will take considerable time and computational resources
forward_primer_seq <- "GGWACWGGWTGAACWGTWTAYCCYCC"
reverse_primer_seq <- "TANACYTCNGGRTGNCCRAARAAYCA"
output_directory_path <- "/my/directory/CO1_local"
metabarcode_name <- "CO1"
get_seeds_local(forward_primer_seq,
reverse_primer_seq,
output_directory_path,
metabarcode_name,
accession_taxa_sql_path,
blast_db_path,
minimum_length = 200,
maximum_length = 400,
aligns = '10000',
num_rprimers_to_blast = 200,
num_rprimers_to_blast = 2000,
max_to_blast = 10)
# Non Degenerate but high return primer example - 18S (Amaral-Zettler et al. 2009)
# Note: this will take considerable time and computational resources
forward_primer_seq <- "GTACACACCGCCCGTC"
reverse_primer_seq <- "TGATCCTTCTGCAGGTTCACCTAC"
output_directory_path <- "/my/directory/18S_local"
metabarcode_name <- "18S"
get_seeds_local(forward_primer_seq,
reverse_primer_seq,
output_directory_path,
metabarcode_name,
accession_taxa_sql_path,
blast_db_path,
minimum_length = 250,
maximum_length = 350,
max_to_blast = 1)
# blasting two primers at a time can max out a system's RAM, however blasting one at a time is more feasable for personal computers with 16 GB RAM