| get_seeds_remote {rCRUX} | R Documentation |
get_seeds_remote combines modified versions of primerTree::primer_search()
and primerTree's parse_primer to make iterative_primer_search(),
which is called to query NCBI's
primer BLAST
tool. get_seeds_remote then filters the results and aggregates them into a single data.frame.
It creates a directory named get_seeds_remote inside output_directory_path
and writes three files to that directory. One holds the unfiltered
output. Another holds the output after filtering with user-modifiable
parameters, with taxonomy appended. The third is a summary of unique
taxonomic ranks after filtering.
get_seeds_remote(
forward_primer_seq,
reverse_primer_seq,
output_directory_path,
metabarcode_name,
accession_taxa_sql_path,
organism,
mismatch = 3,
minimum_length = 5,
maximum_length = 500,
primer_specificity_database = "nt",
...,
return_table = TRUE
)
forward_primer_seq |
passed to |
reverse_primer_seq |
passed to |
output_directory_path |
the parent directory to place the data in. (e.g. "/path/to/output/12S_V5F1_remote_111122") |
metabarcode_name |
is passed to |
accession_taxa_sql_path |
the path to sql created by taxonomizr (e.g. accession_taxa_sql_path <- "/my/accessionTaxa.sql") |
organism |
a vector of character vectors. Each character vector is passed in turn to primer_search, which passes it to NCBI. get_seeds_remote aggregates all of the results into a single file (e.g. organism = c("1476529", "7776")). Note: increasing the taxonomic rank (e.g. from order to class) for this parameter can maximize primer hits, but can also lead to API run throttling due to memory limitations |
mismatch |
the highest acceptable mismatch value. parse_primer_hits
returns a table with a mismatch column. get_seeds_remote removes each
row with a mismatch greater than the specified value.
The default is mismatch = 3 - Note this is smaller than |
minimum_length |
|
maximum_length |
|
primer_specificity_database |
passed to |
... |
additional arguments passed to primer_search, see
|
num_permutations |
the number of primer permutations to search. If the degenerate bases produce more than this number of permutations, this number is sampled from all possible permutations. The default is num_permutations = 50. Note: for highly degenerate primers, searches may return no results because of poor mutual matches for a given forward and reverse primer combination. |
HITSIZE |
a primer BLAST search parameter set high to maximize the number of observations returned. The default HITSIZE = 50000 - note increasing this parameter can maximize primer hits, but can also lead to API run throttling due to memory limitations |
NUM_TARGETS_WITH_PRIMERS |
a primer BLAST search parameter set high to maximize the number of observations returned. The default is NCBI NUM_TARGETS_WITH_PRIMERS = 1000 - note increasing this parameter can maximize primer hits, but can also lead to API run throttling due to memory limitations |
get_seeds_remote passes the forward and reverse primer sequence for a given
PCR product to iterative_primer_search() along with the taxid(s) of
the organism(s) to blast, the database to search, and many additional possible
parameters to NCBI's primer blast tool (see Note below). Degenerate primers
are converted into all possible non-degenerate sets and a user-defined maximum
number of primer combinations is passed to the API. Multiple taxids are
searched independently, as are multiple database searches (e.g. nt and
refseq_representative_genomes). The data are parsed and stored in a dataframe,
which is also written to a file with the suffix
_unfiltered_get_seeds_remote_output.csv.
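The degenerate-primer expansion and sampling described above can be sketched in base R. This is a minimal illustration, not the rCRUX implementation: expand_primer() and sample_permutations() are hypothetical helper names, and only the default of 50 permutations comes from this page.

```r
# IUPAC nucleotide codes mapped to the bases they can stand for.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("G", "C"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper: expand one degenerate primer into every
# non-degenerate sequence it can represent.
expand_primer <- function(primer) {
  bases <- strsplit(toupper(primer), "")[[1]]
  choices <- unname(iupac[bases])
  if (any(vapply(choices, is.null, logical(1)))) {
    stop("primer contains a character that is not an IUPAC code")
  }
  # One row per combination, one column per primer position.
  combos <- expand.grid(choices, stringsAsFactors = FALSE)
  do.call(paste0, unname(as.list(combos)))
}

# Hypothetical helper: keep at most num_permutations sequences,
# sampling at random when the expansion is larger than that.
sample_permutations <- function(seqs, num_permutations = 50) {
  if (length(seqs) <= num_permutations) {
    return(seqs)
  }
  sample(seqs, num_permutations)
}

set.seed(1)
expanded <- expand_primer("TAGRY")          # R and Y each have two options
subset_50 <- sample_permutations(expand_primer("NNNN"))
```

Here "TAGRY" expands to four sequences, while "NNNN" expands to 256 and is sampled down to 50.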
These hits are then passed to filter_primer_hits(), which calculates
amplicon size and appends it to the dataframe. Only hits that satisfy the default
or user-modified length and mismatch parameters are retained.
Taxonomy is appended to these filtered hits using
get_taxonomizr_from_accession(). The results are written to
a file with the suffix _filtered_get_seeds_remote_output_with_taxonomy.csv.
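The length and mismatch filter can be sketched on a toy table. This is an illustration only: the column names (product_start, product_stop, mismatch), the toy values, the way amplicon length is computed, and the inclusive bounds are all assumptions. Only the defaults (mismatch = 3, minimum_length = 5, maximum_length = 500) come from this page.

```r
# Toy hits table; accessions, coordinates and mismatch counts are invented.
hits <- data.frame(
  accession = c("acc1", "acc2", "acc3", "acc4"),
  product_start = c(100, 200, 50, 10),
  product_stop = c(250, 203, 900, 160),
  mismatch = c(1, 0, 2, 5),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the filtering step: append an amplicon
# length column, then keep rows within the length and mismatch limits.
filter_hits_sketch <- function(hits, mismatch = 3,
                               minimum_length = 5, maximum_length = 500) {
  # Assumed definition of amplicon length: inclusive span of the product.
  hits$amplicon_length <- abs(hits$product_stop - hits$product_start) + 1
  keep <- hits$mismatch <= mismatch &
    hits$amplicon_length >= minimum_length &
    hits$amplicon_length <= maximum_length
  hits[keep, , drop = FALSE]
}

filtered <- filter_hits_sketch(hits)
```

With the defaults, acc2 is dropped for being too short, acc3 for being too long, and acc4 for having too many mismatches, so only acc1 remains.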
The number of unique instances for each rank in the taxonomic path for the
filtered hits is tallied (NAs are counted once per rank) and written to a
file with the suffix _filtered_get_seeds_local_remote_taxonomic_rank_counts.txt.
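The per-rank tally can be reproduced in base R, where unique() treats NA as a single distinct value. The taxonomy table below is a toy example with assumed column names, not rCRUX output.

```r
# Toy taxonomy table: one row per filtered hit, one column per rank.
taxonomy <- data.frame(
  phylum = c("Chordata", "Chordata", "Chordata", NA),
  class = c("Actinopteri", "Actinopteri", NA, NA),
  genus = c("Gadus", "Salmo", NA, NA),
  stringsAsFactors = FALSE
)

# Count distinct values per rank; unique() keeps NA as one value,
# so missing entries contribute exactly one to each count.
rank_counts <- vapply(taxonomy, function(x) length(unique(x)), integer(1))
```

In this toy table the genus column yields 3 (two genera plus NA) and the phylum column yields 2.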
Note:
get_seeds_remote passes many parameters to NCBI's primer blast tool.
You can match the parameters to the fields available in the GUI
here. First, use your
browser to view the page source. Search for the field you are interested in
by searching for the title of the field. It should be enclosed in a label tag.
Inside the label tag, it says for = "<name_of_parameter>". Copy the string
after for = and add it to get_seeds_remote as the name of a parameter, setting
it equal to whatever you like.
As of 2022-08-16, the primer blast GUI contains some options that are not
implemented by primerTree::primer_search() and, by extension, iterative_primer_search().
primer_search doesn't include explicit documentation of allowed options, but
it will quickly report if an option isn't allowed, so trial and error will
not be very time consuming.
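The manual lookup described above can also be done with a regular expression once the page source has been saved. The snippet below is a sketch on made-up markup: the field titles are placeholders, find_parameter() is a hypothetical helper, and the real page may be structured differently.

```r
# Made-up lines standing in for saved page source; titles are placeholders.
page_source <- c(
  '<label for="HITSIZE">Field title A</label>',
  '<label for = "WORD_SIZE">Field title B</label>',
  '<p>Some unrelated text</p>'
)

# Hypothetical helper: find the lines mentioning a field title and
# return the value of the for attribute on each of them.
find_parameter <- function(page_source, title) {
  lines <- grep(title, page_source, fixed = TRUE, value = TRUE)
  pattern <- 'for[[:space:]]*=[[:space:]]*"([^"]+)"'
  matches <- regmatches(lines, regexec(pattern, lines))
  # Element 1 is the full match, element 2 is the captured attribute value.
  vapply(matches,
         function(m) if (length(m) >= 2) m[2] else NA_character_,
         character(1))
}

parameter_name <- find_parameter(page_source, "Field title B")
```

The returned string is the name to pass to get_seeds_remote as an additional argument.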
Note:
See iterative_primer_search() and modifiedPrimerTree_Functions
for additional run parameters not included below.
Check NCBI's primer blast for additional search options.
The table below documents some of the available options.
| Name | Default |
| PRIMER_SPECIFICITY_DATABASE | nt |
| EXCLUDE_ENV | unchecked |
| ORGANISM | Homo sapiens |
| TOTAL_PRIMER_SPECIFICITY_MISMATCH | 1 |
| PRIMER_3END_SPECIFICITY_MISMATCH | 1 |
| TOTAL_MISMATCH_IGNORE | 6 |
| MAX_TARGET_SIZE | 4000 |
| HITSIZE | 50000 |
| EVALUE | 30000 |
| WORD_SIZE | 7 |
| NUM_TARGETS_WITH_PRIMERS | 1000 |
| MAX_TARGET_PER_TEMPLATE | 100 |
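One way to picture how user-supplied options layer over the defaults in the table is a named list merged with modifyList(). This is a sketch of the idea only; how rCRUX combines these values internally is not documented here, and the override values are taken from the example call on this page.

```r
# A subset of the defaults listed in the table above.
defaults <- list(
  PRIMER_SPECIFICITY_DATABASE = "nt",
  MAX_TARGET_SIZE = 4000,
  HITSIZE = 50000,
  EVALUE = 30000,
  WORD_SIZE = 7,
  NUM_TARGETS_WITH_PRIMERS = 1000,
  MAX_TARGET_PER_TEMPLATE = 100
)

# Options a user might supply; names match the table, values override it.
overrides <- list(HITSIZE = 1000000, MAX_TARGET_SIZE = 200)

# modifyList() replaces matching entries and leaves the rest untouched.
params <- modifyList(defaults, overrides)
```

Entries not named in overrides, such as WORD_SIZE, keep their default values.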
a data.frame containing the same information as the .csv it generates
forward_primer_seq <- "TAGAACAGGCTCCTCTAG"
reverse_primer_seq <- "TTAGATACCCCACTATGC"
output_directory_path <- "/my/directory/12S_V5F1_remote_111122_modified_params"
metabarcode_name <- "12S_V5F1"
accession_taxa_sql_path <- "/my/directory/accessionTaxa.sql"
get_seeds_remote(forward_primer_seq,
                 reverse_primer_seq,
                 output_directory_path,
                 metabarcode_name,
                 accession_taxa_sql_path,
                 HITSIZE = '1000000',
                 evalue = '100000',
                 word_size = '6',
                 MAX_TARGET_PER_TEMPLATE = '5',
                 NUM_TARGETS_WITH_PRIMERS = '500000',
                 minimum_length = 50,
                 MAX_TARGET_SIZE = 200,
                 organism = c("1476529", "7776"),
                 return_table = FALSE)
# This results in approximately 111500 blast seed returns (there is some variation due to database updates, etc.), note the default generated approximately 1047.
# This assumes the user is not throttled by memory limitations.