run_blastdbcmd_blastn_and_aggregate_resuts {rCRUX}    R Documentation

Runs run_blastdbcmd() and run_blastn(), then aggregates and saves the results

Description

This function uses run_blastdbcmd() to find the seed sequence that corresponds to the accession number and the forward and reverse stops recorded in the seeds table. run_blastdbcmd() outputs sequences as .fasta-formatted strings, which run_blastdbcmd_blastn_and_aggregate_resuts concatenates into a multi-line fasta and passes to run_blastn() as an argument. The output of run_blastn() is de-replicated by accession, and only the longest read in each set of replicates is retained in the output table. The run state is saved and passed back to blast_datatable().
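The concatenation and de-replication steps described above can be illustrated with a minimal base-R sketch. The helper name keep_longest_per_accession and the column names accession and sequence are hypothetical, chosen only for illustration; this is not the package's internal code, and the real output table may be structured differently.

```r
# Concatenate .fasta-formatted strings into one multi-line fasta string.
fasta_records <- c(">AB000001.1\nACGTACGT", ">XY000002.1\nGGCC")
multi_fasta <- paste(fasta_records, collapse = "\n")

# Hypothetical helper: keep only the longest sequence for each accession.
# The column names `accession` and `sequence` are assumptions.
keep_longest_per_accession <- function(hits) {
  lens <- nchar(hits$sequence)
  # Sort by accession, then by decreasing length, so the longest comes first.
  ordered <- hits[order(hits$accession, -lens), , drop = FALSE]
  # Keep the first (longest) row for each accession and drop the rest.
  deduped <- ordered[!duplicated(ordered$accession), , drop = FALSE]
  rownames(deduped) <- NULL
  deduped
}

hits <- data.frame(
  accession = c("AB000001.1", "AB000001.1", "XY000002.1"),
  sequence  = c("ACGT", "ACGTACGT", "GGCC"),
  stringsAsFactors = FALSE
)
result <- keep_longest_per_accession(hits)
```

Here result holds one row per accession, and the row kept for the duplicated accession is the one with the longer sequence.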

Usage

run_blastdbcmd_blastn_and_aggregate_resuts(
  sample_indices = sample_indices,
  save_dir,
  blast_seeds_m,
  ncbi_bin = NULL,
  db,
  too_many_ns,
  db_dir,
  blastdbcmd_failed,
  unsampled_indices,
  output_table,
  wildcards,
  num_rounds,
  ...
)

Arguments

sample_indices

the indices to sample

save_dir

a directory in which to create files representing the current state

blast_seeds_m

the blast seeds table, updated with blast status

ncbi_bin

the path to the blast+ tools, passed to run_blastdbcmd() and run_blastn(). Specify only if blastn and blastdbcmd are not in your path. The default is ncbi_bin = NULL. If the tools are not in your path, give the directory that contains them, e.g. ncbi_bin = "/my/local/ncbi-blast-2.10.1+/bin/".

db

the type of blast db - e.g. nt

too_many_ns

a vector of indices that result in a fasta with too many Ns

db_dir

path to the blast db

blastdbcmd_failed

the indices not found in your blast db

unsampled_indices

the indices that need to be sampled

output_table

the table of results

wildcards

a character vector that represents the minimum number of consecutive Ns the user will tolerate in a given seed or hit sequence. The default is wildcards = "NNNN".

num_rounds

number of rounds of blast

...

additional arguments passed to run_blastn()
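
As a rough illustration of how the ncbi_bin and wildcards arguments might be applied, the base-R sketch below uses two hypothetical helpers, blast_tool_path and has_too_many_ns. It assumes that a sequence is flagged when it contains the wildcards string, which the too_many_ns argument suggests but this page does not state explicitly. Neither helper is part of rCRUX.

```r
# Hypothetical helper: resolve a BLAST+ executable name.
# With ncbi_bin = NULL the bare tool name is returned, so the system
# path is used to locate it; otherwise the tool is joined to ncbi_bin.
blast_tool_path <- function(tool, ncbi_bin = NULL) {
  if (is.null(ncbi_bin)) {
    return(tool)
  }
  # Strip trailing slashes so file.path() does not produce "//".
  file.path(sub("/+$", "", ncbi_bin), tool)
}

# Hypothetical check (an assumption about the behavior): flag sequences
# that contain the literal run of Ns given in `wildcards`.
has_too_many_ns <- function(sequence, wildcards = "NNNN") {
  grepl(wildcards, sequence, fixed = TRUE)
}

default_tool <- blast_tool_path("blastn")
local_tool <- blast_tool_path("blastn", "/my/local/ncbi-blast-2.10.1+/bin/")
flags <- has_too_many_ns(c("ACGTNNNNACGT", "ACGTNNACGT"))
```

Under these assumptions, the first sequence is flagged because it contains four consecutive Ns, and the second is not.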


[Package rCRUX version 0.0.1.000 ]