`read_taxonomy_annotate()` reads one or more CSV files produced by the command-line tool sourmash taxonomy annotate. Genome matches can be filtered with `intersect_bp_threshold`, which defaults to 0 base pairs, so by default no matches are removed. The function adds the column `n_unique_kmers`, the abundance-weighted number of unique k-mers overlapping between a query and its match. Because the columns output by sourmash gather, and by extension sourmash taxonomy, sometimes grow with new versions of sourmash, this function emits a warning when expected columns are missing from the CSV file. The warning can be safely ignored; it indicates that your results were generated with an earlier version of sourmash and that re-running sourmash gather would add information to the output.
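
As a rough illustration of what an abundance-weighted k-mer count like `n_unique_kmers` could look like, the base R sketch below uses toy values. The formula and the input column names (`unique_intersect_bp`, `scaled`, `average_abund`) are assumptions based on typical sourmash gather output. This page does not state how the function computes the column, so treat this as a sketch, not the package's implementation.

```r
# Toy gather-style rows; all values are invented for illustration.
gather <- data.frame(
  unique_intersect_bp = c(100000, 40000),  # assumed column: uniquely matched base pairs
  scaled              = c(1000, 1000),     # assumed column: sketch scaled value
  average_abund       = c(2.5, 1.0)        # assumed column: mean abundance of matched hashes
)

# Assumed formula: estimated number of unique matched k-mers in the sketch,
# weighted by their average abundance in the query.
gather$n_unique_kmers <-
  (gather$unique_intersect_bp / gather$scaled) * gather$average_abund
```

Under these assumptions, a match with a higher average abundance contributes proportionally more to `n_unique_kmers` than an equally sized match seen only once.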

Usage

read_taxonomy_annotate(
  file,
  intersect_bp_threshold = 0,
  separate_lineage = TRUE,
  ...
)

Arguments

file

Path to CSV file or files output by sourmash taxonomy annotate.

intersect_bp_threshold

Integer. Gather matches must have an intersect_bp greater than or equal to this value.

separate_lineage

Boolean. If TRUE (the default), separate each taxonomic level of the lineage into its own column; if FALSE, read in lineage as a single column.

...

Arguments passed to read_csv().
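
The effect of `intersect_bp_threshold` and `separate_lineage` can be sketched on a toy table in base R. This is only an illustration of the documented behavior: the table values, the semicolon-delimited lineage format, and the rank column names (`domain`, `phylum`, `class`) are assumptions, and the package's own implementation may differ.

```r
# Toy table standing in for sourmash taxonomy annotate output (values invented).
matches <- data.frame(
  name         = c("genomeA", "genomeB", "genomeC"),
  intersect_bp = c(120000, 30000, 50000),
  lineage      = c(
    "d__Bacteria;p__Bacillota;c__Bacilli",
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria",
    "d__Archaea;p__Halobacteriota;c__Methanosarcinia"
  ),
  stringsAsFactors = FALSE
)

# intersect_bp_threshold: keep matches with intersect_bp >= the threshold.
intersect_bp_threshold <- 50000
kept <- matches[matches$intersect_bp >= intersect_bp_threshold, ]

# separate_lineage = TRUE: split the lineage string into one column per rank.
ranks <- c("domain", "phylum", "class")  # assumed rank names for this toy data
parts <- do.call(rbind, strsplit(kept$lineage, ";", fixed = TRUE))
colnames(parts) <- ranks
kept_separated <- cbind(kept[, c("name", "intersect_bp")], parts)
```

Because the comparison is greater than or equal to, the match with exactly 50000 intersecting base pairs is retained, while the 30000 bp match is dropped.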

Value

A tibble.

Examples

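if (FALSE) {
# "sample.with-lineages.csv" is a placeholder path, not a file shipped with the package
read_taxonomy_annotate(file = "sample.with-lineages.csv")
}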