This dataset contains the results of running sourmash gather on each of six stool microbiome metagenomes against the GTDB rs207 representatives database, followed by taxonomy assignment with sourmash taxonomy annotate.

Usage

gut_taxonomy_annotate_df

Format

## `gut_taxonomy_annotate_df`

A data frame with 1,062 rows and 40 columns:

intersect_bp

Numeric. Estimated number of intersected base pairs between a metagenome and a genome in a database.

f_orig_query

Numeric. Fraction of the original query that belongs to the match.

f_match

Numeric. Fraction of the matched genome in the leftover query.

f_unique_to_query

Numeric. Fraction of the query that uniquely belongs to the match.

f_unique_weighted

Numeric. Abundance-weighted fraction of the query that uniquely belongs to the match.

average_abund

Numeric. Average abundance of k-mers in the metagenome that were in the match.

median_abund

Numeric. Median abundance of k-mers in the metagenome that were in the match.

std_abund

Numeric. Standard deviation of abundance of k-mers in the metagenome that were in the match.

filename

Character. File path for the database on the computer that sourmash gather was executed on.

name

Character. Name of matched genome in the sourmash gather database.

genome_accession

Character. Genome accession, obtained by cutting the name variable at the first space.

md5

Character. MD5 hash for the matched genome sketch.

f_match_orig

Numeric. Fraction of the matched genome in the original query prior to gather subtraction.

unique_intersect_bp

Numeric. Estimated number of uniquely intersected base pairs between a metagenome and a genome in a database.

gather_result_rank

Numeric. Rank of match in gather results.

remaining_bp

Numeric. Remaining base pairs in the query after the match is removed.

query_filename

Character. File name for the query derived from the query sketch.

query_name

Character. Name of the query.

query_md5

Character. MD5 hash for the query.

query_bp

Character. Number of base pairs in the query.

ksize

Character. K-mer size used for sourmash gather.

moltype

Character. Molecule type used for sourmash gather.

scaled

Numeric. Scaled value that sourmash gather was performed at.

query_n_hashes

Numeric. Number of hashes (k-mers) in the query.

query_abundance

Logical. Whether hash (k-mer) abundance information was a part of the query sketch.

query_containment_ani

Numeric. Average nucleotide identity (ANI) estimated from the containment of the query in the match.

match_containment_ani

Numeric. Average nucleotide identity (ANI) estimated from the containment of the match in the query.

average_containment_ani

Numeric. Average of the two containment metrics query_containment_ani and match_containment_ani.

max_containment_ani

Numeric. Maximum of the two containment metrics query_containment_ani and match_containment_ani.

potential_false_negative

Logical. Whether the containment estimate is a potential false negative.

lineage

Character. Full taxonomic lineage of the gather match. Each taxonomic level is separated by a semicolon.

domain

Character. Domain in the taxonomic lineage.

phylum

Character. Phylum in the taxonomic lineage.

class

Character. Class in the taxonomic lineage.

order

Character. Order in the taxonomic lineage.

family

Character. Family in the taxonomic lineage.

genus

Character. Genus in the taxonomic lineage.

species

Character. Species in the taxonomic lineage.

strain

Character. Strain in the taxonomic lineage.

n_unique_kmers

Numeric. Abundance-weighted number of unique k-mers attributable to the gather match.

...
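
Several of the columns above are described as derived from other columns (genome_accession from name, the rank columns from lineage, and the average and maximum of the two containment ANI columns). The following is a minimal sketch of how such derivations might look in base R. It uses a toy data frame with made-up values; only the column names come from the table above. The exact code used to build the dataset may differ, and the simple arithmetic mean and the `d__`/`p__` rank prefixes are assumptions.

```r
# Toy rows with made-up values; only the column names match the dataset.
toy <- data.frame(
  name = c("GCF_000000001.1 Hypothetical genome one",
           "GCA_000000002.1 Hypothetical genome two"),
  lineage = c("d__Bacteria;p__Firmicutes;c__Bacilli",
              "d__Bacteria;p__Bacteroidota;c__Bacteroidia"),
  query_containment_ani = c(0.93, 0.97),
  match_containment_ani = c(0.99, 0.95),
  stringsAsFactors = FALSE
)

# genome_accession: keep the text before the first space in name.
toy$genome_accession <- sub(" .*$", "", toy$name)

# Rank columns: split lineage on semicolons and pick ranks by position.
ranks <- strsplit(toy$lineage, ";", fixed = TRUE)
toy$domain <- vapply(ranks, `[`, character(1), 1)
toy$phylum <- vapply(ranks, `[`, character(1), 2)

# Average and maximum of the two containment ANI columns
# (assumes a plain arithmetic mean).
ani_cols <- c("query_containment_ani", "match_containment_ani")
toy$average_containment_ani <- rowMeans(toy[, ani_cols])
toy$max_containment_ani <- pmax(toy$query_containment_ani,
                                toy$match_containment_ani)

toy[, c("genome_accession", "domain", "phylum",
        "average_containment_ani", "max_containment_ani")]
```

The same expressions can be applied to `gut_taxonomy_annotate_df` itself to check that the stored derived columns agree with their source columns.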

Source

<https://github.com/Arcadia-Science/sourmashconsumr/blob/main/data-raw/00_sourmash_commands.sh>