Gut microbiome sourmash gather data frame
gut_gather_df.Rd
The results from running sourmash gather on each of six stool microbiome metagenomes against the GTDB rs207 representatives database.
Format
## `gut_gather_df`
A data frame with 1,062 rows and 30 columns:
- intersect_bp
Numeric. Estimated number of intersected base pairs between a metagenome and a genome in a database.
- f_orig_query
Numeric. Fraction of the original query that belongs to the match.
- f_match
Numeric. Fraction of the matched genome in the leftover query.
- f_unique_to_query
Numeric. Fraction of the query that uniquely belongs to the match.
- f_unique_weighted
Numeric. Abundance-weighted fraction of the query that uniquely belongs to the match.
- average_abund
Numeric. Average abundance of k-mers in the metagenome that were in the match.
- median_abund
Numeric. Median abundance of k-mers in the metagenome that were in the match.
- std_abund
Numeric. Standard deviation of abundance of k-mers in the metagenome that were in the match.
- filename
Character. Path to the database file on the computer where sourmash gather was run.
- name
Character. Name of matched genome in the sourmash gather database.
- genome_accession
Character. Genome accession, derived by cutting off the name variable at the first space.
- md5
Character. MD5 hash for the matched genome sketch.
- f_match_orig
Numeric. Fraction of the matched genome in the original query prior to gather subtraction.
- unique_intersect_bp
Numeric. Estimated number of uniquely intersected base pairs between a metagenome and a genome in a database.
- gather_result_rank
Numeric. Rank of match in gather results.
- remaining_bp
Numeric. Remaining base pairs in the query after the match is removed.
- query_filename
Character. File name for the query derived from the query sketch.
- query_name
Character. Name of the query.
- query_md5
Character. MD5 hash for the query.
- query_bp
Character. Number of base pairs in the query.
- ksize
Character. K-mer size used for sourmash gather.
- moltype
Character. Molecule type used for sourmash gather.
- scaled
Numeric. Scaled value at which sourmash gather was run.
- query_n_hashes
Numeric. Number of hashes (k-mers) in the query.
- query_abundance
Logical. Whether hash (k-mer) abundance information was a part of the query sketch.
- query_containment_ani
Numeric. Average nucleotide identity (ANI) estimated from the containment of the query in the match.
- match_containment_ani
Numeric. Average nucleotide identity (ANI) estimated from the containment of the match in the query.
- average_containment_ani
Numeric. Mean of the two metrics query_containment_ani and match_containment_ani.
- max_containment_ani
Numeric. Maximum of the two metrics query_containment_ani and match_containment_ani.
- potential_false_negative
Logical. Whether the ANI estimate derived from containment is a potential false negative.
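Three of the columns above are described as derived from other columns: genome_accession (the name variable cut at the first space) and the average and maximum of query_containment_ani and match_containment_ani. The following is a minimal Python sketch of those relationships on a single made-up row. The function name `derive_columns`, the example name string, and the numeric values are hypothetical and are not taken from `gut_gather_df`. It illustrates the column descriptions only and does not show how the package or sourmash computes them.

```python
# Hypothetical toy row; the name and values are invented for illustration
# and do not come from gut_gather_df.
row = {
    "name": "GCF_000000000.1 Example species strain X",
    "query_containment_ani": 0.96,
    "match_containment_ani": 0.98,
}


def derive_columns(row):
    """Return a copy of row with the derived columns filled in.

    Assumes the relationships stated in the column descriptions:
    genome_accession is the part of name before the first space, and the
    average/max columns summarize the two containment ANI columns.
    """
    out = dict(row)
    # Keep everything in name before the first space.
    out["genome_accession"] = row["name"].split(" ", 1)[0]
    q = row["query_containment_ani"]
    m = row["match_containment_ani"]
    out["average_containment_ani"] = (q + m) / 2
    out["max_containment_ani"] = max(q, m)
    return out


derived = derive_columns(row)
print(derived["genome_accession"])
print(derived["average_containment_ani"], derived["max_containment_ani"])
```

A name without any space is returned unchanged as the accession, because `split(" ", 1)` then yields a single element.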
...