String Functions Documentation

This page describes the functions in string_functions.py and their usage.

convert_dbxref_to_dict(string)

Takes a string from the Dbxref additional fields and parses it into a dict.

Parameters:

Name Type Description Default
string str

String in the Dbxref format, as below.
'feature1:entry1,feature2:entry2'

required

Returns:

Type Description
dict

The string unpacked into a dict with key:value format, as below.
{'feature1': 'entry1', 'feature2': 'entry2'}

Examples:

>>> convert_dbxref_to_dict('Ensembl:123104,UniProtKB:Q7D56')
{'Ensembl': '123104', 'UniProtKB': 'Q7D56'}
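
The parsing described above can be sketched as follows. This is a minimal illustration, not the code in string_functions.py: the name sketch_dbxref_to_dict is hypothetical, and splitting each pair on its first colon only is an assumption about how entries that themselves contain colons should be handled.

```python
def sketch_dbxref_to_dict(string):
    """Hypothetical stand-in for convert_dbxref_to_dict (assumed behaviour)."""
    result = {}
    for pair in string.split(','):
        # Assumption: split on the first ':' only, so an entry such as
        # 'MGI:MGI:97490' keeps 'MGI:97490' as its value.
        key, _, value = pair.partition(':')
        result[key] = value
    return result


print(sketch_dbxref_to_dict('Ensembl:123104,UniProtKB:Q7D56'))
# {'Ensembl': '123104', 'UniProtKB': 'Q7D56'}
```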

convert_dict_to_fields_gff(dictionary)

Takes a dict and converts it to a string for the additional fields section of GFF format.

Parameters:

Name Type Description Default
dictionary dict

A dict with key:value format, as below.
{'feature1': 'entry1', 'feature2': 'entry2'}

required

Returns:

Type Description
str

String in GFF additional fields format, as below:
'feature1=entry1;feature2=entry2'

Examples:

>>> convert_dict_to_fields_gff({'gene_name': 'Hh', 'gene_id': '123'})
'gene_name=Hh;gene_id=123'

convert_dict_to_fields_gtf(dictionary)

Takes a dict and converts it to a string for the additional fields section of GTF format.

Parameters:

Name Type Description Default
dictionary dict

A dict with key:value format, as below.
{'feature1': 'entry1', 'feature2': 'entry2'}

required

Returns:

Type Description
str

String in GTF additional fields format, as below:
'feature1 "entry1"; feature2 "entry2"'

Examples:

>>> convert_dict_to_fields_gtf({'gene_name': 'Hh', 'gene_id': '123'})
'gene_name "Hh"; gene_id "123"'
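
Judging by the documented examples, the GFF and GTF serialisers differ only in how each pair is written and how pairs are joined. The sketch below contrasts the two. Both function names are hypothetical, and the bodies are an assumption about the behaviour, not the code in string_functions.py.

```python
def sketch_fields_gff(dictionary):
    # GFF style (assumed): key=value pairs joined by ';'
    return ';'.join(f'{key}={value}' for key, value in dictionary.items())


def sketch_fields_gtf(dictionary):
    # GTF style (assumed): key "value" pairs joined by '; '
    return '; '.join(f'{key} "{value}"' for key, value in dictionary.items())


fields = {'gene_name': 'Hh', 'gene_id': '123'}
print(sketch_fields_gff(fields))  # gene_name=Hh;gene_id=123
print(sketch_fields_gtf(fields))  # gene_name "Hh"; gene_id "123"
```

Both sketches rely on dicts preserving insertion order, so the fields come out in the order they were added.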

convert_fields_to_dict_gff(string)

Takes a string from the GFF additional fields (the ninth column, i.e. index 8 when counting from zero) and parses it into a dict.

Parameters:

Name Type Description Default
string str

String in the GFF additional fields format, as below.
'feature1=entry1;feature2=entry2'

required

Returns:

Type Description
dict

The string unpacked into a dict with key:value format, as below.
{'feature1': 'entry1', 'feature2': 'entry2'}

Examples:

>>> convert_fields_to_dict_gff('gene_name=Hh;gene_id=123')
{'gene_name': 'Hh', 'gene_id': '123'}

convert_fields_to_dict_gtf(string)

Takes a string from the GTF additional fields (the ninth column, i.e. index 8 when counting from zero) and parses it into a dict.
Tolerates strings with and without a terminal ';' separator.

Parameters:

Name Type Description Default
string str

String in the GTF additional fields format, as below.
'feature1 "entry1"; feature2 "entry2";'

required

Returns:

Type Description
dict

The string unpacked into a dict with key:value format, as below.
{'feature1': 'entry1', 'feature2': 'entry2'}

Examples:

>>> convert_fields_to_dict_gtf('gene_name "Hh"; gene_id "123"')
{'gene_name': 'Hh', 'gene_id': '123'}
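
One way to tolerate an optional terminal ';' is to split on ';' and skip any empty piece. The sketch below illustrates that idea under stated assumptions: the name sketch_fields_to_dict_gtf is hypothetical, the body is not the code in string_functions.py, and values are assumed not to contain ';' themselves.

```python
def sketch_fields_to_dict_gtf(string):
    """Hypothetical stand-in for convert_fields_to_dict_gtf (assumed behaviour)."""
    result = {}
    for field in string.strip().split(';'):
        field = field.strip()
        if not field:
            # Empty piece left behind by a terminal ';'
            continue
        # The key ends at the first space; the rest is the quoted value.
        key, _, value = field.partition(' ')
        result[key] = value.strip().strip('"')
    return result


print(sketch_fields_to_dict_gtf('gene_name "Hh"; gene_id "123"'))
# {'gene_name': 'Hh', 'gene_id': '123'}
print(sketch_fields_to_dict_gtf('gene_name "Hh"; gene_id "123";'))
# {'gene_name': 'Hh', 'gene_id': '123'}
```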

make_gene_list(lst, filename)

Writes a gene list file from a list of genes and a filename or path.
The file contains the genes, one per line.

Parameters:

Name Type Description Default
lst list

List of gene IDs to be saved.

required
filename str

Name of, or path to, the destination file.

required

Examples:

>>> make_gene_list(['Hh', 'bcd', 'nos'], 'fly_genes.txt')
>>> with open('fly_genes.txt', 'r') as file:
...     for line in file:
...         print(line.strip())
Hh
bcd
nos
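
For illustration, writing one gene per line could look like the sketch below. The name sketch_make_gene_list is hypothetical, and the body is an assumption about the behaviour described above, not the code in string_functions.py. The demo writes to a temporary directory so that it leaves no file behind.

```python
import os
import tempfile


def sketch_make_gene_list(lst, filename):
    """Hypothetical stand-in for make_gene_list: one gene id per line."""
    with open(filename, 'w') as handle:
        for gene in lst:
            handle.write(f'{gene}\n')


# Write the list, then read it back to check the one-per-line layout.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'fly_genes.txt')
    sketch_make_gene_list(['Hh', 'bcd', 'nos'], path)
    with open(path) as handle:
        genes = handle.read().splitlines()

print(genes)  # ['Hh', 'bcd', 'nos']
```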

prefixify(species)

Converts a string of format "Genus_species" to "Gspe" species prefix format.

Note

If the species string lacks an underscore, the original string is returned unchanged.

Parameters:

Name Type Description Default
species str

Name of the species, in Genus_species format.

required

Returns:

Type Description
str

Species prefix, e.g. Gspe.

Examples:

>>> prefixify('Genus_species')
'Gspe'
>>> prefixify('MmusDrerXlae')
'MmusDrerXlae'
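
The 'Gspe' example suggests the prefix is the first letter of the genus followed by the first three letters of the species name. The sketch below implements that reading. The name sketch_prefixify is hypothetical, and the rule is inferred from the examples rather than taken from string_functions.py.

```python
def sketch_prefixify(species):
    """Hypothetical stand-in for prefixify (rule inferred from the examples)."""
    if '_' not in species:
        # No underscore: return the input unchanged, as the note describes.
        return species
    genus, _, epithet = species.partition('_')
    # Assumption: first letter of the genus + first three of the species.
    return genus[:1] + epithet[:3]


print(sketch_prefixify('Genus_species'))            # Gspe
print(sketch_prefixify('Drosophila_melanogaster'))  # Dmel
print(sketch_prefixify('MmusDrerXlae'))             # MmusDrerXlae
```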