String Functions Documentation
This page describes the functions in scanpy_wrappers.py and their usage.
Functions here default to showing their source code for greater transparency and interoperability with base scanpy functions.
ScanpyMetaObject
Object that bundles a standardized collection of scanpy functions.
You can access the scanpy data in this object using ScanpyMetaObject.adata. Use this variable like you would use adata in any scanpy operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
matrix | GxcFile or ExcFile | BioFile object of the matrix file. | required |
sampledict | SampleDict | a SampleDict object from the BioFileDocket. | required |
Source code in utils/scanpy_wrappers.py
cellgene_filter(min_genes=100, min_cells=20)
Filters data by minimum number of genes expressed per cell and minimum expressing cells per gene.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_genes | int | minimum number of genes expressed per cell, used as a cutoff. | 100 |
min_cells | int | minimum number of cells expressing each gene, used as a cutoff. | 20 |
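As a rough illustration of what the two cutoffs mean, the sketch below applies them to a tiny count matrix in plain Python. It is not the wrapper's source: the name cellgene_filter_sketch, the list-of-lists input, and the order of the two steps (cells first, then genes) are assumptions made for this example. In scanpy itself, filters of this kind are available as sc.pp.filter_cells(min_genes=...) and sc.pp.filter_genes(min_cells=...).

```python
def cellgene_filter_sketch(counts, min_genes=100, min_cells=20):
    """Filter a cells x genes count matrix given as a list of rows.

    Hypothetical helper for illustration only; assumes cells are filtered
    before genes.
    """
    # Keep cells that express (count > 0) at least `min_genes` genes.
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for value in row if value > 0) >= min_genes
    ]

    # Among the kept cells, keep genes expressed in at least `min_cells` cells.
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]

    filtered = [[counts[i][j] for j in kept_genes] for i in kept_cells]
    return filtered, kept_cells, kept_genes


# Four cells (rows) by three genes (columns).
example_counts = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 0],
    [0, 0, 0],
]
filtered, cells, genes = cellgene_filter_sketch(example_counts, min_genes=2, min_cells=2)
print(cells, genes, filtered)
```

With these toy cutoffs, only the first and third cells pass the gene-count filter, and only the first gene is expressed in both of them.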
Source code in utils/scanpy_wrappers.py
export_top_genes(key)
Exports the top genes list to a text file, assigning the object as an attribute of the parent object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str | a key for the gene list, ending in […] | required |
Raises:
Type | Description |
---|---|
Exception | when the key does not end in […] |
Source code in utils/scanpy_wrappers.py
get_top_genes(datatype, top_number=200, tofile=True)
Get the top_number genes per cluster across all clusters without repetition.
Also get the top marker gene per cluster.
Save these values as object attributes with a special key to the parent object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datatype | str | descriptor for the data type, to be included in the outfile name. | required |
top_number | int | number of top genes to pull out from each cluster. | 200 |
tofile | bool | whether to save a file. (Not implemented yet.) | True |
Returns:
Type | Description |
---|---|
tuple[str, str] | keys of the top gene list files. |
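One way to read "top genes per cluster across all clusters without repetition" is sketched below in plain Python. The input shape (a dict mapping each cluster label to its genes in ranked order) and the name top_genes_sketch are assumptions for illustration; the actual method works on the parent object and returns keys rather than the lists themselves.

```python
def top_genes_sketch(ranked_genes, top_number=200):
    """Collect the top genes of every cluster, skipping duplicates.

    `ranked_genes` maps cluster label -> list of genes, best first
    (a hypothetical input format chosen for this example).
    """
    seen = set()
    top_genes = []     # union of per-cluster top genes, first occurrence kept
    top_markers = {}   # single best gene for each cluster

    for cluster, genes in ranked_genes.items():
        if genes:
            top_markers[cluster] = genes[0]
        for gene in genes[:top_number]:
            if gene not in seen:
                seen.add(gene)
                top_genes.append(gene)

    return top_genes, top_markers


ranked = {"0": ["g1", "g2", "g3"], "1": ["g2", "g4", "g5"]}
genes, markers = top_genes_sketch(ranked, top_number=2)
print(genes)    # g2 appears in both clusters but is listed once
print(markers)
```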
Source code in utils/scanpy_wrappers.py
map_cellannots(cellannot)
Adds an additional .obs feature for cell annotation.
Imports from a file that has two columns: cell_barcode and cell_type.
The cell_barcode field should be an exact match for cells in the data.
Cells without a matching barcode are given the celltype label 'Unlabeled'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cellannot | CellAnnotFile | CellAnnotFile object, usually in the same BioFileDocket. | required |
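The barcode matching described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the method's source: the annotation file is assumed to be tab-delimited with a header row, and map_cell_types_sketch is a hypothetical name.

```python
import csv
import io


def map_cell_types_sketch(barcodes, annotation_text, default="Unlabeled"):
    """Return one cell-type label per barcode, in the order given.

    `annotation_text` is the content of a two-column file with the header
    `cell_barcode<TAB>cell_type` (tab delimiter assumed for this example).
    """
    reader = csv.DictReader(io.StringIO(annotation_text), delimiter="\t")
    lookup = {row["cell_barcode"]: row["cell_type"] for row in reader}
    # Exact-match lookup; barcodes absent from the file get the default label.
    return [lookup.get(barcode, default) for barcode in barcodes]


annotations = "cell_barcode\tcell_type\nAAAC\tneuron\nTTTG\tglia\n"
labels = map_cell_types_sketch(["AAAC", "CCCC", "TTTG"], annotations)
print(labels)
```

Because the match is exact, a barcode with a differing suffix or prefix would fall through to 'Unlabeled'.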
Source code in utils/scanpy_wrappers.py
map_cellannots_multispecies(msd)
Adds an additional .obs feature for cell annotation from a MultiSpeciesBioFileDocket.
Imports from a file that has two columns: cell_barcode and cell_type.
The cell_barcode field should be an exact match for cells in the data.
Cells without a matching barcode are given the celltype label 'Unlabeled'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
msd | MultiSpeciesBioFileDocket | the docket containing information about files from all species in the dataset. | required |
Source code in utils/scanpy_wrappers.py
map_gene_to_id(idmm, gene_list, from_id, to_id, check_ids=True)
Maps IDs from one type into another using an IdmmFile object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idmm | IdmmFile | the IdmmFile object used to convert between ID types. | required |
gene_list | list of str | list of genes in the dataset you want to convert. | required |
from_id | str | starting feature column in the idmm. | required |
to_id | str | ending feature column in the idmm. | required |
check_ids | bool | whether to check if the ids in gene_list are present in the parent object. | True |
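Conceptually, the conversion is a lookup from one column of an ID table to another. The sketch below shows that idea in plain Python; the table layout (a list of dicts), the column names, the gene identifiers, and the choice to return None for unmapped genes are all assumptions for illustration. The check_ids behavior is not modeled here.

```python
def map_gene_ids_sketch(id_table, gene_list, from_id, to_id):
    """Translate each gene in `gene_list` from one ID column to another.

    `id_table` is a list of dicts, one per row of a hypothetical ID mapping
    table. Genes with no entry map to None (an assumption of this sketch).
    """
    lookup = {row[from_id]: row[to_id] for row in id_table}
    return [lookup.get(gene) for gene in gene_list]


# Placeholder identifiers, not real gene IDs.
table = [
    {"gene_name": "geneA", "orthogroup": "OG_001"},
    {"gene_name": "geneB", "orthogroup": "OG_002"},
]
converted = map_gene_ids_sketch(table, ["geneB", "geneZ", "geneA"], "gene_name", "orthogroup")
print(converted)
```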
Source code in utils/scanpy_wrappers.py
normalize(max_n_genes_by_counts=7000, target_sum=10000.0)
Normalizes data using a max_n_genes_by_counts cutoff and a target sum.
Modifies the underlying adata by applying the cutoff and the normalization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_n_genes_by_counts | int | removes cells that exceed this number of genes by counts. | 7000 |
target_sum | float | target number of counts per cell, using scientific notation. | 10000.0 |
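To make the two parameters concrete, the sketch below applies a gene-count cutoff and then rescales each remaining cell so its counts sum to target_sum. It is a plain-Python illustration, not the wrapper's source: the name normalize_sketch and the list-of-lists input are assumptions, and any further steps the wrapper may perform (for example a log transform) are not shown. In scanpy, per-cell scaling of this kind is provided by sc.pp.normalize_total(target_sum=...).

```python
def normalize_sketch(counts, max_n_genes_by_counts=7000, target_sum=1e4):
    """Drop cells with too many detected genes, then scale each cell's total.

    `counts` is a cells x genes matrix given as a list of rows
    (hypothetical input format for this example).
    """
    # Keep cells whose number of detected genes does not exceed the cutoff.
    kept = [
        row for row in counts
        if sum(1 for value in row if value > 0) <= max_n_genes_by_counts
    ]

    normalized = []
    for row in kept:
        total = sum(row)
        # Rescale so the cell's counts sum to `target_sum`; leave empty cells at 0.
        normalized.append([value * target_sum / total if total else 0.0 for value in row])
    return normalized


example = [[1, 1, 2], [5, 0, 5], [0, 3, 0]]
result = normalize_sketch(example, max_n_genes_by_counts=2, target_sum=10)
print(result)
```

In this toy run the first cell is removed because it expresses three genes, and each remaining row sums to 10.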
Source code in utils/scanpy_wrappers.py
pca_basic(svd_solver='arpack', color=[], plot=True)
Runs a simple PCA, displaying the first two components and variance plot.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
svd_solver | str | SVD solving function, passed to […] | 'arpack' |
color | list | list of coloring schemes for points in the data; creates one plot per color scheme. | [] |
plot | bool | whether or not to make a plot. | True |
Source code in utils/scanpy_wrappers.py
rank_genes(on='leiden', method='t-test', plot=True, n_genes=25, sharey=False)
Runs sc.tl.rank_genes_groups based on passed parameters.
Parameters can be changed to alter the ranking scheme.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
on | str | clustering feature to use. Defaults to 'leiden'. | 'leiden' |
method | str | how to compare groups for gene ranking. Defaults to 't-test'. | 't-test' |
plot | bool | whether or not to plot the data. | True |
n_genes | int | number of genes to plot. | 25 |
sharey | bool | pass to […] | False |
Source code in utils/scanpy_wrappers.py
read(delimiter='\t', cache=True, transpose=True, filter_set=set())
Reads the contained matrix into an AnnData object (adata), transposing if needed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
delimiter | str | a delimiter for the data, such as ',' or '\t'. Defaults to '\t'. | '\t' |
cache | bool | whether or not to create a cache file for the matrix. Defaults to True. | True |
transpose | bool | whether or not to transpose the data matrix after loading. Defaults to True. | True |
filter_set | set | a set of gene ids to keep from the data, removing those that don't match. | set() |
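The interplay of delimiter, transpose, and filter_set can be sketched without scanpy as shown below. Several things here are assumptions for illustration: the file is taken to be genes-by-cells with a header row and a label column, an empty filter_set is taken to mean "keep every gene", and read_matrix_sketch is a hypothetical name. Caching is not modeled.

```python
import csv
import io


def read_matrix_sketch(text, delimiter="\t", transpose=True, filter_set=None):
    """Parse a labeled matrix, optionally transpose it, then filter genes.

    Returns (row_labels, col_labels, values). After transposing, rows are
    cells and columns are genes (layout assumed for this example).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    col_labels = rows[0][1:]                   # header row minus the corner cell
    row_labels = [row[0] for row in rows[1:]]  # first column holds row labels
    values = [[float(v) for v in row[1:]] for row in rows[1:]]

    if transpose:
        values = [list(column) for column in zip(*values)]
        row_labels, col_labels = col_labels, row_labels

    if filter_set:
        # Keep only the gene columns whose id is in the set.
        keep = [j for j, gene in enumerate(col_labels) if gene in filter_set]
        col_labels = [col_labels[j] for j in keep]
        values = [[row[j] for j in keep] for row in values]

    return row_labels, col_labels, values


text = "gene\tc1\tc2\ngA\t1\t2\ngB\t3\t4\ngC\t5\t6\n"
cells, genes, values = read_matrix_sketch(text, filter_set={"gA", "gC"})
print(cells, genes, values)
```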
Source code in utils/scanpy_wrappers.py
regress_scale(how=['total_counts'], max_value=10)
Runs regression of specific features and scales data to a maximum value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
how | list | features to regress out. Defaults to ['total_counts']. | ['total_counts'] |
max_value | float or int | maximum value to scale data to. | 10 |
Source code in utils/scanpy_wrappers.py
umap_leiden(n_neighbors=50, n_pcs=40, legend_loc='on data', save=True, plot=True)
Runs Leiden clustering, followed by UMAP, on the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_neighbors | int | number of neighbors to pass to […] | 50 |
n_pcs | int | number of principal components to use from PCA, passed to […] | 40 |
legend_loc | str | where the legend should go (e.g. […]) | 'on data' |
save | bool | whether to save the plot. | True |
plot | bool | whether to make the plot. | True |
Source code in utils/scanpy_wrappers.py
variable_filter(min_mean=0.0125, max_mean=3, min_disp=0.1, max_disp=10, plot=True)
Filters genes to use for dimensionality reduction using max/min of mean and max/min of dispersion, plotting optionally.
Modifies the underlying adata and saves the original data under adata.raw.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_mean | float or int | minimum mean expression of genes. | 0.0125 |
max_mean | float or int | maximum mean expression of genes. | 3 |
min_disp | float or int | minimum dispersion of gene expression. | 0.1 |
max_disp | float or int | maximum dispersion of gene expression. | 10 |
plot | bool | whether or not to generate a scatter plot showing cutoffs. | True |
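The four cutoffs define a window in mean/dispersion space, and a gene is kept only if it falls inside it. The sketch below shows that selection rule on precomputed per-gene statistics. It is an illustration under assumptions: the input dict, the name variable_filter_sketch, and the strict inequalities at the boundaries are choices made for this example, and how the wrapper computes mean and dispersion is not shown. scanpy's sc.pp.highly_variable_genes accepts parameters with these same names.

```python
def variable_filter_sketch(gene_stats, min_mean=0.0125, max_mean=3, min_disp=0.1, max_disp=10):
    """Return the genes whose mean and dispersion fall inside the cutoffs.

    `gene_stats` maps gene id -> (mean expression, dispersion); this input
    format is hypothetical.
    """
    return [
        gene
        for gene, (mean, disp) in gene_stats.items()
        if min_mean < mean < max_mean and min_disp < disp < max_disp
    ]


stats = {
    "g1": (0.5, 1.0),     # inside both windows
    "g2": (0.001, 1.0),   # mean too low
    "g3": (1.0, 0.05),    # dispersion too low
    "g4": (5.0, 2.0),     # mean too high
    "g5": (2.0, 12.0),    # dispersion too high
}
print(variable_filter_sketch(stats))
```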
Source code in utils/scanpy_wrappers.py
violin(x='total_counts', y='n_genes_by_counts', plot=True)
Runs sc.pp.calculate_qc_metrics, sc.pl.violin, and sc.pl.scatter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | label of the feature to be plotted on the x axis of the violin and scatter plots. Defaults to 'total_counts'. | 'total_counts' |
y | str | label of the feature to be plotted on the y axis of the violin and scatter plots. Defaults to 'n_genes_by_counts'. | 'n_genes_by_counts' |
Source code in utils/scanpy_wrappers.py
diagonalize_df(df)
Sorts values of a 2d dataframe along their diagonal for prettier plotting.
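The description does not spell out the ordering rule. One simple way to push large values toward the diagonal is to reorder columns by the row in which each column peaks, sketched below on a list of lists. The function name diagonalize_sketch and this particular rule are assumptions for illustration and may differ from what diagonalize_df does.

```python
def diagonalize_sketch(matrix):
    """Reorder columns so each column's maximum sits near the diagonal.

    `matrix` is a list of rows; columns are sorted by the index of the row
    holding their largest value (ties keep their original order).
    """
    n_cols = len(matrix[0])

    def peak_row(j):
        column = [row[j] for row in matrix]
        return column.index(max(column))

    order = sorted(range(n_cols), key=peak_row)
    return [[row[j] for j in order] for row in matrix]


scrambled = [
    [0, 9, 1],
    [8, 0, 2],
    [1, 1, 7],
]
for row in diagonalize_sketch(scrambled):
    print(row)
```

In a heatmap, this kind of ordering makes cluster-specific signal appear as a diagonal band rather than scattered blocks.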
Source code in utils/scanpy_wrappers.py