Documentation
The Budoids_class class
- class MidlineIdentifier.Budoids_class.Budoid(args)[source]
Bases: object
Budoid object class
- Attributes:
Methods
FindPath(**kwagrs) Identify the morphological midline of the structure.
ADProcess() Preprocessing of single cell dataset.
RMOutliers([plot]) Remove cells that fall outside the structure segmentation.
ProjectCells([alpha, plot]) Project cells onto the nearest coordinate on the morphological midline.
FindOrientation(**kwargs) Orient the coords based on the provided gene lists.
run_wrapper([save]) A wrapper function to process the dataset.
FindDEG(groupby, condition[, method, ...]) Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset.
FindSVG(coords[, sample]) Finds spatially variable genes (SVGs) for each of the identity classes in a dataset.
Concat(object_list) Merge multiple objects.
- ADProcess()[source]
Preprocessing of single cell dataset. A wrapper function of Preprocessing(). See Preprocessing() for more detail.
- FindDEG(groupby, condition, method='DESeq2', corr_method='benjamini-hochberg', **kwagrs)[source]
Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset. A wrapper function of FindDEG().
- Parameters:
- groupby
str The key of the observations grouping to consider.
- method
str (default: 'DESeq2') Method used to calculate DEGs. 'DESeq2' and 'DESeq2_pb' apply pydeseq2, the Python implementation of the DESeq2 method. 'DESeq2' calculates DEGs at the single-cell level, while 'DESeq2_pb' generates pseudobulk expression based on condition. 't-test' uses a t-test, 't-test_overestim_var' overestimates the variance of each group, 'wilcoxon' uses Wilcoxon rank-sum, and 'logreg' uses logistic regression. If method is one of ['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], this function directly calls scanpy.tl.rank_genes_groups().
- condition
str Required for the 'DESeq2_pb' method.
- kwagrs
Additional arguments to pass to scanpy.tl.rank_genes_groups()
Examples
>>> import PSUils as ps
>>> budoid = ps.io.ReadObj('testdata/Budoid_1A/Budoids.pkl')
>>> groupby = 'condition'
>>> cond = 'loc'
>>> budoid.data.adata.obs
>>> # test DESeq2_pb method
>>> budoid.FindDEG(groupby, cond, method='DESeq2_pb', groups='P', reference='D')
>>> # test wilcoxon method (condition is a required positional argument)
>>> budoid.FindDEG(groupby, cond, method='wilcoxon')
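The signature above defaults to corr_method='benjamini-hochberg', but the parameter list does not describe it. The following is a minimal sketch of the standard Benjamini-Hochberg adjustment in plain Python, for orientation only. The function name benjamini_hochberg is hypothetical and this is not the package's own code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(pvals)
```

Genes whose adjusted value falls below a chosen false discovery rate would then be reported as DEGs.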
- FindOrientation(**kwargs)[source]
Orient the coords based on the provided gene lists. A wrapper function of FindOrientation(). See FindOrientation() for more detail.
- Parameters:
- kwargs
Additional arguments to pass to FindOrientation()
Examples
The following defines the proximal (start) and distal (end) ends of the midline for an example dataset. If both the proximal score and the distal score exceed a self-defined threshold (Thre = 0.01), the dataset is considered polarized; otherwise, it is considered non-polarized.
>>> import PSUils as ps
>>> budoid = ps.io.ReadObj('testdata/Budoid_1A/Budoids.pkl')
>>> start_genes = ['Sox9','Acan','Col2a1','Col9a1','Col9a2','Col11a1']
>>> end_genes = ['Col1a1', 'Col3a1']
>>> coords = 'major_coor_scaled' # previously stored midline coordinates
>>> # the wrapper only accepts keyword arguments
>>> budoid.FindOrientation(coords=coords, start_genes=start_genes, end_genes=end_genes)
>>> adata = budoid.data.adata
>>> Thre = 0.01 # self-defined threshold
>>> max_s, max_e = adata.uns['start_score'], adata.uns['end_score']
>>> if max_s > Thre and max_e > Thre:
...     idx = adata.obs['major_coor_used'] > 0.5
...     adata.obs.loc[idx, 'loc'] = 'Proximal'
...     adata.obs.loc[~idx, 'loc'] = 'Distal' # all remaining cells
... else:
...     adata.obs['loc'] = 'Round'
- FindPath(**kwagrs)[source]
Identify the morphological midline of the structure. A wrapper function of FindPath(). See FindPath() for more detail.
- Parameters:
- kwagrs
Additional arguments to pass to FindPath()
- FindSVG(coords, sample='sample', **kwargs)[source]
Finds spatially variable genes (SVGs) for each of the identity classes in a dataset. This should be done on the sample level. A wrapper function of FindSVG().
- ProjectCells(alpha=0.01, plot=True)[source]
Project cells onto the nearest coordinate on the morphological midline. We developed a scoring scheme which takes into account the distance between coordinates and cells and the number of cells associated with the coordinates. The score of coordinate-cell pair \((i,c)\) is defined as
\[S_{ic} = D_{ic} e^{\alpha N_{i}}\]
where \(D_{ic}\) represents the Euclidean distance between coordinate \(i\) and cell \(c\), \(N_i\) is the number of cells associated with \(i\), and \(\alpha\) is the scaling factor. Each cell is then projected to the coordinate with the highest score.
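The score formula can be evaluated directly, as in the sketch below. It computes \(S_{ic}\) for one cell against every midline coordinate and stops there; it does not choose a coordinate. It assumes the counts \(N_i\) are already known. The function name projection_scores and the toy coordinates are hypothetical, and this is not the package's implementation.

```python
import math


def projection_scores(cell, coords, counts, alpha=0.01):
    """Score one cell against each coordinate: S_ic = D_ic * exp(alpha * N_i)."""
    scores = []
    for coord, n_cells in zip(coords, counts):
        distance = math.dist(cell, coord)  # D_ic, the Euclidean distance
        scores.append(distance * math.exp(alpha * n_cells))
    return scores


midline = [(0.0, 0.0), (3.0, 4.0)]   # toy midline coordinates
cells_per_coord = [0, 10]            # N_i, assumed to be known already
scores = projection_scores((0.0, 0.0), midline, cells_per_coord, alpha=0.1)
```

With alpha set to 0, the score reduces to the plain Euclidean distance.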
The adata_class class
- class MidlineIdentifier.adata_class.Adata(fad, sample, outdir)[source]
Bases: object
Adata object to wrap anndata
- Attributes:
- adata
anndata.AnnData anndata object to store the single cell data. Compatible with all scanpy functions.
- outdir
str output directory to save files
Methods
Preprocessing() Preprocessing of single cell dataset.
EnrichBins(genes, coords[, nbin, score_name]) Perform gene set enrichment in bins.
FindOrientation([coords, start_genes, ...]) Orient the coords based on the provided gene lists.
FindDEG(groupby, condition, method, **kwargs) Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset.
FindSVG(coords, sample[, layer, ...]) Finds spatially variable genes (SVGs) for each of the identity classes in a dataset.
- EnrichBins(genes, coords, nbin=4, score_name='score', **kwargs)[source]
Perform gene set enrichment in bins. This function calls scanpy.tl.score_genes(). The result will be stored in .uns[score_name].
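The binning step is not described in detail here. As a rough illustration, the sketch below averages a per-cell score inside nbin bins along a coordinate. It assumes equal-width bins and a coordinate already scaled to [0, 1]; both are assumptions, and the function name mean_score_per_bin is hypothetical.

```python
def mean_score_per_bin(coords, scores, nbin=4):
    """Average per-cell scores within equal-width bins of a [0, 1] coordinate."""
    sums = [0.0] * nbin
    sizes = [0] * nbin
    for coord, score in zip(coords, scores):
        # Clamp so that coord == 1.0 falls into the last bin.
        b = min(int(coord * nbin), nbin - 1)
        sums[b] += score
        sizes[b] += 1
    # Empty bins yield None rather than a misleading zero.
    return [s / n if n else None for s, n in zip(sums, sizes)]


coords = [0.05, 0.20, 0.30, 0.60, 0.90, 1.00]
scores = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
binned = mean_score_per_bin(coords, scores, nbin=4)
```

In the package, the per-cell score would come from scanpy.tl.score_genes() rather than from a hand-written list.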
- FindDEG(groupby, condition, method, **kwargs)[source]
Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset.
- Parameters:
- adata
anndata.AnnData Annotated data matrix.
- groupby
str The key of the observations grouping to consider.
- method
str (default: 'DESeq2') Method used to calculate DEGs. 'DESeq2' and 'DESeq2_pb' use pydeseq2, the Python implementation of the DESeq2 method. 'DESeq2' calculates DEGs at the single-cell level, while 'DESeq2_pb' generates pseudobulk expression based on condition. 't-test' uses a t-test, 't-test_overestim_var' overestimates the variance of each group, 'wilcoxon' uses Wilcoxon rank-sum, and 'logreg' uses logistic regression. If method is one of ['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], this function directly calls scanpy.tl.rank_genes_groups().
- kwargs
Additional arguments to pass to scanpy.tl.rank_genes_groups()
- Returns:
- FindOrientation(coords='major_coor_scaled', start_genes=['Sox9', 'Acan', 'Col2a1', 'Col9a1', 'Col9a2', 'Col11a1'], end_genes=['Col1a1', 'Col3a1'], plot=True, **kwargs)[source]
Orient the coords based on the provided gene lists. This allows cross-dataset/structure comparisons. The results will be stored in .uns['start_score'] and .uns['end_score'].
- Parameters:
- coords
str The key of the observations to consider. Must be one of the .obs.columns
- start_genes
list The list of gene names used to calculate the start score
- end_genes
list The list of gene names used to calculate the end score
- plot
bool (default: True) If True, save the plot to 'Orientation_score.pdf' in the output directory (.outdir)
- kwargs
Additional arguments to pass to EnrichBins()
- FindSVG(coords, sample, layer='counts', min_exp_gene=0, min_exp_cell=0)[source]
Finds spatially variable genes (SVGs) for each of the identity classes in a dataset. This function incorporates SpatialDE(). Raw counts should be used.
- Parameters:
- sample
str (default: 'sample') Sample identifier. Must be one of the .obs.columns
- coords
str Spatial coordinates for each cell. Can be one of the .obs.columns or a pandas.DataFrame with rows as cells and columns as spatial dimensions.
- layer
str (default: 'counts') Key in adata.layers whose values will be used. If None, adata.layers['counts'] will be used.
- min_exp_gene
int (default: 0) Filter out genes whose expression is lower than this
- min_exp_cell
int (default: 0) Filter out cells whose total expression is lower than this
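The two thresholds can be pictured as a filter on a cells-by-genes count matrix, as sketched below. The sketch assumes that gene expression means the total count across cells and that genes are filtered before cells. The documentation confirms neither, and the function name filter_counts is hypothetical.

```python
def filter_counts(counts, min_exp_gene=0, min_exp_cell=0):
    """Drop low-count genes (columns), then low-count cells (rows).

    `counts` is a cells x genes list of lists. Returns the filtered matrix
    together with the kept cell and gene indices.
    """
    n_genes = len(counts[0]) if counts else 0
    gene_totals = [sum(row[g] for row in counts) for g in range(n_genes)]
    keep_genes = [g for g, total in enumerate(gene_totals) if total >= min_exp_gene]
    keep_cells = [
        c for c, row in enumerate(counts)
        if sum(row[g] for g in keep_genes) >= min_exp_cell
    ]
    filtered = [[counts[c][g] for g in keep_genes] for c in keep_cells]
    return filtered, keep_cells, keep_genes


counts = [
    [5, 0, 1],
    [3, 0, 0],
    [0, 1, 0],
]
filtered, cells, genes = filter_counts(counts, min_exp_gene=2, min_exp_cell=2)
```

With both thresholds left at their default of 0, nothing is removed.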
- PolarizationScoring(genes, norm=False, coords='major_coor_used', bootstrapping=False, n_bs=1000, random_state=1234)[source]
Calculate the polarization score for genes.
- Preprocessing()[source]
Preprocessing of single cell dataset. This function calls scanpy.pp.normalize_total() and scanpy.pp.log1p().
Raw data, normalized data and log data will be stored into .layers['counts'], .layers["norm_counts"] and .layers["lognorm_counts"] respectively.
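To make the two steps concrete, the sketch below reproduces their arithmetic on a small count matrix without scanpy. It assumes the scanpy defaults, that is, scaling each cell to the median total count followed by a natural-log log1p transform. Whether Preprocessing() passes other arguments is not stated here, and the function name normalize_and_log is hypothetical.

```python
import math
from statistics import median


def normalize_and_log(counts, target_sum=None):
    """Return (norm_counts, lognorm_counts) for a cells x genes count matrix."""
    totals = [sum(row) for row in counts]
    if target_sum is None:
        # Assumed default: the median total count over cells with any counts.
        target_sum = median([t for t in totals if t > 0])
    norm = [
        [value * target_sum / total if total else 0.0 for value in row]
        for row, total in zip(counts, totals)
    ]
    lognorm = [[math.log1p(value) for value in row] for row in norm]
    return norm, lognorm


counts = [[1, 3], [10, 10], [0, 8]]
norm_counts, lognorm_counts = normalize_and_log(counts)
```

After normalization every cell in this toy matrix sums to 8, the median of the original totals 4, 20 and 8.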
The image_class class
- class MidlineIdentifier.image_class.Img(fimg, dc, do, outdir)[source]
Bases: object
Image object
- Attributes:
- img
np.ndarray Original image
- imgb
np.ndarray Binary image
- dc
int Disk size for image closing
- do
int Disk size for image opening
- paddings
numpy.ndarray Records whether padding has been performed for each of the four borders of the image.
- outdir
str output directory to save files
- ar
float The aspect ratio of the major structure
- measure
pandas.DataFrame Measurements of all image regions (see Image_measurement())
- seg
numpy.ndarray The segmentation of the major structure
- ridge
numpy.ndarray The segmentation of the major structure
- path
numpy.ndarray The morphological midline of the major structure in the image
- start
tuple The start of the morphological midline
- end
tuple The end of the morphological midline
Methods
Padding([tolerance, num_pads]) Image padding.
Get a closed segmentation of the major structure
Image_measurement([plot]) Measure properties of all image regions.
Remove small regions (noise) by setting the corresponding pixels to false.
Return the start and end of the morphological midline.
GetAspectRatio() Return the aspect ratio of the structure.
FindRidge(**kwargs) Filter the Euclidean distance transform of the image with the Meijering neuriteness filter.
FindPath([plot]) Identify the morphological midline of the major structure in the image. The midline will be stored in .path.
- FindPath(plot=True)[source]
Identify the morphological midline of the major structure in the image. The midline will be stored in .path.
- Parameters:
- plot
bool (default: True) If True, save the plot to 'Major_axis_sk_on_ridge.pdf' in the output directory (.outdir)
- kwargs
Additional arguments to pass to skimage.filters.meijering()
- FindRidge(**kwargs)[source]
Filter the Euclidean distance transform of the image with the Meijering neuriteness filter. This function calls scipy.ndimage.distance_transform_edt() and skimage.filters.meijering().
- Parameters:
- kwargs
Additional arguments to pass to skimage.filters.meijering()
- GetAspectRatio()[source]
Return the aspect ratio of the structure.
- Returns:
- AspectRatio
float Aspect ratio of the major structure
- Image_measurement(plot=True)[source]
Measure properties of all image regions. Measurements will be stored in .measure.
- Parameters:
- plot
bool (default: True) If True, save the plot to 'region_measure.pdf' in the output directory (.outdir)
- Padding(tolerance=50, num_pads=100)[source]
Image padding. Padding is performed if the target structure is located too close to the image borders. This allows better structure segmentation.
If padding is performed, the paddings attribute will be modified accordingly.
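One way to picture the padding rule is sketched below on a small binary mask. A border is padded with num_pads background pixels when foreground lies within tolerance pixels of it. The flag order (top, bottom, left, right), the strict comparison, and the function name pad_if_close are assumptions, not the package's documented behavior.

```python
def pad_if_close(mask, tolerance=50, num_pads=100):
    """Pad a binary mask on each side where foreground is within `tolerance` pixels.

    Returns the padded mask and four flags ordered (top, bottom, left, right).
    """
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows:
        # No foreground at all: nothing to pad.
        return [list(row) for row in mask], (False, False, False, False)

    top = rows[0] < tolerance
    bottom = height - 1 - rows[-1] < tolerance
    left = cols[0] < tolerance
    right = width - 1 - cols[-1] < tolerance

    # Pad columns first, then add full-width background rows.
    padded = [
        [0] * (num_pads * left) + list(row) + [0] * (num_pads * right)
        for row in mask
    ]
    new_width = width + num_pads * (left + right)
    padded = (
        [[0] * new_width for _ in range(num_pads * top)]
        + padded
        + [[0] * new_width for _ in range(num_pads * bottom)]
    )
    return padded, (top, bottom, left, right)


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
padded, flags = pad_if_close(mask, tolerance=2, num_pads=3)
```

Here only the left border is padded, because the structure touches column 0 while staying at least two pixels away from the other borders.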
The MidlineIdentifier.io module
The MidlineIdentifier.plotting module
- MidlineIdentifier.plotting.trend_plot(budoid, features, groupby, coords='major_coor_used', save=False, **kwargs)[source]
Makes a trend plot of the expression values of features as a function of coords.
For each var_name and each groupby category a dot is plotted. Each dot represents two values: mean expression within each category (visualized by color) and fraction of cells expressing the var_name in the category (visualized by the size of the dot). If groupby is not given, the dotplot assumes that all data belongs to a single category.
This function uses seaborn.lmplot(). If you need more flexibility, you should use seaborn.lmplot() directly.
- Parameters:
- features
str | list Column name in .var DataFrame that stores gene symbols. By default var_names refer to the index column of the .var DataFrame.
- groupby
str The key of the observation grouping to consider. Must be one of obs.columns
- coords
str (default: 'major_coor_used') The coordinate against which gene expression is plotted. Must be one of obs.columns.
- save
bool (default: False) If True or a str, save the figure. A string is appended to the default filename. Infer the filetype if ending on {'.pdf', '.png', '.svg'}.
- kwargs
Additional arguments to pass to seaborn.lmplot()
- Returns:
seaborn.lmplot() object.
Examples
Create a trend plot using the given markers using an example dataset grouped by the category ‘batch’.
import PSUils as ps
import scanpy as sc
budoid1 = ps.io.ReadObj('testdata/Budoid_1A/Budoids.pkl')
budoid2 = ps.io.ReadObj('testdata/Budoid_3H/Budoids.pkl')
budoid1.Concat(budoid2)
markers = ['Col9a2','Col3a1']
sc.pl.dotplot(budoid1, markers, groupby='batch')
The MidlineIdentifier.utilis module
- MidlineIdentifier.utilis.EuclideanDist(pts, pt)[source]
Calculate the Euclidean distance between one point and other point(s).
- Parameters:
- pts
array_like
- pt
array_like of size one
- Returns:
- dist
float or numpy.ndarray Euclidean distance
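The behavior described above (a single distance for one point, one distance per point for several) can be sketched in plain Python as follows. The function name euclidean_dist is hypothetical, and the sketch returns a list where the package returns a numpy.ndarray.

```python
import math


def euclidean_dist(pts, pt):
    """Distance from `pt` to one point (float) or to several points (list)."""
    # A single point is a flat sequence of numbers; several points are nested.
    if pts and not isinstance(pts[0], (list, tuple)):
        return math.dist(pts, pt)
    return [math.dist(p, pt) for p in pts]


single = euclidean_dist((3.0, 4.0), (0.0, 0.0))
many = euclidean_dist([(3.0, 4.0), (6.0, 8.0)], (0.0, 0.0))
```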
- MidlineIdentifier.utilis.ParseArgs(args)[source]
Parse arguments from the commandline.
- Parameters:
- args
list List of arguments
- Returns:
- fad, fimg, args.diskClosing, args.diskOpening, outdir, sample
- MidlineIdentifier.utilis.ScaleMinMax(x)[source]
Scale the input vector into the range between zero and one.
- Parameters:
- x
array_like
- Returns:
array_like Scaled x
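Min-max scaling maps the smallest value to zero and the largest to one. A minimal sketch is given below; returning zeros for a constant vector is an assumption made here to avoid a division by zero, and the function name scale_min_max is hypothetical.

```python
def scale_min_max(x):
    """Linearly rescale the values of `x` so that min -> 0 and max -> 1."""
    lo, hi = min(x), max(x)
    if hi == lo:
        # A constant vector has no range; return zeros instead of dividing by zero.
        return [0.0 for _ in x]
    return [(value - lo) / (hi - lo) for value in x]


scaled = scale_min_max([2.0, 4.0, 6.0, 10.0])
```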
- MidlineIdentifier.utilis.grouped_obs(adata, groupby, method, layer=None, gene_symbols=None)[source]
Aggregate expression (sum or mean) by condition.
- Parameters:
- adata
anndata.AnnData Annotated data matrix.
- groupby
str The key of the observations grouping to consider.
- method
str Method used to aggregate the expression. Must be one of ['sum', 'mean']
- layer
str (default: None) Key in adata.layers whose values will be used. If None, adata.X will be used.
- gene_symbols
list | None (default: None) Genes to aggregate. If None, the calculation is done for all genes
- Returns:
pandas.DataFrame A gene-by-group dataframe
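The aggregation itself is a per-group sum or mean over cells. The sketch below shows it on a small cells-by-genes matrix, returning a dictionary keyed by group instead of a pandas.DataFrame. The function name aggregate_by_group and the toy labels are hypothetical.

```python
from collections import defaultdict


def aggregate_by_group(counts, groups, method="mean"):
    """Aggregate a cells x genes matrix per group label.

    Returns {group: [one value per gene]}, i.e. a gene-by-group table
    stored column-wise.
    """
    if method not in ("sum", "mean"):
        raise ValueError("method must be 'sum' or 'mean'")
    sums = {}
    sizes = defaultdict(int)
    for row, group in zip(counts, groups):
        if group not in sums:
            sums[group] = [0.0] * len(row)
        sums[group] = [total + value for total, value in zip(sums[group], row)]
        sizes[group] += 1
    if method == "sum":
        return sums
    return {g: [total / sizes[g] for total in totals] for g, totals in sums.items()}


counts = [[1, 0], [3, 2], [4, 4]]
groups = ["P", "P", "D"]
pseudobulk = aggregate_by_group(counts, groups, method="sum")
averages = aggregate_by_group(counts, groups, method="mean")
```

The 'sum' variant corresponds to the pseudobulk expression mentioned for the 'DESeq2_pb' method.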