Documentation

The Budoids_class class

class MidlineIdentifier.Budoids_class.Budoid(args)

Bases: object

Budoid object class

Attributes:
img : Img

Image object

data : Adata

Single cell object

sample : str

Sample identifier. Will be stored in adata.obs['sample']. Useful when concatenating multiple Adata objects.

outdir : str

Output directory where files will be saved

Methods

FindPath(**kwagrs)

Identify the morphological midline of the structure.

ADProcess()

Preprocessing of single cell dataset.

RMOutliers([plot])

Remove cells that fall out of the structure segmentation.

ProjectCells([alpha, plot])

Project cells onto the nearest coordinate on the morphological midline.

FindOrientation(**kwargs)

Orient the coords based on the provided gene lists.

run_wrapper([save])

A wrapper function to process the dataset.

FindDEG(groupby, condition[, method, ...])

Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset.

FindSVG(coords[, sample])

Finds spatially variable genes (SVGs) for each of the identity classes in a dataset.

Concat(object_list)

Merge multiple objects.

ADProcess()

Preprocessing of single cell dataset. A wrapper function of Preprocessing(). See Preprocessing() for more detail.

Concat(object_list)

Merge multiple objects.

Parameters:
object_list : list of Budoid

A list of Budoid objects to merge

FindDEG(groupby, condition, method='DESeq2', corr_method='benjamini-hochberg', **kwagrs)

Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset. A wrapper function of FindDEG()

Parameters:
groupby : str

The key of the observations grouping to consider.

method : str (default: 'DESeq2')

Method used to calculate DEGs. 'DESeq2' and 'DESeq2_pb' use pydeseq2, the Python implementation of the DESeq2 method. 'DESeq2' calculates DEGs at the single-cell level, while 'DESeq2_pb' generates pseudobulk expression based on condition. 't-test' uses a t-test, 't-test_overestim_var' uses a t-test that overestimates the variance of each group, 'wilcoxon' uses the Wilcoxon rank-sum test, and 'logreg' uses logistic regression.

If method is one of ['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], this function directly calls scanpy.tl.rank_genes_groups().

condition : str

Required for the 'DESeq2_pb' method.

kwagrs

Additional arguments to pass to scanpy.tl.rank_genes_groups()

Examples

>>> import PSUils as ps
>>> budoid = ps.io.ReadObj('testdata/Budoid_1A/Budoids.pkl')

>>> groupby = 'condition'
>>> cond = 'loc'
>>> budoid.data.adata.obs

>>> # run the DESeq2_pb method
>>> budoid.FindDEG(groupby, cond, method='DESeq2_pb', groups='P', reference='D')

>>> # run the wilcoxon method (condition is a required positional argument)
>>> budoid.FindDEG(groupby, cond, method='wilcoxon')

FindOrientation(**kwargs)

Orient the coords based on the provided gene lists. A wrapper function of FindOrientation(). See FindOrientation() for more detail.

Parameters:
kwargs

Additional arguments to pass to FindOrientation()

Examples

Define the proximal (start) and distal (end) ends of the midline in an example dataset. If both the proximal score and the distal score of the dataset are greater than a user-defined threshold (Thre = 0.01), the dataset is considered polarized; otherwise, it is considered non-polarized.

>>> import PSUils as ps
>>> budoid = ps.io.ReadObj('testdata/Budoid_1A/Budoids.pkl')

>>> start_genes = ['Sox9','Acan','Col2a1','Col9a1','Col9a2','Col11a1']
>>> end_genes = ['Col1a1', 'Col3a1']
>>> coords = 'major_coor_scaled'  # previously stored midline coordinates

>>> budoid.FindOrientation(coords=coords, start_genes=start_genes, end_genes=end_genes)

>>> adata = budoid.data.adata
>>> Thre = 0.01 # self-defined threshold
>>> max_s, max_e = adata.uns['start_score'], adata.uns['end_score']

>>> if max_s > Thre and max_e > Thre:
...     idx = adata.obs['major_coor_used'] > 0.5
...     adata.obs.loc[idx, 'loc'] = 'Proximal'
...     adata.obs.loc[~idx, 'loc'] = 'Distal'
... else:
...     adata.obs['loc'] = 'Round'

FindPath(**kwagrs)

Identify the morphological midline of the structure. A wrapper function of FindPath(). See FindPath() for more detail.

Parameters:
kwagrs

Additional arguments to pass to FindPath()

FindSVG(coords, sample='sample', **kwargs)

Finds spatially variable genes (SVGs) for each of the identity classes in a dataset. This should be done at the sample level. A wrapper function of FindSVG().

Parameters:
sample : str (default: 'sample')

Sample identifier. Must be one of .obs.columns.

kwargs

Additional arguments to pass to FindSVG()

ProjectCells(alpha=0.01, plot=True)

Project cells onto the nearest coordinate on the morphological midline. We developed a scoring scheme that takes into account both the distance between coordinates and cells and the number of cells associated with each coordinate. The score of the coordinate-cell pair \((i,c)\) is defined as

\[S_{ic} = D_{ic}\, e^{\alpha N_{i}}\]

where \(D_{ic}\) is the Euclidean distance between coordinate \(i\) and cell \(c\), \(N_i\) is the number of cells associated with \(i\), and \(\alpha\) is the scaling factor. Each cell is then projected to the coordinate with the lowest score, so that coordinates already associated with many cells are penalized.

Parameters:
alpha : float (default: 0.01)

The scaling factor \(\alpha\), which controls the level of penalty.

plot : bool (default: True)

If True, save the plot into 'Cells_remove.pdf' in the output directory (.outdir)
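A minimal NumPy sketch of the ProjectCells() scoring scheme follows. It assumes 2-D coordinates and that each cell goes to the midline coordinate that minimizes \(S_{ic}\), which is what projecting onto the nearest, least crowded coordinate implies. How the package counts \(N_i\) is not stated here, so the sketch assumes \(N_i\) is the number of cells whose nearest coordinate is \(i\); the function name project_cells is hypothetical.

```python
import numpy as np

def project_cells(midline, cells, alpha=0.01):
    """Assign each cell to a midline coordinate via S_ic = D_ic * exp(alpha * N_i).

    Assumption: N_i is the number of cells whose nearest coordinate is i.
    """
    midline = np.asarray(midline, dtype=float)  # (n_coords, 2)
    cells = np.asarray(cells, dtype=float)      # (n_cells, 2)
    # D[i, c]: Euclidean distance between midline coordinate i and cell c
    D = np.linalg.norm(midline[:, None, :] - cells[None, :, :], axis=-1)
    # N[i]: how many cells have coordinate i as their nearest coordinate
    N = np.bincount(D.argmin(axis=0), minlength=len(midline))
    # Penalized score: crowded coordinates (large N_i) get inflated scores
    S = D * np.exp(alpha * N)[:, None]
    # Each cell goes to the coordinate with the lowest penalized score
    return S.argmin(axis=0)
```

With alpha=0 this reduces to plain nearest-coordinate projection; larger alpha values push cells away from coordinates that already attract many cells.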

RMOutliers(plot=True)

Remove cells that fall outside the structure segmentation.

Parameters:
plot : bool (default: True)

If True, save the plot into 'Cells_remove.pdf' in the output directory (.outdir)

run_wrapper(save=True, **kwagrs)

A wrapper function to process the dataset.

Parameters:
save : bool (default: True)

If True, save the processed data into a pickle file.

kwagrs

Additional arguments to pass to SaveObj()

The adata_class class

class MidlineIdentifier.adata_class.Adata(fad, sample, outdir)

Bases: object

Adata object that wraps an anndata.AnnData object

Attributes:
adata : anndata.AnnData

AnnData object storing the single-cell data. Compatible with scanpy functions.

outdir : str

Output directory where files will be saved

Methods

Preprocessing()

Preprocessing of single cell dataset.

EnrichBins(genes, coords[, nbin, score_name])

Perform gene set enrichment in bins.

FindOrientation([coords, start_genes, ...])

Orient the coords based on the provided gene lists.

FindDEG(groupby, condition, method, **kwargs)

Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset.

FindSVG(coords, sample[, layer, ...])

Finds spatially variable genes (SVGs) for each of the identity classes in a dataset.

EnrichBins(genes, coords, nbin=4, score_name='score', **kwargs)

Perform gene set enrichment in bins. This function calls scanpy.tl.score_genes(). The result is stored in .uns[score_name].

Parameters:
genes : list | str

The list of gene names used for score calculation

coords : str

The key of the observations to consider. Must be one of .obs.columns.

nbin : int (default: 4)

The number of bins

score_name : str (default: 'score')

Name of the field to be added in .uns

kwargs

Additional arguments to pass to scanpy.tl.score_genes()
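The binned scoring can be sketched with pandas. This is a simplified stand-in, not the package's code: the per-cell score here is the plain mean expression of genes, whereas scanpy.tl.score_genes() additionally subtracts the mean of a randomly sampled reference gene set, and the sketch assumes equal-width bins along coords. The function name enrich_bins and its inputs (a cells-by-genes DataFrame and a per-cell Series sharing the same index) are hypothetical.

```python
import pandas as pd

def enrich_bins(expr, coords, genes, nbin=4):
    """Mean gene-set score per equal-width bin of coords.

    expr: cells x genes DataFrame; coords: per-cell Series with the same index.
    The per-cell score is the plain mean of the selected genes, a simplified
    stand-in for scanpy.tl.score_genes() (which also subtracts a reference set).
    """
    if isinstance(genes, str):
        genes = [genes]
    cell_score = expr[genes].mean(axis=1)
    bins = pd.cut(coords, bins=nbin, labels=False)  # integer bin index per cell
    return cell_score.groupby(bins).mean()
```

Bins that contain no cells simply do not appear in the returned Series.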

FindDEG(groupby, condition, method, **kwargs)

Finds differentially expressed genes (DEGs) for each of the identity classes in a dataset.

Parameters:
groupby : str

The key of the observations grouping to consider.

condition : str

Required for the 'DESeq2_pb' method.

method : str

Method used to calculate DEGs. 'DESeq2' and 'DESeq2_pb' use pydeseq2, the Python implementation of the DESeq2 method. 'DESeq2' calculates DEGs at the single-cell level, while 'DESeq2_pb' generates pseudobulk expression based on condition. 't-test' uses a t-test, 't-test_overestim_var' uses a t-test that overestimates the variance of each group, 'wilcoxon' uses the Wilcoxon rank-sum test, and 'logreg' uses logistic regression.

If method is one of ['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], this function directly calls scanpy.tl.rank_genes_groups().

kwargs

Additional arguments to pass to scanpy.tl.rank_genes_groups()

Returns:
pandas.DataFrame

FindOrientation(coords='major_coor_scaled', start_genes=['Sox9', 'Acan', 'Col2a1', 'Col9a1', 'Col9a2', 'Col11a1'], end_genes=['Col1a1', 'Col3a1'], plot=True, **kwargs)

Orient the coords based on the provided gene lists. This allows cross-dataset and cross-structure comparisons. The results are stored in .uns['start_score'] and .uns['end_score'].

Parameters:
coords : str (default: 'major_coor_scaled')

The key of the observations to consider. Must be one of .obs.columns.

start_genes : list

The list of gene names used to score the start of the midline

end_genes : list

The list of gene names used to score the end of the midline

plot : bool (default: True)

If True, save the plot into 'Orientation_score.pdf' in the output directory (.outdir)

kwargs

Additional arguments to pass to EnrichBins()

FindSVG(coords, sample, layer='counts', min_exp_gene=0, min_exp_cell=0)

Finds spatially variable genes (SVGs) for each of the identity classes in a dataset. This function incorporates SpatialDE. Raw counts should be used.

Parameters:
coords : str | pandas.DataFrame

Spatial coordinates for each cell. Can be one of .obs.columns or a pandas.DataFrame with rows as cells and columns as spatial dimensions.

sample : str

Sample identifier. Must be one of .obs.columns.

layer : str (default: 'counts')

Key of adata.layers whose values will be used. If None, adata.layers['counts'] will be used.

min_exp_gene : int (default: 0)

Genes whose expression is lower than this value are filtered out.

min_exp_cell : int (default: 0)

Cells whose total expression is lower than this value are filtered out.
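The min_exp_gene and min_exp_cell filters can be sketched on a raw count matrix as below. The exact rule is not spelled out in this documentation, so the sketch assumes that a gene's expression is its total count over all cells, that a cell's expression is its total count over all genes, that both masks are computed on the unfiltered matrix, and that values equal to a threshold are kept. filter_counts is a hypothetical name.

```python
import numpy as np

def filter_counts(counts, min_exp_gene=0, min_exp_cell=0):
    """Drop low-expression genes and cells from a cells x genes count matrix."""
    counts = np.asarray(counts)
    keep_genes = counts.sum(axis=0) >= min_exp_gene  # total count per gene
    keep_cells = counts.sum(axis=1) >= min_exp_cell  # total count per cell
    return counts[np.ix_(keep_cells, keep_genes)], keep_genes, keep_cells
```

With the documented defaults of 0, nothing is removed, because raw counts are never negative.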

PolarizationScoring(genes, norm=False, coords='major_coor_used', bootstrapping=False, n_bs=1000, random_state=1234)

Calculate the polarization score for genes.

Parameters:
genes : list | str

Genes for which to calculate the polarization score

bootstrapping : bool (default: False)

Whether to perform bootstrapping

n_bs : int (default: 1000)

The number of bootstrap iterations to perform

random_state : int (default: 1234)

Random seed for reproducibility

Preprocessing()

Preprocessing of single cell dataset. This function calls scanpy.pp.normalize_total() and scanpy.pp.log1p().

Raw data, normalized data and log-normalized data are stored in .layers['counts'], .layers['norm_counts'] and .layers['lognorm_counts'], respectively.
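For reference, the two scanpy steps can be reproduced with NumPy on a dense cells-by-genes matrix. This sketch assumes the package keeps scanpy's default target_sum=None, under which each cell is rescaled to the median total count of the cells before normalization; the layer names follow the text above, but the preprocess function itself is hypothetical and does not touch an AnnData object.

```python
import numpy as np

def preprocess(counts, target_sum=None):
    """Return the three layers produced by normalize_total() + log1p()."""
    counts = np.asarray(counts, dtype=float)  # cells x genes
    totals = counts.sum(axis=1)
    if target_sum is None:
        # scanpy default: median total count over cells with non-zero counts
        target_sum = np.median(totals[totals > 0])
    safe_totals = np.where(totals > 0, totals, 1.0)  # empty cells stay all-zero
    norm = counts / safe_totals[:, None] * target_sum
    return {
        "counts": counts,
        "norm_counts": norm,
        "lognorm_counts": np.log1p(norm),  # natural log of (1 + x)
    }
```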

The image_class class

class MidlineIdentifier.image_class.Img(fimg, dc, do, outdir)

Bases: object

Image object

Attributes:
img : np.ndarray

Original image

imgb : np.ndarray

Binary image

dc : int

Disk size for image closing

do : int

Disk size for image opening

paddings : numpy.ndarray

Records whether padding has been applied to each of the four borders of the image

outdir : str

Output directory where files will be saved

ar : float

The aspect ratio of the major structure

measure : pandas.DataFrame

Measured properties of the image regions (see Image_measurement())

seg : numpy.ndarray

The segmentation of the major structure

ridge : numpy.ndarray

The ridge map: the Euclidean distance transform of the image filtered with the Meijering neuriteness filter (see FindRidge())

path : numpy.ndarray

The morphological midline of the major structure in the image

start : tuple

The start of the morphological midline

end : tuple

The end of the morphological midline

Methods

Padding([tolerance, num_pads])

Image padding.

Segmentation()

Get a closed segmentation of the major structure

Image_measurement([plot])

Measure properties of all image regions.

RMSmallRegion()

Remove small regions (noise) by setting the corresponding pixels to false.

GetStartEnd()

Return the start and end of the morphological midline.

GetAspectRatio()

Return the aspect ratio of the structure.

FindRidge(**kwargs)

Filter the Euclidean distance transform of the image with the Meijering neuriteness filter.

FindPath([plot])

Identify the morphological midline of the major structure in the image. The midline is stored in .path.

FindPath(plot=True)

Identify the morphological midline of the major structure in the image. The midline is stored in .path.

Parameters:
plot : bool (default: True)

If True, save the plot into 'Major_axis_sk_on_ridge.pdf' in the output directory (.outdir)

kwargs

Additional arguments to pass to skimage.filters.meijering()

FindRidge(**kwargs)

Filter the Euclidean distance transform of the image with the Meijering neuriteness filter. This function calls scipy.ndimage.distance_transform_edt() and skimage.filters.meijering().

Parameters:
kwargs

Additional arguments to pass to skimage.filters.meijering()

GetAspectRatio()

Return the aspect ratio of the structure.

Returns:
AspectRatio : float

Aspect ratio of the major structure

GetStartEnd()

Return the start and end of the morphological midline.

Returns:
start : tuple of (int, int)

Start pixel coordinates of the morphological midline

end : tuple of (int, int)

End pixel coordinates of the morphological midline

Image_measurement(plot=True)

Measure properties of all image regions. Measurements are stored in .measure.

Parameters:
plot : bool (default: True)

If True, save the plot into 'region_measure.pdf' in the output directory (.outdir)

Padding(tolerance=50, num_pads=100)

Image padding. Padding is performed if the target structure lies too close to the image borders, which allows better structure segmentation.

If padding is performed, the paddings attribute is updated accordingly.

Parameters:
tolerance : int (default: 50)

The number of pixels to tolerate between the structure and an image border before padding is applied.

num_pads : int (default: 100)

The number of pixels to pad
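The padding rule can be sketched on a binary image with numpy.pad. The sketch assumes that a border is padded when the foreground comes within tolerance pixels of it, that padding uses background (False) pixels, and that paddings records the borders in the order top, bottom, left, right; pad_if_close is a hypothetical name, not the package method.

```python
import numpy as np

def pad_if_close(imgb, tolerance=50, num_pads=100):
    """Pad the borders of a binary image that the foreground comes too close to.

    Hypothetical sketch: returns the padded image and a boolean array that
    records which borders were padded, in the assumed order
    (top, bottom, left, right).
    """
    imgb = np.asarray(imgb, dtype=bool)
    rows = np.flatnonzero(imgb.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(imgb.any(axis=0))  # columns containing foreground
    if rows.size == 0:  # empty image: nothing to pad
        return imgb, np.zeros(4, dtype=bool)
    h, w = imgb.shape
    paddings = np.array([
        rows[0] < tolerance,           # foreground near the top border
        h - 1 - rows[-1] < tolerance,  # ... near the bottom border
        cols[0] < tolerance,           # ... near the left border
        w - 1 - cols[-1] < tolerance,  # ... near the right border
    ])
    top, bottom, left, right = (num_pads if p else 0 for p in paddings)
    padded = np.pad(imgb, ((top, bottom), (left, right)),
                    mode="constant", constant_values=False)
    return padded, paddings
```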

RMSmallRegion()

Remove small regions (noise) by setting the corresponding pixels to False. Only the largest segment is kept for structure segmentation.
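Keeping only the largest segment amounts to labeling connected components and discarding all but the biggest one. The sketch below uses scipy.ndimage.label, which by default treats pixels as connected through their four edge neighbours; the package may use a different labeling routine or connectivity, and keep_largest_region is a hypothetical name.

```python
import numpy as np
from scipy import ndimage

def keep_largest_region(imgb):
    """Return a mask containing only the largest connected foreground region."""
    labels, n = ndimage.label(imgb)  # 0 = background, 1..n = regions
    if n == 0:
        return np.zeros(np.shape(imgb), dtype=bool)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # never select the background
    return labels == sizes.argmax()
```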

Segmentation()

Get a closed segmentation of the major structure
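Given the disk sizes dc and do listed among the Img attributes, one plausible way to obtain a closed segmentation is a morphological closing followed by an opening and hole filling. This is a sketch under those assumptions, not necessarily the package's exact sequence of operations; closed_segmentation and disk are hypothetical helpers, with disk mirroring the shape produced by skimage.morphology.disk.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Disk-shaped structuring element (same shape as skimage.morphology.disk)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def closed_segmentation(imgb, dc, do):
    """Hypothetical closing -> opening -> hole-filling pipeline on a binary image."""
    imgb = np.asarray(imgb, dtype=bool)
    seg = ndimage.binary_closing(imgb, structure=disk(dc))  # bridge small gaps and holes
    seg = ndimage.binary_opening(seg, structure=disk(do))   # remove specks and thin spurs
    return ndimage.binary_fill_holes(seg)                   # fill remaining interior holes
```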

The MidlineIdentifier.io module

MidlineIdentifier.io.ReadObj(filename)

Read a .pkl-formatted pickle file.

Parameters:
filename : str

Name of the data file.

MidlineIdentifier.io.SaveObj(obj, filename=None)

Save an object into a .pkl-formatted pickle file.

Parameters:
obj : Budoid

Budoid object

filename : str (default: None)

Name of the data file.
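Assuming ReadObj() and SaveObj() are thin wrappers around Python's standard pickle module (the .pkl format suggests this, but it is not stated), a save/load round trip looks like the sketch below; save_obj and read_obj are hypothetical stand-ins. As with any pickle file, only load files from trusted sources.

```python
import pickle

def save_obj(obj, filename):
    """Serialize obj to filename with pickle (hypothetical stand-in for SaveObj)."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)

def read_obj(filename):
    """Load a pickled object from filename (hypothetical stand-in for ReadObj)."""
    with open(filename, "rb") as f:
        return pickle.load(f)
```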

The MidlineIdentifier.plotting module

MidlineIdentifier.plotting.trend_plot(budoid, features, groupby, coords='major_coor_used', save=False, **kwargs)

Makes a trend plot of the expression values of features as a function of coords.

For each feature, expression is plotted against coords for each groupby category, and a trend is fitted to each category.

This function uses seaborn.lmplot(). If you need more flexibility, use seaborn.lmplot() directly.

Parameters:
budoid : Budoid

Budoid object containing the data to plot

features : str | list

Gene name or list of gene names to plot

groupby : str

The key of the observation grouping to consider. Must be one of .obs.columns.

coords : str (default: 'major_coor_used')

The key of the coordinate against which gene expression is plotted. Must be one of .obs.columns.

save : bool | str (default: False)

If True or a str, save the figure. A string is appended to the default filename. The file type is inferred if the string ends with one of {'.pdf', '.png', '.svg'}.

kwargs

Additional arguments to pass to seaborn.lmplot()

Returns:
seaborn.FacetGrid

The object returned by seaborn.lmplot().

Examples

Create a trend plot of the given markers for an example dataset grouped by the category 'batch'.

import PSUils as ps

budoid1 = ps.io.ReadObj('testdata/Budoid_1A/Budoids.pkl')
budoid2 = ps.io.ReadObj('testdata/Budoid_3H/Budoids.pkl')
budoid1.Concat([budoid2])

markers = ['Col9a2', 'Col3a1']
ps.plotting.trend_plot(budoid1, markers, groupby='batch')

The MidlineIdentifier.utilis module

MidlineIdentifier.utilis.EuclideanDist(pts, pt)

Calculate the Euclidean distance between one point and other point(s).

Parameters:
pts : array_like

pt : array_like of size one

Returns:
dist : float or numpy.ndarray

Euclidean distance
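A NumPy sketch of the same computation relies on broadcasting, so pts can be either a single point or an array of points; euclidean_dist is a hypothetical name, not the package function.

```python
import numpy as np

def euclidean_dist(pts, pt):
    """Distance from the single point pt to each point in pts."""
    pts = np.asarray(pts, dtype=float)
    pt = np.asarray(pt, dtype=float)
    # Broadcasting subtracts pt from every row of pts (or from pts itself if it is one point)
    return np.sqrt(((pts - pt) ** 2).sum(axis=-1))
```

Passing one point returns a scalar; passing an array of points returns one distance per row.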

MidlineIdentifier.utilis.ParseArgs(args)

Parse arguments from the command line.

Parameters:
args : list

List of arguments

Returns:
fad, fimg, args.diskClosing, args.diskOpening, outdir, sample

MidlineIdentifier.utilis.ScaleMinMax(x)

Scale the input vector into the range between zero and one.

Parameters:
x : array_like

Input vector

Returns:
array_like

Scaled x
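Min-max scaling maps x to (x - min(x)) / (max(x) - min(x)). The sketch below is a hypothetical equivalent; how ScaleMinMax() treats a constant vector (where the denominator is zero) is not documented, so returning zeros in that case is an assumption.

```python
import numpy as np

def scale_min_max(x):
    """Rescale x linearly so that its minimum maps to 0 and its maximum to 1."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)  # assumption: a constant vector maps to all zeros
    return (x - x.min()) / span
```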

MidlineIdentifier.utilis.grouped_obs(adata, groupby, method, layer=None, gene_symbols=None)

Aggregate expression by group.

Parameters:
adata : anndata.AnnData

Annotated data matrix.

groupby : str

The key of the observations grouping to consider.

method : str

Method used to aggregate the expression. Must be one of ['sum', 'mean']

layer : str (default: None)

Key of adata.layers whose values will be used. If None, adata.X will be used.

gene_symbols : list | None (default: None)

Genes to aggregate. If None, calculation will be done for all genes.

Returns:
pandas.DataFrame

A gene-by-group DataFrame
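The aggregation can be sketched with pandas on a cells-by-genes DataFrame instead of an AnnData object. grouped_expression and its inputs are hypothetical: expr stands in for adata.X (or one of its layers), groups for adata.obs[groupby], and genes for gene_symbols.

```python
import pandas as pd

def grouped_expression(expr, groups, method="mean", genes=None):
    """Aggregate a cells x genes DataFrame by group; returns genes x groups."""
    if method not in ("sum", "mean"):
        raise ValueError("method must be one of ['sum', 'mean']")
    if genes is not None:
        expr = expr[genes]  # restrict to the requested genes
    return expr.groupby(groups).agg(method).T  # transpose to gene-by-group
```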

Module contents