SPANN Model

class spann.model.SPANN_model(x_dim, z_dim, enc, dec, class_num, device)[source]

SPANN model, which uses an scRNA-seq reference dataset to annotate a spatial transcriptomics dataset

X_dim:

input dimension for the encoder

Z_dim:

latent feature dimension

Enc:

encoder parameters, for example, [['fc', 1024, 1, 'relu'], ['fc', 16, '', '']]

Dec:

decoder parameters; dec is a set {cm_dec_params, spa_dec_params, rna_dec_params}, where each element has the same format as enc

Class_num:

number of scRNA-seq cell types

Device:

device on which the SPANN model is placed
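As a rough illustration of the layer-specification format described above, the sketch below checks a specification before it is passed to the constructor. The check_layer_spec helper is hypothetical and not part of spann, and the rule that the last encoder layer width equals z_dim is an assumption drawn from the documented example (16 in both places). The meaning of the third and fourth fields of each layer is not specified here, so the sketch only checks that they are present.

```python
# Hypothetical helper (not part of spann): checks that a layer
# specification follows the four-field pattern of the documented example.
def check_layer_spec(spec):
    for layer in spec:
        if len(layer) != 4:
            raise ValueError(f"expected 4 fields per layer, got {layer!r}")
        width = layer[1]
        if not isinstance(width, int) or width <= 0:
            raise ValueError(f"layer width must be a positive int, got {width!r}")
    return True


z_dim = 16  # latent feature dimension

# Encoder specification copied from the documented example.
enc = [['fc', 1024, 1, 'relu'], ['fc', 16, '', '']]

check_layer_spec(enc)

# Assumption: the width of the last encoder layer equals z_dim.
assert enc[-1][1] == z_dim
```

The same check could be applied to each of the three decoder specifications, since they are documented as sharing the enc format.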

train(source_cm_dl, target_cm_dl, source_sp_ds, target_sp_ds, spatial_coor, test_source_cm_dl, test_target_cm_dl, source_labels, cell_types, lr=0.0002, lambda_recon=2000, lambda_kl=0.5, lambda_spa=0.1, lambda_cd=0.001, lambda_nb=0.1, mu=0.6, temp=0.1, k=20, resolution=0.5, novel_cell_test=True, maxiter=6000, miditer1=2000, miditer2=5000, miditer3=4000, test_freq=1000)[source]

Training function for SPANN, applied to a spatial dataset; unlike train_eval, it does not take ground-truth spatial labels (target_labels)

Source_cm_dl:

torch dataloader of the common genes scRNA-seq data

Target_cm_dl:

torch dataloader of the common genes spatial data

Source_sp_ds:

torch dataset of the scRNA-seq specific genes data

Target_sp_ds:

torch dataset of the spatial specific genes data

Spatial_coor:

pandas DataFrame of the raw spatial coordinates, with column names ['X', 'Y']

Test_source_cm_dl:

torch dataloader of the common genes scRNA-seq data for testing

Test_target_cm_dl:

torch dataloader of the common genes spatial data for testing

Source_labels:

integer labels for scRNA-seq data

Cell_types:

list of all cell types that exist in the scRNA-seq dataset

Lr:

learning rate for VAE and classifier, default=2e-4

Lambda_recon:

training weight for the reconstruction loss, default=2000

Lambda_kl:

training weight for the KL-divergence loss, default=0.5

Lambda_spa:

training weight for the adjacency loss, default=0.1; we recommend setting it smaller when the spatial expression pattern is weak

Lambda_cd:

training weight for the unbalanced optimal transport alignment loss, default=0.001; we recommend setting it higher when the gap between the scRNA-seq and spatial datasets is large

Lambda_nb:

training weight for the neighbor loss, default=0.1

Mu:

updating speed of beta by moving average, default=0.6

Resolution:

the expected minimum proportion of known cells, 0 means no constraints on novel cell discovery and 1 means no novel cells, default=0.5

Novel_cell_test:

whether to apply the dip test and compute the BC coefficient to adaptively select the resolution, default=True

Maxiter:

maximum number of iterations, default=6000

Miditer1:

iteration after which SPANN starts UOT alignment, default=2000

Miditer2:

iteration after which SPANN starts training with the neighbor loss, default=5000

Miditer3:

iteration after which SPANN starts training with the adjacency loss, default=4000

Test_freq:

test frequency, default=1000

Returns:

two AnnData objects, adata_source and adata_target, containing the spatial coordinates, latent embeddings, E-scores, predictions, etc.
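The staged schedule implied by miditer1, miditer2 and miditer3 can be sketched as below. This is only an illustration under stated assumptions: the function name active_stages and the stage names are hypothetical, "after" is read as a strict comparison, and the always-on loss terms (such as reconstruction and KL) are not modeled.

```python
def active_stages(iteration, miditer1=2000, miditer2=5000, miditer3=4000):
    """Return the staged loss terms enabled at a given iteration.

    Assumption: a stage becomes active strictly after its threshold.
    """
    stages = []
    if iteration > miditer1:
        stages.append("uot_alignment")  # UOT alignment starts after miditer1
    if iteration > miditer3:
        stages.append("adjacency")      # adjacency loss starts after miditer3
    if iteration > miditer2:
        stages.append("neighbor")       # neighbor loss starts after miditer2
    return stages


# Print the enabled stages at a few points of a default 6000-iteration run.
for it in (1000, 3000, 4500, 6000):
    print(it, active_stages(it))
```

With the default thresholds, iteration 4500 has UOT alignment and the adjacency loss enabled but not yet the neighbor loss, because miditer3 (4000) is smaller than miditer2 (5000).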

train_eval(source_cm_dl, target_cm_dl, source_sp_ds, target_sp_ds, spatial_coor, test_source_cm_dl, test_target_cm_dl, source_labels, target_labels, cell_types, common_cell_type, lr=0.0002, lambda_recon=2000, lambda_kl=0.5, lambda_spa=0.1, lambda_cd=0.001, lambda_nb=0.1, mu=0.6, temp=0.1, k=20, resolution=0.5, novel_cell_test=True, maxiter=6000, miditer1=2000, miditer2=5000, miditer3=4000, test_freq=1000)[source]

Training and evaluation function for SPANN, applied to a test spatial dataset with ground-truth labels

Source_cm_dl:

torch dataloader of the common genes scRNA-seq data

Target_cm_dl:

torch dataloader of the common genes spatial data

Source_sp_ds:

torch dataset of the scRNA-seq specific genes data

Target_sp_ds:

torch dataset of the spatial specific genes data

Spatial_coor:

pandas DataFrame of the raw spatial coordinates, with column names ['X', 'Y']

Test_source_cm_dl:

torch dataloader of the common genes scRNA-seq data for testing or validation

Test_target_cm_dl:

torch dataloader of the common genes spatial data for testing or validation

Source_labels:

integer labels for scRNA-seq data

Target_labels:

integer labels for spatial data

Cell_types:

list of all cell types that exist in the scRNA-seq or spatial datasets

Common_cell_type:

list of cell types that exist in both the scRNA-seq and spatial datasets

Lr:

learning rate for VAE and classifier, default=2e-4

Lambda_recon:

training weight for the reconstruction loss, default=2000

Lambda_kl:

training weight for the KL-divergence loss, default=0.5

Lambda_spa:

training weight for the adjacency loss, default=0.1; we recommend setting it smaller when the spatial expression pattern is weak

Lambda_cd:

training weight for the unbalanced optimal transport alignment loss, default=0.001; we recommend setting it higher when the gap between the scRNA-seq and spatial datasets is large

Lambda_nb:

training weight for the neighbor loss, default=0.1

Mu:

updating speed of beta by moving average, default=0.6

Resolution:

the expected minimum proportion of known cells, 0 means no constraints on novel cell discovery and 1 means no novel cells, default=0.5

Novel_cell_test:

whether to apply the dip test and compute the BC coefficient to adaptively select the resolution, default=True

Maxiter:

maximum number of iterations, default=6000

Miditer1:

iteration after which SPANN starts UOT alignment, default=2000

Miditer2:

iteration after which SPANN starts training with the neighbor loss, default=5000

Miditer3:

iteration after which SPANN starts training with the adjacency loss, default=4000

Test_freq:

test frequency, default=1000

Returns:

two AnnData objects, adata_source and adata_target, containing the spatial coordinates, latent embeddings, E-scores, predictions, etc.
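The label-related arguments of train_eval (source_labels, target_labels, cell_types, common_cell_type) can be derived from per-cell type names roughly as follows. This is a minimal sketch under assumptions: the helper build_label_inputs is hypothetical, integer labels are assumed to index into cell_types, and scRNA-seq cell types are placed first so that source labels run from 0 to class_num - 1. Whether the library expects lists, NumPy arrays or tensors is not specified here.

```python
def build_label_inputs(source_names, target_names):
    """Derive integer labels and cell type lists from per-cell type names."""
    source_set, target_set = set(source_names), set(target_names)

    # Assumption: scRNA-seq cell types come first, spatial-only types after.
    source_types = sorted(source_set)
    target_only = sorted(target_set - source_set)
    cell_types = source_types + target_only

    # Cell types present in both datasets.
    common_cell_type = [ct for ct in source_types if ct in target_set]

    # Map each cell type name to its position in cell_types.
    index = {ct: i for i, ct in enumerate(cell_types)}
    source_labels = [index[name] for name in source_names]
    target_labels = [index[name] for name in target_names]
    return source_labels, target_labels, cell_types, common_cell_type


# Toy example: "Microglia" appears only in the spatial data.
source_names = ["Neuron", "Astro", "Neuron"]
target_names = ["Astro", "Microglia", "Astro"]
src, tgt, cell_types, common = build_label_inputs(source_names, target_names)
print(cell_types, common, src, tgt)
```

In the toy example, the spatial-only type receives an index equal to the number of scRNA-seq cell types, which keeps the source labels contiguous.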