SPANN Model

class spann.model.SPANN_model(x_dim, z_dim, enc, dec, class_num, device)[source]

SPANN model, which uses a scRNA-seq reference dataset to annotate a spatial transcriptomics dataset

X_dim:: input dimension for the encoder
Z_dim:: latent feature dimension
Enc:: encoder parameters, for example, [['fc', 1024, 1, 'relu'], ['fc', 16, '', '']]
Dec:: decoder parameters; dec is a set {cm_dec_params, spa_dec_params, rna_dec_params}, and each element is shaped like enc
Class_num:: number of scRNA-seq cell types
Device:: device on which the SPANN model is placed
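The constructor arguments above can be sketched as plain Python objects. This is a minimal illustration under stated assumptions, not the package's own example: the gene counts, the class count, the `mirrored_decoder` helper and the use of a list for `dec` are all hypothetical, and the meaning of the third field in each layer spec is not stated in this reference, so it is copied unchanged from the documented example.

```python
# Layer specs follow the documented pattern, e.g. ['fc', 1024, 1, 'relu'].
# The third field is undocumented here and is kept exactly as in the example.
enc = [['fc', 1024, 1, 'relu'], ['fc', 16, '', '']]

x_dim = 2000   # hypothetical number of common genes
z_dim = 16     # assumed to match the width of the last encoder layer


def mirrored_decoder(out_dim):
    """Hypothetical helper: a decoder spec shaped like `enc`, ending at out_dim."""
    return [['fc', 1024, 1, 'relu'], ['fc', out_dim, '', '']]


# The reference describes dec as {cm_dec_params, spa_dec_params, rna_dec_params}.
# A list in that order is an assumption; 300 and 1500 are made-up gene counts.
dec = [mirrored_decoder(x_dim), mirrored_decoder(300), mirrored_decoder(1500)]

# With the package installed, construction would presumably look like this:
# from spann.model import SPANN_model
# model = SPANN_model(x_dim, z_dim, enc, dec, class_num=10, device='cuda:0')
```

Check the container type that `dec` actually expects against the package source before reusing this layout.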

train(source_cm_dl, target_cm_dl, source_sp_ds, target_sp_ds, spatial_coor, test_source_cm_dl, test_target_cm_dl, source_labels, cell_types, lr=0.0002, lambda_recon=2000, lambda_kl=0.5, lambda_spa=0.1, lambda_cd=0.001, lambda_nb=0.1, mu=0.6, temp=0.1, k=20, resolution=0.5, novel_cell_test=True, maxiter=6000, miditer1=2000, miditer2=5000, miditer3=4000, test_freq=1000)[source]

Training function for SPANN, applied to a spatial dataset without ground-truth labels

Source_cm_dl:: torch dataloader of the common genes scRNA-seq data
Target_cm_dl:: torch dataloader of the common genes spatial data
Source_sp_ds:: torch dataset of the scRNA-seq specific genes data
Target_sp_ds:: torch dataset of the spatial specific genes data
Spatial_coor:: pandas dataframe of the raw spatial coordinates, column names ['X','Y']
Test_source_cm_dl:: torch dataloader of the common genes scRNA-seq data for test
Test_target_cm_dl:: torch dataloader of the common genes spatial data for test
Source_labels:: integer labels for scRNA-seq data
Cell_types:: list of all cell types that exist in the scRNA-seq dataset
Lr:: learning rate for VAE and classifier, default=2e-4
Lambda_recon:: training weight for the reconstruction loss, default=2000
Lambda_kl:: training weight for the KL-divergence loss, default=0.5
Lambda_spa:: training weight for the adjacency loss, default=0.1; we recommend a smaller value when the spatial expression pattern is weak
Lambda_cd:: training weight for the unbalanced optimal transport alignment loss, default=0.001; we recommend a higher value when the gap between the scRNA-seq and spatial datasets is large
Lambda_nb:: training weight for the neighbor loss, default=0.1
Mu:: updating speed of beta by moving average, default=0.6
Resolution:: the expected minimum proportion of known cells, 0 means no constraints on novel cell discovery and 1 means no novel cells, default=0.5
Novel_cell_test:: whether to apply the dip test and compute the BC coefficient to adaptively select resolution, default=True
Maxiter:: maximum iteration, default=6000
Miditer1:: after which iteration SPANN starts to conduct UOT alignment, default=2000
Miditer2:: after which iteration SPANN starts to train with neighbor loss, default=5000
Miditer3:: after which iteration SPANN starts to train with adjacency loss, default=4000
Test_freq:: test frequency, default=1000
Returns:: two AnnData objects, adata_source and adata_target, containing the spatial coordinates, latent embeddings, E-scores, predictions, etc.
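The staged schedule described by Miditer1, Miditer2 and Miditer3 can be illustrated with a short sketch. It assumes that the reconstruction and KL terms are active from the start, that "after which iteration" means a strict `>` comparison, and that the total loss is a plain weighted sum of the active terms. The function names are hypothetical and are not part of the SPANN package.

```python
def active_losses(iteration, miditer1=2000, miditer2=5000, miditer3=4000):
    """Return the loss terms assumed to be active at a given iteration."""
    losses = ['recon', 'kl']      # assumed to be on from the first iteration
    if iteration > miditer1:
        losses.append('cd')       # UOT alignment loss
    if iteration > miditer3:
        losses.append('spa')      # adjacency loss
    if iteration > miditer2:
        losses.append('nb')       # neighbor loss
    return losses


def weighted_total(values, iteration, weights=None):
    """Weighted sum of the active loss values, using the documented default weights."""
    weights = weights or {'recon': 2000, 'kl': 0.5, 'spa': 0.1, 'cd': 0.001, 'nb': 0.1}
    return sum(weights[name] * values[name] for name in active_losses(iteration))


for it in (1000, 3000, 4500, 6000):
    print(it, active_losses(it))
```

With the default values, the adjacency loss (Miditer3=4000) switches on before the neighbor loss (Miditer2=5000), so the numbering of the two parameters does not follow the order in which they take effect.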

train_eval(source_cm_dl, target_cm_dl, source_sp_ds, target_sp_ds, spatial_coor, test_source_cm_dl, test_target_cm_dl, source_labels, target_labels, cell_types, common_cell_type, lr=0.0002, lambda_recon=2000, lambda_kl=0.5, lambda_spa=0.1, lambda_cd=0.001, lambda_nb=0.1, mu=0.6, temp=0.1, k=20, resolution=0.5, novel_cell_test=True, maxiter=6000, miditer1=2000, miditer2=5000, miditer3=4000, test_freq=1000)[source]

Training and evaluation function for SPANN, applied to a test spatial dataset with ground-truth labels

Source_cm_dl:: torch dataloader of the common genes scRNA-seq data
Target_cm_dl:: torch dataloader of the common genes spatial data
Source_sp_ds:: torch dataset of the scRNA-seq specific genes data
Target_sp_ds:: torch dataset of the spatial specific genes data
Spatial_coor:: pandas dataframe of the raw spatial coordinates, column names ['X','Y']
Test_source_cm_dl:: torch dataloader of the common genes scRNA-seq data for testing or validation
Test_target_cm_dl:: torch dataloader of the common genes spatial data for testing or validation
Source_labels:: integer labels for scRNA-seq data
Target_labels:: integer labels for spatial data
Cell_types:: list of all cell types that exist in the scRNA-seq or spatial datasets
Common_cell_type:: list of cell types that exist in both the scRNA-seq and spatial datasets
Lr:: learning rate for VAE and classifier, default=2e-4
Lambda_recon:: training weight for the reconstruction loss, default=2000
Lambda_kl:: training weight for the KL-divergence loss, default=0.5
Lambda_spa:: training weight for the adjacency loss, default=0.1; we recommend a smaller value when the spatial expression pattern is weak
Lambda_cd:: training weight for the unbalanced optimal transport alignment loss, default=0.001; we recommend a higher value when the gap between the scRNA-seq and spatial datasets is large
Lambda_nb:: training weight for the neighbor loss, default=0.1
Mu:: updating speed of beta by moving average, default=0.6
Resolution:: the expected minimum proportion of known cells, 0 means no constraints on novel cell discovery and 1 means no novel cells, default=0.5
Novel_cell_test:: whether to apply the dip test and compute the BC coefficient to adaptively select resolution, default=True
Maxiter:: maximum iteration, default=6000
Miditer1:: after which iteration SPANN starts to conduct UOT alignment, default=2000
Miditer2:: after which iteration SPANN starts to train with neighbor loss, default=5000
Miditer3:: after which iteration SPANN starts to train with adjacency loss, default=4000
Test_freq:: test frequency, default=1000
Returns:: two AnnData objects, adata_source and adata_target, containing the spatial coordinates, latent embeddings, E-scores, predictions, etc.
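The role of Mu as a moving-average speed can be sketched as follows. This reference does not define beta or give the update rule, so the code treats beta as a generic vector and assumes the common convention in which a larger Mu moves beta faster toward the newest estimate. The `update_beta` helper is hypothetical and is not taken from the SPANN source.

```python
def update_beta(beta, estimate, mu=0.6):
    """Moving-average update: blend the old beta with a new estimate.

    Assumed convention: mu=0 keeps beta unchanged, mu=1 replaces it entirely.
    """
    return [(1 - mu) * b + mu * e for b, e in zip(beta, estimate)]


beta = [0.5, 0.5]                       # hypothetical starting value
beta = update_beta(beta, [1.0, 0.0])    # one update with the default mu=0.6
print(beta)
```

If the package uses the opposite convention (Mu weighting the old value), the two coefficients in the update would swap.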