SPANN Model
- class spann.model.SPANN_model(x_dim, z_dim, enc, dec, class_num, device)
SPANN model, which uses an scRNA-seq reference dataset to annotate a spatial transcriptomics dataset
- X_dim:
input dimension for the encoder
- Z_dim:
latent feature dimension
- Enc:
encoder parameters, for example, [['fc', 1024, 1, 'relu'], ['fc', 16, '', '']]
- Dec:
decoder parameters; dec holds three elements {cm_dec_params, spa_dec_params, rna_dec_params}, each with the same layout as enc
- Class_num:
number of scRNA-seq cell types
- Device:
device on which the SPANN model is placed
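The constructor arguments above can be assembled as in the following minimal sketch. It only builds and sanity-checks the configuration in plain Python and does not import spann. The dimensions, the decoder output widths, the helper check_layer_specs, and the choice of a dict for dec are assumptions made for illustration; only the enc example and the three decoder names come from the parameter list.

```python
# Layer spec layout copied from the enc example in the parameter list:
# [layer_type, width, <third field>, activation]. The meaning of the third
# field is not documented here, so it is passed through unchanged.
x_dim = 2000     # hypothetical number of common genes
z_dim = 16       # latent size; matches the last encoder layer below
class_num = 10   # hypothetical number of scRNA-seq cell types

enc = [['fc', 1024, 1, 'relu'], ['fc', z_dim, '', '']]

# Assumption: dec is a dict keyed by the three names in the description.
# The output widths (x_dim, 500, 3000) are placeholders, not library defaults.
dec = {
    'cm_dec_params': [['fc', 1024, 1, 'relu'], ['fc', x_dim, '', '']],
    'spa_dec_params': [['fc', 1024, 1, 'relu'], ['fc', 500, '', '']],
    'rna_dec_params': [['fc', 1024, 1, 'relu'], ['fc', 3000, '', '']],
}


def check_layer_specs(specs):
    """Hypothetical helper: every spec is a 4-item list with a positive int width."""
    return all(
        len(s) == 4 and isinstance(s[1], int) and s[1] > 0
        for s in specs
    )


# The encoder must end at the latent dimension.
assert check_layer_specs(enc) and enc[-1][1] == z_dim
assert all(check_layer_specs(specs) for specs in dec.values())

# With the spann package installed, the call would then look like this
# (the value passed as device is an assumption):
# from spann.model import SPANN_model
# model = SPANN_model(x_dim, z_dim, enc, dec, class_num, device='cpu')
```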
- train(source_cm_dl, target_cm_dl, source_sp_ds, target_sp_ds, spatial_coor, test_source_cm_dl, test_target_cm_dl, source_labels, cell_types, lr=0.0002, lambda_recon=2000, lambda_kl=0.5, lambda_spa=0.1, lambda_cd=0.001, lambda_nb=0.1, mu=0.6, temp=0.1, k=20, resolution=0.5, novel_cell_test=True, maxiter=6000, miditer1=2000, miditer2=5000, miditer3=4000, test_freq=1000)
Training function for SPANN applied to a spatial dataset; unlike train_eval, it takes no ground-truth labels for the spatial data
- Source_cm_dl:
torch dataloader of the common genes scRNA-seq data
- Target_cm_dl:
torch dataloader of the common genes spatial data
- Source_sp_ds:
torch dataset of the scRNA-seq specific genes data
- Target_sp_ds:
torch dataset of the spatial specific genes data
- Spatial_coor:
pandas dataframe of the raw spatial coordinates, column names ['X','Y']
- Test_source_cm_dl:
torch dataloader of the common genes scRNA-seq data for test
- Test_target_cm_dl:
torch dataloader of the common genes spatial data for test
- Source_labels:
integer labels for scRNA-seq data
- Cell_types:
list of all cell types that exist in the scRNA-seq dataset
- Lr:
learning rate for VAE and classifier, default=2e-4
- Lambda_recon:
training weight for the reconstruction loss, default=2000
- Lambda_kl:
training weight for the KL-divergence loss, default=0.5
- Lambda_spa:
training weight for the adjacency loss, default=0.1; we recommend a smaller value when the spatial expression pattern is weak
- Lambda_cd:
training weight for the unbalanced optimal transport alignment loss, default=0.001; we recommend a higher value when the gap between the scRNA-seq and spatial datasets is large
- Lambda_nb:
training weight for the neighbor loss, default=0.1
- Mu:
updating speed of beta by moving average, default=0.6
- Resolution:
the expected minimum proportion of known cells, 0 means no constraints on novel cell discovery and 1 means no novel cells, default=0.5
- Novel_cell_test:
whether to apply the dip test and compute the BC coefficient to adaptively select resolution, default=True
- Maxiter:
maximum number of iterations, default=6000
- Miditer1:
iteration after which SPANN starts UOT alignment, default=2000
- Miditer2:
iteration after which SPANN starts training with the neighbor loss, default=5000
- Miditer3:
iteration after which SPANN starts training with the adjacency loss, default=4000
- Test_freq:
test frequency, default=1000
- Returns:
two AnnData objects, adata_source and adata_target, containing the spatial coordinates, latent embeddings, E-scores, predictions, etc.
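The staged schedule implied by miditer1, miditer2 and miditer3, together with the lambda weights, can be sketched as below. This is an illustration of how the parameters read, not the library's training loop: the function names, the short loss keys, the strict "greater than" boundary, and the assumption that the reconstruction and KL terms are active from the first iteration are all hypothetical.

```python
def active_loss_weights(it, lambda_recon=2000, lambda_kl=0.5, lambda_spa=0.1,
                        lambda_cd=0.001, lambda_nb=0.1,
                        miditer1=2000, miditer2=5000, miditer3=4000):
    """Return the weight of each loss term assumed active at iteration `it`."""
    # Assumption: reconstruction and KL terms are trained throughout.
    weights = {'recon': lambda_recon, 'kl': lambda_kl}
    if it > miditer1:   # UOT alignment starts after miditer1
        weights['cd'] = lambda_cd
    if it > miditer3:   # adjacency loss starts after miditer3
        weights['spa'] = lambda_spa
    if it > miditer2:   # neighbor loss starts after miditer2
        weights['nb'] = lambda_nb
    return weights


def total_loss(losses, weights):
    """Weighted sum over the active terms; `losses` maps term name to value."""
    return sum(w * losses[name] for name, w in weights.items())


# Print which terms are active at a few points of the default schedule.
for it in (1000, 3000, 4500, 5500):
    print(it, sorted(active_loss_weights(it)))
```

Note that with the defaults the adjacency loss (miditer3=4000) switches on before the neighbor loss (miditer2=5000), so the numbering of the two parameters does not follow the order in which the terms become active.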
- train_eval(source_cm_dl, target_cm_dl, source_sp_ds, target_sp_ds, spatial_coor, test_source_cm_dl, test_target_cm_dl, source_labels, target_labels, cell_types, common_cell_type, lr=0.0002, lambda_recon=2000, lambda_kl=0.5, lambda_spa=0.1, lambda_cd=0.001, lambda_nb=0.1, mu=0.6, temp=0.1, k=20, resolution=0.5, novel_cell_test=True, maxiter=6000, miditer1=2000, miditer2=5000, miditer3=4000, test_freq=1000)
Training and evaluation function for SPANN, applied to a test spatial dataset with ground-truth labels
- Source_cm_dl:
torch dataloader of the common genes scRNA-seq data
- Target_cm_dl:
torch dataloader of the common genes spatial data
- Source_sp_ds:
torch dataset of the scRNA-seq specific genes data
- Target_sp_ds:
torch dataset of the spatial specific genes data
- Spatial_coor:
pandas dataframe of the raw spatial coordinates, column names ['X','Y']
- Test_source_cm_dl:
torch dataloader of the common genes scRNA-seq data for testing or validation
- Test_target_cm_dl:
torch dataloader of the common genes spatial data for testing or validation
- Source_labels:
integer labels for scRNA-seq data
- Target_labels:
integer labels for spatial data
- Cell_types:
list of all cell types that exist in the scRNA-seq or spatial datasets
- Common_cell_type:
list of cell types that exist in both the scRNA-seq and spatial datasets
- Lr:
learning rate for VAE and classifier, default=2e-4
- Lambda_recon:
training weight for the reconstruction loss, default=2000
- Lambda_kl:
training weight for the KL-divergence loss, default=0.5
- Lambda_spa:
training weight for the adjacency loss, default=0.1; we recommend a smaller value when the spatial expression pattern is weak
- Lambda_cd:
training weight for the unbalanced optimal transport alignment loss, default=0.001; we recommend a higher value when the gap between the scRNA-seq and spatial datasets is large
- Lambda_nb:
training weight for the neighbor loss, default=0.1
- Mu:
updating speed of beta by moving average, default=0.6
- Resolution:
the expected minimum proportion of known cells, 0 means no constraints on novel cell discovery and 1 means no novel cells, default=0.5
- Novel_cell_test:
whether to apply the dip test and compute the BC coefficient to adaptively select resolution, default=True
- Maxiter:
maximum number of iterations, default=6000
- Miditer1:
iteration after which SPANN starts UOT alignment, default=2000
- Miditer2:
iteration after which SPANN starts training with the neighbor loss, default=5000
- Miditer3:
iteration after which SPANN starts training with the adjacency loss, default=4000
- Test_freq:
test frequency, default=1000
- Returns:
two AnnData objects, adata_source and adata_target, containing the spatial coordinates, latent embeddings, E-scores, predictions, etc.
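The mu parameter is described as the updating speed of beta by moving average. A generic exponential moving average of that kind might look like the sketch below. The function update_beta is hypothetical, beta is treated as a plain list of floats because its role inside SPANN is not stated in this reference, and the convention that mu weights the new estimate (so a larger mu updates faster) is an assumption.

```python
def update_beta(beta, estimate, mu=0.6):
    """Blend the previous beta with a fresh estimate.

    Assumed convention: mu weights the new estimate, so mu=1 replaces beta
    outright and mu=0 leaves it unchanged.
    """
    if len(beta) != len(estimate):
        raise ValueError("beta and estimate must have the same length")
    return [(1.0 - mu) * b + mu * e for b, e in zip(beta, estimate)]


# Repeated updates pull beta towards a fixed estimate.
beta = [0.5, 0.5]
for _ in range(3):
    beta = update_beta(beta, [1.0, 0.0], mu=0.6)
print(beta)
```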