Query artifacts

Here, we’ll query artifacts and inspect their metadata.

You can skip this guide if you are only interested in working with the overall collection.

import lamindb as ln
import bionty as bt

ln.track("agayZTonayqA0000")
→ connected lamindb: testuser1/test-scrna
→ notebook imports: bionty==0.51.1 lamindb==0.76.11
→ created Transform('agayZTon'), started new Run('6wWV9UyK') at 2024-10-04 09:11:20 UTC

Query artifacts by provenance metadata

users = ln.User.lookup()
ln.Transform.filter(created_by=users.testuser1).search("scrna").df()
id uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
1 Nv48yAceNSh80000 None True scRNA-seq scrna.ipynb None notebook None None None None None 2024-10-04 09:10:05.955177+00:00 1
2 ManDYgmftZ8C0000 None True Standardize and append a batch of data scrna2.ipynb None notebook None None None None None 2024-10-04 09:11:02.933837+00:00 1
3 agayZTonayqA0000 None True Query artifacts scrna3.ipynb None notebook None None None None None 2024-10-04 09:11:20.690396+00:00 1
transform = ln.Transform.get(uid="Nv48yAceNSh80000")
ln.Artifact.filter(transform=transform).df()
id uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
1 dRTEhUtzYjyWKsZc0000 None True Human immune cells from Conde22 None .h5ad dataset 57612943 t_YJQpYrAyAGhs7Ir68zKj None 1648 sha1-fl AnnData 1 True 1 1 1 2024-10-04 09:10:55.651032+00:00 1

Query artifacts by biological metadata

organism = bt.Organism.lookup()
tissues = bt.Tissue.lookup()
query = ln.Artifact.filter(
    organisms=organism.human,
    tissues=tissues.bone_marrow,
)
query.df()
id uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id

Inspect artifact metadata

query_set = ln.Artifact.filter().all()
artifact1, artifact2 = query_set[0], query_set[1]
artifact1.describe()
Artifact(uid='dRTEhUtzYjyWKsZc0000', is_latest=True, description='Human immune cells from Conde22', suffix='.h5ad', type='dataset', size=57612943, hash='t_YJQpYrAyAGhs7Ir68zKj', n_observations=1648, _hash_type='sha1-fl', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_at=2024-10-04 09:10:55 UTC)
  Provenance
    .storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna'
    .transform = 'scRNA-seq'
    .run = 2024-10-04 09:10:05 UTC
    .created_by = 'testuser1'
  Usage
    .input_of_runs = 2024-10-04 09:11:02 UTC
  Labels
    .tissues = 'blood', 'thoracic lymph node', 'spleen', 'lung', 'mesenteric lymph node', 'lamina propria', 'liver', 'jejunal epithelium', 'omentum', 'bone marrow', ...
    .cell_types = 'classical monocyte', 'T follicular helper cell', 'memory B cell', 'alveolar macrophage', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'effector memory CD8-positive, alpha-beta T cell, terminally differentiated', 'alpha-beta T cell', 'CD4-positive helper T cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'macrophage', ...
    .experimental_factors = '10x 3' v3', '10x 5' v2', '10x 5' v1'
    .ulabels = 'D496', '621B', 'A29', 'A36', 'A35', '637C', 'A52', 'A37', 'D503', '640C', ...
  Features
    'assay' = '10x 3' v3', '10x 5' v2', '10x 5' v1'
    'cell_type' = 'classical monocyte', 'T follicular helper cell', 'memory B cell', 'alveolar macrophage', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'effector memory CD8-positive, alpha-beta T cell, terminally differentiated', 'alpha-beta T cell', 'CD4-positive helper T cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'macrophage', ...
    'donor' = 'D496', '621B', 'A29', 'A36', 'A35', '637C', 'A52', 'A37', 'D503', '640C', ...
    'tissue' = 'blood', 'thoracic lymph node', 'spleen', 'lung', 'mesenteric lymph node', 'lamina propria', 'liver', 'jejunal epithelium', 'omentum', 'bone marrow', ...
  Feature sets
    'var' = 'MIR1302-2HG', 'FAM138A', 'OR4F5', 'None', 'OR4F29', 'OR4F16', 'LINC01409', 'FAM87B', 'LINC01128', 'LINC00115', 'FAM41C'
    'obs' = 'donor', 'tissue', 'cell_type', 'assay'
artifact1.view_lineage()
[Image: lineage graph of artifact1]
artifact2.describe()
Artifact(uid='3y74NLUh3y3T32S20000', is_latest=True, description='10x reference adata', suffix='.h5ad', type='dataset', size=853388, hash='mIKkPaZAA3EdtZLeFuWNEg', n_observations=70, _hash_type='md5', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_at=2024-10-04 09:11:16 UTC)
  Provenance
    .storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna'
    .transform = 'Standardize and append a batch of data'
    .run = 2024-10-04 09:11:02 UTC
    .created_by = 'testuser1'
  Labels
    .cell_types = 'B cell, CD19-positive', 'dendritic cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'cytotoxic T cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'CD14-positive, CD16-negative classical monocyte', 'CD38-positive naive B cell', 'CD38-high pre-BCR positive cell'
  Features
    'cell_type' = 'B cell, CD19-positive', 'dendritic cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'cytotoxic T cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'CD14-positive, CD16-negative classical monocyte', 'CD38-positive naive B cell', 'CD38-high pre-BCR positive cell'
  Feature sets
    'var' = 'TLE5', 'S1PR4', 'CD164', 'SMIM24', 'DCAF10', 'RAB13', 'TPM3', 'HES4', 'HAX1', 'GSTK1', 'SNX2', 'GTF3C6', 'ADD3', 'ACAA1', 'MATK', 'ZYX', 'JAML', 'CD3E', 'TNFRSF4', 'EXOG'
    'obs' = 'cell_type'
artifact2.view_lineage()
[Image: lineage graph of artifact2]

Compare features

Here we compute shared genes:

artifact1_genes = artifact1.features["var"]
artifact2_genes = artifact2.features["var"]

shared_genes = artifact1_genes & artifact2_genes
len(shared_genes)
749
shared_genes.list("symbol")[:10]
['HES4',
 'TNFRSF4',
 'SSU72',
 'PARK7',
 'RBP7',
 'SRM',
 'MAD2L2',
 'AGTRAP',
 'TNFRSF1B',
 'EFHD2']
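
The `&` operator above keeps the gene records that appear in both feature sets. As a rough mental model only, and not a description of how lamindb implements it, the same idea can be sketched with plain Python sets. The gene symbols below are hypothetical toy inputs, not the real feature sets of the two artifacts.

```python
# Toy stand-ins for the gene symbols of two artifacts (hypothetical values).
genes_a = ["HES4", "TNFRSF4", "SSU72", "CD3E", "ZYX"]
genes_b = ["PARK7", "HES4", "ZYX", "TNFRSF4"]

# Set intersection keeps only the symbols present in both lists.
shared = set(genes_a) & set(genes_b)

# Sets are unordered, so sort before displaying for a reproducible result.
shared_sorted = sorted(shared)
print(len(shared_sorted), shared_sorted)
```

Because sets carry no order, the sketch sorts the result before printing; the order of the first ten symbols returned by a real query may differ from any such sorted order.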

Compare cell types

artifact1_celltypes = artifact1.cell_types.all()
artifact2_celltypes = artifact2.cell_types.all()

shared_celltypes = artifact1_celltypes & artifact2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names
['CD16-positive, CD56-dim natural killer cell, human']

Load the individual artifacts

We can either load the artifacts into memory or open them in backed mode through .open(), which loads their content lazily.

Let’s load them into memory:

adata1 = artifact1.load()
adata2 = artifact2.load()

We can now subset the two AnnData objects by shared cell types:

adata1_subset = adata1[adata1.obs["cell_type"].isin(shared_celltypes_names)]
adata2_subset = adata2[adata2.obs["cell_type"].isin(shared_celltypes_names)]
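
Indexing an AnnData object with a boolean mask built from `.obs` keeps only the matching cells. The sketch below shows the same masking pattern on a small pandas DataFrame standing in for `adata.obs`; the cell identifiers and the `shared_names` list are made up for illustration and are not taken from the artifacts above.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs: one row per cell.
obs = pd.DataFrame(
    {"cell_type": ["macrophage", "memory B cell", "dendritic cell", "macrophage"]},
    index=["cell-1", "cell-2", "cell-3", "cell-4"],
)
shared_names = ["macrophage"]  # hypothetical list of shared cell type names

# isin() returns a boolean Series aligned to the rows of obs.
mask = obs["cell_type"].isin(shared_names)

# Boolean indexing keeps the rows where the mask is True.
subset = obs[mask]
print(subset.index.tolist())
```

When the same mask is applied to an AnnData object, the result is a view of the original data; calling `.copy()` on the subset gives an independent object if we intend to modify it.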