New AI Resource

Releasing the GDPx2 Functional Genomics Dataset

934 GB of RNA-seq data exploring dose-dependent drug responses of primary human cell lines

Today we're excited to make public the GDPx2 dataset! GDPx2 is the second functional genomics dataset released by Ginkgo Datapoints. You can access all of our current releases on the Ginkgo Datapoints data portal.

What's in the dataset?

GDPx2 includes transcriptional profiles for 4 human cell lines:

Melanocytes 

Aortic smooth muscle cells

Dermal fibroblasts

Skeletal muscle myoblasts

Each cell line has undergone 15 different treatments (10 test compounds and 5 controls) across a range of 6 concentrations. The effect of each drug treatment was measured by collecting an RNA-seq profile of about 2 million reads using our DRUG-seq assay for high-throughput transcriptomics.

Treatments

Test compounds

Corticosterone

Idarubicin

Mitoxantrone

Beclomethasone

Cycloheximide

Thapsigargin

Calcimycin

Rigosertib

Nocodazole

Alisertib

Controls

DMSO

Dexamethasone

Trichostatin A

Brefeldin A

Dabrafenib
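To get a feel for the size of the experimental design described above, here is a minimal Python sketch that enumerates every cell line, treatment, and concentration combination. It makes several assumptions that are not stated in this post. The split into ten test compounds and five controls is inferred from the order of the list above. The field names (`cell_line`, `treatment`, `dose_index`) are hypothetical and do not reflect the dataset's actual metadata schema. Every treatment, including the controls, is taken to have been run at all six concentrations. The actual concentration values are not given here, so dose levels are represented only by an index.

```python
from itertools import product

# Cell lines and treatments as named in this post.
CELL_LINES = [
    "Melanocytes",
    "Aortic smooth muscle cells",
    "Dermal fibroblasts",
    "Skeletal muscle myoblasts",
]

# Assumption: the first ten listed names are the test compounds
# and the last five are the controls.
TEST_COMPOUNDS = [
    "Corticosterone", "Idarubicin", "Mitoxantrone", "Beclomethasone",
    "Cycloheximide", "Thapsigargin", "Calcimycin", "Rigosertib",
    "Nocodazole", "Alisertib",
]
CONTROLS = [
    "DMSO", "Dexamethasone", "Trichostatin A", "Brefeldin A", "Dabrafenib",
]

# Six dose levels per treatment; the actual concentrations are not
# given in the post, so only an index (1..6) is used here.
N_CONCENTRATIONS = 6


def design_grid():
    """Return one record per (cell line, treatment, dose level) condition."""
    treatments = TEST_COMPOUNDS + CONTROLS
    dose_levels = range(1, N_CONCENTRATIONS + 1)
    return [
        # Hypothetical field names, not the dataset's real column names.
        {"cell_line": cell_line, "treatment": treatment, "dose_index": dose}
        for cell_line, treatment, dose in product(CELL_LINES, treatments, dose_levels)
    ]


if __name__ == "__main__":
    grid = design_grid()
    print(len(grid))  # 4 cell lines x 15 treatments x 6 doses = 360
```

Note that this counts distinct conditions, not RNA-seq profiles. The number of replicates per condition is not stated in this post, so the number of profiles in GDPx2 cannot be derived from the grid alone.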

The cell-treatment pairs in GDPx2 represent only a portion of a much larger dataset characterizing 85 diverse pharmacologically active small molecules. If you find GDPx2 useful, the full 4 TB of data covering 12,216 total transcriptional profiles can be requested under the appropriate license and terms for research or commercial use. 

We anticipate that GDPx2 will be of interest to teams exploring the range of pharmacologically relevant transcriptional responses that human cells exhibit and the relationships between them. Use GDPx2 for AI/ML-assisted target identification, or for exploring and modeling transcriptional co-regulation in heterologous settings.

Why do we release datasets?

The AI stack for biotech is advancing quickly. A community of developers and service providers is hard at work bringing new tools online for curating data, training models, running inference, and deriving biologically meaningful results. Much of this work is happening in public, supported by cutting-edge academic research and open-source efforts. By aligning our offerings with emerging common standards and contributing to them, we seek to share our work in a way that benefits everyone.

At Ginkgo Datapoints, our role in the AI ecosystem is data provider. The automation infrastructure of the Ginkgo foundry allows us to efficiently generate large datasets for functional genomics and other applications. The GDPx2 dataset is just a small sample of the scale of data we can bring to your AI project. If you're looking for a large dataset tailored to your application, get in touch!