New AI Resource

Releasing the GDPx3 Cell Painting Dataset

High-content images of human primary cells for functional genomics applications

Today we're excited to make public GDPx3, the third functional genomics dataset released by Ginkgo Datapoints. You can access all of our current releases on the Ginkgo Datapoints data portal.

What's in the dataset?

GDPx3 contains images of human cells that have been stained with fluorescent dyes to label the nucleus, mitochondria, cytoskeleton and other major cellular structures. This release includes images of three types of human primary cells and the A549 human lung adenocarcinoma cell line:

  • Aortic endothelial cells

  • Aortic smooth muscle cells (VSMCs)

  • Dermal fibroblasts

  • A549 cells

The cells were treated with pharmacologically active compounds from the LOPAC 1280 library at a range of concentrations to allow resolution of dose-dependent effects. The dataset includes technical replicates and compounds with known cytophysiological effects to allow comparisons within and across plates.
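As a rough illustration of how technical replicates can be used, the sketch below collapses replicate wells into one median profile per compound and concentration. It assumes per-well feature vectors have already been extracted from the images. The field names (`compound`, `concentration_um`, `features`) and all values are hypothetical and do not reflect the actual GDPx3 file layout.

```python
from collections import defaultdict
from statistics import median


def consensus_profiles(wells):
    """Collapse replicate wells into one median profile per (compound, concentration)."""
    groups = defaultdict(list)
    for well in wells:
        key = (well["compound"], well["concentration_um"])
        groups[key].append(well["features"])
    # Take the median of each feature (column) across the replicate wells.
    return {
        key: [median(column) for column in zip(*profiles)]
        for key, profiles in groups.items()
    }


# Made-up values for illustration only; not taken from GDPx3.
wells = [
    {"compound": "cmpd_A", "concentration_um": 1.0, "features": [1.0, 2.0]},
    {"compound": "cmpd_A", "concentration_um": 1.0, "features": [3.0, 4.0]},
    {"compound": "cmpd_A", "concentration_um": 10.0, "features": [5.0, 8.0]},
]
profiles = consensus_profiles(wells)
```

Median aggregation is one common choice because it is less sensitive to a single outlier well than the mean; other aggregation schemes would work with the same grouping.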

More details on the composition of the dataset, including the compounds, the cell painting protocol, and quality control metrics, are provided with the data download. The 13 compounds featured in GDPx3 are part of a larger 46-compound set that is available for licensing. Contact us to learn more about larger and custom datasets for your project.

  • 220 GB of imaging data

  • 13 pharmacologically active compounds

  • 2 time points

  • Multiple concentrations

  • Multiple technical replicates

Phenotypic profiles of pharmacologically active compounds

Example images from the GDPx3 dataset illustrating the range of phenotypes captured by cell painting.

  • Tetrandrine is a calcium channel blocker.

  • Brefeldin A is an ER-Golgi trafficking inhibitor.

  • Latrunculin disrupts the actin cytoskeleton and causes cell death.

  • CA-074-ME is an anti-inflammatory known to cause abundant Golgi staining.

What can you do with cell painting data?

  • Mechanism of Action (MoA) Prediction. Compare morphological profiles of unknown compounds to those of known ones to infer shared mechanisms.

  • Phenotypic Screening. Screen large libraries of compounds and identify those with specific desirable phenotypic changes (e.g., cell differentiation, apoptosis).

  • Toxicity Prediction. Detect early signs of cytotoxicity or off-target effects that correlate with cell stress, death, or abnormal morphology.

  • Drug Repurposing. Match the morphological profiles of known drugs against those of drugs with different indications to suggest new uses.

  • Machine Learning Applications. Train models to classify compounds, predict MoA, or cluster phenotypes.

  • Target Deconvolution. Correlate morphological profiles with known gene perturbations (e.g., CRISPR or RNAi datasets).

  • Pathway Analysis & Biomarker Discovery. Link phenotypes to signaling pathways and potential biomarkers.

  • Dealer's Choice. As AI continues to drive rapid innovation, we're excited to see totally novel use-cases for large biological datasets.
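To make the MoA prediction idea from the list above concrete, here is a minimal sketch that ranks reference compounds by cosine similarity to a query profile. It assumes each compound has already been reduced to a numeric morphological feature vector. The reference names, the three-feature vectors, and the helper functions are all hypothetical and are not part of the GDPx3 release.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero profile as matching nothing.
    return dot / (norm_a * norm_b)


def rank_reference_matches(query, references):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query, profile))
        for name, profile in references.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up profiles for illustration only; not taken from GDPx3.
reference_profiles = {
    "reference_actin_disruptor": [1.0, 0.0, 0.0],
    "reference_trafficking_inhibitor": [0.0, 1.0, 0.0],
}
query_profile = [0.9, 0.1, 0.0]

ranking = rank_reference_matches(query_profile, reference_profiles)
best_match = ranking[0][0]
```

In practice, real profiles contain hundreds or thousands of features and are usually normalized before comparison; a high similarity to a reference compound is a hypothesis about shared mechanism, not a confirmation.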

Why do we release datasets?

The AI stack for biotech is advancing quickly. A community of developers and service providers is hard at work to bring online new tools for curating data, training models, running inference, and deriving biologically meaningful results. Much of this work is happening in public, supported by cutting-edge academic research and open-source efforts. By aligning our offerings with and contributing to emerging common standards, we seek to share our work in a way that benefits everyone.

Ginkgo Datapoints is the next-gen CRO supercharging the new era of discovery. We bring together AI, high-throughput automation, and deep cell engineering expertise to generate large datasets for high-content imaging, functional genomics and other applications. The GDPx3 dataset is just a small sample of the types of data we generate for our partners. If you're looking for AI/ML-ready data at any scale, tailored to your application, get in touch!