Quick Start

This page walks through the standard first run for CellPainting-Claw: one classical profiling run from input data to final profile outputs.

The run shown here does five concrete things in order:

  • check the configured data source and stage a small demo download

  • run CellProfiler extraction so the measurement tables are available

  • merge those tables into one single-cell table

  • use pycytominer to aggregate, annotate, normalize, and feature-select the classical profiles

  • write summary tables and PCA views for quick inspection

DeepProfiler is not part of this page.

All output cells below are real recorded outputs from the current repository demo assets and current runtime.

Install

From the repository root:

conda env create -f environment/cellpainting-claw.environment.yml
conda activate cellpainting-claw
pip install -e ".[data-access]"
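
Before moving on, it can help to confirm that the install put the cellpainting-skills command used on the rest of this page onto your PATH. The snippet below is a minimal sketch of such a check; have_cli is a hypothetical helper name, not part of CellPainting-Claw.

```shell
# have_cli is a hypothetical helper: it succeeds if the named command is on PATH.
have_cli() {
  command -v "$1" >/dev/null 2>&1
}

if have_cli cellpainting-skills; then
  echo "cellpainting-skills found"
else
  echo "cellpainting-skills not found; is the conda environment active?"
fi
```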

Repository Root

Run the remaining commands on this page from the repository root.

cd /path/to/CellPainting-Claw

Run Variables

Set the following three variables once in your terminal, from the repository root, before running the commands below.

  • CONFIG points to the demo project config file

  • DATA_ROOT is the directory for the data-access demo outputs

  • RUN_ROOT is the directory for the classical profiling demo outputs

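The later commands reuse these names as $CONFIG, $DATA_ROOT, and $RUN_ROOT. They are plain shell variables, so they last only for the current terminal session; set them again if you open a new terminal. If you prefer, you can replace them with the full paths directly in each command.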

CONFIG=configs/project_config.demo.json
DATA_ROOT=demo/workspace/outputs/quick_start_data
RUN_ROOT=demo/workspace/outputs/quick_start_classical
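
Because every later command depends on these three variables, a quick sanity check can catch an unset variable or a wrong working directory early. The following is a sketch only; check_run_vars is a hypothetical helper and not part of the toolkit.

```shell
# check_run_vars is a hypothetical helper, not part of CellPainting-Claw.
# It fails if a run variable is empty or the config file cannot be found.
check_run_vars() {
  [ -n "${CONFIG:-}" ] || { echo "CONFIG is not set" >&2; return 1; }
  [ -n "${DATA_ROOT:-}" ] || { echo "DATA_ROOT is not set" >&2; return 1; }
  [ -n "${RUN_ROOT:-}" ] || { echo "RUN_ROOT is not set" >&2; return 1; }
  # The config path is relative, so this also catches a wrong working directory.
  [ -f "$CONFIG" ] || { echo "config file not found: $CONFIG" >&2; return 1; }
  echo "run variables ok"
}
```

Calling check_run_vars from the repository root after setting the variables should print "run variables ok"; any other message points at the variable or path to fix.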

Prepare Input Data

This section prepares input data before classical profiling starts.

If your images are still in the Cell Painting Gallery, the usual sequence is:

  1. check which dataset and source are configured

  2. build a download plan for the subset you want

  3. download that subset into local storage

If your input files are already present locally, you can skip this section and move to the CellProfiler steps below.

The commands below were run against the live Cell Painting Gallery with the demo config.

Inspect Configured Sources

cellpainting-skills run \
  --config "$CONFIG" \
  --skill data-inspect-availability \
  --output-dir "$DATA_ROOT/01_inspect"

Default dataset: cpg0016-jump
Gallery datasets discovered: 42
Sources discovered under the default dataset: 14
Required data-access packages were available in the recorded runtime

Files written in $DATA_ROOT/01_inspect:

  • data_access_summary.json

  • pipeline_skill_manifest.json
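
Both files are JSON, so one way to read them without extra tools is Python's built-in json.tool module. The sketch below assumes a Python interpreter is on your PATH (the conda environment should provide one) and makes no assumption about the fields inside the files; show_json is a hypothetical helper name.

```shell
# show_json is a hypothetical helper: pretty-print a JSON file.
# Usage: show_json "$DATA_ROOT/01_inspect/data_access_summary.json"
show_json() {
  # Prefer python3 and fall back to python (an assumption about the local setup).
  py=$(command -v python3 || command -v python) || {
    echo "no python interpreter found" >&2
    return 1
  }
  "$py" -m json.tool "$1"
}
```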

Download Plan

cellpainting-skills run \
  --config "$CONFIG" \
  --skill data-plan-download \
  --dataset-id cpg0016-jump \
  --source-id source_4 \
  --max-files 4 \
  --output-dir "$DATA_ROOT/02_plan"

Resolved dataset: cpg0016-jump
Resolved source: source_4
Resolved Gallery prefix: cpg0016-jump/source_4/
Planned steps: 1
File cap in this example plan: 4

Files written in $DATA_ROOT/02_plan:

  • download_plan.json

  • pipeline_skill_manifest.json

Download A Small Local Input Slice

cellpainting-skills run \
  --config "$CONFIG" \
  --skill data-download \
  --dataset-id cpg0016-jump \
  --source-id source_4 \
  --subprefix workspace/load_data_csv/2021_04_26_Batch1/BR00117035 \
  --output-dir "$DATA_ROOT/03_download_small"

Downloaded prefix: cpg0016-jump/source_4/workspace/load_data_csv/2021_04_26_Batch1/BR00117035/
Matched files: 2
Downloaded files: 2
Downloaded filenames: load_data.csv, load_data_with_illum.csv

Files written in $DATA_ROOT/03_download_small:

  • downloads/download_manifest.json

  • downloads/load_data.csv

  • downloads/load_data_with_illum.csv

This bounded download only demonstrates how remote Gallery data can be staged locally. The classical profiling steps below continue from the repository demo assets, which already include a minimal CellProfiler result set.
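
To see what the staged load_data.csv actually contains, it is often enough to list its column names. The helper below is a small sketch for that; csv_header is a hypothetical name, and the naive comma split is an assumption that would break on quoted header fields containing commas.

```shell
# csv_header is a hypothetical helper: print one column name per line.
# Naive split on commas; a quoted header containing a comma would be split too.
# Usage: csv_header "$DATA_ROOT/03_download_small/downloads/load_data.csv"
csv_header() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}
```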

Measurement Tables

This step makes the CellProfiler measurement tables available for the rest of the classical profiling path.

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cp-extract-measurements \
  --output-dir "$RUN_ROOT/01_measurements"

Demo mode: bundled measurement tables were reused
Exposed tables: Image.csv, Cells.csv, Nuclei.csv
Public skill entrypoint: cp-extract-measurements

Files available in this step:

  • Image.csv

  • Cells.csv

  • Nuclei.csv

  • pipeline_skill_manifest.json

In this public demo checkout, the original profiling backend script is not packaged, so these tables come from the bundled demo backend: for the recorded demo run, this skill reuses the bundled measurement tables instead of rerunning CellProfiler. In a user-owned workspace, the same skill remains the public entrypoint for the measurement stage.

Single-Cell Table

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cp-build-single-cell-table \
  --image-csv-path demo/backend/profiling_backend/outputs/cellprofiler/Image.csv \
  --object-table-path demo/backend/profiling_backend/outputs/cellprofiler/Cells.csv \
  --object-table Cells \
  --output-dir "$RUN_ROOT/02_single_cell"

Single-cell rows written: 4
Columns written: 16
Object table used for the merge: Cells

Files written in $RUN_ROOT/02_single_cell:

  • single_cell.csv.gz

  • pipeline_skill_manifest.json

This step merges the CellProfiler tables into one single-cell feature table. The pycytominer steps below use that merged table as their input.
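
A quick way to confirm the merged table looks as expected is to count its data rows and columns straight from the compressed file. The sketch below does that with gzip and awk; gz_csv_shape is a hypothetical helper, and it assumes a single header line and no quoted fields containing commas. For the recorded demo run, the result should agree with the row and column counts shown above.

```shell
# gz_csv_shape is a hypothetical helper: report data rows and columns of a
# gzip-compressed CSV. Assumes one header line and no commas inside quoted fields.
# Usage: gz_csv_shape "$RUN_ROOT/02_single_cell/single_cell.csv.gz"
gz_csv_shape() {
  gzip -dc "$1" | awk -F',' '
    NR == 1 { cols = NF }
    END { print "rows=" (NR > 0 ? NR - 1 : 0), "cols=" (cols + 0) }
  '
}
```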

Classical Profiles

Aggregate Profiles

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-aggregate-profiles \
  --single-cell-path "$RUN_ROOT/02_single_cell/single_cell.csv.gz" \
  --output-dir "$RUN_ROOT/03_cyto_aggregate"

Aggregated profile rows: 2
Aggregated profile columns: 14

Files written in $RUN_ROOT/03_cyto_aggregate:

  • pycytominer/aggregated.parquet

  • pipeline_skill_manifest.json

Annotate Profiles

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-annotate-profiles \
  --aggregated-path "$RUN_ROOT/03_cyto_aggregate/pycytominer/aggregated.parquet" \
  --output-dir "$RUN_ROOT/04_cyto_annotate"

Annotated profile rows: 2
Annotated profile columns: 17

Files written in $RUN_ROOT/04_cyto_annotate:

  • pycytominer/annotated.parquet

  • pipeline_skill_manifest.json

Normalize Profiles

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-normalize-profiles \
  --annotated-path "$RUN_ROOT/04_cyto_annotate/pycytominer/annotated.parquet" \
  --output-dir "$RUN_ROOT/05_cyto_normalize"

Normalized profile rows: 2
Normalized profile columns: 17

Files written in $RUN_ROOT/05_cyto_normalize:

  • pycytominer/normalized.parquet

  • pipeline_skill_manifest.json

Select Profile Features

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-select-profile-features \
  --normalized-path "$RUN_ROOT/05_cyto_normalize/pycytominer/normalized.parquet" \
  --output-dir "$RUN_ROOT/06_cyto_select"

Feature-selected profile rows: 2
Feature-selected profile columns: 12

Files written in $RUN_ROOT/06_cyto_select:

  • pycytominer/feature_selected.parquet

  • pipeline_skill_manifest.json

These four pycytominer stages turn the merged single-cell measurements into a cleaned well-level profile table: first aggregate, then annotate, normalize, and finally select features.
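
Since each stage reads the file written by the previous one, the four commands above can be chained so that a failure stops the sequence. The function below is a sketch of that idea using only the commands and paths already shown; run_cyto_stages and the CLI override are hypothetical names introduced here, not features of the toolkit.

```shell
# run_cyto_stages and CLI are hypothetical names introduced for this sketch.
# CLI defaults to the real command; set CLI=echo to print the commands instead.
run_cyto_stages() {
  cli=${CLI:-cellpainting-skills}
  # Each stage runs only if the previous one succeeded (&& chaining).
  "$cli" run --config "$CONFIG" --skill cyto-aggregate-profiles \
    --single-cell-path "$RUN_ROOT/02_single_cell/single_cell.csv.gz" \
    --output-dir "$RUN_ROOT/03_cyto_aggregate" &&
  "$cli" run --config "$CONFIG" --skill cyto-annotate-profiles \
    --aggregated-path "$RUN_ROOT/03_cyto_aggregate/pycytominer/aggregated.parquet" \
    --output-dir "$RUN_ROOT/04_cyto_annotate" &&
  "$cli" run --config "$CONFIG" --skill cyto-normalize-profiles \
    --annotated-path "$RUN_ROOT/04_cyto_annotate/pycytominer/annotated.parquet" \
    --output-dir "$RUN_ROOT/05_cyto_normalize" &&
  "$cli" run --config "$CONFIG" --skill cyto-select-profile-features \
    --normalized-path "$RUN_ROOT/05_cyto_normalize/pycytominer/normalized.parquet" \
    --output-dir "$RUN_ROOT/06_cyto_select"
}
```

Running it once with CLI=echo gives a dry run that prints the four commands with the variables expanded, which is a cheap way to check the paths before the real run.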

Summary Outputs

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-summarize-classical-profiles \
  --feature-selected-path "$RUN_ROOT/06_cyto_select/pycytominer/feature_selected.parquet" \
  --output-dir "$RUN_ROOT/07_cyto_summary"

Summary rows represented: 2
Features retained at this stage: 6
Top variable features reported: 6
PCA components written: 2

Files written in $RUN_ROOT/07_cyto_summary:

  • profile_summary.json

  • well_metadata_summary.csv

  • top_variable_features.csv

  • pca_coordinates.csv

  • pca_plot.png

  • pipeline_skill_manifest.json

This final step turns the processed profile table into files that are easier to inspect directly: a compact summary, metadata summaries, top-variable features, and PCA outputs.

Result Files

After this Quick Start, the most useful files to inspect are:

  • data_access_summary.json for the configured source inventory

  • download_plan.json for the resolved Gallery request

  • single_cell.csv.gz for the merged single-cell measurements table

  • aggregated.parquet, annotated.parquet, normalized.parquet, and feature_selected.parquet for the pycytominer stages

  • profile_summary.json, well_metadata_summary.csv, top_variable_features.csv, and pca_plot.png for the final review layer
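
To confirm that a run produced all of these files, the paths from the sections above can be checked in one pass. This is a sketch that assumes the output directories used on this page; check_quick_start_outputs is a hypothetical helper name.

```shell
# check_quick_start_outputs is a hypothetical helper, not part of the toolkit.
# It reports which of the result files listed on this page are missing.
check_quick_start_outputs() {
  missing=0
  for path in \
    "$DATA_ROOT/01_inspect/data_access_summary.json" \
    "$DATA_ROOT/02_plan/download_plan.json" \
    "$RUN_ROOT/02_single_cell/single_cell.csv.gz" \
    "$RUN_ROOT/03_cyto_aggregate/pycytominer/aggregated.parquet" \
    "$RUN_ROOT/04_cyto_annotate/pycytominer/annotated.parquet" \
    "$RUN_ROOT/05_cyto_normalize/pycytominer/normalized.parquet" \
    "$RUN_ROOT/06_cyto_select/pycytominer/feature_selected.parquet" \
    "$RUN_ROOT/07_cyto_summary/profile_summary.json" \
    "$RUN_ROOT/07_cyto_summary/well_metadata_summary.csv" \
    "$RUN_ROOT/07_cyto_summary/top_variable_features.csv" \
    "$RUN_ROOT/07_cyto_summary/pca_plot.png"
  do
    if [ ! -f "$path" ]; then
      echo "missing: $path"
      missing=$((missing + 1))
    fi
  done
  echo "missing files: $missing"
  # Succeed only when every listed file exists.
  [ "$missing" -eq 0 ]
}
```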

Next Pages

Continue with these pages after the first run: