Quick Start¶

This page walks through one standard classical profiling run in CellPainting-Claw, from input data to final profile outputs.

The run shown here does five concrete things in order:

check the configured data source and stage a small demo download
run the CellProfiler extraction skill so the measurement tables are available (the demo reuses bundled tables instead of rerunning CellProfiler)
merge those tables into one single-cell table
use pycytominer to aggregate, annotate, normalize, and feature-select the classical profiles
write summary tables and PCA views for quick inspection

DeepProfiler is not part of this page.

Every output block below was recorded from an actual run against the current repository demo assets and runtime.

Install¶

From the repository root:

conda env create -f environment/cellpainting-claw.environment.yml
conda activate cellpainting-claw
pip install -e ".[data-access]"

Repository Root¶

Run the remaining commands on this page from the repository root.

cd /path/to/CellPainting-Claw

Run Variables¶

Set the following three shell variables once in your terminal, from the repository root, before running the commands below. They last only for the current shell session.

CONFIG points to the demo project config file
DATA_ROOT is the directory for the data-access demo outputs
RUN_ROOT is the directory for the classical profiling demo outputs

The later commands reuse these names as $CONFIG, $DATA_ROOT, and $RUN_ROOT. If you prefer, you can replace them with the full paths directly in each command.

CONFIG=configs/project_config.demo.json
DATA_ROOT=demo/workspace/outputs/quick_start_data
RUN_ROOT=demo/workspace/outputs/quick_start_classical
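
Every later command expands these names, so an empty variable would quietly produce a wrong path such as /01_inspect. The following is a minimal sketch of a pre-flight check, not part of CellPainting-Claw: the function name check_run_vars is hypothetical, and the only assumption is that the three variables are set in the same shell.

```shell
# Hypothetical helper (not part of CellPainting-Claw): confirm that the three
# run variables are set and that the config file exists.
check_run_vars() {
  status=0
  for name in CONFIG DATA_ROOT RUN_ROOT; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named in $name
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      status=1
    fi
  done
  # Only test the config path when CONFIG is non-empty.
  if [ -n "${CONFIG:-}" ] && [ ! -f "$CONFIG" ]; then
    echo "config file not found: $CONFIG" >&2
    status=1
  fi
  return "$status"
}

# Usage, from the repository root:
#   check_run_vars && echo "ready"
```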

Prepare Input Data¶

This section prepares input data before classical profiling starts.

If your images are still in the Cell Painting Gallery, the usual sequence is:

check which dataset and source are configured
build a download plan for the subset you want
download that subset into local storage

If your input files are already present locally, you can skip this section and move to the CellProfiler steps below.

The commands below were run against the live Cell Painting Gallery with the demo config, so counts such as the number of datasets discovered may differ when you run them.

Inspect Configured Sources¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill data-inspect-availability \
  --output-dir "$DATA_ROOT/01_inspect"

Default dataset: cpg0016-jump
Gallery datasets discovered: 42
Sources discovered under the default dataset: 14
Required data-access packages were available in the recorded runtime

Files written in $DATA_ROOT/01_inspect:

data_access_summary.json
pipeline_skill_manifest.json

Download Plan¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill data-plan-download \
  --dataset-id cpg0016-jump \
  --source-id source_4 \
  --max-files 4 \
  --output-dir "$DATA_ROOT/02_plan"

Resolved dataset: cpg0016-jump
Resolved source: source_4
Resolved Gallery prefix: cpg0016-jump/source_4/
Planned steps: 1
File cap in this example plan: 4

Files written in $DATA_ROOT/02_plan:

download_plan.json
pipeline_skill_manifest.json
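
It can be worth reading the plan before downloading anything. The sketch below pretty-prints a JSON output file using only the json.tool module from the Python standard library. The function name show_json is hypothetical, and the sketch assumes python3 is on PATH, as it normally is inside an activated conda environment. It makes no assumption about the fields inside download_plan.json.

```shell
# Hypothetical helper: pretty-print a JSON output file if it exists.
show_json() {
  if [ ! -f "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  python3 -m json.tool "$1"
}

# Usage:
#   show_json "$DATA_ROOT/02_plan/download_plan.json"
```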

Download A Small Local Input Slice¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill data-download \
  --dataset-id cpg0016-jump \
  --source-id source_4 \
  --subprefix workspace/load_data_csv/2021_04_26_Batch1/BR00117035 \
  --output-dir "$DATA_ROOT/03_download_small"

Downloaded prefix: cpg0016-jump/source_4/workspace/load_data_csv/2021_04_26_Batch1/BR00117035/
Matched files: 2
Downloaded files: 2
Downloaded filenames: load_data.csv, load_data_with_illum.csv

Files written in $DATA_ROOT/03_download_small:

downloads/download_manifest.json
downloads/load_data.csv
downloads/load_data_with_illum.csv

This bounded download only demonstrates how remote Gallery data can be staged locally. The classical profiling steps below continue from the repository demo assets, which already include a minimal CellProfiler result set.
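
To see what a staged CSV contains without opening it in full, a small helper can list its column names. This is a sketch with a hypothetical function name, csv_columns. It splits the header on commas naively, so it assumes no column name contains a quoted comma.

```shell
# Hypothetical helper: print the column names of a CSV file, one per line.
# Naive comma split; assumes the header row has no quoted commas.
csv_columns() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}

# Usage:
#   csv_columns "$DATA_ROOT/03_download_small/downloads/load_data.csv"
```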

Measurement Tables¶

This step makes the CellProfiler measurement tables available for the rest of the classical profiling path.

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cp-extract-measurements \
  --output-dir "$RUN_ROOT/01_measurements"

Demo mode: bundled measurement tables were reused
Exposed tables: Image.csv, Cells.csv, Nuclei.csv
Public skill entrypoint: cp-extract-measurements

Files available in this step:

Image.csv
Cells.csv
Nuclei.csv
pipeline_skill_manifest.json

In this public demo checkout, the original profiling backend script is not packaged, so these tables come from the bundled demo backend. For the recorded demo run, this skill therefore reuses the bundled measurement tables instead of rerunning CellProfiler. In a user-owned workspace, the same skill remains the public entrypoint for the measurement stage.

Single-Cell Table¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cp-build-single-cell-table \
  --image-csv-path demo/backend/profiling_backend/outputs/cellprofiler/Image.csv \
  --object-table-path demo/backend/profiling_backend/outputs/cellprofiler/Cells.csv \
  --object-table Cells \
  --output-dir "$RUN_ROOT/02_single_cell"

Single-cell rows written: 4
Columns written: 16
Object table used for the merge: Cells

Files written in $RUN_ROOT/02_single_cell:

single_cell.csv.gz
pipeline_skill_manifest.json

This step merges the CellProfiler tables into one single-cell feature table. The pycytominer steps below use that merged table as their input.
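
A quick way to check the merged table is to count its data rows and columns directly from the compressed file. The sketch below uses only gzip and awk; the function name csv_gz_shape is hypothetical. It should agree with the row and column counts reported above, provided no field contains a quoted comma or an embedded newline.

```shell
# Hypothetical helper: print "<data rows> <columns>" for a gzip-compressed CSV.
# The header row is excluded from the row count.
csv_gz_shape() {
  gzip -dc "$1" | awk -F',' 'NR == 1 { cols = NF } END { print NR - 1, cols }'
}

# Usage:
#   csv_gz_shape "$RUN_ROOT/02_single_cell/single_cell.csv.gz"
```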

Classical Profiles¶

Aggregate Profiles¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-aggregate-profiles \
  --single-cell-path "$RUN_ROOT/02_single_cell/single_cell.csv.gz" \
  --output-dir "$RUN_ROOT/03_cyto_aggregate"

Aggregated profile rows: 2
Aggregated profile columns: 14

Files written in $RUN_ROOT/03_cyto_aggregate:

pycytominer/aggregated.parquet
pipeline_skill_manifest.json

Annotate Profiles¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-annotate-profiles \
  --aggregated-path "$RUN_ROOT/03_cyto_aggregate/pycytominer/aggregated.parquet" \
  --output-dir "$RUN_ROOT/04_cyto_annotate"

Annotated profile rows: 2
Annotated profile columns: 17

Files written in $RUN_ROOT/04_cyto_annotate:

pycytominer/annotated.parquet
pipeline_skill_manifest.json

Normalize Profiles¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-normalize-profiles \
  --annotated-path "$RUN_ROOT/04_cyto_annotate/pycytominer/annotated.parquet" \
  --output-dir "$RUN_ROOT/05_cyto_normalize"

Normalized profile rows: 2
Normalized profile columns: 17

Files written in $RUN_ROOT/05_cyto_normalize:

pycytominer/normalized.parquet
pipeline_skill_manifest.json

Select Profile Features¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-select-profile-features \
  --normalized-path "$RUN_ROOT/05_cyto_normalize/pycytominer/normalized.parquet" \
  --output-dir "$RUN_ROOT/06_cyto_select"

Feature-selected profile rows: 2
Feature-selected profile columns: 12

Files written in $RUN_ROOT/06_cyto_select:

pycytominer/feature_selected.parquet
pipeline_skill_manifest.json

These four pycytominer stages turn the merged single-cell measurements into a cleaned well-level profile table: first aggregate, then annotate, normalize, and finally select features.
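
Because each stage reads the file written by the previous one, the four commands can be chained so that a failure stops the sequence. The sketch below wraps the commands shown above, with the same skills, flags, and paths, in a shell function. The function name run_cyto_stages is hypothetical, and CONFIG and RUN_ROOT are assumed to be set as in Run Variables.

```shell
# Hypothetical wrapper: run the four pycytominer stages in order.
# The && chaining stops at the first stage that exits with a non-zero status.
run_cyto_stages() {
  cellpainting-skills run --config "$CONFIG" \
    --skill cyto-aggregate-profiles \
    --single-cell-path "$RUN_ROOT/02_single_cell/single_cell.csv.gz" \
    --output-dir "$RUN_ROOT/03_cyto_aggregate" &&
  cellpainting-skills run --config "$CONFIG" \
    --skill cyto-annotate-profiles \
    --aggregated-path "$RUN_ROOT/03_cyto_aggregate/pycytominer/aggregated.parquet" \
    --output-dir "$RUN_ROOT/04_cyto_annotate" &&
  cellpainting-skills run --config "$CONFIG" \
    --skill cyto-normalize-profiles \
    --annotated-path "$RUN_ROOT/04_cyto_annotate/pycytominer/annotated.parquet" \
    --output-dir "$RUN_ROOT/05_cyto_normalize" &&
  cellpainting-skills run --config "$CONFIG" \
    --skill cyto-select-profile-features \
    --normalized-path "$RUN_ROOT/05_cyto_normalize/pycytominer/normalized.parquet" \
    --output-dir "$RUN_ROOT/06_cyto_select"
}

# Usage, from the repository root:
#   run_cyto_stages
```

Running the stages one at a time, as on this page, makes it easier to inspect each intermediate file. The wrapper is mainly useful for reruns.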

Summary Outputs¶

cellpainting-skills run \
  --config "$CONFIG" \
  --skill cyto-summarize-classical-profiles \
  --feature-selected-path "$RUN_ROOT/06_cyto_select/pycytominer/feature_selected.parquet" \
  --output-dir "$RUN_ROOT/07_cyto_summary"

Summary rows represented: 2
Features retained at this stage: 6
Top variable features reported: 6
PCA components written: 2

Files written in $RUN_ROOT/07_cyto_summary:

profile_summary.json
well_metadata_summary.csv
top_variable_features.csv
pca_coordinates.csv
pca_plot.png
pipeline_skill_manifest.json

This final step turns the processed profile table into files that are easier to inspect directly: a compact summary, metadata summaries, top-variable features, and PCA outputs.

Result Files¶

After this Quick Start, the most useful files to inspect are:

data_access_summary.json for the configured source inventory
download_plan.json for the resolved Gallery request
single_cell.csv.gz for the merged single-cell measurements table
aggregated.parquet, annotated.parquet, normalized.parquet, and feature_selected.parquet for the pycytominer stages
profile_summary.json, well_metadata_summary.csv, top_variable_features.csv, and pca_plot.png for the final review layer
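
To confirm that a run produced all of these files, a short check can list whichever ones are missing. This is a sketch with a hypothetical function name, list_missing_results. The paths are the ones written by the steps on this page, and DATA_ROOT and RUN_ROOT are assumed to be set as in Run Variables.

```shell
# Hypothetical helper: print every expected result file that is missing.
# Returns 0 when all files are present, 1 otherwise.
list_missing_results() {
  missing=0
  for path in \
    "$DATA_ROOT/01_inspect/data_access_summary.json" \
    "$DATA_ROOT/02_plan/download_plan.json" \
    "$RUN_ROOT/02_single_cell/single_cell.csv.gz" \
    "$RUN_ROOT/03_cyto_aggregate/pycytominer/aggregated.parquet" \
    "$RUN_ROOT/04_cyto_annotate/pycytominer/annotated.parquet" \
    "$RUN_ROOT/05_cyto_normalize/pycytominer/normalized.parquet" \
    "$RUN_ROOT/06_cyto_select/pycytominer/feature_selected.parquet" \
    "$RUN_ROOT/07_cyto_summary/profile_summary.json" \
    "$RUN_ROOT/07_cyto_summary/well_metadata_summary.csv" \
    "$RUN_ROOT/07_cyto_summary/top_variable_features.csv" \
    "$RUN_ROOT/07_cyto_summary/pca_plot.png"
  do
    if [ ! -f "$path" ]; then
      echo "$path"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   list_missing_results && echo "all result files present"
```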

Next Pages¶

Continue with these pages after the first run:

CellPainting-Skills for the full skill catalog
Command-Line Interface for direct CLI usage
OpenClaw for agent-mediated use of the same skills