{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quick Start\n", "\n", "This page walks through the **traditional first run** for CellPainting-Claw: one standard classical profiling run from input data to final profile outputs.\n", "\n", "The run performs five steps, in order:\n", "\n", "- check the configured data source and stage a small demo download\n", "- make the CellProfiler measurement tables available (this demo reuses bundled tables instead of rerunning CellProfiler)\n", "- merge those tables into one single-cell table\n", "- use pycytominer to aggregate, annotate, normalize, and feature-select the classical profiles\n", "- write summary tables and PCA views for quick inspection\n", "\n", "DeepProfiler is not covered on this page.\n", "\n", "All output cells below are real outputs recorded with the current repository demo assets and runtime.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install\n", "\n", "From the repository root:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conda env create -f environment/cellpainting-claw.environment.yml\n", "conda activate cellpainting-claw\n", "pip install -e \".[data-access]\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Repository Root\n", "\n", "Run the remaining commands on this page from the repository root.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cd /path/to/CellPainting-Claw\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Variables\n", "\n", "Run these three lines once, in the same terminal session and from the repository root, before the commands below.\n", "\n", "- `CONFIG` points to the demo project config file\n", "- `DATA_ROOT` is the directory for the data-access demo outputs\n", "- `RUN_ROOT` is the directory for the classical profiling demo outputs\n", "\n", "The later commands reuse these names as `$CONFIG`, `$DATA_ROOT`, and 
`$RUN_ROOT`. If you prefer, you can replace them with the full paths directly in each command.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CONFIG=configs/project_config.demo.json\n", "DATA_ROOT=demo/workspace/outputs/quick_start_data\n", "RUN_ROOT=demo/workspace/outputs/quick_start_classical\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare Input Data\n", "\n", "This section prepares input data before classical profiling starts.\n", "\n", "If your images are still in the Cell Painting Gallery, the usual sequence is:\n", "\n", "1. check which dataset and source are configured\n", "2. build a download plan for the subset you want\n", "3. download that subset into local storage\n", "\n", "If your input files are already present locally, you can skip this section and move to the CellProfiler steps below.\n", "\n", "The commands below were run against the live Cell Painting Gallery with the demo config.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect Configured Sources\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Default dataset: cpg0016-jump\n", "Gallery datasets discovered: 42\n", "Sources discovered under the default dataset: 14\n", "Required data-access packages were available in the recorded runtime\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill data-inspect-availability \\\n", " --output-dir \"$DATA_ROOT/01_inspect\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$DATA_ROOT/01_inspect`:\n", "\n", "- `data_access_summary.json`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download Plan\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Resolved dataset: cpg0016-jump\n", "Resolved source: source_4\n", "Resolved Gallery prefix: cpg0016-jump/source_4/\n", "Planned steps: 1\n", "File cap in this example plan: 4\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill data-plan-download \\\n", " --dataset-id cpg0016-jump \\\n", " --source-id source_4 \\\n", " --max-files 4 \\\n", " --output-dir \"$DATA_ROOT/02_plan\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$DATA_ROOT/02_plan`:\n", "\n", "- `download_plan.json`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download A Small Local Input Slice\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloaded prefix: cpg0016-jump/source_4/workspace/load_data_csv/2021_04_26_Batch1/BR00117035/\n", "Matched files: 2\n", "Downloaded files: 2\n", "Downloaded filenames: load_data.csv, load_data_with_illum.csv\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill data-download \\\n", " --dataset-id cpg0016-jump \\\n", " --source-id source_4 \\\n", " --subprefix workspace/load_data_csv/2021_04_26_Batch1/BR00117035 \\\n", " --output-dir \"$DATA_ROOT/03_download_small\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$DATA_ROOT/03_download_small`:\n", "\n", "- `downloads/download_manifest.json`\n", "- `downloads/load_data.csv`\n", "- `downloads/load_data_with_illum.csv`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This bounded download only demonstrates how remote Gallery data can be staged locally. 
The classical profiling steps below continue from the repository demo assets, which already include a minimal CellProfiler result set.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Measurement Tables\n", "\n", "This step makes the CellProfiler measurement tables available for the rest of the classical profiling path.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Demo mode: bundled measurement tables were reused\n", "Exposed tables: Image.csv, Cells.csv, Nuclei.csv\n", "Public skill entrypoint: cp-extract-measurements\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill cp-extract-measurements \\\n", " --output-dir \"$RUN_ROOT/01_measurements\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files available after this step:\n", "\n", "- `Image.csv`\n", "- `Cells.csv`\n", "- `Nuclei.csv`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The public demo checkout does not package the original profiling backend script, so in the recorded demo run this skill reuses the bundled measurement tables from the demo backend (`demo/backend/profiling_backend/outputs/cellprofiler/`) instead of rerunning CellProfiler. 
In your own workspace, the same skill remains the public entrypoint for the measurement stage.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Single-Cell Table\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Single-cell rows written: 4\n", "Columns written: 16\n", "Object table used for the merge: Cells\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill cp-build-single-cell-table \\\n", " --image-csv-path demo/backend/profiling_backend/outputs/cellprofiler/Image.csv \\\n", " --object-table-path demo/backend/profiling_backend/outputs/cellprofiler/Cells.csv \\\n", " --object-table Cells \\\n", " --output-dir \"$RUN_ROOT/02_single_cell\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$RUN_ROOT/02_single_cell`:\n", "\n", "- `single_cell.csv.gz`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This step joins the image-level table (`Image.csv`) with the selected object table (`Cells.csv`, chosen with `--object-table Cells`) into one single-cell feature table. 
The pycytominer steps below use that merged table as their input.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classical Profiles\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregate Profiles\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Aggregated profile rows: 2\n", "Aggregated profile columns: 14\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill cyto-aggregate-profiles \\\n", " --single-cell-path \"$RUN_ROOT/02_single_cell/single_cell.csv.gz\" \\\n", " --output-dir \"$RUN_ROOT/03_cyto_aggregate\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$RUN_ROOT/03_cyto_aggregate`:\n", "\n", "- `pycytominer/aggregated.parquet`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Annotate Profiles\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Annotated profile rows: 2\n", "Annotated profile columns: 17\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill cyto-annotate-profiles \\\n", " --aggregated-path \"$RUN_ROOT/03_cyto_aggregate/pycytominer/aggregated.parquet\" \\\n", " --output-dir \"$RUN_ROOT/04_cyto_annotate\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$RUN_ROOT/04_cyto_annotate`:\n", "\n", "- `pycytominer/annotated.parquet`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Normalize Profiles\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Normalized profile rows: 2\n", "Normalized profile columns: 17\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" 
\\\n", " --skill cyto-normalize-profiles \\\n", " --annotated-path \"$RUN_ROOT/04_cyto_annotate/pycytominer/annotated.parquet\" \\\n", " --output-dir \"$RUN_ROOT/05_cyto_normalize\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$RUN_ROOT/05_cyto_normalize`:\n", "\n", "- `pycytominer/normalized.parquet`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select Profile Features\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Feature-selected profile rows: 2\n", "Feature-selected profile columns: 12\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill cyto-select-profile-features \\\n", " --normalized-path \"$RUN_ROOT/05_cyto_normalize/pycytominer/normalized.parquet\" \\\n", " --output-dir \"$RUN_ROOT/06_cyto_select\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$RUN_ROOT/06_cyto_select`:\n", "\n", "- `pycytominer/feature_selected.parquet`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These four pycytominer stages turn the merged single-cell measurements into a cleaned well-level profile table: first aggregate, then annotate, normalize, and finally select features.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary Outputs\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Summary rows represented: 2\n", "Features retained at this stage: 6\n", "Top variable features reported: 6\n", "PCA components written: 2\n" ] } ], "source": [ "cellpainting-skills run \\\n", " --config \"$CONFIG\" \\\n", " --skill cyto-summarize-classical-profiles \\\n", " --feature-selected-path \"$RUN_ROOT/06_cyto_select/pycytominer/feature_selected.parquet\" \\\n", " 
--output-dir \"$RUN_ROOT/07_cyto_summary\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files written in `$RUN_ROOT/07_cyto_summary`:\n", "\n", "- `profile_summary.json`\n", "- `well_metadata_summary.csv`\n", "- `top_variable_features.csv`\n", "- `pca_coordinates.csv`\n", "- `pca_plot.png`\n", "- `pipeline_skill_manifest.json`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This final step turns the processed profile table into files that are easier to inspect directly: a compact summary, metadata summaries, top-variable features, and PCA outputs.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Result Files\n", "\n", "After this Quick Start, the most useful files to inspect are:\n", "\n", "- `data_access_summary.json` for the configured source inventory\n", "- `download_plan.json` for the resolved Gallery request\n", "- `single_cell.csv.gz` for the merged single-cell measurements table\n", "- `aggregated.parquet`, `annotated.parquet`, `normalized.parquet`, and `feature_selected.parquet` for the pycytominer stages\n", "- `profile_summary.json`, `well_metadata_summary.csv`, `top_variable_features.csv`, and `pca_plot.png` for the final review layer\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next Pages\n", "\n", "Continue with these pages after the first run:\n", "\n", "- [CellPainting-Skills](../skills/index.md) for the full skill catalog\n", "- [Command-Line Interface](../cli/index.md) for direct CLI usage\n", "- [OpenClaw](../openclaw/index.md) for agent-mediated use of the same skills\n" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "file_extension": ".sh", "mimetype": "application/x-sh", "name": "bash", "pygments_lexer": "bash" } }, "nbformat": 4, "nbformat_minor": 5 }