# CLAUDE.md - hokusai development guide

## Project overview

hokusai is a `make`-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets with dependencies; hokusai builds a DAG with networkx and executes generation in parallel topological order, using Mistral and OpenAI as text providers and BlackForestLabs and OpenAI as image providers.

## Commands

```bash
uv sync                        # install dependencies
uv run hokusai build           # build all targets
uv run hokusai build X         # build target X and its transitive deps
uv run hokusai regenerate X    # force rebuild X even if up to date
uv run hokusai clean           # remove generated artifacts + state file (or archive if configured)
uv run hokusai graph           # print dependency graph with stages
uv run pytest                  # run tests
```

## Code quality

Pre-commit hooks run automatically on `git commit`:

- **basedpyright** - strict static type checking (config: `pyrightconfig.json` points to `.devenv/state/venv`)
- **ruff check** - linting with auto-fix
- **ruff format** - formatting
- **commitizen** - enforces conventional commit messages (`feat:`, `fix:`, `chore:`, etc.)

Run manually:

```bash
basedpyright
ruff check
ruff format --check
```

## Code style conventions

- **All function signatures must be fully typed.** No `Any` unless truly unavoidable.
- Use `pathlib.Path` everywhere, never `os.path`.
- Use `from __future__ import annotations` in every module.
- Use modern typing: `str | None` (not `Optional[str]`), `Self`, `override`, `Annotated`.
- Pydantic `BaseModel` for data that serializes to/from YAML. `dataclass` for internal-only data structures (e.g. `BuildResult`).
- Errors: raise with the `msg = "..."; raise ValueError(msg)` pattern (ruff `EM` rule compliance).
- Commit messages follow conventional commits (`feat:`, `fix:`, `refactor:`, `chore:`).

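
The conventions above can be illustrated with a small sketch. `ArtifactInfo` and `describe_artifact` are hypothetical names invented for this example and are not part of hokusai; the point is only to show a fully typed signature, `pathlib.Path`, `str | None`-style unions, a `dataclass` for internal data, and the `msg` error pattern together.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class ArtifactInfo:
    """Hypothetical internal-only record (not a real hokusai type)."""

    path: Path
    size_bytes: int | None  # None when the artifact has not been built yet


def describe_artifact(project_dir: Path, name: str) -> ArtifactInfo:
    """Return the location and size of a target file, if it exists."""
    if not name:
        # Assign the message first, then raise, per the convention above.
        msg = "target name must not be empty"
        raise ValueError(msg)
    path = project_dir / name
    size = path.stat().st_size if path.is_file() else None
    return ArtifactInfo(path=path, size_bytes=size)
```

A `dataclass` is used here because the record never round-trips through YAML; data that does would be a Pydantic `BaseModel` instead.
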
## Architecture

### Module structure

```
main.py                 # Entry point: imports and runs hokusai.cli.app
hokusai/
    __init__.py
    cli.py              # Typer CLI: build, regenerate, clean, graph, init, models commands
    config.py           # Pydantic models for YAML config + loop expansion at load time
    expand.py           # Loop variable extraction, substitution, and target expansion
    graph.py            # networkx DAG construction and traversal
    builder.py          # Build orchestrator: incremental + parallel
    state.py            # .hokusai.state.yaml hash tracking
    archive.py          # Archive helper for preserving previous generations
    prompt.py           # Prompt resolution and placeholder substitution
    resolve.py          # Model resolution (target -> provider/model)
    providers/
        __init__.py     # Abstract Provider base class (ABC)
        models.py       # ModelInfo and Capability definitions
        registry.py     # Provider/model registry
        blackforest.py  # BlackForestLabs FLUX image generation
        mistral.py      # Mistral text generation
        openai_text.py  # OpenAI text generation (GPT-4, GPT-5, o3, etc.)
        openai_image.py # OpenAI image generation (DALL-E, gpt-image)
        bfl.py          # Low-level BFL API client
```

### Data flow

1. **cli.py** finds the `*.hokusai.yaml` in cwd, calls `load_config()` from `config.py`
2. **config.py** parses YAML, expands loop templates via `expand.py` (cartesian product), then validates into `ProjectConfig` (pydantic) which contains `Defaults`, `loops`, and `dict[str, TargetConfig]`
3. **graph.py** builds an `nx.DiGraph` from target dependencies. `get_build_order()` uses `nx.topological_generations()` to return parallel batches
4. **builder.py** `run_build()` iterates generations. Per generation:
   - Checks each target for dirtiness via `state.py` (SHA-256 hashes of inputs, prompt, model, extra params)
   - Skips targets whose deps already failed
   - Runs dirty targets concurrently with `asyncio.gather()`
   - Records state after each generation (crash resilience)
5. **providers/** dispatch by `TargetType` (inferred from file extension)

### Key design decisions

- **Target type inference**: `.png/.jpg/.jpeg/.webp` = image, `.md/.txt` = text. Defined in `config.py` as `IMAGE_EXTENSIONS` / `TEXT_EXTENSIONS`.
- **Prompt resolution**: if the `prompt` string is a path to an existing file, its contents are read; otherwise it's used as-is. Supports `{filename}` placeholders. Done in `prompt.py`.
- **Model resolution**: `resolve.py` maps target config + defaults to a `ModelInfo` with provider, model name, and capabilities.
- **Content targets**: targets with `content:` write literal text to the file; no provider needed, no archiving on overwrite. State tracks the content string for incremental skip.
- **Download targets**: targets with `download:` URL are fetched via httpx; state tracks the URL for incremental skip.
- **Loop expansion**: `loops:` defines named lists of values. Targets with `[var]` in their name are expanded via cartesian product at config load time (in `expand.py`). Only variables appearing in the target name trigger expansion. Explicit targets override expanded ones. Escaping: `\[var]` → literal `[var]`. Substitution applies to all string fields (prompt, content, download, inputs, reference_images, control_images). The rest of the system sees only expanded targets.
- **BFL client is async**: custom async client in `providers/bfl.py` polls for completion.
- **Mistral client is natively async**: uses `complete_async()` directly.
- **OpenAI clients are async**: use the official `openai` SDK with async methods.
- **Incremental builds**: `.hokusai.state.yaml` tracks per-target: input file hashes, prompt hash, model name, and extra params hash. Any change marks the target dirty.
- **Archiving**: when `archive_folder` is set, previous outputs are moved to `archive/.01.` (incrementing) before rebuild or clean.
- **Error isolation**: if a target fails, its dependents are marked "Dependency failed" but independent targets continue building.
- **State saved per-generation**: partial progress survives crashes. At most one generation of work is lost.

### Provider interface

All providers implement `hokusai.providers.Provider`:

```python
async def generate(self, target_name, target_config, resolved_prompt, resolved_model, project_dir) -> None
```

The provider writes the result file to `project_dir / target_name`.

### Image provider specifics (BFL)

- Reference images are base64-encoded and passed as `input_image` (flux-2), `image_prompt` (flux-1.x), etc.
- Control images for canny/depth models use `control_image` field
- Result image URL is polled and downloaded via httpx
- Supported models: `flux-dev`, `flux-pro`, `flux-pro-1.1`, `flux-pro-1.1-ultra`, `flux-2-pro`, `flux-kontext-pro`, `flux-pro-1.0-canny`, `flux-pro-1.0-depth`, `flux-pro-1.0-fill`, `flux-pro-1.0-expand`

### Image provider specifics (OpenAI)

- Uses `images.generate` for text-to-image, `images.edit` for image-to-image
- Reference images passed as raw bytes to the edit endpoint
- Supported models: `gpt-image-1.5`, `gpt-image-1`, `gpt-image-1-mini`, `dall-e-3`, `dall-e-2`

### Text provider specifics (Mistral)

- Text input files are appended to the prompt with `--- Contents of ---` headers
- Image inputs are encoded as data URLs for multimodal models (pixtral)
- Raw LLM response is written directly to the output file, no post-processing
- Supported models: `mistral-large-latest`, `mistral-small-latest`, `pixtral-large-latest`, `pixtral-12b-latest`

### Text provider specifics (OpenAI)

- Similar to Mistral: text inputs appended, images encoded as data URLs
- Supported models: `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `o3`, `o3-mini`, `o3-pro`, `o4-mini`

## Environment variables

- `MISTRAL_API_KEY` - required for Mistral text models
- `BFL_API_KEY` - required for BlackForestLabs FLUX image models
- `OPENAI_API_KEY` - required for OpenAI text and image models

## Dependencies

- `typer` - CLI framework
- `pydantic` - data validation and config models
- `pyyaml` - YAML parsing
- `networkx` - dependency graph
- `mistralai` - Mistral API client (supports async)
- `openai` - OpenAI API client (supports async)
- `httpx` - async HTTP for BFL polling and image downloads
- `hatchling` - build backend
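
As a recap of how the pieces fit together, the per-generation build loop described under Data flow (parallel batches, error isolation, state saved per generation) can be sketched roughly as follows. This is not the actual `builder.py` code: it uses the standard library's `graphlib.TopologicalSorter` in place of networkx so that it runs without third-party dependencies, and `fake_generate` and `run_build_sketch` are hypothetical names standing in for a provider call and the real `run_build()`.

```python
from __future__ import annotations

import asyncio
from graphlib import TopologicalSorter


async def fake_generate(name: str, failing: set[str]) -> None:
    """Hypothetical stand-in for a provider call that may fail."""
    await asyncio.sleep(0)
    if name in failing:
        msg = f"generation failed for {name}"
        raise RuntimeError(msg)


async def run_build_sketch(
    deps: dict[str, set[str]], failing: set[str]
) -> dict[str, str]:
    """Build targets one generation at a time, isolating failures."""
    sorter = TopologicalSorter(deps)  # maps each target to its dependencies
    sorter.prepare()
    status: dict[str, str] = {}
    while sorter.is_active():
        # Everything ready now has all dependencies processed: one generation.
        generation = sorted(sorter.get_ready())
        runnable: list[str] = []
        for name in generation:
            if any(status[dep] != "ok" for dep in deps.get(name, set())):
                status[name] = "Dependency failed"
            else:
                runnable.append(name)
        # Run the whole batch concurrently; collect exceptions instead of raising.
        results = await asyncio.gather(
            *(fake_generate(name, failing) for name in runnable),
            return_exceptions=True,
        )
        for name, result in zip(runnable, results):
            status[name] = "failed" if isinstance(result, BaseException) else "ok"
        # A real builder would persist state here, once per generation.
        sorter.done(*generation)
    return status
```

Passing `return_exceptions=True` to `asyncio.gather()` is what lets independent targets in the same batch finish even when a sibling fails; the dirtiness check via `state.py` is omitted from this sketch for brevity.
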