hokusai/CLAUDE.md
Konstantin Fickel 7503672942
feat: add content targets and loop expansion for target templates
Content targets write literal text to files via 'content:' field,
without requiring an AI provider or API keys. They are not archived
when overwritten.

Loop expansion allows defining 'loops:' at the top level with named
lists of values. Targets with [var] in their name are expanded via
cartesian product. Variables are substituted in all string fields.
Explicit targets override expanded ones. Escaping: \[var] -> [var].
Expansion happens at config load time so the rest of the system
(builder, graph, state) sees only expanded targets.
2026-02-21 18:39:13 +01:00


CLAUDE.md - hokusai development guide

Project overview

hokusai is a make-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets and their dependencies; hokusai builds a DAG with networkx and runs generation in topological order, building the targets within each generation in parallel. Text providers are Mistral and OpenAI; image providers are BlackForestLabs and OpenAI.

Commands

uv sync                      # install dependencies
uv run hokusai build         # build all targets
uv run hokusai build X       # build target X and its transitive deps
uv run hokusai regenerate X  # force rebuild X even if up to date
uv run hokusai clean         # remove generated artifacts + state file (or archive if configured)
uv run hokusai graph         # print dependency graph with stages
uv run pytest                # run tests

Code quality

Pre-commit hooks run automatically on git commit:

  • basedpyright - strict static type checking (config: pyrightconfig.json points to .devenv/state/venv)
  • ruff check - linting with auto-fix
  • ruff format - formatting
  • commitizen - enforces conventional commit messages (feat:, fix:, chore:, etc.)

Run manually:

basedpyright
ruff check
ruff format --check

Code style conventions

  • All function signatures must be fully typed. No Any unless truly unavoidable.
  • Use pathlib.Path everywhere, never os.path.
  • Use from __future__ import annotations in every module.
  • Use modern typing: str | None (not Optional[str]), Self, override, Annotated.
  • Pydantic BaseModel for data that serializes to/from YAML. dataclass for internal-only data structures (e.g. BuildResult).
  • Errors: assign the message to a variable, then raise it: msg = "..."; raise ValueError(msg) (satisfies ruff's flake8-errmsg rules, EM101/EM102).
  • Commit messages follow conventional commits (feat:, fix:, refactor:, chore:).

Architecture

Module structure

main.py                  # Entry point: imports and runs hokusai.cli.app
hokusai/
  __init__.py
  cli.py                 # Typer CLI: build, regenerate, clean, graph, init, models commands
  config.py              # Pydantic models for YAML config + loop expansion at load time
  expand.py              # Loop variable extraction, substitution, and target expansion
  graph.py               # networkx DAG construction and traversal
  builder.py             # Build orchestrator: incremental + parallel
  state.py               # .hokusai.state.yaml hash tracking
  archive.py             # Archive helper for preserving previous generations
  prompt.py              # Prompt resolution and placeholder substitution
  resolve.py             # Model resolution (target -> provider/model)
  providers/
    __init__.py           # Abstract Provider base class (ABC)
    models.py             # ModelInfo and Capability definitions
    registry.py           # Provider/model registry
    blackforest.py        # BlackForestLabs FLUX image generation
    mistral.py            # Mistral text generation
    openai_text.py        # OpenAI text generation (GPT-4, GPT-5, o3, etc.)
    openai_image.py       # OpenAI image generation (DALL-E, gpt-image)
    bfl.py                # Low-level BFL API client

Data flow

  1. cli.py finds the *.hokusai.yaml file in the current working directory and calls load_config() from config.py
  2. config.py parses YAML, expands loop templates via expand.py (cartesian product), then validates into ProjectConfig (pydantic) which contains Defaults, loops, and dict[str, TargetConfig]
  3. graph.py builds an nx.DiGraph from target dependencies. get_build_order() uses nx.topological_generations() to return parallel batches
  4. builder.py run_build() iterates generations. Per generation:
    • Checks each target for dirtiness via state.py (SHA-256 hashes of inputs, prompt, model, extra params)
    • Skips targets whose deps already failed
    • Runs dirty targets concurrently with asyncio.gather()
    • Records state after each generation (crash resilience)
  5. Each dirty target is dispatched to a provider in providers/ according to its TargetType (inferred from the output file extension)
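
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code in graph.py or builder.py: it uses the standard library's graphlib as a stand-in for nx.topological_generations(), fake_generate is a hypothetical placeholder for a provider call, and dirtiness checks and state recording are left out.

```python
import asyncio
from graphlib import TopologicalSorter


def build_generations(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group targets into batches whose dependencies all lie in earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    generations: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        generations.append(ready)
        sorter.done(*ready)
    return generations


async def fake_generate(name: str, broken: set[str]) -> None:
    """Hypothetical stand-in for a provider call; fails for names in `broken`."""
    await asyncio.sleep(0)
    if name in broken:
        msg = f"generation failed for {name}"
        raise RuntimeError(msg)


async def run_build(deps: dict[str, set[str]], broken: set[str]) -> dict[str, str]:
    """Build generation by generation; a failure only blocks its dependents."""
    status: dict[str, str] = {}
    for generation in build_generations(deps):
        runnable: list[str] = []
        for name in generation:
            # Skip targets with a dependency that did not build.
            if any(status.get(dep) != "built" for dep in deps.get(name, set())):
                status[name] = "Dependency failed"
            else:
                runnable.append(name)
        # Run the remaining targets of this generation concurrently.
        results = await asyncio.gather(
            *(fake_generate(name, broken) for name in runnable),
            return_exceptions=True,
        )
        for name, result in zip(runnable, results):
            status[name] = "failed" if isinstance(result, BaseException) else "built"
    return status
```

With deps = {"b.md": {"a.txt"}, "c.png": {"b.md"}, "d.txt": set()} and broken = {"b.md"}, the independent d.txt still builds while c.png is reported as "Dependency failed".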

Key design decisions

  • Target type inference: .png/.jpg/.jpeg/.webp = image, .md/.txt = text. Defined in config.py as IMAGE_EXTENSIONS / TEXT_EXTENSIONS.
  • Prompt resolution: if the prompt string is a path to an existing file, its contents are read; otherwise it's used as-is. Supports {filename} placeholders. Done in prompt.py.
  • Model resolution: resolve.py maps target config + defaults to a ModelInfo with provider, model name, and capabilities.
  • Content targets: targets with content: write literal text to the file; no provider needed, no archiving on overwrite. State tracks the content string for incremental skip.
  • Download targets: targets with a download: URL are fetched via httpx; state tracks the URL for incremental skip.
  • Loop expansion: loops: defines named lists of values. Targets with [var] in their name are expanded via cartesian product at config load time (in expand.py). Only variables appearing in the target name trigger expansion. Explicit targets override expanded ones. Escaping: \[var] → literal [var]. Substitution applies to all string fields (prompt, content, download, inputs, reference_images, control_images). The rest of the system sees only expanded targets.
  • BFL client is async: custom async client in providers/bfl.py polls for completion.
  • Mistral client is natively async: uses complete_async() directly.
  • OpenAI clients are async: use the official openai SDK with async methods.
  • Incremental builds: .hokusai.state.yaml tracks per-target: input file hashes, prompt hash, model name, and extra params hash. Any change marks the target dirty.
  • Archiving: when archive_folder is set, previous outputs are moved into it as <name>.01.<ext> (the number increments with each archived version) before rebuild or clean.
  • Error isolation: if a target fails, its dependents are marked "Dependency failed" but independent targets continue building.
  • State saved per-generation: partial progress survives crashes. At most one generation of work is lost.
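
The loop expansion rule described above can be sketched as follows. This is a simplified illustration, not the code in expand.py: the function names, the regular expression, and the use of plain dicts of strings for target fields are all assumptions.

```python
import itertools
import re

# Matches [name] unless it is preceded by a backslash (the escaped form).
VAR_PATTERN = re.compile(r"(?<!\\)\[(\w+)\]")
ESCAPED_PATTERN = re.compile(r"\\\[(\w+)\]")


def substitute(text: str, values: dict[str, str]) -> str:
    """Replace known [var] placeholders, then unescape the backslash form."""

    def repl(match: re.Match[str]) -> str:
        # Unknown variables are left untouched.
        return values.get(match.group(1), match.group(0))

    return ESCAPED_PATTERN.sub(r"[\1]", VAR_PATTERN.sub(repl, text))


def expand_targets(
    loops: dict[str, list[str]],
    targets: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Expand templated targets; explicit targets win on name clashes."""
    expanded: dict[str, dict[str, str]] = {}
    explicit: dict[str, dict[str, str]] = {}
    for name, fields in targets.items():
        # Only loop variables that appear in the target name trigger expansion.
        names = [v for v in dict.fromkeys(VAR_PATTERN.findall(name)) if v in loops]
        if not names:
            explicit[substitute(name, {})] = {
                key: substitute(value, {}) for key, value in fields.items()
            }
            continue
        for combo in itertools.product(*(loops[v] for v in names)):
            values = dict(zip(names, combo))
            expanded[substitute(name, values)] = {
                key: substitute(value, values) for key, value in fields.items()
            }
    return {**expanded, **explicit}
```

With loops animal = [cat, dog] and style = [ink, oil], a target named [animal]-[style].png expands to four targets. An explicit cat-ink.png defined alongside it replaces the generated one.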

Provider interface

All providers implement hokusai.providers.Provider:

async def generate(self, target_name, target_config, resolved_prompt, resolved_model, project_dir) -> None

The provider writes the result file to project_dir / target_name.
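
A minimal sketch of this contract is shown below. The parameter types are assumptions: a plain mapping stands in for TargetConfig and a string for the resolved model. EchoProvider and demo are hypothetical and exist only for illustration.

```python
import asyncio
import tempfile
from abc import ABC, abstractmethod
from collections.abc import Mapping
from pathlib import Path


class Provider(ABC):
    """Sketch of the provider contract; the parameter types are assumptions."""

    @abstractmethod
    async def generate(
        self,
        target_name: str,
        target_config: Mapping[str, str],  # assumption: stands in for TargetConfig
        resolved_prompt: str,
        resolved_model: str,  # assumption: stands in for the resolved model info
        project_dir: Path,
    ) -> None:
        """Produce the artifact and write it to project_dir / target_name."""


class EchoProvider(Provider):
    """Hypothetical provider that writes the resolved prompt back out unchanged."""

    async def generate(
        self,
        target_name: str,
        target_config: Mapping[str, str],
        resolved_prompt: str,
        resolved_model: str,
        project_dir: Path,
    ) -> None:
        output = project_dir / target_name
        output.parent.mkdir(parents=True, exist_ok=True)
        output.write_text(resolved_prompt, encoding="utf-8")


def demo() -> str:
    """Run the hypothetical provider in a temporary project directory."""
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp)
        provider = EchoProvider()
        asyncio.run(
            provider.generate("notes/out.txt", {}, "hello", "echo-1", project_dir)
        )
        return (project_dir / "notes" / "out.txt").read_text(encoding="utf-8")
```

The method returns None; the only observable result is the file on disk, which is what lets the builder treat all providers uniformly.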

Image provider specifics (BFL)

  • Reference images are base64-encoded and passed as input_image (flux-2), image_prompt (flux-1.x), etc.
  • Control images for canny/depth models use control_image field
  • Result image URL is polled and downloaded via httpx
  • Supported models: flux-dev, flux-pro, flux-pro-1.1, flux-pro-1.1-ultra, flux-2-pro, flux-kontext-pro, flux-pro-1.0-canny, flux-pro-1.0-depth, flux-pro-1.0-fill, flux-pro-1.0-expand
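
The encode-then-poll flow can be sketched as below. This is an illustration only: the status strings, the payload shape, and the fetch_status callable are placeholders and do not describe the actual BFL API or the client in providers/bfl.py.

```python
import asyncio
import base64
from collections.abc import Awaitable, Callable


def encode_image(data: bytes) -> str:
    """Base64-encode raw image bytes for inclusion in a JSON request body."""
    return base64.b64encode(data).decode("ascii")


async def poll_until_ready(
    fetch_status: Callable[[], Awaitable[dict[str, str]]],
    interval: float = 0.5,
    max_attempts: int = 120,
) -> str:
    """Call fetch_status until it reports a result URL, or give up."""
    for _ in range(max_attempts):
        payload = await fetch_status()
        # Placeholder payload shape: {"status": ..., "url": ...}.
        if payload.get("status") == "Ready":
            return payload["url"]
        await asyncio.sleep(interval)
    msg = f"result not ready after {max_attempts} attempts"
    raise TimeoutError(msg)
```

In a real client, fetch_status would wrap an httpx request, and the returned URL would then be downloaded to the target path.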

Image provider specifics (OpenAI)

  • Uses images.generate for text-to-image, images.edit for image-to-image
  • Reference images passed as raw bytes to the edit endpoint
  • Supported models: gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-3, dall-e-2

Text provider specifics (Mistral)

  • Text input files are appended to the prompt with --- Contents of <name> --- headers
  • Image inputs are encoded as data URLs for multimodal models (pixtral)
  • Raw LLM response is written directly to the output file, no post-processing
  • Supported models: mistral-large-latest, mistral-small-latest, pixtral-large-latest, pixtral-12b-latest
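
How inputs might be folded into a request can be sketched as follows. The header format comes from the description above; the blank-line separator, the function names, and the MIME type lookup are assumptions.

```python
import base64
import mimetypes


def append_text_inputs(prompt: str, inputs: dict[str, str]) -> str:
    """Append each text input to the prompt under a header naming the file."""
    parts = [prompt]
    for name, text in inputs.items():
        parts.append(f"--- Contents of {name} ---\n{text}")
    # Assumption: sections are separated by a blank line.
    return "\n\n".join(parts)


def image_data_url(name: str, data: bytes) -> str:
    """Encode image bytes as a data URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(name)
    if mime is None:
        msg = f"cannot determine MIME type for {name}"
        raise ValueError(msg)
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For example, append_text_inputs("Summarize.", {"notes.txt": "alpha"}) yields the prompt followed by a "--- Contents of notes.txt ---" section containing "alpha".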

Text provider specifics (OpenAI)

  • Similar to Mistral: text inputs appended, images encoded as data URLs
  • Supported models: gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o3-mini, o3-pro, o4-mini

Environment variables

  • MISTRAL_API_KEY - required for Mistral text models
  • BFL_API_KEY - required for BlackForestLabs FLUX image models
  • OPENAI_API_KEY - required for OpenAI text and image models
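
A fail-fast check for these variables might look like the sketch below. The provider labels and the require_api_key helper are hypothetical; only the variable names come from the list above.

```python
import os
from collections.abc import Mapping

# Hypothetical provider labels mapped to the documented variable names.
REQUIRED_KEYS: dict[str, str] = {
    "mistral": "MISTRAL_API_KEY",
    "blackforest": "BFL_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def require_api_key(provider: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the API key for a provider, or raise if it is missing or empty."""
    variable = REQUIRED_KEYS[provider]
    key = env.get(variable, "")
    if not key:
        msg = f"{variable} is not set; it is required for the {provider} provider"
        raise ValueError(msg)
    return key
```

Passing the environment as a parameter keeps the check testable without touching the real process environment.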

Dependencies

  • typer - CLI framework
  • pydantic - data validation and config models
  • pyyaml - YAML parsing
  • networkx - dependency graph
  • mistralai - Mistral API client (supports async)
  • openai - OpenAI API client (supports async)
  • httpx - async HTTP for BFL polling and image downloads
  • hatchling - build backend