kfickel/hokusai


Konstantin Fickel 7503672942


feat: add content targets and loop expansion for target templates

Content targets write literal text to files via 'content:' field,
without requiring an AI provider or API keys. They are not archived
when overwritten.

Loop expansion allows defining 'loops:' at the top level with named
lists of values. Targets with [var] in their name are expanded via
cartesian product. Variables are substituted in all string fields.
Explicit targets override expanded ones. Escaping: \[var] -> [var].
Expansion happens at config load time so the rest of the system
(builder, graph, state) sees only expanded targets.

2026-02-21 18:39:13 +01:00


CLAUDE.md - hokusai development guide

Project overview

hokusai is a make-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets with dependencies; hokusai builds a DAG with networkx and executes generation in parallel topological order, using Mistral and OpenAI as text providers and BlackForestLabs and OpenAI as image providers.

Commands

uv sync                      # install dependencies
uv run hokusai build         # build all targets
uv run hokusai build X       # build target X and its transitive deps
uv run hokusai regenerate X  # force rebuild X even if up to date
uv run hokusai clean         # remove generated artifacts + state file (artifacts are archived instead if archive_folder is set)
uv run hokusai graph         # print dependency graph with stages
uv run pytest                # run tests

Code quality

Pre-commit hooks run automatically on git commit:

basedpyright - strict static type checking (config: pyrightconfig.json points to .devenv/state/venv)
ruff check - linting with auto-fix
ruff format - formatting
commitizen - enforces conventional commit messages (feat:, fix:, chore:, etc.)

Run manually:

basedpyright
ruff check
ruff format --check

Code style conventions

All function signatures must be fully typed. No Any unless truly unavoidable.
Use pathlib.Path everywhere, never os.path.
Use from __future__ import annotations in every module.
Use modern typing: str | None (not Optional[str]), Self, override, Annotated.
Pydantic BaseModel for data that serializes to/from YAML. dataclass for internal-only data structures (e.g. BuildResult).
Errors: assign the message to a variable first, then raise it: msg = "..."; raise ValueError(msg) (ruff EM101/EM102 compliance).
Commit messages follow conventional commits (feat:, fix:, refactor:, chore:).
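A short sketch of a module that follows these conventions, borrowing the extension-based type inference described under Key design decisions. The TargetType enum values, the BuildResult fields, and case-insensitive suffix matching are assumptions for illustration; the real definitions live in config.py and the builder and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class TargetType(Enum):
    # Hypothetical values; config.py defines the real TargetType.
    IMAGE = "image"
    TEXT = "text"


IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".webp"})
TEXT_EXTENSIONS = frozenset({".md", ".txt"})


@dataclass
class BuildResult:
    # Hypothetical fields. Internal-only data, so a dataclass rather than a BaseModel.
    target_name: str
    success: bool
    error: str | None = None


def infer_target_type(target_name: str) -> TargetType:
    """Infer the target type from the file extension (pathlib, never os.path)."""
    suffix = Path(target_name).suffix.lower()  # assumption: matching is case-insensitive
    if suffix in IMAGE_EXTENSIONS:
        return TargetType.IMAGE
    if suffix in TEXT_EXTENSIONS:
        return TargetType.TEXT
    # Assign the message first, then raise it.
    msg = f"Cannot infer target type from extension {suffix!r} of {target_name!r}"
    raise ValueError(msg)
```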

Architecture

Module structure

main.py                  # Entry point: imports and runs hokusai.cli.app
hokusai/
  __init__.py
  cli.py                 # Typer CLI: build, regenerate, clean, graph, init, models commands
  config.py              # Pydantic models for YAML config + loop expansion at load time
  expand.py              # Loop variable extraction, substitution, and target expansion
  graph.py               # networkx DAG construction and traversal
  builder.py             # Build orchestrator: incremental + parallel
  state.py               # .hokusai.state.yaml hash tracking
  archive.py             # Archive helper for preserving previous generations
  prompt.py              # Prompt resolution and placeholder substitution
  resolve.py             # Model resolution (target -> provider/model)
  providers/
    __init__.py           # Abstract Provider base class (ABC)
    models.py             # ModelInfo and Capability definitions
    registry.py           # Provider/model registry
    blackforest.py        # BlackForestLabs FLUX image generation
    mistral.py            # Mistral text generation
    openai_text.py        # OpenAI text generation (GPT-4, GPT-5, o3, etc.)
    openai_image.py       # OpenAI image generation (DALL-E, gpt-image)
    bfl.py                # Low-level BFL API client

Data flow

cli.py finds the *.hokusai.yaml in cwd, calls load_config() from config.py
config.py parses YAML, expands loop templates via expand.py (cartesian product), then validates into ProjectConfig (pydantic) which contains Defaults, loops, and dict[str, TargetConfig]
graph.py builds an nx.DiGraph from target dependencies. get_build_order() uses nx.topological_generations() to return parallel batches
builder.py run_build() iterates generations. Per generation:
- Checks each target for dirtiness via state.py (SHA-256 hashes of inputs, prompt, model, extra params)
- Skips targets whose deps already failed
- Runs dirty targets concurrently with asyncio.gather()
- Records state after each generation (crash resilience)
Each target is dispatched to a provider in providers/ according to its TargetType (inferred from the file extension)
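The generation loop above can be sketched with networkx and asyncio. This is a simplified illustration, not hokusai's builder: targets are given as a plain deps mapping (every dependency is assumed to be another key of that mapping), build_one is a hypothetical coroutine standing in for a provider call, the "ok"/"failed" status strings are made up, and dirtiness checks and state recording are omitted.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

import networkx as nx


def build_graph(deps: dict[str, list[str]]) -> nx.DiGraph:
    """Add an edge dep -> target so dependencies sort before their dependents."""
    graph = nx.DiGraph()
    for target, target_deps in deps.items():
        graph.add_node(target)
        for dep in target_deps:
            graph.add_edge(dep, target)
    return graph


async def run_build(
    deps: dict[str, list[str]],
    build_one: Callable[[str], Awaitable[None]],
) -> dict[str, str]:
    """Build targets generation by generation and return a status per target."""
    graph = build_graph(deps)
    status: dict[str, str] = {}
    for generation in nx.topological_generations(graph):
        runnable: list[str] = []
        for name in generation:
            # A target runs only if every dependency built successfully.
            if all(status.get(dep) == "ok" for dep in graph.predecessors(name)):
                runnable.append(name)
            else:
                status[name] = "Dependency failed"
        # Targets in one generation are independent, so they run concurrently.
        # return_exceptions=True keeps one failure from cancelling its siblings.
        results = await asyncio.gather(
            *(build_one(name) for name in runnable), return_exceptions=True
        )
        for name, result in zip(runnable, results):
            status[name] = "failed" if isinstance(result, BaseException) else "ok"
        # The real builder records state at this point, once per generation.
    return status
```

Collecting exceptions per generation, rather than letting the first one propagate, is what lets independent targets keep building after a sibling fails.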

Key design decisions

Target type inference: .png/.jpg/.jpeg/.webp = image, .md/.txt = text. Defined in config.py as IMAGE_EXTENSIONS / TEXT_EXTENSIONS.
Prompt resolution: if the prompt string is a path to an existing file, its contents are read; otherwise it's used as-is. Supports {filename} placeholders. Done in prompt.py.
Model resolution: resolve.py maps target config + defaults to a ModelInfo with provider, model name, and capabilities.
Content targets: targets with content: write literal text to the file; no provider needed, no archiving on overwrite. State tracks the content string for incremental skip.
Download targets: targets with download: URL are fetched via httpx; state tracks the URL for incremental skip.
Loop expansion: loops: defines named lists of values. Targets with [var] in their name are expanded via cartesian product at config load time (in expand.py). Only variables appearing in the target name trigger expansion. Explicit targets override expanded ones. Escaping: \[var] → literal [var]. Substitution applies to all string fields (prompt, content, download, inputs, reference_images, control_images). The rest of the system sees only expanded targets.
BFL client is async: custom async client in providers/bfl.py polls for completion.
Mistral client is natively async: uses complete_async() directly.
OpenAI clients are async: use the official openai SDK with async methods.
Incremental builds: .hokusai.state.yaml tracks per-target: input file hashes, prompt hash, model name, and extra params hash. Any change marks the target dirty.
Archiving: when archive_folder is set, previous outputs are moved to archive/<name>.01.<ext> (incrementing) before rebuild or clean.
Error isolation: if a target fails, its dependents are marked "Dependency failed" but independent targets continue building.
State saved per-generation: partial progress survives crashes. At most one generation of work is lost.
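The loop expansion rules above can be sketched as a pure function over raw config data. This assumes targets are plain dicts of string and list-of-string fields (before pydantic validation); the function names and regexes are hypothetical, and the real logic in expand.py may differ, for example in how undefined variables or escapes in explicit targets are handled.

```python
from __future__ import annotations

import itertools
import re

# [var] not preceded by a backslash; a backslash-prefixed [var] is an escaped literal.
VAR_RE = re.compile(r"(?<!\\)\[(\w+)\]")
ESCAPED_RE = re.compile(r"\\\[(\w+)\]")


def substitute(text: str, values: dict[str, str]) -> str:
    # Replace unescaped [var] for variables in `values`, then unescape the escaped ones.
    replaced = VAR_RE.sub(lambda m: values.get(m.group(1), m.group(0)), text)
    return ESCAPED_RE.sub(r"[\1]", replaced)


def substitute_fields(config: dict[str, object], values: dict[str, str]) -> dict[str, object]:
    """Substitute in every string field and every string inside list fields."""
    result: dict[str, object] = {}
    for key, field in config.items():
        if isinstance(field, str):
            result[key] = substitute(field, values)
        elif isinstance(field, list):
            result[key] = [substitute(v, values) if isinstance(v, str) else v for v in field]
        else:
            result[key] = field
    return result


def expand_targets(
    targets: dict[str, dict[str, object]],
    loops: dict[str, list[str]],
) -> dict[str, dict[str, object]]:
    expanded: dict[str, dict[str, object]] = {}
    explicit: dict[str, dict[str, object]] = {}
    for name, config in targets.items():
        # Only loop variables that appear in the target name trigger expansion.
        loop_vars = list(dict.fromkeys(v for v in VAR_RE.findall(name) if v in loops))
        if not loop_vars:
            explicit[substitute(name, {})] = substitute_fields(config, {})
            continue
        # Cartesian product over the values of every variable in the name.
        for combo in itertools.product(*(loops[v] for v in loop_vars)):
            values = dict(zip(loop_vars, combo))
            expanded[substitute(name, values)] = substitute_fields(config, values)
    # Explicit targets win over expanded targets with the same name.
    return {**expanded, **explicit}
```

For example, with loops color: [red, blue] and size: [s, l], a template named cat_[color]_[size].png yields four targets, and an explicitly defined cat_red_s.png replaces the generated one.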

Provider interface

All providers implement hokusai.providers.Provider:

async def generate(self, target_name, target_config, resolved_prompt, resolved_model, project_dir) -> None

The provider writes the result file to project_dir / target_name.
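The interface can be sketched as an abstract base class. The annotations here are assumptions: target_config and resolved_model are typed as object because their concrete types are not given above, and EchoProvider is a hypothetical toy subclass, not one of hokusai's providers.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class Provider(ABC):
    @abstractmethod
    async def generate(
        self,
        target_name: str,
        target_config: object,
        resolved_prompt: str,
        resolved_model: object,
        project_dir: Path,
    ) -> None:
        """Generate the artifact and write it to project_dir / target_name."""


class EchoProvider(Provider):
    """Hypothetical provider that writes the resolved prompt verbatim."""

    async def generate(
        self,
        target_name: str,
        target_config: object,
        resolved_prompt: str,
        resolved_model: object,
        project_dir: Path,
    ) -> None:
        # A real provider would await an API call here before writing the result.
        output = project_dir / target_name
        output.parent.mkdir(parents=True, exist_ok=True)
        output.write_text(resolved_prompt, encoding="utf-8")
```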

Image provider specifics (BFL)

Reference images are base64-encoded and passed as input_image (flux-2), image_prompt (flux-1.x), etc.
Control images for canny/depth models use control_image field
Result image URL is polled and downloaded via httpx
Supported models: flux-dev, flux-pro, flux-pro-1.1, flux-pro-1.1-ultra, flux-2-pro, flux-kontext-pro, flux-pro-1.0-canny, flux-pro-1.0-depth, flux-pro-1.0-fill, flux-pro-1.0-expand

Image provider specifics (OpenAI)

Uses images.generate for text-to-image, images.edit for image-to-image
Reference images passed as raw bytes to the edit endpoint
Supported models: gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-3, dall-e-2

Text provider specifics (Mistral)

Text input files are appended to the prompt with --- Contents of <name> --- headers
Image inputs are encoded as data URLs for multimodal models (pixtral)
Raw LLM response is written directly to the output file, no post-processing
Supported models: mistral-large-latest, mistral-small-latest, pixtral-large-latest, pixtral-12b-latest
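A minimal sketch of how inputs might be folded into a request. build_prompt and image_data_url are hypothetical helpers; only the "--- Contents of <name> ---" header format comes from the description above, and the blank-line spacing between parts is an assumption.

```python
from __future__ import annotations

import base64
import mimetypes
from pathlib import Path


def build_prompt(prompt: str, text_inputs: list[Path]) -> str:
    """Append each text input under a '--- Contents of <name> ---' header."""
    parts = [prompt]
    for path in text_inputs:
        parts.append(f"--- Contents of {path.name} ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def image_data_url(path: Path) -> str:
    """Encode an image file as a base64 data URL for multimodal models."""
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        msg = f"Not a recognized image file: {path}"
        raise ValueError(msg)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```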

Text provider specifics (OpenAI)

Similar to Mistral: text inputs appended, images encoded as data URLs
Supported models: gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o3-mini, o3-pro, o4-mini

Environment variables

MISTRAL_API_KEY - required for Mistral text models
BFL_API_KEY - required for BlackForestLabs FLUX image models
OPENAI_API_KEY - required for OpenAI text and image models
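A small sketch of a fail-fast key lookup. The provider labels and require_api_key are hypothetical; only the variable names come from the list above.

```python
from __future__ import annotations

import os

# Hypothetical provider labels mapped to the documented environment variables.
API_KEY_VARS: dict[str, str] = {
    "mistral": "MISTRAL_API_KEY",
    "blackforest": "BFL_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def require_api_key(provider: str) -> str:
    """Return the API key for a provider, or raise a readable error if it is unset."""
    var = API_KEY_VARS[provider]
    key = os.environ.get(var)
    if not key:
        msg = f"{var} is not set; it is required for {provider} models"
        raise ValueError(msg)
    return key
```

Looking the key up only when a provider target is actually built keeps content and download targets usable without any keys.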

Dependencies

typer - CLI framework
pydantic - data validation and config models
pyyaml - YAML parsing
networkx - dependency graph
mistralai - Mistral API client (supports async)
openai - OpenAI API client (supports async)
httpx - async HTTP for BFL polling and image downloads
hatchling - build backend
