From fda28b5afa203a387dccc9088cceb379466dc0fe Mon Sep 17 00:00:00 2001
From: Konstantin Fickel
Date: Sat, 21 Feb 2026 11:48:49 +0100
Subject: [PATCH] docs: update README and CLAUDE.md with recent features

- Add regenerate command documentation
- Add download target type
- Fix reference_images field (list, not single string)
- Document archive behavior for clean command
- Update module structure to reflect actual files
- Add OpenAI provider documentation
- Update supported models list
- Add OPENAI_API_KEY to environment variables
---
 CLAUDE.md | 70 ++++++++++++++++++++++++++++++++++++++-----------------
 README.md | 33 +++++++++++++++++++++++++-
 2 files changed, 80 insertions(+), 23 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 3a65f43..a0751ac 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,17 +2,18 @@
 
 ## Project overview
 
-hokusai is a `make`-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets with dependencies; hokusai builds a DAG with networkx and executes generation in parallel topological order using Mistral (text) and BlackForestLabs (images) as providers.
+hokusai is a `make`-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets with dependencies; hokusai builds a DAG with networkx and executes generation in parallel topological order using Mistral, OpenAI (text), and BlackForestLabs, OpenAI (images) as providers.
 
 ## Commands
 
 ```bash
-uv sync                    # install dependencies
-uv run hokusai build       # build all targets
-uv run hokusai build X     # build target X and its transitive deps
-uv run hokusai clean       # remove generated artifacts + state file
-uv run hokusai graph       # print dependency graph with stages
-uv run pytest              # run tests
+uv sync                      # install dependencies
+uv run hokusai build         # build all targets
+uv run hokusai build X       # build target X and its transitive deps
+uv run hokusai regenerate X  # force rebuild X even if up to date
+uv run hokusai clean         # remove generated artifacts + state file (or archive if configured)
+uv run hokusai graph         # print dependency graph with stages
+uv run pytest                # run tests
 ```
 
 ## Code quality
@@ -48,15 +49,23 @@ ruff format --check
 main.py                 # Entry point: imports and runs hokusai.cli.app
 hokusai/
   __init__.py
-  cli.py                # Typer CLI: build, clean, graph commands
+  cli.py                # Typer CLI: build, regenerate, clean, graph, init, models commands
   config.py             # Pydantic models for YAML config
   graph.py              # networkx DAG construction and traversal
   builder.py            # Build orchestrator: incremental + parallel
   state.py              # .hokusai.state.yaml hash tracking
+  archive.py            # Archive helper for preserving previous generations
+  prompt.py             # Prompt resolution and placeholder substitution
+  resolve.py            # Model resolution (target -> provider/model)
   providers/
     __init__.py         # Abstract Provider base class (ABC)
-    image.py            # BlackForestLabs image generation
-    text.py             # Mistral text generation
+    models.py           # ModelInfo and Capability definitions
+    registry.py         # Provider/model registry
+    blackforest.py      # BlackForestLabs FLUX image generation
+    mistral.py          # Mistral text generation
+    openai_text.py      # OpenAI text generation (GPT-4, GPT-5, o3, etc.)
+    openai_image.py     # OpenAI image generation (DALL-E, gpt-image)
+    bfl.py              # Low-level BFL API client
 ```
 
 ### Data flow
@@ -74,10 +83,14 @@ hokusai/
 ### Key design decisions
 
 - **Target type inference**: `.png/.jpg/.jpeg/.webp` = image, `.md/.txt` = text. Defined in `config.py` as `IMAGE_EXTENSIONS` / `TEXT_EXTENSIONS`.
-- **Prompt resolution**: if the `prompt` string is a path to an existing file, its contents are read; otherwise it's used as-is. Done in `builder.py:_resolve_prompt()`.
-- **BFL client is synchronous**: wrapped in `asyncio.to_thread()` in `providers/image.py`. Uses `ClientConfig(sync=True, timeout=300)` for internal polling.
-- **Mistral client is natively async**: uses `complete_async()` directly in `providers/text.py`.
+- **Prompt resolution**: if the `prompt` string is a path to an existing file, its contents are read; otherwise it's used as-is. Supports `{filename}` placeholders. Done in `prompt.py`.
+- **Model resolution**: `resolve.py` maps target config + defaults to a `ModelInfo` with provider, model name, and capabilities.
+- **Download targets**: targets with `download:` URL are fetched via httpx; state tracks the URL for incremental skip.
+- **BFL client is async**: custom async client in `providers/bfl.py` polls for completion.
+- **Mistral client is natively async**: uses `complete_async()` directly.
+- **OpenAI clients are async**: use the official `openai` SDK with async methods.
 - **Incremental builds**: `.hokusai.state.yaml` tracks per-target: input file hashes, prompt hash, model name, and extra params hash. Any change marks the target dirty.
+- **Archiving**: when `archive_folder` is set, previous outputs are moved to `archive/.01.` (incrementing) before rebuild or clean.
 - **Error isolation**: if a target fails, its dependents are marked "Dependency failed" but independent targets continue building.
 - **State saved per-generation**: partial progress survives crashes. At most one generation of work is lost.
 
@@ -91,21 +104,34 @@ The provider writes the result file to `project_dir / target_name`.
 
 ### Image provider specifics (BFL)
 
-- `image_prompt` field: base64-encoded reference image (from `target.reference_image`)
-- `control_image` field: base64-encoded control image (from `target.control_images`)
-- Result image URL is in `result.result["sample"]`, downloaded via httpx
-- Supported models: `flux-dev`, `flux-pro`, `flux-pro-1.1`, `flux-pro-1.1-ultra`, `flux-kontext-pro`, `flux-pro-1.0-canny`, `flux-pro-1.0-depth`, `flux-pro-1.0-fill`, `flux-pro-1.0-expand`
+- Reference images are base64-encoded and passed as `input_image` (flux-2), `image_prompt` (flux-1.x), etc.
+- Control images for canny/depth models use `control_image` field
+- Result image URL is polled and downloaded via httpx
+- Supported models: `flux-dev`, `flux-pro`, `flux-pro-1.1`, `flux-pro-1.1-ultra`, `flux-2-pro`, `flux-kontext-pro`, `flux-pro-1.0-canny`, `flux-pro-1.0-depth`, `flux-pro-1.0-fill`, `flux-pro-1.0-expand`
+
+### Image provider specifics (OpenAI)
+
+- Uses `images.generate` for text-to-image, `images.edit` for image-to-image
+- Reference images passed as raw bytes to the edit endpoint
+- Supported models: `gpt-image-1.5`, `gpt-image-1`, `gpt-image-1-mini`, `dall-e-3`, `dall-e-2`
 
 ### Text provider specifics (Mistral)
 
 - Text input files are appended to the prompt with `--- Contents of ---` headers
-- Image inputs are noted as `[Attached image: ]` (no actual vision/multimodal yet)
+- Image inputs are encoded as data URLs for multimodal models (pixtral)
 - Raw LLM response is written directly to the output file, no post-processing
+- Supported models: `mistral-large-latest`, `mistral-small-latest`, `pixtral-large-latest`, `pixtral-12b-latest`
+
+### Text provider specifics (OpenAI)
+
+- Similar to Mistral: text inputs appended, images encoded as data URLs
+- Supported models: `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `o3`, `o3-mini`, `o3-pro`, `o4-mini`
 
 ## Environment variables
 
-- `MISTRAL_API_KEY` - required for text targets
-- `BFL_API_KEY` - required for image targets
+- `MISTRAL_API_KEY` - required for Mistral text models
+- `BFL_API_KEY` - required for BlackForestLabs FLUX image models
+- `OPENAI_API_KEY` - required for OpenAI text and image models
 
 ## Dependencies
 
@@ -113,7 +139,7 @@ The provider writes the result file to `project_dir / target_name`.
 - `pydantic` - data validation and config models
 - `pyyaml` - YAML parsing
 - `networkx` - dependency graph
-- `blackforest` - BlackForestLabs API client (sync, uses `requests`; no type stubs)
 - `mistralai` - Mistral API client (supports async)
-- `httpx` - async HTTP for downloading BFL result images (transitive via mistralai)
+- `openai` - OpenAI API client (supports async)
+- `httpx` - async HTTP for BFL polling and image downloads
 - `hatchling` - build backend
diff --git a/README.md b/README.md
index 4c090cf..22143f2 100644
--- a/README.md
+++ b/README.md
@@ -85,10 +85,11 @@ defaults:
 | `prompt` | string | Inline prompt text, or path to a prompt file |
 | `model` | string | Override the default model for this target |
 | `inputs` | list[string] | Files this target depends on (other targets or existing files) |
-| `reference_image` | string | Image file for image-to-image generation |
+| `reference_images` | list[string] | Image files for image-to-image generation |
 | `control_images` | list[string] | Control images (for canny/depth models) |
 | `width` | int | Image width in pixels |
 | `height` | int | Image height in pixels |
+| `download` | string | URL to download instead of generating (mutually exclusive with prompt) |
 
 Target type is inferred from the file extension:
 - **Image**: `.png`, `.jpg`, `.jpeg`, `.webp`
@@ -133,6 +134,23 @@ targets:
 
 hokusai resolves dependencies automatically. If you build a single target, its transitive dependencies are included.
 
+### Download targets
+
+Targets can download files from URLs instead of generating them:
+
+```yaml
+targets:
+  reference.jpg:
+    download: https://example.com/image.jpg
+
+  variation.png:
+    prompt: "A variation of this image in watercolor style"
+    reference_images:
+      - reference.jpg
+```
+
+Download targets participate in dependency resolution like any other target. They are skipped if the URL hasn't changed.
+
 ### Archiving previous outputs
 
 Set `archive_folder` at the top level to preserve previous versions of generated files. When a target is rebuilt, the existing output is moved to the archive folder with an incrementing numeric suffix:
@@ -157,10 +175,23 @@ Build all targets, or a specific target and its dependencies.
 - Runs independent targets in parallel
 - Continues building if a target fails (dependents of the failed target are skipped)
 
+### `hokusai regenerate `
+
+Force regeneration of specific targets, ignoring their up-to-date status. Useful for getting a new variation of an AI-generated output without changing the prompt.
+
+```bash
+hokusai regenerate hero.png           # regenerate one target
+hokusai regenerate hero.png logo.png  # regenerate multiple targets
+```
+
+If `archive_folder` is set, the previous versions are archived before regeneration.
+
 ### `hokusai clean`
 
 Remove all generated target files and the build state file (`.hokusai.state.yaml`). Input files are preserved.
 
+If `archive_folder` is set, files are moved to the archive instead of being deleted.
+
 ### `hokusai graph`
 
 Print the dependency graph showing build stages: