docs: add CLAUDE.md development guide and README.md user docs

CLAUDE.md covers architecture, data flow, code style conventions, provider specifics, and all commands needed for development. README.md covers installation, quick start, full config format reference, CLI usage, incremental builds, and environment variables.
2026-02-13 20:18:51 +01:00 · 2026-02-13 20:18:51 +01:00 · f71af1cfaf
commit f71af1cfaf
parent d38682597c
2 changed files with 288 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,118 @@
 # CLAUDE.md - bulkgen development guide
 ## Project overview
 bulkgen is a `make`-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets and their dependencies; bulkgen builds a DAG with networkx and generates targets in topological order, running independent targets in parallel. Mistral is the text provider and BlackForestLabs is the image provider.
 ## Commands
 ```bash
 uv sync                  # install dependencies
 uv run bulkgen build     # build all targets
 uv run bulkgen build X   # build target X and its transitive deps
 uv run bulkgen clean     # remove generated artifacts + state file
 uv run bulkgen graph     # print dependency graph with stages
 uv run pytest            # run tests
 ```
 ## Code quality
 Pre-commit hooks run automatically on `git commit`:
 - **pyright** - static type checking (config: `pyrightconfig.json` points to `.devenv/state/venv`)
 - **ruff check** - linting with auto-fix
 - **ruff format** - formatting
 - **commitizen** - enforces conventional commit messages (`feat:`, `fix:`, `chore:`, etc.)
 Run manually:
 ```bash
 /nix/store/h7f5vym2ykpl7ls8icw0wiqgmv9xiwnx-pyright-1.1.407/bin/pyright
 /nix/store/xmy9vff4zlbvkz3y830085dzgjpmaj8d-ruff-0.14.14/bin/ruff check
 /nix/store/xmy9vff4zlbvkz3y830085dzgjpmaj8d-ruff-0.14.14/bin/ruff format --check
 ```
 ## Code style conventions
 - **All function signatures must be fully typed.** No `Any` unless truly unavoidable.
 - Use `pathlib.Path` everywhere, never `os.path`.
 - Use `from __future__ import annotations` in every module.
 - Use modern typing: `str | None` (not `Optional[str]`), `Self`, `override`, `Annotated`.
 - Pydantic `BaseModel` for data that serializes to/from YAML. `dataclass` for internal-only data structures (e.g. `BuildResult`).
 - Errors: assign the message to a variable first, then raise it: `msg = "..."; raise ValueError(msg)` (satisfies ruff's flake8-errmsg `EM` rules).
 - Commit messages follow conventional commits (`feat:`, `fix:`, `refactor:`, `chore:`).
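As a rough illustration of several conventions above in one place, here is a minimal sketch. `BuildSummary` and `summarize` are hypothetical names invented for this example and are not part of bulkgen.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class BuildSummary:
    """Hypothetical internal-only record, so a dataclass rather than a BaseModel."""

    target: str
    size_bytes: int


def summarize(path: Path, max_bytes: int | None = None) -> BuildSummary:
    """Describe an existing file, enforcing an optional size cap."""
    if not path.is_file():
        # Message is bound to a variable before raising.
        msg = f"Not a file: {path}"
        raise ValueError(msg)
    size = path.stat().st_size
    if max_bytes is not None and size > max_bytes:
        msg = f"{path.name} is {size} bytes, limit is {max_bytes}"
        raise ValueError(msg)
    return BuildSummary(target=path.name, size_bytes=size)
```

The signature is fully typed, uses `int | None` instead of `Optional[int]`, and takes a `pathlib.Path` rather than a string path.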
 ## Architecture
 ### Module structure
 ```
 main.py                  # Entry point: imports and runs bulkgen.cli.app
 bulkgen/
  __init__.py
  cli.py                 # Typer CLI: build, clean, graph commands
  config.py              # Pydantic models for YAML config
  graph.py               # networkx DAG construction and traversal
  builder.py             # Build orchestrator: incremental + parallel
  state.py               # .bulkgen.state.yaml hash tracking
  providers/
    __init__.py           # Abstract Provider base class (ABC)
    image.py              # BlackForestLabs image generation
    text.py               # Mistral text generation
 ```
 ### Data flow
 1. **cli.py** finds the `*.bulkgen.yaml` in cwd, calls `load_config()` from `config.py`
 2. **config.py** parses YAML into `ProjectConfig` (pydantic), which contains `Defaults` and `dict[str, TargetConfig]`
 3. **graph.py** builds an `nx.DiGraph` from target dependencies. `get_build_order()` uses `nx.topological_generations()` to return parallel batches
 4. **builder.py** `run_build()` iterates generations. Per generation:
   - Checks each target for dirtiness via `state.py` (SHA-256 hashes of inputs, prompt, model, extra params)
   - Skips targets whose deps already failed
   - Runs dirty targets concurrently with `asyncio.gather()`
   - Records state after each generation (crash resilience)
 5. **providers/** dispatch by `TargetType` (inferred from file extension)
 ### Key design decisions
 - **Target type inference**: `.png/.jpg/.jpeg/.webp` = image, `.md/.txt` = text. Defined in `config.py` as `IMAGE_EXTENSIONS` / `TEXT_EXTENSIONS`.
 - **Prompt resolution**: if the `prompt` string is a path to an existing file, its contents are read; otherwise it's used as-is. Done in `builder.py:_resolve_prompt()`.
 - **BFL client is synchronous**: wrapped in `asyncio.to_thread()` in `providers/image.py`. Uses `ClientConfig(sync=True, timeout=300)` for internal polling.
 - **Mistral client is natively async**: uses `complete_async()` directly in `providers/text.py`.
 - **Incremental builds**: `.bulkgen.state.yaml` tracks per-target: input file hashes, prompt hash, model name, and extra params hash. Any change marks the target dirty.
 - **Error isolation**: if a target fails, its dependents are marked "Dependency failed" but independent targets continue building.
 - **State saved per-generation**: partial progress survives crashes. At most one generation of work is lost.
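The first two decisions above, target type inference and prompt resolution, can be illustrated with a minimal sketch. The constant names come from the text; the function names, the `TargetType` members, and the choice to resolve prompt paths relative to the project directory are assumptions for this example rather than the actual `config.py` / `builder.py` code.

```python
from __future__ import annotations

from enum import Enum
from pathlib import Path

IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".webp"})
TEXT_EXTENSIONS = frozenset({".md", ".txt"})


class TargetType(Enum):
    IMAGE = "image"
    TEXT = "text"


def infer_target_type(target_name: str) -> TargetType:
    """Map a target's file extension to its type."""
    suffix = Path(target_name).suffix
    if suffix in IMAGE_EXTENSIONS:
        return TargetType.IMAGE
    if suffix in TEXT_EXTENSIONS:
        return TargetType.TEXT
    msg = f"Cannot infer target type for {target_name!r}"
    raise ValueError(msg)


def resolve_prompt(prompt: str, project_dir: Path) -> str:
    """Read the prompt from a file if it names one, else return it unchanged."""
    try:
        # Assumption: prompt paths are relative to the project directory.
        candidate = project_dir / prompt
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # Long or unusual inline prompts may not be valid paths at all.
        pass
    return prompt
```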
 ### Provider interface
 All providers implement `bulkgen.providers.Provider`:
 ```python
 async def generate(self, target_name, target_config, resolved_prompt, resolved_model, project_dir) -> None
 ```
 The provider writes the result file to `project_dir / target_name`.
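A typed version of this interface might look like the sketch below. The parameter types are inferred from the names and are assumptions, `TargetConfig` is a simplified stand-in for the pydantic model in `config.py`, and `EchoProvider` is a hypothetical provider that exists only to show the contract.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TargetConfig:
    """Stand-in for the real pydantic model; the single field is illustrative."""

    prompt: str


class Provider(ABC):
    @abstractmethod
    async def generate(
        self,
        target_name: str,
        target_config: TargetConfig,
        resolved_prompt: str,
        resolved_model: str,
        project_dir: Path,
    ) -> None:
        """Write the generated artifact to project_dir / target_name."""


class EchoProvider(Provider):
    """Hypothetical provider that writes the prompt back instead of calling an API."""

    async def generate(
        self,
        target_name: str,
        target_config: TargetConfig,
        resolved_prompt: str,
        resolved_model: str,
        project_dir: Path,
    ) -> None:
        output = project_dir / target_name
        output.write_text(f"[{resolved_model}] {resolved_prompt}\n", encoding="utf-8")
```

A caller could drive it with `asyncio.run(EchoProvider().generate(...))`; the method returns nothing because the output file is the result.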
 ### Image provider specifics (BFL)
 - `image_prompt` field: base64-encoded reference image (from `target.reference_image`)
 - `control_image` field: base64-encoded control image (from `target.control_images`)
 - Result image URL is in `result.result["sample"]`, downloaded via httpx
 - Supported models: `flux-dev`, `flux-pro`, `flux-pro-1.1`, `flux-pro-1.1-ultra`, `flux-kontext-pro`, `flux-pro-1.0-canny`, `flux-pro-1.0-depth`, `flux-pro-1.0-fill`, `flux-pro-1.0-expand`
 ### Text provider specifics (Mistral)
 - Text input files are appended to the prompt with `--- Contents of <name> ---` headers
 - Image inputs are noted as `[Attached image: <name>]` (no actual vision/multimodal yet)
 - Raw LLM response is written directly to the output file, no post-processing
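The prompt assembly described above could be sketched like this. The header and placeholder strings come from the list; the function name `assemble_prompt`, the blank-line separators, and the way images are detected are assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path

IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".webp"})


def assemble_prompt(prompt: str, inputs: list[str], project_dir: Path) -> str:
    """Append text inputs verbatim and note image inputs by name only."""
    parts = [prompt]
    for name in inputs:
        if Path(name).suffix in IMAGE_EXTENSIONS:
            # No image data is sent; the model only sees the file name.
            parts.append(f"[Attached image: {name}]")
        else:
            content = (project_dir / name).read_text(encoding="utf-8")
            parts.append(f"--- Contents of {name} ---\n{content}")
    return "\n\n".join(parts)
```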
 ## Environment variables
 - `MISTRAL_API_KEY` - required for text targets
 - `BFL_API_KEY` - required for image targets
 ## Dependencies
 - `typer` - CLI framework
 - `pydantic` / `pydantic-settings[yaml]` - config parsing (pyyaml comes via the yaml extra)
 - `networkx` - dependency graph
 - `blackforest` - BlackForestLabs API client (sync, uses `requests`)
 - `mistralai` - Mistral API client (supports async)
 - `httpx` - async HTTP for downloading BFL result images (transitive via mistralai)
 - `hatchling` - build backend
--- a/README.md
+++ b/README.md
@@ -0,0 +1,170 @@
 # bulkgen
 A build tool for AI-generated artifacts. Define image and text targets in a YAML config, and bulkgen handles dependency resolution, incremental builds, and parallel execution.
 Uses [Mistral](https://mistral.ai) for text generation and [BlackForestLabs](https://blackforestlabs.ai) (FLUX) for image generation.
 ## Installation
 Requires Python 3.13+.
 ```bash
 pip install .
 ```
 Or with [uv](https://docs.astral.sh/uv/):
 ```bash
 uv sync
 ```
 ## Quick start
 1. Set your API keys:
 ```bash
 export MISTRAL_API_KEY="your-key"
 export BFL_API_KEY="your-key"
 ```
 2. Create a config file (e.g. `my-project.bulkgen.yaml`):
 ```yaml
 defaults:
  text_model: mistral-large-latest
  image_model: flux-pro
 targets:
  hero.png:
    prompt: "A dramatic sunset over mountains, photorealistic"
    width: 1024
    height: 768
  blog-post.md:
    prompt: prompts/write-blog.txt
    inputs:
      - hero.png
      - notes.md
 ```
 3. Build:
 ```bash
 bulkgen build
 ```
 ## Config format
 The config file must be named `<anything>.bulkgen.yaml` and placed in your project directory. One config file per directory.
 ### Top-level fields
 | Field | Description |
 |---|---|
 | `defaults` | Default model names (optional) |
 | `targets` | Map of output filenames to their configuration |
 ### Defaults
 ```yaml
 defaults:
  text_model: mistral-large-latest   # used for .md, .txt targets
  image_model: flux-pro              # used for .png, .jpg, .jpeg, .webp targets
 ```
 ### Target fields
 | Field | Type | Description |
 |---|---|---|
 | `prompt` | string | Inline prompt text, or path to a prompt file |
 | `model` | string | Override the default model for this target |
 | `inputs` | list[string] | Files this target depends on (other targets or existing files) |
 | `reference_image` | string | Image file for image-to-image generation |
 | `control_images` | list[string] | Control images (for canny/depth models) |
 | `width` | int | Image width in pixels |
 | `height` | int | Image height in pixels |
 Target type is inferred from the file extension:
 - **Image**: `.png`, `.jpg`, `.jpeg`, `.webp`
 - **Text**: `.md`, `.txt`
 ### Prompts
 Prompts can be inline strings or file references:
 ```yaml
 targets:
  # Inline prompt
  image.png:
    prompt: "A cat sitting on a windowsill"
  # File reference (reads the file contents as the prompt)
  article.md:
    prompt: prompts/article-prompt.txt
 ```
 If the prompt value is a path to an existing file, its contents are read. Otherwise the string is used directly.
 ### Dependencies
 Targets can depend on other targets or on existing files in the project directory:
 ```yaml
 targets:
  base.png:
    prompt: "A landscape scene"
  variant.png:
    prompt: "Same scene but in winter"
    reference_image: base.png    # image-to-image, depends on base.png
  summary.md:
    prompt: "Summarize these notes"
    inputs:
      - base.png                 # depends on a generated target
      - research-notes.md        # depends on an existing file
 ```
 bulkgen resolves dependencies automatically. If you build a single target, its transitive dependencies are included.
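To make "transitive dependencies" concrete, here is a small sketch that walks a dependency map like the one in the example above. It is plain Python rather than bulkgen's networkx-based code, and `transitive_deps` is a hypothetical helper name.

```python
from __future__ import annotations


def transitive_deps(target: str, deps: dict[str, set[str]]) -> set[str]:
    """Return the target plus everything it depends on, directly or indirectly."""
    seen: set[str] = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Existing input files have no entry in the map, so they add nothing.
        stack.extend(deps.get(name, set()))
    return seen


# Dependency map for the config above: reference_image and inputs both count.
example_deps = {
    "base.png": set(),
    "variant.png": {"base.png"},
    "summary.md": {"base.png", "research-notes.md"},
}
```

With this map, building `summary.md` would pull in `base.png` and `research-notes.md` but not `variant.png`.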
 ## CLI
 ### `bulkgen build [target]`
 Build all targets, or a specific target and its dependencies.
 - Skips targets that are already up to date (incremental builds)
 - Runs independent targets in parallel
 - Continues building if a target fails (dependents of the failed target are skipped)
 ### `bulkgen clean`
 Remove all generated target files and the build state file (`.bulkgen.state.yaml`). Input files are preserved.
 ### `bulkgen graph`
 Print the dependency graph showing build stages:
 ```
 Stage 0 (inputs): research-notes.md
 Stage 0 (targets): base.png
 Stage 1 (targets): variant.png, summary.md
  variant.png <- base.png
  summary.md <- base.png, research-notes.md
 ```
 ## Incremental builds
 bulkgen tracks the state of each build in `.bulkgen.state.yaml` (auto-generated, add to `.gitignore`). A target is rebuilt when any of these change:
 - Input file contents (SHA-256 hash)
 - Prompt text
 - Model name
 - Extra parameters (width, height, etc.)
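The rebuild check can be pictured with the following sketch. It is not bulkgen's `state.py`: the record layout, the use of JSON to hash extra parameters, and the names `fingerprint` and `is_dirty` are assumptions for the example.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fingerprint(
    input_files: list[Path],
    prompt: str,
    model: str,
    extra_params: dict[str, int | str],
) -> dict[str, object]:
    """Collect everything whose change should trigger a rebuild."""
    return {
        "inputs": {p.name: sha256_hex(p.read_bytes()) for p in input_files},
        "prompt": sha256_hex(prompt.encode("utf-8")),
        "model": model,
        # sort_keys makes the hash independent of parameter order.
        "extra": sha256_hex(json.dumps(extra_params, sort_keys=True).encode("utf-8")),
    }


def is_dirty(recorded: dict[str, object] | None, current: dict[str, object]) -> bool:
    """A target is dirty if it has no record or any tracked value differs."""
    return recorded != current
```

Hashing file contents rather than comparing modification times means that touching a file without changing it does not trigger a rebuild.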
 ## Environment variables
 | Variable | Required for |
 |---|---|
 | `MISTRAL_API_KEY` | Text targets (`.md`, `.txt`) |
 | `BFL_API_KEY` | Image targets (`.png`, `.jpg`, `.jpeg`, `.webp`) |
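A pre-flight check based on this table might look like the sketch below. Whether bulkgen performs such a check before building is not stated here; `missing_api_keys` and `REQUIRED_KEY_BY_SUFFIX` are hypothetical names for illustration.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

REQUIRED_KEY_BY_SUFFIX = {
    ".md": "MISTRAL_API_KEY",
    ".txt": "MISTRAL_API_KEY",
    ".png": "BFL_API_KEY",
    ".jpg": "BFL_API_KEY",
    ".jpeg": "BFL_API_KEY",
    ".webp": "BFL_API_KEY",
}


def missing_api_keys(
    target_names: list[str],
    env: Mapping[str, str] | None = None,
) -> set[str]:
    """Return the variables needed by these targets that are unset or empty."""
    env = os.environ if env is None else env
    needed = {
        REQUIRED_KEY_BY_SUFFIX[Path(name).suffix]
        for name in target_names
        if Path(name).suffix in REQUIRED_KEY_BY_SUFFIX
    }
    return {key for key in needed if not env.get(key)}
```

A project with only image targets would need `BFL_API_KEY` alone under this check.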