CLAUDE.md covers architecture, data flow, code style conventions, provider specifics, and all commands needed for development. README.md covers installation, quick start, full config format reference, CLI usage, incremental builds, and environment variables.
CLAUDE.md - bulkgen development guide
Project overview
bulkgen is a make-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets with dependencies; bulkgen builds a DAG with networkx and executes generation in parallel topological order using Mistral (text) and BlackForestLabs (images) as providers.
Commands
uv sync # install dependencies
uv run bulkgen build # build all targets
uv run bulkgen build X # build target X and its transitive deps
uv run bulkgen clean # remove generated artifacts + state file
uv run bulkgen graph # print dependency graph with stages
uv run pytest # run tests
Code quality
Pre-commit hooks run automatically on git commit:
- pyright - static type checking (config: pyrightconfig.json points to .devenv/state/venv)
- ruff check - linting with auto-fix
- ruff format - formatting
- commitizen - enforces conventional commit messages (feat:, fix:, chore:, etc.)
Run manually:
/nix/store/h7f5vym2ykpl7ls8icw0wiqgmv9xiwnx-pyright-1.1.407/bin/pyright
/nix/store/xmy9vff4zlbvkz3y830085dzgjpmaj8d-ruff-0.14.14/bin/ruff check
/nix/store/xmy9vff4zlbvkz3y830085dzgjpmaj8d-ruff-0.14.14/bin/ruff format --check
Code style conventions
- All function signatures must be fully typed. No Any unless truly unavoidable.
- Use pathlib.Path everywhere, never os.path.
- Use from __future__ import annotations in every module.
- Use modern typing: str | None (not Optional[str]), Self, override, Annotated.
- Pydantic BaseModel for data that serializes to/from YAML. dataclass for internal-only data structures (e.g. BuildResult).
- Errors: raise with msg = "..."; raise ValueError(msg) pattern (ruff W0 compliance).
- Commit messages follow conventional commits (feat:, fix:, refactor:, chore:).
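As an illustration of the conventions above, here is a minimal sketch. The fields of BuildResult and the load_text helper are hypothetical and only exist to show the style; they are not taken from the bulkgen source.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class BuildResult:
    """Internal-only data, so a dataclass rather than a pydantic model.

    The field names here are assumptions made for this example.
    """

    built: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)


def load_text(path: Path, fallback: str | None = None) -> str:
    """Hypothetical helper: fully typed, pathlib only, str | None."""
    if path.is_file():
        return path.read_text(encoding="utf-8")
    if fallback is None:
        # Bind the message to a variable first, then raise.
        msg = f"file not found: {path}"
        raise ValueError(msg)
    return fallback
```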
Architecture
Module structure
main.py                  # Entry point: imports and runs bulkgen.cli.app
bulkgen/
    __init__.py
    cli.py               # Typer CLI: build, clean, graph commands
    config.py            # Pydantic models for YAML config
    graph.py             # networkx DAG construction and traversal
    builder.py           # Build orchestrator: incremental + parallel
    state.py             # .bulkgen.state.yaml hash tracking
    providers/
        __init__.py      # Abstract Provider base class (ABC)
        image.py         # BlackForestLabs image generation
        text.py          # Mistral text generation
Data flow
- cli.py finds the *.bulkgen.yaml in cwd, calls load_config() from config.py
- config.py parses YAML into ProjectConfig (pydantic), which contains Defaults and dict[str, TargetConfig]
- graph.py builds an nx.DiGraph from target dependencies. get_build_order() uses nx.topological_generations() to return parallel batches
- builder.py run_build() iterates generations. Per generation:
  - Checks each target for dirtiness via state.py (SHA-256 hashes of inputs, prompt, model, extra params)
  - Skips targets whose deps already failed
  - Runs dirty targets concurrently with asyncio.gather()
  - Records state after each generation (crash resilience)
- providers/ dispatch by TargetType (inferred from file extension)
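The generation-by-generation loop can be sketched roughly as follows. This is not the bulkgen implementation: the standard-library graphlib stands in for networkx so the example has no dependencies, the dirtiness check and state saving are reduced to a comment, and the function names, the build_one callback, and the "ok" status string are assumptions.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from graphlib import TopologicalSorter


def generations(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group targets into batches whose dependencies are all in earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


async def run_build(
    deps: dict[str, set[str]],
    build_one: Callable[[str], Awaitable[None]],
) -> dict[str, str]:
    """Build batch by batch; failures only block their own dependents."""
    status: dict[str, str] = {}
    for batch in generations(deps):
        runnable: list[str] = []
        for name in batch:
            if any(status.get(dep) != "ok" for dep in deps.get(name, set())):
                status[name] = "Dependency failed"
            else:
                runnable.append(name)
        results: list[BaseException | None] = []
        if runnable:
            # return_exceptions=True keeps one failure from cancelling the batch.
            results = await asyncio.gather(
                *(build_one(name) for name in runnable),
                return_exceptions=True,
            )
        for name, result in zip(runnable, results):
            if isinstance(result, BaseException):
                status[name] = f"failed: {result}"
            else:
                status[name] = "ok"
        # A real builder would persist state here, once per generation.
    return status
```

With deps = {"a.txt": set(), "b.png": {"a.txt"}, "c.md": set()}, the first batch holds a.txt and c.md and the second holds b.png, so a failure in a.txt blocks only b.png.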
Key design decisions
- Target type inference: .png/.jpg/.jpeg/.webp = image, .md/.txt = text. Defined in config.py as IMAGE_EXTENSIONS / TEXT_EXTENSIONS.
- Prompt resolution: if the prompt string is a path to an existing file, its contents are read; otherwise it's used as-is. Done in builder.py:_resolve_prompt().
- BFL client is synchronous: wrapped in asyncio.to_thread() in providers/image.py. Uses ClientConfig(sync=True, timeout=300) for internal polling.
- Mistral client is natively async: uses complete_async() directly in providers/text.py.
- Incremental builds: .bulkgen.state.yaml tracks per-target: input file hashes, prompt hash, model name, and extra params hash. Any change marks the target dirty.
- Error isolation: if a target fails, its dependents are marked "Dependency failed" but independent targets continue building.
- State saved per-generation: partial progress survives crashes. At most one generation of work is lost.
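The incremental-build check described above could look something like the sketch below. The layout of the recorded entry, the function names, and the use of JSON to canonicalise the extra params are assumptions for illustration; the actual format of .bulkgen.state.yaml may differ.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fingerprint(
    inputs: list[Path],
    prompt: str,
    model: str,
    extra: dict[str, object],
) -> dict[str, object]:
    """Collect everything whose change should trigger a rebuild."""
    return {
        "inputs": {path.name: sha256_hex(path.read_bytes()) for path in inputs},
        "prompt": sha256_hex(prompt.encode("utf-8")),
        "model": model,
        # sort_keys makes the hash independent of dict ordering.
        "extra": sha256_hex(json.dumps(extra, sort_keys=True).encode("utf-8")),
    }


def is_dirty(recorded: dict[str, object] | None, current: dict[str, object]) -> bool:
    """A target is dirty if it was never built or any tracked value changed."""
    return recorded != current
```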
Provider interface
All providers implement bulkgen.providers.Provider:
async def generate(self, target_name, target_config, resolved_prompt, resolved_model, project_dir) -> None
The provider writes the result file to project_dir / target_name.
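A typed version of this interface might look like the sketch below. The parameter types are assumptions (in particular, target_config is typed as object here where the real code presumably uses TargetConfig), and EchoProvider is a hypothetical stand-in that only demonstrates the contract of writing to project_dir / target_name.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class Provider(ABC):
    """Base class sketch; the argument types are assumed, not confirmed."""

    @abstractmethod
    async def generate(
        self,
        target_name: str,
        target_config: object,
        resolved_prompt: str,
        resolved_model: str,
        project_dir: Path,
    ) -> None:
        """Produce the artifact and write it to project_dir / target_name."""


class EchoProvider(Provider):
    """Hypothetical provider that writes the prompt back as the result."""

    async def generate(
        self,
        target_name: str,
        target_config: object,
        resolved_prompt: str,
        resolved_model: str,
        project_dir: Path,
    ) -> None:
        (project_dir / target_name).write_text(resolved_prompt, encoding="utf-8")
```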
Image provider specifics (BFL)
- image_prompt field: base64-encoded reference image (from target.reference_image)
- control_image field: base64-encoded control image (from target.control_images)
- Result image URL is in result.result["sample"], downloaded via httpx
- Supported models: flux-dev, flux-pro, flux-pro-1.1, flux-pro-1.1-ultra, flux-kontext-pro, flux-pro-1.0-canny, flux-pro-1.0-depth, flux-pro-1.0-fill, flux-pro-1.0-expand
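To make the base64 fields concrete, here is a minimal sketch of how the request inputs could be assembled. The helper names and the shape of the returned dict are assumptions; the sketch does not call the blackforest client, and it takes a single control image for simplicity.

```python
from __future__ import annotations

import base64
from pathlib import Path


def encode_image(path: Path) -> str:
    """Return the file's bytes as an ASCII base64 string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def build_inputs(
    prompt: str,
    reference_image: Path | None = None,
    control_image: Path | None = None,
) -> dict[str, str]:
    """Hypothetical helper: only include image fields that are configured."""
    inputs: dict[str, str] = {"prompt": prompt}
    if reference_image is not None:
        inputs["image_prompt"] = encode_image(reference_image)
    if control_image is not None:
        inputs["control_image"] = encode_image(control_image)
    return inputs
```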
Text provider specifics (Mistral)
- Text input files are appended to the prompt with --- Contents of <name> --- headers
- Image inputs are noted as [Attached image: <name>] (no actual vision/multimodal yet)
- Raw LLM response is written directly to the output file, no post-processing
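The prompt assembly described above can be sketched as follows. The header and placeholder strings come from this guide; the function name, the blank-line separator, and the exact ordering are assumptions.

```python
from __future__ import annotations

from pathlib import Path

# Mirrors the image extensions listed under "Key design decisions".
IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".webp"})


def assemble_prompt(prompt: str, inputs: list[Path]) -> str:
    """Append text inputs verbatim and image inputs as placeholders."""
    parts = [prompt]
    for path in inputs:
        if path.suffix.lower() in IMAGE_EXTENSIONS:
            # No vision support yet, so images are only mentioned by name.
            parts.append(f"[Attached image: {path.name}]")
        else:
            body = path.read_text(encoding="utf-8")
            parts.append(f"--- Contents of {path.name} ---\n{body}")
    return "\n\n".join(parts)
```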
Environment variables
- MISTRAL_API_KEY - required for text targets
- BFL_API_KEY - required for image targets
Dependencies
- typer - CLI framework
- pydantic / pydantic-settings[yaml] - config parsing (pyyaml comes via the yaml extra)
- networkx - dependency graph
- blackforest - BlackForestLabs API client (sync, uses requests)
- mistralai - Mistral API client (supports async)
- httpx - async HTTP for downloading BFL result images (transitive via mistralai)
- hatchling - build backend