CLAUDE.md - bulkgen development guide
Project overview
bulkgen is a make-like build tool for AI-generated artifacts (images and text). A YAML config file defines targets with dependencies; bulkgen builds a DAG with networkx and executes generation in parallel topological order using Mistral (text) and BlackForestLabs (images) as providers.
Commands
uv sync # install dependencies
uv run bulkgen build # build all targets
uv run bulkgen build X # build target X and its transitive deps
uv run bulkgen clean # remove generated artifacts + state file
uv run bulkgen graph # print dependency graph with stages
uv run pytest # run tests
Code quality
Pre-commit hooks run automatically on git commit:
- basedpyright - strict static type checking (config: `pyrightconfig.json` points to `.devenv/state/venv`)
- ruff check - linting with auto-fix
- ruff format - formatting
- commitizen - enforces conventional commit messages (`feat:`, `fix:`, `chore:`, etc.)
Run manually:
basedpyright
ruff check
ruff format --check
Code style conventions
- All function signatures must be fully typed. No `Any` unless truly unavoidable.
- Use `pathlib.Path` everywhere, never `os.path`.
- Use `from __future__ import annotations` in every module.
- Use modern typing: `str | None` (not `Optional[str]`), `Self`, `override`, `Annotated`.
- Use Pydantic `BaseModel` for data that serializes to/from YAML, and `dataclass` for internal-only data structures (e.g. `BuildResult`).
- Errors: raise with the `msg = "..."; raise ValueError(msg)` pattern (ruff EM101/EM102 compliance).
- Commit messages follow conventional commits (`feat:`, `fix:`, `refactor:`, `chore:`).
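To illustrate, here is a minimal module written to these conventions. The `BuildResult` fields and the `read_prompt_file` helper are hypothetical. They show the rules in action and are not bulkgen's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class BuildResult:
    """Internal-only data, so a dataclass rather than a pydantic model.

    Field names are hypothetical, for illustration only.
    """

    built: list[str]
    skipped: list[str]
    failed: dict[str, str]


def read_prompt_file(path: Path, encoding: str | None = None) -> str:
    """Fully typed, pathlib-based, and raising via the `msg = ...` pattern."""
    if not path.is_file():
        msg = f"Prompt file not found: {path}"
        raise ValueError(msg)
    return path.read_text(encoding=encoding or "utf-8")
```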
Architecture
Module structure
main.py                # Entry point: imports and runs bulkgen.cli.app
bulkgen/
  __init__.py
  cli.py               # Typer CLI: build, clean, graph commands
  config.py            # Pydantic models for YAML config
  graph.py             # networkx DAG construction and traversal
  builder.py           # Build orchestrator: incremental + parallel
  state.py             # .bulkgen.state.yaml hash tracking
  providers/
    __init__.py        # Abstract Provider base class (ABC)
    image.py           # BlackForestLabs image generation
    text.py            # Mistral text generation
Data flow
- cli.py finds the `*.bulkgen.yaml` file in the current working directory and calls `load_config()` from `config.py`.
- config.py parses the YAML into `ProjectConfig` (pydantic), which contains `Defaults` and `dict[str, TargetConfig]`.
- graph.py builds an `nx.DiGraph` from target dependencies. `get_build_order()` uses `nx.topological_generations()` to return parallel batches.
- builder.py `run_build()` iterates over generations. Per generation:
  - Checks each target for dirtiness via `state.py` (SHA-256 hashes of inputs, prompt, model, extra params)
  - Skips targets whose deps already failed
  - Runs dirty targets concurrently with `asyncio.gather()`
  - Records state after each generation (crash resilience)
- providers/ dispatch by `TargetType` (inferred from file extension).
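The generation loop can be sketched with the standard library alone. The sketch makes these substitutions and assumptions:

- `graphlib.TopologicalSorter` stands in for `nx.topological_generations()`. Marking a whole batch done before requesting the next one yields the same layering.
- `fake_generate` is a hypothetical stand-in for a provider call.
- Every dependency is assumed to be a target.
- The dirtiness check is omitted.

The names `generations` and `run_generations` are illustrative, not bulkgen's API.

```python
from __future__ import annotations

import asyncio
from graphlib import TopologicalSorter


def generations(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group targets into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # finish the whole batch before computing the next
    return batches


async def fake_generate(name: str) -> None:
    """Hypothetical provider call; fails for names starting with 'bad'."""
    await asyncio.sleep(0)
    if name.startswith("bad"):
        msg = f"generation failed for {name}"
        raise RuntimeError(msg)


async def run_generations(deps: dict[str, set[str]]) -> dict[str, str]:
    status: dict[str, str] = {}
    for batch in generations(deps):
        runnable: list[str] = []
        for name in batch:
            # Any dependency that is not "ok" (failed or itself skipped) blocks this target
            if any(status.get(dep) != "ok" for dep in deps.get(name, set())):
                status[name] = "Dependency failed"
            else:
                runnable.append(name)
        # return_exceptions=True keeps one failure from cancelling its siblings
        results = await asyncio.gather(
            *(fake_generate(n) for n in runnable), return_exceptions=True
        )
        for name, result in zip(runnable, results):
            status[name] = "failed" if isinstance(result, BaseException) else "ok"
        # (bulkgen records state at this point, once per generation; omitted here)
    return status
```

Collecting exceptions from `asyncio.gather()` is one way to get the error isolation described under Key design decisions. bulkgen's own mechanism may differ.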
Key design decisions
- Target type inference: `.png`/`.jpg`/`.jpeg`/`.webp` = image, `.md`/`.txt` = text. Defined in `config.py` as `IMAGE_EXTENSIONS`/`TEXT_EXTENSIONS`.
- Prompt resolution: if the `prompt` string is a path to an existing file, its contents are read; otherwise it's used as-is. Done in `builder.py:_resolve_prompt()`.
- BFL client is synchronous: wrapped in `asyncio.to_thread()` in `providers/image.py`. Uses `ClientConfig(sync=True, timeout=300)` for internal polling.
- Mistral client is natively async: uses `complete_async()` directly in `providers/text.py`.
- Incremental builds: `.bulkgen.state.yaml` tracks, per target, the input file hashes, prompt hash, model name, and extra params hash. Any change marks the target dirty.
- Error isolation: if a target fails, its dependents are marked "Dependency failed" but independent targets continue building.
- State saved per generation: partial progress survives crashes. At most one generation of work is lost.
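The prompt-resolution and incremental-build rules above can be sketched as follows. Several parts are assumptions:

- The layout of the fingerprint dict is invented for illustration.
- The names `resolve_prompt`, `target_fingerprint`, and `is_dirty` are hypothetical.
- Hashing extra params through `json.dumps(..., sort_keys=True)` is just one deterministic option.

The real `.bulkgen.state.yaml` schema and `_resolve_prompt()` may differ.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def resolve_prompt(prompt: str, project_dir: Path) -> str:
    """Use the file's contents if `prompt` names an existing file, else the string itself."""
    candidate = project_dir / prompt
    try:
        is_file = candidate.is_file()
    except OSError:  # e.g. a long inline prompt exceeds the filename length limit
        is_file = False
    return candidate.read_text(encoding="utf-8") if is_file else prompt


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def target_fingerprint(
    inputs: list[Path], prompt: str, model: str, extra: dict[str, object]
) -> dict[str, object]:
    """Hypothetical per-target state record: any field change means 'dirty'."""
    return {
        "inputs": {str(p): _sha256(p.read_bytes()) for p in sorted(inputs)},
        "prompt": _sha256(prompt.encode()),
        "model": model,
        "extra": _sha256(json.dumps(extra, sort_keys=True).encode()),
    }


def is_dirty(previous: dict[str, object] | None, current: dict[str, object]) -> bool:
    # A target with no recorded state (never built) is always dirty
    return previous != current
```

The `OSError` guard matters because inline prompts are often far longer than a valid filename. On some Python versions `Path.is_file()` raises instead of returning `False` in that case.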
Provider interface
All providers implement bulkgen.providers.Provider:
async def generate(self, target_name, target_config, resolved_prompt, resolved_model, project_dir) -> None
The provider writes the result file to project_dir / target_name.
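Here is a sketch of this contract combined with the extension-based `TargetType` dispatch. The `generate` parameter names mirror the signature above, but several details are assumptions:

- `target_config` is typed as a plain `dict` here, whereas bulkgen's `TargetConfig` is a pydantic model.
- Modelling `TargetType` as an `Enum` is a guess.
- `infer_target_type` and `EchoTextProvider` are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from enum import Enum
from pathlib import Path

IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".webp"})
TEXT_EXTENSIONS = frozenset({".md", ".txt"})


class TargetType(Enum):
    IMAGE = "image"
    TEXT = "text"


def infer_target_type(target_name: str) -> TargetType:
    suffix = Path(target_name).suffix
    if suffix in IMAGE_EXTENSIONS:
        return TargetType.IMAGE
    if suffix in TEXT_EXTENSIONS:
        return TargetType.TEXT
    msg = f"Cannot infer target type from extension: {target_name}"
    raise ValueError(msg)


class Provider(ABC):
    @abstractmethod
    async def generate(
        self,
        target_name: str,
        target_config: dict[str, object],
        resolved_prompt: str,
        resolved_model: str,
        project_dir: Path,
    ) -> None:
        """Write the generated artifact to project_dir / target_name."""


class EchoTextProvider(Provider):
    """Hypothetical stand-in for the Mistral provider: writes the prompt back out."""

    async def generate(
        self,
        target_name: str,
        target_config: dict[str, object],
        resolved_prompt: str,
        resolved_model: str,
        project_dir: Path,
    ) -> None:
        (project_dir / target_name).write_text(resolved_prompt, encoding="utf-8")
```

Keeping the output path fixed at `project_dir / target_name` means the builder never needs a provider-specific return value. It only has to check that the file exists.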
Image provider specifics (BFL)
- `image_prompt` field: base64-encoded reference image (from `target.reference_image`)
- `control_image` field: base64-encoded control image (from `target.control_images`)
- Result image URL is in `result.result["sample"]`, downloaded via httpx
- Supported models: `flux-dev`, `flux-pro`, `flux-pro-1.1`, `flux-pro-1.1-ultra`, `flux-kontext-pro`, `flux-pro-1.0-canny`, `flux-pro-1.0-depth`, `flux-pro-1.0-fill`, `flux-pro-1.0-expand`
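Here is a sketch of two of these mechanics:

- encoding local image files into the `image_prompt` / `control_image` request fields;
- running a blocking client call off the event loop with `asyncio.to_thread()`.

Several parts are assumptions:

- `blocking_submit` is a hypothetical placeholder. The real `blackforest` client methods and request shape are not reproduced here.
- `build_image_fields` takes a single control image path for simplicity. How bulkgen picks it from `target.control_images` is not shown.

```python
from __future__ import annotations

import asyncio
import base64
import time
from pathlib import Path


def encode_image(path: Path) -> str:
    """Base64-encode an image file as ASCII text for a JSON request field."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def build_image_fields(
    reference_image: Path | None, control_image: Path | None
) -> dict[str, str]:
    fields: dict[str, str] = {}
    if reference_image is not None:
        fields["image_prompt"] = encode_image(reference_image)
    if control_image is not None:
        fields["control_image"] = encode_image(control_image)
    return fields


def blocking_submit(fields: dict[str, str]) -> str:
    """Hypothetical synchronous client call that blocks until a result URL is ready."""
    time.sleep(0.01)  # stands in for blocking HTTP polling
    return f"https://example.invalid/sample?fields={len(fields)}"


async def generate_image_url(fields: dict[str, str]) -> str:
    # The sync call runs in a worker thread, so other targets in the
    # same generation keep making progress on the event loop meanwhile.
    return await asyncio.to_thread(blocking_submit, fields)
```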
Text provider specifics (Mistral)
- Text input files are appended to the prompt with `--- Contents of <name> ---` headers
- Image inputs are noted as `[Attached image: <name>]` (no actual vision/multimodal support yet)
- Raw LLM response is written directly to the output file, no post-processing
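Here is a sketch of how the final text prompt could be assembled from these rules. Only the `--- Contents of <name> ---` and `[Attached image: <name>]` markers come from the description above. The blank-line separator between sections and the `build_text_prompt` name are assumptions.

```python
from __future__ import annotations

from pathlib import Path

IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".webp"})


def build_text_prompt(prompt: str, inputs: list[str], project_dir: Path) -> str:
    parts = [prompt]
    for name in inputs:
        if Path(name).suffix in IMAGE_EXTENSIONS:
            # No multimodal support yet: the image is only mentioned by name
            parts.append(f"[Attached image: {name}]")
        else:
            content = (project_dir / name).read_text(encoding="utf-8")
            parts.append(f"--- Contents of {name} ---\n{content}")
    return "\n\n".join(parts)
```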
Environment variables
- `MISTRAL_API_KEY` - required for text targets
- `BFL_API_KEY` - required for image targets
Dependencies
- `typer` - CLI framework
- `pydantic` - data validation and config models
- `pyyaml` - YAML parsing
- `networkx` - dependency graph
- `blackforest` - BlackForestLabs API client (sync, uses `requests`; no type stubs)
- `mistralai` - Mistral API client (supports async)
- `httpx` - async HTTP for downloading BFL result images (transitive via mistralai)
- `hatchling` - build backend