diff --git a/CLAUDE.md b/CLAUDE.md
index a0751ac..2ca4675 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -50,7 +50,8 @@ main.py # Entry point: imports and runs hokusai.cli.app
 hokusai/
   __init__.py
   cli.py # Typer CLI: build, regenerate, clean, graph, init, models commands
-  config.py # Pydantic models for YAML config
+  config.py # Pydantic models for YAML config + loop expansion at load time
+  expand.py # Loop variable extraction, substitution, and target expansion
   graph.py # networkx DAG construction and traversal
   builder.py # Build orchestrator: incremental + parallel
   state.py # .hokusai.state.yaml hash tracking
@@ -71,7 +72,7 @@ hokusai/
 ### Data flow
 
 1. **cli.py** finds the `*.hokusai.yaml` in cwd, calls `load_config()` from `config.py`
-2. **config.py** parses YAML into `ProjectConfig` (pydantic), which contains `Defaults` and `dict[str, TargetConfig]`
+2. **config.py** parses YAML, expands loop templates via `expand.py` (cartesian product), then validates into `ProjectConfig` (pydantic) which contains `Defaults`, `loops`, and `dict[str, TargetConfig]`
 3. **graph.py** builds an `nx.DiGraph` from target dependencies. `get_build_order()` uses `nx.topological_generations()` to return parallel batches
 4. **builder.py** `run_build()` iterates generations. Per generation:
    - Checks each target for dirtiness via `state.py` (SHA-256 hashes of inputs, prompt, model, extra params)
@@ -85,7 +86,9 @@ hokusai/
 - **Target type inference**: `.png/.jpg/.jpeg/.webp` = image, `.md/.txt` = text. Defined in `config.py` as `IMAGE_EXTENSIONS` / `TEXT_EXTENSIONS`.
 - **Prompt resolution**: if the `prompt` string is a path to an existing file, its contents are read; otherwise it's used as-is. Supports `{filename}` placeholders. Done in `prompt.py`.
 - **Model resolution**: `resolve.py` maps target config + defaults to a `ModelInfo` with provider, model name, and capabilities.
+- **Content targets**: targets with `content:` write literal text to the file; no provider needed, no archiving on overwrite. State tracks the content string for incremental skip.
 - **Download targets**: targets with `download:` URL are fetched via httpx; state tracks the URL for incremental skip.
+- **Loop expansion**: `loops:` defines named lists of values. Targets with `[var]` in their name are expanded via cartesian product at config load time (in `expand.py`). Only variables appearing in the target name trigger expansion. Explicit targets override expanded ones. Escaping: `\[var]` → literal `[var]`. Substitution applies to all string fields (prompt, content, download, inputs, reference_images, control_images). The rest of the system sees only expanded targets.
 - **BFL client is async**: custom async client in `providers/bfl.py` polls for completion.
 - **Mistral client is natively async**: uses `complete_async()` directly.
 - **OpenAI clients are async**: use the official `openai` SDK with async methods.
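The loop expansion rules in the bullet above can be illustrated with a minimal, standalone sketch. It does not import `hokusai`; `expand_names` and `_fill` are hypothetical helper names, the config values are assumed to be plain strings, and backslash escaping is deliberately left out. It shows, under those assumptions, that only variables in the target name are bound and that explicit targets win on collision.

```python
import itertools
import re

# Simplified pattern: matches [var] only. Backslash escaping (\[var]) is
# deliberately left out to keep the sketch short.
_VAR = re.compile(r"\[([^\]]+)\]")


def _fill(text: str, bindings: dict[str, str]) -> str:
    """Replace bound [var] placeholders; leave unbound ones untouched."""
    return _VAR.sub(lambda m: bindings.get(m.group(1), m.group(0)), text)


def expand_names(
    targets: dict[str, dict[str, str]],
    loops: dict[str, list[str]],
) -> dict[str, dict[str, str]]:
    """Expand templated target names; explicit targets win on collision."""
    explicit = {n: c for n, c in targets.items() if not _VAR.search(n)}
    expanded: dict[str, dict[str, str]] = {}
    for name, cfg in targets.items():
        # Unique variables, in order of first appearance in the name.
        variables = list(dict.fromkeys(_VAR.findall(name)))
        if not variables:
            continue
        # Assumption of this sketch: an undefined variable raises KeyError here.
        for combo in itertools.product(*(loops[v] for v in variables)):
            bindings = dict(zip(variables, combo))
            new_name = _fill(name, bindings)
            if new_name not in explicit:
                expanded[new_name] = {k: _fill(v, bindings) for k, v in cfg.items()}
    return {**expanded, **explicit}


loops = {"color": ["red", "blue"], "size": ["small", "large"]}
targets = {
    "card-[color].png": {"prompt": "A [color] card in [size] format"},
    "card-red.png": {"prompt": "Hand-written prompt"},
}
result = expand_names(targets, loops)
print(sorted(result))
print(result["card-blue.png"]["prompt"])
```

Because `[size]` does not appear in the target name, it is never bound, so in this sketch it stays in the prompt as literal text. Judging by how `expand.py` builds its bindings from the name only, the real implementation appears to behave the same way.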
diff --git a/README.md b/README.md
index 22143f2..2b19448 100644
--- a/README.md
+++ b/README.md
@@ -67,6 +67,7 @@ The config file must be named `.hokusai.yaml` and placed in your proje
 | Field | Description |
 |---|---|
 | `defaults` | Default model names (optional) |
+| `loops` | Loop variables for target template expansion (optional) |
 | `archive_folder` | Directory to move previous outputs into before rebuilding (optional) |
 | `targets` | Map of output filenames to their configuration |
 
@@ -90,6 +91,7 @@ defaults:
 | `width` | int | Image width in pixels |
 | `height` | int | Image height in pixels |
 | `download` | string | URL to download instead of generating (mutually exclusive with prompt) |
+| `content` | string | Literal text to write to the file (mutually exclusive with prompt/download) |
 
 Target type is inferred from the file extension:
 - **Image**: `.png`, `.jpg`, `.jpeg`, `.webp`
@@ -151,6 +153,94 @@ targets:
 
 Download targets participate in dependency resolution like any other target. They are skipped if the URL hasn't changed.
 
+### Content targets
+
+Targets can write literal text content directly to a file without invoking any AI provider:
+
+```yaml
+targets:
+  config.txt:
+    content: "Some static configuration"
+
+  data.csv:
+    content: |
+      name,value
+      alpha,1
+      beta,2
+```
+
+Content targets don't require API keys and are not archived when overwritten. They participate in dependency resolution like any other target, so generated targets can depend on them.
+
+### Loops
+
+Define `loops` at the top level to generate multiple targets from a template using cartesian products:
+
+```yaml
+loops:
+  color:
+    - red
+    - blue
+    - green
+  size:
+    - small
+    - large
+
+targets:
+  card-[color]-[size].png:
+    prompt: "A [color] card in [size] format"
+    width: 1024
+    height: 768
+```
+
+This expands to 6 targets: `card-red-small.png`, `card-red-large.png`, `card-blue-small.png`, etc. Loop variables are substituted in all string fields: prompts, inputs, reference images, control images, download URLs, and content.
+
+Only variables that appear in the target name cause expansion. A target without any `[var]` references in its name is not looped:
+
+```yaml
+loops:
+  id:
+    - 1
+    - 2
+
+targets:
+  data-[id].txt:
+    content: "Data for [id]"
+
+  # This target depends on ALL expanded data files
+  summary.md:
+    prompt: "Summarize everything"
+    inputs:
+      - data-1.txt
+      - data-2.txt
+```
+
+Loop variables also work across dependent targets:
+
+```yaml
+targets:
+  data-[id].txt:
+    content: "Data for item [id]"
+
+  report-[id].md:
+    prompt: "Write a report about item [id]"
+    inputs:
+      - data-[id].txt
+```
+
+**Explicit overrides**: If you define both a template and an explicit target that would collide, the explicit target wins:
+
+```yaml
+targets:
+  image-[n].png:
+    prompt: "Generic image [n]"
+  image-3.png:
+    prompt: "Special custom image" # this overrides the template for n=3
+```
+
+**Escaping**: Use `\[var]` to produce a literal `[var]` in the output.
+
+Loop values are always treated as strings. Numbers and booleans in YAML are automatically converted.
+
 ### Archiving previous outputs
 
 Set `archive_folder` at the top level to preserve previous versions of generated files. When a target is rebuilt, the existing output is moved to the archive folder with an incrementing numeric suffix:
diff --git a/hokusai/config.py b/hokusai/config.py
index 40d37df..15ff2cc 100644
--- a/hokusai/config.py
+++ b/hokusai/config.py
@@ -75,6 +75,7 @@ class ProjectConfig(BaseModel):
     """Top-level configuration parsed from ``.hokusai.yaml``."""
 
     defaults: Defaults = Defaults()
+    loops: dict[str, list[str]] = {}
     archive_folder: str | None = None
     targets: dict[str, TargetConfig]
 
@@ -95,8 +96,30 @@ def target_type_from_capabilities(capabilities: frozenset[Capability]) -> Target
     return TargetType.TEXT
 
 
+def _normalize_loops(raw: dict[str, object]) -> None:
+    """Normalize loop values to strings in-place."""
+    loops = raw.get("loops")
+    if not isinstance(loops, dict):
+        return
+    for key in list(loops.keys()):  # pyright: ignore[reportUnknownVariableType,reportUnknownArgumentType]
+        values = loops[key]  # pyright: ignore[reportUnknownVariableType]
+        if not isinstance(values, list):
+            msg = f"Loop '{key}' must be a list"
+            raise ValueError(msg)
+        loops[key] = [str(v) for v in values]  # pyright: ignore[reportUnknownVariableType,reportUnknownArgumentType]
+
+
 def load_config(config_path: Path) -> ProjectConfig:
     """Load and validate a ``.hokusai.yaml`` file."""
+    from hokusai.expand import expand_targets
+
     with config_path.open() as f:
         raw = yaml.safe_load(f)  # pyright: ignore[reportAny]
+
+    _normalize_loops(raw)  # pyright: ignore[reportAny]
+
+    loops: dict[str, list[str]] = raw.get("loops") or {}  # pyright: ignore[reportAny]
+    if loops and "targets" in raw:
+        raw["targets"] = expand_targets(raw["targets"], loops)  # pyright: ignore[reportAny]
+
     return ProjectConfig.model_validate(raw)
diff --git a/hokusai/expand.py b/hokusai/expand.py
new file mode 100644
index 0000000..59c1eb8
--- /dev/null
+++ b/hokusai/expand.py
@@ -0,0 +1,134 @@
+"""Loop variable expansion for target templates."""
+
+from __future__ import annotations
+
+import itertools
+import re
+from collections.abc import Mapping
+from copy import deepcopy
+
+_LOOP_VAR_RE = re.compile(r"(\\*)\[([^\]]+)\]")
+"""Match ``[varname]`` with optional leading backslashes.
+
+Groups:
+    1. Zero or more backslashes immediately before the ``[``
+    2. The variable name between the brackets
+"""
+
+
+def extract_loop_variables(text: str) -> list[str]:
+    """Return loop variable names referenced as ``[var]`` in *text*.
+
+    Only non-escaped references are returned (even number of leading
+    backslashes, including zero). Duplicates are removed but order is
+    preserved.
+    """
+    seen: set[str] = set()
+    result: list[str] = []
+    for match in _LOOP_VAR_RE.finditer(text):
+        n_bs = len(match.group(1))
+        if n_bs % 2 == 0:
+            name = match.group(2)
+            if name not in seen:
+                seen.add(name)
+                result.append(name)
+    return result
+
+
+def substitute_loop_variables(text: str, bindings: dict[str, str]) -> str:
+    """Replace ``[var]`` placeholders with values from *bindings*.
+
+    Escaping rules (same logic as prompt placeholders):
+
+    * ``[var]`` → value of *var*
+    * ``\\[var]`` → literal ``[var]``
+    * ``\\\\[var]`` → literal ``\\`` + value of *var*
+    """
+
+    def _replace(match: re.Match[str]) -> str:
+        backslashes = match.group(1)
+        name = match.group(2)
+        n_bs = len(backslashes)
+
+        if n_bs % 2 == 1:
+            return "\\" * (n_bs // 2) + "[" + name + "]"
+
+        prefix = "\\" * (n_bs // 2)
+        if name in bindings:
+            return prefix + bindings[name]
+        return match.group(0)
+
+    return _LOOP_VAR_RE.sub(_replace, text)
+
+
+def _substitute_value(value: object, bindings: dict[str, str]) -> object:
+    """Recursively substitute loop variables in a config value."""
+    if isinstance(value, str):
+        return substitute_loop_variables(value, bindings)
+    if isinstance(value, list):
+        return [_substitute_value(item, bindings) for item in value]  # pyright: ignore[reportUnknownArgumentType,reportUnknownVariableType]
+    if isinstance(value, dict):
+        return {
+            _substitute_value(k, bindings): _substitute_value(v, bindings)  # pyright: ignore[reportUnknownArgumentType]
+            for k, v in value.items()  # pyright: ignore[reportUnknownVariableType]
+        }
+    return value
+
+
+def expand_targets(
+    raw_targets: Mapping[str, object],
+    loops: Mapping[str, list[str]],
+) -> dict[str, object]:
+    """Expand templated targets using loop variable cartesian products.
+
+    Targets whose name contains ``[var]`` references are expanded for every
+    combination of the referenced loop variables. Targets without any
+    references are passed through unchanged.
+
+    If an expanded name collides with an explicitly defined target, the
+    explicit target wins. If two *different* templates expand to the same
+    name, a :class:`ValueError` is raised.
+    """
+    explicit: dict[str, object] = {}
+    templates: list[tuple[str, object]] = []
+
+    for name, cfg in raw_targets.items():
+        vars_in_name = extract_loop_variables(name)
+        if vars_in_name:
+            templates.append((name, cfg))
+        else:
+            explicit[name] = cfg
+
+    expanded: dict[str, object] = {}
+    expanded_from: dict[str, str] = {}
+
+    for tmpl_name, tmpl_cfg in templates:
+        vars_in_name = extract_loop_variables(tmpl_name)
+
+        for var in vars_in_name:
+            if var not in loops:
+                msg = (
+                    f"Target '{tmpl_name}' references undefined loop variable '[{var}]'"
+                )
+                raise ValueError(msg)
+
+        var_values = [loops[v] for v in vars_in_name]
+        for combo in itertools.product(*var_values):
+            bindings = dict(zip(vars_in_name, combo))
+            expanded_name = substitute_loop_variables(tmpl_name, bindings)
+
+            if expanded_name in explicit:
+                continue
+
+            if expanded_name in expanded and expanded_from[expanded_name] != tmpl_name:
+                msg = (
+                    f"Duplicate expanded target '{expanded_name}' from templates "
+                    f"'{expanded_from[expanded_name]}' and '{tmpl_name}'"
+                )
+                raise ValueError(msg)
+
+            expanded_cfg = _substitute_value(deepcopy(tmpl_cfg), bindings)
+            expanded[expanded_name] = expanded_cfg
+            expanded_from[expanded_name] = tmpl_name
+
+    return {**expanded, **explicit}
diff --git a/tests/test_builder.py b/tests/test_builder.py
index 340f32f..2bb0824 100644
--- a/tests/test_builder.py
+++ b/tests/test_builder.py
@@ -709,6 +709,68 @@ class TestContentTarget:
         assert result.failed == {}
 
 
+class TestLoopExpansion:
+    """End-to-end tests for loop-expanded targets in builds."""
+
+    async def test_loop_content_targets_build(
+        self, project_dir: Path, write_config: WriteConfig
+    ) -> None:
+        config = write_config(
+            {
+                "loops": {"n": ["1", "2", "3"]},
+                "targets": {"file-[n].txt": {"content": "Value [n]"}},
+            }
+        )
+        with patch("hokusai.builder._create_providers", return_value=_fake_providers()):
+            result = await run_build(config, project_dir, _PROJECT)
+
+        assert set(result.built) == {"file-1.txt", "file-2.txt", "file-3.txt"}
+        assert (project_dir / "file-1.txt").read_text() == "Value 1"
+        assert (project_dir / "file-2.txt").read_text() == "Value 2"
+        assert (project_dir / "file-3.txt").read_text() == "Value 3"
+
+    async def test_loop_incremental_skip(
+        self, project_dir: Path, write_config: WriteConfig
+    ) -> None:
+        config = write_config(
+            {
+                "loops": {"n": ["1", "2"]},
+                "targets": {"file-[n].txt": {"content": "Value [n]"}},
+            }
+        )
+        with patch("hokusai.builder._create_providers", return_value=_fake_providers()):
+            r1 = await run_build(config, project_dir, _PROJECT)
+            assert len(r1.built) == 2
+
+            r2 = await run_build(config, project_dir, _PROJECT)
+            assert r2.built == []
+            assert set(r2.skipped) == {"file-1.txt", "file-2.txt"}
+
+    async def test_loop_with_dependency_chain(
+        self, project_dir: Path, write_config: WriteConfig
+    ) -> None:
+        config = write_config(
+            {
+                "loops": {"id": ["a", "b"]},
+                "targets": {
+                    "data-[id].txt": {"content": "Data for [id]"},
+                    "summary-[id].txt": {
+                        "prompt": "Summarize [id]",
+                        "inputs": ["data-[id].txt"],
+                    },
+                },
+            }
+        )
+        with patch("hokusai.builder._create_providers", return_value=_fake_providers()):
+            result = await run_build(config, project_dir, _PROJECT)
+
+        assert "data-a.txt" in result.built
+        assert "data-b.txt" in result.built
+        assert "summary-a.txt" in result.built
+        assert "summary-b.txt" in result.built
+        assert result.failed == {}
+
+
 class TestPlaceholderPrompts:
     """Tests for prompt placeholder substitution in builds."""
 
diff --git a/tests/test_config.py b/tests/test_config.py
index e782157..fb831e8 100644
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -84,6 +84,41 @@ class TestLoadConfig:
         with pytest.raises(Exception):
             _ = load_config(config_path)
 
+    def test_config_with_loops(self, project_dir: Path) -> None:
+        config_path = project_dir / "test.hokusai.yaml"
+        _ = config_path.write_text(
+            yaml.dump(
+                {
+                    "loops": {"a": [1, 2]},
+                    "targets": {"file-[a].txt": {"content": "Value [a]"}},
+                }
+            )
+        )
+        config = load_config(config_path)
+
+        assert "file-1.txt" in config.targets
+        assert "file-2.txt" in config.targets
+        assert "file-[a].txt" not in config.targets
+
+        t1 = config.targets["file-1.txt"]
+        assert isinstance(t1, ContentTargetConfig)
+        assert t1.content == "Value 1"
+
+    def test_config_loops_normalize_values_to_strings(self, project_dir: Path) -> None:
+        config_path = project_dir / "test.hokusai.yaml"
+        _ = config_path.write_text(
+            yaml.dump(
+                {
+                    "loops": {"x": [True, 3.14]},
+                    "targets": {"out-[x].txt": {"content": "[x]"}},
+                }
+            )
+        )
+        config = load_config(config_path)
+
+        assert "out-True.txt" in config.targets
+        assert "out-3.14.txt" in config.targets
+
     def test_content_target(self, project_dir: Path) -> None:
         config_path = project_dir / "test.hokusai.yaml"
         _ = config_path.write_text(
diff --git a/tests/test_expand.py b/tests/test_expand.py
new file mode 100644
index 0000000..75f880f
--- /dev/null
+++ b/tests/test_expand.py
@@ -0,0 +1,232 @@
+"""Unit tests for hokusai.expand."""
+
+from __future__ import annotations
+
+import pytest
+
+from hokusai.expand import (
+    expand_targets,
+    extract_loop_variables,
+    substitute_loop_variables,
+)
+
+
+class TestExtractLoopVariables:
+    """Tests for extracting [var] references from strings."""
+
+    def test_single_variable(self) -> None:
+        assert extract_loop_variables("image-[a].png") == ["a"]
+
+    def test_multiple_variables(self) -> None:
+        assert extract_loop_variables("card-[size]-[color].png") == ["size", "color"]
+
+    def test_no_variables(self) -> None:
+        assert extract_loop_variables("plain.png") == []
+
+    def test_escaped_variable(self) -> None:
+        assert extract_loop_variables(r"file-\[a].png") == []
+
+    def test_mixed_escaped_and_real(self) -> None:
+        assert extract_loop_variables(r"file-\[a]-[b].png") == ["b"]
+
+    def test_double_backslash_is_not_escaped(self) -> None:
+        assert extract_loop_variables("file-\\\\[a].png") == ["a"]
+
+    def test_deduplicates(self) -> None:
+        assert extract_loop_variables("[a]-[a].png") == ["a"]
+
+    def test_preserves_order(self) -> None:
+        assert extract_loop_variables("[b]-[a]-[c].png") == ["b", "a", "c"]
+
+
+class TestSubstituteLoopVariables:
+    """Tests for substituting [var] with values."""
+
+    def test_single_substitution(self) -> None:
+        result = substitute_loop_variables("image-[a].png", {"a": "1"})
+        assert result == "image-1.png"
+
+    def test_multiple_substitutions(self) -> None:
+        result = substitute_loop_variables(
+            "card-[size]-[color].png", {"size": "large", "color": "red"}
+        )
+        assert result == "card-large-red.png"
+
+    def test_escaped_not_substituted(self) -> None:
+        result = substitute_loop_variables(r"file-\[a].png", {"a": "1"})
+        assert result == "file-[a].png"
+
+    def test_double_backslash_substituted(self) -> None:
+        result = substitute_loop_variables("file-\\\\[a].png", {"a": "1"})
+        assert result == "file-\\1.png"
+
+    def test_unknown_variable_left_as_is(self) -> None:
+        result = substitute_loop_variables("file-[unknown].png", {"a": "1"})
+        assert result == "file-[unknown].png"
+
+    def test_no_variables(self) -> None:
+        result = substitute_loop_variables("plain.png", {"a": "1"})
+        assert result == "plain.png"
+
+
+class TestExpandTargets:
+    """Tests for full target expansion."""
+
+    def test_single_variable_expansion(self) -> None:
+        raw: dict[str, object] = {"image-[a].png": {"prompt": "Draw [a]"}}
+        loops = {"a": ["1", "2", "3"]}
+        result = expand_targets(raw, loops)
+
+        assert len(result) == 3
+        assert result["image-1.png"] == {"prompt": "Draw 1"}
+        assert result["image-2.png"] == {"prompt": "Draw 2"}
+        assert result["image-3.png"] == {"prompt": "Draw 3"}
+
+    def test_cartesian_product(self) -> None:
+        raw: dict[str, object] = {"card-[a]-[b].png": {"prompt": "[a] [b]"}}
+        loops = {"a": ["1", "2"], "b": ["x", "y"]}
+        result = expand_targets(raw, loops)
+
+        assert len(result) == 4
+        assert result["card-1-x.png"] == {"prompt": "1 x"}
+        assert result["card-1-y.png"] == {"prompt": "1 y"}
+        assert result["card-2-x.png"] == {"prompt": "2 x"}
+        assert result["card-2-y.png"] == {"prompt": "2 y"}
+
+    def test_partial_loop_only_referenced_vars(self) -> None:
+        raw: dict[str, object] = {"image-[a].png": {"prompt": "Draw [a]"}}
+        loops = {"a": ["1", "2"], "b": ["x", "y"]}
+        result = expand_targets(raw, loops)
+
+        assert len(result) == 2
+        assert "image-1.png" in result
+        assert "image-2.png" in result
+
+    def test_non_template_target_passed_through(self) -> None:
+        raw: dict[str, object] = {
+            "image-[a].png": {"prompt": "Draw [a]"},
+            "static.txt": {"content": "hello"},
+        }
+        loops = {"a": ["1", "2"]}
+        result = expand_targets(raw, loops)
+
+        assert len(result) == 3
+        assert result["static.txt"] == {"content": "hello"}
+
+    def test_explicit_target_overrides_expanded(self) -> None:
+        raw: dict[str, object] = {
+            "image-[a].png": {"prompt": "Draw [a]"},
+            "image-1.png": {"prompt": "Custom prompt for 1"},
+        }
+        loops = {"a": ["1", "2"]}
+        result = expand_targets(raw, loops)
+
+        assert len(result) == 2
+        assert result["image-1.png"] == {"prompt": "Custom prompt for 1"}
+        assert result["image-2.png"] == {"prompt": "Draw 2"}
+
+    def test_substitution_in_inputs(self) -> None:
+        raw: dict[str, object] = {
+            "out-[a].txt": {
+                "prompt": "Summarize [a]",
+                "inputs": ["data-[a].txt"],
+            }
+        }
+        loops = {"a": ["x", "y"]}
+        result = expand_targets(raw, loops)
+
+        assert result["out-x.txt"] == {
+            "prompt": "Summarize x",
+            "inputs": ["data-x.txt"],
+        }
+        assert result["out-y.txt"] == {
+            "prompt": "Summarize y",
+            "inputs": ["data-y.txt"],
+        }
+
+    def test_substitution_in_reference_images(self) -> None:
+        raw: dict[str, object] = {
+            "out-[a].png": {
+                "prompt": "Enhance",
+                "reference_images": ["ref-[a].png"],
+            }
+        }
+        loops = {"a": ["1", "2"]}
+        result = expand_targets(raw, loops)
+
+        assert result["out-1.png"]["reference_images"] == ["ref-1.png"]  # pyright: ignore[reportIndexIssue]
+        assert result["out-2.png"]["reference_images"] == ["ref-2.png"]  # pyright: ignore[reportIndexIssue]
+
+    def test_substitution_in_content(self) -> None:
+        raw: dict[str, object] = {"file-[a].txt": {"content": "Value is [a]"}}
+        loops = {"a": ["x", "y"]}
+        result = expand_targets(raw, loops)
+
+        assert result["file-x.txt"] == {"content": "Value is x"}
+        assert result["file-y.txt"] == {"content": "Value is y"}
+
+    def test_substitution_in_download(self) -> None:
+        raw: dict[str, object] = {
+            "file-[a].png": {"download": "https://example.com/[a].png"}
+        }
+        loops = {"a": ["cat", "dog"]}
+        result = expand_targets(raw, loops)
+
+        assert result["file-cat.png"] == {"download": "https://example.com/cat.png"}
+        assert result["file-dog.png"] == {"download": "https://example.com/dog.png"}
+
+    def test_escaped_brackets_preserved(self) -> None:
+        raw: dict[str, object] = {r"image-[a].png": {"prompt": r"Draw \[a] for [a]"}}
+        loops = {"a": ["1"]}
+        result = expand_targets(raw, loops)
+
+        assert result["image-1.png"] == {"prompt": "Draw [a] for 1"}
+
+    def test_undefined_variable_raises(self) -> None:
+        raw: dict[str, object] = {"image-[missing].png": {"prompt": "x"}}
+        loops = {"a": ["1"]}
+
+        with pytest.raises(ValueError, match="undefined loop variable"):
+            _ = expand_targets(raw, loops)
+
+    def test_duplicate_from_different_templates_raises(self) -> None:
+        raw: dict[str, object] = {
+            "[a]-[b].png": {"prompt": "first"},
+            "[b]-[a].png": {"prompt": "second"},
+        }
+        loops = {"a": ["x"], "b": ["x"]}
+
+        with pytest.raises(ValueError, match="Duplicate expanded target"):
+            _ = expand_targets(raw, loops)
+
+    def test_empty_loops_passes_through(self) -> None:
+        raw: dict[str, object] = {"out.txt": {"prompt": "hello"}}
+        result = expand_targets(raw, {})
+        assert result == {"out.txt": {"prompt": "hello"}}
+
+    def test_cross_reference_between_expanded_targets(self) -> None:
+        raw: dict[str, object] = {
+            "data-[id].txt": {"content": "Data for [id]"},
+            "summary-[id].txt": {
+                "prompt": "Summarize",
+                "inputs": ["data-[id].txt"],
+            },
+        }
+        loops = {"id": ["a", "b"]}
+        result = expand_targets(raw, loops)
+
+        assert len(result) == 4
+        assert result["summary-a.txt"]["inputs"] == ["data-a.txt"]  # pyright: ignore[reportIndexIssue]
+        assert result["summary-b.txt"]["inputs"] == ["data-b.txt"]  # pyright: ignore[reportIndexIssue]
+
+    def test_substitution_in_control_images(self) -> None:
+        raw: dict[str, object] = {
+            "out-[a].png": {
+                "prompt": "Generate",
+                "control_images": ["ctrl-[a].png"],
+            }
+        }
+        loops = {"a": ["1"]}
+        result = expand_targets(raw, loops)
+
+        assert result["out-1.png"]["control_images"] == ["ctrl-1.png"]  # pyright: ignore[reportIndexIssue]
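Review note on the string normalization in the `hokusai/config.py` hunk: because every loop value goes through `str()`, non-string YAML scalars end up in target names using their Python spelling. The snippet below is a small standalone sketch of that conversion. `normalize_loop_values` is a hypothetical name rather than a function from the patch, and the input list stands in for whatever a YAML loader returns.

```python
def normalize_loop_values(key: str, values: object) -> list[str]:
    """Convert one loop's values to strings, rejecting non-list input."""
    if not isinstance(values, list):
        # Same error condition as the patch: a loop must be a list.
        raise ValueError(f"Loop '{key}' must be a list")
    return [str(v) for v in values]


# Values as a YAML loader might hand them over (ints, floats, bools, strings).
parsed = [1, 3.14, True, "007"]
names = [f"out-{v}.txt" for v in normalize_loop_values("x", parsed)]
print(names)  # ['out-1.txt', 'out-3.14.txt', 'out-True.txt', 'out-007.txt']
```

A boolean therefore produces `out-True.txt`, which matches the expectation in `test_config_loops_normalize_values_to_strings`. Quoting the value in the YAML file is one way to keep its exact spelling.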