feat: add content targets and loop expansion for target templates

Content targets write literal text to files via 'content:' field,
without requiring an AI provider or API keys. They are not archived
when overwritten.

Loop expansion allows defining 'loops:' at the top level with named
lists of values. Targets with [var] in their name are expanded via
cartesian product. Variables are substituted in all string fields.
Explicit targets override expanded ones. Escaping: \[var] -> [var].
Expansion happens at config load time so the rest of the system
(builder, graph, state) sees only expanded targets.
Konstantin Fickel 2026-02-21 18:39:13 +01:00
parent bb03975ece
commit 7503672942
Signed by: kfickel
GPG key ID: A793722F9933C1A5
7 changed files with 581 additions and 2 deletions

@ -50,7 +50,8 @@ main.py # Entry point: imports and runs hokusai.cli.app
hokusai/
__init__.py
cli.py # Typer CLI: build, regenerate, clean, graph, init, models commands
-config.py # Pydantic models for YAML config
+config.py # Pydantic models for YAML config + loop expansion at load time
+expand.py # Loop variable extraction, substitution, and target expansion
graph.py # networkx DAG construction and traversal
builder.py # Build orchestrator: incremental + parallel
state.py # .hokusai.state.yaml hash tracking
@ -71,7 +72,7 @@ hokusai/
### Data flow
1. **cli.py** finds the `*.hokusai.yaml` in cwd, calls `load_config()` from `config.py`
-2. **config.py** parses YAML into `ProjectConfig` (pydantic), which contains `Defaults` and `dict[str, TargetConfig]`
+2. **config.py** parses YAML, expands loop templates via `expand.py` (cartesian product), then validates into `ProjectConfig` (pydantic) which contains `Defaults`, `loops`, and `dict[str, TargetConfig]`
3. **graph.py** builds an `nx.DiGraph` from target dependencies. `get_build_order()` uses `nx.topological_generations()` to return parallel batches
4. **builder.py** `run_build()` iterates generations. Per generation:
- Checks each target for dirtiness via `state.py` (SHA-256 hashes of inputs, prompt, model, extra params)
@ -85,7 +86,9 @@ hokusai/
- **Target type inference**: `.png/.jpg/.jpeg/.webp` = image, `.md/.txt` = text. Defined in `config.py` as `IMAGE_EXTENSIONS` / `TEXT_EXTENSIONS`.
- **Prompt resolution**: if the `prompt` string is a path to an existing file, its contents are read; otherwise it's used as-is. Supports `{filename}` placeholders. Done in `prompt.py`.
- **Model resolution**: `resolve.py` maps target config + defaults to a `ModelInfo` with provider, model name, and capabilities.
- **Content targets**: targets with `content:` write literal text to the file; no provider needed, no archiving on overwrite. State tracks the content string for incremental skip.
- **Download targets**: targets with `download:` URL are fetched via httpx; state tracks the URL for incremental skip.
- **Loop expansion**: `loops:` defines named lists of values. Targets with `[var]` in their name are expanded via cartesian product at config load time (in `expand.py`). Only variables appearing in the target name trigger expansion. Explicit targets override expanded ones. Escaping: `\[var]` → literal `[var]`. Substitution applies to all string fields (prompt, content, download, inputs, reference_images, control_images). The rest of the system sees only expanded targets.
- **BFL client is async**: custom async client in `providers/bfl.py` polls for completion.
- **Mistral client is natively async**: uses `complete_async()` directly.
- **OpenAI clients are async**: use the official `openai` SDK with async methods.

@ -67,6 +67,7 @@ The config file must be named `<anything>.hokusai.yaml` and placed in your proje
| Field | Description |
|---|---|
| `defaults` | Default model names (optional) |
| `loops` | Loop variables for target template expansion (optional) |
| `archive_folder` | Directory to move previous outputs into before rebuilding (optional) |
| `targets` | Map of output filenames to their configuration |
@ -90,6 +91,7 @@ defaults:
| `width` | int | Image width in pixels |
| `height` | int | Image height in pixels |
| `download` | string | URL to download instead of generating (mutually exclusive with prompt) |
| `content` | string | Literal text to write to the file (mutually exclusive with prompt/download) |
Target type is inferred from the file extension:
- **Image**: `.png`, `.jpg`, `.jpeg`, `.webp`
@ -151,6 +153,94 @@ targets:
Download targets participate in dependency resolution like any other target. They are skipped if the URL hasn't changed.
### Content targets
Targets can write literal text content directly to a file without invoking any AI provider:
```yaml
targets:
  config.txt:
    content: "Some static configuration"
  data.csv:
    content: |
      name,value
      alpha,1
      beta,2
```
Content targets don't require API keys and are not archived when overwritten. They participate in dependency resolution like any other target, so generated targets can depend on them.
### Loops
Define `loops` at the top level to generate multiple targets from a template using cartesian products:
```yaml
loops:
  color:
    - red
    - blue
    - green
  size:
    - small
    - large

targets:
  card-[color]-[size].png:
    prompt: "A [color] card in [size] format"
    width: 1024
    height: 768
```
This expands to 6 targets: `card-red-small.png`, `card-red-large.png`, `card-blue-small.png`, etc. Loop variables are substituted in all string fields: prompts, inputs, reference images, control images, download URLs, and content.
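The expansion described above can be sketched in a few lines of Python. This is a simplified illustration, not the project's implementation: `expand_names` is a hypothetical helper, it ignores escaping, and it orders variables by their position in `loops` rather than by their position in the target name.

```python
import itertools

# Hypothetical stand-ins for the `loops:` section and one target template.
loops = {"color": ["red", "blue", "green"], "size": ["small", "large"]}
template = "card-[color]-[size].png"


def expand_names(template: str, loops: dict[str, list[str]]) -> list[str]:
    """Expand a template over every combination of the variables it names."""
    # Only variables that actually occur in the template take part.
    names = [var for var in loops if f"[{var}]" in template]
    results = []
    for combo in itertools.product(*(loops[var] for var in names)):
        expanded = template
        for var, value in zip(names, combo):
            expanded = expanded.replace(f"[{var}]", value)
        results.append(expanded)
    return results


expanded = expand_names(template, loops)
print(len(expanded))  # 3 colors x 2 sizes = 6
print(expanded[:2])   # ['card-red-small.png', 'card-red-large.png']
```

`itertools.product` varies the last variable fastest, which is why the two `red` cards come first.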
Only variables that appear in the target name cause expansion. A target without any `[var]` references in its name is not looped:
```yaml
loops:
  id:
    - 1
    - 2

targets:
  data-[id].txt:
    content: "Data for [id]"

  # This target depends on ALL expanded data files
  summary.md:
    prompt: "Summarize everything"
    inputs:
      - data-1.txt
      - data-2.txt
```
Loop variables also work across dependent targets:
```yaml
targets:
  data-[id].txt:
    content: "Data for item [id]"

  report-[id].md:
    prompt: "Write a report about item [id]"
    inputs:
      - data-[id].txt
```
**Explicit overrides**: If you define both a template and an explicit target that would collide, the explicit target wins:
```yaml
targets:
  image-[n].png:
    prompt: "Generic image [n]"

  image-3.png:
    prompt: "Special custom image"  # this overrides the template for n=3
```
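One way to picture the override rule is as a dictionary merge in which explicit targets are applied last. The sketch below assumes a loop `n` with values 1, 2 and 3 (the example above does not define it) and uses plain dicts as stand-ins for target configs.

```python
# Hypothetical result of expanding `image-[n].png` over n = 1, 2, 3.
expanded = {
    f"image-{n}.png": {"prompt": f"Generic image {n}"} for n in ("1", "2", "3")
}
# Targets whose names contain no loop variables.
explicit = {"image-3.png": {"prompt": "Special custom image"}}

# Later keys win in a dict merge, so explicit entries replace expanded ones.
targets = {**expanded, **explicit}

print(targets["image-3.png"]["prompt"])  # Special custom image
print(targets["image-1.png"]["prompt"])  # Generic image 1
```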
**Escaping**: Use `\[var]` to produce a literal `[var]` in the output.
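The escaping rule can be sketched with a single regular expression that captures any run of backslashes before the bracket. This is a standalone approximation under the rule stated above (an odd number of backslashes escapes the placeholder, each pair collapses to one literal backslash); the names `PLACEHOLDER` and `substitute` are illustrative only.

```python
from __future__ import annotations

import re

# An optional run of backslashes, then `[name]`.
PLACEHOLDER = re.compile(r"(\\*)\[([^\]]+)\]")


def substitute(text: str, bindings: dict[str, str]) -> str:
    """Replace unescaped `[name]` placeholders with values from bindings."""

    def replace(match: re.Match[str]) -> str:
        backslashes, name = match.group(1), match.group(2)
        # Each pair of backslashes collapses to one literal backslash.
        kept = "\\" * (len(backslashes) // 2)
        if len(backslashes) % 2 == 1:
            return kept + "[" + name + "]"  # odd count: placeholder is escaped
        if name in bindings:
            return kept + bindings[name]
        return match.group(0)  # unknown variable: leave the text untouched

    return PLACEHOLDER.sub(replace, text)


print(substitute(r"Draw \[a] for [a]", {"a": "1"}))  # Draw [a] for 1
```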
Loop values are always treated as strings. Numbers and booleans in YAML are converted with Python's `str()`, so `1` becomes `"1"` and `true` becomes `"True"`.
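A small sketch of that conversion, using hand-written values in place of a parsed YAML list:

```python
# Values as a YAML parser would hand them over: int, bool, float, str.
raw_values = [1, True, 3.14, "red"]

# Every loop value is turned into its string form before expansion.
normalized = [str(value) for value in raw_values]

print(normalized)  # ['1', 'True', '3.14', 'red']
```

A template such as `out-[x].txt` looped over a YAML `true` would therefore produce `out-True.txt`, with a capital `T`.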
### Archiving previous outputs
Set `archive_folder` at the top level to preserve previous versions of generated files. When a target is rebuilt, the existing output is moved to the archive folder with an incrementing numeric suffix:

@ -75,6 +75,7 @@ class ProjectConfig(BaseModel):
    """Top-level configuration parsed from ``<name>.hokusai.yaml``."""

    defaults: Defaults = Defaults()
    loops: dict[str, list[str]] = {}
    archive_folder: str | None = None
    targets: dict[str, TargetConfig]
@ -95,8 +96,30 @@ def target_type_from_capabilities(capabilities: frozenset[Capability]) -> Target
    return TargetType.TEXT


def _normalize_loops(raw: dict[str, object]) -> None:
    """Normalize loop values to strings in-place."""
    loops = raw.get("loops")
    if not isinstance(loops, dict):
        return
    for key in list(loops.keys()):  # pyright: ignore[reportUnknownVariableType,reportUnknownArgumentType]
        values = loops[key]  # pyright: ignore[reportUnknownVariableType]
        if not isinstance(values, list):
            msg = f"Loop '{key}' must be a list"
            raise ValueError(msg)
        loops[key] = [str(v) for v in values]  # pyright: ignore[reportUnknownVariableType,reportUnknownArgumentType]


def load_config(config_path: Path) -> ProjectConfig:
    """Load and validate a ``.hokusai.yaml`` file."""
    from hokusai.expand import expand_targets

    with config_path.open() as f:
        raw = yaml.safe_load(f)  # pyright: ignore[reportAny]

    _normalize_loops(raw)  # pyright: ignore[reportAny]
    loops: dict[str, list[str]] = raw.get("loops") or {}  # pyright: ignore[reportAny]
    if loops and "targets" in raw:
        raw["targets"] = expand_targets(raw["targets"], loops)  # pyright: ignore[reportAny]
    return ProjectConfig.model_validate(raw)

hokusai/expand.py Normal file
@ -0,0 +1,134 @@
"""Loop variable expansion for target templates."""

from __future__ import annotations

import itertools
import re
from collections.abc import Mapping
from copy import deepcopy

_LOOP_VAR_RE = re.compile(r"(\\*)\[([^\]]+)\]")
"""Match ``[varname]`` with optional leading backslashes.

Groups:
    1. Zero or more backslashes immediately before the ``[``
    2. The variable name between the brackets
"""


def extract_loop_variables(text: str) -> list[str]:
    """Return loop variable names referenced as ``[var]`` in *text*.

    Only non-escaped references are returned (even number of leading
    backslashes, including zero). Duplicates are removed but order is
    preserved.
    """
    seen: set[str] = set()
    result: list[str] = []
    for match in _LOOP_VAR_RE.finditer(text):
        n_bs = len(match.group(1))
        if n_bs % 2 == 0:
            name = match.group(2)
            if name not in seen:
                seen.add(name)
                result.append(name)
    return result


def substitute_loop_variables(text: str, bindings: dict[str, str]) -> str:
    """Replace ``[var]`` placeholders with values from *bindings*.

    Escaping rules (same logic as prompt placeholders):

    * ``[var]`` -> value of *var*
    * ``\\[var]`` -> literal ``[var]``
    * ``\\\\[var]`` -> literal ``\\`` + value of *var*
    """

    def _replace(match: re.Match[str]) -> str:
        backslashes = match.group(1)
        name = match.group(2)
        n_bs = len(backslashes)
        if n_bs % 2 == 1:
            return "\\" * (n_bs // 2) + "[" + name + "]"
        prefix = "\\" * (n_bs // 2)
        if name in bindings:
            return prefix + bindings[name]
        return match.group(0)

    return _LOOP_VAR_RE.sub(_replace, text)


def _substitute_value(value: object, bindings: dict[str, str]) -> object:
    """Recursively substitute loop variables in a config value."""
    if isinstance(value, str):
        return substitute_loop_variables(value, bindings)
    if isinstance(value, list):
        return [_substitute_value(item, bindings) for item in value]  # pyright: ignore[reportUnknownArgumentType,reportUnknownVariableType]
    if isinstance(value, dict):
        return {
            _substitute_value(k, bindings): _substitute_value(v, bindings)  # pyright: ignore[reportUnknownArgumentType]
            for k, v in value.items()  # pyright: ignore[reportUnknownVariableType]
        }
    return value


def expand_targets(
    raw_targets: Mapping[str, object],
    loops: Mapping[str, list[str]],
) -> dict[str, object]:
    """Expand templated targets using loop variable cartesian products.

    Targets whose name contains ``[var]`` references are expanded for every
    combination of the referenced loop variables. Targets without any
    references are passed through unchanged.

    If an expanded name collides with an explicitly defined target, the
    explicit target wins. If two *different* templates expand to the same
    name, a :class:`ValueError` is raised.
    """
    explicit: dict[str, object] = {}
    templates: list[tuple[str, object]] = []

    for name, cfg in raw_targets.items():
        vars_in_name = extract_loop_variables(name)
        if vars_in_name:
            templates.append((name, cfg))
        else:
            explicit[name] = cfg

    expanded: dict[str, object] = {}
    expanded_from: dict[str, str] = {}

    for tmpl_name, tmpl_cfg in templates:
        vars_in_name = extract_loop_variables(tmpl_name)

        for var in vars_in_name:
            if var not in loops:
                msg = (
                    f"Target '{tmpl_name}' references undefined loop variable '[{var}]'"
                )
                raise ValueError(msg)

        var_values = [loops[v] for v in vars_in_name]
        for combo in itertools.product(*var_values):
            bindings = dict(zip(vars_in_name, combo))
            expanded_name = substitute_loop_variables(tmpl_name, bindings)

            if expanded_name in explicit:
                continue

            if expanded_name in expanded and expanded_from[expanded_name] != tmpl_name:
                msg = (
                    f"Duplicate expanded target '{expanded_name}' from templates "
                    f"'{expanded_from[expanded_name]}' and '{tmpl_name}'"
                )
                raise ValueError(msg)

            expanded_cfg = _substitute_value(deepcopy(tmpl_cfg), bindings)
            expanded[expanded_name] = expanded_cfg
            expanded_from[expanded_name] = tmpl_name

    return {**expanded, **explicit}

@ -709,6 +709,68 @@ class TestContentTarget:
        assert result.failed == {}


class TestLoopExpansion:
    """End-to-end tests for loop-expanded targets in builds."""

    async def test_loop_content_targets_build(
        self, project_dir: Path, write_config: WriteConfig
    ) -> None:
        config = write_config(
            {
                "loops": {"n": ["1", "2", "3"]},
                "targets": {"file-[n].txt": {"content": "Value [n]"}},
            }
        )
        with patch("hokusai.builder._create_providers", return_value=_fake_providers()):
            result = await run_build(config, project_dir, _PROJECT)

        assert set(result.built) == {"file-1.txt", "file-2.txt", "file-3.txt"}
        assert (project_dir / "file-1.txt").read_text() == "Value 1"
        assert (project_dir / "file-2.txt").read_text() == "Value 2"
        assert (project_dir / "file-3.txt").read_text() == "Value 3"

    async def test_loop_incremental_skip(
        self, project_dir: Path, write_config: WriteConfig
    ) -> None:
        config = write_config(
            {
                "loops": {"n": ["1", "2"]},
                "targets": {"file-[n].txt": {"content": "Value [n]"}},
            }
        )
        with patch("hokusai.builder._create_providers", return_value=_fake_providers()):
            r1 = await run_build(config, project_dir, _PROJECT)
            assert len(r1.built) == 2

            r2 = await run_build(config, project_dir, _PROJECT)
            assert r2.built == []
            assert set(r2.skipped) == {"file-1.txt", "file-2.txt"}

    async def test_loop_with_dependency_chain(
        self, project_dir: Path, write_config: WriteConfig
    ) -> None:
        config = write_config(
            {
                "loops": {"id": ["a", "b"]},
                "targets": {
                    "data-[id].txt": {"content": "Data for [id]"},
                    "summary-[id].txt": {
                        "prompt": "Summarize [id]",
                        "inputs": ["data-[id].txt"],
                    },
                },
            }
        )
        with patch("hokusai.builder._create_providers", return_value=_fake_providers()):
            result = await run_build(config, project_dir, _PROJECT)

        assert "data-a.txt" in result.built
        assert "data-b.txt" in result.built
        assert "summary-a.txt" in result.built
        assert "summary-b.txt" in result.built
        assert result.failed == {}


class TestPlaceholderPrompts:
    """Tests for prompt placeholder substitution in builds."""

@ -84,6 +84,41 @@ class TestLoadConfig:
        with pytest.raises(Exception):
            _ = load_config(config_path)

    def test_config_with_loops(self, project_dir: Path) -> None:
        config_path = project_dir / "test.hokusai.yaml"
        _ = config_path.write_text(
            yaml.dump(
                {
                    "loops": {"a": [1, 2]},
                    "targets": {"file-[a].txt": {"content": "Value [a]"}},
                }
            )
        )
        config = load_config(config_path)

        assert "file-1.txt" in config.targets
        assert "file-2.txt" in config.targets
        assert "file-[a].txt" not in config.targets
        t1 = config.targets["file-1.txt"]
        assert isinstance(t1, ContentTargetConfig)
        assert t1.content == "Value 1"

    def test_config_loops_normalize_values_to_strings(self, project_dir: Path) -> None:
        config_path = project_dir / "test.hokusai.yaml"
        _ = config_path.write_text(
            yaml.dump(
                {
                    "loops": {"x": [True, 3.14]},
                    "targets": {"out-[x].txt": {"content": "[x]"}},
                }
            )
        )
        config = load_config(config_path)

        assert "out-True.txt" in config.targets
        assert "out-3.14.txt" in config.targets

    def test_content_target(self, project_dir: Path) -> None:
        config_path = project_dir / "test.hokusai.yaml"
        _ = config_path.write_text(

tests/test_expand.py Normal file
@ -0,0 +1,232 @@
"""Unit tests for hokusai.expand."""

from __future__ import annotations

import pytest

from hokusai.expand import (
    expand_targets,
    extract_loop_variables,
    substitute_loop_variables,
)


class TestExtractLoopVariables:
    """Tests for extracting [var] references from strings."""

    def test_single_variable(self) -> None:
        assert extract_loop_variables("image-[a].png") == ["a"]

    def test_multiple_variables(self) -> None:
        assert extract_loop_variables("card-[size]-[color].png") == ["size", "color"]

    def test_no_variables(self) -> None:
        assert extract_loop_variables("plain.png") == []

    def test_escaped_variable(self) -> None:
        assert extract_loop_variables(r"file-\[a].png") == []

    def test_mixed_escaped_and_real(self) -> None:
        assert extract_loop_variables(r"file-\[a]-[b].png") == ["b"]

    def test_double_backslash_is_not_escaped(self) -> None:
        assert extract_loop_variables("file-\\\\[a].png") == ["a"]

    def test_deduplicates(self) -> None:
        assert extract_loop_variables("[a]-[a].png") == ["a"]

    def test_preserves_order(self) -> None:
        assert extract_loop_variables("[b]-[a]-[c].png") == ["b", "a", "c"]


class TestSubstituteLoopVariables:
    """Tests for substituting [var] with values."""

    def test_single_substitution(self) -> None:
        result = substitute_loop_variables("image-[a].png", {"a": "1"})
        assert result == "image-1.png"

    def test_multiple_substitutions(self) -> None:
        result = substitute_loop_variables(
            "card-[size]-[color].png", {"size": "large", "color": "red"}
        )
        assert result == "card-large-red.png"

    def test_escaped_not_substituted(self) -> None:
        result = substitute_loop_variables(r"file-\[a].png", {"a": "1"})
        assert result == "file-[a].png"

    def test_double_backslash_substituted(self) -> None:
        result = substitute_loop_variables("file-\\\\[a].png", {"a": "1"})
        assert result == "file-\\1.png"

    def test_unknown_variable_left_as_is(self) -> None:
        result = substitute_loop_variables("file-[unknown].png", {"a": "1"})
        assert result == "file-[unknown].png"

    def test_no_variables(self) -> None:
        result = substitute_loop_variables("plain.png", {"a": "1"})
        assert result == "plain.png"


class TestExpandTargets:
    """Tests for full target expansion."""

    def test_single_variable_expansion(self) -> None:
        raw: dict[str, object] = {"image-[a].png": {"prompt": "Draw [a]"}}
        loops = {"a": ["1", "2", "3"]}
        result = expand_targets(raw, loops)

        assert len(result) == 3
        assert result["image-1.png"] == {"prompt": "Draw 1"}
        assert result["image-2.png"] == {"prompt": "Draw 2"}
        assert result["image-3.png"] == {"prompt": "Draw 3"}

    def test_cartesian_product(self) -> None:
        raw: dict[str, object] = {"card-[a]-[b].png": {"prompt": "[a] [b]"}}
        loops = {"a": ["1", "2"], "b": ["x", "y"]}
        result = expand_targets(raw, loops)

        assert len(result) == 4
        assert result["card-1-x.png"] == {"prompt": "1 x"}
        assert result["card-1-y.png"] == {"prompt": "1 y"}
        assert result["card-2-x.png"] == {"prompt": "2 x"}
        assert result["card-2-y.png"] == {"prompt": "2 y"}

    def test_partial_loop_only_referenced_vars(self) -> None:
        raw: dict[str, object] = {"image-[a].png": {"prompt": "Draw [a]"}}
        loops = {"a": ["1", "2"], "b": ["x", "y"]}
        result = expand_targets(raw, loops)

        assert len(result) == 2
        assert "image-1.png" in result
        assert "image-2.png" in result

    def test_non_template_target_passed_through(self) -> None:
        raw: dict[str, object] = {
            "image-[a].png": {"prompt": "Draw [a]"},
            "static.txt": {"content": "hello"},
        }
        loops = {"a": ["1", "2"]}
        result = expand_targets(raw, loops)

        assert len(result) == 3
        assert result["static.txt"] == {"content": "hello"}

    def test_explicit_target_overrides_expanded(self) -> None:
        raw: dict[str, object] = {
            "image-[a].png": {"prompt": "Draw [a]"},
            "image-1.png": {"prompt": "Custom prompt for 1"},
        }
        loops = {"a": ["1", "2"]}
        result = expand_targets(raw, loops)

        assert len(result) == 2
        assert result["image-1.png"] == {"prompt": "Custom prompt for 1"}
        assert result["image-2.png"] == {"prompt": "Draw 2"}

    def test_substitution_in_inputs(self) -> None:
        raw: dict[str, object] = {
            "out-[a].txt": {
                "prompt": "Summarize [a]",
                "inputs": ["data-[a].txt"],
            }
        }
        loops = {"a": ["x", "y"]}
        result = expand_targets(raw, loops)

        assert result["out-x.txt"] == {
            "prompt": "Summarize x",
            "inputs": ["data-x.txt"],
        }
        assert result["out-y.txt"] == {
            "prompt": "Summarize y",
            "inputs": ["data-y.txt"],
        }

    def test_substitution_in_reference_images(self) -> None:
        raw: dict[str, object] = {
            "out-[a].png": {
                "prompt": "Enhance",
                "reference_images": ["ref-[a].png"],
            }
        }
        loops = {"a": ["1", "2"]}
        result = expand_targets(raw, loops)

        assert result["out-1.png"]["reference_images"] == ["ref-1.png"]  # pyright: ignore[reportIndexIssue]
        assert result["out-2.png"]["reference_images"] == ["ref-2.png"]  # pyright: ignore[reportIndexIssue]

    def test_substitution_in_content(self) -> None:
        raw: dict[str, object] = {"file-[a].txt": {"content": "Value is [a]"}}
        loops = {"a": ["x", "y"]}
        result = expand_targets(raw, loops)

        assert result["file-x.txt"] == {"content": "Value is x"}
        assert result["file-y.txt"] == {"content": "Value is y"}

    def test_substitution_in_download(self) -> None:
        raw: dict[str, object] = {
            "file-[a].png": {"download": "https://example.com/[a].png"}
        }
        loops = {"a": ["cat", "dog"]}
        result = expand_targets(raw, loops)

        assert result["file-cat.png"] == {"download": "https://example.com/cat.png"}
        assert result["file-dog.png"] == {"download": "https://example.com/dog.png"}

    def test_escaped_brackets_preserved(self) -> None:
        raw: dict[str, object] = {r"image-[a].png": {"prompt": r"Draw \[a] for [a]"}}
        loops = {"a": ["1"]}
        result = expand_targets(raw, loops)

        assert result["image-1.png"] == {"prompt": "Draw [a] for 1"}

    def test_undefined_variable_raises(self) -> None:
        raw: dict[str, object] = {"image-[missing].png": {"prompt": "x"}}
        loops = {"a": ["1"]}

        with pytest.raises(ValueError, match="undefined loop variable"):
            _ = expand_targets(raw, loops)

    def test_duplicate_from_different_templates_raises(self) -> None:
        raw: dict[str, object] = {
            "[a]-[b].png": {"prompt": "first"},
            "[b]-[a].png": {"prompt": "second"},
        }
        loops = {"a": ["x"], "b": ["x"]}

        with pytest.raises(ValueError, match="Duplicate expanded target"):
            _ = expand_targets(raw, loops)

    def test_empty_loops_passes_through(self) -> None:
        raw: dict[str, object] = {"out.txt": {"prompt": "hello"}}
        result = expand_targets(raw, {})

        assert result == {"out.txt": {"prompt": "hello"}}

    def test_cross_reference_between_expanded_targets(self) -> None:
        raw: dict[str, object] = {
            "data-[id].txt": {"content": "Data for [id]"},
            "summary-[id].txt": {
                "prompt": "Summarize",
                "inputs": ["data-[id].txt"],
            },
        }
        loops = {"id": ["a", "b"]}
        result = expand_targets(raw, loops)

        assert len(result) == 4
        assert result["summary-a.txt"]["inputs"] == ["data-a.txt"]  # pyright: ignore[reportIndexIssue]
        assert result["summary-b.txt"]["inputs"] == ["data-b.txt"]  # pyright: ignore[reportIndexIssue]

    def test_substitution_in_control_images(self) -> None:
        raw: dict[str, object] = {
            "out-[a].png": {
                "prompt": "Generate",
                "control_images": ["ctrl-[a].png"],
            }
        }
        loops = {"a": ["1"]}
        result = expand_targets(raw, loops)

        assert result["out-1.png"]["control_images"] == ["ctrl-1.png"]  # pyright: ignore[reportIndexIssue]