Phase 1: Experiment hygiene and threats-to-validity checks #83
Reference
erikinkinen/AES!83
Summary
Implements Phase 1 experiment-hygiene and threats-to-validity safeguards from #39.
Closes #39.
What Changed
`aes sweep`.
Details
Plot pipeline (`tools/figures/phase1/plot_phase1_figures.py`)
- Added hard-fail comparability validation before multi-strategy plots.
- Comparability keys now follow plotted cohorts:
  - `workload,seed` for cost/completeness/scatter comparisons.
  - `depth,seed` or `fanout,seed` for sensitivity lines.
  - `attempt_index,seed` for post-revoke hot-path.
- Comparability metadata is persisted in tie-group sidecars.
- Added deterministic CI stats per bucket: `n`, `mean`, `stddev_sample`, `sem`, `ci_low`, `ci_high`.
- Rendered CI whiskers on strategy bars and CI bands on line plots.
- Scatter plots remain per-run points (no CI aggregation).
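For reviewers who want a concrete picture of the hard-fail comparability check, here is a minimal sketch. It assumes runs arrive as dictionaries and that strategies are comparable only when each one covers exactly the same set of cohort keys. The names `check_comparable`, `ComparabilityError`, and the `strategy` field are hypothetical and are not taken from `plot_phase1_figures.py`.

```python
from collections import defaultdict


class ComparabilityError(ValueError):
    """Raised when plotted strategies do not share the same cohort keys."""


def check_comparable(rows, key_fields, strategy_field="strategy"):
    """Fail hard unless every strategy covers the same set of cohort keys.

    rows: iterable of dicts, one per run (assumed shape).
    key_fields: cohort key columns, e.g. ("workload", "seed").
    Returns the sorted list of shared keys on success.
    """
    keys_by_strategy = defaultdict(set)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        keys_by_strategy[row[strategy_field]].add(key)

    if not keys_by_strategy:
        raise ComparabilityError("no runs to compare")

    # Use the alphabetically first strategy as the reference so that the
    # error message is the same on every invocation.
    reference_name = min(keys_by_strategy)
    reference = keys_by_strategy[reference_name]
    for name in sorted(keys_by_strategy):
        keys = keys_by_strategy[name]
        if keys != reference:
            missing = sorted(reference - keys)
            extra = sorted(keys - reference)
            raise ComparabilityError(
                f"strategy {name!r} is not comparable with {reference_name!r}: "
                f"missing={missing} extra={extra}"
            )
    return sorted(reference)
```

The same function would cover the other cohorts by passing `("depth", "seed")`, `("fanout", "seed")`, or `("attempt_index", "seed")` as `key_fields`.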
Sweep CLI (`cli/src/main.cpp`)
- Added `--preflight-only`.
- Added `--max-runs <u64>` with default `1000`.
- Added `--allow-large-runs`.
- Preflight now always runs after config expansion and before batch execution.
- Sweep fails with exit code `2` when run count exceeds `--max-runs` unless `--allow-large-runs` is passed.
- Preflight prints summary and non-fatal warning hints for low diversity.
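The gating rule above can be sketched as follows. The actual implementation lives in the C++ CLI; this Python version only illustrates the decision logic. The function name `preflight`, the message wording, and the "fewer than two distinct seeds" diversity threshold are assumptions, while the default of `1000` and exit code `2` come from the description above.

```python
EXIT_OK = 0
EXIT_TOO_MANY_RUNS = 2  # exit code stated in this PR description


def preflight(run_count, distinct_seeds, max_runs=1000, allow_large_runs=False):
    """Return (exit_code, messages) for an expanded sweep configuration.

    Meant to run after config expansion and before any batch execution.
    """
    messages = [f"preflight: {run_count} runs, {distinct_seeds} distinct seeds"]

    # Non-fatal hint. The threshold of two seeds is an assumed example.
    if distinct_seeds < 2:
        messages.append("warning: low seed diversity, CI stats will be unavailable")

    # Fatal guard: too many runs without an explicit opt-in.
    if run_count > max_runs and not allow_large_runs:
        messages.append(
            f"error: {run_count} runs exceeds --max-runs {max_runs}; "
            "pass --allow-large-runs to proceed"
        )
        return EXIT_TOO_MANY_RUNS, messages

    return EXIT_OK, messages
```

Under these assumptions, `--preflight-only` would correspond to printing the messages and exiting with the returned code without starting the batch.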
Docs (`docs/phase1.md`)
- Documented preflight flags and behavior.
- Documented strict comparability semantics.
- Documented deterministic 95% t-interval CI behavior.
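As a rough illustration of a deterministic 95% t-interval, the sketch below computes the per-bucket fields listed earlier (`n`, `mean`, `stddev_sample`, `sem`, `ci_low`, `ci_high`) using only the standard library. The function name `ci_stats`, the fixed critical-value table, the normal-quantile fallback above 30 degrees of freedom, and the handling of single-run buckets are all assumptions; the PR's actual choices may differ.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
# A fixed table keeps the result deterministic without a SciPy dependency.
_T_975 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
    7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
    13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101,
    19: 2.093, 20: 2.086, 21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064,
    25: 2.060, 26: 2.056, 27: 2.052, 28: 2.048, 29: 2.045, 30: 2.042,
}
_Z_975 = 1.960  # normal approximation, used here only when df > 30


def ci_stats(values):
    """Summarize one bucket of per-seed measurements."""
    n = len(values)
    if n == 0:
        raise ValueError("empty bucket")
    mean = statistics.fmean(values)
    if n < 2:
        # Sample stddev is undefined for one run (assumed handling).
        return {"n": n, "mean": mean, "stddev_sample": None,
                "sem": None, "ci_low": None, "ci_high": None}
    stddev_sample = statistics.stdev(values)  # n - 1 denominator
    sem = stddev_sample / math.sqrt(n)
    half_width = _T_975.get(n - 1, _Z_975) * sem
    return {"n": n, "mean": mean, "stddev_sample": stddev_sample,
            "sem": sem, "ci_low": mean - half_width,
            "ci_high": mean + half_width}
```

For example, `ci_stats([1.0, 2.0, 3.0])` uses 2 degrees of freedom, so the half-width is `4.303 * (1.0 / sqrt(3))`, roughly 2.484 around a mean of 2.0.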
Commits
- Phase 1: Enforce comparable strategy cohorts (#39)
- Phase 1: Add multi-seed confidence intervals (#39)
- Phase 1: Add sweep preflight sanity checks (#39)
Validation
- `ctest --test-dir _build --output-on-failure -R "aes_phase1_figures_tests|aes_phase1_figures_smoke_test|aes_cli_sweep_tests|aes_cli_simulate_tests|aes_metrics_runner_tests|aes_revocation_outcome_metrics_tests"`
- `ctest --test-dir _build --output-on-failure -R "aes_event_log_reader_tests|aes_event_log_replay_tests|aes_revocation_strategy_tests|aes_strategy_equivalence_tests|aes_invalid_event_determinism_tests"`
All passed.
Notes for Reviewers