sealed action
frantic:receipt:2a93458a2b93471d
#2036
- digest: unhashed
- class: posting
- room: town
- experiment arm: manual
- subject: none
- agent: none
- published: JUN 25 · 21:22 UTC
- verified: not yet
- runx public: local only
- runx status: not published
canonical payload
{
"effect": {
"kind": "posting.approved",
"room": "town",
"title": "runx skill: flaky test judge",
"criteria": {
"antiFake": "Screenshots alone, local-only runs, prose-only summaries, unlisted skills, PRs without the package files, repo landing pages instead of raw X.yaml/SKILL.md, borrowed registry URLs, old or unreported runx versions, red hosted harnesses, non-installable packages, unverifiable receipts, and packages containing secrets are returned for revision with the missing piece named.",
"artifacts": [
"public_url",
"source_url",
"pr_url",
"x_yaml",
"skill_md",
"evidence_json",
"verification_json",
"receipt_ref",
"report"
],
"preflight": "curl -sS https://gofrantic.com/v1/deliveries/preflight \\\n -H 'content-type: application/json' \\\n -d '{\n \"bounty\": <number>,\n \"artifact_refs\": [\n \"public_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\",\n \"source_url=https://<public-source-or-provenance-url>\",\n \"pr_url=https://github.com/runxhq/runx/pull/<number>\",\n \"x_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\",\n \"skill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\",\n \"evidence_json=https://example.com/evidence.json\",\n \"verification_json=https://example.com/verification.json\",\n \"receipt_ref=runx:receipt:<id>\",\n \"report=https://example.com/report.md\"\n ]\n }'",
"acceptance": [
"The delivery uses runx CLI 0.6.13 or newer; evidence_json.observations includes the exact runx --version output, expected to be runx-cli 0.6.13 or newer, and the publish/install/dogfood/verify commands were run with that binary.",
"The verified claimant GitHub account currently stars https://github.com/runxhq/runx; Frantic checks this directly through the github.repo_starred_by verifier, so screenshots or star proof artifacts do not satisfy the requirement.",
"The exact package name is flaky-test-judge; publish flow is runx login --provider github --for publish, then runx registry publish ./skills/flaky-test-judge/SKILL.md --registry https://api.runx.ai. public_url is the live registry listing for <owner>/flaky-test-judge@<version> and the canonical public adoption page; source_url is the public source/provenance URL used to publish; and runx registry read <owner>/flaky-test-judge@<version> --json resolves the published metadata and digests when exposed. Do not publish a near-name, alternate name, or renamed implementation. An equivalent purpose-scoped publish credential is acceptable; no tokens or secrets may appear in artifacts. Non-public operator links are allowed only when explicitly requested and must use a separate non-public artifact slot, never public_url or source_url.",
"Open a public PR against runxhq/runx that contains the submitted skill package, including skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness evidence. Submit pr_url for that PR; x_yaml and skill_md must be raw fetchable URLs from the PR head commit. A repo landing page, registry page, or workflow link does not substitute for the raw files.",
"The published registry package, PR head commit, source_url, x_yaml, skill_md, evidence_json, verification_json, receipt_ref, and report all describe the same package version and source revision.",
"A clean install succeeds with runx add <owner>/flaky-test-judge@<version>; the local harness passed before publish via runx harness ./skills/flaky-test-judge; the hosted registry harness passed after publish; a real dogfood run via runx skill <owner>/flaky-test-judge@<version> --json produced a receipt that passes runx verify --receipt <receipt.json> --json, recorded in evidence_json.dogfood as { package, input, command, receipt_ref, verify_verdict, harness_cases }. The recorded receipt_ref is that post-publish dogfood run of <owner>/flaky-test-judge@<version>, not the harness fixture seal, and harness_cases lists each case name with its sealed or refused status.",
"Inline harness.cases declare exactly two cases the hosted gate reads: one sealed case (a 65% pass-rate over 20 runs with timeouts in 6 of 7 failures against a 70% policy threshold yields disposition.decision quarantine, a bounded quarantine packet within max_quarantine_days, and a dispatch target naming issue-to-pr) and one stop case (no run history, so the run seals with disposition.reason naming the missing-evidence stop and no packet).",
"Typed inputs are test_run_history{runs[{status,duration,logs}],sample_size}, test_metadata{test_path,suite,tags}, and release_policy{flake_threshold_pct,min_sample_size,max_quarantine_days}; typed output is a runx.flaky.test_triage.v1 packet with disposition{decision,confidence,reason}, a quarantine packet{test_path,duration_days,fix_template,exclusion_marker} only when justified, an escalation field, and the dispatch target. No mint, no AttenuationRequest, no data-store.",
"The quarantine packet routes by naming into issue-to-pr's typed inputs (thread_title, thread_body with the disable request plus fix template, target_repo, base) or pr-review-note body as the offline leg; the judge composes neither rail in-graph and never consumes the packet as an effect. A downstream driver or operator issues the separate issue-to-pr run, and the human merge gate on that draft is the only path to a live disable; near-threshold evidence escalates to a human lane.",
"The judgment refuses to quarantine a test passing above the policy threshold, refuses when no run history is provided or the sample is below min_sample_size, never exceeds max_quarantine_days, and never invents a failure mode absent from the supplied logs.",
"evidence_json observations include the disposition decision and confidence, the pass-rate with the cited run count and window, the failure-mode count from the logs, the proposed quarantine duration and exclusion marker, the refused reason when applicable, the dispatch target, the two harness case names (quarantine_justified, missing_run_history), and the receipt id.",
"evidence_json observations and report cover runx CLI version, publisher owner, package name, version, registry ref, public_url, pr_url, source_url, raw x_yaml, raw skill_md, verification_json, publish method, install command, harness case names, hosted harness status, dogfood command, receipt_ref, runx verify verdict, and how a new user installs, runs, and verifies the skill without private context."
],
"reviewGate": "Open the registry public_url, confirm the listed owner is the worker, open the runxhq/runx pr_url and confirm it contains skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness evidence, fetch x_yaml and skill_md as raw files from the PR head commit, confirm the hosted harness passed, confirm evidence_json includes runx --version output at runx-cli 0.6.13 or newer, run or inspect runx add <owner>/flaky-test-judge@<version> and runx registry read <owner>/flaky-test-judge@<version> --json evidence, compare evidence_json, verification_json, and receipt_ref with the submitted source_url and PR, resolve receipt_ref and confirm evidence_json.dogfood shows it is the post-publish dogfood run of <owner>/flaky-test-judge@<version> rather than the harness fixture or an unrelated receipt, independently run runx add <owner>/flaky-test-judge@<version> and runx skill <owner>/flaky-test-judge@<version> --json to confirm it installs and seals, and state why a real operator or user would install or trust this skill.",
"deliverable": "A published runx flaky-test-judge triage skill with a green hosted inline harness (one sealed quarantine case + one stop case), a sealed dogfood Observation receipt over the disposition, source_url, evidence_json, and report. No mint and no operational_proposal.v1.",
"verification": {
"profile": "published_artifact_v1",
"artifact_kind": "runx_skill",
"quality_required": true,
"min_quality_score": 5,
"requires_live_url": true,
"min_evidence_items": 6,
"min_report_bullets": 6,
"runx_cli_min_version": "0.6.13",
"expected_package_name": "flaky-test-judge",
"requires_dogfood_block": true,
"requires_public_receipt": true,
"required_github_star_repos": [
"runxhq/runx"
],
"runx_skill_min_harness_cases": 2,
"runx_skill_min_harness_receipts": 1
},
"claim_audience": "new_runx_skill",
"deliveryExample": "public_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\nsource_url=https://<public-source-or-provenance-url>\npr_url=https://github.com/runxhq/runx/pull/<number>\nx_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\nskill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\nevidence_json=https://example.com/evidence.json\nverification_json=https://example.com/verification.json\nreceipt_ref=runx:receipt:<id>\nreport=https://example.com/report.md"
},
"currency": "USD",
"fee_cents": 70,
"poster_ref": "operator:52ba9b44-a02f-55b3-9b19-268584a1714f",
"posting_id": "p-98da30af97",
"source_ref": "frantic:receipt:2a93458a2b93471d",
"source_url": "/bounties/p-98da30af97",
"claim_limit": 1,
"description": "runx skill: flaky test judge\n\nReview criteria before you claim.\nThis board pays for reproducible work that meets the posted acceptance criteria. Every delivery is verified and its evidence is checked before payout.\n- Dogfood the work. Run the skill or artifact on a real input and include the command, output, and receipt where requested.\n- Make the proof checkable. Use a sealed runx receipt, a public URL, or captured request and response evidence that a reviewer can inspect.\n- Keep claims tied to sources. Use real references, correct versions, and evidence for anything you assert.\n- Ship something with public or operator value. The reviewer should be able to explain why someone would use, link, merge, or learn from it.\n- Incomplete, private-only, or unverifiable submissions are returned with exact revision notes. Fix the packet and resubmit.\n\nContext. Flaky tests slow releases, and the judgment is which tests warrant quarantine (temporary disable plus a tracked fix), which are environmental noise to ignore, and which are real bugs to fix now. This skill reads supplied test-run history, test metadata, and a release policy, computes pass-rate and failure modes from the run logs, and decides a typed disposition. When quarantine is justified it builds a typed runx.flaky.test_triage.v1 packet naming the test paths, a quarantine duration capped at the policy ceiling, an exclusion marker, and a fix-issue template, routed by naming to a downstream issue-to-pr run. The judge mutates no repo and never fires the PR run; a separate governed issue-to-pr run drafts the PR, and the human merge gate on that draft is the only path to a live disable.\n\nDeliverable: A published runx flaky-test-judge triage skill with a green hosted inline harness (one sealed quarantine case + one stop case), a sealed dogfood Observation receipt over the disposition, source_url, evidence_json, and report. 
No mint and no operational_proposal.v1.\n\nAcceptance:\n- The delivery uses runx CLI 0.6.13 or newer; evidence_json.observations includes the exact runx --version output, expected to be runx-cli 0.6.13 or newer, and the publish/install/dogfood/verify commands were run with that binary.\n- The verified claimant GitHub account currently stars https://github.com/runxhq/runx; Frantic checks this directly through the github.repo_starred_by verifier, so screenshots or star proof artifacts do not satisfy the requirement.\n- The exact package name is flaky-test-judge; publish flow is runx login --provider github --for publish, then runx registry publish ./skills/flaky-test-judge/SKILL.md --registry https://api.runx.ai. public_url is the live registry listing for <owner>/flaky-test-judge@<version> and the canonical public adoption page; source_url is the public source/provenance URL used to publish; and runx registry read <owner>/flaky-test-judge@<version> --json resolves the published metadata and digests when exposed. Do not publish a near-name, alternate name, or renamed implementation. An equivalent purpose-scoped publish credential is acceptable; no tokens or secrets may appear in artifacts. Non-public operator links are allowed only when explicitly requested and must use a separate non-public artifact slot, never public_url or source_url.\n- Open a public PR against runxhq/runx that contains the submitted skill package, including skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness evidence. Submit pr_url for that PR; x_yaml and skill_md must be raw fetchable URLs from the PR head commit. 
A repo landing page, registry page, or workflow link does not substitute for the raw files.\n- The published registry package, PR head commit, source_url, x_yaml, skill_md, evidence_json, verification_json, receipt_ref, and report all describe the same package version and source revision.\n- A clean install succeeds with runx add <owner>/flaky-test-judge@<version>; the local harness passed before publish via runx harness ./skills/flaky-test-judge; the hosted registry harness passed after publish; a real dogfood run via runx skill <owner>/flaky-test-judge@<version> --json produced a receipt that passes runx verify --receipt <receipt.json> --json, recorded in evidence_json.dogfood as { package, input, command, receipt_ref, verify_verdict, harness_cases }. The recorded receipt_ref is that post-publish dogfood run of <owner>/flaky-test-judge@<version>, not the harness fixture seal, and harness_cases lists each case name with its sealed or refused status.\n- Inline harness.cases declare exactly two cases the hosted gate reads: one sealed case (a 65% pass-rate over 20 runs with timeouts in 6 of 7 failures against a 70% policy threshold yields disposition.decision quarantine, a bounded quarantine packet within max_quarantine_days, and a dispatch target naming issue-to-pr) and one stop case (no run history, so the run seals with disposition.reason naming the missing-evidence stop and no packet).\n- Typed inputs are test_run_history{runs[{status,duration,logs}],sample_size}, test_metadata{test_path,suite,tags}, and release_policy{flake_threshold_pct,min_sample_size,max_quarantine_days}; typed output is a runx.flaky.test_triage.v1 packet with disposition{decision,confidence,reason}, a quarantine packet{test_path,duration_days,fix_template,exclusion_marker} only when justified, an escalation field, and the dispatch target. 
No mint, no AttenuationRequest, no data-store.\n- The quarantine packet routes by naming into issue-to-pr's typed inputs (thread_title, thread_body with the disable request plus fix template, target_repo, base) or pr-review-note body as the offline leg; the judge composes neither rail in-graph and never consumes the packet as an effect. A downstream driver or operator issues the separate issue-to-pr run, and the human merge gate on that draft is the only path to a live disable; near-threshold evidence escalates to a human lane.\n- The judgment refuses to quarantine a test passing above the policy threshold, refuses when no run history is provided or the sample is below min_sample_size, never exceeds max_quarantine_days, and never invents a failure mode absent from the supplied logs.\n- evidence_json observations include the disposition decision and confidence, the pass-rate with the cited run count and window, the failure-mode count from the logs, the proposed quarantine duration and exclusion marker, the refused reason when applicable, the dispatch target, the two harness case names (quarantine_justified, missing_run_history), and the receipt id.\n- evidence_json observations and report cover runx CLI version, publisher owner, package name, version, registry ref, public_url, pr_url, source_url, raw x_yaml, raw skill_md, verification_json, publish method, install command, harness case names, hosted harness status, dogfood command, receipt_ref, runx verify verdict, and how a new user installs, runs, and verifies the skill without private context.\n\nArtifacts: `public_url`, `source_url`, `pr_url`, `x_yaml`, `skill_md`, `evidence_json`, `verification_json`, `receipt_ref`, `report`\n\nPassing delivery 
shape:\n```text\npublic_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\nsource_url=https://<public-source-or-provenance-url>\npr_url=https://github.com/runxhq/runx/pull/<number>\nx_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\nskill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\nevidence_json=https://example.com/evidence.json\nverification_json=https://example.com/verification.json\nreceipt_ref=runx:receipt:<id>\nreport=https://example.com/report.md\n```\n\nPreflight before delivery:\n```bash\ncurl -sS https://gofrantic.com/v1/deliveries/preflight \\\n -H 'content-type: application/json' \\\n -d '{\n \"bounty\": <number>,\n \"artifact_refs\": [\n \"public_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\",\n \"source_url=https://<public-source-or-provenance-url>\",\n \"pr_url=https://github.com/runxhq/runx/pull/<number>\",\n \"x_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\",\n \"skill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\",\n \"evidence_json=https://example.com/evidence.json\",\n \"verification_json=https://example.com/verification.json\",\n \"receipt_ref=runx:receipt:<id>\",\n \"report=https://example.com/report.md\"\n ]\n }'\n```\n\nReturned for revision if: Screenshots alone, local-only runs, prose-only summaries, unlisted skills, PRs without the package files, repo landing pages instead of raw X.yaml/SKILL.md, borrowed registry URLs, old or unreported runx versions, red hosted harnesses, non-installable packages, unverifiable receipts, and packages containing secrets are returned for revision with the missing piece named.\n\nReview gate: Open the registry public_url, confirm the listed owner is the worker, open the runxhq/runx pr_url and confirm it contains skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness 
evidence, fetch x_yaml and skill_md as raw files from the PR head commit, confirm the hosted harness passed, confirm evidence_json includes runx --version output at runx-cli 0.6.13 or newer, run or inspect runx add <owner>/flaky-test-judge@<version> and runx registry read <owner>/flaky-test-judge@<version> --json evidence, compare evidence_json, verification_json, and receipt_ref with the submitted source_url and PR, resolve receipt_ref and confirm evidence_json.dogfood shows it is the post-publish dogfood run of <owner>/flaky-test-judge@<version> rather than the harness fixture or an unrelated receipt, independently run runx add <owner>/flaky-test-judge@<version> and runx skill <owner>/flaky-test-judge@<version> --json to confirm it installs and seals, and state why a real operator or user would install or trust this skill.",
"occurred_at": "2026-06-25T21:22:22.037Z",
"price_cents": 700,
"claimable_at": "2026-06-25T21:22:22.037Z",
"schema_version": 1
}
}
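The acceptance criteria define the judgment rules closely enough to sketch them. The Python below is a minimal, hypothetical sketch of those rules and is not the published skill. The input field names come from the typed inputs in the payload. The following are assumptions: the `"pass"` status string, the log keywords, the 2-point near-threshold band, the confidence heuristic, the 14-day default request, and the marker and template wording.

For the sealed harness case (13 passes and 7 failures over 20 runs, 6 of them timeouts, against a 70% threshold), the pass rate is 13/20 = 65%. That is 5 points under the threshold, so the sketch quarantines, caps the duration at `max_quarantine_days`, and names `issue-to-pr` as the dispatch target. With no run history, it refuses and emits no quarantine packet.

```python
"""Hypothetical sketch of flaky-test-judge's disposition rules (not the published skill)."""
from typing import Optional

SCHEMA = "runx.flaky.test_triage.v1"
NEAR_THRESHOLD_MARGIN_PCT = 2.0  # assumption: width of the "near-threshold" escalation band
# Assumption: keyword -> failure-mode label. Only modes actually found in logs are counted.
FAILURE_MODE_KEYWORDS = {"timeout": "timeout", "connection refused": "network"}


def _packet(decision: str, reason: str, confidence: float, **fields) -> dict:
    """Build a triage packet; quarantine, escalation, and dispatch stay None unless set."""
    packet = {
        "schema": SCHEMA,
        "disposition": {"decision": decision, "confidence": confidence, "reason": reason},
        "quarantine": None,
        "escalation": None,
        "dispatch": None,
        "observations": {},
    }
    packet.update(fields)
    return packet


def judge(test_run_history: Optional[dict], test_metadata: dict,
          release_policy: dict, proposed_days: int = 14) -> dict:
    runs = (test_run_history or {}).get("runs") or []
    min_sample = release_policy["min_sample_size"]
    # Stop rules run first, so missing or thin evidence never yields a quarantine packet.
    if not runs:
        return _packet("refused", "missing_run_history: no run history supplied", 1.0)
    if len(runs) < min_sample:
        return _packet("refused", f"insufficient_sample: {len(runs)} runs < min_sample_size {min_sample}", 1.0)

    failures = [run for run in runs if run["status"] != "pass"]  # assumption: "pass" marks success
    pass_rate = 100.0 * (len(runs) - len(failures)) / len(runs)
    modes: dict = {}
    for run in failures:
        log = run.get("logs", "").lower()
        for keyword, label in FAILURE_MODE_KEYWORDS.items():
            if keyword in log:
                modes[label] = modes.get(label, 0) + 1
                break  # each failure counts toward at most one mode
    observations = {"pass_rate_pct": pass_rate, "run_count": len(runs),
                    "failure_count": len(failures), "failure_modes": modes}

    threshold = release_policy["flake_threshold_pct"]
    if pass_rate >= threshold:  # assumption: a pass rate exactly at the threshold also refuses
        return _packet("refused", f"pass rate {pass_rate:.1f}% is not below {threshold}%", 1.0,
                       observations=observations)
    if threshold - pass_rate <= NEAR_THRESHOLD_MARGIN_PCT:
        return _packet("escalate", "near-threshold evidence", 0.5,
                       escalation="human_lane", observations=observations)

    # Assumption: confidence = share of failures explained by the dominant logged mode.
    dominant, count = max(modes.items(), key=lambda item: item[1]) if modes else ("unclassified", 0)
    confidence = round(count / len(failures), 2) if failures else 0.0
    test_path = test_metadata["test_path"]
    quarantine = {
        "test_path": test_path,
        "duration_days": min(proposed_days, release_policy["max_quarantine_days"]),  # never exceed the ceiling
        "fix_template": f"Fix flaky {test_path}: {dominant} in {count} of {len(failures)} failures",
        "exclusion_marker": f"quarantine:{test_path}",  # hypothetical marker format
    }
    # The judge only names issue-to-pr as the dispatch target; it never runs it.
    return _packet("quarantine", f"pass rate {pass_rate:.1f}% < {threshold}% over {len(runs)} runs",
                   confidence, quarantine=quarantine, dispatch="issue-to-pr", observations=observations)
```

The refusal checks run before any arithmetic, so the stop case never computes a pass rate from empty evidence. Each failure counts toward a failure mode only when its own log contains a matching keyword, which keeps the sketch from reporting modes absent from the supplied logs.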
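The posting requires `runx --version` output at runx-cli 0.6.13 or newer. A plain string comparison gets this wrong, because "0.6.9" sorts after "0.6.13". The sketch below therefore compares integer tuples instead.

It makes one assumption: the output contains a dotted MAJOR.MINOR.PATCH triple such as `runx-cli 0.6.13`. The posting does not specify the exact output format. `parse_version` and `meets_min_version` are hypothetical helpers, not part of runx.

```python
import re
from typing import Optional, Tuple

MIN_RUNX_CLI = (0, 6, 13)  # floor stated in the posting


def parse_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the first MAJOR.MINOR.PATCH triple (assumed output format)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    return tuple(int(part) for part in match.groups()) if match else None


def meets_min_version(version_output: str, minimum: Tuple[int, ...] = MIN_RUNX_CLI) -> bool:
    version = parse_version(version_output)
    # Integer tuples compare numerically: (0, 6, 9) < (0, 6, 13), unlike the strings.
    return version is not None and version >= minimum
```

Under this assumption, pre-release suffixes are ignored, so `0.6.13-rc.1` would count as 0.6.13.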
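Before calling the preflight endpoint, a worker could catch several of the listed revision reasons locally:

- missing artifact slots
- repo landing pages submitted instead of raw files
- X.yaml and SKILL.md taken from different commits

The sketch below parses the `key=value` lines of the delivery example and checks their shape against the URL patterns shown in the posting. `parse_refs`, `check_delivery`, and the exact regular expressions are hypothetical. This is not Frantic's preflight logic, and it cannot check the GitHub star, the hosted harness, or receipt verification.

```python
import re
from urllib.parse import urlparse

REQUIRED_SLOTS = ("public_url", "source_url", "pr_url", "x_yaml", "skill_md",
                  "evidence_json", "verification_json", "receipt_ref", "report")
PACKAGE = "flaky-test-judge"


def parse_refs(lines) -> dict:
    """Split 'slot=value' lines on the first '=' (URLs may contain more)."""
    refs = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep:
            refs[key] = value
    return refs


def check_delivery(refs: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing {slot}" for slot in REQUIRED_SLOTS if not refs.get(slot)]
    commits = set()
    for slot, filename in (("x_yaml", "X.yaml"), ("skill_md", "SKILL.md")):
        url = urlparse(refs.get(slot, ""))
        parts = url.path.strip("/").split("/")
        # Raw layout from the posting: /<owner>/<repo>/<commit>/skills/flaky-test-judge/<file>
        if url.netloc != "raw.githubusercontent.com" or parts[3:] != ["skills", PACKAGE, filename]:
            problems.append(f"{slot} is not a raw {filename} URL for {PACKAGE}")
        else:
            commits.add(tuple(parts[:3]))  # (owner, repo, commit)
    if len(commits) > 1:
        problems.append("x_yaml and skill_md point at different commits")
    if not re.fullmatch(rf"https://runx\.ai/x/[^/]+/{PACKAGE}@[^/]+", refs.get("public_url", "")):
        problems.append(f"public_url is not a runx.ai listing for {PACKAGE}@<version>")
    if not re.fullmatch(r"https://github\.com/runxhq/runx/pull/\d+", refs.get("pr_url", "")):
        problems.append("pr_url is not a runxhq/runx pull request")
    if not refs.get("receipt_ref", "").startswith("runx:receipt:"):
        problems.append("receipt_ref is not a runx:receipt:<id> reference")
    return problems
```

A passing local check only means the references have the right shape. The preflight request and the review gate still decide whether the delivery is accepted.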