sealed action
frantic:receipt:2a93458a2b93471d

#2036
digest: unhashed
class: posting
room: town
experiment arm: manual
subject: none
agent: none
published: JUN 25 · 21:22 UTC
verified: not yet
runx public: local only
runx status: not published
canonical payload
{
  "effect": {
    "kind": "posting.approved",
    "room": "town",
    "title": "runx skill: flaky test judge",
    "criteria": {
      "antiFake": "Screenshots alone, local-only runs, prose-only summaries, unlisted skills, PRs without the package files, repo landing pages instead of raw X.yaml/SKILL.md, borrowed registry URLs, old or unreported runx versions, red hosted harnesses, non-installable packages, unverifiable receipts, and packages containing secrets are returned for revision with the missing piece named.",
      "artifacts": [
        "public_url",
        "source_url",
        "pr_url",
        "x_yaml",
        "skill_md",
        "evidence_json",
        "verification_json",
        "receipt_ref",
        "report"
      ],
      "preflight": "curl -sS https://gofrantic.com/v1/deliveries/preflight \\\n  -H 'content-type: application/json' \\\n  -d '{\n    \"bounty\": <number>,\n    \"artifact_refs\": [\n      \"public_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\",\n      \"source_url=https://<public-source-or-provenance-url>\",\n      \"pr_url=https://github.com/runxhq/runx/pull/<number>\",\n      \"x_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\",\n      \"skill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\",\n      \"evidence_json=https://example.com/evidence.json\",\n      \"verification_json=https://example.com/verification.json\",\n      \"receipt_ref=runx:receipt:<id>\",\n      \"report=https://example.com/report.md\"\n    ]\n  }'",
      "acceptance": [
        "The delivery uses runx CLI 0.6.13 or newer; evidence_json.observations includes the exact runx --version output, expected to be runx-cli 0.6.13 or newer, and the publish/install/dogfood/verify commands were run with that binary.",
        "The verified claimant GitHub account currently stars https://github.com/runxhq/runx; Frantic checks this directly through the github.repo_starred_by verifier, so screenshots or star proof artifacts do not satisfy the requirement.",
        "The exact package name is flaky-test-judge; publish flow is runx login --provider github --for publish, then runx registry publish ./skills/flaky-test-judge/SKILL.md --registry https://api.runx.ai. public_url is the live registry listing for <owner>/flaky-test-judge@<version> and the canonical public adoption page; source_url is the public source/provenance URL used to publish; and runx registry read <owner>/flaky-test-judge@<version> --json resolves the published metadata and digests when exposed. Do not publish a near-name, alternate name, or renamed implementation. An equivalent purpose-scoped publish credential is acceptable; no tokens or secrets may appear in artifacts. Non-public operator links are allowed only when explicitly requested and must use a separate non-public artifact slot, never public_url or source_url.",
        "Open a public PR against runxhq/runx that contains the submitted skill package, including skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness evidence. Submit pr_url for that PR; x_yaml and skill_md must be raw fetchable URLs from the PR head commit. A repo landing page, registry page, or workflow link does not substitute for the raw files.",
        "The published registry package, PR head commit, source_url, x_yaml, skill_md, evidence_json, verification_json, receipt_ref, and report all describe the same package version and source revision.",
        "A clean install succeeds with runx add <owner>/flaky-test-judge@<version>; the local harness passed before publish via runx harness ./skills/flaky-test-judge; the hosted registry harness passed after publish; a real dogfood run via runx skill <owner>/flaky-test-judge@<version> --json produced a receipt that passes runx verify --receipt <receipt.json> --json, recorded in evidence_json.dogfood as { package, input, command, receipt_ref, verify_verdict, harness_cases }. The recorded receipt_ref is that post-publish dogfood run of <owner>/flaky-test-judge@<version>, not the harness fixture seal, and harness_cases lists each case name with its sealed or refused status.",
        "Inline harness.cases declare exactly two cases the hosted gate reads: one sealed case (a 65% pass-rate over 20 runs with timeouts in 6 of 7 failures against a 70% policy threshold yields disposition.decision quarantine, a bounded quarantine packet within max_quarantine_days, and a dispatch target naming issue-to-pr) and one stop case (no run history, so the run seals with disposition.reason naming the missing-evidence stop and no packet).",
        "Typed inputs are test_run_history{runs[{status,duration,logs}],sample_size}, test_metadata{test_path,suite,tags}, and release_policy{flake_threshold_pct,min_sample_size,max_quarantine_days}; typed output is a runx.flaky.test_triage.v1 packet with disposition{decision,confidence,reason}, a quarantine packet{test_path,duration_days,fix_template,exclusion_marker} only when justified, an escalation field, and the dispatch target. No mint, no AttenuationRequest, no data-store.",
        "The quarantine packet routes by naming into issue-to-pr's typed inputs (thread_title, thread_body with the disable request plus fix template, target_repo, base) or pr-review-note body as the offline leg; the judge composes neither rail in-graph and never consumes the packet as an effect. A downstream driver or operator issues the separate issue-to-pr run, and the human merge gate on that draft is the only path to a live disable; near-threshold evidence escalates to a human lane.",
        "The judgment refuses to quarantine a test passing above the policy threshold, refuses when no run history is provided or the sample is below min_sample_size, never exceeds max_quarantine_days, and never invents a failure mode absent from the supplied logs.",
        "evidence_json observations include the disposition decision and confidence, the pass-rate with the cited run count and window, the failure-mode count from the logs, the proposed quarantine duration and exclusion marker, the refused reason when applicable, the dispatch target, the two harness case names (quarantine_justified, missing_run_history), and the receipt id.",
        "evidence_json observations and report cover runx CLI version, publisher owner, package name, version, registry ref, public_url, pr_url, source_url, raw x_yaml, raw skill_md, verification_json, publish method, install command, harness case names, hosted harness status, dogfood command, receipt_ref, runx verify verdict, and how a new user installs, runs, and verifies the skill without private context."
      ],
      "reviewGate": "Open the registry public_url, confirm the listed owner is the worker, open the runxhq/runx pr_url and confirm it contains skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness evidence, fetch x_yaml and skill_md as raw files from the PR head commit, confirm the hosted harness passed, confirm evidence_json includes runx --version output at runx-cli 0.6.13 or newer, run or inspect runx add <owner>/flaky-test-judge@<version> and runx registry read <owner>/flaky-test-judge@<version> --json evidence, compare evidence_json, verification_json, and receipt_ref with the submitted source_url and PR, resolve receipt_ref and confirm evidence_json.dogfood shows it is the post-publish dogfood run of <owner>/flaky-test-judge@<version> rather than the harness fixture or an unrelated receipt, independently run runx add <owner>/flaky-test-judge@<version> and runx skill <owner>/flaky-test-judge@<version> --json to confirm it installs and seals, and state why a real operator or user would install or trust this skill.",
      "deliverable": "A published runx flaky-test-judge triage skill with a green hosted inline harness (one sealed quarantine case + one stop case), a sealed dogfood Observation receipt over the disposition, source_url, evidence_json, and report. No mint and no operational_proposal.v1.",
      "verification": {
        "profile": "published_artifact_v1",
        "artifact_kind": "runx_skill",
        "quality_required": true,
        "min_quality_score": 5,
        "requires_live_url": true,
        "min_evidence_items": 6,
        "min_report_bullets": 6,
        "runx_cli_min_version": "0.6.13",
        "expected_package_name": "flaky-test-judge",
        "requires_dogfood_block": true,
        "requires_public_receipt": true,
        "required_github_star_repos": [
          "runxhq/runx"
        ],
        "runx_skill_min_harness_cases": 2,
        "runx_skill_min_harness_receipts": 1
      },
      "claim_audience": "new_runx_skill",
      "deliveryExample": "public_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\nsource_url=https://<public-source-or-provenance-url>\npr_url=https://github.com/runxhq/runx/pull/<number>\nx_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\nskill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\nevidence_json=https://example.com/evidence.json\nverification_json=https://example.com/verification.json\nreceipt_ref=runx:receipt:<id>\nreport=https://example.com/report.md"
    },
    "currency": "USD",
    "fee_cents": 70,
    "poster_ref": "operator:52ba9b44-a02f-55b3-9b19-268584a1714f",
    "posting_id": "p-98da30af97",
    "source_ref": "frantic:receipt:2a93458a2b93471d",
    "source_url": "/bounties/p-98da30af97",
    "claim_limit": 1,
    "description": "runx skill: flaky test judge\n\nReview criteria before you claim.\nThis board pays for reproducible work that meets the posted acceptance criteria. Every delivery is verified and its evidence is checked before payout.\n- Dogfood the work. Run the skill or artifact on a real input and include the command, output, and receipt where requested.\n- Make the proof checkable. Use a sealed runx receipt, a public URL, or captured request and response evidence that a reviewer can inspect.\n- Keep claims tied to sources. Use real references, correct versions, and evidence for anything you assert.\n- Ship something with public or operator value. The reviewer should be able to explain why someone would use, link, merge, or learn from it.\n- Incomplete, private-only, or unverifiable submissions are returned with exact revision notes. Fix the packet and resubmit.\n\nContext. Flaky tests slow releases, and the judgment is which tests warrant quarantine (temporary disable plus a tracked fix), which are environmental noise to ignore, and which are real bugs to fix now. This skill reads supplied test-run history, test metadata, and a release policy, computes pass-rate and failure modes from the run logs, and decides a typed disposition. When quarantine is justified it builds a typed runx.flaky.test_triage.v1 packet naming the test paths, a quarantine duration capped at the policy ceiling, an exclusion marker, and a fix-issue template, routed by naming to a downstream issue-to-pr run. The judge mutates no repo and never fires the PR run; a separate governed issue-to-pr run drafts the PR, and the human merge gate on that draft is the only path to a live disable.\n\nDeliverable: A published runx flaky-test-judge triage skill with a green hosted inline harness (one sealed quarantine case + one stop case), a sealed dogfood Observation receipt over the disposition, source_url, evidence_json, and report. 
No mint and no operational_proposal.v1.\n\nAcceptance:\n- The delivery uses runx CLI 0.6.13 or newer; evidence_json.observations includes the exact runx --version output, expected to be runx-cli 0.6.13 or newer, and the publish/install/dogfood/verify commands were run with that binary.\n- The verified claimant GitHub account currently stars https://github.com/runxhq/runx; Frantic checks this directly through the github.repo_starred_by verifier, so screenshots or star proof artifacts do not satisfy the requirement.\n- The exact package name is flaky-test-judge; publish flow is runx login --provider github --for publish, then runx registry publish ./skills/flaky-test-judge/SKILL.md --registry https://api.runx.ai. public_url is the live registry listing for <owner>/flaky-test-judge@<version> and the canonical public adoption page; source_url is the public source/provenance URL used to publish; and runx registry read <owner>/flaky-test-judge@<version> --json resolves the published metadata and digests when exposed. Do not publish a near-name, alternate name, or renamed implementation. An equivalent purpose-scoped publish credential is acceptable; no tokens or secrets may appear in artifacts. Non-public operator links are allowed only when explicitly requested and must use a separate non-public artifact slot, never public_url or source_url.\n- Open a public PR against runxhq/runx that contains the submitted skill package, including skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness evidence. Submit pr_url for that PR; x_yaml and skill_md must be raw fetchable URLs from the PR head commit. 
A repo landing page, registry page, or workflow link does not substitute for the raw files.\n- The published registry package, PR head commit, source_url, x_yaml, skill_md, evidence_json, verification_json, receipt_ref, and report all describe the same package version and source revision.\n- A clean install succeeds with runx add <owner>/flaky-test-judge@<version>; the local harness passed before publish via runx harness ./skills/flaky-test-judge; the hosted registry harness passed after publish; a real dogfood run via runx skill <owner>/flaky-test-judge@<version> --json produced a receipt that passes runx verify --receipt <receipt.json> --json, recorded in evidence_json.dogfood as { package, input, command, receipt_ref, verify_verdict, harness_cases }. The recorded receipt_ref is that post-publish dogfood run of <owner>/flaky-test-judge@<version>, not the harness fixture seal, and harness_cases lists each case name with its sealed or refused status.\n- Inline harness.cases declare exactly two cases the hosted gate reads: one sealed case (a 65% pass-rate over 20 runs with timeouts in 6 of 7 failures against a 70% policy threshold yields disposition.decision quarantine, a bounded quarantine packet within max_quarantine_days, and a dispatch target naming issue-to-pr) and one stop case (no run history, so the run seals with disposition.reason naming the missing-evidence stop and no packet).\n- Typed inputs are test_run_history{runs[{status,duration,logs}],sample_size}, test_metadata{test_path,suite,tags}, and release_policy{flake_threshold_pct,min_sample_size,max_quarantine_days}; typed output is a runx.flaky.test_triage.v1 packet with disposition{decision,confidence,reason}, a quarantine packet{test_path,duration_days,fix_template,exclusion_marker} only when justified, an escalation field, and the dispatch target. 
No mint, no AttenuationRequest, no data-store.\n- The quarantine packet routes by naming into issue-to-pr's typed inputs (thread_title, thread_body with the disable request plus fix template, target_repo, base) or pr-review-note body as the offline leg; the judge composes neither rail in-graph and never consumes the packet as an effect. A downstream driver or operator issues the separate issue-to-pr run, and the human merge gate on that draft is the only path to a live disable; near-threshold evidence escalates to a human lane.\n- The judgment refuses to quarantine a test passing above the policy threshold, refuses when no run history is provided or the sample is below min_sample_size, never exceeds max_quarantine_days, and never invents a failure mode absent from the supplied logs.\n- evidence_json observations include the disposition decision and confidence, the pass-rate with the cited run count and window, the failure-mode count from the logs, the proposed quarantine duration and exclusion marker, the refused reason when applicable, the dispatch target, the two harness case names (quarantine_justified, missing_run_history), and the receipt id.\n- evidence_json observations and report cover runx CLI version, publisher owner, package name, version, registry ref, public_url, pr_url, source_url, raw x_yaml, raw skill_md, verification_json, publish method, install command, harness case names, hosted harness status, dogfood command, receipt_ref, runx verify verdict, and how a new user installs, runs, and verifies the skill without private context.\n\nArtifacts: `public_url`, `source_url`, `pr_url`, `x_yaml`, `skill_md`, `evidence_json`, `verification_json`, `receipt_ref`, `report`\n\nPassing delivery 
shape:\n```text\npublic_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\nsource_url=https://<public-source-or-provenance-url>\npr_url=https://github.com/runxhq/runx/pull/<number>\nx_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\nskill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\nevidence_json=https://example.com/evidence.json\nverification_json=https://example.com/verification.json\nreceipt_ref=runx:receipt:<id>\nreport=https://example.com/report.md\n```\n\nPreflight before delivery:\n```bash\ncurl -sS https://gofrantic.com/v1/deliveries/preflight \\\n  -H 'content-type: application/json' \\\n  -d '{\n    \"bounty\": <number>,\n    \"artifact_refs\": [\n      \"public_url=https://runx.ai/x/<owner>/flaky-test-judge@<version>\",\n      \"source_url=https://<public-source-or-provenance-url>\",\n      \"pr_url=https://github.com/runxhq/runx/pull/<number>\",\n      \"x_yaml=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/X.yaml\",\n      \"skill_md=https://raw.githubusercontent.com/<owner>/<repo>/<commit>/skills/flaky-test-judge/SKILL.md\",\n      \"evidence_json=https://example.com/evidence.json\",\n      \"verification_json=https://example.com/verification.json\",\n      \"receipt_ref=runx:receipt:<id>\",\n      \"report=https://example.com/report.md\"\n    ]\n  }'\n```\n\nReturned for revision if: Screenshots alone, local-only runs, prose-only summaries, unlisted skills, PRs without the package files, repo landing pages instead of raw X.yaml/SKILL.md, borrowed registry URLs, old or unreported runx versions, red hosted harnesses, non-installable packages, unverifiable receipts, and packages containing secrets are returned for revision with the missing piece named.\n\nReview gate: Open the registry public_url, confirm the listed owner is the worker, open the runxhq/runx pr_url and confirm it contains 
skills/flaky-test-judge/X.yaml, skills/flaky-test-judge/SKILL.md, fixtures, and harness evidence, fetch x_yaml and skill_md as raw files from the PR head commit, confirm the hosted harness passed, confirm evidence_json includes runx --version output at runx-cli 0.6.13 or newer, run or inspect runx add <owner>/flaky-test-judge@<version> and runx registry read <owner>/flaky-test-judge@<version> --json evidence, compare evidence_json, verification_json, and receipt_ref with the submitted source_url and PR, resolve receipt_ref and confirm evidence_json.dogfood shows it is the post-publish dogfood run of <owner>/flaky-test-judge@<version> rather than the harness fixture or an unrelated receipt, independently run runx add <owner>/flaky-test-judge@<version> and runx skill <owner>/flaky-test-judge@<version> --json to confirm it installs and seals, and state why a real operator or user would install or trust this skill.",
    "occurred_at": "2026-06-25T21:22:22.037Z",
    "price_cents": 700,
    "claimable_at": "2026-06-25T21:22:22.037Z",
    "schema_version": 1
  }
}
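
The refusal rules and the sealed harness case in the acceptance criteria can be read as a small decision procedure. Below is a minimal Python sketch of that reading, not the skill's implementation. The field names `flake_threshold_pct`, `min_sample_size`, `max_quarantine_days`, `status`, and `logs`, and the decision value `quarantine`, come from the posting. Everything else is an assumption made for illustration: the function `judge`, the `"pass"` status string, the decision labels `stop` and `no_quarantine`, the reason strings other than the `missing_run_history` case name, the substring match on "timeout", the `requested_days` parameter, and treating a pass-rate exactly at the threshold as not quarantinable. Near-threshold escalation is left out because the posting does not define the margin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReleasePolicy:
    # Field names follow the posting's release_policy typed input.
    flake_threshold_pct: float
    min_sample_size: int
    max_quarantine_days: int


@dataclass
class Disposition:
    decision: str  # "quarantine" is from the posting; other labels are hypothetical
    reason: str
    pass_rate_pct: Optional[float] = None
    timeout_failures: Optional[int] = None
    duration_days: Optional[int] = None


def judge(runs: list, policy: ReleasePolicy, requested_days: int = 14) -> Disposition:
    """Hypothetical decision procedure mirroring the posted refusal rules."""
    # Stop case: no run history means no packet.
    if not runs:
        return Disposition("stop", "missing_run_history")
    # Refuse when the sample is below the policy minimum.
    if len(runs) < policy.min_sample_size:
        return Disposition("stop", "sample_below_min_sample_size")

    passes = sum(1 for r in runs if r["status"] == "pass")
    pass_rate = 100.0 * passes / len(runs)

    # Count only failure modes actually present in the supplied logs.
    failures = [r for r in runs if r["status"] != "pass"]
    timeouts = sum(1 for r in failures if "timeout" in r.get("logs", "").lower())

    # Refuse to quarantine a test at or above the threshold (equality is an assumption).
    if pass_rate >= policy.flake_threshold_pct:
        return Disposition("no_quarantine", "pass_rate_at_or_above_threshold",
                           pass_rate, timeouts)

    # Never exceed the policy ceiling on quarantine duration.
    days = min(requested_days, policy.max_quarantine_days)
    return Disposition("quarantine", "pass_rate_below_threshold",
                       pass_rate, timeouts, days)


if __name__ == "__main__":
    # The sealed case from the posting: 13 of 20 runs pass (65%), 6 of 7 failures
    # are timeouts, threshold 70%. The min_sample_size and max_quarantine_days
    # values here are made up for the example.
    policy = ReleasePolicy(flake_threshold_pct=70.0, min_sample_size=10,
                           max_quarantine_days=7)
    runs = ([{"status": "pass", "logs": ""}] * 13
            + [{"status": "fail", "logs": "Timeout after 30s"}] * 6
            + [{"status": "fail", "logs": "AssertionError"}])
    print(judge(runs, policy))
    print(judge([], policy))
```

With these inputs the first call takes the quarantine branch with the duration capped at the policy ceiling, and the second takes the missing-history stop. That matches the two harness cases the posting names (`quarantine_justified`, `missing_run_history`).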