A Builder publishes a task. Verified Humans label it. Consensus aggregates the labels. Per-skill Elo weights each Human's vote. A Steward adjudicates disagreement. Gold questions catch drift. The signed dataset ships. The Humans get paid. That is the entire loop, end to end.
A Builder defines the task — its prompt, its label schema, its consensus N, its quality thresholds — and publishes. The qualified pool is filtered by tier, skill, language, and region. Each item routes to N Humans independently. The aggregator weighs each label by Elo, accounts for known per-Human bias, and returns a consensus label plus a confidence score.
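A minimal sketch of the filtering and routing step. The `Human` and `TaskSpec` shapes, the `min_elo` threshold, and uniform random sampling of N distinct Humans from the qualified pool are assumptions for illustration, not the Network's actual schema or routing policy.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Human:
    id: str
    tier: int                                  # 1..4
    languages: set[str]
    region: str
    elo: dict[str, float] = field(default_factory=dict)  # per-skill Elo


@dataclass
class TaskSpec:                                # hypothetical shape of a published task
    prompt: str
    label_schema: list[str]
    skill: str
    consensus_n: int                           # how many Humans label each item
    min_tier: int
    language: str
    regions: set[str]
    min_elo: float = 0.0                       # hypothetical per-skill quality threshold


def qualified_pool(humans: list[Human], task: TaskSpec) -> list[Human]:
    """Keep Humans who pass the tier, skill, language, and region filters."""
    return [
        h for h in humans
        if h.tier >= task.min_tier
        and task.skill in h.elo and h.elo[task.skill] >= task.min_elo
        and task.language in h.languages
        and h.region in task.regions
    ]


def route_item(humans: list[Human], task: TaskSpec, rng: random.Random) -> list[Human]:
    """Assign one item to N distinct Humans drawn from the qualified pool."""
    pool = qualified_pool(humans, task)
    if len(pool) < task.consensus_n:
        raise ValueError("qualified pool is smaller than the consensus N")
    return rng.sample(pool, task.consensus_n)
```

Each selected Human then labels the item without seeing the others' labels, which is what lets the aggregator treat their votes as independent evidence.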
If confidence is high, the label ships into the signed dataset and the Humans get paid. If confidence is low or a Human appeals, the item routes to a Steward. The Steward's outcome is final and writes to the append-only audit log.
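The ship-or-escalate rule can be sketched as follows. The 0.9 threshold, the `route_outcome` and `AuditLog` names, and the SHA-256 hash chain that makes the log tamper-evident are assumptions; the paragraph above only states that the log is append-only.

```python
import hashlib
import json

SHIP = "ship"
STEWARD = "steward"


def route_outcome(confidence: float, appealed: bool, threshold: float = 0.9) -> str:
    """Ship confident, unappealed labels; everything else goes to a Steward."""
    if appealed or confidence < threshold:
        return STEWARD
    return SHIP


class AuditLog:
    """Append-only log; each entry stores the hash of the previous entry (assumed hash chain)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, record: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        digest = self._digest(prev, record)
        self._entries.append({"record": record, "prev": prev, "hash": digest})
        return digest

    def entries(self) -> tuple:
        # Tuple snapshot; the class exposes no update or delete methods.
        return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks it."""
        prev = self.GENESIS
        for e in self._entries:
            if e["prev"] != prev or self._digest(prev, e["record"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

A Steward's final decision would be written with something like `log.append({"item": "i1", "label": "cat", "by": "steward-7"})`, where the record fields are hypothetical.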
Humans progress through four tiers:
Tier 1, signed up. Account created. Network browseable. No tasks until phone verified.
Tier 2. Low-stakes work unlocked. Per-skill Elo seeded by qualification arena.
Tier 3. Standard work and higher per-label rates. Sumsub liveness + ID check.
Tier 4. Sensitive work, Steward eligibility, dispute adjudication. Highest rates.
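Tier gating reduces to a lookup table. In this sketch the work-class names (`low_stakes`, `standard`, `sensitive`, `steward_review`) are hypothetical labels for the work each tier description unlocks; the minimum tier for each follows those descriptions.

```python
# Hypothetical work classes mapped to the lowest tier that unlocks them.
MIN_TIER = {
    "low_stakes": 2,
    "standard": 3,
    "sensitive": 4,
    "steward_review": 4,   # Steward eligibility and dispute adjudication
}


def can_take(tier: int, work_class: str) -> bool:
    """True if a Human at `tier` may take work of `work_class`."""
    if not 1 <= tier <= 4:
        raise ValueError(f"unknown tier: {tier}")
    return tier >= MIN_TIER[work_class]


def unlocked(tier: int) -> list[str]:
    """Work classes open to a tier; tier 1 (not yet phone verified) gets an empty list."""
    return sorted(w for w in MIN_TIER if can_take(tier, w))
```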
On top of the four tiers, each Human carries a per-skill Elo for bbox, text NER, audio transcription, RLHF preference, and so on. New Humans seed their Elo through a qualification arena of gold-graded items. Verified accuracy raises the score. Drift lowers it. Higher Elo unlocks higher-paying work in that specific skill.
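A minimal Elo sketch, assuming each gold-graded item carries its own difficulty rating and plays the "opponent": a correct label scores 1, a wrong one 0. The 1500 starting rating, the K-factor of 32, and the function names are conventional chess-Elo defaults chosen for illustration, not the Network's parameters.

```python
def expected_score(human_elo: float, item_rating: float) -> float:
    """Standard Elo probability that the Human 'beats' (labels correctly) the gold item."""
    return 1.0 / (1.0 + 10 ** ((item_rating - human_elo) / 400.0))


def update_elo(human_elo: float, item_rating: float, correct: bool, k: float = 32.0) -> float:
    """Move the rating toward the observed result, scaled by the K-factor."""
    score = 1.0 if correct else 0.0
    return human_elo + k * (score - expected_score(human_elo, item_rating))


def seed_from_arena(results: list[tuple[float, bool]], start: float = 1500.0, k: float = 32.0) -> float:
    """Seed one skill's Elo by replaying qualification-arena gold items in order."""
    elo = start
    for item_rating, correct in results:
        elo = update_elo(elo, item_rating, correct, k)
    return elo


def record_gold(elos: dict[str, float], skill: str, item_rating: float,
                correct: bool, k: float = 32.0, start: float = 1500.0) -> None:
    """Update only the named skill; ratings in other skills are untouched."""
    elos[skill] = update_elo(elos.get(skill, start), item_rating, correct, k)
```

Against an item rated the same as the Human, a correct label moves 1500 to 1516 and a wrong one to 1484; harder items pay more for a correct answer and cost less for a miss.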
Plain majority vote is a bad aggregator. It treats every Human as equally accurate and every disagreement as noise. The Network treats disagreement as signal. Three families of methods do the work. Dawid-Skene estimates each Human's confusion matrix from their full label history and weighs their vote accordingly. MACE separates random spamming from honest disagreement. CAZ peer-prediction rewards Humans for being informative about what other Humans will say, which makes random clicking a losing strategy.
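A compact Dawid-Skene sketch in pure Python, fitted by expectation-maximization: the M-step re-estimates class priors and one confusion matrix per Human from the current soft labels, and the E-step recomputes each item's posterior over classes. The `(item, human, label)` input format with integer ids, the smoothing constant, and the fixed iteration count are assumptions. MACE and peer prediction are not shown.

```python
import math


def dawid_skene(annotations, n_items, n_workers, n_classes, iters=50, smoothing=0.01):
    """annotations: list of (item, worker, label) integer triples."""
    by_item = [[] for _ in range(n_items)]
    for i, j, l in annotations:
        by_item[i].append((j, l))

    # Initialise soft labels from per-item vote proportions (plain majority).
    T = []
    for votes in by_item:
        counts = [0.0] * n_classes
        for _, l in votes:
            counts[l] += 1.0
        total = sum(counts)
        T.append([c / total if total else 1.0 / n_classes for c in counts])

    for _ in range(iters):
        # M-step: class priors and per-worker confusion matrices conf[j][true][given].
        prior = [sum(T[i][k] for i in range(n_items)) / n_items for k in range(n_classes)]
        conf = [[[smoothing] * n_classes for _ in range(n_classes)] for _ in range(n_workers)]
        for i, j, l in annotations:
            for k in range(n_classes):
                conf[j][k][l] += T[i][k]
        for j in range(n_workers):
            for k in range(n_classes):
                row_sum = sum(conf[j][k])
                conf[j][k] = [v / row_sum for v in conf[j][k]]

        # E-step: posterior over true class per item, computed in log space.
        for i, votes in enumerate(by_item):
            logp = [math.log(prior[k] + 1e-12)
                    + sum(math.log(conf[j][k][l]) for j, l in votes)
                    for k in range(n_classes)]
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            z = sum(w)
            T[i] = [v / z for v in w]
    return T, conf


def consensus(T):
    """Per item: (most probable class, its posterior probability)."""
    return [(max(range(len(p)), key=p.__getitem__), max(p)) for p in T]
```

Because each Human gets a full confusion matrix rather than a single accuracy number, a Human who always answers the same class ends up with a near-constant row and contributes almost no evidence, which is how the model separates a biased voter from an informative one.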
The aggregator returns two values: the consensus label and a confidence score. The confidence score is what triggers dispute handling, gold-question audits, and Steward escalation. The math stays inside the Network. The Builder sees the label, the confidence, the audit trail, and the receipts.
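To make the two outputs concrete, here is an Elo-weighted vote that returns a label plus a confidence score and flags when that confidence should trip a gold-question audit or Steward escalation. The weight `10 ** (elo / 400)`, defining confidence as the winning label's share of total weight, and both thresholds are assumptions; a simple weighted vote stands in for the richer aggregator described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsensusResult:
    label: str
    confidence: float
    needs_gold_audit: bool      # hypothetical flag tripped by low confidence
    escalate_to_steward: bool   # hypothetical flag tripped by very low confidence


def elo_weight(elo: float) -> float:
    """Elo-style strength: every +400 points multiplies a vote's weight by 10."""
    return 10 ** (elo / 400.0)


def aggregate(votes: list[tuple[str, float]],
              audit_below: float = 0.8,
              escalate_below: float = 0.6) -> ConsensusResult:
    """votes: (label, Human's Elo in this skill) pairs for one item."""
    if not votes:
        raise ValueError("no votes to aggregate")
    totals: defaultdict[str, float] = defaultdict(float)
    for label, elo in votes:
        totals[label] += elo_weight(elo)
    label = max(totals, key=totals.get)
    confidence = totals[label] / sum(totals.values())
    return ConsensusResult(label, confidence,
                           confidence < audit_below,
                           confidence < escalate_below)
```

For example, one vote from an 1800-rated Human against two from 1400-rated Humans gives a weight ratio of 10 to 2, so the lone vote wins with confidence 10/12, about 0.83: above the assumed audit threshold, and a case plain majority vote would have decided the other way.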
The shape above is the public model. The implementation has more — confidence calibration curves, Elo prior selection, gold-question generation policy. The Field notes go further into the math.