Public beta

Measure engineers at the right level of abstraction

Engineering is shifting from typing code to directing it. The old assessments (LeetCode, whiteboards, timed syntax drills) miss what matters now. Mezure measures the judgment behind the prompt, not the code that came out.

task-brief.md
Open-ended · AI allowed
# Search ranking regression

One customer segment is getting bad search results since
last week's deploy. Diagnose what changed and propose a fix.

## What you have
- repo access (read/write)
- prod logs (read-only, last 14 days)
- your AI assistant of choice

## What we review
- the prompts you sent
- the hypotheses you ruled out
- the path you took to a fix, not just the diff

Features

Competence, measured for how engineers actually work

From open-ended problems to process replays, Mezure assesses what now separates a competent engineer from one who just types fast.

Right level of abstraction

Assess judgment, taste, and system thinking — not whether a candidate can recall for-loop syntax under a stopwatch.

Judge the prompter

Evaluate how an engineer directs, questions, and verifies the AI. That's the work they actually do now.

Research-grade problems

Open-ended, ambiguous tasks that require framing and exploration — not pattern-matching against leetcode.

Process visibility

Replay the prompts, the dead ends, the corrections. Reasoning under ambiguity is the signal, not the final diff.

AI by default

Candidates use the tools they actually work with. We measure the human in the loop, not their willingness to avoid the AI.

Calibrated rubrics

Define what "competent" means on your team. Score against decisions made, not lines written.

How it works

Three steps to a real signal

Define what competent looks like for your team. Run an assessment candidates can take with the tools they use every day. Then review how they reasoned, not just what they shipped.

01

Define competence

Decide what your team actually values — judgment, taste, debugging instinct, research ability. The rubric is yours.

02

Send an open-ended task

Real, ambiguous problems. Candidates work the way they actually work — AI tools included.

03

Review the process

Watch the prompts, the back-and-forth, the choices. Decide who reasons like a strong engineer.

Why Mezure

Engineering changed. The yardstick didn't.

Most coding assessments still test what an LLM does in seconds: syntax recall, algorithm trivia, mechanical implementation. The work that separates a strong engineer in 2026 is upstream of that: framing the problem, choosing the abstraction, verifying the output, knowing what to ask. No standard exists yet for measuring those things. Mezure is one attempt to build that standard.

What we measure

  • Judgment under ambiguity
  • Quality of prompts and follow-up
  • Ability to verify and reject AI output
  • Taste in tradeoffs and architecture
  • Research instinct on unfamiliar ground

What we don't

  • Syntax recall and algorithm trivia
  • Speed-typing under a stopwatch
  • Memorized leetcode patterns
  • Whether the candidate used AI — they should
  • Theater over substance

Pricing

Free during public beta

Mezure is free while we build out the platform and calibrate against real teams. We'll announce pricing before any paid plans go into effect.

Start measuring what actually matters.

Define competence on your terms. Run assessments built for how engineers work now, AI included.