We built Atira to handle a specific problem: procurement teams can't realistically read every supplier contract thoroughly before renewal decisions have to be made. We also want to be honest about what clause extraction does well and where a human legal reviewer is still doing something the machine isn't.
Earlier this year, we ran an informal comparison exercise. We took a set of 12 supplier contracts — a mix of manufacturing services agreements, IT services contracts, and facilities management terms — and reviewed them in two ways: first with Atira's clause extraction pipeline, then with a solicitor who specialises in commercial procurement contracts. We were looking for the same set of outputs: renewal terms, price escalation provisions, termination rights, liability caps, and penalty or clawback clauses.
This was not a rigorous controlled study. The contract set was not randomised. The lawyer knew roughly what category of clause she was looking for. The results are directionally interesting, not statistically validated — and we'd encourage scepticism of anyone who presents this kind of comparison as definitive in either direction.
Where AI extraction had a consistent edge
Speed and coverage. Atira processed all 12 contracts and produced structured clause outputs in under four minutes total. The lawyer's review took approximately 11 hours across the same set, at a depth she described as "commercial review, not litigation-level." On routine identification of standard clause types — auto-renewal provisions, fixed-percentage escalation, standard termination-for-convenience language — the extracted output was equivalent in accuracy to the manual review.
Cross-document consistency. Because the extraction applies the same logic to every contract, it doesn't get fatigued or inconsistent. The 12th contract gets the same quality of attention as the first. Human reviewers, even experienced ones, vary in how thoroughly they read a document depending on how complex or tedious the preceding documents were.
Structured output. The extracted data can be sorted, filtered, and compared across contracts in a way that a lawyer's review notes cannot. If you want to see all contracts with notice periods over 90 days, or all contracts with uncapped escalation, you can do that instantly from structured extraction output. You can't do it from a set of legal memos.
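As a rough illustration of the kind of query described above, here is a minimal sketch. The record shape, field names, and sample values are hypothetical and chosen for readability; they are not Atira's actual output schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContractRecord:
    """Hypothetical shape for one contract's extracted clause data."""
    contract_id: str
    notice_period_days: int
    escalation_cap_pct: Optional[float]  # None means no cap was found


# Illustrative sample data, not real extraction results.
records = [
    ContractRecord("MSA-001", 120, 3.0),
    ContractRecord("ITS-002", 60, None),
    ContractRecord("FM-003", 90, 5.0),
    ContractRecord("MSA-004", 180, None),
]

# All contracts with notice periods over 90 days.
long_notice = [r.contract_id for r in records if r.notice_period_days > 90]

# All contracts with uncapped escalation.
uncapped = [r.contract_id for r in records if r.escalation_cap_pct is None]

print(long_notice)  # ['MSA-001', 'MSA-004']
print(uncapped)     # ['ITS-002', 'MSA-004']
```

The point is less the code than the shape of the data: once each contract is a row with typed fields, questions like these become one-line filters.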
Clause location. For procurement teams who need to find the relevant clause quickly in the original document (during a supplier call, say), the extraction output includes clause references that let you jump directly to the relevant section. This sounds minor, but in practice it saves significant time.
Where manual review caught things the extraction missed
Interaction effects between clauses. The lawyer identified two cases where a clause in one part of the contract modified or qualified the effect of a clause in another part in a non-obvious way. In one instance, a seemingly standard liability cap in clause 15 was qualified by an indemnity carve-out in schedule 3 that effectively removed the cap for a category of claims. The extraction had identified both clauses correctly but had not surfaced the interaction between them. A human reader working through the document as a whole is better at noticing when something earlier in the text changes the meaning of something later.
Commercial context and intent. In three contracts, the lawyer flagged language that was technically clear but commercially unusual. One example was a payment milestone structure that had the effect of delaying the buyer's right to invoice the supplier for rebates. The clause was technically sound, and Atira had extracted the payment milestone dates correctly. But the lawyer's comment was that the structure looked like it had been drafted specifically to frustrate the buyer's ability to claim. That's a judgment call: it requires understanding what commercial drafting norms look like and when a departure from those norms is intentional.
Defined-term chains. Two of the 12 contracts used a heavily defined-term structure. "Renewal Price" was defined in the definitions clause, but that definition referenced "Base Price," which was defined in a schedule, and the schedule had in turn been updated by an amendment held as a separate document. The extraction had been given the main contract body only, not the amendment. This is partly a document management problem (the extraction can only work with what you give it), but it also highlights that human reviewers are better at noticing when something important is missing from what's in front of them.
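To make the chain concrete, here is a minimal sketch of how a tool could walk a defined-term chain and report which documents it depends on but was never given. It assumes each definition records where it lives and which amendments modify it, for example from a contract register. The `Definition` type, the `trace_missing_documents` function, and the document names are all hypothetical, and this is not a description of how Atira works today.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """Hypothetical record for one defined term."""
    term: str
    source_doc: str                                       # where the definition lives
    references: list[str] = field(default_factory=list)   # other defined terms it relies on
    amended_by: list[str] = field(default_factory=list)   # documents that modify it


def trace_missing_documents(term, definitions, supplied_docs):
    """Walk the defined-term chain from `term` and list what was not supplied."""
    missing, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against circular definitions
        seen.add(current)
        definition = definitions.get(current)
        if definition is None:
            missing.append(f"definition of '{current}'")
            continue
        for doc in [definition.source_doc, *definition.amended_by]:
            if doc not in supplied_docs and doc not in missing:
                missing.append(doc)
        stack.extend(definition.references)
    return missing


# Illustrative data mirroring the chain described above.
definitions = {
    "Renewal Price": Definition("Renewal Price", "main body", references=["Base Price"]),
    "Base Price": Definition("Base Price", "pricing schedule", amended_by=["amendment"]),
}

missing = trace_missing_documents("Renewal Price", definitions, {"main body"})
print(missing)  # ['pricing schedule', 'amendment']
```

A non-empty result is a signal to go and find the missing documents before relying on the extracted value.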
Drafting errors and inconsistencies. The lawyer caught two internal inconsistencies: a clause that used different notice period figures in two different sections, and a defined term that was used but never defined in the main body. These were genuine drafting errors with potentially significant commercial effect. The extraction settled on one reading of the clause without flagging the inconsistency as an ambiguity.
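The alternative to silently picking one reading is to keep every extracted mention of a value and flag any field where the mentions disagree. A minimal sketch of that idea follows; the tuple layout, field names, clause numbers, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted mentions: (field, clause reference, value).
mentions = [
    ("notice_period_days", "clause 12.1", 90),
    ("notice_period_days", "clause 20.3", 60),
    ("termination_fee_pct", "clause 14.2", 10),
]


def find_conflicts(mentions):
    """Return fields whose extracted values differ between locations."""
    values_by_field = defaultdict(dict)
    for field_name, location, value in mentions:
        values_by_field[field_name][location] = value
    return {
        field_name: locations
        for field_name, locations in values_by_field.items()
        if len(set(locations.values())) > 1
    }


conflicts = find_conflicts(mentions)
print(conflicts)
# {'notice_period_days': {'clause 12.1': 90, 'clause 20.3': 60}}
```

Reporting both locations and both values lets a reviewer see the ambiguity instead of inheriting a guess.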
The honest framing
We're not saying manual legal review is redundant. For high-value contracts, complex commercial arrangements, and situations where the buyer is taking on material liability or risk, a trained legal reviewer doing a thorough read of the full document — including defined terms, schedules, and amendments — is the right approach. No clause extraction tool replaces that for contracts where the stakes justify the cost.
The practical question is what happens to the other 85% of your contract portfolio — the contracts where the cost of thorough legal review on every renewal is prohibitive, but where missing an escalation clause or a 120-day notice window has real financial consequences. That's the gap we're filling.
For a team managing 300 supplier contracts, the unit economics of instructing a solicitor for every renewal review don't work. Even at a rapid commercial review rate, the cost across the portfolio would dwarf the savings from catching the escalation clauses and missed notice windows. The choice is not "AI extraction vs. legal review" — it's "AI extraction vs. no systematic review at all," which is what most mid-market procurement teams are actually doing today.
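A back-of-envelope version of that arithmetic, scaling the figures from our 12-contract exercise to a 300-contract portfolio, is sketched below. It assumes review time scales linearly with contract count and that each contract gets one review pass, both of which are simplifications. We deliberately leave out an hourly rate, since that varies by firm.

```python
# Figures from the comparison exercise described above.
SAMPLE_SIZE = 12                # contracts reviewed
LAWYER_HOURS = 11               # approximate manual review time for the sample
EXTRACTION_MINUTES_MAX = 4      # extraction finished in under this many minutes

PORTFOLIO_SIZE = 300            # contracts in the example portfolio


def scale_to_portfolio(sample_total, sample_size, portfolio_size):
    """Linearly scale a total measured on a sample up to the full portfolio."""
    return sample_total * portfolio_size / sample_size


lawyer_hours_portfolio = scale_to_portfolio(LAWYER_HOURS, SAMPLE_SIZE, PORTFOLIO_SIZE)
extraction_minutes_portfolio = scale_to_portfolio(
    EXTRACTION_MINUTES_MAX, SAMPLE_SIZE, PORTFOLIO_SIZE
)

print(lawyer_hours_portfolio)        # 275.0 hours of commercial-depth review
print(extraction_minutes_portfolio)  # 100.0 minutes as an upper bound
```

Multiply the 275 hours by whatever rate your own advisers charge to get a rough figure for one full pass over the portfolio.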
What the comparison told us to improve
The interaction-effect gap is the one we're working hardest on. Identifying clauses in isolation is tractable. Identifying when two clauses in different parts of a long document interact in ways that change the effective meaning requires reasoning across the full document structure, and we're not satisfied with where we are on that yet. The cases the lawyer caught were not minor; they were commercially important, and we want Atira to surface those interactions rather than leaving them to chance.
The defined-term chain problem is partly architectural: we need to handle amendments and schedules as first-class documents in the same extraction run as the main body, not as afterthoughts. This is on the roadmap for later this year.
What the exercise confirmed is that for the clause types most directly relevant to procurement's financial exposure — renewal structure, price escalation, exit rights — the extraction output is accurate and actionable for the vast majority of contracts. The cases where a human reviewer is doing something qualitatively different are real, but they're concentrated in the complex end of the portfolio. Knowing which contracts are in that category — because the extraction has already found that they have unusual clause interactions or references to multiple external documents — is itself useful triage information.