Moin Zaman

Co-founder

Smartnote

Why Most AI De-Identification Fails in Production, And How We Built One Lawyers Actually Trust

De-identifying text is easy to demo and surprisingly hard to ship. This talk is a deep technical case study of building SmartScrub, a reversible de-identification system designed for legal workflows, where privacy guarantees, auditability, and user trust are non-negotiable.

The original goal was simple: allow lawyers to safely use LLMs on transcripts without exposing client data. The reality was a long series of architectural failures, because common PII masking approaches cannot survive production.

I will walk through what we actually built and why naive solutions broke down. This includes placeholder token design, collision avoidance, stability across edits, and why masking too aggressively destroys downstream LLM usefulness. I will show how reversible de-identification changes your entire data model, UI, and persistence strategy, and why this becomes a systems problem rather than an NLP problem.
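To make the placeholder design concrete, here is a minimal sketch of a stable, collision-aware placeholder map. The class, token format, and span representation are illustrative assumptions for this abstract, not SmartScrub's actual implementation; the point is that the same real value must always map to the same typed token, across edits and reprocessing, and the mapping must be persisted for reversibility.

```python
# Hypothetical sketch of stable placeholder assignment (not the real SmartScrub API).
# Entity spans come from some upstream detector; the mapping is persisted so the
# same real value always receives the same token across edits and reprocessing.

class PlaceholderMap:
    def __init__(self):
        self.forward = {}   # real value -> token, e.g. "Jane Roe" -> "[PERSON_1]"
        self.counters = {}  # entity type -> next index

    def token_for(self, value, entity_type):
        if value in self.forward:
            return self.forward[value]          # stability: reuse the existing token
        n = self.counters.get(entity_type, 0) + 1
        self.counters[entity_type] = n
        token = f"[{entity_type}_{n}]"          # typed token preserves LLM utility
        # collision avoidance: never hand out a token already in use
        assert token not in self.forward.values()
        self.forward[value] = token
        return token

    def scrub(self, text, spans):
        # spans: sorted, non-overlapping list of (start, end, entity_type)
        out, last = [], 0
        for start, end, etype in spans:
            out.append(text[last:start])
            out.append(self.token_for(text[start:end], etype))
            last = end
        out.append(text[last:])
        return "".join(out)
```

Repeated mentions of the same person collapse onto one token, so the downstream LLM can still track who did what without ever seeing the name.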

The talk covers hard trade-offs we made around local-first processing, cloud services, manual review tooling, user-defined PII patterns, and audit-safe re-identification. I will also share failure modes we only discovered after real users interacted with the system, including false positives that destroy trust, silent data drift, and UI decisions that unintentionally leak meaning.
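The audit-safe re-identification requirement can be sketched as follows. Names, token format, and the audit record shape are assumptions for illustration; the behavior the talk argues for is the key part: an unknown placeholder must abort the run rather than silently emit partially re-identified text, and every substitution must leave an audit trail.

```python
import re
from datetime import datetime, timezone

# Hypothetical sketch (illustrative names, not SmartScrub's actual design):
# re-identification that fails loudly and records every substitution.

TOKEN_RE = re.compile(r"\[[A-Z]+_\d+\]")

def reidentify(text, reverse_map, audit_log):
    """Replace placeholders with real values; refuse to proceed on unknown tokens."""
    def swap(match):
        token = match.group(0)
        if token not in reverse_map:
            # zero tolerance for silent errors: an unmapped token aborts the run
            raise KeyError(f"unmapped placeholder {token}")
        audit_log.append({"token": token,
                          "at": datetime.now(timezone.utc).isoformat()})
        return reverse_map[token]
    return TOKEN_RE.sub(swap, text)
```

A failed lookup raises instead of returning text that mixes placeholders with real names, which is exactly the kind of silent error legal users will not forgive.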

This is not a theoretical talk. It is a production story about building AI under legal risk, zero tolerance for silent errors, and users who will abandon the product instantly if they do not fully understand what the system is doing. If you are building AI systems that touch sensitive data, this talk will save you months of painful mistakes.

What Attendees Will Learn

- Why common PII masking approaches fail under real legal workflows
- How to design reversible de-identification that survives editing, reprocessing, and audits
- Placeholder strategies that preserve LLM utility without leaking meaning
- Architectural patterns for isolating raw data while still enabling AI pipelines
- UI and data model decisions that directly impact user trust
- Failure modes you will not catch until real professionals use your system

Technical Topics Covered

- Reversible de-identification architectures
- Placeholder token stability and mapping persistence
- Manual scrub tooling and override precedence
- User-defined PII pattern overlays
- Auditability and re-identification guarantees
- Local-first vs cloud processing trade-offs
- Why this problem is systems engineering, not just NLP
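One way to picture override precedence is as conflict resolution between overlapping detections from different sources. This is a minimal sketch under assumed names and a simple span format, not the actual SmartScrub logic: manual review decisions beat user-defined patterns, which beat model output.

```python
# Hypothetical sketch of override precedence for overlapping PII detections:
# manual review decisions > user-defined patterns > model output.

PRECEDENCE = {"manual": 3, "user_pattern": 2, "model": 1}

def resolve(detections):
    # detections: dicts with "start", "end", "source"; higher precedence wins,
    # and lower-precedence spans that overlap a winner are dropped entirely.
    ranked = sorted(detections,
                    key=lambda d: (-PRECEDENCE[d["source"]], d["start"]))
    kept = []
    for d in ranked:
        if all(d["end"] <= k["start"] or d["start"] >= k["end"] for k in kept):
            kept.append(d)
    return sorted(kept, key=lambda d: d["start"])
```

Keeping precedence explicit in one place also makes the behavior auditable: when a user asks why a span was (or was not) scrubbed, there is a single rule to point to.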

Plus: a short live walkthrough showing how a legal transcript is de-identified, reviewed, edited, and safely re-identified, including examples of failure cases and how the system prevents them.

Moin Zaman

Moin Zaman works at the intersection of product strategy, UX, technology leadership, and AI-enabled systems. His background spans executive leadership, digital transformation, front-end engineering, and building software that needs to feel both useful and credible. His recent work includes high-trust AI workflows such as SmartScrub and Smartnote, within a broader focus on calm products, trusted systems, and practical execution.