Moin Zaman

Co-founder

Smartnote

Why Most AI De-Identification Fails in Production, And How We Built One Lawyers Actually Trust

De-identifying text is easy to demo and surprisingly hard to ship. This talk is a deep technical case study of building SmartScrub, a reversible de-identification system designed for legal workflows, where privacy guarantees, auditability, and user trust are non-negotiable.

The original goal was simple: allow lawyers to safely use LLMs on transcripts without exposing client data. The reality was a long series of architectural failures, because common PII masking approaches cannot survive production.

I will walk through what we actually built and why naive solutions broke down. This includes placeholder token design, collision avoidance, stability across edits, and why masking too aggressively destroys downstream LLM usefulness. I will show how reversible de-identification changes your entire data model, UI, and persistence strategy, and why this becomes a systems problem rather than an NLP problem.
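To make the placeholder design concrete, here is a minimal sketch of a stable, collision-aware placeholder map. The class, token format, and span representation are illustrative assumptions for this abstract, not SmartScrub's actual implementation; the point is that the same real value must always map to the same typed token, across edits and reprocessing, and the mapping must be persisted for reversibility.

```python
# Hypothetical sketch of stable placeholder assignment (not the real SmartScrub API).
# Entity spans come from some upstream detector; the mapping is persisted so the
# same real value always receives the same token across edits and reprocessing.

class PlaceholderMap:
    def __init__(self):
        self.forward = {}   # real value -> token, e.g. "Jane Roe" -> "[PERSON_1]"
        self.counters = {}  # entity type -> next index

    def token_for(self, value, entity_type):
        if value in self.forward:
            return self.forward[value]          # stability: reuse the existing token
        n = self.counters.get(entity_type, 0) + 1
        self.counters[entity_type] = n
        token = f"[{entity_type}_{n}]"          # typed token preserves LLM utility
        # collision avoidance: never hand out a token already in use
        assert token not in self.forward.values()
        self.forward[value] = token
        return token

    def scrub(self, text, spans):
        # spans: sorted, non-overlapping list of (start, end, entity_type)
        out, last = [], 0
        for start, end, etype in spans:
            out.append(text[last:start])
            out.append(self.token_for(text[start:end], etype))
            last = end
        out.append(text[last:])
        return "".join(out)
```

Repeated mentions of the same person collapse onto one token, so the downstream LLM can still track who did what without ever seeing the name.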

The talk covers hard trade-offs we made around local-first processing, cloud services, manual review tooling, user-defined PII patterns, and audit-safe re-identification. I will also share failure modes we only discovered after real users interacted with the system, including false positives that destroy trust, silent data drift, and UI decisions that unintentionally leak meaning.
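The audit-safe re-identification requirement can be sketched as follows. Names, token format, and the audit record shape are assumptions for illustration; the behavior the talk argues for is the key part: an unknown placeholder must abort the run rather than silently emit partially re-identified text, and every substitution must leave an audit trail.

```python
import re
from datetime import datetime, timezone

# Hypothetical sketch (illustrative names, not SmartScrub's actual design):
# re-identification that fails loudly and records every substitution.

TOKEN_RE = re.compile(r"\[[A-Z]+_\d+\]")

def reidentify(text, reverse_map, audit_log):
    """Replace placeholders with real values; refuse to proceed on unknown tokens."""
    def swap(match):
        token = match.group(0)
        if token not in reverse_map:
            # zero tolerance for silent errors: an unmapped token aborts the run
            raise KeyError(f"unmapped placeholder {token}")
        audit_log.append({"token": token,
                          "at": datetime.now(timezone.utc).isoformat()})
        return reverse_map[token]
    return TOKEN_RE.sub(swap, text)
```

A failed lookup raises instead of returning text that mixes placeholders with real names, which is exactly the kind of silent error legal users will not forgive.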

This is not a theoretical talk. It is a production story about building AI under legal risk, zero tolerance for silent errors, and users who will abandon the product instantly if they do not fully understand what the system is doing. If you are building AI systems that touch sensitive data, this talk will save you months of painful mistakes.

What Attendees Will Learn

- Why common PII masking approaches fail under real legal workflows
- How to design reversible de-identification that survives editing, reprocessing, and audits
- Placeholder strategies that preserve LLM utility without leaking meaning
- Architectural patterns for isolating raw data while still enabling AI pipelines
- UI and data model decisions that directly impact user trust
- Failure modes you will not catch until real professionals use your system

Technical Topics Covered

- Reversible de-identification architectures
- Placeholder token stability and mapping persistence
- Manual scrub tooling and override precedence
- User-defined PII pattern overlays
- Auditability and re-identification guarantees
- Local-first vs cloud processing trade-offs
- Why this problem is systems engineering, not just NLP
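One way to picture override precedence is as conflict resolution between overlapping detections from different sources. This is a minimal sketch under assumed names and a simple span format, not the actual SmartScrub logic: manual review decisions beat user-defined patterns, which beat model output.

```python
# Hypothetical sketch of override precedence for overlapping PII detections:
# manual review decisions > user-defined patterns > model output.

PRECEDENCE = {"manual": 3, "user_pattern": 2, "model": 1}

def resolve(detections):
    # detections: dicts with "start", "end", "source"; higher precedence wins,
    # and lower-precedence spans that overlap a winner are dropped entirely.
    ranked = sorted(detections,
                    key=lambda d: (-PRECEDENCE[d["source"]], d["start"]))
    kept = []
    for d in ranked:
        if all(d["end"] <= k["start"] or d["start"] >= k["end"] for k in kept):
            kept.append(d)
    return sorted(kept, key=lambda d: d["start"])
```

Keeping precedence explicit in one place also makes the behavior auditable: when a user asks why a span was (or was not) scrubbed, there is a single rule to point to.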

Plus: a short live walkthrough showing how a legal transcript is de-identified, reviewed, edited, and safely re-identified, including examples of failure cases and how the system prevents them.

Moin Zaman

Moin Zaman works at the intersection of product strategy, UX, technology leadership, and AI-enabled systems. His background spans executive leadership, digital transformation, front-end engineering, and building software that needs to feel both useful and credible. His recent work includes high-trust AI workflows such as SmartScrub and Smartnote, within a broader focus on calm products, trusted systems, and practical execution.