How should we validate batch results when inputs are messy?

How should we validate batch results when inputs are messy?‍

Run a second “cleaned” pass (normalized names + locality) and measure lift vs. raw. Drive decisioning with your own thresholds (e.g., ≥0.80 auto-accept; mid-confidence → review). Keep a small gold set of known answers to spot-check precision/recall over time.

testing-and-trialing