How should we validate batch results when inputs are messy?
Run a second “cleaned” pass (normalized names + locality) and measure lift vs. raw. Drive decisioning with your own thresholds (e.g., ≥0.80 auto-accept; mid-confidence → review). Keep a small gold set of known answers to spot-check precision/recall over time.
testing-and-trialing
