How does Cobalt normalize search queries and handle abbreviations or special characters?


Before running a search, Cobalt applies a standard layer of query normalization to clean up formatting issues that can interfere with matching. This includes:

  • Removing emojis and emoji variants
  • Removing private use area characters
  • Removing zero-width and other invisible characters
  • Removing the HTML LRM entity (&lrm;)
  • Converting smart single quotes to straight quotes
  • Normalizing smart double quotes
  • Converting dashes to standard hyphens
  • Converting non-breaking spaces and tabs to normal spaces
  • Removing newlines
  • Collapsing consecutive whitespace into a single space
  • Removing bullets and straight double quotes
  • Converting the query to lowercase
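
The steps above can be sketched roughly as follows. This is an illustrative approximation, not Cobalt's actual implementation: the function name normalize_query, the Unicode ranges, and the step order are assumptions, and smart double quotes are assumed to become straight double quotes before those are stripped.

```python
import re

# Assumed character classes; the exact ranges Cobalt uses are not documented here.
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]")
PRIVATE_USE = re.compile("[\uE000-\uF8FF]")
INVISIBLE = re.compile("[\u200B-\u200F\u2060\uFEFF]")  # includes LRM (U+200E)
DASHES = re.compile("[\u2010-\u2015\u2212]")


def normalize_query(query: str) -> str:
    """Hypothetical base cleanup that mirrors the listed steps."""
    q = EMOJI.sub("", query)                              # emojis and variants
    q = PRIVATE_USE.sub("", q)                            # private use area
    q = INVISIBLE.sub("", q)                              # zero-width / invisible
    q = q.replace("&lrm;", "")                            # literal HTML LRM entity
    q = q.replace("\u2018", "'").replace("\u2019", "'")   # smart single quotes
    q = q.replace("\u201C", '"').replace("\u201D", '"')   # smart double quotes (assumed target)
    q = DASHES.sub("-", q)                                # dashes to hyphen
    q = q.replace("\u00A0", " ").replace("\t", " ")       # NBSP and tabs to spaces
    q = q.replace("\r", "").replace("\n", "")             # newlines removed
    q = re.sub(r" {2,}", " ", q)                          # collapse whitespace
    q = q.replace("\u2022", "").replace('"', "")          # bullets, straight double quotes
    return q.lower()


print(normalize_query("\u201CFirst\u00A0Natl\u2014Bank\u201D\u200B"))
```

Running the steps in a fixed order matters in a sketch like this: smart double quotes are converted before straight double quotes are stripped, so both forms end up removed.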

After this base cleanup, Cobalt may apply additional adjustments depending on the state, since Secretary of State search behavior varies by jurisdiction. For example, some states allow characters like / in the search query, while others may require stricter normalization.
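
One way to picture the state-specific layer is a per-state table of extra characters that are allowed to survive. Everything in this sketch is hypothetical: the placeholder keys STATE_A and STATE_B, the character sets, and the choice to replace disallowed characters with a space are assumptions for illustration and do not describe any real jurisdiction's rules.

```python
import re

# Hypothetical rules: characters (beyond a-z, 0-9, space) each state keeps.
STATE_ALLOWED_EXTRA = {
    "STATE_A": "/&'-",  # permissive placeholder: "/" is kept
    "STATE_B": "&'-",   # stricter placeholder: "/" is dropped
}
DEFAULT_ALLOWED_EXTRA = "&'-"


def apply_state_rules(query: str, state: str) -> str:
    """Apply an assumed per-state character filter to an already-normalized query."""
    extra = STATE_ALLOWED_EXTRA.get(state, DEFAULT_ALLOWED_EXTRA)
    disallowed = re.compile(f"[^a-z0-9 {re.escape(extra)}]")
    cleaned = disallowed.sub(" ", query)          # replace rather than delete (assumption)
    return re.sub(r" {2,}", " ", cleaned).strip()


print(apply_state_rules("a/b holdings", "STATE_A"))
print(apply_state_rules("a/b holdings", "STATE_B"))
```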

If the initial search returns no result and the query appears to contain abbreviations, Cobalt may attempt a second pass: it expands the abbreviated terms and reruns the search.

For example:

  • “First Natl Bank” may be expanded to “FIRST NATIONAL BANK”
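
The second-pass behavior can be sketched as a search wrapper that retries once with expanded terms. The abbreviation table, the function names, and the search callable are all hypothetical stand-ins; Cobalt's real mapping and search interface are not described here.

```python
from typing import Callable, Optional, Tuple

# Illustrative abbreviation table; the real mapping is an unknown.
ABBREVIATIONS = {
    "natl": "national",
    "intl": "international",
    "assn": "association",
}


def expand_abbreviations(query: str) -> str:
    """Replace each known abbreviation with its long form, word by word."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in query.split(" "))


def search_with_fallback(
    query: str, search: Callable[[str], Optional[str]]
) -> Tuple[Optional[str], bool]:
    """Return (result, used_expansion). Retries once only if expansion changes the query."""
    result = search(query)
    if result is not None:
        return result, False
    expanded = expand_abbreviations(query)
    if expanded == query:
        return None, False  # nothing to expand, so no second pass
    return search(expanded), True
```

Returning a used_expansion flag alongside the result is a design choice for this sketch: it lets a caller distinguish a direct match from one found only after expansion.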

This fallback can surface valid matches that would otherwise be missed. The confidenceLevel returned for such a result may still be lower than that of an exact match, so clients should review these results rather than treating them as direct matches.
