HubSpot Duplicate Similarity Score: Filter & Fix Duplicates Faster

What This Update Actually Is

HubSpot added a Similarity Score column to the duplicate management table. The score is a percentage that reflects how closely two records match according to HubSpot's default detection model.

Before this update, HubSpot surfaced duplicate contacts with a score above 30% and duplicate companies with a score above 80%. You saw the pairs. You didn't see the number. Now you do.

You can also filter the table to show only duplicates above any threshold you choose. Want to see only contact pairs with a score above 70%? Set the filter. The low-confidence matches disappear until you're ready for them.

This is a public beta, and accounts on eligible tiers are enrolled automatically. The eligible tiers are Professional and Enterprise across Commerce Hub, Content Hub, Marketing Hub, Data Hub, Sales Hub, Service Hub, and Smart CRM.

Why HubSpot Shipped This

The external problem is straightforward. Dirty data costs revenue. Duplicate records split contact history, inflate list sizes, skew reporting, and confuse the humans working those contacts every day.

But the internal frustration was just as real. When HubSpot showed you a duplicate pair, you had no idea why it was there. Was it a 31% match or a 98% match? You couldn't tell. That made the tool feel like a black box, and it made the cleanup process feel like guesswork.

Admins we work with describe the same pattern: they open the duplicate table, see hundreds of pairs, freeze, and close it. Without a way to prioritize, the whole task feels impossible.

Surfacing the score changes the psychology. High-confidence matches are fast, obvious merges. Low-confidence matches are judgment calls you can schedule for later. That's a manageable workflow instead of an overwhelming list.

How to Use It Step by Step

1. Go to Contacts or Companies in your HubSpot portal and open the Actions menu. Select Manage Duplicates.
2. Look for the new Similarity Score column in the table. Each row now shows a percentage next to the flagged pair.
3. Use the filter to set a minimum threshold. Start with 80% or higher for contacts to build your high-confidence merge queue first.
4. Review those pairs and merge confidently. The score tells you HubSpot is very sure these are the same person or company.
5. Drop the threshold to 50% and review the next tier. These pairs need more scrutiny. Check email domains, job titles, and associated companies before merging.
6. Schedule a recurring calendar block to work through lower-score pairs monthly. Don't let them pile up again.
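
If you keep a working copy of the flagged pairs outside HubSpot, the same top-down pass can be sketched in a few lines of Python. This is a minimal illustration, not HubSpot's API: the `DuplicatePair` type, its field names, `build_merge_queue`, and the sample rows are all hypothetical, and it assumes you have already copied the pairs and their scores out of the table yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicatePair:
    # Hypothetical shape for one row of the duplicate table.
    record_a: str
    record_b: str
    score: int  # similarity score as a whole-number percentage


def build_merge_queue(pairs, min_score):
    """Keep pairs at or above min_score, highest score first."""
    kept = [p for p in pairs if p.score >= min_score]
    return sorted(kept, key=lambda p: p.score, reverse=True)


# Made-up sample rows, for illustration only.
pairs = [
    DuplicatePair("Ana Ruiz", "Ana M. Ruiz", 92),
    DuplicatePair("J. Lee", "Jordan Lee", 55),
    DuplicatePair("Sam Oke", "Samuel Okafor", 34),
    DuplicatePair("Priya N.", "Priya Nair", 81),
]

# First pass: the 80%+ queue from step 3.
high_confidence = build_merge_queue(pairs, 80)

# Second pass: drop to 50% and skip what the first pass already covered.
next_tier = [p for p in build_merge_queue(pairs, 50) if p.score < 80]
```

Sorting highest score first mirrors the point of the filter: the obvious merges land at the top, and the judgment calls wait their turn.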

What It Touches in Your HubSpot Strategy

This update lives in the Smart CRM layer, but its ripple effect reaches every hub you run.

Duplicate contacts inflate your marketing lists, which means you're paying more for contacts, sending emails to the same humans twice, and reporting inflated engagement numbers. Clean that up and your deliverability improves alongside your data accuracy.

Duplicate companies distort your account-based reporting. If a target account has three company records, your deal attribution is split. Your sales team sees fragmented activity. Leaders make pipeline decisions on incomplete data.

Key Takeaway

A high similarity score doesn't guarantee a merge is safe. Always check associated deals, open tickets, and email history before combining records. The score tells you confidence, not context.
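
One way to make that habit concrete is a small pre-merge checklist. The sketch below assumes you have already looked up what is attached to each record. The `RecordContext` type, its fields, and `checks_needed` are hypothetical names for illustration and do not correspond to HubSpot objects or endpoints.

```python
from dataclasses import dataclass


@dataclass
class RecordContext:
    # Hypothetical summary of what is attached to one record.
    associated_deals: int = 0
    open_tickets: int = 0
    has_email_history: bool = False


def checks_needed(a, b):
    """Return which manual checks apply before merging records a and b."""
    checks = []
    if a.associated_deals or b.associated_deals:
        checks.append("associated deals")
    if a.open_tickets or b.open_tickets:
        checks.append("open tickets")
    if a.has_email_history or b.has_email_history:
        checks.append("email history")
    return checks


# Example: one record carries deals, the other carries email history.
todo = checks_needed(
    RecordContext(associated_deals=2),
    RecordContext(has_email_history=True),
)
```

An empty list means neither record has anything attached in these three areas. It does not mean the merge is safe, only that there is less context to lose.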

If you're running any integrations that sync CRM data outbound, such as a Salesforce sync or a data warehouse connection, duplicate records in HubSpot will be replicated downstream. Merging high-confidence duplicates before a sync is always the cleaner move.

If you're thinking about sync integrity more broadly, our breakdown of HubSpot's Salesforce integration rebuild explains why dirty CRM data is the single biggest risk in that migration.

The score filter also pairs well with any data governance workflow you've built inside HubSpot. If you're using property-based automation to flag data quality issues, the similarity score gives you another signal to layer in.

Key Takeaway

Build a tiered cleanup cadence: 80%+ scores weekly, 50-79% scores monthly, below 50% quarterly with a second set of eyes. Consistency beats one-time blitzes every time.
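
The cadence above amounts to a simple mapping from score to review schedule. Here is a minimal Python sketch of that mapping. The function names are hypothetical, and the boundaries are the ones suggested in this article, not anything HubSpot enforces.

```python
from collections import defaultdict


def cleanup_cadence(score):
    """Map a similarity score (percentage) to the suggested review cadence."""
    if score >= 80:
        return "weekly"
    if score >= 50:
        return "monthly"
    return "quarterly, with a second reviewer"


def group_by_cadence(scores):
    """Group scores by cadence so each review session has its own queue."""
    groups = defaultdict(list)
    for score in scores:
        groups[cleanup_cadence(score)].append(score)
    return dict(groups)


# Made-up scores, for illustration only.
queues = group_by_cadence([92, 81, 79, 55, 34])
```

Counting the pairs in each group before you start gives a rough sense of how much time each recurring block needs.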

Combine this with the centralized sharing management for reports and dashboards and you're building a CRM that admins can actually govern instead of just react to.

Who Should Care Most

Not every role needs to act on this right away. Here's who it hits hardest.

HubSpot admins and RevOps leads managing portals with more than 10,000 contacts. The filter turns a wall of pairs into a prioritized queue so your team can actually make progress.
Marketing ops professionals whose list health directly affects email deliverability. Cleaning high-confidence duplicates reduces contact count bloat and improves send quality fast.
Sales leaders running account-based plays. Duplicate company records split your account view. Merging them gives your reps one clean record to work from.
Business owners who recently migrated to HubSpot or imported a large list. Post-import, duplicate rates spike. The similarity score filter is the fastest way to find the obvious merges first.
Anyone prepping for a HubSpot audit or data cleanup sprint. The score gives you a measurable starting point instead of a gut-feel approach.

George's Take

I've been inside a lot of HubSpot portals, and the duplicate table is one of those places where I consistently see humans give up. It's not that they don't care about clean data. It's that the table gave them no signal to work with. Every pair looked equally important, which meant none of them did. This score changes the game not because it's technically complex, but because it respects how humans actually make decisions. Give someone a priority signal, and they'll move. Give them an undifferentiated list of problems, and they'll close the tab.

“The best data hygiene tool isn't the most powerful one. It's the one your team will actually open and use. The similarity score finally gives the duplicate table what it needed: a reason to start at the top.”

— George B. Thomas

If you want to understand exactly how duplicate-inflated contact lists damage your email program, read our piece on why your B2B email list is killing your inbox placement. The connection between data quality and deliverability is more direct than most marketing teams realize.

If your portal has been collecting duplicate records for months and you don't know where to start, let's fix that together. Book a strategy call with the Sidekick team and we'll walk through your duplicate management setup, your data governance approach, and the fastest path to a CRM your whole team can trust.

Frequently Asked Questions

What is the HubSpot Duplicate Similarity Score?

The HubSpot Duplicate Similarity Score is a percentage shown in the duplicate management table that reflects how closely two records match according to HubSpot's default detection model. A higher score means HubSpot is more confident the records are duplicates. You can filter the table to show only pairs above a threshold you choose.

What score threshold does HubSpot use to surface duplicates by default?

HubSpot surfaces duplicate contacts with a similarity score above 30% and duplicate companies with a score above 80%. These thresholds existed before this update, but the scores were hidden. Now they're visible in a dedicated column, and you can filter to any custom threshold you want.

Does a high similarity score mean I should always merge those records?

Not automatically. A high score means HubSpot's model is confident the records share key properties. You should still check associated deals, open tickets, email history, and contact ownership before merging. The score tells you where to look first, not that the merge is guaranteed to be safe.

Which HubSpot hubs and tiers get the Duplicate Similarity Score?

The feature is available at Professional and Enterprise tiers across Commerce Hub, Content Hub, Marketing Hub, Data Hub, Sales Hub, Service Hub, and Smart CRM. It's currently in public beta, meaning all accounts in eligible tiers are enrolled automatically with no action required.

How should I use the similarity score filter in my duplicate cleanup workflow?

Start by filtering to 80% or higher and merge those pairs first. They're the easiest and least risky. Then work down to 50% and review those pairs more carefully. Schedule a recurring task for low-score pairs below 50%. This tiered approach prevents cleanup fatigue and makes steady progress over time.

Will the similarity score work with custom duplicate rules or only HubSpot's default model?

The similarity score column reflects only HubSpot's default duplicate detection model. If you're using custom duplicate rules, those results appear separately and don't carry the same similarity score. The filter applies specifically to the default model's output, so custom rule pairs won't show a percentage in this column.
