HubSpot has filed a patent for a method intended to improve computer systems, data stores, and search engine systems. The method triggers an on-demand deduplication action that uses an embedding model and an approximate nearest neighbors algorithm to identify and process duplicate entity pairs. The patent aims to enhance sales, marketing, and service activities by utilizing custom objects and entity resolution systems. GlobalData’s report on HubSpot gives a 360-degree view of the company, including its patenting strategy.

According to GlobalData’s company profile on HubSpot, predictive modeling techniques were a key innovation area identified from its patents. HubSpot's grant share as of September 2023 was 49%. Grant share is the ratio of the number of granted patents to the total number of patents.

Improving computer systems with on-demand deduplication action

Source: United States Patent and Trademark Office (USPTO). Credit: HubSpot Inc

A recently filed patent (Publication Number: US20230316186A1) describes a method for on-demand deduplication of entities within a database. The method involves generating embeddings for entities using an embedding model and then utilizing an approximate nearest neighbors algorithm to generate candidate duplicate entity pairs. Deduplication probabilities are generated for these pairs, and if the probability exceeds a threshold, the method specifies that the entities represented by the pair are duplicates. The on-demand deduplication action is then performed on these entities.
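
The steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a detail from the filing: the character n-gram `embed` function stands in for a learned embedding model, an exhaustive nearest-neighbor search stands in for an approximate index, cosine similarity stands in for the deduplication probability, and the 0.8 threshold and sample records are hypothetical.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Toy embedding: L2-normalised character n-gram counts (stand-in for a learned model)."""
    s = f"  {text.lower().strip()} "
    grams = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    norm = math.sqrt(sum(c * c for c in grams.values()))
    return {g: c / norm for g, c in grams.items()}

def cosine(u, v):
    """Cosine similarity of two sparse unit vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())

def candidate_pairs(embeddings, k=1):
    """Exhaustive k-nearest-neighbour search; a production system would use an approximate index."""
    pairs = set()
    for a in embeddings:
        scored = sorted(
            ((cosine(embeddings[a], embeddings[b]), b) for b in embeddings if b != a),
            reverse=True,
        )
        for _, b in scored[:k]:
            pairs.add(tuple(sorted((a, b))))
    return pairs

def find_duplicates(entities, threshold=0.8):
    """Embed, generate candidate pairs, score them, and keep pairs above the threshold."""
    embeddings = {eid: embed(name) for eid, name in entities.items()}
    result = []
    for a, b in sorted(candidate_pairs(embeddings)):
        prob = cosine(embeddings[a], embeddings[b])  # similarity used as a stand-in probability
        if prob > threshold:
            result.append((a, b))
    return result

# Hypothetical records keyed by entity ID.
entities = {1: "Acme Corporation", 2: "ACME Corporation ", 3: "Globex Industries"}
duplicates = find_duplicates(entities)
print(duplicates)
```

In this sketch, records 1 and 2 differ only in case and trailing whitespace, so they are the only pair flagged; the deduplication action would then be applied to that pair.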

The triggering of the on-demand deduplication action can occur in various scenarios. For example, it can be triggered when a user accesses an entity within the database, allowing real-time identification and display of other duplicate entities. It can also be triggered during an entity import operation, where duplicate entities are identified and displayed in real-time. Additionally, the action can be triggered by user interaction with a user interface element or during an update operation on the database.

The method employs different techniques for generating candidate duplicate entity pairs. It can use a locality sensitive hashing algorithm or a hierarchical navigable small worlds (HNSW) algorithm to process the embeddings and generate these pairs.
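
As a rough illustration of the locality sensitive hashing option, the sketch below uses random-hyperplane hashing, one common LSH family for cosine similarity. The filing is not quoted as specifying a hash family, so this choice, along with the function names, the bit and table counts, and the sample vectors, is an assumption.

```python
import random
from collections import defaultdict
from itertools import combinations

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (as normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        sum(h_i * v_i for h_i, v_i in zip(h, vector)) >= 0.0 for h in hyperplanes
    )

def lsh_candidate_pairs(embeddings, n_bits=8, n_tables=4, seed=0):
    """Entities sharing a bucket in any hash table become candidate duplicate pairs."""
    dim = len(next(iter(embeddings.values())))
    pairs = set()
    for t in range(n_tables):
        planes = make_hyperplanes(dim, n_bits, seed=seed + t)
        buckets = defaultdict(list)
        for eid, vec in embeddings.items():
            buckets[signature(vec, planes)].append(eid)
        for ids in buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs

# Hypothetical embeddings: two identical vectors and one pointing the opposite way.
embeddings = {
    "a": [0.9, 0.1, 0.0, 0.4],
    "a_copy": [0.9, 0.1, 0.0, 0.4],
    "a_flipped": [-0.9, -0.1, 0.0, -0.4],
}
pairs = lsh_candidate_pairs(embeddings)
```

Only entities that share a bucket are compared, which avoids scoring every possible pair. Identical vectors always collide, while a vector and its negation fall on opposite sides of every hyperplane and never do.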

Once the on-demand deduplication action is triggered, the method provides recommendations to the user. It can recommend merging the duplicate entities or deleting one entity while retaining the other. Alternatively, it can directly merge the entities.
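
One way to picture the choice between recommending and directly merging is a two-threshold policy, sketched below. The thresholds, field names, and the rule of keeping the primary record's values while filling its gaps from the duplicate are all hypothetical; the filing as summarized here does not state how that choice is made.

```python
def merge_entities(primary, duplicate):
    """Return a merged record: keep the primary's values, fill its gaps from the duplicate."""
    merged = dict(primary)
    for field, value in duplicate.items():
        if merged.get(field) in (None, ""):
            merged[field] = value
    return merged

def deduplication_action(primary, duplicate, probability,
                         auto_merge_at=0.95, recommend_at=0.8):
    """Pick an action from the deduplication probability (thresholds are assumptions)."""
    if probability >= auto_merge_at:
        return {"action": "merged", "record": merge_entities(primary, duplicate)}
    if probability >= recommend_at:
        return {"action": "recommend_merge", "preview": merge_entities(primary, duplicate)}
    return {"action": "none"}

# Hypothetical contact records.
primary = {"id": 1, "name": "Acme", "phone": ""}
duplicate = {"id": 2, "name": "ACME Corp", "phone": "555-0100"}
outcome = deduplication_action(primary, duplicate, probability=0.85)
```

With a probability of 0.85 the sketch returns a merge recommendation and a preview of the combined record, leaving the final decision to the user.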

The embedding model used in the method can be trained using various techniques. It can utilize self-supervised training, where the model learns from the data itself. The training can also involve constraints, such as treating an entity, identified by its entity ID, as a duplicate of itself. The model can be trained using labeled and unlabeled entities, and it can be designed as a Siamese network with weight sharing for entity comparison.
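
The weight-sharing idea behind a Siamese network can be shown with a small forward-pass sketch. It is untrained and purely illustrative: the `SiameseScorer` class, the single tanh layer, the random weights, and the mapping from distance to probability are assumptions, not the architecture in the filing.

```python
import math
import random

class SiameseScorer:
    """Two branches, one set of weights: both entities pass through the same encoder."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights; a real model would learn these from entity data.
        self.weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(self, features):
        """Single tanh layer applied with the shared weight matrix."""
        return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in self.weights]

    def duplicate_probability(self, left, right):
        a, b = self.encode(left), self.encode(right)  # same weights for both branches
        distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        return math.exp(-distance)  # 1.0 when the encodings coincide, lower as they diverge

# Hypothetical feature vectors for two entities.
scorer = SiameseScorer(in_dim=3, out_dim=4)
x = [1.0, 0.0, 0.5]
y = [0.0, 1.0, 0.2]
p_self = scorer.duplicate_probability(x, x)
p_pair = scorer.duplicate_probability(x, y)
```

Because both inputs go through the same weights, an entity compared with itself always scores 1.0 and the score does not depend on argument order, which fits the constraint that an entity counts as a duplicate of itself.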

The patent also covers a non-transitory machine-readable medium comprising instructions for performing the method and a computing device with a processor and memory to execute the instructions. The computing device can generate embeddings, candidate duplicate entity pairs, deduplication probabilities, and provide correlation between duplicate entities.

Overall, this patent presents a method for on-demand deduplication of entities within a database, offering real-time identification and recommendations for handling duplicate entities. The method utilizes embedding models and approximate nearest neighbors algorithms for efficient processing and can be triggered in various scenarios.


GlobalData

GlobalData, the leading provider of industry intelligence, provided the underlying data, research, and analysis used to produce this article.

GlobalData Patent Analytics tracks bibliographic data, legal events data, point in time patent ownerships, and backward and forward citations from global patenting offices. Textual analysis and official patent classifications are used to group patents into key thematic areas and link them to specific companies across the world’s largest industries.