Gadget Hype's Tech Hub — Cloud Computing Explained

Using Machine Learning to Develop Patent Search Models

Google has developed a collection of phrases for training patent search models. Many patent holders use unconventional language to describe the subject of their patents, for example calling a soccer ball a "spherical recreation device." This practice can lead to scattered and ineffective search...

, and Administrator

2025 August 2 . 7:08 AM




Patent documents are often written in complex, non-standard language. Many patent owners use unconventional terms to describe their inventions, which makes it hard for searchers to find relevant patents. Google's BigQuery patent data and curated datasets derived from it offer a way to address this.

Google has made a large body of patent data publicly accessible through the "patents-public-data" project on Google BigQuery. Its tables aggregate global patent information, including titles, abstracts, classifications, and inventor details. These tables are not themselves a prepackaged set of phrases for training patent search models, but they are a rich source from which such phrases can be extracted.

For a more focused starting point, Kaggle hosts curated datasets such as the "CleanTech - Google Patent Dataset." Derived from Google Patents data, it is aimed at renewable energy and sustainable technologies and provides JSON files of patents filtered by keywords such as "solar energy," "photovoltaics," and "wind energy."
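As a rough illustration of this kind of keyword filtering, the sketch below keeps only the records whose title or abstract mentions one of the keywords. The record layout (the `title` and `abstract` fields, one JSON object per line) and the sample records are assumptions made for illustration; the actual Kaggle files may be structured differently.

```python
import json

# Keywords quoted in the article; matching is case-insensitive.
KEYWORDS = ("solar energy", "photovoltaics", "wind energy")


def matches_keywords(record, keywords=KEYWORDS):
    """Return True if any keyword appears in the record's title or abstract."""
    # "title" and "abstract" are assumed field names, not a documented schema.
    text = " ".join(str(record.get(field, "")) for field in ("title", "abstract")).lower()
    return any(keyword in text for keyword in keywords)


def filter_json_lines(lines, keywords=KEYWORDS):
    """Parse one JSON object per line and keep the matching records."""
    records = (json.loads(line) for line in lines if line.strip())
    return [record for record in records if matches_keywords(record, keywords)]


# Made-up sample records, used only to exercise the filter.
SAMPLE_LINES = [
    '{"title": "Rooftop mounting bracket", "abstract": "A bracket for photovoltaics panels."}',
    '{"title": "Soccer ball", "abstract": "A spherical recreation device."}',
    '{"title": "Wind energy converter", "abstract": "A turbine blade assembly."}',
]

if __name__ == "__main__":
    for record in filter_json_lines(SAMPLE_LINES):
        print(record["title"])
```

Plain substring matching like this is simple but literal: it will miss patents that describe the same technology in unconventional wording, which is the problem a phrase dataset is meant to help with.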

If you want to build a custom phrase dataset for training, you can query the "patents-public-data.patents.publications" table on Google BigQuery. Write SQL queries that extract phrases or text segments from titles, abstracts, and descriptions according to your selection criteria, then export the results in a format suited to machine learning training, such as CSV or JSON.
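The query-and-export workflow might look roughly like the sketch below. It is a minimal example under stated assumptions: the column names (`title_localized`, `abstract_localized`, `country_code`) should be checked against the table schema in the BigQuery console, and `run_query` needs the `google-cloud-bigquery` package plus configured Google Cloud credentials. The helper names are hypothetical.

```python
import csv
import io
import json

TABLE = "patents-public-data.patents.publications"


def build_query(country_code="US", limit=1000):
    """Build a SQL string that pulls English titles and abstracts."""
    # Column names below are assumptions about the table schema;
    # verify them in the BigQuery console before running.
    if not (country_code.isalpha() and len(country_code) == 2):
        raise ValueError("country_code must be a two-letter code")
    return f"""
        SELECT
          publication_number,
          (SELECT t.text FROM UNNEST(title_localized) AS t
           WHERE t.language = 'en' LIMIT 1) AS title,
          (SELECT a.text FROM UNNEST(abstract_localized) AS a
           WHERE a.language = 'en' LIMIT 1) AS abstract
        FROM `{TABLE}`
        WHERE country_code = '{country_code.upper()}'
        LIMIT {int(limit)}
    """


def run_query(sql):
    """Run the query; needs google-cloud-bigquery and configured credentials."""
    from google.cloud import bigquery  # imported lazily so the rest runs offline

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


def to_csv(rows, fieldnames=("publication_number", "title", "abstract")):
    """Serialize rows (dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json_lines(rows):
    """Serialize rows to newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)
```

A typical use would be `rows = run_query(build_query(limit=500))` followed by writing `to_csv(rows)` or `to_json_lines(rows)` to a file. Queries against a table this large can scan a lot of data, so it is prudent to check the estimated bytes processed before running one.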

Google has also created a dataset of phrases for training patent search models. For large-scale model training, manually downloading PDFs of individual patents from Google Patents is impractical. Specialized patent search tools such as PatentLens and the USPTO's Global Patent Search Network can complement this work, but they do not contain Google's phrase dataset.

In short, the most practical approach is to use Google's BigQuery patent data, or trusted datasets derived from it, to build or acquire a phrase dataset for training patent search models. Google's phrase dataset comprises approximately 50,000 phrase-to-phrase pairs, each labeled to indicate how the two phrases are related. An example of the non-standard language involved is describing a soccer ball as a "spherical recreation device."
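To make the idea of labeled phrase pairs concrete, here is a small sketch of how such pairs could be represented and used to broaden a search query. The field names (`anchor`, `target`, `label`), the label values, and the sample rows are all hypothetical; the article does not specify the dataset's actual schema, and query expansion is only one possible use of it.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PhrasePair:
    anchor: str  # phrase as a searcher might type it
    target: str  # phrase as it might appear in a patent
    label: str   # how the two phrases are related (hypothetical values)


# Made-up rows in an assumed CSV layout; the first pair echoes the article's example.
SAMPLE_CSV = """anchor,target,label
soccer ball,spherical recreation device,synonym
soccer ball,football,synonym
soccer ball,tennis racket,unrelated
"""


def load_pairs(csv_text):
    """Parse CSV text into PhrasePair records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [PhrasePair(row["anchor"], row["target"], row["label"]) for row in reader]


def expand_query(phrase, pairs, keep_labels=("synonym",)):
    """Return the phrase plus related phrases whose label is in keep_labels."""
    related = defaultdict(list)
    for pair in pairs:
        if pair.label in keep_labels:
            related[pair.anchor].append(pair.target)
    return [phrase] + related.get(phrase, [])


if __name__ == "__main__":
    print(expand_query("soccer ball", load_pairs(SAMPLE_CSV)))
```

In practice, pairs like these would more likely serve as training or evaluation data for a similarity model than as a direct lookup table, but the lookup shows what the labels encode.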

The dataset helps patent owners and searchers navigate patent descriptions, improving the efficiency and accuracy of patent searches. Because it can produce more practical and focused search results, it is a valuable resource for patent search.


