How AI Prior-Art Search Actually Works

From a sentence to a shortlist, one stage at a time

Every prior-art search answers one question: of the 150 million-plus patents and papers ever published, which few actually matter to this invention? The interactive tool above runs a miniature version of how AI semantic search answers it. It moves through five stages — the same five stages PatentScan runs at full scale.

1. Your disclosure

You start with the invention in plain language — no Boolean operators, no classification codes. The three preset buttons describe the same device three ways: everyday words, dense patent-ese, and a mix. That is the whole problem in one click: the invention is constant, but the words people use for it are not.

2. Concept extraction

Instead of indexing your literal words, the model reads the disclosure for meaning and lights up the underlying concepts — balance control, tilt sensing, electric drive, and so on. This is the step keyword search simply does not have. Edit the text and watch the concept chips change; swap to “plain words” and notice the same concepts still appear even though almost none of the technical terms do.
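To make the idea concrete, here is a minimal sketch of concept extraction. It assumes a tiny hand-built lexicon (`CONCEPT_LEXICON`, a hypothetical name) in place of a learned language model, and the two disclosure sentences are invented for illustration. It is not how PatentScan or the demo is implemented; it only shows how two differently worded descriptions can resolve to the same concepts.

```python
# Hypothetical lexicon: concept -> phrases that signal it.
# A real system would rely on a learned model, not a phrase list.
CONCEPT_LEXICON = {
    "balance control": ["stays upright", "self-balancing", "attitude stabilization"],
    "tilt sensing": ["lean", "tilt", "inertial sensor"],
    "electric drive": ["battery-powered", "electric motor", "electrically driven"],
}


def extract_concepts(text: str) -> set:
    """Return every concept with at least one trigger phrase in the text."""
    lowered = text.lower()
    return {
        concept
        for concept, phrases in CONCEPT_LEXICON.items()
        if any(phrase in lowered for phrase in phrases)
    }


# Two invented descriptions of the same device.
plain = "A battery-powered board that stays upright by sensing how far the rider leans."
patentese = (
    "A personal transporter comprising an inertial sensor, an attitude "
    "stabilization controller, and an electrically driven wheel."
)

plain_concepts = extract_concepts(plain)
patentese_concepts = extract_concepts(patentese)
print(sorted(plain_concepts))
print(plain_concepts == patentese_concepts)
```

Both sentences resolve to the same three concepts even though they share almost no technical vocabulary, which is the behaviour the concept chips illustrate.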

3. The meaning map

Every document, along with your invention, becomes a point whose position is determined by the concepts it covers, not the words it uses. Things that mean the same thing land near each other. Your invention is the blue diamond; the closer a document sits, the more it shares the meaning of your disclosure. Hover over any point to see what it is and how strongly it matches. The thick lines are the search reaching out to its nearest neighbours, the AI equivalent of “these are the ones worth your attention.”
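The "closeness" on the map can be sketched as cosine similarity between concept vectors. The example below is an illustration under stated assumptions: each document is represented by a hand-assigned weight per concept (real systems use learned embeddings with hundreds of dimensions), and the names `CONCEPTS`, `corpus`, and `nearest` are hypothetical.

```python
import math

# One axis per concept; the values are hand-assigned for illustration.
CONCEPTS = ["balance control", "tilt sensing", "electric drive", "foldable frame"]


def cosine(a, b):
    """Cosine similarity: 1.0 for the same direction, 0.0 for no shared concepts."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


invention = [1.0, 1.0, 1.0, 0.0]
corpus = {
    "doc_a": [1.0, 1.0, 1.0, 0.0],  # covers the same three concepts
    "doc_b": [1.0, 0.0, 1.0, 0.0],  # covers two of them
    "doc_c": [0.0, 0.0, 0.0, 1.0],  # unrelated concept only
}


def nearest(query, docs, k=2):
    """Return the k documents closest in meaning to the query, best first."""
    scored = sorted(
        ((cosine(query, vec), name) for name, vec in docs.items()),
        reverse=True,
    )
    return [(name, round(score, 3)) for score, name in scored[:k]]


neighbours = nearest(invention, corpus)
print(neighbours)
```

Here `doc_a` and `doc_b` come back as the nearest neighbours, while `doc_c` scores zero because it shares no concept with the invention.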

Try the toggle. Flip from Semantic (AI) to Keyword. Documents that describe your invention in different words go dim, and the genuinely relevant ones the keyword search can no longer see are ringed in red. The recall number on the right is the share of the truly relevant prior art each method actually surfaced.
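The recall figure is simple arithmetic: relevant documents found, divided by relevant documents that exist. A minimal sketch, using invented document IDs and invented result sets rather than the demo's actual data:

```python
def recall(retrieved, relevant):
    """Share of the truly relevant documents that a search actually surfaced."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Invented example: four documents are truly relevant.
relevant = {"d1", "d2", "d3", "d4"}
keyword_hits = {"d1", "d7"}                # finds one, plus one off-topic hit
semantic_hits = {"d1", "d2", "d3", "d8"}   # finds three, plus one off-topic hit

keyword_recall = recall(keyword_hits, relevant)
semantic_recall = recall(semantic_hits, relevant)

# The relevant documents keyword search never surfaced
# (the ones the demo rings in red).
missed_by_keyword = relevant - keyword_hits

print(keyword_recall, semantic_recall, sorted(missed_by_keyword))
```

With these made-up sets, keyword recall is 0.25 and semantic recall is 0.75.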

4. The relevance threshold

Semantic search returns a ranked list, so you choose how wide to cast the net. Drag the threshold down and you recover more of the real prior art (higher recall) at the cost of more noise (lower precision); drag it up and the results get cleaner but you risk dropping a reference that matters. That trade-off — coverage versus reviewing time — is the entire economics of a search. Keyword search gives you no such dial: a word is either present or it is not.
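The trade-off can be worked through with a handful of numbers. The sketch below uses five invented results, each with a made-up similarity score and a ground-truth relevance flag, and computes precision and recall at two threshold settings.

```python
# (document id, similarity score, truly relevant?) -- invented values.
scored = [
    ("d1", 0.92, True),
    ("d2", 0.81, True),
    ("d3", 0.64, False),
    ("d4", 0.58, True),
    ("d5", 0.31, False),
]


def precision_recall(results, threshold):
    """Precision and recall of the results scoring at or above the threshold."""
    kept = [is_relevant for _, score, is_relevant in results if score >= threshold]
    total_relevant = sum(1 for _, _, is_relevant in results if is_relevant)
    true_positives = sum(kept)
    # Precision is undefined when nothing is kept; 0.0 is a convention chosen here.
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


strict = precision_recall(scored, 0.80)   # clean list, but d4 is dropped
loose = precision_recall(scored, 0.50)    # d4 recovered, d3 comes along as noise
print(strict, loose)
```

At 0.80 the list is all signal (precision 1.0) but misses one of three relevant references; at 0.50 every relevant reference is recovered and precision falls to 0.75.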

5. The ranked shortlist

What lands on the right is the deliverable: a ranked, scored set of references an attorney can actually review — patents and non-patent literature together, ordered by how closely they read on the invention rather than by which ones happened to reuse your vocabulary.
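As a data structure, the shortlist is a single list that mixes patents and non-patent literature and is sorted by score alone. The sketch below assumes a hypothetical `Reference` record and invented IDs and scores; the fields a production system would carry are not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    ref_id: str    # invented identifier, not a real publication number
    kind: str      # "patent" or "npl" (non-patent literature)
    score: float   # semantic similarity to the disclosure


def shortlist(refs, threshold, limit):
    """Keep references at or above the threshold, best first, capped at limit."""
    kept = [ref for ref in refs if ref.score >= threshold]
    return sorted(kept, key=lambda ref: ref.score, reverse=True)[:limit]


candidates = [
    Reference("P-001", "patent", 0.74),
    Reference("NPL-001", "npl", 0.88),
    Reference("P-002", "patent", 0.91),
    Reference("NPL-002", "npl", 0.42),
]

top = shortlist(candidates, threshold=0.5, limit=3)
for ref in top:
    print(f"{ref.score:.2f}  {ref.kind:<6}  {ref.ref_id}")
```

The journal article `NPL-001` outranks the patent `P-001` because ordering depends only on the score, not on the document type.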

Why this matters for invalidation and clearance

The references most likely to sink a patent are often the ones written years earlier, in a different field, or in another language: exactly the references that share an invention’s meaning but none of its words. That is the vocabulary gap, and it is a major reason even thorough keyword searches still miss decisive prior art. Semantic search closes it by matching on concepts, which is why a modern workflow pairs the two: keyword for precision on known terms, semantic for the recall that catches what you didn’t know to search for.
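The article does not say how the two result lists are combined. One widely used option, shown here purely as an assumption, is reciprocal rank fusion (RRF): each document earns 1 / (k + rank) from every list it appears in, and the totals decide the final order. The document IDs and rankings are invented, and k = 60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists; documents ranked well in more lists rise."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), then by id for a stable order on ties.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Invented rankings, best first.
keyword_ranking = ["d1", "d7", "d2"]
semantic_ranking = ["d2", "d1", "d3", "d4"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

Documents that both methods agree on (`d1`, `d2`) rise to the top, while references only the semantic list found (`d3`, `d4`) still make the merged list instead of being lost.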

A note on this demo: the corpus, concepts, and scores above are a small, hand-built illustration so the mechanism is easy to see. PatentScan runs the real version against the full global corpus with production embeddings — the principle is identical, the scale is not.