The "similar" filter is vaguely documented here: https://prismic.io/docs/technologies/query-similar-documents-graphql
I found a tiny bit more information here: How does Prismic work out "Similar content"?
I have shied away from this feature for years, across several projects, because it is too vaguely documented. Instead, I've felt much more confident writing my own code to do things like match tags, because I understand exactly how it works and can explain it to my client in a way they'll understand too.
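For context, the sort of tag matching I mean looks roughly like the following. This is only a minimal sketch, not Prismic's API: the `Doc` shape and the `rankByTagOverlap` helper are hypothetical names made up for illustration, and it assumes the candidate documents have already been fetched.

```typescript
// Hypothetical minimal document shape: only the fields this sketch needs.
interface Doc {
  id: string;
  tags: string[];
}

// Rank candidates by how many distinct tags they share with the target.
// Documents sharing no tags are dropped; ties keep their input order
// (Array.prototype.sort is stable in ES2019 and later).
function rankByTagOverlap(target: Doc, candidates: Doc[], limit = 3): Doc[] {
  const targetTags = new Set(target.tags);
  return candidates
    .filter((doc) => doc.id !== target.id) // never recommend the document itself
    .map((doc) => ({
      doc,
      shared: new Set(doc.tags.filter((tag) => targetTags.has(tag))).size,
    }))
    .filter((entry) => entry.shared > 0)
    .sort((a, b) => b.shared - a.shared)
    .slice(0, limit)
    .map((entry) => entry.doc);
}
```

The appeal is that the rule fits in one sentence ("most shared tags first"), which is exactly what I can't say about the "similar" filter today.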
Please expand significantly on the documentation for that feature. I have a lot of questions. For example:
- Which types of fields are considered? Rich text and key text were mentioned but I'm not sure if that's exhaustive.
- Does it matter if the rich text is a single block or multi-block?
- Are all possible block types of a rich text field considered, e.g. headings, paragraphs, list items? Are images, embeds, etc. completely ignored, or will chunks of URLs and the like sneak their way in?
- What about alt text of image fields?
- What about alt text of image blocks of rich text fields?
- Does "key text" mean only key text fields, or are you lumping in things like select fields, which I presume are stored very similarly?
- What if the field is in a repeatable group? Is it still indexed?
- What if the field is in the non-repeatable area of a slice? Is it still indexed?
- What if the field is in the repeatable area of a slice? Is it still indexed?
- It sounds like it works by indexing words. How is a "word" defined? Can you share a regex or similar which the underlying routine uses to match a "word"?
- Is the search exhaustive, or does it find "enough" matches then stop, even if more relevant ones might have been found later?
- Do results come back in any particular order?