How does the "similar" feature work?

Hey @bart,

I've got a response from our dev team on your questions:

  • Which types of fields are considered? Rich text and key text were mentioned but I'm not sure if that's exhaustive.

RichText, KeyText, Select, and UID.

  • Does it matter if the rich text is a single block or multi-block?

No, it doesn't.

  • Are all possible block types of a rich text field considered, e.g. headings, paragraphs, list items? Are images, embeds, etc. completely ignored, or will chunks of URLs sneak their way in?

Only text blocks are taken into account. A text block can be a heading, a paragraph, or a list item. Embeds and images are ignored.
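To illustrate the rule above, here is a minimal sketch of filtering a rich text field down to its text blocks. It assumes the block shape used in Prismic's API v2 JSON (a list of objects with a `type` such as `heading2`, `paragraph`, `list-item`, `image`, `embed`); the function name `indexable_text` is hypothetical and this is not the team's actual indexing code. Because image and embed blocks are skipped wholesale, their alt text and URLs never reach the index.

```python
# Block types treated as "text" in this sketch: headings, paragraphs, list items.
# Other types (e.g. "preformatted") are left out only because the answer above
# does not mention them; that is an assumption, not confirmed behaviour.
TEXT_BLOCK_TYPES = {
    "heading1", "heading2", "heading3", "heading4", "heading5", "heading6",
    "paragraph", "list-item", "o-list-item",
}


def indexable_text(rich_text):
    """Return the text of each text block in a rich text field value.

    Image and embed blocks (and any alt text or URLs they carry) are skipped.
    """
    return [block["text"] for block in rich_text
            if block.get("type") in TEXT_BLOCK_TYPES]


field = [
    {"type": "heading2", "text": "Pricing", "spans": []},
    {"type": "paragraph", "text": "Plans start small.", "spans": []},
    {"type": "image", "url": "https://example.com/a.png", "alt": "A chart"},
    {"type": "embed", "oembed": {"embed_url": "https://example.com/v"}},
    {"type": "list-item", "text": "Monthly billing", "spans": []},
]
print(indexable_text(field))  # ['Pricing', 'Plans start small.', 'Monthly billing']
```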

  • What about alt text of image fields?

They are ignored.

  • What about alt text of image blocks of rich text fields?

They are ignored.

  • Does key text just include "key text" itself or are you lumping in things like "select" with that, which I presume are stored very similarly?

Even if KeyText and Select are very close, they are independent.

  • What if the field is in a repeatable group? Is it still indexed?

Yes.

  • What if the field is in the non-repeatable area of a slice? Is it still indexed?

Yes.

  • What if the field is in the repeatable area of a slice? Is it still indexed?

Yes.
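Putting the field-type and nesting answers together, a document walk might look roughly like the sketch below. The document shape is hypothetical and simplified: every field is wrapped as `{"type": ..., "value": ...}` so the sketch knows its type without a custom type model (real API responses don't tag values this way, and the UID normally sits at the top level of the document). Slices use `primary` for the non-repeatable zone and `items` for the repeatable zone. The function names are made up for illustration.

```python
# Field types the answer says are indexed.
INDEXED_TYPES = {"RichText", "KeyText", "Select", "UID"}
TEXT_BLOCK_TYPES = {
    "heading1", "heading2", "heading3", "heading4", "heading5", "heading6",
    "paragraph", "list-item", "o-list-item",
}


def field_text(field):
    """Return indexable strings for one (hypothetically type-tagged) field."""
    kind, value = field["type"], field["value"]
    if kind == "RichText":
        return [b["text"] for b in value if b.get("type") in TEXT_BLOCK_TYPES]
    if kind in INDEXED_TYPES and value:
        return [value]  # KeyText, Select and UID are plain strings here
    return []  # Image, Link, Number, etc. contribute nothing


def collect(fields):
    """Walk field name -> field, descending into groups and slice zones."""
    out = []
    for field in fields.values():
        if field["type"] == "Group":
            for row in field["value"]:
                out += collect(row)
        elif field["type"] == "Slices":
            for s in field["value"]:
                out += collect(s.get("primary", {}))  # non-repeatable zone
                for item in s.get("items", []):        # repeatable zone
                    out += collect(item)
        else:
            out += field_text(field)
    return out


doc = {
    "uid": {"type": "UID", "value": "spring-sale"},
    "title": {"type": "RichText", "value": [{"type": "heading1", "text": "Spring sale"}]},
    "category": {"type": "Select", "value": "Offers"},
    "hero": {"type": "Image", "value": {"url": "https://example.com/h.png", "alt": "Flowers"}},
    "faq": {"type": "Group", "value": [
        {"q": {"type": "KeyText", "value": "When does it end?"}},
    ]},
    "body": {"type": "Slices", "value": [
        {"slice_type": "text",
         "primary": {"intro": {"type": "KeyText", "value": "Highlights"}},
         "items": [{"line": {"type": "RichText",
                             "value": [{"type": "paragraph", "text": "Half price tulips"}]}}]},
    ]},
}
print(collect(doc))
# ['spring-sale', 'Spring sale', 'Offers', 'When does it end?', 'Highlights', 'Half price tulips']
```

Note that the image field's alt text ("Flowers") never appears, matching the answer about image fields.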

  • It sounds like it works by indexing words. How is a "word" defined? Can you share a regex or similar which the underlying routine uses to match a "word"?

Our search engine takes a word, finds its root, and matches the term with all its variants/conjugations, depending on the locale of the content.
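"Finding the root" of a word is usually called stemming. The answer doesn't share the engine's tokenizer or stemmer, so the sketch below swaps in a deliberately naive English suffix stripper and a `\w+` word regex purely to show the idea: different forms of a word reduce to the same root and therefore match. Both the regex and the suffix list are assumptions, not the engine's rules; real stemmers (Porter/Snowball style) have far more rules and per-language variants, and this toy one mishandles cases like "running" → "runn".

```python
import re

# Toy English suffixes, longest first. Purely illustrative.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters of stem so "bus" stays "bus".
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    # Assumed word definition: runs of Unicode word characters.
    return re.findall(r"\w+", text)


def matches(query, text):
    """True if any word in text shares the query word's (naive) root."""
    wanted = naive_stem(query)
    return any(naive_stem(t) == wanted for t in tokens(text))


print(matches("painting", "She painted the fence"))  # True
print(matches("painting", "A fine pain"))            # False
```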

  • Is the search exhaustive, or does it find "enough" matches then stop, even if more relevant ones might have been found later?

The search is exhaustive, and our API offers pagination so you can go through every result.
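Since every match is returned across pages, a client just keeps requesting pages until it reaches the last one. Here is a minimal sketch: `fetch_page` is a hypothetical function you would supply (for example, one that calls the API with a `page` parameter), and the response is assumed to contain `results` and `total_pages`, which is how Prismic's REST API v2 responses are shaped as far as I know. A fake fetcher is used so the sketch runs without network access.

```python
from typing import Callable, Iterator


def iterate_all(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every result across all pages, starting from page 1.

    fetch_page(n) is assumed to return {"results": [...], "total_pages": int}.
    """
    page = 1
    while True:
        response = fetch_page(page)
        yield from response["results"]
        if page >= response["total_pages"]:  # also stops when there are 0 pages
            break
        page += 1


# Hypothetical stand-in for the API: 5 documents served 2 per page.
DATA = [f"doc-{i}" for i in range(1, 6)]


def fake_fetch(page, page_size=2):
    start = (page - 1) * page_size
    return {
        "results": [{"id": d} for d in DATA[start:start + page_size]],
        "total_pages": -(-len(DATA) // page_size),  # ceiling division
    }


print([r["id"] for r in iterate_all(fake_fetch)])
# ['doc-1', 'doc-2', 'doc-3', 'doc-4', 'doc-5']
```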

  • Do results come back in any particular order?

Yes, most relevant first. We give more priority to RichText content, especially headings.
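One common way to express "RichText counts more, headings most" is to weight each matching term by the zone it was found in and sort by the total. The sketch below does exactly that; the zone names and weights (heading 3, rich text 2, everything else 1) are invented for illustration, since the actual scoring formula wasn't shared.

```python
# Hypothetical weights: the answer only says RichText, and especially
# headings, gets more priority. These numbers are made up.
WEIGHTS = {"heading": 3.0, "richtext": 2.0, "other": 1.0}


def score(doc_terms, query_terms):
    """doc_terms is a list of (term, zone) pairs; zone is a key of WEIGHTS."""
    wanted = set(query_terms)
    return sum(WEIGHTS[zone] for term, zone in doc_terms if term in wanted)


def rank(docs, query_terms):
    """Most relevant (highest weighted score) first."""
    return sorted(docs, key=lambda d: score(d["terms"], query_terms), reverse=True)


docs = [
    {"id": "a", "terms": [("sale", "other")]},
    {"id": "b", "terms": [("tulip", "heading")]},
    {"id": "c", "terms": [("tulip", "richtext"), ("garden", "richtext")]},
]
print([d["id"] for d in rank(docs, ["tulip", "sale"])])  # ['b', 'c', 'a']
```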

Thanks for posting this question. Let me know if you have others!

Best,
Sam
