Sorting "similar" results

In the thread "How does the “similar” feature work?", @samlittlefair mentioned that results are returned in order of relevance:

  • Do results come back in any particular order?

Yes, most relevant first. We give more priority to RichText content, especially headings.

This doesn't appear consistent with the results I'm seeing: AFAICT the order of results is the same as when the similar argument is not provided.

I've created a new content type, and added two articles (alpha and beta). I then duplicated both articles (alpha-duplicate and beta-duplicate), and queried them:

Query:

{
  allTest_pages(first: 99) {
    edges {
      node {
        _meta {
          id
          uid
        }
      }
    }
  }
}

Result:

{
  "data": {
    "allTest_pages": {
      "edges": [
        {
          "node": {
            "_meta": {
              "id": "1",
              "uid": "alpha"
            }
          }
        },
        {
          "node": {
            "_meta": {
              "id": "2",
              "uid": "beta"
            }
          }
        },
        {
          "node": {
            "_meta": {
              "id": "3",
              "uid": "alpha-duplicate"
            }
          }
        },
        {
          "node": {
            "_meta": {
              "id": "4",
              "uid": "beta-duplicate"
            }
          }
        }
      ]
    }
  }
}

I then ran the query with the similar argument:

Query:

{
  allTest_pages(first: 99, similar: {documentId: "1", max: 9999}) {
    edges {
      node {
        _meta {
          id
          uid
        }
      }
    }
  }
}

Result:

{
  "data": {
    "allTest_pages": {
      "edges": [
        {
          "node": {
            "_meta": {
              "id": "2",
              "uid": "beta"
            }
          }
        },
        {
          "node": {
            "_meta": {
              "id": "3",
              "uid": "alpha-duplicate"
            }
          }
        },
        {
          "node": {
            "_meta": {
              "id": "4",
              "uid": "beta-duplicate"
            }
          }
        }
      ]
    }
  }
}

If the results were ordered by relevance, I would have expected to see alpha-duplicate as the first item. Reducing the value of max verifies that alpha-duplicate is in fact the most relevant, as both beta results disappear:

Query:

{
  allTest_pages(first: 99, similar: {documentId: "1", max: 1}) {
    edges {
      node {
        _meta {
          id
          uid
        }
      }
    }
  }
}

Result:

{
  "data": {
    "allTest_pages": {
      "edges": [
        {
          "node": {
            "_meta": {
              "id": "3",
              "uid": "alpha-duplicate"
            }
          }
        }
      ]
    }
  }
}
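
The comparison above can also be checked mechanically. This is a minimal sketch, assuming the uid lists are copied by hand from the first two results in this post; the helper name `same_order_as_default` is hypothetical and not part of any Prismic API:

```python
# uids copied from the results above, in the order the API returned them
default_order = ["alpha", "beta", "alpha-duplicate", "beta-duplicate"]
similar_order = ["beta", "alpha-duplicate", "beta-duplicate"]


def same_order_as_default(default_uids, similar_uids):
    """Return True if similar_uids is just default_uids with some items removed."""
    kept = set(similar_uids)
    return [uid for uid in default_uids if uid in kept] == list(similar_uids)


print(same_order_as_default(default_order, similar_order))
```

If this prints `True`, the `similar` result is only the default ordering minus the source document, which is what I'm observing.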

Am I missing something? Is there a way to sort these correctly?


Hi @timswalling,

Sorry for the slow response. I'll take some time tomorrow to go through your questions and see what answers I can find.

Hi @timswalling ,

This is an interesting question. I tried another experiment, and the results worked as expected. I created the following documents:

[
  {
    uid: "one",
    content: "dog cat cat cat cat cat"
  },
  {
    uid: "two",
    content: "dog dog cat cat cat cat"
  },
  {
    uid: "three",
    content: "dog dog dog cat cat cat"
  },
  {
    uid: "four",
    content: "dog dog dog dog cat cat"
  },
  {
    uid: "five",
    content: "dog dog dog dog dog cat"
  },
  {
    uid: "six",
    content: "dog dog dog dog dog dog"
  }
]

When I query for documents similar to the one with UID "six" in the Prismic GraphQL and REST APIs, the five remaining documents are returned in reverse order (starting with "five" and ending with "one"), which is what I would expect.
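
To illustrate why that order is the intuitive one, here is a rough sketch that ranks the same documents by cosine similarity of raw word counts. This scoring is purely an assumption for illustration and is not a description of how Prismic computes similarity; the function names are hypothetical.

```python
import math
from collections import Counter

# Test documents from the experiment above, keyed by uid
docs = {
    "one": "dog cat cat cat cat cat",
    "two": "dog dog cat cat cat cat",
    "three": "dog dog dog cat cat cat",
    "four": "dog dog dog dog cat cat",
    "five": "dog dog dog dog dog cat",
    "six": "dog dog dog dog dog dog",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def rank_similar(source_uid: str, documents: dict) -> list:
    """Return the other uids, most similar to source_uid first."""
    source = Counter(documents[source_uid].split())
    scored = [
        (cosine(source, Counter(text.split())), uid)
        for uid, text in documents.items()
        if uid != source_uid
    ]
    return [uid for _, uid in sorted(scored, reverse=True)]


print(rank_similar("six", docs))
```

Under this assumed scoring, "five" ranks first and "one" last, matching the order the API returned.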

For your case, I'm not exactly sure what could be causing the issue. Perhaps it's because it's such a small test?

Are you seeing issues with larger samples?
Sam

Edit: I added a couple more documents ("six-dupe" and "one-dupe") to make sure that the ordering wasn't based on the creation date, and the results still held up.

Thanks @samlittlefair. I think the key difference between our examples is that you're querying _allDocuments while I'm querying allTest_pages.

If I update your example to filter by type (either using type: "page" or allPages) the sort order changes.

This is a major issue for my use case because we have numerous content types in our repo, but only one of them is relevant.
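
One possible stopgap is to query _allDocuments with the similar argument and filter by type on the client. The sketch below assumes the query also requests a `type` field under `_meta` and that the _allDocuments order is the relevance order; the sample response and the `filter_by_type` helper are hypothetical, not real API output.

```python
def filter_by_type(response: dict, doc_type: str) -> list:
    """Keep only uids of the given type, preserving the order the API returned."""
    edges = response["data"]["_allDocuments"]["edges"]
    return [
        edge["node"]["_meta"]["uid"]
        for edge in edges
        if edge["node"]["_meta"]["type"] == doc_type
    ]


# Hypothetical response shape (assumes _meta { id uid type } was requested)
sample = {
    "data": {
        "_allDocuments": {
            "edges": [
                {"node": {"_meta": {"id": "3", "uid": "alpha-duplicate", "type": "test_page"}}},
                {"node": {"_meta": {"id": "7", "uid": "some-post", "type": "blog_post"}}},
                {"node": {"_meta": {"id": "2", "uid": "beta", "type": "test_page"}}},
            ]
        }
    }
}

print(filter_by_type(sample, "test_page"))
```

The drawback is that max then has to be large enough to leave sufficient results after the other types are discarded.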

Thanks, @timswalling! That does look like a bug. I'll submit an issue with the dev team and let you know what they say.

Sam

Thanks @samlittlefair

This thread is being monitored as an open ticket in the issue tracker. We will update this post as we get more information. If you have a similar use-case, you can ‘Flag’ this topic to reopen.