What is a span's start / end actually counting? Code Units or Graphemes?

Imagine I have the following text:

We’re big fans of pizza 🍕. Pizza is an Italian, specifically Neapolitan, dish typically consisting of a flat base of leavened wheat-based dough topped with tomato, cheese, and other ingredients, baked at a high temperature, traditionally in a wood-fired oven.

When I get this content via the API, the output escapes the Unicode characters, as seen here:

{
  "content": [
    {
      "type": "paragraph",
      "text": "We\u2019re big fans of pizza \ud83c\udf55. Pizza is an Italian, specifically Neapolitan, dish typically consisting of a flat base of leavened wheat-based dough topped with tomato, cheese, and other ingredients, baked at a high temperature, traditionally in a wood-fired oven.",
      "spans": [
        { "start": 18, "end": 23, "type": "strong" },
        {
          "start": 28,
          "end": 33,
          "type": "hyperlink",
          "data": {
            "link_type": "Web",
            "url": "https://en.wikipedia.org/wiki/Pizza",
            "target": "_self"
          }
        }
      ],
      "direction": "ltr"
    }
  ]
}

Can you confirm whether start and end count code units, graphemes, or something else? I am working with a client to translate their Prismic content, and will be relying heavily on Unicode characters.

Thanks.

It’s based on the number of characters in the text. Out of curiosity, what’s the goal behind checking this? If it’s for implementing your own Rich Text logic, we’d recommend avoiding that, since our SDKs already handle this automatically!
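
For anyone who wants to check which unit the offsets use, one option is to slice the sample text under each interpretation and see which one lands on whole words. The sketch below is only an illustration, not code from the Prismic SDKs: the helper names `sliceByCodeUnits` and `sliceByCodePoints` are hypothetical, the text is shortened after the first few words, and it assumes the hyperlink span (28 to 33) is meant to cover the word "Pizza".

```typescript
// Sample text from the API response above, shortened after "Pizza".
// \u2019 is the curly apostrophe; \ud83c\udf55 is the pizza emoji (one code point, two UTF-16 code units).
const text = "We\u2019re big fans of pizza \ud83c\udf55. Pizza is an Italian dish.";

// Interpretation 1: offsets count UTF-16 code units, which is how JavaScript strings are indexed.
function sliceByCodeUnits(s: string, start: number, end: number): string {
  return s.slice(start, end);
}

// Interpretation 2: offsets count Unicode code points, so a surrogate pair counts once.
function sliceByCodePoints(s: string, start: number, end: number): string {
  const points: string[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    const lo = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
    const isPair = hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff;
    points.push(isPair ? s.slice(i, i + 2) : s.charAt(i));
    if (isPair) i++; // skip the low surrogate already consumed
  }
  return points.slice(start, end).join("");
}

// "strong" span: it sits before the emoji, so both readings agree.
console.log(sliceByCodeUnits(text, 18, 23));  // "pizza"
console.log(sliceByCodePoints(text, 18, 23)); // "pizza"

// "hyperlink" span: it sits after the emoji, so the readings diverge.
console.log(sliceByCodeUnits(text, 28, 33));  // "Pizza"
console.log(sliceByCodePoints(text, 28, 33)); // "izza "
```

In this sample, only the code-unit reading puts the hyperlink span on a whole word, which suggests (but does not confirm) that the offsets are UTF-16 code units. Counting graphemes would give the same result as counting code points here, since the emoji is a single grapheme. A test paragraph with an emoji placed before each span is a cheap way to verify this against your own content before relying on it.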