Imagine I have the following text:
We’re big fans of pizza 🍕. Pizza is an Italian, specifically Neapolitan, dish typically consisting of a flat base of leavened wheat-based dough topped with tomato, cheese, and other ingredients, baked at a high temperature, traditionally in a wood-fired oven.
When I fetch this content via the API, the output escapes the non-ASCII Unicode characters, as seen here:
{
  "content": [
    {
      "type": "paragraph",
      "text": "We\u2019re big fans of pizza \ud83c\udf55. Pizza is an Italian, specifically Neapolitan, dish typically consisting of a flat base of leavened wheat-based dough topped with tomato, cheese, and other ingredients, baked at a high temperature, traditionally in a wood-fired oven.",
      "spans": [
        { "start": 18, "end": 23, "type": "strong" },
        {
          "start": 28,
          "end": 33,
          "type": "hyperlink",
          "data": {
            "link_type": "Web",
            "url": "https://en.wikipedia.org/wiki/Pizza",
            "target": "_self"
          }
        }
      ],
      "direction": "ltr"
    }
  ]
}
Can you confirm whether the span start and end values count code units, graphemes, or something else? I am working with a client to translate their Prismic content, and the translations will rely very heavily on non-ASCII Unicode characters.
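For reference, here is a rough sketch of how the two most likely interpretations could be compared locally. It is only a guess at the behaviour, not confirmed Prismic documentation. It assumes the hyperlink span was applied to the word "Pizza" at the start of the second sentence (my assumption about the sample), and the helper names `slice_code_points` and `slice_utf16` are made up for this test. The sample text is shortened, which does not affect the offsets being checked.

```python
# Shortened sample: same prefix as the API response, so offsets up to 33 match.
text = "We\u2019re big fans of pizza \U0001F355. Pizza is an Italian dish."


def slice_code_points(s: str, start: int, end: int) -> str:
    """Interpret start/end as Unicode code point offsets (Python's native indexing)."""
    return s[start:end]


def slice_utf16(s: str, start: int, end: int) -> str:
    """Interpret start/end as UTF-16 code unit offsets (how JavaScript indexes strings)."""
    data = s.encode("utf-16-le")  # 2 bytes per code unit, no BOM
    return data[2 * start:2 * end].decode("utf-16-le")


# The "strong" span sits before the emoji, so both interpretations agree.
print(repr(slice_code_points(text, 18, 23)))
print(repr(slice_utf16(text, 18, 23)))

# The "hyperlink" span sits after the emoji, which is one code point
# but two UTF-16 code units, so the two interpretations diverge here.
print(repr(slice_code_points(text, 28, 33)))
print(repr(slice_utf16(text, 28, 33)))
```

If the link really was on "Pizza", only the UTF-16 interpretation lines up with the word, which would suggest the offsets count UTF-16 code units. I would still like an official confirmation, since this sample contains no multi-code-point grapheme clusters (for example emoji with skin-tone modifiers) to rule out other schemes.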
Thanks.