Emojis with links in rich text field produce weird behaviour

When inserting an emoji into a rich text field (:grinning:) and combining it with links, I run into a few problems. The links and the emojis are indeed inserted, but the text in the link is not the one I selected in the Prismic editor. For example, writing:
:grin: prismic is amazing

will give back something like this when requesting the API:
<p>"pr"<a href="https://prismic.io/">ismic i</a>"s amazing </p>

Is there a way to fix this problematic behavior?

Hey Raffi, welcome to the forum!

I’m not able to reproduce this error. If I add a link followed by an emoji inside a Rich text field it’s correctly served in the API response. Maybe it is related to how you’re wrapping the Link.

Could you show me some examples of this so I can better understand what’s happening?
Screenshots or screen recordings help a lot.

This issue has been closed due to inactivity.

Hello,

I believe I am confronted with this bug too.

From what I see, it seems that emojis are counted as 2 characters, which offsets the rich text markup (aka "spans").

Test cases

Case A - No emoji

This word is strong.

{
  "type": "paragraph",
  "text": "This **word** is strong.",
  "spans": [
    {
      "start": 5,
      "end": 9,
      "type": "em"
    }
  ]
}

The "strong" markup starts at position 5, which is correct.

Case B - one emoji

:apple:This word is strong.

{
  "type": "paragraph",
  "text": "🍎This **word** is strong.",
  "spans": [
    {
      "start": 7,
      "end": 11,
      "type": "em"
    }
  ]
}

The "strong" markup starts at position 7, but it should have been 6.

Case C - Five emojis

:apple::apple::apple::apple::apple:This word is strong.

{
  "type": "paragraph",
  "text": "🍎🍎🍎🍎🍎This **word** is strong.",
  "spans": [
    {
      "start": 15,
      "end": 19,
      "type": "em"
    }
  ]
}

The "strong" markup starts at position 15, but it should have been 10.

Why this is a problem

I am using the Ruby library, and Ruby counts each emoji as one character, so the HTML rendered by the Prismic gem does not match what was entered in Prismic. For example, this is how case C would render:

<p>🍎🍎🍎🍎🍎This word <strong>is s</strong>trong.</p>

As you can see, the markup has an offset equal to the number of emojis preceding it. :smiley:

More info about the length of emojis

There are a few nice articles about it, including this one: Jonathan New | "💩".length === 2
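To make the difference concrete, here is a minimal Ruby sketch that counts the case C string both ways. The helper name `utf16_length` is made up for this illustration and is not part of the Prismic gem. It assumes that JavaScript-style lengths correspond to UTF-16 code units, while Ruby's `String#length` counts code points for UTF-8 strings.

```ruby
# Hypothetical helper: number of UTF-16 code units in a string,
# which is what JavaScript's .length reports.
def utf16_length(str)
  # Each UTF-16 code unit is 2 bytes, so halve the byte size.
  str.encode("UTF-16LE").bytesize / 2
end

text = "🍎🍎🍎🍎🍎This word is strong."
prefix = text[0...text.index("word")] # everything before "word"

puts text.length          # length in code points (Ruby's view)
puts utf16_length(text)   # length in UTF-16 code units (JavaScript's view)
puts prefix.length        # start of "word" in code points
puts utf16_length(prefix) # start of "word" in UTF-16 code units
```

For this string, the two start positions are 10 and 15, which matches the gap described in case C.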

One last question

Is there another way to report a bug or is this the designated process?

Thank you very much, have a great day!

Hello François, thanks for reaching out. I was able to reproduce this result, and I've made the dev team aware of it. We don't yet know if this could be considered a bug or a feature request. So as soon as I have more information I'll let you know.

Thanks

Hi, not sure if this will help, but the start index of the span elements is correct. It is zero-based, and even in the example with emojis it starts in the right place: at the beginning of the ** characters, not at the word itself.

The offset does seem to match the number of emojis, but the writing room and the parsing logic seem to account for the encoding of an emoji.

I'm also not sure whether the examples were written by hand as illustrations. If you generate a document from the writing room and query it, you get the following:

{
  "data": {
    "allArticles": {
      "edges": [
        {
          "node": {
            "article": [
              {
                "type": "paragraph",
                "text": "This word is strong.",
                "spans": [
                  {
                    "start": 5,
                    "end": 9,
                    "type": "em"
                  }
                ]
              },
              {
                "type": "paragraph",
                "text": "🍎🍎🍎🍎🍎This word is strong.",
                "spans": [
                  {
                    "start": 15,
                    "end": 19,
                    "type": "em"
                  },
                  {
                    "start": 23,
                    "end": 29,
                    "type": "hyperlink",
                    "data": {
                      "link_type": "Web",
                      "url": "https://google.com"
                    }
                  }
                ]
              },
              {
                "type": "paragraph",
                "text": "🍎🍎🍎🍎🍎This **word** is strong.",
                "spans": [
                  {
                    "start": 15,
                    "end": 23,
                    "type": "em"
                  },
                  {
                    "start": 27,
                    "end": 33,
                    "type": "hyperlink",
                    "data": {
                      "link_type": "Web",
                      "url": "https://google.com"
                    }
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  }
}

Running this through a crude parser:

Start Sentence versions:
    This <em>word</em> is strong.
Start Sentence versions:
    🍎🍎🍎🍎🍎This <em>word</em> is strong.
    🍎🍎🍎🍎🍎This word is <a href="https://google.com">strong</a>.
Start Sentence versions:
    🍎🍎🍎🍎🍎This <em>**word**</em> is strong.
    🍎🍎🍎🍎🍎This **word** is <a href="https://google.com">strong</a>.

The indexes above are correct, even with the additional ** characters, which would normally be Markdown but aren't needed for em. Was the first instance of this produced with a custom DOM library?

I was questioning this because, looking at the examples, the index numbers appear wrong. But if you take the data and parse it manually with some JS in a console, you get the same result.

By wrong, I mean that they don't make sense in the way the previous post points out. The spans come from the writing room, so presumably it is the internal part of the Prismic API that builds them, and they appear to be correct when the data is fetched from the API.

But am I wrong in suggesting that the examples provided are not accurate? Or are there encoding differences?

Just a thought: the devs might be on a wild goose chase. There might be an issue with the rich text parts, but it might not be the start/end positions in the string (unless that has been fixed recently).

Please ignore this though if it is not correct :slight_smile:

Simple example

// `withEmoji` is assumed to hold the JSON response shown above.
const article = withEmoji.data.allArticles.edges[0].node.article;

article.forEach(data => {
  console.group("Start Sentence versions:");

  data.spans.forEach(span => {
    // JS string indexes count UTF-16 code units, like the span offsets.
    const before = data.text.slice(0, span.start);
    const text = data.text.slice(span.start, span.end);
    const after = data.text.slice(span.end);

    switch (span.type) {
      case 'em':
        console.log(`${before}<em>${text}</em>${after}`);
        break;
      case 'hyperlink':
        console.log(`${before}<a href="${span.data.url}">${text}</a>${after}`);
        break;
    }
  });

  console.groupEnd();
});

You can see the code and data here: JS Bin on jsbin.com

Thank you for sharing your example @ReeceM.
As soon as we have news about this, we'll let you know.

Thank you very much @ReeceM for your comprehensive contribution! You can ignore the ** in my examples. In retrospect, I can't explain why I added them, and they are confusing. I am sorry.

Indeed, but if you use the Prismic library for Ruby, the rendering will have an improper offset because the length of emojis is not computed the same way in JS and Ruby. Ruby counts an emoji as 1 character, but in JS the length of an emoji will vary. This is explained in the article I included in my previous post.

To put it more simply, the JSON payload provided by Prismic is primarily compatible with JavaScript when using both emojis and rich text.
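One possible client-side workaround, sketched here under the assumption that the span offsets are UTF-16 code unit positions, is to translate each offset into a Ruby character index before slicing. The helper `utf16_offset_to_char_index` is hypothetical and is not part of the Prismic gem.

```ruby
# Hypothetical helper (not part of the Prismic gem): translate an offset
# counted in UTF-16 code units into a Ruby character (code point) index.
def utf16_offset_to_char_index(text, utf16_offset)
  units = 0
  text.each_char.with_index do |char, index|
    return index if units >= utf16_offset
    # Code points above U+FFFF take two UTF-16 code units (a surrogate pair).
    units += char.ord > 0xFFFF ? 2 : 1
  end
  text.length # the offset points at or past the end of the string
end

text = "🍎🍎🍎🍎🍎This word is strong."
span = { "start" => 15, "end" => 19, "type" => "em" }

from = utf16_offset_to_char_index(text, span["start"])
to   = utf16_offset_to_char_index(text, span["end"])

puts text[from...to]
puts "#{text[0...from]}<em>#{text[from...to]}</em>#{text[to..-1]}"
```

With the case C span, `from` and `to` come out as 10 and 14, so the slice is the word that was actually selected in the editor.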

@Pau Thank you for the quick responses! :+1: Do you need any additional information to identify the bug?

So I ran a similar thing in Ruby and I see it now:

string = "🍎🍎🍎🍎🍎This word is strong."
puts "🍎🍎🍎🍎🍎".size
puts "using the correct points: #{string[10..14]}"
puts "Using the provided points: #{string[15..19]}"

It does give a messed up result:

5
using the correct points: word 
Using the provided points: is st

Changing the encoding in Ruby doesn't help either.
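That is consistent with how Ruby indexes strings: even a UTF-16LE string is indexed by character, so a surrogate pair still counts as one. A rough alternative, assuming the offsets are UTF-16 code units, is to transcode and slice by bytes instead. The helper `utf16_slice` below is made up for this sketch.

```ruby
# Hypothetical helper: slice a string using offsets counted in UTF-16 code units.
def utf16_slice(text, start_unit, end_unit)
  utf16 = text.encode("UTF-16LE")
  # Each code unit is 2 bytes, so unit offsets map to byte offsets times two.
  # Note: an offset that splits a surrogate pair would make the final
  # encode call raise, because the slice would not be valid UTF-16.
  utf16.byteslice(start_unit * 2, (end_unit - start_unit) * 2).encode("UTF-8")
end

string = "🍎🍎🍎🍎🍎This word is strong."

puts string.encode("UTF-16LE").length # still counted in characters
puts utf16_slice(string, 15, 19)      # slice using the provided points
```

Here the provided points 15 and 19 select "word", the same text they select in JavaScript.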

(Sorry for going off on a tangent about JS side @francois.ferrandis :+1: )

Hello @ReeceM and @francois.ferrandis. Thanks a lot for all the details. This subject is tracked as an open issue in the dev team's backlog. However, it is important to note that we are unlikely to update this feature in the near future.

If there is any change, we will post an announcement in this same thread.
Thanks

It's 2024 and we still have this issue. When will it be fixed?

Hi @wakematta, this issue is not a priority and we don't have a timeline for a fix at the moment. I will update this thread if anything changes.

Best,
Guy