Error on line 84 at column 272: PCDATA invalid Char value 12

Hi, some of the submitted answers to my project show an error message like “error on line 84 at column 272: PCDATA invalid Char value 12”. What could be causing this error, and how can I troubleshoot it? Thanks

Hi @sawalee,

Welcome to the community! Would you mind sharing a screenshot of the issue so that we can better understand it?

Have a great day!

Sure,

In the screenshot above, there is an error saying ‘error on line… at column… PCDATA invalid Char value’.
This type of error message appears in several records. The data were initially fully coded and submitted without any error, but on this screen the full information is no longer shown: part of the data is blank, and this error message appears instead.

Hi @sawalee,

Would you mind sharing with us the following information in a private message so that we could have a closer look at your case:

  • Username
  • Server (OCHA or HHI)
  • Project name

Have a great day!

Hi @sawalee,

Could you also let us know which method (the KoBoCollect Android app or Enketo) you used to collect data for this project? If Enketo, could you kindly let us know the device (and its operating system) that you used for collecting data?

Have a great day!

Hi, thanks for this. I believe we used Enketo, because we have not used the Android app at all.
The device is a laptop from one of the HP series, and the OS is Windows 10.
I hope this answers your question, but let me know if you need any further info.

Hi @sawalee,

Thank you for the information. Could you also let us know which browser you used to collect data?

Have a great day!

Yes, we mainly used Chrome.
Thanks a lot, I await your response @Kal_Lam

Hi @sawalee, we spent some time digging into this; it’s the first time we’ve seen this particular issue. It turns out that when you pasted text into your Enketo form, it included some archaic, invisible characters that broke part of the conversion to XML. Only Chrome actually produced a submission that kobocat accepted; Firefox changed it to such an extent that the server could not receive the data. The XML that Enketo sent to the server included this bit:

    <parsererror xmlns="http://www.w3.org/1999/xhtml" style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
        <h3>This page contains the following errors:</h3>
        <div style="font-family:monospace;font-size:12px">error on line 16 at column 868: PCDATA invalid Char value 4</div>
        <h3>Below is a rendering of the page up to the first error.</h3>
    </parsererror>

…which renders to:

[Screenshot: the browser’s rendering of the parser error above]

According to Wikipedia, Char value 28 (the ASCII File Separator) is used at the end of a file, or between “what might otherwise be separate files”. It is not something anyone can type on a keyboard, so it must have been copy-pasted.

This seems to have happened for three of your records. In each case, the ‘abstract’ field seems to be what broke things; none of the data after that field were included in the submission. I suggest deleting these submissions and adding them again.
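For anyone who wants to reproduce this outside the browser: Python’s standard-library XML parser applies the same XML 1.0 character rule and rejects these control characters. This is only an illustration; the `abstract` element and the sample text are made up, and Python’s parser (expat) is not the one Chrome uses.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(doc: str) -> bool:
    """Return True if `doc` is well-formed XML according to Python's expat-based parser."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample data: the same abstract with and without a control character.
clean = "<abstract>Cost-effective interventions</abstract>"
# chr(28) is the ASCII File Separator: invisible, and not allowed in XML 1.0 text.
dirty = "<abstract>Cost-e" + chr(28) + "ective interventions</abstract>"

print(parses_as_xml(clean))  # True
print(parses_as_xml(dirty))  # False
```

Tab, line feed and carriage return are the only characters below 0x20 that XML 1.0 allows, which is why ordinary line breaks in an abstract don’t trigger this error.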

In order for us to reproduce and fix the issue, could you please let me know where you copied the abstracts from? I assume it’s an online database?

In the meantime, to avoid such issues going forward, I recommend you either use Firefox for data entry or always paste abstracts using Shift+Command+V (Ctrl+Shift+V on Windows).

Hi @tinok, thanks a lot for your help.
Actually, there are around 20 records with this kind of error
(the project EE EGM SRs-DEX-Official version - Jun 10 has 3 records with this error, and the project EE EGM IEs-DEX- Official version - Jun11 has 15). I will share the list of records in the following message.

I would also like to double-check whether the broken data can be recovered once you fix the issue.

The copied and pasted data all came from offline PDF files. Would it help if I sent you those PDF files via email? Please let me know anytime.

Thanks

The list of broken records

Project <<EE EGM SRs-DEX-Official version - Jun 10>>

  1. (study ID, from the “study ID” column) 46501920 - (error message) error on line 16 at column 866: PCDATA invalid Char value 4
  2. 47128053 - error on line 16 at column 105: PCDATA invalid Char value 28
  3. 46501920 - error on line 16 at column 868: PCDATA invalid Char value 4

Project <<EE EGM IEs-DEX- Official version - Jun11>>

  1. 47099318 - error on line 111 at column 17: PCDATA invalid Char value 4
  2. 46501892 - error on line 94 at column 39: PCDATA invalid Char value 4
  3. 47122887 - error on line 72 at column 59: PCDATA invalid Char value 14
  4. 48044642 - error on line 84 at column 272: PCDATA invalid Char value 12
  5. 47099318 - error on line 111 at column 17: PCDATA invalid Char value 4
  6. 47124522 - error on line 16 at column 1032: PCDATA invalid Char value 14
  7. 47918544 - error on line 196 at column 31: PCDATA invalid Char value 15
  8. 47930716 - error on line 85 at column 388: PCDATA invalid Char value 14
  9. 47098829 - error on line 224 at column 8: PCDATA invalid Char value 4
  10. 46501892 - error on line 94 at column 39: PCDATA invalid Char value 4
  11. 47122887 - error on line 72 at column 59: PCDATA invalid Char value 14
  12. 46886059 - error on line 81 at column 69: PCDATA invalid Char value 12
  13. 47116633 - error on line 84 at column 232: PCDATA invalid Char value 12
  14. 47123905 - error on line 100 at column 96: PCDATA invalid Char value 28
  15. 46329100 - error on line 86 at column 88: PCDATA invalid Char value 15
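To see at a glance which characters are involved, these messages can be parsed and the values mapped to their ASCII control-code names. A small sketch: the regular expression assumes the exact message wording shown in the list above, and `ASCII_NAMES` only covers the values that appear there.

```python
import re

# ASCII names for the control-character values that appear in the error list above.
ASCII_NAMES = {
    4: "EOT (End of Transmission)",
    12: "FF (Form Feed)",
    14: "SO (Shift Out)",
    15: "SI (Shift In)",
    28: "FS (File Separator)",
}

# Assumes the message format "error on line N at column M: PCDATA invalid Char value K".
ERROR_RE = re.compile(
    r"error on line (\d+) at column (\d+): PCDATA invalid Char value (\d+)"
)


def explain(message: str):
    """Return (line, column, code, name) for a parser error message, or None if no match."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    line, column, code = (int(group) for group in match.groups())
    # Fall back to the Unicode code point notation for values not in the table.
    name = ASCII_NAMES.get(code, f"U+{code:04X}")
    return line, column, code, name


print(explain("47128053 - error on line 16 at column 105: PCDATA invalid Char value 28"))
```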

And sorry, just to note: one of the members who had this error in her data has just told me that she used Shift+Command+V to paste all her data, but the error still occurred (her username is carolina_lozmd, on the OCHA server, and she used the Chrome browser). Should I tell members to use Firefox for data entry instead of recommending Shift+Command+V? Thanks

Shift+Command+V actually won’t help with this, because the special characters aren’t part of the rich-text formatting; they’re bona fide plain-text characters that have been around since ASCII was first defined in 1963(!). To make the situation trickier, these characters may be completely invisible in some applications and operating systems.

We’ll have to work with the developer of Enketo to fix this for future submissions. Unfortunately, I think the previous submissions are unrecoverable because (sorry for personifying software):

  1. Enketo asks Chrome to generate XML for upload, but doesn’t realize that the XML contains errors;
  2. Chrome’s failed XML includes “a rendering of the page [XML] up to the first error”, which contains the necessary identifiers for a valid submission but omits many of the responses;
  3. Enketo uploads this failed XML to KoBoCAT;
  4. KoBoCAT receives and stores it, and responds that the submission was successful;
  5. Enketo deletes the original data because it believes it has been stored on the server.
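One way to break this chain, sketched here purely as an illustration (it is not how Enketo or KoBoCAT actually handle submissions, and `safe_to_upload` is a hypothetical name), would be to check the serialized record before step 3 and refuse to upload anything that is malformed or that contains a `parsererror` element like the one quoted earlier:

```python
import xml.etree.ElementTree as ET

# Namespace of the <parsererror> element in the failed XML quoted earlier in this thread.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def safe_to_upload(record_xml: str) -> bool:
    """Hypothetical pre-upload check: reject malformed XML or XML carrying a parsererror."""
    try:
        root = ET.fromstring(record_xml)
    except ET.ParseError:
        return False
    # Element.iter() walks the whole tree (root included) looking for the namespaced tag.
    has_parsererror = next(root.iter(f"{{{XHTML_NS}}}parsererror"), None) is not None
    return not has_parsererror
```

With a check like this, a broken record would stay in the local queue instead of being reported as successfully submitted and then deleted.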

Using Firefox could be helpful in that the submissions will at least fail to upload, giving the enumerator an indication that something is wrong. However, many submissions could be added to the upload queue before anyone realizes the problem, and it may not be possible to retrieve those responses at all.

https://www.reformattext.com/remove-control-characters.htm looks like a decent JavaScript tool for removing these troublesome characters. I can’t vouch for its safety, but it seemed okay after a quick glance at the code. Also, when I tried it, the text I pasted in was not sent to their servers; the text processing is done entirely within the browser.
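If you’d rather not paste abstracts into a third-party website, a similar clean-up can be done locally. A minimal sketch, assuming Python 3 is available; `remove_control_chars` is a hypothetical helper, not part of any library, and it only targets the characters listed in its comment:

```python
import re

# C0 control characters other than tab (0x09), LF (0x0A) and CR (0x0D), plus U+FFFE and
# U+FFFF; XML 1.0 forbids all of these in text content.
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def remove_control_chars(text: str, replacement: str = "") -> str:
    """Replace every character matched by INVALID_XML_CHARS with `replacement`."""
    return INVALID_XML_CHARS.sub(replacement, text)


pasted = "Cost-e" + chr(28) + "ective interventions"
print(remove_control_chars(pasted))  # Cost-eective interventions
```

Stripping removes the character without putting anything back, so any letters it stood in for are lost; it’s worth proofreading the cleaned text before pasting it into the form.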

[Edit: yes, it would be helpful to see one of the original PDFs. If you share it with Tino, he can share it with me.]

It would definitely be useful to have the PDF. You can attach it here or send it over to me in a private message. This will help us to reproduce and thus fix the issue, but I’m hoping with this I can also test the workaround John offered (or suggest a different one in case something easier can be found).

Hi, thanks a lot for this. The thing is that the PDF file is not in a format that can be attached here (or in a private message). Could I send it via email instead? If so, please let me know a valid email address.

Thanks

Hi @sawalee
You can easily upload it to a shared drive and send the link to @tinok directly in a private message.

Stephane

Sure, will do. Thanks a lot @stephanealoo

Thanks for sharing the PDFs. I tested it with one of them and was able to reproduce the issue without needing to generate invalid chars in the console. I copied the text into this file: invalid characters.txt. They are visible in a program like Sublime as <0x0e>.
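The `<0x0e>` markers that Sublime shows can be produced with a few lines of Python, which is handy for checking whether a pasted abstract still contains hidden characters. A sketch; `show_control_chars` is a hypothetical helper and only flags C0 control characters other than tab, LF and CR:

```python
def show_control_chars(text: str) -> str:
    """Replace invisible C0 control characters with <0x..> markers, Sublime-style."""
    return "".join(
        f"<0x{ord(ch):02x}>" if ord(ch) < 0x20 and ch not in "\t\n\r" else ch
        for ch in text
    )


print(show_control_chars("e" + chr(14) + "ect sizes"))  # e<0x0e>ect sizes
```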

The issue stems from the fact that ligature letters (like the combined ff or fi characters) are copied as these ASCII control characters. Here is the same abstract as in the text file with the ligatures highlighted:

[Screenshot: the abstract from the text file with the ligatures highlighted]

The good news is that the tool John recommended does strip these characters efficiently, so for now you can paste the text into that tool first and then paste the cleaned text into the Enketo form. You can track the ultimate solution to the issue here: https://github.com/enketo/enketo-express/issues/181

@sawalee I just found an easier workaround for your problem. The source of these invalid control characters is actually Acrobat Reader. In my case, I could resolve the problem by opening the PDF you shared in Preview and Foxit (on Mac). Neither of these tools copied invalid characters.

Please try using Foxit or a different free PDF reader. That way, you can skip first pasting the text into the other tool we had recommended.
