Invalid XML chars
Author | Message |
---|---|
Benjamin Leclerc |
Hi,
Toodledo sometimes sends invalid XML. The documents are invalid because they contain characters that XML does not allow (such as 0x1A). These characters can end up in a note through copy and paste (from Word, for example). The W3C specifies which characters are valid: http://www.w3.org/TR/xml11/#charsets I have patched my app to remove invalid characters, but it would be easier if Toodledo did it upfront. Thank you |
fahad |
Yes, this has been happening for over two years and has been reported before.
Toodledo, please fix! +1 |
Jake Toodledo Founder |
I thought we had fixed this. A while ago we implemented some code for scrubbing out invalid characters, but we may have missed a character or some entry point.
Do you know of a way to replicate the problem? Do you have a Word file that you can copy and paste from to produce an invalid character in Toodledo? If so, please share it with us so that we can fix the problem. I just ran my unit tests for this and they all passed, with the known invalid characters being correctly removed. These are the characters that we allow. All others are removed.
($char == 0x9) || //tab (9)
($char == 0xA) || //newline (10)
($char == 0xD) || //carriage return (13)
(($char >= 0x20) && ($char <= 0xD7FF)) || //space and printable characters (32-55295)
(($char >= 0xE000) && ($char <= 0xFFFD)) || //(57344-65533)
(($char >= 0x10000) && ($char <= 0x10FFFF))) //(65536-1114111) |
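The allowed ranges quoted above match the `Char` production of XML 1.0. For client developers who want the same filter on their side, here is a minimal sketch in Python (not the PHP used by Toodledo). The function name `strip_invalid_xml_chars` and the sample note are hypothetical, chosen only for illustration.

```python
import re

# Matches any character OUTSIDE the allowed set quoted in the post above:
# tab, newline, carriage return, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character outside the allowed ranges removed."""
    return _INVALID_XML_CHARS.sub("", text)


# Hypothetical note containing the 0x1A character reported in this thread.
note = "Pasted from Word\x1a, line two\nwith\ta tab"
print(repr(strip_invalid_xml_chars(note)))
```

The filter works on decoded text (code points), not on raw bytes, so it should be applied after the response has been decoded.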
fahad |
I don't remember which other characters we've encountered recently, but I think the safest way to remove all invalid UTF-8 characters in PHP is:
$str = iconv("UTF-8","UTF-8//IGNORE",$str);
This ensures all invalid UTF-8 characters are removed. The characters you mentioned did give us a headache a year ago, but I recall that was fixed. We still occasionally hear from users that 'sync' isn't working, and it usually turns out to be some odd UTF-8 character that produces malformed XML output. Apart from that, the following list shows the valid characters and the ranges of characters to avoid: http://www.w3.org/TR/xml/#charsets
Thanks
This message was edited May 17, 2011. |
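One point worth checking with this approach: UTF-8 validity and XML validity are separate questions. The byte 0x1A is a well-formed single-byte UTF-8 sequence, so dropping malformed byte sequences would not be expected to remove it. The sketch below illustrates this in Python rather than PHP. It uses `bytes.decode(errors="ignore")` as a rough stand-in for the `iconv` call (an assumption made for illustration, not a claim that the two behave identically), and the sample payload and the helper `parses_as_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: one malformed UTF-8 byte (0xFF) and one 0x1A character.
raw = b"<note>caf\xc3\xa9 \xff pasted\x1a text</note>"

# Drop malformed UTF-8 byte sequences; this removes the stray 0xFF byte only.
decoded = raw.decode("utf-8", errors="ignore")


def parses_as_xml(document: str) -> bool:
    """Return True if the document is accepted by the XML parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print("0x1A still present:", "\x1a" in decoded)
print("parses after UTF-8 cleanup only:", parses_as_xml(decoded))
print("parses after also removing 0x1A:", parses_as_xml(decoded.replace("\x1a", "")))
```

If this holds for the real feed as well, encoding cleanup and the XML character-range filter are complementary steps, not alternatives.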
Benjamin Leclerc |
Unfortunately, I cannot reproduce it.
One of my users reported the error to me. The only thing I saw was that the character 0x1A was present in the XML from Toodledo. I then added a function to my app that removes invalid XML characters (using the same conditions as you mentioned above), and the problem was fixed. So it seems that, somewhere in the text of a notebook, it is still possible for Toodledo to send invalid characters. This message was edited May 18, 2011. |
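When a problem like this cannot be reproduced on demand, it can help to log where the offending characters sit in a response so the affected entry can be identified later. The following is a small diagnostic sketch in Python under that assumption. The names `is_valid_xml_char` and `find_invalid_xml_chars` and the sample response are hypothetical and not part of any Toodledo API.

```python
from typing import Iterator, Tuple


def is_valid_xml_char(code_point: int) -> bool:
    """Mirror the allow-list conditions quoted earlier in the thread."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def find_invalid_xml_chars(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (offset, hex code) for each character outside the allowed ranges."""
    for offset, char in enumerate(text):
        if not is_valid_xml_char(ord(char)):
            yield offset, f"0x{ord(char):02X}"


# Hypothetical response body containing the character reported in this thread.
response = "<note><text>Minutes\x1a from Word</text></note>"
for offset, code in find_invalid_xml_chars(response):
    print(f"invalid character {code} at offset {offset}")
```

Logging the offset alongside the surrounding text makes it easier to tell the service which field carried the character.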
Jake Toodledo Founder |
We were able to find one place where invalid XML characters were not being scrubbed out of notebook entries, so this has been fixed. That might take care of it going forward.
|