Encoding seems to be lost after exporting to PNG and extracting back

asked Oct 12, 2018 in Bug by Alexander

It seems encoding information is lost during export to PNG.

Steps:

1. Create a sequence diagram with some text in Russian

2. Export to PNG

3. Now extract it back: java -Dfile.encoding=UTF-8 -jar plantuml.jar -metadata -charset UTF-8 diagram.png

Expected result: all text is readable

Actual result: the Russian text is corrupted and displayed as ?????? ?? ????

commented Oct 12, 2018 by albert (3,620 points)

Can you give an example input file to reproduce the problem?
Which version of plantuml.jar are you using?

commented Oct 13, 2018 by Alexander

Example: https://yadi.sk/i/yvF7aWVKilyP0g
It was created using this command: java -jar plantuml.jar -charset UTF-8 -tpng charset.plantuml
Version:
λ java -jar plantuml.jar -version
PlantUML version 1.2018.11 (Sat Sep 22 19:43:53 MSK 2018)
(GPL source distribution)
Java Runtime: Java(TM) SE Runtime Environment
JVM: Java HotSpot(TM) 64-Bit Server VM
Java Version: 1.8.0_181-b13
Operating System: Windows 7
OS Version: 6.1
Default Encoding: Cp1252
Language: en
Country: US
Machine: 700634-PC
PLANTUML_LIMIT_SIZE: 4096
Processors: 4
Max Memory: 1,873,805,312
Total Memory: 126,877,696
Free Memory: 122,169,096
Used Memory: 4,708,600
Thread Active Count: 1

commented Oct 13, 2018 by albert (3,620 points)

Please supply the source code, or cut and paste the source code into http://www.plantuml.com/plantuml/uml and post the resulting URL.

commented Oct 13, 2018 by Alexander

http://www.plantuml.com/plantuml/uml/RP71QZ8n483lynJp2_AV5DkZ5775fVQm5gmzgKXsCsheRjOciqLlfRttr7jVGGg5KagV8Vj6pGeLeSqXa8ylv2FJjImC5rcAVIMMHcIeHKO0h3Yb0zkV05PA4YxodEDy0pOHtZXfdi8_IGkb2S_3OJWAnXUNmta_68TObu9f8cKpYffRlt0mQ6VPh0uh4f8Cfi9oerXNblnr6I2b1UzVy-fKJxKeU9KRaXXlyVeg7Cs7GHG2s1kbrwAgj7HRa4ue8cl5ae9KBJKQkYDZIIf32QjS7_a-MnGfZsd3UpSZpPZuZSoHfNt0duhloNMHnJwlTXPt9gtMh0Rm_-VFBDFQGe2D3lQmZUwbl7VlxjMzbG_GEz3dyj5zkbtvL34FWyfqN-x3xQ1BMlh9V0C0

commented Oct 13, 2018 by albert (3,620 points)

It does indeed look like there is a mismatch between the original text and the text stored in the PNG file and extracted back from it: the output shows question marks where the Russian text should be.


2 Answers

answered Oct 15, 2018 by plantuml (298,480 points)

Best answer

In the end, it was easy to use an iTXt chunk.

So this should be solved in the latest beta: http://beta.plantuml.net/plantuml.jar

Tell us if it's not working for you!
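For reference, the PNG specification defines an iTXt chunk as: a Latin-1 keyword (1 to 79 bytes) and a null byte, a compression flag, a compression method (0 means zlib), a null-terminated language tag, a null-terminated translated keyword, and finally the text as UTF-8, zlib-compressed when the flag is 1. The Java sketch below builds such a chunk by hand to show the layout. It is not PlantUML's implementation; the class name `ItxtChunk` and the keyword `plantuml` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

// Hypothetical helper: builds one raw PNG iTXt chunk (length + type + data + CRC).
public class ItxtChunk {

    static byte[] build(String keyword, String text, boolean compress) {
        byte[] kw = keyword.getBytes(StandardCharsets.ISO_8859_1); // keyword is Latin-1
        if (kw.length < 1 || kw.length > 79) {
            throw new IllegalArgumentException("keyword must be 1 to 79 bytes");
        }
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        data.write(kw, 0, kw.length);
        data.write(0);                 // null separator after the keyword
        data.write(compress ? 1 : 0);  // compression flag
        data.write(0);                 // compression method 0 = zlib
        data.write(0);                 // empty language tag, null-terminated
        data.write(0);                 // empty translated keyword, null-terminated
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8); // iTXt text is UTF-8
        byte[] payload = compress ? zlib(utf8) : utf8;
        data.write(payload, 0, payload.length);

        byte[] type = "iTXt".getBytes(StandardCharsets.US_ASCII);
        byte[] body = data.toByteArray();
        CRC32 crc = new CRC32();       // the CRC covers chunk type + chunk data
        crc.update(type);
        crc.update(body);

        // ByteBuffer is big-endian by default, which matches PNG's integer format.
        ByteBuffer chunk = ByteBuffer.allocate(4 + 4 + body.length + 4);
        chunk.putInt(body.length).put(type).put(body).putInt((int) crc.getValue());
        return chunk.array();
    }

    // Default Deflater (nowrap = false) emits a zlib stream: header, deflate data, Adler-32.
    static byte[] zlib(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

The result is a single chunk, not a file: to be useful it would still have to be spliced into an existing PNG after IHDR and before IEND.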

commented Oct 15, 2018 by Alexander

Actually, the result is still the same.
Maybe I need to use some specific command-line options?

commented Oct 15, 2018 by plantuml (298,480 points)

Maybe I should be more specific.
You have to re-encode (that is, re-create a new PNG file) with the latest beta version,
and then extract the metadata back from this new PNG file.
The source cannot be recovered from PNG files generated with older versions of PlantUML (sorry about that).
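Since files produced by older versions keep the source only in the Latin-1 zTXt chunk, it can help to check which text chunk a given diagram.png actually contains before running `-metadata`. Below is a minimal Java sketch that walks the chunk list as laid out in the PNG specification (an 8-byte signature, then length, type, data and CRC for each chunk). The class name is hypothetical, it does not verify CRCs, and it assumes, following the accepted answer, that files re-created with the new version carry an iTXt chunk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists PNG chunk types in file order (CRCs are not checked).
public class PngChunks {
    private static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    static List<String> chunkTypes(byte[] png) {
        if (png.length < SIGNATURE.length) {
            throw new IllegalArgumentException("not a PNG file");
        }
        ByteBuffer buf = ByteBuffer.wrap(png); // big-endian by default, like PNG
        byte[] sig = new byte[SIGNATURE.length];
        buf.get(sig);
        if (!Arrays.equals(sig, SIGNATURE)) {
            throw new IllegalArgumentException("not a PNG file");
        }
        List<String> types = new ArrayList<>();
        while (buf.remaining() >= 12) {        // length + type + CRC is the smallest chunk
            int length = buf.getInt();
            byte[] type = new byte[4];
            buf.get(type);
            types.add(new String(type, StandardCharsets.US_ASCII));
            if (length < 0 || (long) length + 4 > buf.remaining()) {
                throw new IllegalArgumentException("truncated chunk");
            }
            buf.position(buf.position() + length + 4); // skip chunk data and CRC
        }
        return types;
    }

    public static void main(String[] args) throws IOException {
        byte[] png = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "diagram.png"));
        System.out.println(chunkTypes(png)); // a zTXt-only list suggests an older PNG
    }
}
```

Running `java PngChunks diagram.png` prints the chunk types in file order.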

commented Oct 15, 2018 by Alexander

Yes, that's clear, and that is exactly what I tried to do (I repeated the steps from my initial description).
But I still see corrupted characters.
Could you please share the settings that work in your environment?

commented Oct 15, 2018 by plantuml (298,480 points)

Could you send us your PNG file by email?

BTW, I see that you have:
Default Encoding: Cp1252
This means that your default console cannot display Russian characters (well, I think :-)

You have to use the following command line:

java -Dfile.encoding=UTF-8 -jar plantuml.jar -metadata -charset UTF-8 diagram.png > back_to_text.txt

Then open "back_to_text.txt" in a UTF-8-capable editor.

commented Oct 16, 2018 by Alexander

Now it's OK; I probably made a mistake during my initial test.
Thanks a lot, this software is absolutely brilliant!


answered Oct 14, 2018 by plantuml (298,480 points)

Thanks for the report.

We are using the standard zTXt chunk to store the PlantUML source (see http://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_PNG_files)

Sadly, zTXt text must be encoded in ISO/IEC 8859-1 (Latin-1), which means that Russian cannot be stored there :-(
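The question marks are what Java produces when text is forced through a single-byte charset that has no Cyrillic letters: `String.getBytes(Charset)` replaces every unmappable character with the charset's replacement byte, which is `?` for both ISO-8859-1 and windows-1252 (the `Cp1252` default encoding shown in the version output earlier in the thread). The sketch below only illustrates that mechanism; it is an assumption, not a confirmed trace of where PlantUML lost the text, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what happens to text pushed through a charset that cannot represent it.
public class LossyCharsets {

    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) silently substitutes the charset's replacement byte
        // ('?' for ISO-8859-1 and windows-1252) for every unmappable character.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440"; // "Привет мир"
        System.out.println(roundTrip(russian, StandardCharsets.ISO_8859_1));     // ?????? ???
        System.out.println(roundTrip(russian, Charset.forName("windows-1252"))); // ?????? ???
        System.out.println(roundTrip(russian, StandardCharsets.UTF_8).equals(russian)); // true
    }
}
```

Latin-1 characters such as `é` survive the same round trip unchanged, which is why the problem only shows up with scripts like Cyrillic.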

We could use an iTXt chunk, which can also be compressed, but these chunks are not well documented (at least in Java), so we have not (yet) succeeded in compressing them.
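For what it's worth, the JDK's standard `javax.imageio` PNG plugin can write and read iTXt entries through its native metadata format `javax_imageio_png_1.0` (an `iTXt` node holding `iTXtEntry` children). Below is a minimal round-trip sketch, assuming Java 8 or later. The class and method names and the keyword `plantuml` are hypothetical, and compression is deliberately left off (`compressionFlag="FALSE"`), since compressed iTXt is the part reported as troublesome here.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.MemoryCacheImageInputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: store and read back UTF-8 text in an uncompressed iTXt chunk via ImageIO.
public class ItxtImageIo {
    private static final String PNG_FORMAT = "javax_imageio_png_1.0";

    static byte[] writePngWithItxt(String keyword, String text) {
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB); // placeholder image
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        try {
            ImageWriteParam param = writer.getDefaultWriteParam();
            IIOMetadata meta = writer.getDefaultImageMetadata(
                    ImageTypeSpecifier.createFromRenderedImage(img), param);
            IIOMetadataNode entry = new IIOMetadataNode("iTXtEntry");
            entry.setAttribute("keyword", keyword);         // must be Latin-1, 1 to 79 chars
            entry.setAttribute("compressionFlag", "FALSE"); // uncompressed on purpose
            entry.setAttribute("compressionMethod", "0");
            entry.setAttribute("languageTag", "");
            entry.setAttribute("translatedKeyword", "");
            entry.setAttribute("text", text);                // stored as UTF-8
            IIOMetadataNode itxt = new IIOMetadataNode("iTXt");
            itxt.appendChild(entry);
            IIOMetadataNode root = new IIOMetadataNode(PNG_FORMAT);
            root.appendChild(itxt);
            meta.mergeTree(PNG_FORMAT, root);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (MemoryCacheImageOutputStream ios = new MemoryCacheImageOutputStream(out)) {
                writer.setOutput(ios);
                writer.write(null, new IIOImage(img, null, meta), param);
            } // close() flushes the cached bytes into 'out'
            return out.toByteArray();
        } catch (IOException e) { // also covers IIOInvalidTreeException from mergeTree
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
    }

    /** Returns the text of the first iTXt entry with this keyword, or null if there is none. */
    static String readItxt(byte[] png, String keyword) {
        ImageReader reader = ImageIO.getImageReadersByFormatName("png").next();
        try (MemoryCacheImageInputStream iis =
                     new MemoryCacheImageInputStream(new ByteArrayInputStream(png))) {
            reader.setInput(iis);
            Node root = reader.getImageMetadata(0).getAsTree(PNG_FORMAT);
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!n.getNodeName().equals("iTXt")) continue;
                for (Node e = n.getFirstChild(); e != null; e = e.getNextSibling()) {
                    Element entry = (Element) e;
                    if (keyword.equals(entry.getAttribute("keyword"))) {
                        return entry.getAttribute("text");
                    }
                }
            }
            return null;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } finally {
            reader.dispose();
        }
    }
}
```

Calling `readItxt(writePngWithItxt("plantuml", source), "plantuml")` should give back the original string, Cyrillic included, because iTXt text is UTF-8 rather than Latin-1.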

Another option would be to encode the PlantUML source using UTF-7 ( https://en.wikipedia.org/wiki/UTF-7 ) when we detect that some non-ISO-8859-1 characters are used. Then we could store the UTF-7-encoded string in the zTXt chunk.
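The detection step can be done with the JDK's `CharsetEncoder.canEncode`. Below is a minimal Java sketch; the class and method names are hypothetical. For the fallback it picks iTXt, the route the accepted answer eventually took, rather than implementing UTF-7.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical policy: keep zTXt when the source is pure Latin-1, otherwise use iTXt.
public class ChunkChoice {

    /** True if every character of the source can be stored in a Latin-1 tEXt/zTXt chunk. */
    static boolean fitsLatin1(CharSequence source) {
        // A fresh encoder per call: canEncode() must not be used while another
        // encoding operation is in progress on the same encoder.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(source);
    }

    static String chooseTextChunk(CharSequence source) {
        return fitsLatin1(source) ? "zTXt" : "iTXt";
    }

    public static void main(String[] args) {
        System.out.println(chooseTextChunk("Alice -> Bob : caf\u00e9"));                     // zTXt
        System.out.println(chooseTextChunk("Alice -> Bob : \u041f\u0440\u0438\u0432\u0435\u0442")); // iTXt
    }
}
```

Keeping zTXt for Latin-1 sources would leave existing files and readers unaffected, and only diagrams that actually need UTF-8 would switch format.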

So stay tuned; we'll post a message here when it is ready to test.

Regards,
