Encoding seems to be lost after export to PNG and extract back

0 votes
asked 4 days ago in Bug by Alexander
It seems encoding information is lost during export to PNG.

Steps:

1. Create sequence diagram with some text in Russian

2. Export to PNG

3. Now extract it back: java -Dfile.encoding=UTF-8 -jar plantuml.jar -metadata -charset UTF-8 diagram.png

Expected result: all text is readable

Actual result: text in Russian is corrupted, displayed as ?????? ?? ????
commented 4 days ago by albert (2,340 points)
Can you provide an example input file that reproduces the problem?
Which version of plantuml.jar are you using?
commented 3 days ago by Alexander
Example: https://yadi.sk/i/yvF7aWVKilyP0g
It was created using this command: java -jar plantuml.jar -charset UTF-8 -tpng charset.plantuml
Version:
λ java -jar plantuml.jar -version
PlantUML version 1.2018.11 (Sat Sep 22 19:43:53 MSK 2018)
(GPL source distribution)
Java Runtime: Java(TM) SE Runtime Environment
JVM: Java HotSpot(TM) 64-Bit Server VM
Java Version: 1.8.0_181-b13
Operating System: Windows 7
OS Version: 6.1
Default Encoding: Cp1252
Language: en
Country: US
Machine: 700634-PC
PLANTUML_LIMIT_SIZE: 4096
Processors: 4
Max Memory: 1,873,805,312
Total Memory: 126,877,696
Free Memory: 122,169,096
Used Memory: 4,708,600
Thread Active Count: 1
commented 3 days ago by albert (2,340 points)
Please supply the source code, or copy and paste it into http://www.plantuml.com/plantuml/uml and post the resulting URL.
commented 3 days ago by albert (2,340 points)
It does indeed look like there is a mismatch between the original source and the text that is stored in the PNG file and extracted back: the output shows question marks where the Russian text should be.

2 Answers

0 votes
answered 1 day ago by plantuml (182,820 points)
 
Best answer
In the end, it was easy to use the iTXt chunk.

So this should be solved in the latest beta: http://beta.plantuml.net/plantuml.jar

Tell us if it's not working for you!
commented 1 day ago by Alexander
Actually, the result is still the same.
Maybe I need to use some specific command line options?
commented 1 day ago by plantuml (182,820 points)
Maybe I should be more specific.
You have to re-encode (that is, create a new PNG file) with the latest beta version.
Then extract the metadata back from this new PNG file.
The source stored in PNG files generated with older versions of PlantUML cannot be retrieved (sorry about that).
commented 1 day ago by Alexander
Yes, that's clear, and that is exactly what I tried to do (I repeated the steps from my initial description).
But I still see corrupted characters.
Could you please share the recommended settings that work in your environment?
commented 1 day ago by plantuml (182,820 points)
Could you send us your PNG file by email?

BTW, I see that you have:
Default Encoding: Cp1252
This means that your default console cannot display Russian characters (well, I think :-)

You have to use the following command line:

java -Dfile.encoding=UTF-8 -jar plantuml.jar -metadata -charset UTF-8 diagram.png > back_to_text.txt

Then open the "back_to_text.txt" file in a UTF-8-capable editor.
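
To illustrate why the output encoding matters here, the following is a minimal Java sketch, not PlantUML code. It assumes a stream whose encoding is Cp1252 (called "windows-1252" in Java), as in the version dump above, and compares it with a UTF-8 stream. The class and method names are made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class ConsoleEncodingDemo {
    // Print the text through a PrintStream that uses the given charset
    // and return the raw bytes that would reach the console or the file.
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, charsetName)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // A short Russian word, written with Unicode escapes so that the
        // source file compiles the same way under any compiler encoding.
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // Cp1252 has no Cyrillic letters, so each one is replaced by '?'.
        byte[] cp1252 = printWith(russian, "windows-1252");
        System.out.println(new String(cp1252, StandardCharsets.US_ASCII));

        // UTF-8 keeps every letter, using two bytes per Cyrillic character.
        byte[] utf8 = printWith(russian, "UTF-8");
        System.out.println(utf8.length + " bytes written in UTF-8");
    }
}
```

Redirecting to a file with -Dfile.encoding=UTF-8 corresponds to the second case: the bytes in the file are valid UTF-8 even if the console itself could not have displayed them.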
commented 20 hours ago by Alexander
Now it's OK; I probably made a mistake during the initial test.
Thanks a lot, this software is absolutely brilliant!
0 votes
answered 2 days ago by plantuml (182,820 points)
Thanks for the report.

We are using the standard zTXt chunk to store the PlantUML source (see http://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_PNG_files )

Sadly, zTXt must be encoded using ISO/IEC 8859-1 (Latin-1), which means that Russian cannot be used there :-(
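
The effect of the Latin-1 restriction can be reproduced in a few lines of plain Java. This is only a sketch of the encoding step, not the actual PlantUML code path; the class and method names are invented for the illustration.

```java
import java.nio.charset.StandardCharsets;

class Latin1Demo {
    // Encode the text as ISO-8859-1 and decode it again, which is what a
    // Latin-1-only text chunk effectively does to the stored source.
    static String roundTrip(String text) {
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Plain ASCII text survives the round trip unchanged.
        System.out.println(roundTrip("Alice -> Bob : hello"));

        // Cyrillic letters have no Latin-1 code, so String.getBytes
        // substitutes '?' for each of them.
        System.out.println(roundTrip("\u041f\u0440\u0438\u0432\u0435\u0442"));
    }
}
```

The second line printed consists only of question marks, which matches the "?????? ?? ????" output reported in the question.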

We could use the iTXt chunk, which can be compressed, but the use of those chunks is not well documented (at least in Java), so we have not (yet) succeeded in compressing them.
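
For reference, here is one way a compressed iTXt entry might be written and read back with the standard javax.imageio PNG plugin. This is a sketch under assumptions, not the code PlantUML uses: the keyword "plantuml", the 1x1 image, and the class and method names are all placeholders, and it relies on a JDK whose PNG plugin handles compressed iTXt entries correctly.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ITxtDemo {
    // Name of the native metadata format of the JDK's PNG plugin.
    static final String PNG_FORMAT = "javax_imageio_png_1.0";

    // Write a 1x1 PNG that carries the text in a compressed iTXt chunk.
    static byte[] writePng(String keyword, String text) {
        BufferedImage image = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        try {
            IIOMetadata metadata = writer.getDefaultImageMetadata(
                    ImageTypeSpecifier.createFromRenderedImage(image),
                    writer.getDefaultWriteParam());

            IIOMetadataNode entry = new IIOMetadataNode("iTXtEntry");
            entry.setAttribute("keyword", keyword);
            entry.setAttribute("compressionFlag", "TRUE"); // deflate the text
            entry.setAttribute("compressionMethod", "0");  // 0 = zlib/deflate
            entry.setAttribute("languageTag", "");
            entry.setAttribute("translatedKeyword", "");
            entry.setAttribute("text", text);              // stored as UTF-8

            IIOMetadataNode iTXt = new IIOMetadataNode("iTXt");
            iTXt.appendChild(entry);
            IIOMetadataNode root = new IIOMetadataNode(PNG_FORMAT);
            root.appendChild(iTXt);
            metadata.mergeTree(PNG_FORMAT, root);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
                writer.setOutput(out);
                writer.write(null, new IIOImage(image, null, metadata), null);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
    }

    // Return the text of the first iTXt entry with the given keyword, or null.
    static String readText(byte[] png, String keyword) {
        ImageReader reader = ImageIO.getImageReadersByFormatName("png").next();
        try (ImageInputStream in =
                ImageIO.createImageInputStream(new ByteArrayInputStream(png))) {
            reader.setInput(in);
            Element root = (Element) reader.getImageMetadata(0).getAsTree(PNG_FORMAT);
            NodeList entries = root.getElementsByTagName("iTXtEntry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                if (keyword.equals(entry.getAttribute("keyword"))) {
                    return entry.getAttribute("text");
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader.dispose();
        }
    }

    public static void main(String[] args) {
        String source = "@startuml\nAlice -> Bob : \u041f\u0440\u0438\u0432\u0435\u0442\n@enduml";
        byte[] png = writePng("plantuml", source);
        System.out.println(source.equals(readText(png, "plantuml")));
    }
}
```

Because the iTXt text field is defined as UTF-8, the Cyrillic label should come back unchanged, unlike with a Latin-1 chunk.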

Another option would be to encode the PlantUML source using UTF-7 ( https://en.wikipedia.org/wiki/UTF-7 ) when we detect that some non-ISO-8859-1 characters are used. Then we could store the UTF-7-encoded string in the zTXt chunk.

So stay tuned: we'll post a message here when a version is ready to test.

Regards,
...