Is word wrap support planned for Chinese text too?

0 votes
asked Oct 27, 2021 in Wanted features by kirchsth (7,140 points)
In C4-PlantUML we received an issue reporting that Chinese descriptions do not word-wrap. Is something planned in this area?

Thank you and best regards
Helmut
```

@startuml
skinparam wrapWidth 200

["this is a component, and it will wrap this description"] as a

["这也是一个组件,但是这段文件不会自动换行下去,这是为什么呢。"] as b
@enduml
```

commented Oct 27, 2021 by The-Lu (74,900 points)

Hello K.,

See a possible (not very convenient) workaround here (adding some space characters):

The main question is :

  • What are the (character) boundaries of words?

Regards,
Th.

commented Oct 27, 2021 by plantuml (295,800 points)
  • What are the (character) boundaries of words?
In Unicode, there is something called "General Category" (see http://www.unicode.org/reports/tr44/#General_Category_Values ).
Maybe we can use this to handle word wrap better without hardcoding special characters as boundaries.
We are going to investigate, stay tuned :-)
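
For readers following along, the General Category mentioned above can be inspected with a few lines of Python. This is only an illustrative sketch using Python's standard `unicodedata` module, not PlantUML's actual code; `describe` is a hypothetical helper name and the sample string is made up.

```python
import unicodedata


def describe(text):
    """Return (character, code point, General Category) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# Sample: a Latin letter, a space, an ideograph, then fullwidth punctuation.
for ch, code_point, category in describe("a 组，。（）"):
    print(ch, code_point, category)
```

Ideographs report `Lo` (Letter, other), while the fullwidth comma and the ideographic full stop report `Po` (Punctuation, other). So the category alone separates punctuation from ideographs, but it does not say where a line may break inside a run of ideographs.
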
commented Oct 27, 2021 by kirchsth (7,140 points)

During my first checks I found the following (I added it to my original issue, but have had no feedback so far):

  1. https://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages. But I don't know if we can implement it in C4-PlantUML at all.

  2. Fullwidth characters that we could potentially support (but I don't know whether that really solves the problem):

， (U+FF0C FULLWIDTH COMMA) is the comma (,). It cannot be used for enumerating a list; see "enumeration comma" below.
！ (U+FF01 FULLWIDTH EXCLAMATION MARK) is the exclamation mark (!).
？ (U+FF1F FULLWIDTH QUESTION MARK) is the question mark (?).
； (U+FF1B FULLWIDTH SEMICOLON) is the semicolon (;).
： (U+FF1A FULLWIDTH COLON) is the colon (:).
（ ） (U+FF08 FULLWIDTH LEFT PARENTHESIS, U+FF09 FULLWIDTH RIGHT PARENTHESIS) are parentheses (round brackets).
。 (U+3002 IDEOGRAPHIC FULL STOP) is the Chinese full stop.

(I would not support U+FF08 FULLWIDTH LEFT PARENTHESIS.)
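
As a rough illustration of how the characters listed above could act as wrap boundaries, here is a minimal Python sketch. It is not PlantUML's implementation; `FULLWIDTH_BREAKS` and `split_after_breaks` are hypothetical names, and the sample sentence is only an example. Each segment keeps its punctuation at the end, so a break can only fall after the punctuation mark.

```python
# Boundary characters from the list above. U+FF08 (the opening parenthesis)
# is left out, following the remark that it should not be supported.
FULLWIDTH_BREAKS = {
    "\uFF0C",  # FULLWIDTH COMMA
    "\uFF01",  # FULLWIDTH EXCLAMATION MARK
    "\uFF1F",  # FULLWIDTH QUESTION MARK
    "\uFF1B",  # FULLWIDTH SEMICOLON
    "\uFF1A",  # FULLWIDTH COLON
    "\uFF09",  # FULLWIDTH RIGHT PARENTHESIS
    "\u3002",  # IDEOGRAPHIC FULL STOP
}


def split_after_breaks(text):
    """Split text into segments, each ending with its boundary character."""
    segments, current = [], ""
    for ch in text:
        current += ch
        if ch in FULLWIDTH_BREAKS:
            segments.append(current)
            current = ""
    if current:  # trailing text without a closing boundary character
        segments.append(current)
    return segments


for segment in split_after_breaks("这也是一个组件，但是这段文字不会自动换行，这是为什么呢。"):
    print(segment)
```

A wrapper built on these segments can only break between them, so it still has no answer for a long segment that contains no punctuation at all.
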

commented Oct 28, 2021 by The-Lu (74,900 points)

Hello all,

In addition, you can also look at the following Unicode report:

Regards.

commented Oct 28, 2021 by plantuml (295,800 points)

Many thanks for the Unicode fullwidth list. We've taken the easiest way.

This is fixed in the latest beta http://beta.plantuml.net/plantuml.jar and on the online server.

Tell us if it works for you!

commented Oct 28, 2021 by kirchsth (7,140 points)

Hello plantuml team,

thank you for the fast implementation. In the meantime, the original issue requester has accepted my `\n` suggestion. I have asked for final feedback on your implementation, and I hope we get some.

I checked your source and have two remarks:

  1. I would add the `）` (U+FF09 FULLWIDTH RIGHT PARENTHESIS) too.
  2. To me it looks strange that the `，` starts on a new line. I think it should be part of the previous line (part of line 2 and not line 3).
Best regards
Helmut
commented Oct 29, 2021 by plantuml (295,800 points)

OK for point 1.

About point 2, the issue is that wrapWidth is too small. If you increase it, it works.

I think it's an issue because the current solution does not work for very long sentences (see for example). Should we wrap the sentence even if there is no separator?
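
To make the question concrete, the following Python sketch shows one possible behaviour: break at the last separator that fits, and fall back to a hard break when the line contains no separator. It is an assumption-laden illustration, not PlantUML's code; `wrap` and its `separators` parameter are hypothetical, and `width` is counted in characters rather than modelling the `wrapWidth` skinparam.

```python
def wrap(text, width, separators=" \uFF0C\u3002"):
    """Greedy wrap: break at the last separator that fits, else hard-break.

    separators is a string of single characters (space, fullwidth comma,
    ideographic full stop by default). width is a character count.
    """
    lines = []
    text = text.strip(" ")
    while len(text) > width:
        if text[width] == " ":
            cut = width  # a space right after the window: the window fits as-is
        else:
            window = text[:width]
            # Position just after the last separator in the window, 0 if none.
            cut = max(window.rfind(sep) for sep in separators) + 1
            if cut == 0:
                cut = width  # no separator available: hard break
        lines.append(text[:cut].rstrip(" "))
        text = text[cut:].lstrip(" ")
    if text:
        lines.append(text)
    return lines


print(wrap("一二，三四五六", 4))
print(wrap("一二三四五六七八九十", 4))
```

With this fallback a very long sentence always wraps, but a hard break can still place a punctuation mark at the start of the next line, which is the effect criticised in point 2.
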

commented Oct 29, 2021 by kirchsth (7,140 points)
edited Oct 29, 2021 by kirchsth

Regarding 2): in my sample the line already contains a space (it is not a "long sentence"), so I think your implementation ends the lines too early. A line should include the fullwidth character at its end.

In the meantime I received feedback on the new implementation in the original issue:

It seems to wrap the content, but the result is very strange. The fullwidth-character-based solution does not seem to be a good choice.
Why was the fullwidth comma ， kept on the previous line, while the fullwidth ！ went to the next line?
In Chinese, a sentence ends with a fullwidth full stop 。, and it may have many parts, separated by the fullwidth comma `，`. If some parts are long and some are short, the output will be very ugly.
I also saw your discussion in PlantUML: Chinese sentences have no word boundaries. The best choice is to wrap manually with '\n' or whitespace, or to have PlantUML enforce a hard length limit and automatically continue on the next line.

Maybe the best approach is a hard limit: whenever a CJK character is found, treat it like a space, i.e. as a possible break point (e.g. the minimum supported check would be U+4E00 to U+9FCC); see https://stackoverflow.com/questions/1366068/whats-the-complete-range-for-chinese-characters-in-unicode
BR Helmut
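
The range check suggested above could look roughly like the following Python sketch. It only illustrates the idea and is not what PlantUML implements; `is_cjk`, `break_positions` and `NO_LINE_START` are hypothetical names. A break is allowed after a space, after any character in U+4E00 to U+9FCC, and after closing fullwidth punctuation, but never directly before such punctuation, so a line cannot start with a comma.

```python
CJK_FIRST, CJK_LAST = 0x4E00, 0x9FCC  # minimum range named in the comment above

# Closing fullwidth punctuation that should stay on the previous line.
NO_LINE_START = set("\uFF0C\u3002\uFF01\uFF1F\uFF1B\uFF1A\uFF09")


def is_cjk(ch):
    """True if ch lies in the basic CJK ideograph range used in this sketch."""
    return CJK_FIRST <= ord(ch) <= CJK_LAST


def break_positions(text):
    """Return every index i where a line may end just before text[i]."""
    positions = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if nxt in NO_LINE_START:
            continue  # never start a line with closing punctuation
        if prev == " " or is_cjk(prev) or prev in NO_LINE_START:
            positions.append(i)
    return positions


print(break_positions("ab 中文，好"))
```

A wrapper would then pick the last allowed position that still fits the available width. The range used here covers only the basic ideograph block; other scripts and extension blocks would need their own checks.
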

commented Nov 6, 2021 by kirchsth (7,140 points)
Hello @plantuml,

did you have time to work on this topic?

If not, I think it would be best to roll back all changes (until it can be solved),
because I think the current version is worse than before.

Thank you and best regards
Helmut

1 Answer

0 votes
answered Nov 6, 2021 by plantuml (295,800 points)

If not, I think it would be best to roll back all changes (until it can be solved),
because I think the current version is worse than before.

Sorry about that!

This is not a top priority for us, but if you explain to us why the current version is worse than before, we can have a look at it. (Discarding all changes is of course possible, but we would prefer to solve this issue.)

Can you give us a non-working example? Thanks!

commented Nov 6, 2021 by kirchsth (7,140 points)

Did you read my comments above? For example, with the new implementation (see the original statement, which now uses the new implementation) a line starts with a comma, and this should not be the case.
Additionally, the person who originally requested the feature in C4-Stdlib wrote:

It seems to wrap the content, but the result is very strange. The fullwidth-character-based solution does not seem to be a good choice.
Why was the fullwidth comma ， kept on the previous line, while the fullwidth ！ went to the next line?
In Chinese, a sentence ends with a fullwidth full stop 。, and it may have many parts, separated by the fullwidth comma `，`. If some parts are long and some are short, the output will be very ugly.
I also saw your discussion in PlantUML: Chinese sentences have no word boundaries. The best choice is to wrap manually with '\n' or whitespace, or to have PlantUML enforce a hard length limit and automatically continue on the next line.

Best regards
Helmut

commented Nov 7, 2021 by plantuml (295,800 points)

Thanks for your comment: I think I better understand the issue.

Here is a new beta, which works slightly differently:

What do you think about it?

commented Nov 8, 2021 by kirchsth (7,140 points)
Thank you, it looks much better to me. I think this is a good first step.

I will try to get feedback from the requester and let you know as soon as I have it.

Thank you and best regards
Helmut
...