Contributor

@carlobeltrame commented Jul 14, 2025

Fixes #3018, was broken since #2600
Fixes #1642
Fixes #2456
Fixes #2564
Fixes #2739
Enables better solutions for #1238, #1380, #1416, #1662

I added each change in a separate commit, with accompanying tests so that the intended behavior is harder to break again in the future. The textkit layout integration test in particular took a while to write, but it should be very valuable for detecting unwanted changes.

This is technically a breaking change for people who have previously used custom word wrapping functions, but one could argue this feature has been quite broken for a long time now, as described in #3018. I can also update the documentation website and possibly some of the examples once this is merged.

The hyphenation algorithm may change the string (e.g. by removing some characters, namely soft hyphens).
Therefore, calculating the glyphs must come after hyphenation, so that the glyphs match the final string.
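
To illustrate why the order matters, here is a minimal sketch. `hyphenateWord` and `toGlyphs` are hypothetical stand-ins, not the actual textkit functions; the point is only that glyphs computed from the pre-hyphenation string no longer line up with the final string.

```typescript
const SOFT_HYPHEN = "\u00ad";

// Hypothetical hyphenation step: splits at soft hyphens and drops them.
function hyphenateWord(word: string): string[] {
  return word.split(SOFT_HYPHEN);
}

// Hypothetical stand-in for glyph shaping: one "glyph" per UTF-16 code unit.
function toGlyphs(text: string): number[] {
  return text.split("").map((ch) => ch.charCodeAt(0));
}

const input = `hy${SOFT_HYPHEN}phen${SOFT_HYPHEN}ation`;
const syllables = hyphenateWord(input); // ["hy", "phen", "ation"]
const finalString = syllables.join("");

// Glyphs computed before hyphenation still contain the two soft hyphens,
// so their indices are shifted relative to the final string.
const glyphsBefore = toGlyphs(input); // 13 entries
const glyphsAfter = toGlyphs(finalString); // 11 entries
```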

Fixes diegomura#3018

This was probably broken in diegomura#2600.

The hyphenation algorithm should be able to leave soft hyphens in, to
indicate that a hyphen should be placed there if the word breaks there.
The line breaking algorithm needs to distinguish syllables which end with
a soft hyphen from syllables that do not, and only mark a syllable for
adding a hyphen in the former case.
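
A rough sketch of that distinction, assuming syllables arrive as plain strings. The `Syllable` shape and `classifySyllables` are made up for illustration and are not the names used in textkit.

```typescript
const SOFT_HYPHEN = "\u00ad";

// Hypothetical shape: the visible text plus a flag for the line breaker.
interface Syllable {
  text: string;
  insertHyphenOnBreak: boolean;
}

// Only syllables that end with a soft hyphen are marked for hyphen insertion.
function classifySyllables(parts: string[]): Syllable[] {
  return parts.map((part) => {
    const endsWithSoftHyphen =
      part.length > 0 && part.charAt(part.length - 1) === SOFT_HYPHEN;
    return {
      text: endsWithSoftHyphen ? part.slice(0, -1) : part,
      insertHyphenOnBreak: endsWithSoftHyphen,
    };
  });
}
```

With this, a custom splitter returning `["well-", "known"]` allows a break after `well-` without adding a second hyphen, while `["hy\u00ad", "phen\u00ad", "ation"]` requests an inserted hyphen at either break.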

For the line breaking algorithm, soft hyphens should be considered to
have a width of zero, since they are never printed directly (they can
only lead to an inserted hyphen if at the end of a line).

The font package was already doing this correctly, but the pdfkit
package considered the soft hyphen to be the same as a normal hyphen
with an advanceWidth of 333 in Helvetica. Without this change, in some
edge cases pdfkit would re-break lines that had already been broken by
the line breaking algorithm in textkit.

Added tests for both packages to make sure they remain compatible in the
future.
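
As a sketch of the measurement rule, assuming a simple per-character width lookup: the table and the fallback width below are placeholders, apart from the 333 for the normal hyphen mentioned above, and `advanceWidth` / `measure` are hypothetical names rather than the pdfkit or font package API.

```typescript
const SOFT_HYPHEN = "\u00ad";

// Placeholder advance widths in font units; only "-" (333) comes from the text above.
const ADVANCE: { [ch: string]: number } = { "-": 333, a: 556, b: 556 };
const FALLBACK_WIDTH = 500; // assumed default for characters not in the table

function advanceWidth(ch: string): number {
  // A soft hyphen is never printed directly, so it takes up no space.
  if (ch === SOFT_HYPHEN) return 0;
  const width = ADVANCE[ch];
  return width === undefined ? FALLBACK_WIDTH : width;
}

function measure(text: string): number {
  let total = 0;
  for (let i = 0; i < text.length; i++) {
    total += advanceWidth(text.charAt(i));
  }
  return total;
}
```

Under these assumptions `measure("a\u00adb")` equals `measure("ab")`, whereas treating the soft hyphen like `-` would add 333 units and could push an already broken line over the available width.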

In the best fit line breaking algorithm, the width of the hyphen must
be taken into account, in case one is to be inserted at the end of the line.
This is the most readable change I was able to find to achieve the goal.
Maybe the bestFit algorithm could be optimized in the future, along with
writing extensive tests for corner cases.
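
The idea can be sketched with a much simpler check than bestFit itself. This is not the bestFit implementation; `Piece`, `widthIfBrokenAfter` and `lastFittingBreak` are hypothetical, and the numbers are invented.

```typescript
// Hypothetical piece of a line: its width and whether breaking after it inserts a hyphen.
interface Piece {
  width: number;
  hyphenAtBreak: boolean;
}

// Width of the line if it ends after pieces[end], including an inserted hyphen if needed.
function widthIfBrokenAfter(pieces: Piece[], end: number, hyphenWidth: number): number {
  let width = 0;
  for (let i = 0; i <= end; i++) {
    width += pieces[i].width;
  }
  return pieces[end].hyphenAtBreak ? width + hyphenWidth : width;
}

// Index of the last piece after which the line still fits, or -1 if none fits.
function lastFittingBreak(pieces: Piece[], available: number, hyphenWidth: number): number {
  let best = -1;
  for (let end = 0; end < pieces.length; end++) {
    if (widthIfBrokenAfter(pieces, end, hyphenWidth) <= available) {
      best = end;
    }
  }
  return best;
}

const pieces: Piece[] = [
  { width: 40, hyphenAtBreak: true },
  { width: 40, hyphenAtBreak: true },
  { width: 30, hyphenAtBreak: false },
];

// With a hyphen width of 10 and 85 units available, only the first piece fits (40 + 10).
const withHyphen = lastFittingBreak(pieces, 85, 10); // 0
// Ignoring the hyphen width would wrongly accept two pieces (80), which overflow once the hyphen is drawn.
const ignoringHyphen = lastFittingBreak(pieces, 85, 0); // 1
```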

Therefore, we remove all soft hyphens from the attributed string after
line breaking is completed, and recalculate the glyphs afterwards.
This way, pdfkit never sees the soft hyphens, and does not mistake them
for normal hyphens.
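
A minimal sketch of that cleanup step, using plain strings in place of attributed strings. `stripSoftHyphens` is a hypothetical helper, and the glyph recalculation that would follow is not shown.

```typescript
const SOFT_HYPHEN_PATTERN = /\u00ad/g;

// Remove every soft hyphen from lines that have already been broken.
// Hyphens inserted by the line breaker are normal "-" characters and stay.
function stripSoftHyphens(lines: string[]): string[] {
  return lines.map((line) => line.replace(SOFT_HYPHEN_PATTERN, ""));
}

const brokenLines = ["hy\u00adphen-", "ation"];
const cleanedLines = stripSoftHyphens(brokenLines); // ["hyphen-", "ation"]
```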
Tests the functionality of custom word splitting functions
changeset-bot bot commented Jul 14, 2025

⚠️ No Changeset found

Latest commit: a9cb9da

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@carlobeltrame changed the title from "Fix hyphenation" to "Fix and rework hyphenation" Jul 14, 2025
@wojtekmaj
Contributor

Insane work. Looks good to me!
