RFC for strategy on font management #5043

Open
2 of 6 tasks
imagico opened this issue Nov 11, 2024 · 13 comments

Comments

@imagico
Collaborator

imagico commented Nov 11, 2024

Since #4606 we have included a script to download the fonts we use, which implicitly tells style users that they can rely on it to take care of the fonts. Unfortunately, upstream font sources are volatile, so both the download locations and the font file naming require ongoing maintenance (#5013, #4956). Our current font download script is not well designed for managing that.

This is a proposal for what a more sustainable approach could look like. It is not meant to rule out smaller changes to the existing setup to fix acute issues, but to provide a basis for discussing a longer-term strategy for developing this further.

Proposal

The main ideas are:

  • separating the font data from the script functionality
  • doing the scripting in python (to match our other scripts) (done by Replace current font download with Python script #5052)
  • storing the data on fonts and their download locations in a YAML file (equally matching our approach in other cases)
  • having fallback locations for downloading fonts in case the primary location becomes unavailable. Fallbacks could contain older versions of the fonts if necessary. This would substantially increase reliability of the download process. (functionality is offered by Replace current font download with Python script #5052, no actual fallback locations configured yet)
  • having the script download the fonts as well as generating fonts.mss (where the order of entries is crucial - see Use local copy of fonts #4606) - this way we would have a single location where to make changes to the fonts rather than two (fonts.mss and get-fonts.sh) in different languages.
  • having the script by default update existing font files, but overwriting them only after a successful download. (this is what Replace current font download with Python script #5052 does now)

This is just a rough sketch of what this could look like; the details would of course need to be worked out. Discussion of those details and of the proposal as a whole is welcome.
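To make the fallback and "overwrite only after a successful download" points from the list above more concrete, here is a minimal sketch. It is not the code from #5052; the function name `download_with_fallback` and the injectable `fetch` parameter are hypothetical and exist only for illustration.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_with_fallback(urls, dest, fetch=None):
    """Try each URL in order; replace dest only after a complete download.

    Hypothetical helper, not the actual script from #5052.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()

    dest = Path(dest)
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            errors.append((url, exc))
            continue
        # Write to a temporary file in the same directory, then swap it in,
        # so an existing font file is never left half-written.
        fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, dest)
        return url
    raise RuntimeError(f"all download locations failed for {dest.name}: {errors}")
```

If every location fails, the existing file stays untouched and the caller gets an error listing what was tried.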

@dch0ph
Contributor

dch0ph commented Nov 13, 2024

Looks a very sensible overall strategy to me.

Am I right in thinking that #4893 would address the acute problem in the meantime?

@imagico
Collaborator Author

imagico commented Nov 13, 2024

> Am I right in thinking that #4893 would address the acute problem in the meantime?

That probably depends on what you identify as the acute problem.

But yes, sourcing the emoji font from archive.org is probably a reasonably stable solution. #4893 mainly needs a thorough review.

@imagico
Collaborator Author

imagico commented Jan 16, 2025

I turned the suggestions into a todo list and marked those implemented by #5052.

Note that, apart from these general ideas, there is still the concrete need to source the Noto fonts from an up-to-date source (see #4893).

@mapmeld
Contributor

mapmeld commented Jan 16, 2025

This is great! Glad that this is more achievable with the new Python script.

Here's where I'm at:

  • I may replace Use notofonts.github.io for newer font releases #4893 with a new PR tomorrow -- I'd like to demo support for the newer Noto download links, by fixing my original issue with Arabic like this: 40542b8b20d
  • I wrote an OSM diary post about reviewing Unicode block usage across Africa and Asia. It looks like we have great language/script coverage with the exception of Glagolitic in Croatian public art and general misuse (such as Ⰹ to represent public binoculars). So I was about to suggest adding Glagolitic. One comment from Croatia said this is very old and rare
  • The YAML change does seem more stable than having lists of fonts and lists of exceptions inside of the Python code. I'll have to think on that one

@imagico
Collaborator Author

imagico commented Jan 16, 2025

I should probably mention the other strategic font issue we have: the need to choose fonts based on language, because different languages mandate different character designs despite using the same Unicode code points (see #2208). This is not directly related to the topics here, but it is relevant to the question of whether our text rendering sufficiently covers all writing systems used in OSM worldwide.

@mapmeld
Contributor

mapmeld commented Jan 25, 2025

I did a mini investigation of #2208. The difficulty is writing CartoCSS that can identify a country, identify city names within a country, or infer the language from the label text, because CartoCSS doesn't have spatial filters.

  • I'd encourage exploring spatial filters because even a simple bounding box could incrementally improve OSM over a large area.
  • Taiwanese cities and towns have is_in:zh tags which have been there for >10 years. The value always starts with 台灣. I was having difficulty running Kosmtik. The filter we want is something like [is_in:zh >= '台灣'][is_in:zh <= '台灣鿿'] { ... } but I didn't remember or land on the correct syntax.
  • If we had access to Wikidata beyond the ID, we could use its properties. For example, https://www.wikidata.org/wiki/Q167061 has a "located in the administrative territorial entity" property linking it to Hong Kong. Without tagging every place on OSM, maybe a script could update the local Postgres database or generate a very long CartoCSS filter [wikidata=_], [wikidata=_], .... There doesn't seem to be any grouping or prefixing where the ID alone tells us it's in HK.
  • If the default CJK font for placenames were changed, we could identify Japanese cities by hiragana, katakana, or wikipedia=ja:__ tags, but this would be detrimental to neighborhoods and other smaller locations
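For the "very long CartoCSS filter" idea in the list above, a generator could be as small as the sketch below. It assumes the layer exposes a `wikidata` column and that the list of IDs has already been obtained somehow; the function name, the layer name `placenames` and the second ID in the usage note are placeholders, not taken from the style.

```python
def wikidata_selector(layer, qids):
    """Build a comma-separated CartoCSS selector matching any of the given IDs.

    Hypothetical helper: the layer name and the presence of a `wikidata`
    column are assumptions for illustration.
    """
    unique = sorted(set(qids))
    if not unique:
        raise ValueError("need at least one Wikidata ID")
    return ",\n".join(f"#{layer}[wikidata='{qid}']" for qid in unique)
```

Calling `wikidata_selector("placenames", ["Q167061", "Q2"])` gives two selectors joined by a comma, which could then be followed by a `{ ... }` block with the language-specific font settings.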

@mapmeld
Contributor

mapmeld commented Jan 27, 2025

Re: the YAML for fonts, are you picturing a schema like this?

fonts:
  NotoSans:
    regular:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf
      - [theoretical backup url]
    bold:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Bold.ttf
    italic:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Italic.ttf
  NotoSansMyanmarUI:
    regular:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSansMyanmarUI/NotoSansMyanmarUI-Regular.ttf

or more like the CartoCSS

book-fonts:
  NotoSans-Regular:
  - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf
  - [theoretical backup url]
  NotoSansMyanmarUI-Regular:
  - [url]
bold-fonts:
  NotoSans-Bold:
  - [...]

@imagico
Collaborator Author

imagico commented Jan 27, 2025

The ease of maintenance should be the primary concern here. The main maintenance tasks that need to be considered are probably:

  • adding new fonts to use
  • removing fonts that are no longer used
  • changing the order of priority of fonts
  • changing download sources of (potentially a larger number of) fonts
  • changing the naming of fonts

In terms of future feature additions there is in particular the potential need to

  • differentiate the font list based on language of labels

With all of that in mind I have doubts that having one big tree structure in the YAML file is the best approach. But this is just a gut feeling, I have not thought it through. I'd probably start with the data structures you have so far in Python and think about how they can be improved to be more maintenance-friendly.

Note that you definitely don't need to have all URLs encoded literally. You can use format strings to define the form of the URLs wherever they follow a well-defined pattern. The only thing you cannot do with font definitions in YAML is have code (like if () ... else ...) define the form of the URL.
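As a rough illustration of the format string idea, the sketch below uses plain Python dicts standing in for what a YAML loader would return. The key names (`source`, `styles`, `family`) are made up for this example; only the URL pattern is taken from the schema draft earlier in this thread.

```python
# Hypothetical data, shaped like the result of loading a YAML file.
SOURCES = {
    "noto": "https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/{family}/{family}-{style}.ttf",
}
FONTS = [
    {"family": "NotoSans", "styles": ["Regular", "Bold", "Italic"], "source": "noto"},
    {"family": "NotoSansMyanmarUI", "styles": ["Regular"], "source": "noto"},
]


def expand_urls(fonts, sources):
    """Map each font file name to the URL built from its source template."""
    urls = {}
    for font in fonts:
        template = sources[font["source"]]
        for style in font["styles"]:
            file_name = f"{font['family']}-{style}.ttf"
            urls[file_name] = template.format(family=font["family"], style=style)
    return urls
```

Changing the download source for a larger number of fonts then means editing one template rather than many URLs, and the order of the font list stays independent of where the files come from.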

@mapmeld
Contributor

mapmeld commented Jan 27, 2025

I agree that's too many URLs. I want to find a middle ground between listing every URL and the two format strings proposed in #5053.

If I specify a directory URL for each language/font, such as https://notofonts.github.io/arabic/fonts/NotoSansArabic/hinted/ttf/, that would capture the exception URLs which were in the code for #4893. A directory is sufficient because all of the files are named {fontName}-{modifier}.ttf. CJK would be unchanged for now.
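A minimal sketch of the per-font directory idea, assuming the file naming really is `{fontName}-{modifier}.ttf` throughout. The helper name `file_urls` is hypothetical, and whether each resulting URL actually exists upstream is not checked here.

```python
from urllib.parse import urljoin


def file_urls(base_dir, font_name, modifiers):
    """Build one URL per modifier from a per-font directory URL."""
    # urljoin replaces the last path segment unless the base ends with "/",
    # so normalise the directory URL first.
    base = base_dir.rstrip("/") + "/"
    return [urljoin(base, f"{font_name}-{modifier}.ttf") for modifier in modifiers]
```

With the Arabic directory above and the modifiers Regular and Bold, this yields two URLs ending in NotoSansArabic-Regular.ttf and NotoSansArabic-Bold.ttf.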

@dch0ph
Contributor

dch0ph commented Jan 27, 2025

> I'd encourage exploring spatial filters because even a simple bounding box could incrementally improve OSM over a large area.

That has been considered in a few contexts e.g. here and, in a related discussion on multiple languages in labels here.

It's not clear to me whether the move to the flex osm2pgsql helps at all here. You could imagine creating a table of geographic zones e.g. from suitably tagged admin boundaries. I think the challenge is how you maintain the tables and any derived "helper tagging" in the face of updates. You might need a "coastline" approach, where the tables/indices for "what language is the point in" are only updated infrequently. The display_name could be a generated column perhaps?

@mapmeld
Contributor

mapmeld commented Jan 27, 2025

Here's my YAML concept:
mapmeld@a0a54687

Here the key is the font name (NotoSansArmenian, NotoSerifTibetan). Each font can have non-Regular variations (Bold) and multiple URLs. The script then writes the font list into the MSS. I included the changes to the MSS in the commit so you can see that the result is about the same, except for the order of CJK. One small issue is that the script makes assumptions about the relative paths to fonts.yaml and fonts.mss.
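For readers who have not opened the commit, the MSS generation step could look roughly like the sketch below. It is not the code from that commit. It assumes fonts.mss defines variables such as @book-fonts as ordered, comma-separated lists of quoted font face names; the function name and the face names in the usage note are illustrative.

```python
def render_font_list(variable, face_names):
    """Render one CartoCSS variable holding an ordered font fallback list.

    Hypothetical helper; the input order is preserved because the order
    of entries in fonts.mss matters.
    """
    if not face_names:
        raise ValueError(f"no fonts given for @{variable}")
    # Align continuation lines under the first quoted name.
    indent = " " * (len(variable) + 3)
    body = (",\n" + indent).join(f'"{name}"' for name in face_names)
    return f"@{variable}: {body};\n"
```

For example, `render_font_list("book-fonts", ["Noto Sans Regular", "Noto Sans Myanmar UI Regular"])` returns a two-line `@book-fonts:` definition that keeps the input order.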

@mapmeld
Contributor

mapmeld commented Feb 14, 2025

Some good news re Han Unification: Mapnik merged a PR to accept a lang attribute on a TextSymbolizer which gets passed to HarfBuzz. We also noted that older Noto Chinese fonts (as included in Mapnik tests) can already be used to restrict output to Simplified or Traditional Chinese variants. That gives two options to render Chinese characters.

The real question is how we select which features get these labels. @dch0ph's links show that OSM hasn't settled on a tag, and I agree that the coastline / generated column approach is the most realistic one for querying a geographic area. I tested some queries in Hong Kong (https://github.com/mapmeld/osm2pgsql-cjk?tab=readme-ov-file#option-2-bounding-box-and-filters). With the cautious approach, we could select place=* features and points where the name matches exactly one of name:zh-Hans or name:zh-Hant. With the accelerated approach, we could select ~90% of Hong Kong, excepting labels which are fully Latin-1 or which better match a set name:zh-Hans. I would also support including a region in Taiwan and one in mainland China when piloting this.
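The cautious rule can be stated in a few lines. This is only a sketch of the logic on a dict of tags, not a query against the rendering database; the function name `chinese_variant` is hypothetical.

```python
def chinese_variant(tags):
    """Return 'zh-Hans', 'zh-Hant', or None if the tags are missing or ambiguous."""
    name = tags.get("name")
    if not name:
        return None
    is_hans = tags.get("name:zh-Hans") == name
    is_hant = tags.get("name:zh-Hant") == name
    if is_hans != is_hant:  # exclusive or: exactly one variant matches
        return "zh-Hans" if is_hans else "zh-Hant"
    return None
```

Names that are written identically in both variants deliberately return None, so such features would keep the default font.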

@imagico
Collaborator Author

imagico commented Feb 14, 2025

I think matters specifically related to the Han unification problem (with CJK or other languages likewise) should be discussed in #2208.

Whether or not we need separate fonts for the different language versions is of course relevant here. But please keep in mind that Mapnik support alone is not enough; we also need Carto support.


3 participants