RFC for strategy on font management #5043

imagico · 2024-11-11T15:58:34Z

Since #4606 we have been including a script to download the fonts we use, and in doing so we implicitly tell style users they can rely on it to take care of the fonts. Unfortunately, upstream font sources are volatile, so both the download locations and the font file naming require ongoing maintenance (#5013, #4956). Our current font download script is not well designed for managing that.

This is a proposal for what a more sustainable approach could look like. It is not meant to rule out smaller changes to the existing setup to fix acute issues, but to provide a basis for discussing a longer-term strategy for developing this further.

Proposal

The main ideas are:

separating the font data from the script functionality
doing the scripting in python (to match our other scripts) (done by Replace current font download with Python script #5052)
storing the data on fonts and their download locations in a YAML file (equally matching our approach in other cases)
having fallback locations for downloading fonts in case the primary location becomes unavailable. Fallbacks could contain older versions of the fonts if necessary. This would substantially increase reliability of the download process. (functionality is offered by Replace current font download with Python script #5052, no actual fallback locations configured yet)
having the script download the fonts as well as generate fonts.mss (where the order of entries is crucial, see Use local copy of fonts #4606). This way we would have a single place to make font changes rather than two (fonts.mss and get-fonts.sh) in different languages.
having the script by default update existing font files, but overwriting them only after a successful download. (this is what Replace current font download with Python script #5052 does now)
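
The fallback and overwrite-only-on-success ideas from the list above can be sketched roughly as follows. This is a minimal illustration, not the code from #5052: the function names `fetch_url` and `download_with_fallback` are hypothetical, and the whole file is read into memory for simplicity.

```python
import os
import tempfile
import urllib.request
from typing import Callable, Iterable


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return its body; network errors raise OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(
    urls: Iterable[str],
    target_path: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> bool:
    """Try each URL in order; replace target_path only after a full download."""
    for url in urls:
        try:
            data = fetch(url)
        except (OSError, ValueError) as error:
            # URLError/HTTPError are OSError subclasses; ValueError covers bad URLs.
            print(f"failed: {url} ({error})")
            continue
        if not data:
            continue  # treat an empty body as a failed download
        target_dir = os.path.dirname(os.path.abspath(target_path))
        os.makedirs(target_dir, exist_ok=True)
        # Write to a temporary file in the same directory, then swap it in.
        fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        os.replace(tmp_path, target_path)
        return True
    return False  # every location failed; any existing file is untouched
```

Because the existing file is only replaced via `os.replace` after the new data has been written completely, a failed or interrupted download leaves the previous font file in place.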

This is just a rough sketch of what this could look like; the details would of course need to be worked out. Discussion of those, as well as of the proposal itself, is welcome.

dch0ph · 2024-11-13T06:45:54Z

Looks like a very sensible overall strategy to me.

Am I right in thinking that #4893 would address the acute problem in the meantime?

imagico · 2024-11-13T17:26:00Z

> Am I right in thinking that #4893 would address the acute problem in the meantime?

That probably depends on what you identify as the acute problem.

But yes, sourcing the emoji font from archive.org is probably a reasonably stable solution. #4893 mainly needs a thorough review.

imagico · 2025-01-16T20:57:06Z

I turned the suggestions into a todo list and marked those implemented by #5052.

Note that, apart from these principal ideas, there is also still the concrete need to source the Noto fonts from an up-to-date source; see #4893.

mapmeld · 2025-01-16T21:13:44Z

This is great! Glad that this is more achievable with the new Python script.

Here's where I'm at:

I may replace Use notofonts.github.io for newer font releases #4893 with a new PR tomorrow -- I'd like to demo support for the newer Noto download links, by fixing my original issue with Arabic like this: 40542b8b20d
I wrote an OSM diary post about reviewing Unicode block usage across Africa and Asia. It looks like we have great language/script coverage, with the exception of Glagolitic, which shows up in Croatian public art and in general misuse (such as Ⰹ to represent public binoculars). So I was about to suggest adding Glagolitic, but one comment from Croatia said the script is very old and rare.
The YAML change does seem more stable than having lists of fonts and lists of exceptions inside of the Python code. I'll have to think on that one

imagico · 2025-01-16T21:41:59Z

I should probably mention the other strategic issue we have w.r.t. fonts: the need to choose fonts based on language, because different languages mandate different character designs despite using the same Unicode code points (see #2208). This is not directly related to the topics here, but it is relevant to the question of whether our text rendering sufficiently covers all writing systems used in OSM worldwide.

mapmeld · 2025-01-25T18:55:44Z

I did a mini investigation of #2208. The difficulty is finding CartoCSS which could identify a country, city names within a country, or interpret from text, because CartoCSS doesn't have spatial filters.

I'd encourage exploring spatial filters because even a simple bounding box could incrementally improve OSM over a large area.
Taiwanese cities and towns have is_in:zh tags which have been there for >10 years. The value always starts with 台灣. I was having difficulty running Kosmtik. The filter we want is something like [is_in:zh >= '台灣'][is_in:zh <= '台灣鿿'] { ... } but I didn't remember or land on the correct syntax.
We could do more if we had access to Wikidata beyond the ID. For example, https://www.wikidata.org/wiki/Q167061 has a property "located in the administrative territorial entity" linking it to Hong Kong. Without tagging every place on OSM, maybe a script could update the local Postgres database or generate a very long CartoCSS filter [wikidata=_], [wikidata=_], .... There doesn't seem to be any grouping or prefixing where the ID alone tells us it's in HK.
If the default CJK font for placenames were changed, we could identify Japanese cities by hiragana, katakana, or wikipedia=ja:__ tags, but this would be detrimental to neighborhoods and other smaller locations
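
To make the "very long CartoCSS filter" idea more concrete, here is a rough sketch of a script that turns a list of Wikidata IDs into a comma-separated selector list. It assumes the IDs have already been collected somehow (for example from a Wikidata query) and that the layer exposes a `wikidata` column; the function name and the layer name `#placenames` are hypothetical.

```python
import re
from typing import Iterable

# Wikidata item IDs are the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_selector(layer: str, wikidata_ids: Iterable[str]) -> str:
    """Build a comma-separated CartoCSS selector list, one filter per ID."""
    selectors = []
    for qid in sorted(set(wikidata_ids)):
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
        selectors.append(f"{layer}[wikidata='{qid}']")
    return ",\n".join(selectors)


# Q167061 is the example item from the comment above; a real list would be much longer.
print(wikidata_selector("#placenames", ["Q167061"]) + " { /* font rules */ }")
```

Validating each ID before interpolating it keeps stray quotes or other characters out of the generated stylesheet.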

mapmeld · 2025-01-27T04:20:19Z

Re: the YAML for fonts, are you picturing a schema like this?

fonts:
  NotoSans:
    regular:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf
      - [theoretical backup url]
    bold:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Bold.ttf
    italic:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Italic.ttf
  NotoSansMyanmarUI:
    regular:
      - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSansMyanmarUI/NotoSansMyanmarUI-Regular.ttf

or more like the CartoCSS

book-fonts:
  NotoSans-Regular:
  - https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf
  - [theoretical backup url]
  NotoSansMyanmarUI-Regular:
  - [url]
bold-fonts:
  NotoSans-Bold:
  - [...]

imagico · 2025-01-27T10:50:21Z

The ease of maintenance should be the primary concern here. The main maintenance tasks that need to be considered are probably:

adding new fonts to use
removing fonts that are no longer used
changing the order of priority of fonts
changing download sources of (potentially a larger number of) fonts
changing the naming of fonts

In terms of future feature additions there is in particular the potential need to

differentiate the font list based on language of labels

With all of that in mind, I have doubts that one big tree structure in the YAML file is the best approach. But this is just a gut feeling; I have not thought it through. I'd probably start with the data structures you have so far in Python and think about how they can be improved to be more maintenance friendly.

Note you definitely don't need to have all URLs encoded literally. You can use format strings to define the form of URLs where it follows a well-defined principle. The only thing you cannot do with the font definitions in YAML is have code (like if () ... else ...) define the form of the URL.
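
As an illustration of the format-string idea, the sketch below expands named URL templates per font and style. The dictionaries stand in for data that would be loaded from the YAML file; their layout, the template names and the `example.org` fallback are assumptions made for this example only. The GitHub URL pattern is the one used in the schema examples earlier in this thread.

```python
# Hypothetical data, shaped as it might look after loading a YAML file.
TEMPLATES = {
    "noto-github": (
        "https://github.com/notofonts/noto-fonts/raw/main/hinted/ttf/"
        "{family}/{family}-{style}.ttf"
    ),
    # Placeholder fallback location, not a real mirror.
    "mirror": "https://example.org/fonts/{family}-{style}.ttf",
}

FONTS = [
    {"family": "NotoSans", "styles": ["Regular", "Bold", "Italic"],
     "sources": ["noto-github", "mirror"]},
    {"family": "NotoSansMyanmarUI", "styles": ["Regular"],
     "sources": ["noto-github"]},
]


def expand_urls(font: dict, templates: dict) -> dict:
    """Return {style: [url, ...]} with URLs in source priority order."""
    return {
        style: [
            templates[name].format(family=font["family"], style=style)
            for name in font["sources"]
        ]
        for style in font["styles"]
    }


for font in FONTS:
    for style, urls in expand_urls(font, TEMPLATES).items():
        print(font["family"], style, urls)
```

With this layout, changing the download source for many fonts at once means editing one template rather than every URL.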

mapmeld · 2025-01-27T16:40:13Z

I agree that's too many URLs. I want to narrow down what works in the middle between every URL and the two format strings proposed in #5053.

If I specify a URL for each language/font such as https://notofonts.github.io/arabic/fonts/NotoSansArabic/hinted/ttf/ that would capture the exception URLs which were in the code for #4893. A directory is fine because all of the files are {fontName}-{modifier}.ttf. Also CJK would be unchanged for now.
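
A small sketch of this middle ground: one directory URL per font, with the file name derived from the {fontName}-{modifier}.ttf convention. The mapping name `FONT_DIRS` and the function name are hypothetical; the Arabic directory is the one quoted above.

```python
from urllib.parse import urljoin

# One base directory per font; only the Arabic entry is taken from this thread.
FONT_DIRS = {
    "NotoSansArabic": "https://notofonts.github.io/arabic/fonts/NotoSansArabic/hinted/ttf/",
}


def font_file_url(font_name: str, modifier: str, font_dirs: dict = FONT_DIRS) -> str:
    """Join a font's directory URL with its {fontName}-{modifier}.ttf file name."""
    base = font_dirs[font_name]
    if not base.endswith("/"):
        base += "/"  # without the slash, urljoin would drop the last path segment
    return urljoin(base, f"{font_name}-{modifier}.ttf")


print(font_file_url("NotoSansArabic", "Regular"))
print(font_file_url("NotoSansArabic", "Bold"))
```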

dch0ph · 2025-01-27T19:39:42Z

> I'd encourage exploring spatial filters because even a simple bounding box could incrementally improve OSM over a large area.

That has been considered in a few contexts, e.g. here and, in a related discussion on multiple languages in labels, here.

It's not clear to me whether the move to the flex osm2pgsql helps at all here. You could imagine creating a table of geographic zones e.g. from suitably tagged admin boundaries. I think the challenge is how you maintain the tables and any derived "helper tagging" in the face of updates. You might need a "coastline" approach, where the tables/indices for "what language is the point in" are only updated infrequently. The display_name could be a generated column perhaps?

mapmeld · 2025-01-27T21:23:32Z

Here's my YAML concept:
mapmeld@a0a54687

Here the key is the font name (NotoSansArmenian, NotoSerifTibetan), it can have non-Regular variations (Bold), and support for multiple URLs. Then the script writes it into the MSS. I included the changes to MSS in the commit so you can see the result is about the same, except order of CJK. Small issue here is assumptions about relative paths to fonts.yaml and fonts.mss
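
For readers who do not want to open the commit, the "script writes it into the MSS" step could look roughly like the sketch below. It is not the code from the linked commit. It assumes fonts.mss defines CartoCSS variables holding comma-separated quoted face names, that list order in the data is the priority order, and that each entry carries a display face name; the field names, the variable mapping and the style lists are illustrative.

```python
# Illustrative data in priority order; style availability is assumed.
FONTS = [
    {"face": "Noto Sans", "styles": ["Regular", "Bold", "Italic"]},
    {"face": "Noto Sans Armenian", "styles": ["Regular", "Bold"]},
    {"face": "Noto Serif Tibetan", "styles": ["Regular", "Bold"]},
]

# Hypothetical mapping from CartoCSS variable name to font style.
VARIABLES = {"book-fonts": "Regular", "bold-fonts": "Bold", "oblique-fonts": "Italic"}


def render_font_variable(variable: str, style: str, fonts: list) -> str:
    """Render one CartoCSS variable listing every face that has the style."""
    names = [f'"{font["face"]} {style}"' for font in fonts if style in font["styles"]]
    if not names:
        raise ValueError(f"no fonts provide style {style!r}")
    # Align continuation lines under the first face name.
    separator = ",\n" + " " * (len(variable) + 3)
    return f"@{variable}: " + separator.join(names) + ";"


def render_fonts_mss(fonts: list, variables: dict = VARIABLES) -> str:
    """Render all font variables, preserving the order of the input list."""
    blocks = [render_font_variable(name, style, fonts) for name, style in variables.items()]
    return "\n\n".join(blocks) + "\n"


print(render_fonts_mss(FONTS))
```

Since the output order follows the data order, reordering font priorities becomes a matter of moving entries in the data file.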

mapmeld · 2025-02-14T17:15:41Z

Some good news re Han Unification: Mapnik merged a PR to accept a lang attribute on a TextSymbolizer which gets passed to HarfBuzz. We also noted that older Noto Chinese fonts (as included in Mapnik tests) can already be used to restrict output to Simplified or Traditional Chinese variants. That gives two options to render Chinese characters.

The real question is how we select which features get these labels. @dch0ph's links show that OSM hasn't settled on a tag, and I agree that the coastline / generated column script is the most realistic one for querying a geo area. I tested some queries in Hong Kong (https://github.com/mapmeld/osm2pgsql-cjk?tab=readme-ov-file#option-2-bounding-box-and-filters). On the cautious approach, we could select place=* features and points where the name exclusively matches name:zh-Hans xor name:zh-Hant. On the accelerated approach, we could select ~90% of Hong Kong, excepting labels which are fully Latin-1 or better match a set name:zh-Hans. I would also support including a region in Taiwan and one in mainland China when piloting this.
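
The cautious selection rule can be stated compactly in code. This sketch works on plain tag dictionaries rather than the database, and the function name is hypothetical; in practice the same test would more likely live in SQL or in the osm2pgsql configuration.

```python
from typing import Optional


def chinese_variant(tags: dict) -> Optional[str]:
    """Return 'zh-Hans' or 'zh-Hant' if name matches exactly one of them, else None."""
    name = tags.get("name")
    if not name:
        return None
    is_hans = tags.get("name:zh-Hans") == name
    is_hant = tags.get("name:zh-Hant") == name
    if is_hans != is_hant:  # exactly one of the two matches (xor)
        return "zh-Hans" if is_hans else "zh-Hant"
    return None  # matches both or neither, so the variant is ambiguous


print(chinese_variant({"name": "臺灣", "name:zh-Hant": "臺灣", "name:zh-Hans": "台湾"}))
print(chinese_variant({"name": "香港", "name:zh-Hant": "香港", "name:zh-Hans": "香港"}))
```

Names that are identical in both variants are left unselected, which is what makes this the cautious option.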

imagico · 2025-02-14T17:47:24Z

I think matters specifically related to the Han unification problem (with CJK or other languages likewise) should be discussed in #2208.

Whether or not we need separate fonts for the different language versions is of course relevant here. But please keep in mind that Mapnik support alone is not enough; we also need Carto support.

imagico mentioned this issue Nov 11, 2024

get-fonts fails, due to fonts.google.com not returning a zip. #5013

Open

tpetillon mentioned this issue Nov 17, 2024

Adapt Docker setup to latest changes #5045

Open

imagico mentioned this issue Jan 16, 2025

Replace current font download with Python script #5052

Merged

imagico mentioned this issue Jan 19, 2025

Source Arabic font from CDN, newer repo #5053

Open

mapmeld mentioned this issue Feb 6, 2025

Add data for a CJK test mapnik/test-data-visual#86

Merged
