I'm enumerating Windows fonts like this:
LOGFONTW lf = {0};
lf.lfCharSet = DEFAULT_CHARSET;
lf.lfFaceName[0] = L'\0';
lf.lfPitchAndFamily = 0;
::EnumFontFamiliesEx(hdc, &lf,
                     reinterpret_cast<FONTENUMPROCW>(FontEnumCallback),
                     reinterpret_cast<LPARAM>(this), 0);
My callback function has this signature:
int CALLBACK FontEnumerator::FontEnumCallback(const ENUMLOGFONTEX *pelf,
                                              const NEWTEXTMETRICEX *pMetrics,
                                              DWORD font_type,
                                              LPARAM context);
For TrueType fonts, I typically get each face name multiple times. For example, across several calls I'll get pelf->elfFullName and pelf->elfLogFont.lfFaceName both set to "Arial". Looking more closely at the other fields, I see that each call is for a different script: on the first call pelf->elfScript will be "Western" and pelf->elfLogFont.lfCharSet will be the numeric equivalent of ANSI_CHARSET; on the second call I get "Hebrew" and HEBREW_CHARSET; on the third, "Arabic" and ARABIC_CHARSET; and so on. So far, so good.
But the font signature (pMetrics->ntmFontSig) field for all versions of Arial is identical. In fact, the font signature claims that all of these versions of Arial support Latin-1, Hebrew, Arabic, and others.
I know the character sets of the strings I'm trying to draw, so I'm trying to instantiate an appropriate font based on the font signatures. Because the font signatures always match, I always end up selecting the "Western" font, even when displaying Hebrew or Arabic text. I'm using low-level Uniscribe APIs, so I don't get the benefit of Windows font linking, and yet my code seems to work.
Does lfCharSet actually carry any meaning or is it a legacy artifact? Should I just set lfCharSet to DEFAULT_CHARSET and stop worrying about all the script variations of each face?
For my purposes, I only care about TrueType and OpenType fonts.
I think I found the answer. Fonts that get enumerated multiple times are "big" fonts. Big fonts are single fonts that include glyphs for multiple scripts or code pages.
The Unicode portion of the FONTSIGNATURE (fsUsb) represents all the Unicode subranges that the font can handle. This is independent of the character set. If you use the wide character APIs, you can use all the included glyphs in the font, regardless of which character set was specified when you created the font.
The code page portion of the FONTSIGNATURE (fsCsb) represents the code pages that the font can handle. I believe this is only significant when the font is not a "big" font. In that case, the fsUsb masks will be all zeros, and the fsCsb will specify the appropriate character set(s). In those cases, it's important to get the lfCharSet correct in the LOGFONT.
When instantiating a "big" font and using the wide character APIs, it apparently doesn't matter which lfCharSet you specify.
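To make this concrete, here is a minimal sketch (not the original callback) of a FONTENUMPROCW-compatible enumeration callback that inspects ntmFontSig. The subrange and code-page bit positions come from the OpenType OS/2 specification; everything else about the helper is hypothetical.

#include <windows.h>

static int CALLBACK EnumCallback(const LOGFONTW *lf, const TEXTMETRICW *tm,
                                 DWORD fontType, LPARAM /*context*/)
{
    if (fontType & TRUETYPE_FONTTYPE)
    {
        // For TrueType/OpenType faces the metrics pointer is really a NEWTEXTMETRICEXW.
        const NEWTEXTMETRICEXW *metrics = reinterpret_cast<const NEWTEXTMETRICEXW *>(tm);
        const FONTSIGNATURE &sig = metrics->ntmFontSig;

        // fsUsb: Unicode subrange coverage. For a "big" font these bits are non-zero
        // and identical for every charset variation of the same face.
        // Bit 11 of fsUsb[0] is the Hebrew block, bit 13 is the Arabic block.
        const bool coversHebrew = (sig.fsUsb[0] & (1u << 11)) != 0;
        const bool coversArabic = (sig.fsUsb[0] & (1u << 13)) != 0;

        // fsCsb[0]: legacy code-page coverage; mainly meaningful when fsUsb is all zero.
        const bool latin1CodePage = (sig.fsCsb[0] & 1u) != 0;  // bit 0 = Latin 1 (cp1252)

        (void)coversHebrew; (void)coversArabic; (void)latin1CodePage;
        // A real callback would record one entry per face name (lf->lfFaceName)
        // here and skip the repeated charset variations.
    }
    return 1;  // non-zero continues the enumeration
}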
Windows has font substitution logic: if you try to render a character that isn't in the currently selected font, Windows will quietly pull a glyph from another font in which that character is present.
Imagine the current font is, for example, a serif one. When picking the source for substitution, will Windows prefer serif fonts to sans-serif ones and vice versa?
As far as I know, Windows uses the PANOSE values of a font to find a suitable replacement. Those values categorise the font along several descriptive axes, and there are in fact multiple values that describe the serif style.
The problem is that only fonts with PANOSE values can be replaced by fonts with PANOSE values.
So if the font you’re using doesn’t have PANOSE values, Windows can’t find a replacement. Also, if it does and there are no fonts with fitting PANOSE values in your collection, you will get bad substitutions.
However, the PANOSE system was established for font replacement for PostScript printers.
I'm not sure how other people do it, but I don't fill in all the PANOSE values in the fonts I produce (unless it's explicitly requested). I stick to family type, weight and letterform (though I use the last only to decide between upright and italic).
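As an aside, if you want to inspect these values yourself on Windows, GetOutlineTextMetricsW exposes a font's PANOSE bytes. A small sketch, assuming a TrueType/OpenType font is already selected into hdc (the helper name is mine):

#include <windows.h>
#include <vector>

// Read the PANOSE classification of the font currently selected into hdc.
// Only outline (TrueType/OpenType) fonts carry these metrics.
static bool GetPanose(HDC hdc, PANOSE &panoseOut)
{
    const UINT size = ::GetOutlineTextMetricsW(hdc, 0, nullptr);
    if (size == 0)
        return false;   // not an outline font

    std::vector<BYTE> buffer(size);
    OUTLINETEXTMETRICW *otm = reinterpret_cast<OUTLINETEXTMETRICW *>(buffer.data());
    otm->otmSize = size;
    if (::GetOutlineTextMetricsW(hdc, size, otm) == 0)
        return false;

    panoseOut = otm->otmPanoseNumber;   // bFamilyType, bSerifStyle, bWeight, bLetterform, ...
    return true;
}

From the returned PANOSE you can then look at bFamilyType, bSerifStyle, bWeight and bLetterform, the fields mentioned above.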
I'm trying to use FreeType to enumerate the glyphs (name and Unicode value) in a font file.
For getting the name, I'm using FT_Get_Glyph_Name.
But how can I get the glyph's Unicode value?
I'm a newbie to glyphs and fonts.
The Unicode codepoint is not technically stored together with the glyph in a TrueType/OpenType font. One has to iterate the font's cmap table to get the mapping, which could also be a non-Unicode one, and multiple mappings pointing to the same glyph may exist. The good news is that FreeType provides well-documented facilities in its API to iterate the codepoints in the currently selected character map. So, in code:
#include <ft2build.h>
#include FT_FREETYPE_H
#include <format>
#include <iostream>

// Ensure a Unicode character map is selected; 'face' is an already loaded FT_Face
FT_Select_Charmap(face, FT_ENCODING_UNICODE);

FT_ULong charcode;
FT_UInt gid;
charcode = FT_Get_First_Char(face, &gid);
while (gid != 0)
{
    std::cout << std::format("Codepoint: {:x}, gid: {}", charcode, gid) << std::endl;
    charcode = FT_Get_Next_Char(face, charcode, &gid);
}
With this information you can create a best effort map from glyphs to Unicode code points.
One would expect the FT_CharMap to hold this info:
[...] The currently active charmap is available as face->charmap.
but unfortunately it only defines the kind of encoding (Unicode, MacRoman, Shift-JIS etc.). Apparently the act of looking up a code is done elsewhere – and .notdef simply gets returned when that character is unavailable after all.
Looking in one of my own FreeType-based OpenType renderers, which reports glyphs 'by name' where possible, I found in the initialization sequence some code that stores a glyph's name if it has one and its Unicode code point otherwise. But that code relies on the presence of glyph names.
Thinking further: you can test every possible Unicode codepoint and see if it returns 0 (.notdef) or a valid glyph index. So initialize an empty table for all possible glyphs and only fill in each one's Unicode if the following routine finds it.
For a moderately modern font you need only check up to Unicode U+FFFF; for something like a heavy Chinese font (up to U+2F9F4 for Heiti SC) or emoji (up to U+1FA95 for Segoe UI Emoji) you need a considerably larger array. (Getting that maximum number out of a font is an entirely different story, alas. Deciding what to do depends on what you want to use this for.)
printf ("num glyphs: %u\n", face->num_glyphs);
for (code=1; code<=0xFFFF; code++)
{
glyph_index = FT_Get_Char_Index(face, code);
/* 0 = .notdef */
if (glyph_index)
{
printf ("%d -> %04X\n", glyph_index, code);
}
}
This short C snippet prints out the translation table from font glyph index to a corresponding Unicode. Beware that (1) not all glyphs in a font need to have a Unicode associated with them. Some fonts have tons of 'extra' glyphs, to be used in OpenType substitutions (such as alternative designs and custom ligatures) or other uses (such as aforementioned Segoe UI Emoji; it contains color masks for all of its emoji). And (2) some glyphs may be associated with multiple Unicode characters. The glyph design for A, for example, can be used as both a Latin Capital Letter A and a Greek Capital Letter Alpha.
Not all glyphs in a font will necessarily have a Unicode code point. In OpenType text display, there is an m:n mapping between Unicode character sequences and glyph sequences. If you are interested in a relationship between Unicode code points and glyphs, the thing that makes most sense would be to use the mapping from Unicode code points to default glyphs that is contained in a font's 'cmap' table.
For more background, see OpenType spec: Advanced Typographic Extensions - OpenType Layout.
As for glyph names, every glyph can have a name, regardless of whether it is mapped from a code point in the 'cmap' table or not. Glyph names are contained in the 'post' table. But not all fonts necessarily include glyph names. For example, a CJK font is unlikely to include glyph names.
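To tie this back to the question: FreeType exposes those 'post' table names through FT_Get_Glyph_Name, and FT_HAS_GLYPH_NAMES tells you whether the font carries them at all. A minimal sketch (the helper name is mine; 'face' is an already loaded FT_Face):

#include <ft2build.h>
#include FT_FREETYPE_H
#include <cstdio>

// Print a glyph's 'post' name if the font has one.
static void PrintGlyphName(FT_Face face, FT_UInt glyphIndex)
{
    if (FT_HAS_GLYPH_NAMES(face))
    {
        char name[128] = {0};
        if (FT_Get_Glyph_Name(face, glyphIndex, name, sizeof(name)) == 0)
            std::printf("glyph %u -> %s\n", glyphIndex, name);
    }
    else
    {
        std::printf("glyph %u: font carries no glyph names\n", glyphIndex);
    }
}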
How can I tell FreeType to use a fallback font when a string contains a character that is not present in the font I'm using as the default?
I need to render non-latin glyphs correctly in my application.
Do I have to manage a fallback myself?
If so: how do I detect if there is a missing glyph in a given string?
I'm sorry, I don't know whether you need to handle fallback yourself, but my guess would be that you do. As for how to detect a missing glyph, you can use FT_Get_Char_Index: if it returns 0, the character has no glyph in the font.
The GNU Unifont can serve as a fallback font for every codepoint in the Basic Multilingual Plane (BMP), i.e. 0x0000-0xFFFF. That should cover the vast majority of what you might encounter. It is freely available for download.
The Unicode Last Resort font can serve as a final fallback for every codepoint in all of the planes. Its glyphs only show broad categories. It is also available for download.
It looks like you would have to detect the absence of a glyph with FT_Get_Char_Index() as SMart explained, and in those cases turn to Unifont or the Last Resort font.
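Putting the two answers together, a minimal sketch of manual fallback; the face handles, the fallback order, and the UTF-32 input are all assumptions (load the faces with FT_New_Face beforehand):

#include <ft2build.h>
#include FT_FREETYPE_H
#include <utility>
#include <vector>

// For each codepoint, pick the first face in the fallback chain that has a glyph for it.
// A glyph index of 0 means even the last fallback only has .notdef for that codepoint.
static std::vector<std::pair<FT_Face, FT_UInt>>
ResolveGlyphs(const std::vector<FT_ULong> &codepoints,
              const std::vector<FT_Face> &fallbackChain)   // e.g. { mainFace, unifontFace }
{
    std::vector<std::pair<FT_Face, FT_UInt>> resolved;
    resolved.reserve(codepoints.size());

    for (FT_ULong cp : codepoints)
    {
        FT_Face chosen = nullptr;
        FT_UInt gid = 0;
        for (FT_Face face : fallbackChain)
        {
            gid = FT_Get_Char_Index(face, cp);   // 0 = .notdef in this face
            if (gid != 0)
            {
                chosen = face;
                break;
            }
        }
        resolved.emplace_back(chosen, gid);
    }
    return resolved;
}

You can then render each run with whichever face was chosen, switching faces at the boundaries.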
I have the common Adobe Myriad Pro fonts installed. These include Myriad Pro Regular, Myriad Pro Bold and Myriad Pro Semibold. Assume that I have a CTFontRef baseFont that points to Myriad Pro Regular, and that the font size I desire is size. I run the following code:
CTFontRef boldFont = CTFontCreateCopyWithSymbolicTraits(baseFont, size, NULL, kCTFontBoldTrait, kCTFontBoldTrait);
The returned font is Myriad Pro Semibold, not Myriad Pro Bold.
Is there a way of coercing this to return Myriad Pro Bold instead, other than requesting the named style 'Bold'? I wanted to keep this code entirely generic without hard-wiring style names.
I have tried this in various permutations, including passing the bold trait as part of an attribute dictionary when I initially create my font, avoiding the two-step process described here, but it still returns the semibold font in preference to the normal bold. I've also poked around the fonts themselves a little. The full bold font has a weight of 700 in its 'OS/2' table, and the semibold font has a weight of 600. The PANOSE weights correspond with this. However, the macStyle fields in the 'head' table of the semibold and bold fonts both have the bold flag set, so presumably this is what Core Text is using. But is there any way to make it more discriminating?
Based on a reading of the documentation, backed up by some knowledge of font handling in general but not Core Text specifically, I'd say it may be possible, but it's not straightforward.
The CTFontCreateCopyWithSymbolicTraits() documentation specifies that the symTraitValue and symTraitMask parameters have type CTFontSymbolicTraits. The CTFontDescriptor documentation defines the "Bold" value you are using as
kCTFontBoldTrait = (1 << 1)
So this is clearly a boolean trait. However, as you've seen, font weight is a spectrum, not a boolean trait, even though decades of "bold" buttons in word processor UIs have presented it as a boolean trait. CTFontCreateCopyWithSymbolicTraits() doesn't have the expressive power you need.
One other approach which might work is to try calling CTFontDescriptorCreateMatchingFontDescriptors(). You pass this function a CTFontDescriptorRef to an initial font, and a CFSetRef with attributes which must be present. This function returns an array of font descriptors, all of which match the attributes you requested.
So, you could pass it a CTFontDescriptorRef for Myriad Pro Regular, and maybe a CFSetRef saying you want bold, and then look through every font descriptor in the returned array to find the one with the heaviest weight.
I haven't written this code, and my ignorance of Core Text means I may be missing something, but that seems like a plausible approach.
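For what it's worth, here is an untested sketch of that idea; the helper name is mine, and it simply matches everything in a family and keeps the descriptor with the largest kCTFontWeightTrait:

#include <CoreText/CoreText.h>

static CTFontDescriptorRef CopyHeaviestDescriptor(CFStringRef familyName)
{
    // Descriptor that matches everything in the family.
    const void *keys[]   = { kCTFontFamilyNameAttribute };
    const void *values[] = { familyName };
    CFDictionaryRef attrs = CFDictionaryCreate(kCFAllocatorDefault, keys, values, 1,
                                               &kCFTypeDictionaryKeyCallBacks,
                                               &kCFTypeDictionaryValueCallBacks);
    CTFontDescriptorRef base = CTFontDescriptorCreateWithAttributes(attrs);
    CFRelease(attrs);

    CFArrayRef matches = CTFontDescriptorCreateMatchingFontDescriptors(base, NULL);
    CFRelease(base);
    if (!matches)
        return NULL;

    CTFontDescriptorRef heaviest = NULL;
    CGFloat heaviestWeight = -2.0;  // weights run from -1.0 to 1.0

    for (CFIndex i = 0; i < CFArrayGetCount(matches); ++i)
    {
        CTFontDescriptorRef d = (CTFontDescriptorRef)CFArrayGetValueAtIndex(matches, i);
        CFDictionaryRef traits =
            (CFDictionaryRef)CTFontDescriptorCopyAttribute(d, kCTFontTraitsAttribute);
        if (!traits)
            continue;

        CGFloat weight = 0.0;
        CFNumberRef n = (CFNumberRef)CFDictionaryGetValue(traits, kCTFontWeightTrait);
        if (n)
            CFNumberGetValue(n, kCFNumberCGFloatType, &weight);
        CFRelease(traits);

        if (weight > heaviestWeight) { heaviestWeight = weight; heaviest = d; }
    }
    if (heaviest)
        CFRetain(heaviest);
    CFRelease(matches);
    return heaviest;  // caller releases
}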
For the CTFontDescriptor you can specify a kCTFontTraitsAttribute attribute, which should be a CFDictionaryRef in which you can specify kCTFontWeightTrait. That takes a CFNumberRef representing a floating-point value between -1 and 1, giving you a spectrum of weights, with 1 being the boldest variant and 0 the regular/medium weight.
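A sketch of that approach (the helper name and the idea of passing the weight in as a parameter are mine; which numeric value corresponds to a given named weight is not addressed here):

#include <CoreText/CoreText.h>

// Create a font from a family name plus an explicit weight trait in the -1.0 .. 1.0 range.
static CTFontRef CreateFontWithWeight(CFStringRef familyName, CGFloat size, CGFloat weight)
{
    CFNumberRef weightNumber = CFNumberCreate(kCFAllocatorDefault, kCFNumberCGFloatType, &weight);

    const void *traitKeys[]   = { kCTFontWeightTrait };
    const void *traitValues[] = { weightNumber };
    CFDictionaryRef traits = CFDictionaryCreate(kCFAllocatorDefault, traitKeys, traitValues, 1,
                                                &kCFTypeDictionaryKeyCallBacks,
                                                &kCFTypeDictionaryValueCallBacks);

    const void *attrKeys[]   = { kCTFontFamilyNameAttribute, kCTFontTraitsAttribute };
    const void *attrValues[] = { familyName, traits };
    CFDictionaryRef attributes = CFDictionaryCreate(kCFAllocatorDefault, attrKeys, attrValues, 2,
                                                    &kCFTypeDictionaryKeyCallBacks,
                                                    &kCFTypeDictionaryValueCallBacks);

    CTFontDescriptorRef descriptor = CTFontDescriptorCreateWithAttributes(attributes);
    CTFontRef font = CTFontCreateWithFontDescriptor(descriptor, size, NULL);

    CFRelease(descriptor);
    CFRelease(attributes);
    CFRelease(traits);
    CFRelease(weightNumber);
    return font;  // caller releases
}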
Given an HFONT, how do I tell if it's a symbol font? A PDF library I'm using needs to treat symbol fonts differently, so I need a way to programmatically tell whether any given font is a symbol font.
Use GetObject to retrieve the font's properties into a LOGFONT structure. Check the lfCharSet member; if it's SYMBOL_CHARSET, you have a symbol font.
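A minimal sketch of that check (the helper name is mine):

#include <windows.h>

// True if the HFONT was created with SYMBOL_CHARSET in its LOGFONT.
static bool IsSymbolFontRequested(HFONT font)
{
    LOGFONTW lf = {};
    if (::GetObjectW(font, sizeof(lf), &lf) == 0)
        return false;  // not a valid font handle
    return lf.lfCharSet == SYMBOL_CHARSET;
}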
Mark Ransom's answer is going to work 99.999% of the time, but there's a theoretical possibility that it could give the wrong answer.
To avoid this possibility, you should use GetTextMetrics to get the TEXTMETRICS of the actual font and check if the tmCharSet is SYMBOL_CHARSET.
What's the difference between checking lfCharSet and tmCharSet?
When you create an HFONT, Windows makes an internal copy of the LOGFONT. It describes the font you want, which could be different than the font you get.
When you select the HFONT into a device (or information) context, the font mapper finds the actual font that best matches the LOGFONT associated with that HFONT. The best match, however, might not be an exact match. So when you need to find out something about the actual font, you should take care to query the HDC rather than the HFONT.
If you query the HFONT with GetObject, you just get the original LOGFONT back. GetObject doesn't tell you anything about the actual font because it doesn't know what actual font the font mapper chose (or will choose).
APIs that ask about the font selected into a particular DC, like GetTextMetrics, GetTextFace, etc., will give you information about the actual font.
For this problem, Mark's answer (using GetObject) is probably always going to work, because the odds of the font mapper choosing a symbol font when you want a textual font (or vice versa) are minuscule. In general, though, when you want to know something about the actual font, find a way to ask the HDC.
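A sketch of that "ask the HDC" approach, using a throwaway memory DC (any DC the font is actually selected into would do just as well; the helper name is mine):

#include <windows.h>

// Select the font into a memory DC and inspect the realized font's charset.
static bool IsSymbolFontRealized(HFONT font)
{
    HDC hdc = ::CreateCompatibleDC(nullptr);   // screen-compatible memory DC
    if (!hdc)
        return false;

    HGDIOBJ oldFont = ::SelectObject(hdc, font);
    TEXTMETRICW tm = {};
    const bool isSymbol = ::GetTextMetricsW(hdc, &tm) && tm.tmCharSet == SYMBOL_CHARSET;

    ::SelectObject(hdc, oldFont);
    ::DeleteDC(hdc);
    return isSymbol;
}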