Is it possible to determine the fonts Windows chooses for font-linking? - winapi

Suppose you have a string with text in two or more scripts. When you use a GDI function like TextOut, (modern versions of) Windows will do "font-linking". That is, GDI will draw what it can with your selected font and draw the rest in an appropriate font that it chooses automagically. For example, if part of your text is in English (using the Roman alphabet), and part of it is Chinese (using CJK characters), and you have Arial selected, the English portion will be drawn in Arial, and the Chinese portion will be drawn in another font that has the CJK glyphs.
My question is, is there a way to determine which fonts TextOut will choose (or did choose) for the font linking?
I have to draw some text with the low-level Uniscribe API, which doesn't do automatic font-linking. I've implemented my own font-linking, but sometimes my algorithm chooses a different font than TextOut does for the same text. I'm trying to understand the Windows algorithm better, but I'm not real good at identifying fonts on sight (especially in unfamiliar scripts).

The font is selected by a registry entry. It is well described in this article. Quoting the relevant part:
If font linking is enabled on your device, you can examine the registry by enumerating the subkeys of the registry key at
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink
to determine the mappings of linked fonts to base fonts. You can add links by using Regedit to create additional subkeys.
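To see the link chains without opening Regedit, something like the following sketch may help. It assumes the links are stored as REG_MULTI_SZ values named after the base font, with each string laid out as "FILE,Face Name" plus optional scaling numbers, which is how they usually appear on desktop Windows. The helper names `parse_link_entry` and `read_system_links` are made up for this example.

```python
import sys

SYSTEM_LINK_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink"


def parse_link_entry(entry):
    """Split one SystemLink string into (font_file, face_name).

    Assumed layout: "FILE.TTC,Face Name[,scale1,scale2]"; an entry with
    no comma is treated as a bare file name.
    """
    parts = [p.strip() for p in entry.split(",")]
    font_file = parts[0]
    face_name = parts[1] if len(parts) > 1 else None
    return font_file, face_name


def read_system_links():
    """Return {base_font: [(font_file, face_name), ...]} from the registry."""
    import winreg  # Windows-only module, imported lazily

    links = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SYSTEM_LINK_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        for index in range(value_count):
            base_font, data, value_type = winreg.EnumValue(key, index)
            if value_type == winreg.REG_MULTI_SZ:
                links[base_font] = [parse_link_entry(e) for e in (data or []) if e]
    return links


if __name__ == "__main__" and sys.platform == "win32":
    # Print each base font followed by its linked fonts, in registry order.
    for base, chain in read_system_links().items():
        print(base, "->", chain)
```

The order of the strings in each value is generally taken to be the order in which linked fonts are tried, so comparing that order with your own algorithm's choices is a reasonable first check.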

Related

Is Windows font substitution serif aware?

Windows has font substitution logic - if you try to render a character which isn't in the currently selected font, Windows would quietly pull a glyph from another font where a glyph for that character is present.
Imagine the current font is, for example, a serif one. When picking the source for substitution, will Windows prefer serif fonts to sans-serif ones and vice versa?
As far as I know Windows uses the PANOSE values of a font to find a suitable replacement. Those values categorise the font into descriptive values, and there are in fact multiple values to describe the serif style.
The problem is that only a font with PANOSE values can be replaced by other fonts with PANOSE values.
So if the font you’re using doesn’t have PANOSE values, Windows can’t find a replacement. Also, if it does and there are no fonts with fitting PANOSE values in your collection, you will get bad substitutions.
However, the PANOSE system was established for font replacement for PostScript printers.
I’m not sure how other people do it, but I don’t fill in all the PANOSE values in fonts I produce (unless it’s explicitly asked for). I stick to familytype, weight and letterform (though I use the last only to decide between upright and italic).

Find out if font has monospaced numbers

There are proportional fonts (i.e. not monospaced) that nevertheless provide monospaced numbers. E.g. see this Excel screenshot using Arial:
Note how the numbers are nicely aligned. How can I find out programmatically (probably WinAPI) if a font supports this feature?
You won't find an API for that, because there is no metadata value in the font file that says "the glyphs for digits in this font have fixed width". Some fonts support both proportional and fixed-width ("tabular") digits, in which case the font is likely to support the 'tnum' OpenType Layout feature. You should pick a font that supports this feature and then explicitly activate it when drawing the text.
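Since there is no flag to query, one fallback is to measure the ten digit glyphs and compare their advance widths. The sketch below shows only the comparison; it assumes you have already collected the widths somehow (for instance with GDI's GetCharWidth32 on a DC with the font selected). The function name and the sample numbers are hypothetical and are not taken from any real font.

```python
def digits_share_one_width(advance_widths):
    """Return True if all ten ASCII digits have the same advance width.

    advance_widths maps a character to its advance width in any
    consistent unit (pixels, font design units, ...).
    """
    widths = {advance_widths[d] for d in "0123456789"}
    return len(widths) == 1


# Hypothetical measurements, for illustration only (not real font data).
tabular_sample = {d: 556 for d in "0123456789"}
tabular_sample.update({"i": 222, "W": 944})

# Same table, but with a narrower "1", as a proportional-digit font might have.
proportional_sample = {**tabular_sample, "1": 400}

print(digits_share_one_width(tabular_sample))
print(digits_share_one_width(proportional_sample))
```

This only tells you what the default digit glyphs do; a font whose default digits are proportional may still offer fixed-width ones through an OpenType feature.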

Direct2D: How to convert fallback to SystemLink mode?

I am converting a project's render engine from GDI to D2D. The GDI code uses "CreateFontIndirect" with font size "-13" and font family "Segoe UI". The D2D code uses "CreateTextFormat" with font size "13" and font family "Segoe UI". The result is shown in the following picture:
In the GDI case, when the system can't find a Chinese character in "Segoe UI", it looks in the "SystemLink" registry key to locate a Chinese font, which on my machine is "YaHei". In the D2D case, however, the system doesn't pick "YaHei". Which Chinese font does it choose to draw with, and how does that work?
It works according to DirectWrite layout logic. See IDWriteTextLayout2::SetFontFallback(); it lets you provide your own fallback implementation if the default configuration is not satisfactory.
Basically, the layout object will call your custom fallback methods to map characters to fonts. You can then decide which characters to map to which font, potentially reusing the system fallback implementation for cases you don't care about.
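The delegation pattern described above can be modelled in a few lines. This is a plain Python model of the logic only, not the DirectWrite COM interface: `CUSTOM_RANGES`, `system_fallback`, `map_character` and `split_into_font_runs` are names invented for the sketch, and the stand-in for the system fallback simply returns the base font.

```python
from itertools import groupby

# Hypothetical mapping: code point ranges -> font family chosen by the app.
CUSTOM_RANGES = [
    (0x4E00, 0x9FFF, "Microsoft YaHei"),  # CJK Unified Ideographs
]


def system_fallback(ch, base_font):
    """Stand-in for the default fallback; here it simply keeps the base font."""
    return base_font


def map_character(ch, base_font):
    """Pick a font for one character: custom ranges first, then the default."""
    cp = ord(ch)
    for first, last, family in CUSTOM_RANGES:
        if first <= cp <= last:
            return family
    return system_fallback(ch, base_font)


def split_into_font_runs(text, base_font):
    """Group consecutive characters mapped to the same font into runs."""
    tagged = ((map_character(ch, base_font), ch) for ch in text)
    return [
        (font, "".join(ch for _, ch in group))
        for font, group in groupby(tagged, key=lambda pair: pair[0])
    ]


print(split_into_font_runs("Hello 快的 123", "Segoe UI"))
```

In a real implementation the same decision would live inside your custom fallback object, with the "don't care" branch forwarding to the system fallback instead of returning the base font.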

How can I get the original font name of some text using PDFKit?

I wrote a script which parses information from PDF files and outputs it to HTML. It's written in Python, using pdfminer.
On some text segments, the font style can have semantic significance. For instance: bold, italic and color should trigger different behavior. Pdfminer provides scripts with the font name, but not the color, and it has a number of other issues; so I'm working on a Swift version of that program, using Apple's PDFKit, to extract the same features.
I now find that I have the opposite problem. While PDFKit makes it easy to retrieve color, retrieving the original font name seems to be non-obvious. PDFSelection objects have an attributedString property, but for fonts that are not installed on my computer, the NSFont object is Helvetica. Of course, the fonts in question are fairly expensive, and acquiring a copy just for this purpose would be poor form.
Short of dropping to CGPDFContentStream (which is way too big of a hammer for what I want to get), is there a way of getting the original font name? I know in advance what the fonts are going to be, can I use that to my advantage?
PDFKit seems to use the standard font lookup system and then falls back on some default, so this can be resolved by spoofing the font to ensure that PDFKit doesn't need to fall back. Inspecting the document, I was able to identify that it uses the following fonts (referenced with their PostScript name):
"NeoSansIntel"
"NeoSansIntelMedium"
"NeoSansIntel,Italic"
I used a free font creation utility to create dummy fonts with these PostScript names, and I added them to my app bundle. I then used CTFontManagerRegisterFontsForURLs to load these fonts (in the .process scope), and now PDFKit uses these fonts for attributed strings that need them.
Of course, the fonts are bogus and this is useless for rendering. However, it works perfectly for the purpose of identifying text that uses these fonts.

Display of Asian characters (with Unicode): Difference in character spacing when presented in a RichEdit control compared with using ExtTextOut

This picture illustrates my predicament:
All of the characters appear to be the same size, but the space between them is different when presented in a RichEdit control compared with when I use ExtTextOut.
I would like to present the characters the same as in the RichEdit control (ideally), in order to preserve wrap positions.
Can anyone tell me:
a) Which is the more correct representation?
b) Why the RichEdit control displays the text with no gaps between the Asian Characters?
c) Is there any way to make ExtTextOut reproduce the behaviour of the RichEdit control when drawing these characters?
d) Would this be any different if I was working on an Asian version of Windows?
Perhaps I'm being optimistic, but if anyone has any hints to offer, I'd be very interested to hear.
In case it helps:
Here's my text:
快的棕色狐狸跳在懶惰狗1 2 3 4 5 6 7 8 9 0
Apologies to Asian readers: this is merely for testing our Unicode implementation, and I don't even know what language the characters are taken from, let alone whether they mean anything.
In order to view the effect by pasting these characters into a RichEdit control (eg. Wordpad), you may find you have to swipe them and set the font to 'Arial'.
The rich text that I obtain is:
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Arial;}}{\colortbl ;\red0\green0\blue0;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\fs22\u24555?\u30340?\u26837?\u33394?\u29392?\u29432?\u36339?\u22312?\u25078?\u24816?\u29399?1 2 3 4 5 6 7 8 9 0\par\pard\'a3 $$ \'80\'80\cf1\lang2057\fs16\par}
It doesn't appear to contain a value for character 'pitch' which was my first thought.
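For what it's worth, the RTF can be inspected with a short script to confirm which fonts it names and which characters the \uN escapes stand for. This is a rough sketch using regular expressions rather than a real RTF parser; the function names are made up, and it only handles characters that fit in a single 16-bit escape.

```python
import re


def decode_rtf_unicode(rtf):
    """Return the characters encoded as RTF Unicode escapes, in order."""
    chars = []
    for match in re.finditer(r"\\u(-?\d+)", rtf):
        value = int(match.group(1))
        if value < 0:  # RTF writes code units as signed 16-bit numbers
            value += 0x10000
        chars.append(chr(value))
    return "".join(chars)


def font_table_names(rtf):
    """Return the font names that follow a charset declaration in the RTF."""
    return re.findall(r"\\fcharset\d+ ([^;{}]+);", rtf)


# A fragment of the RTF quoted above.
sample = r"{\fonttbl{\f0\fnil\fcharset0 Arial;}}\uc1\u24555?\u30340?\u26837?"

print(font_table_names(sample))
print(decode_rtf_unicode(sample))
```

If the font table lists only Arial, as it does here, then whatever font actually draws the CJK characters is not recorded in the RTF at all.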
I don't know the answer, but there are several things to suspect:
There are several versions of the rich edit control. Perhaps you're using an older one that doesn't have all the latest typographic improvements.
There are many styles and flags that affect the behavior of a rich edit control, so you might want to explore which ones are set and what they do. For example, look at EM_GETEDITSTYLE.
Many Asian fonts come in two versions on Windows. One is optimized for horizontal layout, and the other for vertical layout. The latter usually has the same name, but with @ prepended to it. Perhaps you are using the wrong one in the rich edit control.
UPDATE: By messing around with Wordpad, I was able to reproduce the problem with the crowded text in the rich edit control.
Open a new document in Wordpad on Windows 7. Note that the selected font is Calibri.
Paste the sample text into the document.
Text appears correct, but Wordpad changed the font to SimSun.
Select the text and change the font back to Calibri or Arial.
The text will now be overcrowded, very similar to your example. Thus it appears the fundamental problem is with font linking and fallback. ExtTextOut is probably selecting an appropriate font for the script automatically. Your challenge is to figure out how to identify the right font for the script and set that font in the rich edit control.
This will only help with part of your problem, but there is a way to draw text to a DC that will look exactly the same as it does with RichEdit: what's called the windowless RichEdit control. It's not exactly easy to use: I wrote a CodeProject article on it a few years back. I used this to solve the problem of a scrollable display of blocks of text, each one of which can be edited by clicking on it: the normal drawing is done with the windowless RichEdit, and the editing by showing a "real" RichEdit control on top of it.
That would at least get you the text looking the same in both cases, though unfortunately both cases would show too little character spacing.
One further thought: if you could rely on Microsoft Office being installed, you could also try the later versions of RichEdit that come with Office. There's more about these on Murray Sargent's blog, as well as some interesting articles on font binding that might also help.
ExtTextOut allows you to specify the logical spacing between characters. Its lpDx parameter is a const pointer to an array of values that indicate the distance between origins of adjacent character cells. The Microsoft API documentation notes that if you don't set it, the function uses its own default spacing. I would have to say that's why ExtTextOut is working fine.
In particular, when you construct an EMR_EXTTEXTOUTW record in EMF, it populates an EMR_TEXT structure with this DX array. Judging by one of your comments, that allowed the RichEdit to insert the EMF with the information contained in the record, whereas if you didn't set a font binding, the RTF record does some matching to work out what font to use.
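If you do want to control the spacing yourself, the array is easy to build once you know where each character cell should start. The sketch below assumes you already have the x origin of every cell (for example, measured from the layout you want to match); `dx_from_origins` and the sample numbers are hypothetical. The ctypes array at the end is the kind of C int buffer you might hand to ExtTextOutW from Python, though the actual call is left out.

```python
import ctypes


def dx_from_origins(origins, last_advance):
    """Convert absolute x origins of character cells into an lpDx-style list.

    Entry i is the distance from the origin of cell i to the origin of
    cell i + 1; the final entry is the advance of the last cell.
    """
    if not origins:
        return []
    gaps = [nxt - cur for cur, nxt in zip(origins, origins[1:])]
    return gaps + [last_advance]


# Hypothetical cell origins in logical units, for illustration only.
origins = [0, 15, 30, 48]
dx = dx_from_origins(origins, last_advance=15)
print(dx)

# Pack the distances into a contiguous array of C ints.
dx_array = (ctypes.c_int * len(dx))(*dx)
print(list(dx_array))
```

Working from origins rather than widths keeps rounding errors from accumulating across a long line, which matters if the goal is to preserve wrap positions.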
In terms of the RichEdit control, the following article might be useful:
Use Font Binding in a Rich Edit Control
After character sets are assigned, Rich Edit scans the text around the
insertion point forward and backward to find the nearest fonts that
have been used for the character sets. If no font is found for a
character set, Rich Edit uses the font chosen by the client for that
character set. If the client hasn't specified a font for the character
set, Rich Edit uses the default font for that character set. If the
client wants some other font, the client can always change it, but
this approach will work most of the time. The current default font
choices are based on the following table. Note that the default fonts
are set per-process, and there are separate lists for UI usage and for
non-UI usage.
If you haven't set the character set, then it further explains that it falls back to ANSI_CHARSET. However, it's most definitely a lot more complicated than that, as that blog article by Murray Sargent (a programmer at Microsoft) shows.
