How can I get the original font name of some text using PDFKit? - cocoa

I wrote a script which parses information from PDF files and outputs it to HTML. It's written in Python, using pdfminer.
On some text segments, the font style can have semantic significance. For instance: bold, italic and color should trigger different behavior. Pdfminer provides scripts with the font name, but not the color, and it has a number of other issues; so I'm working on a Swift version of that program, using Apple's PDFKit, to extract the same features.
I now find that I have the opposite problem. While PDFKit makes it easy to retrieve color, retrieving the original font name seems to be non-obvious. PDFSelection objects have an attributedString property, but for fonts that are not installed on my computer, the NSFont object is Helvetica. Of course, the fonts in question are fairly expensive, and acquiring a copy just for this purpose would be poor form.
Short of dropping to CGPDFContentStream (which is way too big of a hammer for what I want to get), is there a way of getting the original font name? I know in advance what the fonts are going to be; can I use that to my advantage?

PDFKit seems to use the standard font lookup system and then falls back on some default, so this can be resolved by spoofing the font to ensure that PDFKit doesn't need to fall back. Inspecting the document, I was able to identify that it uses the following fonts (referenced by their PostScript names):
"NeoSansIntel"
"NeoSansIntelMedium"
"NeoSansIntel,Italic"
I used a free font creation utility to create dummy fonts with these PostScript names, and I added them to my app bundle. I then used CTFontManagerRegisterFontsForURLs to load these fonts (in the .process scope), and now PDFKit uses these fonts for attributed strings that need them.
Of course, the fonts are bogus and this is useless for rendering. However, it works perfectly for the purpose of identifying text that uses these fonts.
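Once the real font names come through, turning them into the styles that matter for the HTML output is a simple lookup. Here is a minimal sketch in Python (the language of the original script), assuming the three PostScript names listed above. The `STYLE_BY_FONT` table and the `style_for` helper are hypothetical names, and treating "Medium" as bold is an assumption about this particular document.

```python
import re

# Hypothetical mapping from PostScript name to the style it stands for here.
STYLE_BY_FONT = {
    "NeoSansIntel": "regular",
    "NeoSansIntelMedium": "bold",       # assumption: Medium plays the role of bold
    "NeoSansIntel,Italic": "italic",
}

# Subset fonts embedded in a PDF carry a six-letter tag, e.g. "ABCDEF+NeoSansIntel".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def style_for(font_name):
    """Return the semantic style for a PostScript font name, or None if unknown."""
    return STYLE_BY_FONT.get(_SUBSET_TAG.sub("", font_name))
```

A fallback font such as Helvetica returns `None`, which makes it easy to spot text whose original font was not registered.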

Related

Ghostscript - Indentation of postscript code

Is there an option for me to ask Ghostscript to indent the PostScript it creates?
Everything starts at the beginning of a line and I find it difficult to follow.
Alternatively, I am using Emacs and ps-mode.
If anyone knows how to indent code in this mode I would appreciate a tip (apologies, as this may not be relevant to this StackExchange).
No, there is no option for indenting the output.
PostScript is pretty much regarded as a write-only language anyway, and the output of ps2write (which is what I assume you are using though you don't say) is particularly difficult since it fundamentally outputs PDF syntax with a PostScript program on the front to parse it into PostScript operations.
Why do you want to read it?
[EDIT]
You can always edit your question, you don't need to post a new answer.
I'm afraid what you want to do isn't as simple as you might think.
It might be possible for this use case if the PDF files you receive are always created the same way, but there are significant problems.
The font you use as a substitute for the missing font must be encoded the same way. Say for example the font in the PDF file is encoded so that 0x41 is 'A', you need to make sure that the replacement font is also encoded so that 0x41 is an 'A'. So just the findfont, scalefont, setfont sequence is not always going to be sufficient, sometimes you will need to re-encode the font.
CIDFonts will be a major stumbling block. Firstly because ps2write simply doesn't emit CIDFonts at all. These were not part of level 2 PostScript. As a result all text in a CIDFont will be embedded as bitmaps. If your original file doesn't contain the CIDFont then you'll get the fallback CIDFont bitmapped.
Secondly CIDFonts can use multiple-byte character codes, of variable length. You can't simply replace a CIDFont with a Font, it just won't work.
The best solution, obviously, is to have the PDF files created with the fonts required embedded. This is best practice. If you can't get that, then I'd suggest that rather than trying to hand edit PostScript, you use the Fontmap.GS and cidfmap files which Ghostscript uses to find fonts.
Ghostscript already has a load of code to do font substitution automatically, using both Fonts and CIDFonts as substitutes, and it does all the hard work of re-encoding the fonts or building CMaps as required. If you are on Windows much of this may already be done for you, when you install Ghostscript it will ask if you want to create font mappings. If you said yes then it will
Add the font substitutions you want to use in those files (they have comments explaining the layout) and then use the pdfwrite device to make a new PDF file. Set EmbedAllFonts to true (you may need to add an AlwaysEmbed font array as well, listing the fonts specifically) and SubsetFonts to false.
That should create a new PDF file where the missing fonts have been replaced by your defined substitutes; those substitutes will have been embedded in the new PDF file, and they will not have been subset (Acrobat will generally refuse to edit text in a subset font).
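As a rough illustration of the pdfwrite step, here is a small Python sketch that builds the Ghostscript command line with those two switches. The `pdfwrite_command` helper and the file names are hypothetical, the executable name is an assumption (`gs` on Unix-like systems; on Windows it is typically `gswin64c`), and the AlwaysEmbed array is left out because it depends on your fonts.

```python
import shlex


def pdfwrite_command(src, dst, gs="gs"):
    """Build a Ghostscript command that rewrites src with fonts embedded in full."""
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dEmbedAllFonts=true",   # embed every font, including substitutes
        "-dSubsetFonts=false",    # keep whole fonts so the text stays editable
        "-o", dst,                # -o sets the output file and implies batch mode
        src,
    ]


# Placeholder file names, printed rather than executed.
command = pdfwrite_command("card.pdf", "card-embedded.pdf")
print(" ".join(shlex.quote(arg) for arg in command))
```

The list can be passed to `subprocess.run(command, check=True)` once the font map files have been edited.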
The switches I mentioned above are standard Adobe Distiller parameters, but they are documented for pdfwrite here. There's some documentation on adding fonts here and here and specifically for CIDFonts here.
Basically I'd suggest you define your substitutions and let Ghostscript do the work for you.
This is not an answer to the problem but rather an answer to KenS's question about "Why do you want to read it?"
I tried to put it in the comment box but it was too long.
I am a retired engineer with a strong programming background.
I would like to read and understand the postscript code for the reason shown below.
I play duplicate bridge as a hobby. I receive a PDF file of what is known as a convention card (a single-page document of bridge agreements).
Frequently I would like to edit these files.
When I open with Adobe Illustrator I have to spend a significant amount of time replacing fonts that are not on my system with fonts that I do have.
I can take the PDF and export it as a postscript file using Ghostscript.
I was going to write a little program to replace the embedded fonts with the fonts that I use to replace them.
I was going to leave the postscript file unaltered and insert things like
/HelveticaMonospacedPro-RG findfont
12 scalefont setfont
just above where the text is written.
I was planning on using the fonts that I have on my system (e.g., HelveticaMonospacedPro-RG).

Replacing fonts in Powerpoint view does not replace font

I have a PowerPoint template. When this template was passed off, it included some special fonts that I needed to remove because it was throwing warnings when users opened it.
When I use the "replace fonts" feature it does not remove the font. I deal a lot with the XML properties of these templates because some of the content is generated dynamically when a report is run. I can still see in the slides that the font is present:
<a:buFont typeface="Poppins"/> the other is <a:buFont typeface="Noto Sans Symbols"/>
Which both appear to be bullet list fonts? There are no lists in the view though...
Removing it from the XML itself is not an option because when I update the template again it will override that and given that doesn't happen often I will have forgotten all about this. I need to fix this in the template so I can then export it out.
I have edited all the text I can see to either Arial or Calibri but this Poppins font is still in there and I have no idea how to get it out.
Specifics are
Powerpoint version is 16.36
The program is actually Powerpoint for Mac (if that matters)
If anyone solved a similar issue and can give me some direction it would be much appreciated.
The buFont tag means that font is being used for a bullet rather than actual text. Probably a text level somewhere uses a custom bullet specced with this font. Each content or text placeholder can have up to 9 text levels, so you may have to create 9 levels using Home>Indent More to find the right one.
Start with the Slide Master (View>Slide Master>the larger thumbnail at the top). Then check each placeholder on each Layout (smaller thumbnails below the Master). Finally, check each multilevel placeholder on each slide, in case this was added with local formatting.
My go-to technique is to unzip the presentation into the XML files and do a find and replace on them. That's the quickest way to replace fonts, which can be tucked away in all kinds of obscure places in a presentation. On a Mac, this takes a bit of preparation to avoid problems caused by the OS. If you regularly create PowerPoint files, it may be worth it to set this up. Here's my article on this: OOXML Hacking: Editing in macOS. Look for the part about using a USB or network drive that is set to not create hidden .DS_Store files. Then use a text editor like BBEdit to do multi-file find and replace operations on the font name.
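Before doing a find and replace, it can help to see where each font is actually referenced. A .pptx file is a zip archive of XML parts, so a short Python sketch can list them without unzipping by hand. The `find_typefaces` helper is a hypothetical name, and the regular expression is a simplification that only looks for `typeface="..."` attributes on `a:` elements; it reports references and does not modify the file.

```python
import re
import zipfile
from collections import defaultdict

# Matches typeface="..." on any DrawingML element (a:latin, a:buFont, ...).
_TYPEFACE = re.compile(r'<(a:\w+)[^>]*?\btypeface="([^"]*)"')


def find_typefaces(pptx_path):
    """Map each typeface name to the set of (part, element) pairs that use it."""
    found = defaultdict(set)
    with zipfile.ZipFile(pptx_path) as archive:
        for part in archive.namelist():
            if not part.endswith(".xml"):
                continue
            xml = archive.read(part).decode("utf-8", errors="replace")
            for element, typeface in _TYPEFACE.findall(xml):
                found[typeface].add((part, element))
    return found
```

Looking up `find_typefaces("template.pptx")["Poppins"]` (with a placeholder file name) would show whether the font sits in the slide master, a layout, or an individual slide, and whether it comes from a `buFont` bullet definition.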
I have PowerPoint 16.39 on my MacBook Pro. Try to click on PowerPoint in the upper left. Then Preferences, then the Save icon. At the bottom you'll have Font Embedding. If you un-check this option, it should not save fonts to the template anymore.

transform a svg into a copy/paste-able emoticon like this one 🦄

Not sure if SO is the best place for this question, but don't know where else to ask.
Is there any way to transform an SVG like this one, for example (https://svgsilh.com/image/1775543.html), into something that I can use inside an editor with copy/paste, like this one? 🦄
No, because the unicorn emoticon is one example of a character. And just as with letters, digits, and punctuation, the appearance of emoticons and other plain-text symbols is decided by fonts.
LSerni wrote the following:
The reason you can "copy and paste" that icon is that the icon already has a UTF-8 code and your editor is UTF-8 aware. And this is why the same emoticon is slightly different between Apple, Android and so on: it's because it's always code XYZ, but code XYZ is rendered with different icons on different platforms.
But that's not entirely correct. The difference in rendering lies more in the font than in the operating system that displays emoticons. Unless the font supplies its own version of a symbol, that symbol will usually be supplied by the font specified by default by the operating system, and different operating systems supply different symbol fonts.
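To make the "it is just a character" point concrete, here is a small Python sketch that inspects the unicorn. It shows that the pasted symbol is a single Unicode code point with a standard name, and that UTF-8 is only one way of encoding it; nothing in the text itself carries the artwork. The `describe` helper is a hypothetical name used for illustration.

```python
import unicodedata


def describe(character):
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    return (
        f"U+{ord(character):04X}",
        unicodedata.name(character),
        character.encode("utf-8"),
    )


# The emoji is one code point; which picture appears depends on the font.
print(describe("🦄"))  # ('U+1F984', 'UNICORN FACE', b'\xf0\x9f\xa6\x84')
```

An SVG has no such code point, so the only way to paste it as text would be to put it in a font and map it to some character.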

Ghostscript - can we substitute to ignore embedded fonts in PS?

I am trying to convert a Postscript file to PDF. The PS file has an embedded font that I want to ignore and substitute with a local system font. This is because the font is OCR based and it makes more sense to read the character strings in this case.
I set up a Fontmap file but it only works when I delete the font data from the PS file, so that the font is actually missing. Is there a way to do this without modifying the PS file?
There is no switch or command to do this for the very good reason that it would break conformance with the specification. If you embed a font in a PostScript program that font will be used in preference to any other font.
This allows you (for example) to use specific versions of a font by embedding them, rather than relying on the font present in the interpreter which may be different.
However, because PostScript is a programming language, you could redefine the 'definefont' operator so that it examined the dictionary operand for the FontName, before defining the font, and if it is the font you want to ignore you could fail to define it. You would then go through the missing font machinery which would find your substitute.

How to set g:text style to bold font in a Windows Gadget?

I'm developing a Vista/Win7 Desktop Gadget that uses a translucent g:background (doc) area with g:text (doc) on top. I'm adding the text via addTextObject (doc), and this all works as expected.
However, I can't figure out how to set that text to bold style. There doesn't seem to be a way to do this directly via the exposed properties that I can see, and I can't use regular text + CSS in this case due to the fact this text is placed onto a g:background object.
I have also tried specifying a bold font directly, such as Arial Bold (doesn't work) instead of Arial (works).
So how can this be done?
Edit: I have tried setting font-weight:bold for both the body and the g:background object that parents my text; no luck.
See Flip Calendar, by Jonathan Abbott. His code is usually well commented so maybe you can get some ideas from that.
EDIT
The source of my information was from the early days of Vista Beta 2 where that was the official word from MS. I also found the following response to a thread on the MSDN forums regarding the Flip Calendar gadget itself:
http://social.msdn.microsoft.com/Forums/en-US/sidebargadfetdevelopment/thread/841e9d5e-32e9-453f-bd0e-dc5a4e607c33/
The gadget has options for setting bold font on the day of the month (a g:text object) but on closer inspection it doesn't work. Sorry about that. The MS guys have been known to be wrong as well on one or more occasions. I can honestly say that I don't use the g:text object.
This means your only (well, non activex route) option is VML text, which provides a lot of flexibility on layout. However, you will have to place it on a fully opaque area of the gadget which is probably why you wanted to use the addTextObject in the first place. Gary Beene's site really helped me out when I was getting started, but it doesn't go into any detail on the v:textbox element and the v:textpath element, though the MSDN documentation goes into enough detail on these.
If you need to place the text on a non-fully opaque area of the gadget, then you could still go the VML route and place an image behind the text that acts as a shadow, starting out fully opaque and fading to fully transparent. This is how Microsoft does text in window title bars with aero enabled.
Alternatively, you could create an ActiveXObject that draws the text you need in the font you want and saves the image to a temporary file in the gadget folder. Then you set that to the src of an addImageObject. I've done something similar in a gadget and it's fast enough not to be noticeable. You can also set min/max dimensions so shrinking/stretching to fit becomes a breeze.

Resources