Does anyone know how to extract the character images from a font (TTF) file?
TTF is a vector format, so the file doesn't really contain character images. Load the font, select it into a (memory) device context, render a character, and grab the bitmap.
Relevant APIs: AddFontResource, CreateFont, CreateDC, CreateBitmap, SelectObject, TextOut (or DrawText).
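"Render a character, grab a bitmap" means turning the glyph outline into pixels, which GDI does for you inside TextOut. Purely to illustrate that step, here is a toy scanline rasterizer in Python (not GDI, and the function name is hypothetical). It fills polygons, i.e. already flattened outlines, into a grid of 0/1 pixels using the even-odd rule, with no anti-aliasing or hinting. TrueType rasterizers use the non-zero winding rule, which gives the same result for typical glyphs whose contours don't overlap.

```python
def rasterize(polygons, width, height):
    """Fill closed polygons into a height x width grid of 0/1 (even-odd rule).

    polygons: list of contours, each a list of (x, y) points in pixel units,
    y growing downwards. Pixels are sampled at their centres.
    """
    grid = [[0] * width for _ in range(height)]
    for row in range(height):
        y = row + 0.5  # sample at the pixel centre
        crossings = []
        for contour in polygons:
            n = len(contour)
            for i in range(n):
                (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
                # Half-open test: a vertex shared by two edges counts once,
                # and horizontal edges never match (so no division by zero).
                if (y0 <= y < y1) or (y1 <= y < y0):
                    crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Pixels between each pair of crossings are inside the shape.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    grid[row][col] = 1
    return grid
```

An inner contour (such as the hole in an "O") simply adds two more crossings per scanline, so the pixels between them stay empty.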
You can use GetGlyphOutline with GGO_BEZIER to get the shape of a single character.
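GGO_BEZIER hands you the outline already converted to cubic Bézier curves. The font itself stores each contour as a TrueType quadratic B-spline made of on-curve and off-curve points (roughly what GGO_NATIVE or a font-parsing library gives you). A minimal sketch of flattening such a contour into a polygon, assuming you already have its points as (x, y, on_curve) tuples: two consecutive off-curve points imply an on-curve point at their midpoint, and each on/off/on triple is a quadratic Bézier sampled at fixed steps. The function name and input format are hypothetical.

```python
def flatten_contour(points, steps=8):
    """Flatten one TrueType contour (quadratic B-spline) into a polygon.

    points: list of (x, y, on_curve) tuples, at least two of them.
    Returns a list of (x, y) vertices; the polygon is implicitly closed.
    """
    # 1. Insert the implied on-curve midpoint between consecutive off-curve points.
    expanded = []
    n = len(points)
    for i in range(n):
        x, y, on = points[i]
        nx, ny, next_on = points[(i + 1) % n]
        expanded.append((x, y, on))
        if not on and not next_on:
            expanded.append(((x + nx) / 2, (y + ny) / 2, True))
    # 2. Rotate the list so it starts on an on-curve point.
    start = next(i for i, p in enumerate(expanded) if p[2])
    expanded = expanded[start:] + expanded[:start]
    # 3. Walk the segments: on->on is a line, on->off->on is a quadratic curve.
    polygon, m, i = [], len(expanded), 0
    while i < m:
        x0, y0, _ = expanded[i]
        x1, y1, on1 = expanded[(i + 1) % m]
        if on1:
            polygon.append((x0, y0))
            i += 1
        else:
            x2, y2, _ = expanded[(i + 2) % m]
            for s in range(steps):
                t = s / steps
                u = 1 - t
                polygon.append((u * u * x0 + 2 * u * t * x1 + t * t * x2,
                                u * u * y0 + 2 * u * t * y1 + t * t * y2))
            i += 2
    return polygon
```

The resulting polygons (one per contour) can then be scaled to pixel units and filled with any scanline rasterizer.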
For the sake of completeness, I'd like to add a GUI way and a Python way to this rather old thread.
If the goal is to extract images (e.g. as PNG) from a .ttf file, I found two pretty straightforward ways, both of which involve the open-source program FontForge (link to their website below):
GUI way (suitable for extracting a handful of characters): Open the .ttf file in FontForge and click on the character you want to export. Then choose File -> Export -> Format: png.
CLI / Python way (suitable for automation): FontForge has a Python scripting API (Python 2.7 at the time of writing) which lets you automate the extraction of the images. Refer to this Super User thread (link 2) for a complete script.
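For a rough idea of what such a script looks like, here is a minimal sketch (not the Super User script). It assumes FontForge's Python module, whose font.glyphs(), glyph.isWorthOutputting() and glyph.export(filename, pixelsize) calls rasterize a glyph to PNG based on the file extension. The png_name naming scheme and the 256-pixel default are my own choices.

```python
import os


def png_name(glyph_name, codepoint):
    """Hypothetical naming scheme: file-system-safe PNG name for a glyph."""
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in glyph_name)
    if codepoint is not None and codepoint >= 0:  # FontForge uses -1 for unencoded glyphs
        return "U+%04X_%s.png" % (codepoint, safe)
    return safe + ".png"


def export_glyphs(ttf_path, out_dir, pixel_size=256):
    """Export every real glyph of ttf_path as a PNG (needs FontForge's Python)."""
    import fontforge  # only importable inside FontForge's Python environment
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    font = fontforge.open(ttf_path)
    for glyph in font.glyphs():
        if not glyph.isWorthOutputting():
            continue  # skip empty placeholder slots
        target = os.path.join(out_dir, png_name(glyph.glyphname, glyph.unicode))
        glyph.export(target, pixel_size)
    font.close()
```

You would call export_glyphs("font.ttf", "glyphs") from a script run with fontforge -script.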
Link 1: https://fontforge.org/en-US/
Link 2: https://superuser.com/questions/1337567/how-do-i-convert-a-ttf-into-individual-png-character-images
As in the title: imagine a GIMP .xcf file containing many layers, some of which contain text. Is there any format I can export the .xcf file to that somehow preserves the text in a 'human-readable' form?
The final goal is to process that text and put it back into the file. I am aware that this sounds unusual, but maybe some of you have an idea how to achieve a scenario like that.
I did some research and saw that I can export the image to .psd format, then use an NPM package to process that image and extract the text. This only partially solves the problem, because I don't know how to put the processed text back into the .psd file (unless I decompile the NPM package and try to write an implementation myself...).
Any solutions and alternatives are highly appreciated.
You can script GIMP (using Scheme or Python). Technically you cannot change the text in a layer (there is no API for that), but you can recover the characteristics of a text layer (original text, font type, font size...) and recreate a new layer with the new text. Here is some Python code to recover the text information:
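from gimpfu import pdb  # GIMP 2.x Python-Fu

def text_info(img, layer):
    # Text layers in a saved XCF carry a 'gimp-text-layer' parasite
    # holding the original text, font, size, etc.
    parasites = None
    try:
        parasites = layer.parasite_list()
    except Exception:
        pass  # ignore errors: treat the layer as having no parasites
    if parasites and 'gimp-text-layer' in parasites:
        data = layer.parasite_find('gimp-text-layer').data
        pdb.gimp_message('Text layer "%s": %s' % (layer.name, data))
    else:
        pdb.gimp_message('No text information found for layer "%s"' % layer.name)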
(This information is only present if the file has been saved; it is not available on a newly created layer, but this shouldn't be a problem in your case.)
Of course, if the text is in a plain bitmap layer of its own, this cannot be done and you have to guess the font type and size (although sometimes the code above can still recover the text information).
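To actually process the recovered text you need to pull the fields out of that parasite string. As far as I know it is GIMP's serialized text properties, one (name value) pair per line such as (text "Hello") or (font-size 62), but treat that format as an assumption and check it against the output you actually get. A minimal parser under that assumption (the function names are hypothetical):

```python
import re

# Assumed format: one "(name value)" property per line, strings in double quotes.
_PROP = re.compile(r'^\(([\w-]+)\s+(.*)\)\s*$')


def _unquote(value):
    """Strip the surrounding quotes and undo backslash escapes."""
    out, chars = [], iter(value[1:-1])
    for c in chars:
        if c == "\\":
            nxt = next(chars, "")
            out.append({"n": "\n", "t": "\t"}.get(nxt, nxt))
        else:
            out.append(c)
    return "".join(out)


def parse_text_parasite(data):
    """Turn gimp-text-layer parasite data into a dict of property strings."""
    props = {}
    for line in data.splitlines():
        m = _PROP.match(line.strip())
        if not m:
            continue  # lines that aren't a one-line property are skipped
        key, value = m.group(1), m.group(2).strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = _unquote(value)
        props[key] = value  # nested values such as colors stay as raw text
    return props
```

With the text, font and size in hand, you can create the replacement layer through the PDB and remove the old one.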
But if your XCF has a simple structure, it can be a lot simpler to decompose it into individual images, and build a new image with ImageMagick, using some of these layers plus new text images (or directly rendered text).
I'm trying to convert my existing AsciiDoc documentation into PDF. Asciidoctor PDF seems quite easy, and I'm able to convert single files into PDF.
asciidoctor-pdf -a pdf-theme='./theme/styles.yml' -a pdf-fontsdir='GEM_FONTS_DIR, theme/fonts/' 01-intro.adoc
But my docs are spread across many files. I want to create a single PDF from all those files. Does anyone know how to do this?
Secondly, I don't want the generated PDF to be located next to the .adoc file. I want to specify a target path.
I'd appreciate any hint. Thanks and best regards, Sebastian
(Dec 26, 2021)
The easiest and most convenient way is to use the VS Code editor with the AsciiDoc extension installed. This extension is developed by the same team that develops the Asciidoctor text processor. This is a GUI-based approach that solves all your problems, so I'm pretty sure you're going to love it.
(Step 1) After the extension is installed, use the keyboard shortcut Cmd + , (Ctrl + , on Windows/Linux) to open the settings, then enter asciidoc.use_asciidoctorpdf in the search bar and tick the check box (see the demonstration below).
(Step 2) To create a single PDF file from multiple .adoc files, put all of them into a single .adoc file with include::path-to-the-adoc-file.adoc[] (see the illustration below).
(Step 3) Press F1, type as pdf and hit Enter to export this single .adoc file as a single PDF file; this lets you specify the target export directory for the PDF. Please be patient and wait a few seconds for the export to complete; the editor will notify you as soon as it is done (see the image at the bottom).
Have you considered working with includes?
Just add this line at any position in your document 01-intro.adoc:
include::02-next-file.adoc[]
When you build 01-intro.adoc with your regular command, the contents of 02-next-file.adoc will be inserted at the position of the include line. Using this method, we create a file with many includes and just build that file. We're very happy with that.
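If you have many chapter files, you can generate that "file with many includes" instead of maintaining it by hand, and the target-path question is covered by the -o (--out-file) option of asciidoctor-pdf (there is also -D / --destination-dir). A minimal sketch; the helper names and the book.adoc / dist/manual.pdf paths are hypothetical:

```python
def master_adoc(chapters, title=None):
    """Text of a master .adoc file that includes each chapter in order."""
    lines = []
    if title:
        lines += ["= " + title, ""]
    for path in chapters:
        # A blank line after each include keeps the chapters as separate blocks.
        lines += ["include::%s[]" % path, ""]
    return "\n".join(lines)


def build_command(master_path, out_file, theme=None):
    """asciidoctor-pdf argument list that writes the PDF to out_file."""
    args = ["asciidoctor-pdf"]
    if theme:
        args += ["-a", "pdf-theme=" + theme]
    args += ["-o", out_file, master_path]
    return args
```

For example, write master_adoc(sorted(glob.glob("chapters/*.adoc")), "Manual") to book.adoc and run subprocess.run(build_command("book.adoc", "dist/manual.pdf"), check=True). Include paths are resolved relative to the directory of the including file, so keep the master file next to the paths it lists.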
Is there an option I can use to ask Ghostscript to indent the PostScript it creates?
Everything starts at the beginning of a line and I find it difficult to follow.
Alternatively, I am using Emacs and ps-mode.
If anyone knows how to indent code in this mode, I would appreciate a tip (apologies, as this may not be relevant to this Stack Exchange site).
No, there is no option for indenting the output.
PostScript is pretty much regarded as a write-only language anyway, and the output of ps2write (which is what I assume you are using though you don't say) is particularly difficult since it fundamentally outputs PDF syntax with a PostScript program on the front to parse it into PostScript operations.
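Since Ghostscript has no such option, one workaround is to post-process the file yourself. A rough sketch (my own, hypothetical helper names): indent each line by its bracket depth, counting {, [ and << as openers and }, ] and >> as closers, while skipping literal strings, hex/ASCII85 strings and % comments. It works line by line, so strings that span several lines will confuse it, and it won't make ps2write's PDF-like content much easier to follow.

```python
OPENERS = ("{", "[", "<<")
CLOSERS = ("}", "]", ">>")


def bracket_tokens(line):
    """Return the bracket tokens of one line, skipping strings and comments."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        c = line[i]
        if c == "%":  # comment: ignore the rest of the line
            break
        if c == "(":  # literal string; parentheses may nest
            depth, i = 1, i + 1
            while i < n and depth:
                if line[i] == "\\":
                    i += 1  # skip the escaped character
                elif line[i] == "(":
                    depth += 1
                elif line[i] == ")":
                    depth -= 1
                i += 1
            continue
        if line.startswith(("<<", ">>"), i):  # dictionary brackets
            tokens.append(line[i:i + 2])
            i += 2
            continue
        if c == "<":  # hex or ASCII85 string: skip to the closing '>'
            end = line.find(">", i + 1)
            i = n if end == -1 else end + 1
            continue
        if c in "{}[]":
            tokens.append(c)
        i += 1
    return tokens


def reindent(ps_text, unit="  "):
    """Re-indent PostScript source by bracket depth (line-based sketch)."""
    depth, out = 0, []
    for raw in ps_text.splitlines():
        line = raw.strip()
        tokens = bracket_tokens(line)
        # Closing brackets at the start of a line dedent that line itself.
        leading = 0
        if line.startswith(CLOSERS):
            for tok in tokens:
                if tok not in CLOSERS:
                    break
                leading += 1
        out.append(unit * max(depth - leading, 0) + line if line else "")
        depth += sum(1 if tok in OPENERS else -1 for tok in tokens)
        depth = max(depth, 0)
    return "\n".join(out)
```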
Why do you want to read it?
[EDIT]
You can always edit your question; you don't need to post a new answer.
I'm afraid what you want to do isn't as simple as you might think.
It might be possible for this use case if the PDF files you receive are always created the same way, but there are significant problems.
The font you use as a substitute for the missing font must be encoded the same way. Say, for example, the font in the PDF file is encoded so that 0x41 is 'A'; you need to make sure that the replacement font is also encoded so that 0x41 is an 'A'. So the findfont, scalefont, setfont sequence alone is not always going to be sufficient; sometimes you will need to re-encode the font.
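For reference, re-encoding a base font in PostScript is usually done by copying the font dictionary (minus its FID), replacing /Encoding and calling definefont under a new name. The Python helpers below just emit that common idiom as text; the helper names and the HMP-Latin1 name are hypothetical. encoding must name an existing encoding array such as ISOLatin1Encoding or StandardEncoding. If the PDF uses a custom encoding, you would have to build that array yourself, and none of this helps with CIDFonts.

```python
def reencode_ps(old_name, new_name, encoding="ISOLatin1Encoding"):
    """PostScript defining new_name as a copy of old_name with another Encoding."""
    return "\n".join([
        "/%s /%s findfont" % (new_name, old_name),
        "dup length dict begin",
        "  % copy every entry except FID, which definefont creates itself",
        "  { 1 index /FID ne { def } { pop pop } ifelse } forall",
        "  /Encoding %s def" % encoding,
        "  currentdict",
        "end",
        "definefont pop",
    ])


def setfont_ps(name, size):
    """The findfont / scalefont / setfont sequence, for the re-encoded name."""
    return "/%s findfont %s scalefont setfont" % (name, size)
```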
CIDFonts will be a major stumbling block. Firstly, ps2write simply doesn't emit CIDFonts at all, because these were not part of Level 2 PostScript. As a result, all text in a CIDFont will be embedded as bitmaps. If your original file doesn't contain the CIDFont, then you'll get the fallback CIDFont bitmapped.
Secondly, CIDFonts can use multi-byte character codes of variable length. You can't simply replace a CIDFont with a Font; it just won't work.
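To see why, consider how a string shown with a CIDFont is split into codes: the CMap's codespace ranges decide, byte by byte, whether the next code is 1, 2, 3 or 4 bytes long, whereas a simple Font always consumes exactly one byte per character. A minimal sketch of that matching step; the example ranges are loosely modelled on a Shift-JIS CMap, and falling back to one byte for unmatched input is a simplification of my own.

```python
def code_length(data, i, ranges):
    """Length of the code starting at data[i], or None if no range matches."""
    for n in range(1, 5):  # CMap codes are 1 to 4 bytes long
        chunk = data[i:i + n]
        if len(chunk) < n:
            return None
        for low, high in ranges:
            # Each byte must fall inside the range's bounds for that position.
            if len(low) == n and all(lo <= b <= hi for b, lo, hi in zip(chunk, low, high)):
                return n
    return None


def split_codes(data, ranges):
    """Split a string operand into character codes using codespace ranges."""
    codes, i = [], 0
    while i < len(data):
        n = code_length(data, i, ranges) or 1  # simplification: consume one byte
        codes.append(data[i:i + n])
        i += n
    return codes
```

With ranges [(b"\x00", b"\x80"), (b"\x81\x40", b"\x9f\xfc")], the bytes b"A\x81\x40B" are three codes, not four characters, which is exactly what a single-byte Font would get wrong.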
The best solution, obviously, is to have the PDF files created with the required fonts embedded. This is best practice. If you can't get that, then I'd suggest that, rather than trying to hand-edit PostScript, you use the Fontmap.GS and cidfmap files which Ghostscript uses to find fonts.
Ghostscript already has a load of code to do font substitution automatically, using both Fonts and CIDFonts as substitutes, and it does all the hard work of re-encoding the fonts or building CMaps as required. If you are on Windows, much of this may already be done for you: when you install Ghostscript, it will ask if you want to create font mappings. If you said yes then it will
Add the font substitutions you want to use to those files (they have comments explaining the layout) and then use the pdfwrite device to make a new PDF file. Set EmbedAllFonts to true (you may need to add an AlwaysEmbed font array as well, listing the fonts specifically) and SubsetFonts to false.
That should create a new PDF file where the missing fonts have been replaced by your defined substitutes; those substitutes will have been embedded in the new PDF file and they will not have been subset (Acrobat will generally refuse to edit text in a subset font).
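The steps above can be sketched in Python, assuming the usual Fontmap.GS syntax: /FontName (file) ; maps a name to a font file, and /Alias /RealName ; maps one name to another. The helper names and example font names are hypothetical. The argument list uses -o (which also implies batch mode) plus the two pdfwrite switches just mentioned; on Windows the executable is gswin64c rather than gs.

```python
def fontmap_line(font_name, font_file):
    """Fontmap.GS entry mapping a PostScript font name to a font file."""
    # PostScript string syntax: escape backslashes and parentheses.
    escaped = font_file.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "/%s (%s) ;" % (font_name, escaped)


def alias_line(missing_name, substitute_name):
    """Fontmap.GS alias: use substitute_name whenever missing_name is requested."""
    return "/%s /%s ;" % (missing_name, substitute_name)


def pdfwrite_args(in_pdf, out_pdf, exe="gs"):
    """Ghostscript arguments to rewrite a PDF with all fonts embedded, not subset."""
    return [exe, "-sDEVICE=pdfwrite", "-o", out_pdf,
            "-dEmbedAllFonts=true", "-dSubsetFonts=false", in_pdf]
```

Append the generated lines to Fontmap.GS (keeping a backup), then run subprocess.run(pdfwrite_args("card.pdf", "card-fixed.pdf"), check=True).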
The switches I mentioned above are standard Adobe Distiller parameters, but they are documented for pdfwrite here. There's some documentation on adding fonts here and here and specifically for CIDFonts here.
Basically I'd suggest you define your substitutions and let Ghostscript do the work for you.
This is not an answer to the problem but rather an answer to KenS's question about "Why do you want to read it?"
I tried to put it in the comment box but it was too long.
I am a retired engineer with a strong programming background.
I would like to read and understand the postscript code for the reason shown below.
I play duplicate bridge as a hobby. I receive a PDF file of what is known as a convention card (a single-page document of bridge agreements).
Frequently I would like to edit these files.
When I open one in Adobe Illustrator, I have to spend a significant amount of time replacing fonts that are not on my system with fonts that I do have.
I can take the PDF and export it as a PostScript file using Ghostscript.
I was going to write a little program to replace the embedded fonts with the fonts I normally substitute for them.
I was going to leave the PostScript file otherwise unaltered and insert things like
/HelveticaMonospacedPro-RG findfont
12 scalefont setfont
just above where the text is written.
I was planning on using the fonts that I have on my system (e.g., HelveticaMonospacedPro-RG).
I'm currently looking for a way to create a 'configurator' for an upholsterer, similar to http://digitaldraping.com/configurator/furniture-sofa/?Cushions_Plain-Cream.png,Sofa_Stripe-Orange.png: you select your fabrics and they are 'drawn' on the sofa automatically.
Unfortunately, all the sites I've looked at seem to use pre-rendered transparent PNGs that are overlaid over each other to build up the full picture. The problem here is that we've figured out that we'd require over 120,000 different images to cover all models, fabrics etc!!
I've looked at a few 3D texture tools such as http://www.arahne.si/products/arah-drape.html, hoping that one of them would have a CLI option where you give it a pre-created wireframe and a fabric to overlay, and it generates the required image on the fly, but so far everything seems to require real-time use of the GUI.
So, is there a CLI tool that would do what I'm after, or can anyone suggest a way to drive the GUI automatically? (From a tech point of view, I'm comfortable with C, Bash, Python or PHP as a solution!)
Thanks!
ArahDrape 2.2 can now work from a command line without any GUI interface. You can also call ArahDrape as a C library. In this way, it can be used in a web server to create texture mapped images on the fly. The command line options are explained below.
ArahDrape 2.2j command line version, ©2015 Arahne
usage:
adCommand -o /tmp/outputImage.png -tN /home/user/texture.png [-hidemodel] [-divide 2] [-filterPNG] [-compressPNG 2] [-m /home/user/model.png] -owner name -activation 174b3cfb49e9 /home/user/project.drape
Input and output images can have .png, .tif or .jpg extensions
-o output_image_file
-tN texture_image_file [N goes from 0 to 199]
-hidemodel will render all areas not in region as white
-divide N [N goes from 2 to 5] divide resulting image pixel size
-filterPNG if you do not filter it, rendering is faster
-compressPNG N [N goes from 0 to 9] lower number saves faster, but bigger files
-m model_image_file use this if you want to replace model image from the project; must have same pixel size
-owner owner_name pass the given owner name
-activation activation_code pass the given activation code
the last parameter should be the ArahDrape project file
All files should be entered with full path.
If you need spaces in filenames, use quotes "" around the filename.
If you provide only the owner name, without an activation code, the program returns a registration code.
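Since the question mentions Python, here is a minimal sketch that builds adCommand invocations strictly from the usage line above (only the -o, -tN, -hidemodel, -divide, -compressPNG, -owner and -activation options shown there). The helper names, the choice of texture slot 0 and the one-output-per-texture naming are my own assumptions.

```python
import os


def drape_command(project, texture, output, owner, activation,
                  slot=0, divide=None, compress=None, hide_model=False,
                  exe="adCommand"):
    """Build an adCommand argument list following the documented usage line."""
    if not 0 <= slot <= 199:
        raise ValueError("texture slot N must be between 0 and 199")
    # The docs ask for full paths, so make every file path absolute.
    args = [exe, "-o", os.path.abspath(output),
            "-t%d" % slot, os.path.abspath(texture)]
    if hide_model:
        args.append("-hidemodel")
    if divide is not None:
        args += ["-divide", str(divide)]
    if compress is not None:
        args += ["-compressPNG", str(compress)]
    args += ["-owner", owner, "-activation", activation, os.path.abspath(project)]
    return args


def batch_commands(project, texture_dir, out_dir, owner, activation):
    """Yield one command per texture image found in texture_dir."""
    for name in sorted(os.listdir(texture_dir)):
        if name.lower().endswith((".png", ".tif", ".jpg")):
            stem = os.path.splitext(name)[0]
            yield drape_command(project, os.path.join(texture_dir, name),
                                os.path.join(out_dir, stem + ".png"),
                                owner, activation)
```

Each list can be passed to subprocess.run(cmd, check=True). Because the arguments are passed as a list rather than through a shell, paths containing spaces need no extra quoting.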
ArahDrape supports batch export.
Open the ArahDrape project, click on the texture you wish to replace, put all your textures in a directory and select Textures > Browse textures from the menu. As you click a texture to load it, the program will save the draped picture. If you have thousands of images, use the keyboard shortcut = and the program will automatically process them all.
Alpha channel transparency is supported in loading model images or textures, and saving the draped images, as long as you use PNG or TIFF.
Please check this video to see how ArahDrape works in batch mode.
We (http://digitaldraping.com/) can do exactly what you are asking. We have two options: creating images, and rendering a meshed image on the fly. Just get in touch if you still need this solution.
I've read a lot about PDF extraction and libraries (such as iText), but I just haven't found a solution to extract images and text (with coordinates) from a PDF.
The task is to scan a PDF catalog of products and extract each image. There is an image code printed next to each image, and also a list of product codes for the products shown in the image.
I know that there is no way to extract structured info from a PDF like this, but with the coordinates of all image and text objects I could write code to identify linked text by its distance from the image. Then I could split the text using a RegExp and find out what is a product code, what is an image code, etc.
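Once any of the tools below gives you boxes, the plan described above is straightforward. A minimal sketch, assuming you already have image boxes and word boxes in page coordinates: each word is assigned to the image whose box is nearest, then classified with regular expressions. The Box type, the 20-unit cut-off and both code patterns are hypothetical placeholders for your catalog's real formats.

```python
import math
import re
from collections import namedtuple

# Hypothetical box type: (x0, y0) top-left and (x1, y1) bottom-right corners.
Box = namedtuple("Box", "x0 y0 x1 y1")

# Hypothetical code formats: replace with the patterns used in your catalog.
IMAGE_CODE = re.compile(r"^IMG-\d+$")
PRODUCT_CODE = re.compile(r"^\d{3}-\d{4}$")


def gap(a, b):
    """Shortest distance between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0)
    return math.hypot(dx, dy)


def link_text(images, words, max_gap=20.0):
    """Assign each word to its nearest image and classify it with the regexes.

    images: dict image_id -> Box; words: list of (text, Box).
    """
    result = {img: {"image_code": [], "products": [], "other": []} for img in images}
    if not images:
        return result
    for text, box in words:
        img, dist = min(((i, gap(b, box)) for i, b in images.items()),
                        key=lambda pair: pair[1])
        if dist > max_gap:
            continue  # too far from every image (headers, page numbers...)
        if IMAGE_CODE.match(text):
            result[img]["image_code"].append(text)
        elif PRODUCT_CODE.match(text):
            result[img]["products"].append(text)
        else:
            result[img]["other"].append(text)
    return result
```

Nearest-box assignment is the simplest rule. If codes sit between two images, you may need to prefer text below or to the right of an image instead.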
Could you recommend a good and working solution for the task?
Use XPDF (http://www.foolabs.com/xpdf/)
It can extract all the characters in the PDF with coordinates (pdftotext -bbox [sourcefile] [outputfile]), and also all the images and SVGs in the PDF.
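The -bbox output is an (X)HTML file. The version I know (Poppler's pdftotext) writes one element per word, like <word xMin="56.8" yMin="57.2" xMax="94.3" yMax="70.1">Hello</word>; check what your version actually emits. A minimal sketch that reads those word boxes with the standard library's HTMLParser (the class and function names are hypothetical):

```python
from html.parser import HTMLParser


class _WordBoxParser(HTMLParser):
    """Collect (text, (xMin, yMin, xMax, yMax)) for every <word> element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._box = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "word":
            a = dict(attrs)  # HTMLParser lowercases attribute names
            self._box = tuple(float(a[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
            self._text = []

    def handle_data(self, data):
        if self._box is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "word" and self._box is not None:
            self.words.append(("".join(self._text), self._box))
            self._box = None


def parse_bbox(html_text):
    """Return the list of (word, box) pairs found in pdftotext -bbox output."""
    parser = _WordBoxParser()
    parser.feed(html_text)
    parser.close()
    return parser.words
```

Character references such as &amp; are decoded automatically, so the word text comes back as plain strings ready for regex matching.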
It's open source (GPLv2) and supports a lot of additional extraction functionalities as well.
Several Java libraries can do this. Have you looked at JPedal or PDFBox?
If a commercial library is an option for you, you could try Amyuni PDF Creator .Net or Amyuni PDF Creator ActiveX. You could use the method IacDocument.GetObjectsInRectangle to retrieve all the "graphic objects" you are interested in, then use the ObjectType attribute to separate images from text. The library already provides an algorithm for grouping nearby text together. From the documentation:
IacDocument.GetObjectsInRectangle Method
The GetObjectsInRectangle method gets all the objects that are in the specified rectangle.
Usual disclaimer applies.