How to create PNG from Unicode character?

I have been looking far and wide on the internet for images/vectors of Unicode characters in any font, and have not found any. I need image files of Unicode characters for the project I am working on, where I cannot just use text. Is there a way to "convert" Unicode characters from a font into an image file? Or does anyone know where I can find this? Thank you.

Try BMFont (Bitmap Font Generator). It supports Unicode and generates PNG images - looks like a perfect match.
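If you only need a handful of characters rather than a whole font atlas, ImageMagick (which comes up repeatedly below) can also render a single character to a PNG. A minimal sketch, assuming a font file that actually covers the code point (the path here is a placeholder):
convert -background none -fill black -pointsize 128 \
-font "/path/to/unicode-font.ttf" label:"☃" snowman.png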

Related

How to recognize an image file format using its contents?

If an image file is in .png format, it will contain ‰PNG at the beginning of the file (when read in text mode).
If an image file is in .bmp format, it will contain BM at the beginning of the file (when read in text mode).
I know that image formats contain data of a certain size (in bytes) at the beginning of the file, which is used as metadata of the image file.
My questions are:
Is this behavior the same in all image file formats (or file formats in general)?
Could an image file (with no extension) be recognized using just this data?
Is there information available on how this metadata is broken down? By that I mean, which position in the metadata carries what meaning?
Is this behavior the same in all image file formats (or file formats in general)?
For most of them, yes. There are some proprietary formats (e.g. for games) that might have very short or no metadata. Also, metadata might be in another file (e.g. animations together with XML metadata).
Could an image file (with no extension) be recognized using just this data?
Yes. In fact, most image viewers will warn you if an image file has an incorrect extension and ask you if they should fix it.
On Unix systems, there's a file command that identifies files based on their metadata. There is also a tool specific to images called identify (part of ImageMagick) that returns more detailed information on resolution, bit depth, etc.
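For example, on a file with no extension (the output here is illustrative; yours will differ):
file picture
picture: PNG image data, 512 x 512, 8-bit/color RGB, non-interlaced
identify picture
picture PNG 512x512 512x512+0+0 8-bit sRGB 426KB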
Is there information available on how this metadata is broken down? By that I mean, which position in the metadata carries what meaning?
There are books about (image) file formats and for most formats, this information is available in official specifications (e.g. RFC 2083 for PNG). They list all of the (optional) file contents, describe the compressions and what a viewer/decoder/encoder can/must/should do with the data. A good starting point might be the Wikipedia list of image file formats.
Note that based on the examples you gave, I suppose you opened the files in a text editor, which is not the ideal tool for this task. It's better to use a hex editor. Text editors won't show most byte values (e.g. 255) by default and interpret others (e.g. tab or line feed). They might be good enough to spot magic strings like "BM" and "PNG", but with a hex editor you can see both the text parts and their numerical representation - e.g. allowing you to extract image width and height. For this, some tool to convert hexadecimal values to decimal is useful; most calculators can do this.
As an example, let's look at the beginning of a PNG file with a resolution of 6146 x 14293 in both a text editor and a hex editor:
You can see in both of them that the file is a PNG image. But only the hex editor view shows (in the marked part) the width and height of the image, matching the "IHDR" chunk of the PNG specification: 0x00001802 is 6146 in decimal, 0x000037D5 is 14293. There's no way to read this in the text editor.
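If you want to script this rather than read the hex dump by eye, here is a minimal bash sketch using xxd, relying on the fixed PNG layout (8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag, then big-endian width and height):
# width is at byte offset 16, height at offset 20
width=$((16#$(xxd -s 16 -l 4 -p image.png)))
height=$((16#$(xxd -s 20 -l 4 -p image.png)))
echo "${width} x ${height}"   # prints "6146 x 14293" for the file above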
Also note that even if you don't know an image format, you might be lucky just guessing that it's uncompressed data (this often works for some game image file formats, most notably Unity's "assets"). E.g. if you rename files to ".raw", the image viewer IrfanView will give you a dialog (see the screenshot below) where you can guess width, height and bit depth of the image and see if the result looks good. This requires some experience in interpreting the outcome though; if width and bit depth don't match, images will look like noise, warped, or have wrong colors.
This "image geometry guessing" can be improved/automated by trying different widths and computing the correlation coefficient between adjacent lines. The tool raw2tiff can do this. Quote from its site:
There is no magic, it is just a mathematical statistics, so it can be wrong in some cases. But for most ordinary images guessing method will work fine.
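For reference, once you have settled on a geometry, a raw2tiff invocation looks roughly like this (width, height and data type here are assumptions you'd substitute with your guesses):
raw2tiff -w 640 -l 480 -d byte input.raw output.tif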
Using ImageMagick, you can get that information (if available) for any format that ImageMagick can read, from the "magick" string in the file header, as follows:
convert image -format "%m\n" info:
For example:
convert lena.png -format "%m\n" info:
PNG
convert lena.jpg -format "%m\n" info:
JPEG
convert lena.pnm -format "%m\n" info:
PPM
Even if the suffix is removed, this still works:
convert lena_copy -format "%m\n" info:
PNG
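Equivalently, the identify tool mentioned earlier can report the same "magick" string:
identify -format "%m\n" lena_copy
PNG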

How to use ImageMagick with Chinese fonts for Text to Image Handling

I am trying to use ImageMagick to render a Chinese character to an image on my MacBook.
I use this command to check the Chinese fonts available on my system:
convert -list font | grep Font
I did not get any.
From the ImageMagick guide on Text to Image Handling, Chinese fonts seem to be supported, such as ZenKaiUni.
And the Font Book application on my MacBook shows many Chinese fonts.
So the fonts themselves should be fine. How do I figure this out?
You can either tell ImageMagick about all the fonts on your system (e.g. with a script that generates ImageMagick's type.xml font list), and then they will show up when you do:
convert -list font
Then you can use the shorthand:
convert -font Arial ...
Or, you can just tell ImageMagick the full path to any font on a per-invocation basis:
printf "Hello" | convert -pointsize 72 \
-font "/Applications/iMovie.app/Contents/Frameworks/Flexo.framework/Versions/A/Resources/Fonts/Zingende Regular.ttf" \
label:#- result.png
You would probably put your Unicode Chinese characters in place of my "Hello".
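For instance, a sketch with an actual Chinese string; the PingFang path is an assumption (it varies by macOS version), so substitute any Chinese-capable font:
printf "你好" | convert -pointsize 72 \
-font "/System/Library/Fonts/PingFang.ttc" \
label:@- nihao.png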
I do not have any Chinese fonts on my system, but here is an example of what I would suggest, using a symbol font. First download a proper Unicode-capable Chinese font, i.e. one that covers the characters you need. Then open a UTF-8 compatible text editor, choose that font and type your string. For example, here is a screenshot of the symbols.txt file that I created using the symbol font in my UTF-8 compatible BBEdit text editor on my Mac.
Then using ImageMagick,
convert -size 100x -font "/library/fonts/GreekMathSymbols Normal.ttf" label:@symbols.txt symbol.gif
And the resulting symbol.gif image is:
Adding .utf8 as a suffix to your file is not adequate. You must create a text file in a UTF-8 compatible text editor using a UTF-8 compatible font.
Furthermore, most terminal windows do not support UTF-8 characters / fonts. So typing your characters directly into the command line in the terminal window does not always work. See http://www.imagemagick.org/Usage/text/#unicode
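If your terminal cannot enter the characters directly, you can still write the code points to a file with escape sequences and point label:@ at it. A sketch assuming bash 4.2+ (whose printf builtin understands \uXXXX) and a placeholder font path:
printf '\u4f60\u597d' > chinese.txt
convert -pointsize 72 -font "/path/to/chinese-font.ttf" label:@chinese.txt result.png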
You can't do it in ImageMagick, as the font information it uses doesn't include language support, but it's easy with Font Book.app by creating a Smart Collection as follows:
On my Mac I have 35 fonts which include Chinese characters.
(The dimmed/greyed fonts are available but will need to be downloaded from Apple servers before I can use them, an automatic process done when selecting those fonts in any app.)

GIF format as png?

I'm seeing some images online that end in .png but appear as GIF. How is this possible?
Example:
https://www.khanacademy.org/computer-programming/loading/6267221601681408/5689792285114368.png
This is a GIF file with a .png extension. Though the extension is "wrong", many image viewers (including browsers) can still interpret it correctly, because they don't blindly believe what the extension says (remember that the extension is just a hint) but look into the image content. The first bytes of the most common image formats make it easy to identify the image type. In this case, you can check (looking at the file content, say, in a hexadecimal editor/viewer) that it starts with the ASCII characters "GIF89a".
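You can verify this from the command line; for example, on the downloaded file:
xxd -l 6 5689792285114368.png
00000000: 4749 4638 3961                           GIF89a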

Is it possible to check if a PDF is CMYK or RGB using GhostScript?

I am aware of the inkcov feature, but that just returns values in terms of CMYK (with silent conversion).
Is the real check a check for RGB colours or RGB images within the PDF? I am not sure whether both RGB and CMYK images can exist in the same PDF.
Images aren't the only thing that can be in a PDF file, you can also have text, linework and shadings. Also transparency blending can be specified in specific colour spaces. Colour spaces are not limited to RGB or CMYK but can also include Gray and spot (Separation) colours, as well as ICCBased colour spaces and certain specific CIE colour spaces such as Lab.
All of these colour spaces can potentially be present in a PDF file simultaneously.
Ghostscript doesn't contain any tools currently to tell you what colour spaces are used in a PDF file, though the pdf_info.ps script could be modified to do so for unusual (not grey/RGB/CMYK) spaces. You could also write a small piece of PostScript which could tell you when a colour space was used, and what kind of colour it is.
The inkcov device is a CMYK device, so all colours specified in the PDF are converted to CMYK before being 'printed' to the inkcov device which counts up the coverage. It doesn't tell you anything about the original PDF file.
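For reference, a typical inkcov run looks like this (output illustrative); it reports per-page CMYK coverage after conversion, so, as noted, it says nothing about the colour spaces originally used in the PDF:
gs -q -o - -sDEVICE=inkcov input.pdf
Page 1
 0.10022  0.09563  0.10071  0.06259 CMYK OK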
My understanding is that a PDF can contain both RGB and CMYK images, so you'd need to have a tool that can review all images and report on their mode.
If GhostScript doesn't include options to do so, you may have to write a script to use a PDF library for parsing the image and reporting details on the elements it contains.
For example, this Cam::PDF module in Perl says it can parse any PDF v1.5 formatted file.
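As a practical alternative outside GhostScript, poppler's pdfimages can list the embedded images together with their colour spaces (its "color" column reports gray, rgb, cmyk, etc.), which answers the image part of the question directly:
pdfimages -list input.pdf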

How to save text file in UTF-8 format using pdftotext

I am using the pdftotext open-source tool to convert PDFs to text files. How can I save the text files in UTF-8 format so that all the accented characters are retained? I am using the command below, which extracts the content to a text file, but I am not able to see any accented characters.
pdftotext -enc UTF-8 book1.pdf book1.txt
Please help me to resolve this issue.
Thanks in advance,
You can get a list of available encodings using the command:
pdftotext -listenc
and pick the right one using the -enc argument. Mine here seems to use UTF-8 by default, i.e. your "-enc UTF-8" is superfluous:
pdftotext -enc UTF-8 your.pdf
You may want to check your locale (LC_ALL, LANG, ...).
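A quick way to check is to print the locale; if it isn't a UTF-8 locale, the terminal may be mis-displaying a correctly converted file (output abbreviated):
locale
LANG=en_US.UTF-8
LC_ALL=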
EDIT:
I downloaded the following PDF:
http://www.i18nguy.com/unicode/unicodeexample.pdf
and converted it on a German Windows 7 PC with XPDF 3.02PL5, using the command:
pdftotext.exe -enc UTF-8 unicodeexample.pdf
The text file is definitely UTF-8 encoded, as all characters are displayed correctly. What are you using the text file for? If you're displaying it through a web application, your content encoding might simply be wrong, while the text file has been converted as you wanted it to.
Double-check using either a browser (force the encoding in Firefox to ISO-8859-1 and UTF-8) or using a hex editor.
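For the hex editor route, note that an accented character such as "é" is the two-byte sequence c3 a9 in UTF-8 but a single e9 byte in ISO-8859-1, so a dump tells you immediately which encoding you have:
xxd book1.txt | head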
Things are getting a little bit messy, so I'm adding another answer.
I took the PDF apart and my best guess would be a "problem" with the font used:
open the PDF file in Acrobat Reader
select all the text on the page
copy it and paste it into a Unicode-aware text editor (there's no "hidden" OCR, so you're copying actual data)
You'll see that the codepoints you end up with aren't the ones you're seeing in the PDF reader. Whatever the font is, it may have a mapping different from the one defined in the Unicode standard. As such, your content is "wrong" and there's not much you can do about it.
