Multilingual text with rmagick

I'm using rmagick to annotate images programmatically with text. The text will need to support a range of languages, including Chinese, Korean, and English, among others. The font requirements I'm dealing with are very specific: the font chosen for English supports a wide variety of Western languages, but it doesn't support Chinese or Korean. I'll have other fonts for those languages.
The approach I have in mind is to map character ranges to particular fonts and programmatically tell rmagick what font to use based on that. Am I missing anything obvious, or is this a good approach to take?

Here is how I ended up solving this:
def font_for(verb)
  return "#{Rails.root}/app/uploaders/fonts/Gotham-Bold.ttf" if verb =~ /\p{Latin}/
  return "#{Rails.root}/app/uploaders/fonts/ArialUnicode.ttf"
end
That method takes some text and returns the path to an appropriate font face: Gotham Bold if the text contains any Latin character, Arial Unicode otherwise. Ruby's regex character-property matching (\p{Latin}) comes in handy here! Then I can use the font_for method inside my rmagick script for annotating an image.
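The character-range mapping proposed in the question can also be made table-driven, so that adding a language means adding a row. The following is a minimal sketch, not the code above: the font paths are hypothetical placeholders, and it assumes that checking scripts in a fixed priority order (Hangul, then Han, then Latin) with Arial Unicode as the catch-all is good enough for your input.

```ruby
# Hypothetical font paths; substitute your own files.
FONT_TABLE = [
  [/\p{Hangul}/, "fonts/Korean.ttf"],
  [/\p{Han}/,    "fonts/Chinese.ttf"],
  [/\p{Latin}/,  "fonts/Gotham-Bold.ttf"]
].freeze

# Used when no script in the table occurs (digits only, other scripts, ...).
FALLBACK_FONT = "fonts/ArialUnicode.ttf"

# Returns the font of the first table row whose script occurs anywhere in text.
def font_for(text)
  FONT_TABLE.each { |pattern, font| return font if text =~ pattern }
  FALLBACK_FONT
end

puts font_for("안녕하세요")
```

Hangul is checked before Han because Korean text can contain Hanja; Japanese kanji would fall under Han in this sketch, which may or may not be what you want.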
def create_image_with_text
  canvas = Magick::ImageList.new
  canvas.new_image(640, 480) { self.background_color = "white" }
  text = Magick::Draw.new
  text.font = font_for "english"
  text.pointsize = 23
  text.gravity = ::Magick::NorthGravity
  text.annotate(canvas, 0, 0, 0, 28, "ENGLISH") { self.fill = '#343434' }
  text.font = font_for self.verb
  text.pointsize = 65
  text.gravity = ::Magick::CenterGravity
  text.annotate(canvas, 0, 0, 0, 18, self.verb.upcase) { self.fill = '#343434' }
  tempfile = Tempfile.new(['new_center_stripe', '.jpg'])
  canvas.write tempfile.path
  self.image.store!(tempfile)
end
It is worth noting that this simplistic approach wouldn't handle input with mixed languages.
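One way to handle mixed input, sketched here under the assumption that choosing a font per grapheme cluster is acceptable, is to split the string into runs that share a font and annotate each run separately. Whitespace, digits, and punctuation are treated as neutral and join the run they sit in; the font paths and method names are hypothetical.

```ruby
# Hypothetical font paths; substitute your own files.
LATIN_FONT    = "fonts/Gotham-Bold.ttf"
FALLBACK_FONT = "fonts/ArialUnicode.ttf"

# Font for one grapheme cluster; nil means "neutral" (whitespace, digits,
# punctuation), which simply joins the run it appears in.
def font_for_cluster(cluster)
  return nil if cluster.match?(/\A[\s\p{P}\p{N}]+\z/)
  cluster.match?(/\p{Latin}/) ? LATIN_FONT : FALLBACK_FONT
end

# Splits text into [font, substring] runs of consecutive clusters sharing a font.
def font_runs(text)
  runs = []
  text.each_grapheme_cluster do |cluster|
    font = font_for_cluster(cluster)
    if runs.empty? || (font && runs.last[0] && runs.last[0] != font)
      runs << [font, cluster.dup]   # start a new run
    else
      runs.last[0] ||= font         # a neutral-only run adopts the first real font
      runs.last[1] << cluster
    end
  end
  # Runs made only of neutral characters default to the Latin font.
  runs.map { |font, str| [font || LATIN_FONT, str] }
end

p font_runs("hello 你好")
```

Each run could then be drawn with its own text.font, advancing the x offset by the run's width (for example as reported by Magick::Draw#get_type_metrics); that drawing loop is not shown here.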

Related

In InDesign, is there a way to bold a whole word that has one bold character?

I'm working on an index in InDesign. Some of the page numbers are in bold, others are in italics or regular. During editing, somehow the first numbers of some of the bold page numbers got changed. I've figured out how to highlight those page numbers by coloring the bold numbers and recoloring the page numbers that are correct using a GREP search for bold words (\b\w+\b). What I can't figure out is how to select the "bad" page numbers that have only some numbers and make the entire "word" bold. Any ideas? It would be nice not to have to fix them manually.

I just tried this on a document and added a few numbers that were only partially bold.
I was able to fix it by searching for digits only with (\b\d+\b) and changing all to $1. I left Find Format blank and set Change Format to the Regular font style. This changed all numbers to Regular, with no mixed bold and regular left.
After that, you can run the same Find/Change again, but with Change Format set to Bold. This will make all numbers fully bold.

It heavily depends on the text you have: whether only the first digit needs to change, whether you avoid character styles, whether your body text contains no digits, whether your font uses the common style names... there are a lot of ifs, actually. I'd recommend sharing a sample of your file (IDML).
So, here is a script that could do the job if all of those conditions hold:
var doc = app.activeDocument;
var styles = doc.characterStyles;
// STEP 1 -- apply style1 (regular) to all regular numbers \d\d+
var style1 = styles.add();
style1.name = 'digits_regular';
style1.fontStyle = 'Regular';
app.findGrepPreferences = NothingEnum.nothing;
app.findGrepPreferences.findWhat = '\\b\\d\\d+'; // two or more digits
app.findGrepPreferences.fontStyle = 'Regular';
app.changeGrepPreferences.changeTo = '$0';
app.changeGrepPreferences.appliedCharacterStyle = style1;
doc.changeGrep();
// STEP 2 -- apply style2 (italic) to all italic numbers \d\d+
var style2 = styles.add();
style2.name = 'digits_italic';
style2.fontStyle = 'Italic';
app.findGrepPreferences = NothingEnum.nothing;
app.findGrepPreferences.findWhat = '\\b\\d\\d+';
app.findGrepPreferences.fontStyle = 'Italic';
app.changeGrepPreferences.changeTo = '$0';
app.changeGrepPreferences.appliedCharacterStyle = style2;
doc.changeGrep();
// STEP 3 -- apply style3 (bold) to all unstyled numbers
var style3 = styles.add();
style3.name = 'digits_bold';
style3.fontStyle = 'Bold';
app.findGrepPreferences = NothingEnum.nothing;
app.findGrepPreferences.findWhat = '\\b\\d\\d+';
app.findGrepPreferences.appliedCharacterStyle = styles[0]; // style '[None]'
app.changeGrepPreferences.changeTo = '$0';
app.changeGrepPreferences.appliedCharacterStyle = style3;
doc.changeGrep();
// clean prefs
app.findGrepPreferences = NothingEnum.nothing;
Then you can remove the character styles if you don't need them. But I'd recommend using styles; they make life easier in exactly such cases.

It's much easier to use the Find/Change interface in InDesign.

How do I use multiple fonts, i.e. a composite font, in HexaPDF?

Our users are giving us Emoji and a lot of other weird characters and the built-in Helvetica can't handle it. Neither can Google's Noto fonts by themselves - I need to figure out how to declare the Noto Font Family in HexaPDF and I can't figure out how to do that with the given documentation. OpenSans was an improvement, but I still want more glyph coverage than that.
Update:
I used this method to set the font:
def self.pdf_summary_font
  @@pdf_summary_font ||= File.open(Rails.root.join('public', 'OpenSansEmoji.ttf'), 'r')
end
canvas = page.canvas(type: :overlay)
canvas.font(self.class.pdf_summary_font, size: 10)
However, no Noto font ever worked with this - I would get errors like "Missing glyph - 'A'"
The best I could do was to use OpenSansEmoji, and replace missing glyphs with the following block:
begin
  style = HexaPDF::Layout::Style.new(font: canvas.font, fill_color: color, stroke_color: color, align: :left, valign: :center)
  fragment = HexaPDF::Layout::TextFragment.create(str, style)
  layouter = HexaPDF::Layout::TextLayouter.new(style)
  layouter.fit([fragment], w, h).draw(canvas, x1, y2)
rescue HexaPDF::Error => e
  raise unless e.message.include?('Glyph for') # only handle missing-glyph errors
  glyph = e.message.match(/\{(.*?)\}/).captures.first
  # replace every grapheme cluster containing the missing glyph, then try again
  str = str.grapheme_clusters.map do |char|
    if char.dump.include?(glyph)
      "\u{FFFD}"
    else
      char
    end
  end.join
  retry
end

Have a look at https://hexapdf.gettalong.org/documentation/reference/api/HexaPDF/index.html and the configuration option "font.map". This allows you to declare any TrueType font file and use it.
You could also use the path to the font file directly with the Canvas#font method.
If you need to cover a wide array of characters, you need a single font that covers all of them; one of the fonts included in this ZIP file should probably work (Google says 582 languages and 237 regions are covered).
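The update in the question recovers from missing glyphs by parsing the error message and retrying. A hedged alternative is to substitute unsupported grapheme clusters before layout. This sketch assumes you already have the font's coverage as a collection of codepoints; obtaining it from the font's cmap table is not shown and depends on your tooling, and the Set below is only a stand-in.

```ruby
require "set"

REPLACEMENT_CHAR = "\u{FFFD}"

# Returns str with every grapheme cluster that contains a codepoint outside
# +supported+ replaced by U+FFFD. Working on clusters keeps multi-codepoint
# emoji (skin tones, ZWJ sequences) from turning into several replacements.
def replace_unsupported(str, supported)
  str.each_grapheme_cluster.map do |cluster|
    cluster.each_codepoint.all? { |cp| supported.include?(cp) } ? cluster : REPLACEMENT_CHAR
  end.join
end

# Stand-in coverage: only "a", "b", "c" and space.
supported = Set.new("abc ".codepoints)
puts replace_unsupported("ab 😀c", supported)
```

Because the text is cleaned once up front, the layout code no longer needs a rescue/retry loop for this case.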

PDFClown Copy annotations and then manipulate them

I need to copy annotations from one PDF file to another. I have used the excellent PDFClown library but am unable to manipulate things like color, rotation, etc. Is this possible? I can see the base object information but am also unsure how to manipulate that directly.
I can copy the appearance via cloning appearance but can't "edit" it.
Thanks in advance.
Alex
P.S. If Stephano, the author, is listening: is the project dead?

On annotations in general and Callout annotations in particular
I looked into it a bit, and I'm afraid there is not much you can deterministically manipulate for arbitrary inputs using high-level methods. The reason is that there are numerous alternative ways to set the appearance of a Callout annotation, and PDF Clown only supports the lower-priority ways with explicit high-level methods. From highest priority downwards:
An explicit appearance in an AP stream. If it is given, it is used, ignoring whether this appearance looks like a Callout annotation at all, let alone like one defined by the other Callout properties.
PDF Clown does not yet create an appearance for Callout annotations from the other values, let alone update existing appearances to follow a change of some specific attribute (e.g. Color). For ISO 32000-2 support, PDF Clown will have to improve here, as appearance streams have become mandatory.
If it exists, you can retrieve the appearance using getAppearance() but you only get a FormXObject with its low level drawing instructions, nothing Callout specific.
One thing you can manipulate quite easily given a FormXObject, though: you can rotate or skew the appearance by setting its Matrix accordingly, e.g.
annotation.getAppearance().getNormal().get(null).setMatrix(AffineTransform.getRotateInstance(100, 10));
A rich text string in the RC string or stream. Unless an appearance is given, the text in the Callout text box is generated from this rich text datum (rich text here uses a XHTML 1.0 subset for formatting).
PDF Clown does not yet create a rich text representation of the Callout text, let alone update existing ones to follow a change of some specific attribute (e.g. Color).
If it exists, you can retrieve the rich text by low level access using getBaseDataObject().get(PdfName.RC), change this string or stream, and set it again using getBaseDataObject().put(PdfName.RC, ...). Similarly you can retrieve, manipulate, and set the rich text default style string using its name PdfName.DS instead.
A number of different settings for separate aspects used to build the Callout from in the absence of appearance stream and (as far as the text content is concerned) rich text string.
PDF Clown supports (many of) these attributes, in particular if you cast the cloned annotation to StaticNote, e.g. the opacity CA using get/set/withAlpha, the border Border / BS using get/set/withBorder, the background color C using get/set/withColor, ...
By the way, it has an error in its line ending style (LE) support: apparently the code for the Line annotation LE property was copied without checking; unfortunately, that attribute follows a different syntax there...
Your tasks
Concerning the attributes you stated you want to change, therefore,
Rotation: There is no rotation attribute in the Callout annotation per se (other than the flag whether or not to follow the page rotation). Thus, you cannot set a rotation as a simple annotation attribute. If the source annotation does have an appearance stream, though, you can manipulate its Matrix to rotate it inside the annotation rectangle, see above.
Border color and font: If your Callout has an appearance stream, you can try and parse its content using a ContentScanner and manipulate color and font setting operations. Otherwise, if rich text information is set, for the font you can try and parse the rich text using some XML parser and manipulate font style attributes. Otherwise, you can parse the default appearance DA string and manipulate its font and color setting instructions.
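A note on the rotation example: in Java, AffineTransform.getRotateInstance(100, 10) is the overload that takes a rotation vector (vecx, vecy), so it rotates by the angle of that vector, atan2(10, 100) ≈ 5.71°, not by 100 radians. For an explicit angle, the single-argument form getRotateInstance(Math.toRadians(deg)) is the one to use. The resulting Matrix entries can be checked with this small Ruby sketch (the helper names are made up for illustration); PDF writes the matrix as [a b c d e f] with x' = a*x + c*y + e and y' = b*x + d*y + f.

```ruby
# PDF matrix [a b c d e f] for a rotation by the angle of vector (vx, vy),
# equivalent to Java's AffineTransform.getRotateInstance(vx, vy).
def rotation_from_vector(vx, vy)
  len = Math.hypot(vx, vy)
  cos = vx / len
  sin = vy / len
  [cos, sin, -sin, cos, 0.0, 0.0]
end

# Same, for an explicit angle in degrees.
def rotation_from_degrees(deg)
  rad = deg * Math::PI / 180.0
  [Math.cos(rad), Math.sin(rad), -Math.sin(rad), Math.cos(rad), 0.0, 0.0]
end

m = rotation_from_vector(100, 10)
angle = Math.atan2(m[1], m[0]) * 180.0 / Math::PI
puts format("%.2f degrees", angle)   # prints "5.71 degrees"
```

Note that the matrix rotates about the origin of the form's coordinate system; the viewer then fits the transformed bounding box into the annotation rectangle.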
Some example code
I created a file with an example Callout annotation using Adobe Acrobat: Callout-Yellow.pdf. It contains an appearance stream, rich text, and simple attributes, so one can use this file for example manipulations at different levels.
Then I applied this code to it with different values for keepAppearanceStream and keepRichText (you didn't mention whether you used PDF Clown for Java or .NET, so I chose Java; a port to .NET should be trivial, though...):
boolean keepAppearanceStream = ...;
boolean keepRichText = ...;
try ( InputStream sourceResource = GET_STREAM_FOR("Callout-Yellow.pdf");
InputStream targetResource = GET_STREAM_FOR("test123.pdf");
org.pdfclown.files.File sourceFile = new org.pdfclown.files.File(sourceResource);
org.pdfclown.files.File targetFile = new org.pdfclown.files.File(targetResource); ) {
Document sourceDoc = sourceFile.getDocument();
Page sourcePage = sourceDoc.getPages().get(0);
Annotation<?> sourceAnnotation = sourcePage.getAnnotations().get(0);
Document targetDoc = targetFile.getDocument();
Page targetPage = targetDoc.getPages().get(0);
StaticNote targetAnnotation = (StaticNote) sourceAnnotation.clone(targetDoc);
if (keepAppearanceStream) {
// changing properties of an appearance
// rotating the appearance in the appearance rectangle
targetAnnotation.getAppearance().getNormal().get(null).setMatrix(AffineTransform.getRotateInstance(100, 10));
} else {
// removing the appearance to allow lower level properties changes
targetAnnotation.setAppearance(null);
}
// changing text background color
targetAnnotation.setColor(new DeviceRGBColor(0, 0, 1));
if (keepRichText) {
// changing rich text properties
PdfString richText = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.RC);
String richTextString = richText.getStringValue();
// replacing the font family
richTextString = richTextString.replaceAll("font-family:Helvetica", "font-family:Courier");
richText = new PdfString(richTextString);
targetAnnotation.getBaseDataObject().put(PdfName.RC, richText);
} else {
targetAnnotation.getBaseDataObject().remove(PdfName.RC);
targetAnnotation.getBaseDataObject().remove(PdfName.DS);
}
// changing default appearance properties
PdfString defaultAppearance = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.DA);
String defaultAppearanceString = defaultAppearance.getStringValue();
// replacing the font
defaultAppearanceString = defaultAppearanceString.replaceFirst("Helv", "HeBo");
// replacing the text and line color
defaultAppearanceString = defaultAppearanceString.replaceFirst(". . . rg", ".5 g");
defaultAppearance = new PdfString(defaultAppearanceString);
targetAnnotation.getBaseDataObject().put(PdfName.DA, defaultAppearance);
// changing the text value
PdfString contents = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.Contents);
String contentsString = contents.getStringValue();
contentsString = contentsString.replaceFirst("text", "text line");
contents = new PdfString(contentsString);
targetAnnotation.getBaseDataObject().put(PdfName.Contents, contents);
// change the line width and style
targetAnnotation.setBorder(new Border(0, new LineDash(new double[] {3, 2})));
targetPage.getAnnotations().add(targetAnnotation);
targetFile.save(new File(RESULT_FOLDER, "test123-withCalloutCopy.pdf"), SerializationModeEnum.Standard);
}
(CopyCallOut test testCopyCallout)
Beware, the code only has proof-of-concept quality: For arbitrary PDFs you cannot simply expect a string replace of "font-family:Helvetica" by "font-family:Courier" or "Helv" by "HeBo" or ". . . rg" by ".5 g" to do the job: fonts can be given using different style attributes or names, and different coloring instructions may be used.
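To make the DA replacement a bit less brittle, one option is to match the operators rather than fixed text. In Java's replaceFirst the pattern ". . . rg" is a regex in which each dot matches any single character, so it catches "0 0 0 rg" but not, say, "0.2 0 0 rg". The Ruby sketch below rewrites the font resource name in front of Tf and replaces whichever fill color operator (g, rg or k) comes first with a gray value. It assumes a simple DA string made of numbers, names and operators, and the method name is hypothetical.

```ruby
# A PDF number such as 12, 0.5 or .5 (sign optional).
DA_NUMBER = /[-+]?(?:\d+\.?\d*|\.\d+)/

# Returns a copy of a DA string with the font name before Tf and/or the
# first fill color operator (g, rg or k, with its operands) replaced.
def rewrite_default_appearance(da, font: nil, gray: nil)
  out = da.dup
  if font
    # "/Helv 12 Tf" -> "/HeBo 12 Tf"; $1 keeps the size and the Tf operator
    out = out.sub(%r{/\S+(\s+#{DA_NUMBER}\s+Tf)}) { "/#{font}#{$1}" }
  end
  if gray
    # one to four numeric operands followed by a fill color operator
    out = out.sub(/(?:#{DA_NUMBER}\s+){1,4}(?:g|rg|k)\b/) { "#{gray} g" }
  end
  out
end

puts rewrite_default_appearance("0 0 0 rg /Helv 12 Tf", font: "HeBo", gray: 0.5)
```

This still only covers the common case; names containing escaped characters or DA strings with stroke color operators would need a real content-stream tokenizer.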
Screenshots in Adobe: the original file; keepAppearanceStream = true; keepAppearanceStream = false and keepRichText = true; keepAppearanceStream = false and keepRichText = false.

As a post comment, mkl:
Your great advice is really helpful when creating new annotations. I applied the following as a way of "copying" an existing annotation, where note is the "cloned" annotation and baseAnnotation the source:
foreach (PdfName t in baseAnnotation.BaseDataObject.Keys)
{
if (t.Equals(PdfName.DA) || t.Equals(PdfName.DS) || t.Equals(PdfName.RC) || t.Equals(PdfName.Rotate))
{
note.BaseDataObject[t] = baseAnnotation.BaseDataObject[t];
}
}
Thanks again

Pango: select multiple fonts

I have three fonts I want to use in my software with Pango:
Font1: Latin and Cyrillic characters
Font2: Korean characters
Font3: Japanese characters
Pango renders the text correctly, but I want to select the font myself.
Is there any way to indicate this font preference to Pango?
I use Linux and Pango 1.29.

The simplest way is to use PangoMarkup to set the fonts you want:
// See the Pango markup documentation for details
const char *pszMarkup = "<span face=\"{font family name goes here}\">"
                        "{text requiring font goes here}"
                        "</span>"; // Split for clarity
char *pszText;        // Receives the text without markup tags
PangoAttrList *pAttr; // Attribute list, populated from the tag info
pango_parse_markup (pszMarkup, -1, 0, &pAttr, &pszText, NULL, NULL);
You now have a buffer of regular text and an attribute list. If you want to set these up by hand (without going through the parser), you will need one PangoAttribute per instance of the font and set PangoAttribute.start_index and PangoAttribute.end_index by hand.
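When building the attributes by hand, note that start_index and end_index are byte offsets into the UTF-8 text (end exclusive), not character counts, which matters as soon as Korean or Japanese text is involved. The offset bookkeeping is sketched below in Ruby; the [face, text] runs and the face names Font1/Font2 are hypothetical, and in C each resulting triple would correspond to one family attribute (for example one created with pango_attr_family_new()).

```ruby
# Concatenates [face, text] runs and returns the full text plus one
# [start_index, end_index, face] triple per run. The indices are byte
# offsets into the UTF-8 string (end exclusive), so String#bytesize is
# used instead of String#length.
def pango_ranges(runs)
  text = String.new(encoding: Encoding::UTF_8)
  ranges = runs.map do |face, str|
    start = text.bytesize
    text << str.encode(Encoding::UTF_8)
    [start, text.bytesize, face]
  end
  [text, ranges]
end

_text, ranges = pango_ranges([["Font1", "Hi "], ["Font2", "안녕"]])
p ranges
```

Each Hangul syllable takes three bytes in UTF-8, so the second run spans bytes 3 to 9 even though it is only two characters long.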
However you get them, you now give them to a PangoLayout:
// pWidget is the windowed widget in which the text is displayed:
PangoContext *pCtxt = gtk_widget_get_pango_context (pWidget);
PangoLayout *pLayout = pango_layout_new (pCtxt);
pango_layout_set_attributes(pLayout, pAttr);
pango_layout_set_text (pLayout, pszText, -1);
That's it. Use pango_cairo_show_layout (cr, pLayout) to display the results. The setup only needs changing when the content changes - it maintains the values across draw signals.
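If the text comes from user input, it has to be escaped before it is embedded in the markup, because &, < and > (and quotes inside attribute values) are special there; in C, GLib's g_markup_escape_text() or g_markup_printf_escaped() takes care of this. The same idea as a short Ruby sketch, with a hand-written escape table covering the basic characters and hypothetical Font1/Font2 face names:

```ruby
# Characters that are special in Pango markup, mapped to their escapes.
MARKUP_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

def escape_markup(str)
  str.gsub(/[&<>"']/, MARKUP_ESCAPES)
end

# Wraps each [face, text] run in a <span face="..."> element.
def pango_markup(runs)
  runs.map { |face, text| %(<span face="#{escape_markup(face)}">#{escape_markup(text)}</span>) }.join
end

puts pango_markup([["Font1", "Tom & Jerry"], ["Font2", "안녕"]])
```

The resulting string is what you would pass to pango_parse_markup() (or pango_layout_set_markup()) in place of the hard-coded example.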

Arabic font in Web UI and itextsharp

I can't find the reason why my MVC 3 website shows Arabic text correctly but my PDF does not.
I use the Bliss font on my website:
@font-face {
    font-family: 'blissregular';
    src: url('/Fonts/blissregular-webfont.eot');
    src: url('/Fonts/blissregular-webfont.eot?#iefix') format('embedded-opentype'),
         url('/Fonts/blissregular-webfont.ttf') format('truetype');
    font-weight: normal;
    font-style: normal;
}
Everything works fine.
After that, I want to create a PDF of the output, but the Arabic characters do not appear.
I've googled and understand that the font must contain the Arabic characters for them to show up correctly. I've changed to the Arial font (which contains Arabic characters) and... the PDF worked.
So... how is it possible that with the Bliss font (which does NOT have Arabic characters) I see Arabic text on the website?
I'm really confused...
Thanks a lot, everybody!

For every character your browser encounters it looks for a matching glyph in the current font. If the font doesn't have that glyph it looks for any fallback fonts to see if they have that glyph. Ultimately every browser has a core set of default fonts that are the ultimate fallback. When you specify the font Bliss but use Arabic characters you are probably just seeing your browser's fallback fonts.
PDFs don't work that way. If you say something is using font XYZ then it will try to render it using that font or fail.
The easiest way probably is to just add a font to your CSS that supports those characters.
.myclass{font-family: blissregular, Arial}
If that doesn't work, you might need to inject the fonts manually. (Actually, I'm not 100% certain that iText supports @font-face, either.) iText has a helper class that can figure things out for you, which Bruno talks about here, but unfortunately the C# link isn't working anymore. It's very simple: you create an instance of the FontSelector class, call AddFont in the order in which you want characters to be looked up, and then pass a string to the Process() method, which returns a Phrase that you can add. Below is basic sample code that shows this off. I apologize for my sample text; I'm a native English speaker, so I just searched for something to use. I hope I didn't mangle it or get it backwards.
You'll need to jump through a couple of extra hoops when processing the HTML but you should be able to work it out, hopefully.
//Sample string. I apologize, this is from a Google search so I hope it isn't backward
var testString = "يوم الاثنين \"monday\" in Arabic";
var outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf");
//Standard PDF setup
using (var fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//This is a font that I know *does not* support Arabic characters, substitute with your own font if you don't have it
var gishaFontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "gisha.ttf");
var gishaBaseFont = BaseFont.CreateFont(gishaFontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
var gishaFont = new iTextSharp.text.Font(gishaBaseFont, 20);
//Add our test string using just a normal font, this *will not* display the Arabic characters
doc.Add(new Phrase(testString, gishaFont));
//This is a font that I know *does* support Arabic characters
var arialFontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
var arialBaseFont = BaseFont.CreateFont(arialFontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
var arialFont = new iTextSharp.text.Font(arialBaseFont, 20);
//Create our font selector specifying our most specific font first
var Sel = new FontSelector();
Sel.AddFont(gishaFont);
Sel.AddFont(arialFont);
//Have the font selector process our text into a series of chunks wrapped in a phrase
var newPhrase = Sel.Process(testString);
//Add the phrase, this will display both characters
doc.Add(newPhrase);
//Clean up
doc.Close();
}
}
}
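The selection idea behind FontSelector (try the fonts in the order they were added and use the first one that has the character, then group consecutive characters with the same font) can be sketched in a few lines of Ruby. This illustrates the idea only, not iText's implementation: fonts are modeled as hypothetical [name, coverage] pairs, where coverage is anything that answers include?(codepoint), and characters that no font covers end up in a chunk with a nil font.

```ruby
# Splits text into [font_name, chunk] pairs. For each character the first
# font (in priority order) whose coverage includes the codepoint is used;
# consecutive characters with the same font are merged into one chunk.
def select_fonts(text, fonts)
  chunks = []
  text.each_char do |ch|
    match = fonts.find { |_name, coverage| coverage.include?(ch.ord) }
    name = match && match[0]   # nil when no font covers the character
    if chunks.last && chunks.last[0] == name
      chunks.last[1] << ch
    else
      chunks << [name, ch.dup]
    end
  end
  chunks
end

# Stand-in coverage: "Gisha" = printable ASCII only,
# "Arial Unicode" = everything up to the end of the Arabic block.
fonts = [["Gisha", 0x20..0x7E], ["Arial Unicode", 0x20..0x06FF]]
p select_fonts("يوم \"monday\"", fonts)
```

Because "Gisha" is listed first, the space and the Latin text go to it even though "Arial Unicode" could render them too; only the Arabic word falls through to the second font.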
