HexaPDF add font when importing other document's page - ruby

I have an app that adds text to an existing PDF and generates a new PDF.
Everything works until a page uses a different font: the target PDF then shows no glyphs (boxes instead of characters), although the font displays properly when source_doc itself is saved.
It probably has something to do with how the .import method works, but I have not found a way around it.
Here is part of the code:
target_doc = HexaPDF::Document.new
source_doc = HexaPDF::Document.open("source.pdf")
page = source_doc.pages[0]
canvas = page.canvas(type: :overlay)
# ... some code filling the doc with the text
font_file = "new_font.ttf"
source_doc.fonts.add(font_file)
canvas.font font_file
canvas.text(text, at: [x, y])
# back to default font
canvas.font(FONT_FAMILY, size: FONT_SIZE)
source_doc.pages.each { |page| target_doc.pages << target_doc.import(page) }
target_doc.write(output_file)
I have tried to .add the font to target_doc, but it was not added (I tried both before and after the import).
In target_doc.fonts I can see the font loaded in loaded_fonts_cache and in glyphs.
Does anyone have a clue how I can import pages together with the fonts used on them?
Documentation used: https://hexapdf.gettalong.org/examples/merging.html

To import a page whose information is still incomplete (such as newly added fonts), call the method below before importing the pages into the new PDF and after source_doc.fonts.add(font_file), because this information only becomes available once all glyphs are known to the source document.
source_doc.dispatch_message(:complete_objects)
Thanks to Thomas, author of HexaPDF <3
https://github.com/gettalong/hexapdf/issues/214

Related

RinohType sphinx customize the styles in PDF

I am using RinohType to generate a PDF from my RST files.
I am trying to understand how to provide custom styles in the PDF for my logo and other elements.
I felt that the explanation of the default matcher doesn't provide examples of how to do this.
conf.py
rinoh_documents = [dict(doc='index',  # top-level file (index.rst)
                        target='manual',
                        template='rinohtype.rtt',
                        logo='_static/rr-logo-vertical2022-1100px-transp.png')]
rinohtype.rtt
[TEMPLATE_CONFIGURATION]
name = my article configuration
template = article
stylesheet = my_stylesheet.rts
parts =
    title
    ;front_matter
    contents
language = fr
abstract_location = title
[SectionTitles]
contents = 'Contents'
[AdmonitionTitles]
caution = 'Careful!'
warning = 'Please be warned'
[VARIABLES]
paper_size = A5
[title]
page_number_format = lowercase roman
end_at_page = left
[contents]
page_number_format = number
[title_page]
top_margin = 2cm
my_stylesheet.rts
Here I am trying to change the width of my logo in the PDF.
What is the correct way to specify a CSS-like property such as this one here?
width: 100px
The default matcher defines the title page logo style. To adjust the style of this element, you can create a style sheet that builds upon the default sphinx style sheet and tweak the title page logo style:
[STYLESHEET]
name=My Style Sheet
description=My tweaks to the Sphinx style sheet
base=sphinx
[title page logo]
width = 4cm
This style accepts the FlowableStyle style attributes. In the linked documentation, you can see the width attribute supports a bunch of units but not px.
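Since px is not accepted, a pixel value has to be converted by hand. The following is a minimal Ruby sketch of that arithmetic, assuming the 100px was meant as CSS pixels (1px = 1/96 inch) and assuming that pt is among the supported units alongside cm. The helper names px_to_cm and px_to_pt are made up for illustration and are not part of rinohtype.

```ruby
# Assumption: CSS reference pixels, where 1px = 1/96 inch.
CSS_PX_PER_INCH = 96.0
CM_PER_INCH = 2.54
PT_PER_INCH = 72.0

# Convert a pixel length to centimetres, rounded to two decimals.
def px_to_cm(px)
  (px / CSS_PX_PER_INCH * CM_PER_INCH).round(2)
end

# Convert a pixel length to points, rounded to two decimals.
def px_to_pt(px)
  (px / CSS_PX_PER_INCH * PT_PER_INCH).round(2)
end

# Print candidate lines for the [title page logo] section.
puts "width = #{px_to_cm(100)}cm"
puts "width = #{px_to_pt(100)}pt"
```

Either printed line could then replace the width setting in the style sheet, depending on which unit you prefer.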
Please stay tuned for better documentation. Something is actually happening in that area!
P.S. If you want to make more changes to the styling of your document, the style log can be very useful to find out which style name corresponds to a particular document element.

PhantomJS - Rendering fails to show all images

I have a phantomjs script that is stepping through the pages of my site.
For each page, I use page = new WebPage() and then page.close() after finishing with the page. (This is a simplified description of the process, and I'm using PhantomJS version 1.9.7.)
While on each page, I use page.renderBase64('PNG') one or more times, and add the results to an array.
When I'm all done, I build a new page and cycle through the array of images, adding each to the page using <img src="data:image/png;base64,.......image.data.......">.
When done, I use page.render(...) to make a PDF file.
This is all working great, except that the images stop appearing in the PDF after about the 20th image; the rest just show as 4x4 pixel black dots.
To troubleshoot this:
I've changed the render to output a PNG file, and I have the same problem after the 19th or 20th image.
I've output the raw HTML. I can open that in Chrome, and all the images are visible.
Any ideas why the rendering would be failing?
Solved the issue. Turns out that PhantomJS was still preparing the images when the render was executed. Moving the render into the onLoadFinished handler, as illustrated below, solved the issue. Before, the page.render was being called immediately after the page.content = assignment.
For those interested in doing something similar, here's the gist of the process we are doing:
var htmlForAllPages = [];
then, as we load each page in PhantomJS:
var img = page.renderBase64('PNG');
...
htmlForAllPages.push('<img src="data:image/png;base64,' + img + '">');
...
When done, the final PDF is created... We have a template file ready, with all the required HTML and CSS etc. and simply insert our generated HTML into it:
var fs = require('fs');
var template = fs.read('DocumentationTemplate.html');
var finalHtml = template.replace('INSERTBODYHERE', htmlForAllPages.join('\n'));
var pdfPage = new WebPage();
pdfPage.onLoadFinished = function() {
    pdfPage.render('Final.pdf');
    pdfPage.close();
};
pdfPage.content = finalHtml;

Arabic font in Web UI and itextsharp

I can't figure out why my MVC 3 web site shows Arabic text correctly but my PDF does not.
I use the Bliss font in my web site:
@font-face {
    font-family: 'blissregular';
    src: url('/Fonts/blissregular-webfont.eot');
    src: url('/Fonts/blissregular-webfont.eot?#iefix') format('embedded-opentype'),
         url('/Fonts/blissregular-webfont.ttf') format('truetype');
    font-weight: normal;
    font-style: normal;
}
All working fine.
After that I want to create a PDF of the output, but the Arabic text does not appear.
I've googled and understand that the font must contain the Arabic characters for them to show up correctly. I changed to the Arial font (which contains Arabic characters) and the PDF worked.
So how is it possible that with the Bliss font (which does NOT have Arabic characters) I see Arabic text on the web site?
I'm really confused.
Thanks a lot to everybody!
For every character your browser encounters it looks for a matching glyph in the current font. If the font doesn't have that glyph it looks for any fallback fonts to see if they have that glyph. Ultimately every browser has a core set of default fonts that are the ultimate fallback. When you specify the font Bliss but use Arabic characters you are probably just seeing your browser's fallback fonts.
PDFs don't work that way. If you say something is using font XYZ then it will try to render it using that font or fail.
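To make the difference concrete, here is a toy model of per-character fallback, written in Ruby for brevity. It is a sketch, not a browser or iText API: the FONTS table, its code point ranges, and the helpers font_for and split_into_runs are all assumptions made up for illustration.

```ruby
# Toy model of per-character font fallback (not a real font API).
# Each hypothetical "font" is a name plus the code point ranges it covers.
FONTS = [
  { name: "Bliss", ranges: [0x0020..0x007E] },                 # Basic Latin only (assumed)
  { name: "Arial", ranges: [0x0020..0x007E, 0x0600..0x06FF] }  # Latin plus the Arabic block (assumed)
].freeze

# Return the name of the first font in the list that covers the character,
# or nil when no font in the list has a glyph for it.
def font_for(char, fonts)
  match = fonts.find { |f| f[:ranges].any? { |r| r.cover?(char.ord) } }
  match && match[:name]
end

# Split text into runs of consecutive characters that resolve to the same font.
def split_into_runs(text, fonts)
  text.each_char
      .chunk_while { |a, b| font_for(a, fonts) == font_for(b, fonts) }
      .map { |chars| [font_for(chars.first, fonts), chars.join] }
end

# Three Arabic letters followed by Latin text.
sample = "\u064A\u0648\u0645 monday"

# With a fallback list, every character finds a font.
split_into_runs(sample, FONTS).each { |font, run| puts "#{font.inspect}: #{run.inspect}" }

# With only the first font, the Arabic run resolves to nil (no glyphs).
split_into_runs(sample, [FONTS[0]]).each { |font, run| puts "#{font.inspect}: #{run.inspect}" }
```

The first call behaves like a browser walking its fallback list; the second behaves like a PDF that was told to use a single font that lacks the glyphs.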
The easiest way probably is to just add a font to your CSS that supports those characters.
.myclass{font-family: blissregular, Arial}
If that doesn't work you might need to inject the fonts manually. (Actually, I'm not 100% certain that iText supports @font-face, either.) iText has a helper class that can figure things out for you; Bruno talks about it here, but unfortunately the C# link isn't working anymore. It's very simple: you create an instance of the FontSelector class, call AddFont in the order in which you want the fonts to be searched for each character, and then pass a string to the Process() method, which returns a Phrase that you can add. Below is basic sample code that shows this off. I apologize for my sample text; I'm a native English speaker so I just searched for something to use, and I hope I didn't mangle it or get it backwards.
You'll need to jump through a couple of extra hoops when processing the HTML but you should be able to work it out, hopefully.
//Sample string. I apologize, this is from a Google search so I hope it isn't backward
var testString = "يوم الاثنين \"monday\" in Arabic";
var outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf");
//Standard PDF setup
using (var fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
    using (var doc = new Document()) {
        using (var writer = PdfWriter.GetInstance(doc, fs)) {
            doc.Open();
            //This is a font that I know *does not* support Arabic characters, substitute with your own font if you don't have it
            var gishaFontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "gisha.ttf");
            var gishaBaseFont = BaseFont.CreateFont(gishaFontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
            var gishaFont = new iTextSharp.text.Font(gishaBaseFont, 20);
            //Add our test string using just a normal font, this *will not* display the Arabic characters
            doc.Add(new Phrase(testString, gishaFont));
            //This is a font that I know *does* support Arabic characters
            var arialFontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
            var arialBaseFont = BaseFont.CreateFont(arialFontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
            var arialFont = new iTextSharp.text.Font(arialBaseFont, 20);
            //Create our font selector specifying our most specific font first
            var Sel = new FontSelector();
            Sel.AddFont(gishaFont);
            Sel.AddFont(arialFont);
            //Have the font selector process our text into a series of chunks wrapped in a phrase
            var newPhrase = Sel.Process(testString);
            //Add the phrase, this will display both characters
            doc.Add(newPhrase);
            //Clean up
            doc.Close();
        }
    }
}

TYPO3 - Change order of elements within image element

I have a default text/image element on a TYPO3 page and the content inside is in this order:
title
image
text
I need this order:
image
title
text
I've been trying to modify the tt_content std.header object within my typoscript but it is not working properly.
Does anyone know an answer to this?
Take a look into: /typo3/sysext/css_styled_content/static/setup.txt
Search for: CType: image (line ~650), then copy the whole block into your TypoScript template, purge tt_content.image.10 (its header) and try to place lib.stdheader in the required place in tt_content.image.20. This is just a concept; I did something similar years ago and don't remember the details.
Another option is to use CSS/JS to change the order; maybe that will be easier?
tt_content.textpic {
  # remove default header
  10 >
  20 = < tt_content.image.20
  20 {
    layout = TEXT
    layout.value = <div class="your-classes###CLASSES###">###IMAGES###</div>###TEXT###
    # insert the header as part of text
    text.15 = < lib.stdheader
    text.20 = < tt_content.text.20
    text.wrap >
  }
}

How to remove link tag from image using Nokogiri

I'm parsing an HTML document using Nokogiri. The code contains several images like this:
<img alt="alternative-text" border="0" height="427" src="http://url_to_my_photo.jpg?" title="Image Title" width="640">
I'm trying to save that image to my S3 storage, change the style and remove the link. All the images match the CSS selector ".post-body img".
So far, the closest I got is this:
@doc.css(".post-body img").each do |image|
  @new_photo = Photo.create!(
    # Params required to save and upload the photo to S3.
    ...
    ...
  )
  # The URL of the photo uploaded to S3 is @new_photo.photo.url
  image['src'] = @new_photo.photo.url
  image['class'] = "my-picture-class"
  image.parent['src'] = '#'
  puts image.parent.content
  @doc.to_html
end
This removes the link to the big photo, but obviously it isn't a good solution.
I've tried to replace the parent using image.parent << image as suggested on http://rubyforge.org/pipermail/nokogiri-talk/2009-June/000333.html but it doesn't do anything, and image.parent = image raises "Could not reparent node (RuntimeError)".
To convert that mailing list example over to apply to your situation, you have to remember that node is the node they are trying to get rid of, which in your case is image.parent.
So instead of image.parent['src'] = '#' you should try:
link = image.parent
link.parent << image
link.remove
Edit:
Actually, the above code would probably move all the images to the bottom of whatever element contains the link, so try this instead:
link = image.parent
link.replace(image)
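The difference between the two variants can be seen in a toy model that uses plain Ruby arrays rather than Nokogiri. The node names below are placeholders: :link stands for the element wrapping the image, and the array lists the container's children in document order.

```ruby
# Toy model, not Nokogiri: the container's children in document order.
children = [:intro, :link, :outro]

# Variant 1: append the image to the container, then remove the link.
appended = children.dup
appended << :image
appended.delete(:link)

# Variant 2: put the image exactly where the link was.
replaced = children.map { |node| node == :link ? :image : node }

p appended # the image ends up after :outro
p replaced # the image stays between :intro and :outro
```

Appending always adds at the end of the container, which is why the first variant reorders the content, while replacing keeps the image at the link's original position.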
