Arabic text is not being displayed in proper format. - pdf-generation

We are generating PDF templates from HTML text. My template contains a combination of Arabic and English words.
The Arabic text is not displayed correctly in the generated PDF (it should run from right to left).
I have loaded a valid pdfCalligraph license key in my code.
Please let me know if I need to write any code for this; I searched the web and found that pdfCalligraph should detect the Arabic text automatically and handle it.
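
As a quick diagnostic (not an iText fix), the sketch below uses Python's standard unicodedata module to report which pieces of a mixed template start with a right-to-left character, i.e. which ones get a right-to-left base direction under the Unicode "first strong character" rule. The function names are hypothetical, and whether the HTML-to-PDF converter needs an explicit dir="rtl" on those elements is an assumption to check against the iText documentation.

```python
import unicodedata

# Bidirectional classes of "strong" characters in the Unicode database.
RTL_CLASSES = {"R", "AL"}  # R = Hebrew-style RTL, AL = Arabic letter
LTR_CLASSES = {"L"}


def base_direction(text: str) -> str:
    """Return 'rtl', 'ltr', or 'neutral' based on the first strong character."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in RTL_CLASSES:
            return "rtl"
        if cls in LTR_CLASSES:
            return "ltr"
    # Only digits, spaces, punctuation, etc. were found.
    return "neutral"


def contains_rtl(text: str) -> bool:
    """Return True if any character in the text is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


if __name__ == "__main__":
    for sample in ["مرحبا Hello", "Hello مرحبا", "123 !"]:
        print(repr(sample), base_direction(sample), contains_rtl(sample))
```

Paragraphs that report 'rtl' here are the ones worth inspecting first in the generated PDF; paragraphs that report 'ltr' but contain RTL characters are mixed runs embedded in left-to-right text.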

Related

SpireDoc html to word preserving data tags

I am working on a template generator use case which involves a TinyMCE editor with the option to export to and import from Microsoft Word files.
I use SpireDoc to save HTML files as docx and to restore docx back to HTML.
The problem is that, as this is a template generator, I need to include some metadata alongside some of the HTML nodes. Say a user's name is inserted in TinyMCE; the HTML code would look like this:
<span data-type="user-name" data-id="234">Some Name</span>
Once converted to docx, all the data- attributes are stripped off. Is there a way I could tell SpireDoc to preserve those attributes and to put them back in the HTML file when I convert the docx back to HTML?
I know I could resort to ugly placeholders like ##USERNAME|ID##, but I would prefer to keep my metadata attributes because I also need to do this for images, and the user would not be able to size or format an image if it were just a text placeholder.

Is there any way to convert ckeditor html content to ms word document?

I have a Laravel project where I need to create a doc/docx document based on user input in CKEditor. I have previously worked with PHPWord, where I could convert simple text input to a docx document. But the problem with CKEditor is that it gives you HTML with inline CSS (which I need), and PHPWord cannot convert this to a docx.
I also tried to convert the HTML to Word via XML, but with no luck. I know there is a paid tool called phpdocx, but I am looking for a free solution.
Just a note: I can actually convert the HTML to PDF. But again, I have no solution for going from PDF to doc.
So, any help in converting the HTML to Word, or PDF to Word?
thanks

How to maintain HTML internal links when converting with Pandoc

I am trying to convert from HTML to PDF with Pandoc. The output is pretty nice, but with the command pandoc index.html -o output.pdf I lose all my internal links (from the table of contents to chapters, from text to footnotes, etc.).
In my HTML this is the outgoing link
<p class="calibre18"><span class="calibre8">CHAPTER ONE</span><br class="calibre19"></br>The Ever Expanding Domain of Computation</p>
which then lands here
Chapter 1 makes the case that because of...
and here
<p class="calibre18"><span class="calibre8">CHAPTER ONE</span><br class="calibre19"></br>The Ever Expanding Domain of Computation</p>...
Is there any way to keep all the links also in the output?
The Pandoc User's Guide section on Internal Links says
Internal links are currently supported for HTML formats (including HTML slide shows and EPUB), LaTeX, and ConTeXt.
This suggests that internal links aren't currently supported for PDF output, even though the PDF output is generated via LaTeX.
Internal links should work straightforwardly in PDF. However, for printing purposes, the default is not to color them. Have you tried clicking on the text that should be a link?
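
Before blaming the converter, it can help to confirm that the input HTML really contains fragment links whose targets exist. The sketch below is a minimal check using Python's standard html.parser; the class and function names are hypothetical, and it only looks at same-file links of the form href="#id" (links that point into another file, such as other.html#id, are deliberately ignored).

```python
from html.parser import HTMLParser


class AnchorAudit(HTMLParser):
    """Collect link targets (id / name) and same-file fragment links."""

    def __init__(self):
        super().__init__()
        self.targets = set()
        self.fragment_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element with an id can be the target of an internal link.
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a":
            # Older HTML uses <a name="..."> as a link target.
            if attrs.get("name"):
                self.targets.add(attrs["name"])
            href = attrs.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self.fragment_links.append(href[1:])


def audit_internal_links(html: str):
    """Return (all fragment links, fragment links with no matching target)."""
    parser = AnchorAudit()
    parser.feed(html)
    parser.close()
    dangling = [f for f in parser.fragment_links if f not in parser.targets]
    return parser.fragment_links, dangling


if __name__ == "__main__":
    with open("index.html", encoding="utf-8") as fh:
        links, dangling = audit_internal_links(fh.read())
    print(f"{len(links)} internal links, {len(dangling)} without a target")
    for fragment in dangling:
        print("missing target:", fragment)
```

If the links turn out to be present but merely invisible, Pandoc's colorlinks variable (pandoc index.html -o output.pdf -V colorlinks=true) colors them in LaTeX-based PDF output.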

Not able to put text on image in table while creating PDF using XMLWorker

I have an HTML file that I pass to XMLWorkerHelper to parse so that it can generate a PDF.
In the HTML, I can put text on top of an image inside a table cell, but when I call
worker.parseXHtml(pdfWriter, document, isr);
the text is not placed on the image.
Here isr is a ByteArrayInputStream built from a mockHttpServletResponse.
I have tried many different kinds of styling in the HTML, but none of them works.
Please let me know how I can do this.

Searchable PDF Files (Image+Text PDF) validation

I check whether a PDF document is searchable by testing whether I can extract text from every single page.
But checking every page seems to take forever when the PDF contains 500 to 2000 pages or more.
Is it possible for a PDF to contain text on one page but not on the rest?
What I would like to do is this: if the first page of the PDF contains text, treat it as a searchable PDF; otherwise treat it as not searchable.
Yes, it is very possible for a PDF to contain text on one page but not on the rest. You could very well have a 500-page PDF that contains only images on the first 499 pages but contains text on the last page.
Unless you want to open the PDF file yourself and scan it for text/text operations, you will need to use an existing third-party PDF library that allows you to extract text from a PDF.
Also, see Ferruccio's response to a related question, which is to use the IFilter interface, specifically made for search indexing and text extraction.
Try this version of Searcharoo, which lets you search Word and PDF documents.
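
Since text on the first page says nothing about the other pages, one way to keep the check cheap is to stop at the first page that yields no text instead of always extracting from every page. The sketch below shows that early-exit logic in Python; extract_text is a hypothetical callable standing in for whatever PDF library is used, and both function names are assumptions made for illustration.

```python
from typing import Callable, Optional


def first_page_without_text(
    page_count: int, extract_text: Callable[[int], str]
) -> Optional[int]:
    """Return the index of the first page with no extractable text, or None."""
    for index in range(page_count):
        # Whitespace-only output is treated as "no text".
        if not extract_text(index).strip():
            return index  # stop here; later pages are never extracted
    return None


def is_fully_searchable(
    page_count: int, extract_text: Callable[[int], str]
) -> bool:
    """A document counts as searchable only if every page yields some text."""
    return first_page_without_text(page_count, extract_text) is None


if __name__ == "__main__":
    # Stand-in for a real PDF: page 1 is a scanned image with no text layer.
    fake_pages = ["Title page", "", "Chapter one"]
    print(is_fully_searchable(len(fake_pages), lambda i: fake_pages[i]))
```

The worst case (a fully searchable document) still reads every page, but a scanned document is rejected as soon as its first image-only page is reached.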

Resources