PDFReactor: generating a PDF from a large HTML file

I'm having an issue when trying to generate a PDF with PDFReactor (version 11.2.2, but it's the same with other versions) from a large HTML file, in this case a 3.3 MB HTML file. Is there any limitation in PDFReactor when using files of that size?
Somehow the PDF isn't generated with that amount of data, but when the same test is run on a smaller batch of data it works like a charm, so it's weird to me that this is happening.
Does anyone have any idea why this happens, or what could be going on here?
Thank you in advance
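In case it helps narrow this down: large-document conversions often fail simply because the process doing the work runs out of memory. That is only an assumption about the cause here, but if the conversion runs in-process on a JVM, checking the heap ceiling (and raising it with -Xmx) is a cheap first diagnostic. A minimal sketch:

    public class HeapCheck {
        public static void main(String[] args) {
            // Print how much heap the JVM is allowed to use; PDFReactor itself is not
            // involved here, this only checks the environment the conversion runs in.
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Max heap: %.0f MB%n", maxBytes / (1024.0 * 1024.0));
            // If this is small, restart with a larger heap, e.g.: java -Xmx2g -jar your-app.jar
        }
    }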

Related

Apache Tika performance impact due to Tesseract

We are using Tika 2.4.0 and we scan hundreds of files to extract content from them. We have a mix of file types such as PDFs, documents (docx) and plain text (.txt) files.
The PDFs and docx files can contain only text, both text and images, or only images.
Since we are using OCR to extract content from images, we observe that it consumes more resources in terms of memory and CPU (although this is expected behavior). Because of this, the processing of other files is also impacted, since resources get choked.
Is there a way to have Tesseract run on a different machine and call it on an as-needed basis? I'm a little skeptical on this point, as we provide the Tesseract path in the Tika config and it's not a service call.
Or is there any other recommended solution to overcome this performance impact, so that while one file is being OCR'ed, other files are still processed?
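One common way to offload this kind of work (not from the question, just a hedged suggestion) is to run Tika as a standalone tika-server on a separate machine that has Tesseract installed, and call it over HTTP only for the files that need OCR. Assuming such a server on a hypothetical host ocr-box listening on the default port 9998, a minimal Java 11+ client could look like this:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Path;

    public class RemoteTikaClient {
        public static void main(String[] args) throws Exception {
            // PUT the raw file to tika-server's /tika endpoint and ask for plain text back.
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://ocr-box:9998/tika"))   // hypothetical remote OCR machine
                    .header("Accept", "text/plain")
                    .PUT(HttpRequest.BodyPublishers.ofFile(Path.of("scan.pdf")))
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());   // extracted text, including OCR output
        }
    }

The local Tika instance could then keep OCR disabled, so text-only files are never blocked behind image-heavy ones.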

Has anyone made an XSL file to generate an HTML report, like the JMeter Dashboard report, from XML?

I want to group the results in the HTML report by their threadName, which is available in the CSV but not displayed in the Dashboard report. Another option I saw to get what I want was using an XML file and customising the XSL file to obtain the output. But I also want to have those graphs available from the Dashboard report!
Has anyone else faced this and solved it?
It is possible to generate an HTML report from a *.jtl file in XML format, but it can be difficult. JTL files in XML format are often very big (hundreds of MB), and you may also have problems processing such a big file with an XSLT script (a minimal transform sketch is shown after this answer). Another problem can be getting the graphs into this report.
I suggest you use other options.
Use SmartMeter's Report Generator. You can use it from the GUI or from the command line.
Use BlazeMeter's Sense - this option is very simple and easy, but could be more expensive.
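For completeness, this is roughly what the XSLT route looks like; the file names are placeholders, and report.xsl stands for whatever stylesheet you write or customise (e.g. one that groups samples by threadName). The built-in JAXP processor only supports XSLT 1.0 and will need plenty of heap for a multi-hundred-MB JTL file, which is exactly the difficulty mentioned above:

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import java.io.File;

    public class JtlToHtml {
        public static void main(String[] args) throws Exception {
            // results.jtl must have been recorded with XML output
            // (jmeter.save.saveservice.output_format=xml).
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("report.xsl")));
            transformer.transform(new StreamSource(new File("results.jtl")),
                                  new StreamResult(new File("report.html")));
        }
    }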

Merge PDFs while preserving hyperlinks

I'm currently building a download function that will generate numerous PDFs, and then merge them all together.
However, the links on the Table of Contents page stop working once the PDFs are merged. All of the links work when the PDF is originally generated, but once it's merged they stop working.
I'm currently using DomPDF to generate the original PDFs, and then PHPdocx's MultiMerge class to combine them all.
I've tried using different libraries to merge the PDFs, but all of them have the same result.
Note: the site is built on Laravel 5.2.
So, my question is: is it possible to merge PDFs using PHP while still preserving all hyperlinks? Or even to go in and edit the final PDF once it's generated to insert the links?
Any help or tips would be really appreciated.
Thanks!
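No PHP answer is given here, but as a point of comparison (and a possible external tool to shell out to from Laravel), Apache PDFBox's PDFMergerUtility in Java copies page annotations, including link annotations, into the merged document. Whether internal table-of-contents links survive still depends on how the generator wrote them (named destinations vs. explicit page references), so treat this only as a hedged sketch with hypothetical file names:

    import java.io.File;
    import org.apache.pdfbox.io.MemoryUsageSetting;
    import org.apache.pdfbox.multipdf.PDFMergerUtility;

    public class MergeWithLinks {
        public static void main(String[] args) throws Exception {
            PDFMergerUtility merger = new PDFMergerUtility();
            merger.addSource(new File("toc.pdf"));        // hypothetical inputs
            merger.addSource(new File("chapter1.pdf"));
            merger.setDestinationFileName("merged.pdf");
            // Merge everything into merged.pdf, keeping annotations from the source pages.
            merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        }
    }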

Creating a variable zip archive on the fly, estimating file size for content-length

I'm maintaining a site where users can place pictures and other files in a kind of shopping cart. After selecting all the various contents the user wishes to download, he can check out. Until now an archive was generated beforehand, and the user got an email with a link to the file after the generation finished.
I've changed this now by using Web API and push streaming to generate the archive directly on the fly. My code offers either a zip, a zip64 or a .tar.gz dynamically, depending on the estimated file size and operating system. For performance reasons compression is set to best speed ('none' would make the zip archives incompatible with Mac OS, and the gzip library I'm using doesn't offer 'none').
This is working great so far; however, the user no longer gets a progress bar while downloading the file, because I'm not setting the content-length. What are the best ways to get around this? I've tried to guess the resulting file size, but either the browsers cancel the downloads too early, or they stop at 99.x% and wait for the missing bytes resulting from the difference between the estimated and actual file size.
My first thought was to always guess the resulting file size a little bit too big and fill the rest with zeros?
I've seen many file hosters offering the possibility to select files from a folder and put them into a zip file, and all of them show the correct (?) file size along with a progress bar. Any best practices? Thanks!
These are just some thoughts, maybe you can use them :)
Using Web API/HTTP, the normal way to go about it is that the response contains the length of the file. Since the response is only received after the call has finished, the time spent generating the file will not show any progress bar in any browser, other than a Windows wait cursor.
What you could do is use a two-step approach.
Generating the zip file
Create a duplex-like channel using SignalR to give feedback on the file generation.
Downloading the zip file
After the file is generated you should know the file size, and the browser will show a progress bar while downloading.
It looks like this problem should have been addressed using chunk extensions, but it seems it never got further than a draft.
So I guess you are stuck with either no progress or sending the file size up front.
It seems that generating exact-size zip archives is trickier than adding zero padding.
Another option might be to pre-generate the zip file without storing it, just to determine the size.
But I am just wondering: why not just use tar? It has no compression, so it is easy to determine its final size up front from the sizes of the individual files, and it should also be supported by both OS X and Linux. And Windows should be able to handle uncompressed zip archives, so a similar trick might work there as well.
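To make the tar suggestion concrete: in the plain ustar format each entry is a 512-byte header followed by the file data padded to a 512-byte boundary, and the archive ends with two 512-byte zero blocks, so the final size can be computed from the file sizes alone. This is a hedged sketch of that arithmetic; it assumes plain files with names under 100 characters (longer names need extra PAX/GNU headers), and some writers also pad the archive end to a blocking factor (often 10240 bytes), so verify against the library you actually stream with:

    import java.io.File;

    public class TarSizeEstimate {
        // Estimated size of an uncompressed ustar archive for the given files.
        static long tarSize(File... files) {
            long size = 0;
            for (File f : files) {
                size += 512;                              // entry header
                size += ((f.length() + 511) / 512) * 512; // data, padded to a 512-byte boundary
            }
            return size + 1024;                           // two zero blocks terminate the archive
        }

        public static void main(String[] args) {
            System.out.println(tarSize(new File("photo1.jpg"), new File("photo2.jpg")));
        }
    }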

How to generate a PDF within an application with no reporting framework

I need to create PDF reports in my app. I'm using ASP.NET MVC 3. What's the best way to do this? I don't really want to use a reporting framework if I can avoid it; it's just a few reports: table layout, groupings, possibly pagination, totals, the ability to merge PDFs into one PDF... any ideas? What would be ideal is if I could simply convert my HTML view into a PDF...
There is nothing built into .NET that allows you to create PDF files. So you have two possibilities: write a library yourself from scratch or use an existing one.
In case you decide to go with the second, you may take a look at Flying Saucer, which, along with ikvmc.exe, can be used to convert XHTML files into PDF. I have blogged about some of the required steps to get this working.
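For reference, the plain Java usage of Flying Saucer looks roughly like the sketch below (file names are placeholders; in the .NET scenario above the library would be run through IKVM as described). Flying Saucer expects well-formed XHTML rather than arbitrary HTML:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.xhtmlrenderer.pdf.ITextRenderer;

    public class HtmlViewToPdf {
        public static void main(String[] args) throws Exception {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(new File("report.xhtml"));   // well-formed XHTML version of the view
            renderer.layout();                                // compute the page layout
            try (OutputStream out = new FileOutputStream("report.pdf")) {
                renderer.createPDF(out);                      // write the PDF
            }
        }
    }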
Some possibilities:
I think you can do this with SQL Server Reporting Services (part of SQL Server rather than a third-party reporting framework).
Low-level PDF libraries that can be used: PDFSharp, iTextSharp.
You could print an HTML file to a PostScript driver using Word automation, then convert the PS to PDF via Ghostscript.
