Regarding document upload and download in JMeter:
I would like to know whether we can validate a PDF that was produced by LibreOffice/PDFBox in a document upload scenario, and do a like-for-like content comparison (same text, spacing, lines, images, etc.) against the source document. The uploaded documents can be of different types, such as Doc, Docx, Text, Jpg, Png, Rtf, etc.
Scenario:
- Upload a Docx document (the document should be converted to PDF so the user can view it as a PDF)
- View the Docx document in PDF format after the upload
- Compare the Docx contents (e.g. text, spacing, lines, images) with the PDF and check whether they are the same
Take a look at the Apache Tika project. If you add tika-app.jar to the JMeter classpath, you will be able to use the Document (text) "Field to Test" option of the Response Assertion, so you can check the document content against reference text.
If that is not sufficient for your needs, take a look at JSR223 Test Elements, the Groovy language, and the Apache POI and Apache PDFBox APIs.
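As a rough illustration of checking extracted content against reference text, here is a minimal sketch. It assumes the text has already been extracted from both documents (for example with Tika or PDFBox) and is available as plain strings. The function names normalize and compare_text are hypothetical, and the sketch is in Python for brevity; inside a JSR223 element the same logic would be written in Groovy. It covers text only, not images.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Collapse runs of whitespace and drop empty lines, so layout-only
    differences introduced by the PDF conversion are ignored."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.splitlines())
    return [line for line in lines if line]


def compare_text(reference: str, converted: str) -> tuple[float, list[str]]:
    """Return a similarity ratio in [0, 1] and a unified diff of the
    normalized lines (the diff is empty when the texts match)."""
    ref_lines = normalize(reference)
    conv_lines = normalize(converted)
    ratio = difflib.SequenceMatcher(None, ref_lines, conv_lines).ratio()
    diff = list(difflib.unified_diff(ref_lines, conv_lines,
                                     "reference", "converted", lineterm=""))
    return ratio, diff
```

Whitespace is normalized on purpose, because exact spacing and line breaks rarely survive a conversion to PDF. If spacing must match exactly, the normalize step would have to be dropped.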
Related
I am working on a template generator use case that involves a TinyMCE editor with the option to export to and import from Microsoft Word files.
I use SpireDoc to save HTML files as docx and to restore docx back to HTML.
The problem is that, as this is a template generator, I need to include some metadata on some of the HTML nodes. Say a user's name is inserted in TinyMCE; the HTML code would look like this:
<span data-type="user-name" data-id="234">Some Name</span>
Once converted to docx, all the data- attributes are stripped off. Is there a way I could tell SpireDoc to preserve those attributes and put them back in the HTML file when I convert the docx back to HTML?
I know I could resort to ugly placeholders like ##USERNAME|ID##, but I would prefer to keep my metadata attributes because I also need to do this for images, and the user would not be able to size or format an image if it were just a text placeholder.
I have a Laravel project where I need to create a doc/docx document based on user input in CKEditor. I have previously worked with PHPWord, which can convert simple text input to a docx document. The problem with CKEditor is that it gives you HTML with inline CSS (which I need), and PHPWord cannot convert this to a docx.
I also tried to convert the HTML to Word via XML, but with no luck. I know there is a paid tool called phpdocx, but I am looking for a free solution.
Just a note: I can actually convert the HTML to PDF, but again, I have found no solution for going from PDF to doc.
So, can anyone help with converting HTML to Word, or PDF to Word?
thanks
I am looking for a method to convert a PDF document into a corresponding HTML document using ABCpdf. Kindly let me know if it is feasible. FYI, my PDF document has rich text along with images.
You can. Try this. Hopefully it'll work.
// pdfBytes: the source PDF as a byte array
// htmlPath: the output file path, ending in ".html"
using (var doc = new WebSupergoo.ABCpdf10.Doc())
{
    doc.Read(pdfBytes);
    doc.Save(htmlPath); // the file extension selects the export format
}
For documentation please have a look at the note section
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/doc/1-methods/save.htm
To export as XPS, PostScript, DOCX or HTML you need to specify a file path with an appropriate extension - ".xps", ".ps", ".docx", ".htm", ".html" or ".swf". If the file extension is unrecognized then the default PDF format will be used.
You can definitely convert HTML to PDF, but I am not sure the inverse is possible with ABCpdf.
Perhaps you could give iText (iTextSharp) a try.
I would like to have a two-page InDesign document. The first page has text plus an image, and the second page has two images. The images should come from a CSV file that gets data-merged with the InDesign document. I have only been able to do a data merge when I have one page, and then all pages have the same layout. Is this achievable, and if so, how do I do it? Thanks.
The solution is to work with XML files (File -> Import XML) instead of CSV. You can make any type of document and put XML objects in text fields, images, ... XML is much more flexible than CSV.
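A CSV field whose name starts with @ (for example @photo1) is treated as an image field, and each value in that column is the location of an image file. (In InDesign data merge, the # prefix is used for QR code fields instead.) For instance, column F could contain:
@photo1
c:\foldername\filename.jpg
c:\folder2name\subfolder\otherfile.png
f:\file3.tiff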
I am checking whether a PDF document is searchable by testing whether I can get any text from every single page in the PDF.
But checking every page seems to take forever when I am trying to extract text from a PDF that contains 500 to 2000 pages or more.
Is it possible for a PDF to contain text on one page but not on the rest?
What I am trying to do here is this: if the first page of a PDF contains text, then treat it as a searchable PDF, otherwise not.
Yes, it is very possible for a PDF to contain text on one page but not on the rest. You could very well have a 500-page PDF that contains only images on the first 499 pages but contains text on the last page.
Unless you want to open the PDF file yourself and scan it for text/text operations, you will need to use an existing third-party PDF library that allows you to extract text from a PDF.
Also, see Ferruccio's response to a related question, which is to use the IFilter interface, specifically made for search indexing and text extraction.
Try this version of Searcharoo, which lets you search Word and PDF documents.
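Since the first page alone cannot decide the question, one way to keep the check cheap is to stop scanning as soon as the answer is known. The sketch below is only an illustration in Python. The extract_page_text callable is hypothetical and stands in for whatever PDF library you use to extract the text of page i; the function names are assumptions as well.

```python
from typing import Callable


def is_fully_searchable(page_count: int,
                        extract_page_text: Callable[[int], str]) -> bool:
    """True only if every page yields some non-whitespace text.

    all() stops at the first page without text, so a scanned PDF is
    rejected without reading the remaining pages.
    """
    return page_count > 0 and all(
        extract_page_text(i).strip() for i in range(page_count)
    )


def has_any_text(page_count: int,
                 extract_page_text: Callable[[int], str]) -> bool:
    """True if at least one page yields text.

    any() stops at the first page that has text, but in the worst case
    (text only on the last page) every page is still read.
    """
    return any(extract_page_text(i).strip() for i in range(page_count))
```

Which of the two definitions is the right one depends on what "searchable" should mean for your documents. With the stricter one, a PDF whose first page is a scanned image is rejected after a single page.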