I know there are similar articles like PDF file conversion in JMeter but they do not answer the actual problem which is "when converting a PDF to a variable/object/property then back to PDF the document whilst the correct number of pages is 'whit on white'= blank.
is there a way to :
Create a runtime variable/object/property from an existing PDF file that can be used in a subsequent action.
Here other actions happen in the Test Plan but they do not
Convert the variable/object/property back to a pdf so that when viewed it does not contain just blanks.
Notes: I do not just wish to just copy a to pdf to pdf.
I have also tried creating a UDV form the pdf using the following posted on here without success too.
${__groovy(vars.putObject("hoping_its_a_pdf"), new File("my_original.pdf"))}
Reading other posts here I have also noticed strange character strings like "%âãÏÓ" when using both putObect and props.put when viewing them post creation but as the article said, most probably page break characters or similar so I have ignored those for now as I assumed it is the conversion and not the reason for the blank content.
Can someone please assist as this is now 4 weeks in and I still have white pdf's.
We cannot state what's wrong with the code which you're copying and pasting from some random sources.
There is not problem with storing the PDF file in JMeter Variables or properties and creating the file back from them.
Demo:
There are 2 problems the only piece of "code" you're sharing:
Your way of using vars.putObject() function is wrong, it takes 2 parameters: variable name and the object value. See Top 8 JMeter Java Classes You Should Be Using with Groovy article for more information on this and other JMeter API shorthands
Apart from this the function itself is syntactically incorrect, you need to escape any comma in the function with a backslash
So if you change your:
${__groovy(vars.putObject("hoping_its_a_pdf"), new File("my_original.pdf"))}
to
${__groovy(vars.putObject('hoping_its_a_pdf'\, new File('my_original.pdf')),)}
at least this bit will start working as you expect.
Related
I'm using NReco PDF generator to create PDFs of some fairly lengthy html tables. Most of the time it works fine, but sometimes it will generate a PDF that's just two blank pages (one blank page where the cover would be, followed by a blank page with the correct header and footer). I don't think anything is wrong with the html itself, since it does successfully generate the full document with the same input other times.
Could this be a timeout issue due to the large number of pages? Per another post I saw, I tried initializing the converter with this optional argument, but it didn't help:
NReco.PdfGenerator.HtmlToPdfConverter pdfConverter = new NReco.PdfGenerator.HtmlToPdfConverter{ CustomWkHtmlPageArgs = " --no-stop-slow-scripts" };
Is there anything else I should adjust, or does anyone know what else could be causing this?
Update: This is primarily happening in Chrome. I have the PDF generating in the browser in a new tab, and I thought it might be a caching issue, so I added a timestamp parameter in the url so it would be unique each time, but that didn't seem to help.
Final Update: adding --javascript-delay 2500 to the CustomWkHtmlPageArgs seems to have fixed the problem, so I think it must have been an issue with the PDF generating before the data was fully loaded.
adding --javascript-delay 2500 to the CustomWkHtmlPageArgs seems to have fixed the problem, so I think it must have been an issue with the PDF generating before the data was fully loaded.
I'm using PowerBuilder 10.5 and as a newbie I'm a bit stuck and since Google isn't giving me a satisfying answer I'm asking some advice from the Stack Overflow group.
I have a Rich Text Edit field in which the user can write something, insert pictures and so forth. Once finished, he goes to the „Search“ command button and clicking it searches for the batch file that will suit his needs (copy that text into an existing word document, create a new word and place the folder on web, and so fort – there are 6 different batches). The code in the clicked event of „Search“ command button is this:
String ls_s
GetFileOpenName('PB_app', ls_s, ls_s, 'BAT', "Win Batch Files (*.BAT),*.BAT", 'C:\Programs\Test')
And here come my problems: I can't connect my app and the selected batch file. I'd like the path of the selected batch file to be visible in the Single Line Edit filed, but I have no idea how to get there, not to mention I'm point blank at how to connect PB app, batch file, how to even say to the batch file – „That text in rich text edit field is the one you have to work with?“…?
So I need some advice, guidance, perhaps some links or names of any literature that would help me understand how it should be done. I've lost two days and got nowhere, and I just need some piece of advice to get me going…
Your problem is that the original programmer used one variable for two return values. If you declare a new string variable and pass it instead of the first ls_s, you'll see this will return you the path. If you run into trouble, PB has a good help file (and the manuals are also online) which covers GetFileOpenName().
Good luck,
Terry
I'm looking into the possibilities of jmeter, and it looks just great. However, one of the things my testing script should be able to do, is search for some values, and click on a random resulting link.
So what I would need to automate is:
Entering the values in the searchbox (I could do this by using the correct GET url in a second page, but how do I do this 5000 times?)
Clicking on one of the results listed.
Thanks for the help!
This can be done using the CSV Data Set, it loads a CSV file, and feeds the content into the variables of your choice:
http://jmeter.apache.org/usermanual/component_reference.html#CSV_Data_Set_Config
After that, you can use the regular expression extractor to extract the URL's you want from the resulting HTML, and follow those links:
http://jmeter.apache.org/usermanual/component_reference.html#Regular_Expression_Extractor
You could use a HTML Link Parser for your second part (the clicking on one of the results):
http://jmeter.apache.org/usermanual/component_reference.html#HTML_Link_Parser
See http://theworkaholic.blogspot.co.at/2009/11/randomly-clicking-links-in-jmeter.html for an example of using the link parser in a context similar to your question.
I am currently developing some functionality that implements some complex calculations. The calculations themselves are explained and defined in Word documents.
What I would like to do is create a hyperlink in each code file that references the assocciated Word document - just as you can in Word itself. Ideally this link would be placed in or near the XML comments for each class.
The files reside on a network share and there are no permissions to worry about.
So far I have the following but it always comes up with a file not found error.
file:///\\165.195.209.3\engdisk1\My Tool\Calculations\111-07 MyToolCalcOne.docx
I've worked out the problem is due to the spaces in the folder and filenames.
My Tool
111-07 MyToolCalcOne.docx
I tried replacing the spaces with %20, thus:
file:///\\165.195.209.3\engdisk1\My%20Tool\Calculations\111-07%20MyToolCalcOne.docx
but with no success.
So the question is; what can I use in place of the spaces?
Or, is there a better way?
One way that works beautifully is to write your own URL handler. It's absolutely trivial to do, but so very powerful and useful.
A registry key can be set to make the OS execute a program of your choice when the registered URL is launched, with the URL text being passed in as a command-line argument. It just takes a few trivial lines of code to will parse the URL in any way you see fit in order to locate and launch the documentation.
The advantages of this:
You can use a much more compact and readable form, e.g. mydocs://MyToolCalcOne.docx
A simplified format means no trouble trying to encode tricky file paths
Your program can search anywhere you like for the file, making the document storage totally portable and relocatable (e.g. you could move your docs into source control or onto a website and just tweak your URL handler to locate the files)
Your URL is unique, so you can differentiate files, web URLs, and documentation URLs
You can register many URLs, so can use different ones for specs, designs, API documentation, etc.
You have complete control over how the document is presented (does it launch Word, an Internet Explorer, or a custom viewer to display the docs, for example?)
I would advise against using spaces in filenames and URLs - spaces have never worked properly under Windows, and always cause problems (or require ugliness like %20) sooner or later. The easiest and cleanest solution is simply to remove the spaces or replace them with something like underscores, dashes or periods.
I have a very large PDF File (200,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby, and import the resultant data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's 200 MByte only -- for simple text in tables that's huuuuge, unless you have 200000 pages...), I would proceed like this:
Try to sanitize the file first by re-distilling it.
Try with different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do this certainly is a matter of hours, days or weeks (depending on your knowledge about the PDF fileformat internals... I suspect you don't have much experience of that yet).
If "2." works, you may halfway be done already. If it works, you also know that doing it programmatically with Ruby is a job that can in principle be solved. If "2." doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest to use Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will make your output PDF shrink if compared to the input.)
Extract text from PDF:
I suggest to first try pdftotext.exe (from the XPDF folks). There are other, a bit more inconvenient methods available too, but this might do the job already:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages but only 1-10 (for proof of concept, to see if it works at all). To extract from every page, just leave off the -f 1 -l 10 parameter. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses "custom encoding vector") you should ask a new question, describing the details of your findings so far. Then you need to resort bigger calibres to shoot down the problem.
At the very least, could anyone point
me to a Ruby PDF library for this
task?
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure if any of these solutions is actually suitable for your problem, especially that you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should select a library or two and take them for a test drive.
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text in predetermined locations. It may not be possible to determine what are rows and what are columns, but it may depend on the PDF itself.
The java libraries are the most robust, and may do more than just extract text. So I would look into JRuby and iText or PDFbox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn ruby library? link text