Rendering large collections of articles to PDF fails in MediaWiki with mwlib

I have installed the MediaWiki Collection extension and mwlib to render articles (or collections of articles) to PDF. This works very well for single articles and for collections of up to 20 articles.
When I render larger collections, the percentage counter on the parsing page (which counts up to 100% when rendering succeeds) is stuck at 1%.
Looking at mwrender.log I see an Error 32 - Broken pipe error. Searching the internet reveals that Error 32 can be caused by the receiving process (the part after the pipe) crashing or not responding.
From here it is hard to proceed. Where should I look for more clues? Could it be the connection to the MySQL server that dies?
The whole appliance is running on a Turnkey Linux MediaWiki VM.

I'm using the PDF Export extension and it works with more than 20 articles. Maybe try that?

I figured out the problem myself.
mw-render spawns a parallel request for every article in a collection. This means that for a collection of 50 pages, 50 simultaneous requests are made. Apache could handle this, but MediaWiki's MySQL database could not.
You can limit the number of threads that mw-render spawns with the --num-threads=NUM option. I couldn't find where mw-serve calls mw-render, so I instead limited the maximum number of workers Apache could spawn to 10.
mw-render automatically retries requests for articles that fail on the first attempt, so this approach worked.
I rendered a PDF with 185 articles within 4 minutes; the resulting PDF had 300+ pages.
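For illustration only, here is a minimal Python sketch of the idea behind both fixes (this is not mwlib's code; the fetch function and article titles are placeholders): cap how many article requests can run at the same time so the database isn't flooded.

```python
# Sketch of the throttling idea behind --num-threads / the Apache worker cap.
# fetch_article and the titles are placeholders, not mwlib internals.
from concurrent.futures import ThreadPoolExecutor

def fetch_article(title):
    # stand-in for the real HTTP request mw-render makes per article
    return title

titles = ["Article %d" % i for i in range(50)]

# With max_workers=10, at most 10 requests hit MediaWiki/MySQL at once,
# mirroring the limit of 10 Apache workers used above.
with ThreadPoolExecutor(max_workers=10) as pool:
    pages = list(pool.map(fetch_article, titles))
```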

Related

Not able to generate more than 20 pages in wkhtmltopdf (approx)

I am trying to generate a PDF with 30-35 pages using wkhtmltopdf, but there are blank pages after 20 pages (sometimes 21/22). To confirm this, I tried generating the same page 35 times in a loop (please note there is no error in the HTML file).
I am using NReco.PdfGenerator (C#).
If wkhtmltopdf.exe throws a timeout exception, use the JavaScript profiler (Chrome) and optimize your code as much as possible.
After refactoring based on the above approach, we also added the --no-stop-slow-scripts parameter along with a 3-minute timeout to the wkhtmltopdf exe. Now I can generate more than 80 pages :)
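For anyone not using NReco, the same two settings can be sketched with a plain wkhtmltopdf call; the file names and the way the timeout is enforced (via Python's subprocess here) are illustrative assumptions, not the asker's setup.

```python
# Illustrative only: pass --no-stop-slow-scripts and allow up to 3 minutes,
# matching the settings described in the answer above. File names are placeholders.
import subprocess

subprocess.run(
    ["wkhtmltopdf", "--no-stop-slow-scripts", "input.html", "output.pdf"],
    check=True,    # raise if wkhtmltopdf exits with an error
    timeout=180,   # give slow JavaScript up to 3 minutes before giving up
)
```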

Python subprocess.call or Popen limit CPU resources

I would like to set 4 additional properties on 8000 images using the Google Earth Engine command line tool. The properties are unique per image. I was using Python 2.7 and the subprocess.call(cmd) or subprocess.check_output(cmd) methods. Both are very slow (9 seconds per image, i.e. 20 hours total). Therefore I tried to send the commands without waiting for a response using subprocess.Popen(). That causes my PC to crash due to the number of tasks (CPU close to 100%).
I was looking for ways to have my PC use, say, 80% of its CPU, or even better, scale down CPU use when I am using other things. I found os.nice() and the nice command for subprocess.Popen(["nice", 20]), but I struggle to use them in my code.
This is an example of a command I send using the subprocess method:
earthengine asset set -p parameter=DomWN_month users/rutgerhofste/PCRGlobWB20V04/demand/global_historical_PDomWN_month/global_historical_PDomWN_month_millionm3_5min_1960_2014I000Y1960M01
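One way to keep the machine responsive is to bound how many earthengine processes run at once and lower their priority. The sketch below is an assumption about how that could look, not the asker's actual script; the pool size, niceness value, and empty command list are placeholders.

```python
# Hedged sketch: run the earthengine commands through a small worker pool
# at reduced priority instead of launching all 8000 at once with Popen.
import subprocess
from concurrent.futures import ThreadPoolExecutor

# One ["earthengine", "asset", "set", "-p", ...] argument list per image,
# built elsewhere from the per-image properties.
commands = []

def run(cmd):
    # "nice -n 19" lowers the scheduling priority so other work stays usable;
    # subprocess.call still waits for each command to finish.
    return subprocess.call(["nice", "-n", "19"] + cmd)

# At most 4 earthengine processes run at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    exit_codes = list(pool.map(run, commands))
```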

wkhtmltopdf runtime for many pdf-creations

I am using wkhtmltopdf on my Ubuntu server to generate PDFs from HTML templates.
wkhtmltopdf is started from a PHP script with shell_exec.
My problem is that I want to create up to 200 PDFs at (almost) the same time, which makes the wkhtmltopdf runtime pile up for every PDF: one file takes 0.6 seconds, 15 files take 9 seconds.
My idea was to start wkhtmltopdf in a screen session to decrease the runtime, but I can't make that work from PHP. It might not make much sense anyway, because I also want to merge all PDFs into one after creation, so I would have to check whether every session has terminated.
Do you have any ideas how I can decrease the runtime for this number of PDFs, or can you give me advice on how to do this correctly and cleanly with screen?
My script looks like the following:
loop up to 200 times {
- get data for html-template from database
- fill template-string and write .html-file
- create pdf out of html-template via shell_exec("wkhtmltopdf....")
- delete template-file
}
merge all generated pdfs together to one and send it via mail
Thank you in advance, and sorry for my bad English.
Best wishes
Just create a single large HTML file and convert it in one pass instead of merging multiple PDFs afterwards.
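A rough sketch of that single-pass approach (written in Python for brevity; the same idea works from the PHP script with shell_exec, and the record contents and page markup are placeholders):

```python
# Build one HTML document containing all records, then run wkhtmltopdf once
# instead of 200 times. Placeholder data stands in for the database rows.
import subprocess

records = [{"name": "Record %d" % i} for i in range(200)]

# wkhtmltopdf honors CSS page breaks, so each record can start on a new page.
body = "".join(
    '<div style="page-break-after: always;"><h1>%s</h1></div>' % r["name"]
    for r in records
)

with open("all.html", "w") as f:
    f.write("<html><body>%s</body></html>" % body)

# A single conversion, so wkhtmltopdf starts up only once and no merge step is needed.
subprocess.run(["wkhtmltopdf", "all.html", "all.pdf"], check=True)
```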

php (and codeigniter) upload size limited to 1MB

I have a function (I'm using CodeIgniter) that uploads a file, resizes it, and saves details into a database.
I have no problems in uploading images up to 1MB, so I know that permissions work ok.
However, as soon as I try to upload something above 1MB, the function becomes really slow, and after a while I'm presented with a blank page.
These are the main values in the php.ini file:
post_max_size: 32M
max_input_time: 60
max_execution_time: 30
file_uploads: 1
upload_max_filesize: 32M
According to this I should have plenty of time and megabytes to upload the file successfully.
What else could this depend on?
UPDATE (following Mike's and Minboost questions below)
a. Logs are clean, no sign of problems there; in fact the log shows that the page was processed in 0.03 seconds!
b. memory_limit is 96 MB
c. I'm not applying XSS filters on this
...any additional ideas?
The thing I don't understand is that it takes a very long time to upload a file even on my Mac (localhost); I've managed to upload a 2.7MB picture, but I had to wait a few minutes. There seems to be a step change (for the worse) above the 500KB threshold: uploads are smooth and fast below it, and become very slow above it.
It could also depend on memory_limit.
Are you checking the error logs? What errors are returned? Make sure you're not XSS-filtering the upload file form field. Also, I've had to try this before:
set max_allowed_packet higher in /etc/my.cnf and restart MySQL.

Sencha Touch - big XML file issue

I am reading the content from an XML file over the internet!
The file contains about 10,000 XML elements and is loaded into a list (one picture and headline for each element)!
This slows down the app extremely!
Is there a way to speed this up?
Maybe with a select command?
Are there some examples or tutorials out there?
You are out of luck for an easy, straightforward answer.
If you control the server that the XML file is coming from, you should change it to support pagination of the results instead of sending the complete document.
If you don't control the server, you could set up one to proxy the results and do the pagination for the application on the server side.
The last option is to process the file in chunks. That would mean processing substrings of the text: take a substring of the first x characters, parse it, and do something with the results; if you need more, process the next x characters. This can get very messy fast (XML doesn't really parse nicely in this manner), and just downloading a document with 10k elements and loading it into memory is probably going to be taxing/slow/expensive for mobile devices (especially over a 3G connection).
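To make the pagination idea concrete, here is a small server-side sketch (the app itself is Sencha Touch/JavaScript; this Python code only illustrates a proxy that streams the XML and returns one page of items per request, and the element and field names are assumptions):

```python
# Server-side sketch: stream the large XML file and hand out small pages.
# "item", "headline", and "picture" are assumed element names.
import xml.etree.ElementTree as ET

def load_items(path):
    # iterparse processes the document incrementally instead of loading it whole
    for _, elem in ET.iterparse(path):
        if elem.tag == "item":
            yield {"headline": elem.findtext("headline"),
                   "picture": elem.findtext("picture")}
            elem.clear()  # free memory for elements we are done with

def get_page(path, page, size=50):
    items = list(load_items(path))
    start = page * size
    return items[start:start + size]  # the client asks for one page at a time
```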
