I am stuck with PDF generation in a Ruby on Rails 6.0 (Ruby v2.7.1) website on Heroku.
The goal: generate on the fly a PDF photo gallery containing a list of pictures. The gallery comes from an external service and is imported through an API. The PDF should include 1 or 2 images per page and may be very long (up to 30 or 40 pages). Multiple users need to be served, and there could be more than 10 requests per minute. This is fully functioning on other websites.
What I have tried: several ways to generate PDFs with Rails, using gems like wicked_pdf and pdf_kit (both based on wkhtmltopdf) or grover (based on puppeteer).
On localhost I can download the PDF with good styling; it is very slow, but it works. In production, however, I have serious issues.
The issues:
In the production environment (Heroku) my slug size is enormous (approx. 400 MB) because wkhtmltopdf or puppeteer takes up approx. 250 MB. This seems to heavily impact the server's memory usage.
The requests to create PDFs are really slow, taking more than 20 seconds, and they often time out.
After each such request I see a big increase in memory usage. I expect to run out of memory after a few requests.
I get the same issues even when I create smaller PDFs of only a few pages.
I've tried several versions of the standard code provided by the docs, and all of them generate the PDF, but the performance issues prevent using them in production. It would be useful to have some guidelines on how to proceed.
My questions:
Would a background job solve the timeout issues? I expect it cannot solve the very long creation time of the PDFs.
Is it a good idea to use more workers or jobs on Heroku? Would this improve the performance of PDF creation?
Any suggestions on other ways to proceed, or on lighter libraries or services?
I could generate the PDF only once and save it on S3, but the data is created on another server, I get it through an API, and I cannot check whether it has been modified.
The previous developers of the same website told me that exactly the same data was served in a few seconds using the chain XML - XSL-FO - PDF through FOP on .NET and Apache, which is totally incompatible with Rails and Heroku.
Below I'm posting one version of my code to generate the PDF with the wicked_pdf gem, but it is something that I clearly have to update.
def book_pdf
  # code to generate the picture list and title of the gallery #
  respond_to do |format|
    format.html
    format.pdf do
      render pdf: model_name.parameterize,
             orientation: "Landscape",
             page_size: 'A4',
             show_as_html: false,
             disposition: 'attachment',
             header: { :html => { :template => 'pdf/book_header.pdf.erb' } },
             footer: { :html => { :template => 'pdf/book_footer.pdf.erb' } },
             quality: 50,
             zoom: Rails.env.production? ? 0.81 : 1.00,
             layout: "pdf.html"
    end
  end
end
WickedPdf.config = {
  layout: 'pdf.html.erb',
  print_media_type: true,
  page_size: 'A4',
  encoding: 'utf-8',
}
If the same PDF is going to be served to several people, and the PDF itself won't change often, it might be better to generate it once and store it in S3, with a DB record in your application holding the URL along with some identifying parameters.
If several people are asking for the same PDF (no data change) and it is already in S3 (which you can tell from your DB record), you can just serve it without generating it again.
Moving PDF generation to a background worker in Sidekiq will free up the web application for actual HTTP requests and prevent your current timeout issues.
Having more workers might improve performance for concurrent PDF requests, but the time taken for each PDF generation (within a worker) will not improve.
Since you say the PDF is only images, and you don't know when the other server has made changes, maybe you can have a polling job in the background that detects when the data has changed, proactively generates a new PDF, and stores it on S3 even before someone asks for it.
While the PDF is being generated in the background, if the DB record has the identifying tags for that PDF and people are asking for it (HTTP requests), you can implement some sort of polling or websocket flow where the user's browser keeps asking and waits for the server to say that the PDF is ready.
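One possible way to tell whether the data has changed is to fingerprint the API payload and compare it with the fingerprint stored next to the PDF's URL. The sketch below is only an illustration under that assumption: PdfCache, its in-memory hash (standing in for the DB record and the S3 object), and the example URLs are all hypothetical names, not part of wicked_pdf, Sidekiq, or any S3 client.

```ruby
require "digest"
require "json"

# Hypothetical in-memory stand-in for the DB record + S3 object described
# above; a real app would persist the fingerprint and URL in a table.
class PdfCache
  def initialize
    @records = {} # gallery_id => { fingerprint:, url: }
  end

  # Fingerprint of the API payload; it changes whenever the payload changes.
  def self.fingerprint(gallery_data)
    Digest::SHA256.hexdigest(JSON.generate(gallery_data))
  end

  # Returns the stored URL when the data is unchanged. Otherwise yields the
  # new fingerprint so the caller can generate and upload a fresh PDF; the
  # block's return value is recorded as the new URL.
  def fetch(gallery_id, gallery_data)
    fp = self.class.fingerprint(gallery_data)
    record = @records[gallery_id]
    return record[:url] if record && record[:fingerprint] == fp

    url = yield(fp)
    @records[gallery_id] = { fingerprint: fp, url: url }
    url
  end
end
```

A polling job or the request handler could call cache.fetch(gallery_id, api_payload) { |fp| generate_and_upload(fp) }, where generate_and_upload is whatever slow step produces the PDF; that block only runs when the payload differs from the one last seen.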
Related
I am new to the web application world. I have made an app and it is ready to deploy on Heroku. I have tested it and it is working.
The problem is that I have 62 static text files (1 KB each, so 62 KB in total); I read data (numbers) from these files and display it on a web page.
I am using Java for the back end and JS, jQuery, CSS, Ajax, etc. for the front end.
The application is working fine and I have tested it for deployment on Heroku.
Originally, I used an external server for the data: I fetched the data from that server and displayed it in my web application.
However, the server's owner wants to keep the connection confidential, so I was advised to use static files.
The files only contain numbers, e.g. 356, 200, 100, 280, etc.
So here are my questions.
1) How can I upload 62 text files (1 KB each) to Heroku?
(I don't know Django, Ruby on Rails, or anything like that. All the answers are about those, but I do NOT think I am using them. I'm not sure, though; I am a complete newbie.)
2) When my app on Heroku tries to use data from the text files on Heroku (once I figure out how to upload them), what should the connection URL in my Java code be to fetch the data from those files? (Usually I supply a URL address.)
My current code to fetch data from the server looks like this:
String addressURL = "http://url_address_bla:xxxx/data/bla";
URL url = new URL(addressURL);
//then I open the file and read the file..etc
Thanks in advance.
My website becomes very slow to access when only around 20 people are downloading from it. It offers large files of around 100 MB for download.
When enough people are downloading, the website becomes inaccessible: when I try to browse a page, it loads for a long time before appearing. This is what I do:
FileLocation fl = _store.GetFileLocation(id);
if (fl == null) return;
if (System.IO.File.Exists(fl.Location) == false) return;
RangeDownloader rd = new RangeDownloader(fl.Location, new FileInfo(fl.Location));
rd.ProcessRequest(System.Web.HttpContext.Current);
The RangeDownloader class is basically a class that allows range downloads (multi-section/multi-part).
Am I doing anything wrong here? Must I use a separate thread to serve downloads, meaning that I use ThreadStart? Isn't each request to the website a separate process? Do I really need to use threading? Are 20 people downloading from my shared hosting plan considered a heavy load?
I found the answer: FileContent does not slow down the ASP.NET pool, but it does consume a process; when too many users hit the same file, the server may slow down. This also happens on other web servers.
I have a Rails 3.1 application which generates some images in 'public/scene/ticket_123/*.png' on the fly. It works normally in development mode, but in production all assets should be precompiled, so I can't use files generated after the application has started.
Setting config.assets.compile = true hasn't solved my problem. The situation is even worse because the ticket number changes, so the images are in different directories which are also continuously created on the fly.
How should I set up assets to be able to show images that are created after the application has started?
I had the same problem. The only workaround I found was to copy all my images into "public/images" and change all the links to the new path.
That worked for me for the moment. I'll wait until somebody comes up with a better idea.
I hope that helps.
I found a solution.
# In the view I wrote
<img src="<%= mycontroller_image_get_path :filename => file_name %>">
# In the controller I created a GET action
def image_get
  send_file params[:filename], :disposition => 'inline', :type => 'image/png'
end
But make sure the file you're trying to send is in the "#{Rails.root}/public" directory, otherwise send_file says it can't find the file. (Maybe it doesn't have to be in /public, only somewhere under Rails.root.) To change this behavior, it can be useful to read this topic: Can I use send_file to send a file on a drive other than the Rails.root drive?
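Because the file name comes straight from the request, it may be worth checking that the resolved path really stays inside the intended directory before handing it to send_file. The helper below is a minimal sketch of that check in plain Ruby; safe_path is a hypothetical name, and the base directory passed to it is an assumption about where the generated images live.

```ruby
# Resolve a user-supplied relative path under base_dir. Returns the absolute
# path, or nil when the result would fall outside base_dir (for example via
# ".." segments or an absolute path in the parameter).
def safe_path(base_dir, relative)
  base = File.expand_path(base_dir)
  full = File.expand_path(relative.to_s, base)
  full.start_with?(base + File::SEPARATOR) ? full : nil
end
```

In the controller action this could be used as path = safe_path(Rails.root.join('public').to_s, params[:filename]), sending the file only when path is not nil.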
My Rails 3.1 app is using PDFKit to render specific pages, and I'm running into (what seems like) a common problem where trying to generate the PDF causes the process to hang.
I found this solution here on Stack Overflow: rails 3 and PDFkit. It has me add a config.threadsafe! entry in my development.rb file, and this works, BUT it requires that for every change anywhere in the app I stop and restart my server to see my changes. That is NOT acceptable from a workflow standpoint: I'm currently setting up the styling for the PDF pages, and it's a painfully slow process having to do this.
I also found the same issue reported here: https://github.com/jdpace/PDFKit/issues/110, and the issue points to this workaround: http://jguimont.com/post/2627758108/pdfkit-and-its-middleware-on-heroku.
ActionController::Base.asset_host = Proc.new { |source, request|
  if request.env["REQUEST_PATH"].include? ".pdf"
    "file://#{Rails.root.join('public')}"
  else
    "#{request.protocol}#{request.host_with_port}"
  end
}
This removes the need to restart the server, BUT now when I load the PDF it is without the styles rendered from the asset pipeline, because it's taking the assets from the public directory. I think I could work with this solution if I knew how to create the stylesheets for the PDF templates in the public folder. Is anyone developing with PDFKit and Rails 3.1 where this is all working in sync?
Any help would be greatly appreciated!
Thanks!
Tony
Here is the setup I am using:
I run a second instance of rails server with rails server -p 3001 -e test which will handle my assets for the PDF. The server will print the assets requests as they come in, so I can check that everything works as expected.
I use the following asset_host in my config/environments/development file:
config.action_controller.asset_host = ->(source, request = nil){
  "http://localhost:3001" if request && request.env['REQUEST_PATH'].include?(".pdf")
}
If you are using Pow, you can use multiple workers. Add this to your ~/.powconfig
export POW_WORKERS=3
(taken from Pow manual)
There's a problem with pdfkit in Rails 3.1. See my answer to this related question:
pdfkit does not style pdfs
Currently I am running a Rails app on Heroku, and everything is working great with the exception of generating PDF documents that sometimes contain thousands of records. Heroku has a built-in timeout of 30 seconds, so if the request takes more than 30 seconds, it's abandoned.
That's fine, since they offer delayed_job support built-in. However, all of the PDFs I generate follow a typical RESTful pattern. For instance, a request to "/posts.pdf" generates a PDF (using Prawn and Prawnto) and it's delivered to the browser.
So my basic question is: how do I create dynamically generated PDFs with delayed_job while maintaining the basic RESTful patterns Rails so conveniently provides? Thanks.
You could do something like:
Send a request to generate the pdf: POST /generate_new_pdf
Have that action return the ID of the new pdf before it's created
Poll the endpoint for that resource ID until it's done (returning 202's in the meantime): GET /pdfs/:id
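The three steps above can be sketched in plain Ruby as follows. This is only an illustration of the status flow, not delayed_job's API: PdfJobs, its in-memory hash, and the example URL are hypothetical, and a real app would keep the job state in a DB table that the background worker updates.

```ruby
require "securerandom"

# Hypothetical in-memory job registry used to illustrate the flow.
class PdfJobs
  def initialize
    @jobs = {} # id => { status:, url: }
  end

  # POST /generate_new_pdf: register the job and return its id immediately,
  # before the PDF exists.
  def create
    id = SecureRandom.uuid
    @jobs[id] = { status: :pending, url: nil }
    id
  end

  # Called by the background worker once the file has been written.
  def complete(id, url)
    @jobs.fetch(id).merge!(status: :done, url: url)
  end

  # GET /pdfs/:id: returns [http_status, body] for the polling client.
  def poll(id)
    job = @jobs[id]
    return [404, nil] unless job

    job[:status] == :done ? [200, job[:url]] : [202, nil]
  end
end
```

The client keeps requesting the resource while it receives 202 and downloads the file once it receives 200 with the URL.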