HTTP streaming on Heroku (upload lots of data) - ruby-on-rails-3.1

I have one app hosted on Heroku and this app saving lots of data information to database (it takes about 70 seconds).
Heroku after 30 seconds period of every request display the error page H12 about timeout, how could I display some info-message while the upload is in progress instead of displaying H12 error?
I have been looking for some example of this, but I wasn't much successful... I just found some notes, that I have to send every time (eg. 15 seconds) some control string from server, but I already didn't find some specific example how to do that...
Any advices how to do that?
Thank in advance.

It is a poor practice to have your users wait for 70 seconds for a request to complete on any platform. Heroku just enforces this best practice by implementing the 30-second timeout. So the real question is how to better architect the application.
Heroku has an article on implementing background workers which are designed to solve this very problem: https://devcenter.heroku.com/articles/queueing
The basic approach is to have the web request schedule a background job (using Delayed Job, Queue Classic, Resque etc...) and immediately respond to the user with some indicator of progress. Then a dyno running a background worker does the heavy lifting of saving the info to the db. When it's done it flips some flag in a db or other storage mechanism which notifies to the web client that the job is now complete.
Running a background worker does require another dyno. If you're looking to avoid that expense you can look into Girl Friday which many report having success with.
Hope that helps.

Related

What's the recommended way of dealing with Heroku's dyno restarts on a worker?

I'm familiar with Heroku's policy on dyno restarts (once every 24 hours, if underlying hardware fails, etc).
I've also looked elsewhere on Stackoverflow and found questions like
Controlling Heroku's random dyno restarts.
Our apps are good from a web perspective – sessions are handled via external database and multiple dynos balance the load perfectly. Restarts aren't an issue there. The issue is workers. For example, our worker receives a message and begins processing a job. It's 99% done, awaiting some final asynchronous request to return, and it receives SIGTERM. Before it can even clean up the job, the process is killed. The code can handle local cleanup for a job that needs to be restarted, but external services can't really be part of the transaction.
For example, if a report was built and an email was sent, but some 3rd async operation didn't complete before SIGTERM, I can't really roll back that transaction. For hardware failures or other rare events, it's understandable that a multi-step transaction could get truncated, but with Heroku's policy it seems that I need to assume this will happen at least once a day. Can anyone help me get a better grip on this problem?

Heroku Webhook Fails Sometimes

I'm creating a chatbot with dialogflow and I have a webhook hosted on heroku (so that I can use python scripts). The webhook works fine most of the time. However when I haven't used it in a while it will always fail on the first use with a request timeout. Has anyone else come across this issue? Is there a way to wake up the webserver before running the script I have written?
Heroku's free dynos will sleep after 30 minutes of inactivity.
Preventing them from sleeping is easy. You need to use any of their paid plans.
See https://www.heroku.com/pricing
Once you use a Hobby dyno, your app will never sleep anymore and you shouldn't be getting request timeouts.
Alternatively, you can also benchmark what's taking a long time to boot your app. With a faster boot time, the first request would be slow but wouldn't get a timeout.
Heroku times out requests after 30 seconds.

Possible explanation of sudden spike in memcached get

Newbie in Newrelic here. I have an API service hosted on Heroku and being monitored at Newrelic.
While I was studying how to use newrelic. I found out my 2 workers are being underutilised with very low RPM and low transaction time. So I decided to cut down to one worker which saves me $36 a month. =]
Shortly after that I received tonnes of logEntries emails stating request timeouts of one of my web dynos. Looking into Newrelic. I found out that one of my actions are being called suspciously high number of times for 2-3 minutes.
The action being V1::CarsController#Index, which basically shows a collection of cars.
While I was not sure whether the deletion of one worker dyno has caused memcached to do something, I also suspect that may be someone is trying scrap the data off the database. I am not too sure how to further investigate into the issue. I wonder if I can track down the request IP and see it is the same? or how can I further investigate?
If further information is needed I am happy to provide in Edits!
Thanks

Heroku Scheduler for recurring tasks and Delayed Jobs for asynchronous tasks, all using one web dyno

Very similar question to Is it feasible to run multiple processeses on a Heroku dyno?, or Running Heroku background tasks with only 1 web dyno and 0 worker dynos except I'm talking about a Ruby on Rails app.
Context:
I understand that it's encouraged to separate worker and web dynos... but I'm still testing and don't want to pay the expense. Especially because with my app, all the web requests pretty much happen either in the AM or in the PM, and during the whole middle of the day (and also middle of the night), literally nothing is happening.
I'd like the web dyno to run two types of background processing on the "downtime":
A recurring, long-running task every day (mailings)
An asynchronous, long-running task that is triggered when a user performs a certain action (it's a mailer)
I've done quite a bit of reading on this, but this is my first time doing anything asynchronous, so I wanted to ask the community a couple of questions just to ensure what I'm trying to do is feasible.
Questions
How do I do activity #1 for free?
To put it bluntly... considering my context above, if I use Heroku's Scheduler add-on, this runs a one-off dyno which I'll be charged for since I use NewRelic now to constantly ping my web dyno so it never actually sleeps meaning my one web dyno is my free dyno. Is there another way of doing this with the one web dyno that, in the middle of the night, won't be processing any requests? Alternatively, is there a way to tell New Relic to ping except at certain times, which will also then allow me to spin up a one-off dyno but still be within my free dyno hours?
For activity #2, I'm thinking of using Delayed Jobs, but how do I tell Delayed Jobs to delay until end of user 1's session, and then run mailer for user 1, but then pause again if user 2 sends a request, and then when user 2 is finished, start where left off on user 1's mailer, and then do user 2's mailer... and so forth? I think the root of the challenge here is that from what I've read, Delayed Jobs needs to be started with a script. But I'm not going to be at my computer starting a script all the time. How do I make the start (and the queue as illustrated in the question) something that happens automatically?
Would love even just point me directional pointers on what methods/ what considerations, etc.
I'm going to check out a nifty gem https://github.com/brandonhilkert/sucker_punch to do this. According to the author, it was written specifically to use Heroku's single dyno for hobby websites that have no need to spin up another dyno. It basically creates another thread.
FYI also, there is an add-on link that allows sucker_punch to do recurring tasks, called https://github.com/facto/fist_of_fury

Heroku: web dyno vs. worker dyno? How many/what ratio do I need?

I was curious as to what the difference between web and worker dynos is on Heroku. They give a one sentence explanation on their pricing page, but this just left me confused. How do I know how many to pick of each? Is there a ratio I should aim for? I'm pretty new to this stuff, so can someone give an in depth explanation, or maybe some sort of way I can calculate how many and which kind of dynos I would need?
Also, I'm confused about what they mean by the amount of hours for each dyno.
http://www.heroku.com/pricing
I also happened upon this article. As one of their suggested solutions, they said to increase the amount of dynos. Which type of dyno are they referring to here?
http://devcenter.heroku.com/articles/backlog-too-deep
Your best indication if you need more dynos (aka processes on Cedar) is your heroku logs. Make sure you upgrade to expanded logging (it's free) so that you can tail your log.
You are looking for the heroku.router entries and the value you are most interested is the queue value - if this is constantly more than 0 then it's a good sign you need to add more dynos. Essentially this means than there are more requests coming in than your process can handle so they are being queued. If they are queued too long without returning any data they will be timed out.
There's no ideal ratio I'm afraid, you could have an app doing 100 requests a second needing many web processes but just doesn't make use of workers. You only need worker processes if you are doing processing in the background like sending emails etc etc.
ps Backlog too deep would be a Dyno web process that would cause it.
UPDATE: On March 26 2013 Heroku removed queue and wait fields from the log out put.
queue and wait fields have been removed from router log messages.
Also, the Heroku router no longer sets X-Heroku-Dynos-In-Use,
X-Heroku-Queue-Depth and X-Heroku-Queue-Wait-Time HTTP headers for
incoming requests.
Dynos are basically processes that run on your instance. With the new Cedar stack, they can be set up to execute any arbitrary shell command. For web applications, you generally have one process called "web" which is responsible for responding to HTTP requests from users. All other processes are what were previously called "workers." These run continuously in the background for things like cron, processing queues, and any heavy computation that you don't want to tie up your web processes. You can also scale each type of process, so that multiple processes of each type will be booted up for additional concurrency. The amount of each that you use really depends on the needs of your application and the load it receives. You can use tools like the New Relic plugin to monitor these things. Take a look at the articles on the Process Model and the Procfile in Heroku's dev center for more details.
A number of people have mentioned that there is no known ratio and that the ratio of web workers to 'background' workers you will want is dependent on how you designed your application - that is correct. However I thought it might be useful to add that as a general rule of thumb, you want your web workers - and thus the controller actions they are serving - to be lightning quick and very lightweight, to reduce latency in response times from browser actions. If there is some browser action that would require more than, say, about a half a second of real time to serve, then you will probably want to architect some sort of system that pushes the bulk of that action on to a queue.
You would then design an offline worker dyno(s) that will service this queue. They can take much longer because there are no HTTP responses pending on their output. Perhaps the page you rendered from the initial browser request that pushed the action will serve some Javascript that starts a thread that checks to see if the request has finished every 5 seconds, or something along those lines.
I still can't give you a ratio to work with for the same reason others have given, but hopefully this helps you decide how to architect your app. (I should also mention this is just one design out of many valid ones.)
https://stackoverflow.com/a/19965981/1233555 - Heroku has gone to random routing, so some dynos can have queues stacking up (while they serve a lengthy request) while other dynos are free. Avoid this by making sure that all requests are handled very quickly in your web dynos. This will reduce the number of web dynos you need, while requiring more worker dynos.
You also need to care about your web app supporting concurrency, which only some Rails configs do - try Unicorn, or carefully-written code (for I/O that doesn't block the EventMachine) with Thin.
You probably have to try, rather than calculate, to see how many dynos of each kind you need. Make sure their New Relic reports the dyno queue - see the above link.
Short answer is that you need as many as you need to keep your queues down.
As John describes, if you start seeing a queue in your logs then you need more dynos. If you start seeing your background queues getting too long (how you get this info is dependant on what you have implemented) then you need more workers.
There is no ratio as it's very much dependent on your application design and usage.

Resources