Laravel 5.2 - Creating a Job API

So in my API there are a few places where it is running a process / report that is either hitting a timeout or simply just taking WAY too long. I'd like to defer these jobs off to a queue and instead return a key in my response. The front end would then ping a service using that key to determine the status of its particular job in the queue. This way we don't have hanging ajax calls for 2 - 3 minutes. Maybe I could even create a queue viewer that would allow you to review the jobs in it and even cancel some etc.
Does Laravel have something built in or is there a package for this already? Are there other better options for dealing with this kind of issue?

This is what you are looking for: Laravel's queues.

I don't believe this existed when I first posted this question. However, Laravel now has this built for it: https://laravel.com/docs/5.6/horizon which is everything I was looking for.

Related

Laravel - Efficiently consuming large external API into database

I'm attempting to consume the Paypal API transaction endpoint.
I want to grab ALL transactions for a given account. This number could potentially be in the tens of millions of transactions. For each of these transactions, I need to store it in the database for processing by a queued job. I've been trying to figure out the best way to pull this many records with Laravel. PayPal has a maximum of 20 items per page per request.
I initially started off with the idea of creating a job when a user gives me their API credentials that gets the first 20 items and processes them, then dispatches a job from the first job that contains the starting index to use. This would loop forever until it errored out. This doesn't seem to be working well though as it causes a gateway timeout on saving those API credentials and the request to the API eventually times out (before getting all transactions). I should also mention that the total number of transactions is unknown, so chaining doesn't seem to be the answer as there is no way to know how many jobs to dispatch...
Thoughts? Is getting API data best suited for a job?
Yes, a job is the way to go. I'm not familiar with the PayPal API, but it seems requests are rate limited (see PayPal's rate limiting docs), so you might want to delay your API requests a bit. You could also write a class that monitors your API request consumption by tracking the most recent requests you made; in the job you can then decide when to fire the next request and record it in the database.
My humble advice: please don't pull all the data. Your database will get bloated quickly, and you'll need to scale every time you add a new account, which is no easy task.
You could dispatch the same job again at the end of the first job, having it query your current database to find the starting index of the transactions for the next run.
That way, even if your job errors out, you can dispatch it again and it will resume from where it left off (a sketch of the idea follows).
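The thread is about Laravel, but the resume-from-a-stored-index idea is framework-agnostic. Here is a rough sketch of it in Python with Celery, purely for illustration; fetch_page and save_transactions are placeholders for your own API client and persistence code, not real library calls:

from celery import Celery

app = Celery('paypal_sync', broker='redis://localhost:6379/0')
PAGE_SIZE = 20  # PayPal's per-page maximum mentioned in the question

@app.task(bind=True, max_retries=5)
def fetch_transactions(self, account_id, start_index=0):
    page = fetch_page(account_id, start_index, PAGE_SIZE)   # placeholder
    save_transactions(account_id, start_index, page)        # placeholder
    if len(page) == PAGE_SIZE:
        # More data is likely available: queue the next page as a fresh job,
        # with a small countdown to stay under the rate limit.
        fetch_transactions.apply_async(
            args=[account_id, start_index + PAGE_SIZE],
            countdown=2,
        )

Because the starting index travels with each dispatched job (and is implied by what is already saved), a crashed run can simply be re-dispatched with the last stored index.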
Maybe you will need to link your app with another data engine like AWS. In any case, I think the best idea is to create an API, pull only the most important data (indexed), and keep all the big data behind another endpoint where you can reach it if you need to.

How do I check to see if a job is in the Laravel queue?

Here's the situation:
I have a Laravel 4.2 application that retrieves (from a third party API) an asset. This is a long-lived asset (it only changes once every 12-24 hours) and is kind of time consuming (a large image file). I do cache the asset, so the impact has been more or less minimized, but there's still the case where the first person who logs in to my application in the morning has to wait while the application loads the asset for the first time.
I have set up a job which will be queued up and will run every eight hours. This ought to ensure that the asset in the cache is always fresh. It works by re-enqueueing the job for eight hours later after it runs.
The problem is this: I'm about to deploy this job system to production & I'm not sure how to start this thing running for the first time.
Ideally, I'd like to have an administration option where I have a button which says "Click here to submit the job", but I'd like to make it as foolproof as possible & prevent people (I'm not the only administrator) from submitting the job multiple times. To do this, however, the application would need to check & see if the job is already in the queue. I can't find a way to do that in an implementation-independent way (I'm using redis, but that may change in the future).
Another option would be to add an artisan command to run the initial process. That way I could deploy the application, run an artisan command, and forget about it.
So, to recap, I have two questions:
Is there a way to check a queue to see what jobs are in there?
Is there a better way to do this?
Thanks
When a job is in the Laravel queue (using the database queue driver), it is saved in the jobs table, so you can check there via the DB.
If it's guaranteed to be the only thing ever in the queue, you could use something like:
if (Queue::size() === 0) {
    Queue::push(...);
}
You would need to run php artisan queue:listen in the terminal.
Here is the complete documentation if you want to learn more:
https://laravel.com/docs/5.2/queues#running-the-queue-listener
You can use the Laravel Telescope package.
Laravel Telescope is an elegant debug assistant for the Laravel framework. Telescope provides insight into the requests coming into your application, exceptions, log entries, database queries, queued jobs, mail, notifications, cache operations, scheduled tasks, variable dumps and more. Telescope makes a wonderful companion to your local Laravel development environment.
(Source: https://laravel.com/docs/7.x/telescope)

Sending Email from Django at Heroku and not having idle workers

I have a django application in heroku and one thing I need to do sometimes that take a little bit of time is sending emails.
This is a typical use case of using workers. Heroku offers support for workers, but I have to leave them running all the time (or start and stop them manually), which is annoying.
I would like to use a one-off process to send every email. One possibility I first thought of was using IronWorker, since I thought that I could simply add the job to IronWorker's queue and it would be executed with a max of 15 min delay, which is OK for me.
The problem is that with IronWorker, I need to put all the modules and their dependencies in a zip file in order to run the job, so in my email use case, as I use "EmailMultiAlternatives" from "django.core.mail.message", I would need to include the entire Django framework in my zip file in order to be able to use it.
According to this link, it's possible to add/remove workers from the app. Is it possible to start one-off processes from the app?
Does anyone have a better solution?
Thanks in advance
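One way to get a one-off process on Heroku without keeping a worker dyno alive is a Django management command that drains an outbox and exits, launched with heroku run (e.g. heroku run python manage.py send_pending_emails). A rough sketch, where OutgoingEmail is an assumed model with to_address, subject, text_body, html_body and sent fields:

# myapp/management/commands/send_pending_emails.py
from django.core.mail.message import EmailMultiAlternatives
from django.core.management.base import BaseCommand
from myapp.models import OutgoingEmail  # hypothetical outbox model

class Command(BaseCommand):
    help = "Send every queued email, then exit."

    def handle(self, *args, **options):
        for email in OutgoingEmail.objects.filter(sent=False):
            message = EmailMultiAlternatives(
                subject=email.subject,
                body=email.text_body,
                to=[email.to_address],
            )
            message.attach_alternative(email.html_body, "text/html")
            message.send()
            email.sent = True
            email.save()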

tweepy Streaming API integration with Django

I am trying to create a Django webapp that utilizes the Twitter Streaming API via the tweepy.Stream() function. I am having a difficult time conceptualizing the proper implementation.
The simplest functionality I would like to have is to count the number of tweets containing a hashtag in real time. So I would open a stream, filtering by keywords, and every time a new tweet comes over the connection I increment a counter. That counter is then displayed on a webpage and updated with AJAX or otherwise.
The problem is that the tweepy.Stream() function must be continuously running and connected to Twitter (that's the point). How can I have this stream running in the background of a Django app while incrementing counters that can be displayed in (near) real time?
Thanks in advance!
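For reference, the simplest version of what the question describes looks roughly like this with the tweepy 3.x-style API (tweepy.Stream / StreamListener, as used in the question); the credentials are placeholders and the counter is kept in Django's cache so any web process can read it:

import tweepy
from django.core.cache import cache

class CountingListener(tweepy.StreamListener):
    def on_status(self, status):
        # Bump the shared counter for each matching tweet.
        try:
            cache.incr('hashtag_counter')
        except ValueError:
            cache.set('hashtag_counter', 1, None)

    def on_error(self, status_code):
        # Returning False disconnects the stream (e.g. on rate limiting).
        return status_code != 420

auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')
stream = tweepy.Stream(auth=auth, listener=CountingListener())
stream.filter(track=['#django'])  # blocks; run this outside the web process

A view can then simply return cache.get('hashtag_counter', 0) for the AJAX poll.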
There are various ways to do this, but using a messaging lib (celery) will probably be the easiest.
1) Keep a python process running tweepy. Once an interesting message is found, create a new celery task.
2) Inside this celery task, persist the data to the database (the counter, the tweets, whatever). This task can perfectly well run Django code (e.g. the ORM). A sketch follows below.
3) Have a regular django app display the results your task has persisted.
As a precaution, it's probably a good idea to run the tweepy process under supervision (supervisord might suit your needs). If anything goes wrong with it, it can be restarted automatically.
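A minimal sketch of steps 1 and 2 using today's Celery API (@shared_task; the original answer predates it), where HashtagCount is an assumed model with tag and count fields:

from celery import shared_task
from django.db.models import F
from myapp.models import HashtagCount  # hypothetical model

@shared_task
def record_tweet(tag, tweet_id, text):
    HashtagCount.objects.get_or_create(tag=tag)
    HashtagCount.objects.filter(tag=tag).update(count=F('count') + 1)
    # Persist the tweet itself here too if you need it later.

# Inside the listener in the streaming process (step 1):
#     record_tweet.delay('#django', status.id, status.text)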

async execution of tasks for a web application

A web application I am developing needs to perform tasks that are too long to be executed during the http request/response cycle. Typically, the user will perform the request, the server will take this request and, among other things, run some scripts to generate data (for example, render images with povray).
Of course, these tasks can take a long time, so the server should not hang waiting for the scripts to finish before sending the response to the client. I therefore need to execute the scripts asynchronously, give the client a "the resource is here, but not ready" response, and probably tell it an AJAX endpoint to poll, so it can retrieve and display the resource when ready.
Now, my question is not about the design (although I would very much enjoy any hints in this regard as well). My question is: does a system to solve this issue already exist, so I do not reinvent the square wheel? If I had to build it, I would use a process queue manager to submit the task and expose an HTTP endpoint that reports the status, something like "pending", "aborted", "completed", to the AJAX client, but if something similar already exists specifically for this task, I would very much appreciate it.
I am working in python+django.
Edit: Please note that the main issue here is not how the server and the client must negotiate and exchange information about the status of the task.
The issue is how the server handles the submission and enqueue of very long tasks. In other words, I need a better system than having my server submit scripts on LSF. Not that it would not work, but I think it's a bit too much...
Edit 2: I added a bounty to see if I can get some other answer. I checked pyprocessing, but I cannot perform submission of a job and reconnect to the queue at a later stage.
You should avoid re-inventing the wheel here.
Check out gearman. It has libraries in a lot of languages (including python) and is fairly popular. Not sure if anyone has any out-of-the-box ways to easily connect up django to gearman and ajax calls, but it shouldn't be too complicated to do that part yourself.
The basic idea is that you run the gearman job server (or multiple job servers), have your web request queue up a job (like 'resize_photo') with some arguments (like '{photo_id: 1234}'). You queue this as a background task. You get a handle back. Your ajax request is then going to poll on that handle value until it's marked as complete.
Then you have a worker (or probably many) that is a separate python process connect up to this job server and registers itself for 'resize_photo' jobs, does the work and then marks it as complete.
I also found this blog post that does a pretty good job of summarizing its usage.
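A rough sketch of that flow with the python-gearman 2.x package (worth double-checking the API against the version you install):

import json
import gearman

# Web side: queue a background job and keep the handle for the AJAX poll.
client = gearman.GearmanClient(['localhost:4730'])
request = client.submit_job('resize_photo',
                            json.dumps({'photo_id': 1234}),
                            background=True)
job_handle = request.job.handle  # store this and poll on it

# Worker side: a separate process that registers for the task.
def resize_photo(gearman_worker, gearman_job):
    params = json.loads(gearman_job.data)
    # ... do the actual image work for params['photo_id'] ...
    return 'done'

worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('resize_photo', resize_photo)
worker.work()  # blocks, processing jobs forever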
You can try two approaches:
Call the web server every n seconds, passing a job id; the server processes the request and returns some information about the current execution of that task.
Implement a long-running page that sends data every n seconds; for the client, that HTTP request will "always" be "loading", and it needs to collect new information every time a new piece of data is received.
About the second option, you can learn more by reading about Comet; using ASP.NET, you can do something similar by implementing the System.Web.IHttpAsyncHandler interface.
I don't know of a system that does it, but it would be fairly easy to implement one's own system:
create a database table with jobid, jobparameters, jobresult
jobresult is a string that will hold a pickle of the result
jobparameters is a pickled list of input arguments
when the server starts working on a job, it creates a new row in the table, and spawns a new process to handle it, passing that process the jobid
the task handler process updates the jobresult in the table when it has finished
a webpage (xmlrpc or whatever you are using) contains a method 'getResult(jobid)' that will check the table for a jobresult
if it finds a result, it returns the result, and deletes the row from the table
otherwise it returns an empty list, or None, or your preferred return value to signal that the job is not finished yet
There are a few edge cases to take care of, so an existing framework would clearly be better, as you say. A rough standard-library sketch of the idea follows.
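For what it's worth, such a system can be sketched with nothing but the standard library (sqlite3 for the table, multiprocessing for the worker process, pickle for arguments and results); all names here are made up:

import pickle
import sqlite3
import uuid
from multiprocessing import Process

DB = 'jobs.db'

def _init_db():
    with sqlite3.connect(DB) as conn:
        conn.execute('CREATE TABLE IF NOT EXISTS jobs '
                     '(jobid TEXT PRIMARY KEY, jobparameters BLOB, jobresult BLOB)')

def _run_job(jobid, func):
    with sqlite3.connect(DB) as conn:
        row = conn.execute('SELECT jobparameters FROM jobs WHERE jobid = ?',
                           (jobid,)).fetchone()
        result = func(*pickle.loads(row[0]))  # the long-running work happens here
        conn.execute('UPDATE jobs SET jobresult = ? WHERE jobid = ?',
                     (pickle.dumps(result), jobid))

def submit_job(func, *args):
    """Create the row, spawn a worker process, return the job id."""
    _init_db()
    jobid = str(uuid.uuid4())
    with sqlite3.connect(DB) as conn:
        conn.execute('INSERT INTO jobs (jobid, jobparameters) VALUES (?, ?)',
                     (jobid, pickle.dumps(args)))
    Process(target=_run_job, args=(jobid, func)).start()
    return jobid

def get_result(jobid):
    """Return the unpickled result and delete the row, or None if not done yet."""
    with sqlite3.connect(DB) as conn:
        row = conn.execute('SELECT jobresult FROM jobs WHERE jobid = ?',
                           (jobid,)).fetchone()
        if row is None or row[0] is None:
            return None
        conn.execute('DELETE FROM jobs WHERE jobid = ?', (jobid,))
        return pickle.loads(row[0])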
First you need a separate "worker" service, started separately at powerup, which communicates with the HTTP request handlers via some local IPC such as a UNIX socket (fast) or a database (simple).
While handling a request, the web process asks the worker for state or other data and replies to the client.
You can signal that a resource is being "worked on" by replying with a 202 HTTP code: the Client side will have to retry later to get the completed resource. Depending on the case, you might have to issue a "request id" in order to match a request with a response.
Alternatively, you could have a look at existing COMET libraries which might fill your needs more "out of the box". I am not sure if there are any that match your current Django design though.
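A tiny Django-flavoured sketch of the 202-plus-request-id idea (start_task and get_status are placeholders for however the work is actually launched and tracked):

import json
import uuid
from django.http import HttpResponse

def submit(request):
    request_id = str(uuid.uuid4())
    start_task(request_id, request.POST)  # placeholder: enqueue the real work
    return HttpResponse(json.dumps({'request_id': request_id}),
                        status=202, content_type='application/json')

def poll(request, request_id):
    status = get_status(request_id)  # placeholder: 'pending', 'completed', ...
    code = 200 if status == 'completed' else 202
    return HttpResponse(json.dumps({'status': status}),
                        status=code, content_type='application/json')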
Probably not a great answer for the python/django solution you are working with, but we use Microsoft Message Queue for things just like this. It basically runs like this
Website updates a database row somewhere with a "Processing" status
Website sends a message to the MSMQ (this is a non blocking call so it returns control back to the website right away)
Windows service (could be any program really) is "watching" the MSMQ and gets the message
Windows service updates the database row with a "Finished" status.
That's the gist of it anyways. It's been quite reliable for us and really straight forward to scale and manage.
-al
Another good option for python and django is Celery.
And if you think that Celery is too heavy for your needs, then you might want to look at a simpler distributed task queue.
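A minimal Celery example (modern API; very old releases spell some of this differently), where render_scene stands in for the long-running povray work:

from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/1')

@app.task
def render_scene(scene_id):
    # ... run povray / the long script here ...
    return {'scene_id': scene_id, 'status': 'completed'}

# In the web view:
#     result = render_scene.delay(42)             # returns immediately
#     result.id                                   # hand this id to the client
#     render_scene.AsyncResult(result.id).state   # 'PENDING', 'SUCCESS', ...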
