tweepy Streaming API integration with Django - ajax

I am trying to create a Django webapp that utilizes the Twitter Streaming API via the tweepy.Stream() function. I am having a difficult time conceptualizing the proper implementation.
The simplest functionality I would like to have is to count the number of tweets containing a hashtag in real time. So I would open a stream, filtering by keywords, every time a new tweet comes over the connection i increment a counter. That counter is then displayed on a webpage and updated with AJAX or otherwise.
The problem is that the tweepy.Stream() function must be continuously running and connected to twitter (thats the point). How can I have this stream running in the background of a Django app while incrementing counters that can be displayed in (near) real time?
Thanks in advance!

There are various ways to do this, but using a messaging lib (celery) will probably be the easiest.
1) Keep a python process running tweepy. Once an interesting message is found, create a new celery task
2) Inside this carrot task persist the data to the database (the counter, the tweets, whatever). This task can well run django code (e.g the ORM).
3) Have a regular django app displaying the results your task has persisted.
As a precaution, it's probably a good ideal to run the tweepy process under supervision (supervisord might suit your needs). If anything goes wrong with it, it can be restarted automatically.

Related

Streamlink with Youtube plugin: How to get scheduledStartTime value?

I am using Streamlink to help some of my technically challenged older friends watch streams from selected sites that webcast LIVE two or thee times a week and are ingested into Youtube. In between webcasts, it would nice to show the User when the next one will begin via the apps Status page.
The platform is Raspberry Pi 3 B+. I have modified the Youtube plugin to allow/prohibit non-live streams. If '--youtube-live-required' is in the command line, then only LIVE streams will play. This prevents the LIVE webcast from re-starting after it has ended, and also prevents videos that Youtube randomly selects, from playing. I have also applied a 'soon to be released' patch that fixes a breaking-change that Youtube made recently. I mention these so you know that I have at least a minimal understanding of the Streamlink code, and am not looking for a totally free ride. But for some reason, I cannot get my head around how to add a feature to get the 'scheduledStartTime' value from the Youtube.py plugin. I am hoping someone with a deep understanding of the Streamlink code can toss me a clue or two.
Once the 'scheduledStartTime' value is obtained (it is in epoch notation), a custom module will send that value to the onboard Python server, via socketio, which can then massage the data and push it to the Status page of connected clients.
Within an infinite loop, Popen starts Streamlink. The output of Popen is PIPEd and observed in order to learn what is happening, and then sends that info to the Server, again using socketio. It is within this loop that the 'scheduledStartTime' data would be gleaned (I think).
How do I solve the problem?
I have a solution to this problem that I am not very proud of, but it solves the problem and I can close this project, finally. Also, it turns out that this solution did not have to utilize the streamlink youtube.py plugin, but since it fetches the contents of the URL of interest anyways, I decided to hack the plugin and keep all of this business in one place.
In a nutshell, a simple regex gets the value of scheduledStartTime IF it is present in the fetched URL contents. The Hack: That value is printed out as a string 'SCHEDULE START TIME:epoch time value', which surfaces through streamlink via Popen PIPE which is polled for such information, in a custom module. Socket.io then sends the info to the on-board server, that sends a massaged version of the info to the app's Status Page (Ionic framework, typescript, etc). Works. Simple. Ugly. Done.

How to allow sinatra poll for data smartly

I am wanting to design an application where the back end is constantly polling different sensors while the front end (sinatra) allows for this data to be viewed either via json api, or by simply displaying the results in html.
What considerations should I take to develop such an application and how should I structure the application for best scaling and ease of maintenance.
My first thought is to simply let sinatra poll the sensors every time it receives a request to the proper end points, but this seems like it could bog down quiet fast especially seeing how that some sensors only update themselves every couple seconds.
My second thought is to have a background process (or thread) poll the sensors and store the values for sinatra. When a request is received sinatra can then simply poll the background process for a cached value (or pull it from the threaded code) and present it to the client.
I like the second thought more, but I am not sure how I would develop the "background application" so that sinatra could poll it for data to present to the client. The other option would be for sinatra to thread the sensor polling code so that it can simply grab values from it inside the same process rather than requesting it from another process.
Due note that this application will also be responsible for automation of different relays and such based off the sensors and sinatra is only responsible for relaying the status of the sensors to the user. I think separating the backend (automation + sensor information) in a background process/daemon from the frontend (sinatra) would be ideal, but I am not sure how I would fetch the data for sinatra.
Anyone have any input on how I could structure this? If possible I would also appreciate a sample application that simply displays the idea that I could adopt and modify.
Thanks
Edit::
After a bit more research I have discovered drb (distributed ruby http://ruby-doc.org/stdlib-1.9.3/libdoc/drb/rdoc/DRb.html) which allows you to make remote calls on objects over the network. This may be a suitable solution to this problem as the daemon can automate the relays, read the sensors and store the values in class objects, and then present the class objects over drb so that sinatra can call the getters on the remote object to obtain up to date data from the daemon. This is what I initially wanted to attempt to do.
What do you guys think? Is this advisable for such an application?
I have decided to go with Sinatra, DRB, and Daemons to meet the requirements I have stated above.
The web front end will run in its own process and only serve up statistical information via DRB interactions with the backend. This will allow quick response times for the clients and allow me to separate front end code from backend code.
The backend will run in its own process and constantly poll the sensors for updates and store them as class objects with getters so that Sinatra can fetch the information over DRB when required. It will also use the gathered information for automation that is project specific.
Finally the backend and frontend will be wrapped with a Daemons wrapper so that the project will have the capabilities of starting, restarting, stopping, run status, and automatic restarting of the Daemons if it crashes or quits for what ever reason.
Source information:
http://phrogz.net/drb-server-for-long-running-web-processes
http://ruby-doc.org/stdlib-1.9.3/libdoc/drb/rdoc/DRb.html
http://www.sinatrarb.com/
https://github.com/thuehlinger/daemons

Sending Email from Django at Heroku and not having idle workers

I have a django application in heroku and one thing I need to do sometimes that take a little bit of time is sending emails.
This is a typical use case of using workers. Heroku offers support for workers, but I have to leave them running all the time (or start and stop them manually), which is annoying.
I would like to use a one-off process to send every email. One possibility I first thought of was using IronWorker, since I thought that I could simply add the job to ironworker's queue and it would be exectuted with a mex of 15 min delay, which is ok for me.
The problem is that with ironworker, I need to put in a zip file all the modules and their dependencies in order to run the job, so in my email use case, as I use "EmailMultiAlternatives" from "django.core.mail.message", I would need to include all the django framework in my zip file in order to be able to use it.
According to this link, it's possible to add/remove workers from the app. Is it possible to start one-off processes from the app?
Does anyone has a better solution?
Thanks in advance

Real-time webbrowser game

I'm trying to create a real-time webbrowser game in ASP.Net MVC3, but I'm unsure about what's the best approach to processing 'real-time' events on the server side.
Imagine that the client wants to upgrade a building, upgrading a building takes time. A record gets inserted in the database that holds the end-time and on the client a ajax timer start's running. I was thinking about having a windows service running all the time. Every second the service checks the table and does the real processing when the time passed the end-time. However I could imagine that when you have a huge amount of data to process this can get problematic.
What would be the best way of doing this?
Cheers.
I've seen games like that before. You do not need to count down yourself, you just need to store the datetime for whenever the building will be ready.
When a page is requested you send a timestamp of when the building will be completed, and use javascript to count down. For example, like this.

async execution of tasks for a web application

A web application I am developing needs to perform tasks that are too long to be executed during the http request/response cycle. Typically, the user will perform the request, the server will take this request and, among other things, run some scripts to generate data (for example, render images with povray).
Of course, these tasks can take a long time, so the server should not hang for the scripts to complete execution before sending the response to the client. I therefore need to perform the execution of the scripts async, and give the client a "the resource is here, but not ready" and probably tell it a ajax endpoint to poll, so it can retrieve and display the resource when ready.
Now, my question is not relative to the design (although I would very much enjoy any hints on this regard as well). My question is: does a system to solve this issue already exists, so I do not reinvent the square wheel ? If I had to, I would use a process queue manager to submit the task and put a HTTP endpoint to shoot out the status, something like "pending", "aborted", "completed" to the ajax client, but if something similar already exists specifically for this task, I would mostly enjoy it.
I am working in python+django.
Edit: Please note that the main issue here is not how the server and the client must negotiate and exchange information about the status of the task.
The issue is how the server handles the submission and enqueue of very long tasks. In other words, I need a better system than having my server submit scripts on LSF. Not that it would not work, but I think it's a bit too much...
Edit 2: I added a bounty to see if I can get some other answer. I checked pyprocessing, but I cannot perform submission of a job and reconnect to the queue at a later stage.
You should avoid re-inventing the wheel here.
Check out gearman. It has libraries in a lot of languages (including python) and is fairly popular. Not sure if anyone has any out of the box ways to easily connect up django to gearman and ajax calls, but it shouldn't be do complicated to do that part yourself.
The basic idea is that you run the gearman job server (or multiple job servers), have your web request queue up a job (like 'resize_photo') with some arguments (like '{photo_id: 1234}'). You queue this as a background task. You get a handle back. Your ajax request is then going to poll on that handle value until it's marked as complete.
Then you have a worker (or probably many) that is a separate python process connect up to this job server and registers itself for 'resize_photo' jobs, does the work and then marks it as complete.
I also found this blog post that does a pretty good job summarizing it's usage.
You can try two approachs:
To call webserver every n interval and inform a job id; server processes and return some information about current execution of that task
To implement a long running page, sending data every n interval; for client, that HTTP request will "always" be "loading" and it needs to collect new information every time a new data piece is received.
About second option, you can to learn more by reading about Comet; Using ASP.NET, you can do something similiar by implementing System.Web.IHttpAsyncHandler interface.
I don't know of a system that does it, but it would be fairly easy to implement one's own system:
create a database table with jobid, jobparameters, jobresult
jobresult is a string that will hold a pickle of the result
jobparameters is a pickled list of input arguments
when the server starts working on a job, it creates a new row in the table, and spwans a new process to handle that, passing that process the jobid
the task handler process updates the jobresult in the table when it has finished
a webpage (xmlrpc or whatever you are using) contains a method 'getResult(jobid)' that will check the table for a jobresult
if it finds a result, it returns the result, and deletes the row from the table
otherwise it returns an empty list, or None, or your preferred return value to signal that the job is not finished yet
There are a few edge-cases to take care of so an existing framework would clearly be better as you say.
At first You need some separate "worker" service, which will be started separately at powerup and communicated with http-request handlers via some local IPC like UNIX-socket(fast) or database(simple).
During handling request cgi ask from worker state or other data and replay to client.
You can signal that a resource is being "worked on" by replying with a 202 HTTP code: the Client side will have to retry later to get the completed resource. Depending on the case, you might have to issue a "request id" in order to match a request with a response.
Alternatively, you could have a look at existing COMET libraries which might fill your needs more "out of the box". I am not sure if there are any that match your current Django design though.
Probably not a great answer for the python/django solution you are working with, but we use Microsoft Message Queue for things just like this. It basically runs like this
Website updates a database row somewhere with a "Processing" status
Website sends a message to the MSMQ (this is a non blocking call so it returns control back to the website right away)
Windows service (could be any program really) is "watching" the MSMQ and gets the message
Windows service updates the database row with a "Finished" status.
That's the gist of it anyways. It's been quite reliable for us and really straight forward to scale and manage.
-al
Another good option for python and django is Celery.
And if you think that Celery is too heavy for your needs then you might want to look at simple distributed taskqueue.

Resources