Processing a video stream using Streaming Engines - spark-streaming

I need to process two video streams, correlate them, and respond with actions in real time. Can someone help me with the following?
How can I use open streaming engines like Flink or Spark for this?
How do I achieve this without using something like the OpenCV library? Bringing in a library in between affects overall performance.
Any other suggestions?
Thanks a lot for the help.

Related

Structured Streaming Python API

The docs say that stateful operations like mapGroupsWithState in Structured Streaming are supported only in Scala and Java, but I do need stateful capabilities in Python. What should I do?
If you insist on using PySpark:
Perform the preprocessing action in one Spark job, then store the necessary "state" stream to a file sink. In another job, read this stream and perform the output action. This adds memory, disk, and latency overhead.
Use the updateStateByKey API instead. This requires the DStreams approach instead of Structured Streaming.
Neither approach is great. If you need the latest and the greatest API features, I'd recommend transitioning to Scala now. As your project progresses, you will run into this problem repeatedly. Since Spark is written in Scala, the Python API always lags behind.
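The updateStateByKey option can be sketched roughly as follows. This is a minimal illustration, not a tested production job: it assumes an existing StreamingContext is passed in as ssc, and the host, port, checkpoint directory, and the word-count use case are all placeholders chosen for the example. The update function receives the new values for a key in the current batch plus the previous state (None the first time the key is seen).

```python
def update_count(new_values, running_count):
    """State update function in the shape updateStateByKey expects.

    new_values: list of values seen for a key in the current batch.
    running_count: previous state for that key, or None the first time.
    """
    return sum(new_values) + (running_count or 0)


def build_running_counts(ssc, host, port, checkpoint_dir):
    """Wire the update function into a DStream word count.

    ssc is an existing pyspark.streaming.StreamingContext; host, port and
    checkpoint_dir are hypothetical values supplied by the caller.
    """
    # Stateful DStream operations require a checkpoint directory.
    ssc.checkpoint(checkpoint_dir)
    lines = ssc.socketTextStream(host, port)
    pairs = lines.flatMap(lambda line: line.split()).map(lambda word: (word, 1))
    # Keeps one running total per word across batches.
    return pairs.updateStateByKey(update_count)
```

Because the update function is plain Python, it can be checked on its own before it is attached to a stream.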

Laravel project + media server both for live and vod streaming deployed on Docker

After many hours of research and nothing relevant coming up I decided to ask.
I am pretty new to the concept of video streaming, so please forgive me if my questions may seem elementary.
I am building a project that needs to include media streaming functionality. It should have the following options:
VOD - user uploads a file to the server, which needs to be transcoded to a few MP4 files of different resolutions. For transcoding I am trying CloudTranscode (https://github.com/bfansports/CloudTranscode) deployed as a Docker image. The server should supply the stream to the player with a certain buffer size, so when playback is paused we buffer, for instance, 5 more seconds and that's it. Adaptive bitrate would be nice, however I'm not sure how this works with different players (I was thinking about using Video.js because it is highly customizable, plus it's free).
Live video capturing - user visits a certain page that captures video from the webcam and sends the stream to the server for further stream distribution to clients. For most browsers WebRTC could be a good option, but iOS devices probably won't work with it, so any suggestions here would be much appreciated
Live video streaming - users visit a certain page where they can watch the stream captured from the user mentioned in point 2. Here the stream may be watched by one or many users (may be as well 1 or 10,000 users)
Cutting to the chase my questions follow:
What would be the best media server software for that purpose, keeping in mind high scalability (deployed as a Docker container on AWS EC2), a possibly huge load of both streaming and watching users, and multi-device/platform/browser support?
What would be the best media player for a web page that (again) would be cross-browser/platform/device, keeping in mind good integration with the media server itself for adaptive resolution streaming? It would also be nice if the player had broad customization options in terms of appearance (for instance, thumbnail display when hovering over the timeline).
Do you know any better solution for video transcoding than the mentioned CloudTranscode, keeping in mind the Docker setup and an easy-to-use API (some on-the-fly transcoding would be nice here, so the worker wouldn't need to wait for the whole file to be uploaded)?
What happens if I use autoscaling on the EC2 instance, and more instances of the media server are started automatically? Let's say we have instance 1 (I1) and instance 2 (I2). Some user started broadcasting on I1, and 1000 users are watching the stream, which is the server instance's limit because it's running out of resources. Next, another couple of users try to view the stream, so they are connected to I2 by the AWS load balancer - how does that work with a live stream? Sorry, but I am a total newbie to the concept, so again - forgive me for elementary questions.
So far I was able to find a few media servers that may be relevant to my needs, including:
Wowza Media Server (paid)
Red5 media server (free)
Kurento Media Server (free)
My application is written in Laravel, ergo I need some PHP integration with the media server.
Obviously free solutions are the most welcome, however I do not mind paying as long as the paid solution covers my needs.
Any input here will be much appreciated - even partial solutions / suggestions. I'm kinda stuck here, so any suggestions that can bring me closer to the solution are very welcome!
Best regards
If anyone needs such information, I ended up using the Nginx Plus media server functionality. It's capable of serving both live and VOD streams, it has an out-of-the-box load balancer to switch traffic over multiple container instances, and many more great features. Plus they have images to deploy directly from AWS Marketplace, and the license is paid hourly while the EC2 instance is running. Of course there is a free version as well, but I am really satisfied with Nginx Plus support.
I capture the live stream from the user with getUserMedia() in JS. I still have minor glitches, but I will get it to work (the problems are related to the WebM chunks that the MediaRecorder API spits out, but I'm almost done here using a piece of Python code that modifies each chunk on the server side).
If anyone needs help I will be happy to help.

Strategy to do performance testing of lightstreamer

I have a high-transaction application which uses Lightstreamer to stream data. It does this over HTTP.
I am not sure how to approach performance testing of this (strategy). Can someone please help me with this?
Googling yields some results, but they do not describe an approach in detail and mainly give information about one tool.
Looking at the Lightstreamer website this seems more of a data push technology, using websockets and other stack components, rather than a pure "streaming" technology for live video or audio. Am I missing something?
http://en.wikipedia.org/wiki/Lightstreamer

create video from images and then stream to users

The idea is to create a video from images provided by a user and at the same time stream the generated video to other users requesting it.
Kindly suggest an efficient way to do this, and which language out of PHP and C# .NET would be suitable.
I have looked into ffmpeg to take images, convert them to video, save it to the server, and then stream it. Kindly tell me if this is a possibility, or suggest any other method for live streaming.
regards
UPDATE
Consider the following scenario as I understand it:
Get images from the server and start combining them to form a video. At the same time, stream the video to the users requesting it. For newly arriving clients, stream the previously generated video from the beginning, and keep sending the new video being generated from images to the earlier clients.
Kindly tell me if this is possible, and if so, what the approach could be. I have read something about pipes but am completely new to ffmpeg and streaming in general.
Yes, this is possible with ffmpeg. Any language that is Turing-complete is suitable. There are many methods of live streaming, including HLS, RTP, RTMP, etc.
If you need more detailed answers, please ask more detailed questions.
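As one possible starting point for the HLS route, the sketch below builds an ffmpeg command that turns a numbered image sequence into an HLS playlist. The paths, frame rate, and segment length are hypothetical values picked for illustration. Setting -hls_list_size to 0 keeps every segment in the playlist, so a client that joins late can still start from the beginning. Note that this only encodes frames matching the pattern when ffmpeg runs; feeding frames as they arrive would need a pipe input, which is not shown here.

```python
import shlex


def hls_command(image_pattern, playlist_path, fps=1, segment_seconds=4):
    """Return an ffmpeg argument list: image sequence in, HLS playlist out."""
    return [
        "ffmpeg",
        "-framerate", str(fps),      # input rate: images shown per second
        "-i", image_pattern,         # e.g. frames/img0001.jpg, img0002.jpg, ...
        "-c:v", "libx264",           # encode to H.264
        "-pix_fmt", "yuv420p",       # pixel format most players accept
        "-f", "hls",                 # HLS muxer
        "-hls_time", str(segment_seconds),
        "-hls_list_size", "0",       # keep all segments listed in the playlist
        playlist_path,
    ]


# Hypothetical paths; pass cmd to subprocess.run(cmd, check=True) to execute it.
cmd = hls_command("frames/img%04d.jpg", "stream/out.m3u8")
print(shlex.join(cmd))
```

The resulting playlist and segment files can then be served by any web server, from either PHP or C#.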

Which technologies are available for streaming data from social media to Hadoop?

I am searching for technologies that I can use to stream data from social media to Hadoop.
I searched and found these:
Flume.
Storm.
Kafka.
Which tool is the best, and why? Is anyone familiar with other tools?
Most likely, you will want to use Flume, as it is built to work with HDFS. However, as with all things, it depends.
Kafka is basically a queuing system that is usually used to persist data in the event of a failure in your analytics architecture. If this sounds like what you need, it might be worth looking into RabbitMQ, ZeroMQ, or maybe Kestrel.
Storm is used for complex event processing. If you use Storm, you will be using ZeroMQ under the hood, and will likely have to set up a spout that is hooked up to Kafka or RabbitMQ. If you need to do complicated munging of the data before storage, this might be the right option. There are other options that you can use too, like Spark. I'm inclined to suggest Storm purely out of personal preference. I heard that LinkedIn was releasing a real-time complex event processing framework as well, but I can't remember the name of it. I'll update the post when I can find it.
On a different note, if you're asking this question, it might be because you haven't built this thing yet. If that is the case, you might want to look into something other than hadoop if you need streaming. The ecosystem is rapidly expanding, and there are probably many ways to do what you want to do.
Apache Kafka is a distributed messaging system. In brief, you push (publish) messages into a Kafka queue using a Kafka producer, and on the other end you consume them using a Kafka consumer (subscriber). The messages/feeds can be divided into categories called topics. You can run Kafka as a cluster, which makes it very scalable, and it can be expanded without any downtime.
It could be a nice choice for holding your social media streams. Kafka retains the message pushed to it for a configurable time and the best part is from their documentation they say
Kafka's performance is effectively constant with respect to data size so retaining lots of data is not a problem.
Check out the docs for more details.
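To make the publish/subscribe idea concrete, here is a toy in-memory model written for illustration only. It is not the Kafka API, and the class and topic names are invented. It shows the two properties described above: messages are grouped by topic, and they stay in the log after being read, so each consumer tracks its own position and a late consumer can still read everything.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)

    def publish(self, topic, message):
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1  # offset of the new message

    def read(self, topic, offset):
        # Reading does not remove anything from the log.
        return self.logs[topic][offset:]


class ToyConsumer:
    """Remembers how far it has read in a single topic."""

    def __init__(self, broker, topic):
        self.broker, self.topic, self.offset = broker, topic, 0

    def poll(self):
        messages = self.broker.read(self.topic, self.offset)
        self.offset += len(messages)
        return messages


broker = ToyBroker()
broker.publish("tweets", "first tweet")
broker.publish("tweets", "second tweet")
broker.publish("posts", "a post")

early = ToyConsumer(broker, "tweets")
early_batch = early.poll()   # both tweets
late = ToyConsumer(broker, "tweets")
late_batch = late.poll()     # the same two tweets, read independently
empty_batch = early.poll()   # nothing new since the last poll
```

A real deployment would use a Kafka client library and a running cluster; the point here is only the log-plus-offset model.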
Now Storm is a very scalable, fault-tolerant distributed computation system which can easily be integrated with any queueing (like Kafka) or databases (HDFS/Cassandra etc). So you can feed your messages to a storm cluster for further processing based on your requirement. There is something called KafkaSpout which does a seamless integration between storm and kafka.
You should also look at the Kafka-hadoop loader on GitHub, which creates a Hadoop job for incrementally loading messages from Kafka topics onto HDFS with multiple file output semantics.
Also, as @Peter Klipfel said:
you might want to look into something other than hadoop if you need streaming
You can also check other available alternatives like Apache Cassandra, which works great with streaming data at very low latency.
I think it depends on where you are pulling the data and what you are trying to do with the data.
An alternative is to use IBM Streams, where you can pull directly from social media streams and store to many different data stores of your choice.
For example, you can use the streamsx.social toolkit from here: https://github.com/IBMStreams/streamsx.social which allows you to pull tweets directly from an HTTP stream.
Once you get data into Streams, the product also provides many adapters that allow you to store the streaming data into a data store (e.g. HDFS using streamsx.hdfs, HBase using streamsx.hbase).
Another consideration is what kind of analytics you are doing with the social media data. If you would like to analyze the social data in-stream before it is stored, IBM Streams also provides a text toolkit that lets you extract insight from the unstructured text of the social data. You can analyze the data without really having to store it anywhere.
Hope it helps!

Resources