I have previously been using the IBM Watson Speech to Text service to transcribe full, pre-recorded audio files. However, I am now trying to do live transcription while using the speaker identification feature. This means that I cannot send each short file (I record audio in roughly 30-second chunks) individually, since the context of the speakers has to be maintained. How can I do this while still using Python?
You need to use the WebSocket interface for real-time transcription. You send a chunk of audio and Watson responds with a transcription. You just need to detect silence to break the stream up into chunks.
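The silence detection mentioned above can be sketched as follows. This is a minimal sketch, not IBM's method: it assumes mono 16-bit little-endian PCM and a simple energy (RMS) threshold, and the threshold, frame length and minimum-silence values are illustrative guesses you would tune for your microphone.

```python
import array
import math
import struct
import sys


def pcm16le(samples):
    """Pack integer samples as 16-bit signed little-endian PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of 16-bit little-endian PCM."""
    samples = array.array("h")
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; the audio is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def split_on_silence(pcm, sample_rate=16000, frame_ms=20, threshold=500.0, min_silence_ms=300):
    """Split mono 16-bit PCM into chunks that end after a run of quiet frames.

    Leading silence is kept at the start of the next chunk; trailing audio
    that contains no speech at all is dropped.
    """
    pcm = pcm[: len(pcm) - len(pcm) % 2]  # drop a dangling half-sample
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    needed_silent = max(1, min_silence_ms // frame_ms)
    chunks, current = [], bytearray()
    silent_run, voiced = 0, False
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        current += frame
        if frame_rms(frame) < threshold:
            silent_run += 1
        else:
            silent_run, voiced = 0, True
        if voiced and silent_run >= needed_silent:
            chunks.append(bytes(current))
            current, silent_run, voiced = bytearray(), 0, False
    if voiced:
        chunks.append(bytes(current))
    return chunks
```

Each chunk is still sent over the same WebSocket session, so the chunking only decides when to flush audio; it does not reset the speaker context.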
You also need to specify the language model to use for the transcription. If the source audio comes from a phone call, use the narrowband models for the best results.
IBM® recommends that you use the broadband model for responsive, real-time applications (for example, for live-speech applications).
Reference.
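To keep the speaker context across chunks, everything should go through one recognition session: one start message, all the audio, one stop message. A minimal sketch of that message flow, assuming the WebSocket `/v1/recognize` interface with `speaker_labels` and `interim_results` in the start message; the region, instance ID, access token and model name are placeholders, and the actual socket is abstracted as a hypothetical `send` callable. Verify the parameter names against IBM's current documentation.

```python
import json
from urllib.parse import urlencode


def recognize_url(region, instance_id, access_token, model):
    """WebSocket URL for /v1/recognize (all arguments are placeholders)."""
    host = f"api.{region}.speech-to-text.watson.cloud.ibm.com"
    query = urlencode({"access_token": access_token, "model": model})
    return f"wss://{host}/instances/{instance_id}/v1/recognize?{query}"


def start_message(sample_rate=16000):
    """Text frame that opens a recognition request with speaker labels enabled."""
    return json.dumps({
        "action": "start",
        "content-type": f"audio/l16;rate={sample_rate};endianness=little-endian",
        "interim_results": True,
        "speaker_labels": True,
    })


STOP_MESSAGE = json.dumps({"action": "stop"})


def stream_session(send, chunks, sample_rate=16000):
    """Send start, every audio chunk, then stop, all over ONE connection.

    `send` is a hypothetical stand-in for a real WebSocket client: any callable
    that transmits one message (str for text frames, bytes for binary frames).
    """
    send(start_message(sample_rate))
    for chunk in chunks:
        send(chunk)
    send(STOP_MESSAGE)
```

With a real client library you would open `recognize_url(...)`, pass that connection's send method as `send`, and read the JSON results on the same socket.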
You can find a full example of using Watson STT from Python at this link. The example uses Nexmo, but you can reuse the logic in any application that needs real-time transcripts.
Step by step with Watson Speech to Text - Real-time transcription (Python).
Official documentation of IBM Watson Speech to Text.
As a beginner with these kinds of real-time streaming services, I've spent hours trying to work out how this is possible, but can't seem to work out precisely how I'd go about it.
I'm prototyping a basic personal web app that does the following:
In a web browser, the web application has a button that says 'Stream Microphone'. When pressed, it streams the audio from the user's microphone (the user obviously has to consent to sending their microphone audio) through to the server, which I was presuming would be running Node.js (no specific reason at this point, I just thought this is how I'd go about doing it).
The server receives the audio close enough to real-time somehow (not sure how I'd do this).
I can then run ffmpeg on the command line, take the audio coming in in real time, and add it as the soundtrack to a video file that I want to play (let's just say I'm going to play testmovie.mp4).
I've looked at various solutions, such as WebRTC, RTP/RTSP, piping audio into ffmpeg, GStreamer, Kurento, Flashphoner and/or Wowza, but they look overly complicated and usually seem to focus on video along with audio. I just need to work with audio.
As you've found, there are numerous options for receiving the audio from a WebRTC-enabled browser. The options, from easiest to most difficult, are probably:
Use a WebRTC-enabled server such as Janus, Kurento, Jitsi (not sure about Wowza), etc. These servers tend to have plugin systems, and one of them may already have the audio mixing capability you need.
If you're comfortable with Node, you could use the werift library to receive the WebRTC audio stream and then forward it to FFmpeg.
If you want full control over the WebRTC pipeline, and potentially want to do the audio mixing as well, you could use GStreamer. From what you've described, it should be capable of doing the complete task without involving a separate FFmpeg process.
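Whichever option receives the audio, the FFmpeg step from the question could look like the sketch below. It assumes the receiver has already decoded the WebRTC audio (normally Opus) to raw signed 16-bit little-endian PCM at 48 kHz mono; the sample format, output file name and AAC audio codec are assumptions. It is written in Python for illustration, but the same argument list works with Node's `child_process.spawn`.

```python
import subprocess


def mux_command(video_path, output_path, sample_rate=48000, channels=1):
    """Build an ffmpeg argv that pairs a video file with raw PCM audio read from stdin."""
    return [
        "ffmpeg",
        "-re", "-i", video_path,        # input 0: the video, read at its native rate
        "-f", "s16le",                  # input 1 is raw signed 16-bit little-endian PCM...
        "-ar", str(sample_rate),        # ...at this sample rate...
        "-ac", str(channels),           # ...with this many channels...
        "-i", "pipe:0",                 # ...arriving on stdin
        "-map", "0:v", "-map", "1:a",   # video from input 0, audio from input 1
        "-c:v", "copy",                 # don't re-encode the video
        "-c:a", "aac",                  # encode the live audio as AAC
        "-shortest",                    # stop when either input ends
        output_path,
    ]


def start_muxer(video_path, output_path, **audio_format):
    """Launch ffmpeg; write received PCM bytes to proc.stdin as they arrive."""
    return subprocess.Popen(mux_command(video_path, output_path, **audio_format),
                            stdin=subprocess.PIPE)
```

Usage: `proc = start_muxer("testmovie.mp4", "out.mp4")`, then `proc.stdin.write(pcm_bytes)` for each received audio buffer, and finally `proc.stdin.close()` and `proc.wait()`.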
The way we did this is by creating a Wowza module in Java that would take the audio from the incoming stream, take the video from wherever you want it, and mix them together.
There's no reason to introduce a third-party tool like ffmpeg into the mix.
There's even a sample from Wowza for this: https://github.com/WowzaMediaSystems/wse-plugin-avmix
I have a camera that is ONVIF compatible.
If I want to zoom in/out, I presently have to send this URL to the camera:
http://192.168.2.88/cgi-bin/ptz_cgi?action=FocusAdd&steps=50&user=admin&pwd=admin
This is proprietary to my camera so I would like to do the same with ONVIF.
My question:
Is using onvif as simple as sending:
ONVIF://192.168.2.88:2010/some command ?
If so, what is the command :)
I am using Delphi XE2
Thank you.
No, it is not as easy as a CGI protocol. The main differences are:
ONVIF is based on SOAP, while many proprietary protocols are based on REST or just parameters encoded in the URL
the ONVIF device model is more complicated, because it supports a wider set of use cases.
Thus, after you either generate the code from the WSDL files or get a library that implements the necessary functions, you have to do:
get the device services
verify that it has a PTZ service
verify that it has a Media service, either 1 or 2 (the latter is for profile T devices)
get the list of media profiles
select the media profile that has a PTZNode and that is actually the one you are looking for
select an adequate coordinate space from the PTZ service capabilities
send the Move command with the correct parameters
This could seem overly complex, but you need to remember that the ONVIF protocol has to support devices with more than one input, such as multichannel encoders. These encoders may have a few fixed cameras, while other connected cameras may have a PTZ controlled by the encoder. In practice, the list I just gave you lets you discover what the device you are controlling looks like.
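To show how different this is from a CGI URL, here is a minimal sketch of only the last step: a SOAP 1.2 `ContinuousMove` request that zooms, with WS-Security UsernameToken digest authentication. It is in Python for illustration (the XML is the same from Delphi). The PTZ service URL and profile token (for example "Profile_1") are hypothetical and must come from the earlier steps (GetServices, GetProfiles); the zoom speed assumes the generic velocity space, roughly -1 to 1. ContinuousMove keeps moving until you send a Stop request.

```python
import base64
import hashlib
import os
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {
    "s": "http://www.w3.org/2003/05/soap-envelope",
    "wsse": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd",
    "wsu": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd",
    "tptz": "http://www.onvif.org/ver20/ptz/wsdl",
    "tt": "http://www.onvif.org/ver10/schema",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

DIGEST_TYPE = ("http://docs.oasis-open.org/wss/2004/01/"
               "oasis-200401-wss-username-token-profile-1.0#PasswordDigest")
BASE64_TYPE = ("http://docs.oasis-open.org/wss/2004/01/"
               "oasis-200401-wss-soap-message-security-1.0#Base64Binary")


def q(prefix, tag):
    """Namespace-qualified ElementTree tag, e.g. q('tt', 'Zoom')."""
    return f"{{{NS[prefix]}}}{tag}"


def password_digest(nonce, created, password):
    """WS-Security digest: Base64(SHA-1(nonce + created + password))."""
    raw = nonce + created.encode("utf-8") + password.encode("utf-8")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


def zoom_envelope(profile_token, zoom_speed, user, password, nonce=None, created=None):
    """Build a ContinuousMove request that only changes zoom."""
    nonce = os.urandom(16) if nonce is None else nonce
    created = created or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    env = ET.Element(q("s", "Envelope"))
    security = ET.SubElement(ET.SubElement(env, q("s", "Header")), q("wsse", "Security"))
    token = ET.SubElement(security, q("wsse", "UsernameToken"))
    ET.SubElement(token, q("wsse", "Username")).text = user
    ET.SubElement(token, q("wsse", "Password"), Type=DIGEST_TYPE).text = \
        password_digest(nonce, created, password)
    ET.SubElement(token, q("wsse", "Nonce"), EncodingType=BASE64_TYPE).text = \
        base64.b64encode(nonce).decode("ascii")
    ET.SubElement(token, q("wsu", "Created")).text = created
    move = ET.SubElement(ET.SubElement(env, q("s", "Body")), q("tptz", "ContinuousMove"))
    ET.SubElement(move, q("tptz", "ProfileToken")).text = profile_token
    velocity = ET.SubElement(move, q("tptz", "Velocity"))
    ET.SubElement(velocity, q("tt", "Zoom"), x=str(zoom_speed))
    return ET.tostring(env, encoding="unicode")


def post_soap(url, envelope, timeout=5.0):
    """POST a SOAP 1.2 envelope to a (hypothetical) PTZ service URL."""
    req = urllib.request.Request(url, data=envelope.encode("utf-8"),
                                 headers={"Content-Type": "application/soap+xml; charset=utf-8"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```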
After many hours of research and nothing relevant coming up I decided to ask.
I am pretty new to the concept of video streaming, so please forgive me if my questions may seem elementary.
I am building a project that needs to include media streaming functionality. It should have the following options:
VOD - the user uploads a file to the server, and it needs to be transcoded into a few MP4 files of different resolutions. For transcoding I am trying CloudTranscode (https://github.com/bfansports/CloudTranscode) deployed as a Docker image. The server should supply the stream to the player with a certain buffer size, so that when playback is paused we buffer, for instance, +5 seconds and that's it. Adaptive bitrate would be nice, but I'm not sure how it works with different players (I was thinking about using Video.JS because of its high customizability, plus it's free).
Live video capturing - the user visits a certain page that captures video from the webcam and sends the stream to the server for further distribution to clients. For most browsers WebRTC could be a good option, but iOS devices probably won't work with it, so any suggestions here would be much appreciated.
Live video streaming - users visit a certain page where they can watch the stream captured by the user mentioned in point 2. The stream may be watched by one or many users (anywhere from 1 to 10,000).
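On the adaptive-bitrate point in option 1: with HLS, the player (Video.js can play HLS) reads a master playlist that lists several renditions with their bandwidth and resolution, and switches between them as network conditions change. A minimal sketch of generating such a playlist, assuming each transcoded resolution already has its own media playlist; the resolutions, bitrates and file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    width: int
    height: int
    bandwidth: int  # peak bits per second (BANDWIDTH is required in EXT-X-STREAM-INF)
    uri: str        # path to this rendition's media playlist (hypothetical)


def master_playlist(renditions):
    """Render an HLS master playlist listing renditions from lowest to highest bandwidth."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda x: x.bandwidth):
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth},RESOLUTION={r.width}x{r.height}")
        lines.append(r.uri)
    return "\n".join(lines) + "\n"
```

The transcoder's job is then to produce one media playlist plus segments per rendition; the master playlist only ties them together.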
Cutting to the chase my questions follow:
What would be the best media server software for that purpose, bearing in mind high scalability (deployed as a Docker container on AWS EC2), a possibly huge load of both streaming and watching users, and multi-device/platform/browser support?
What would be the best web media player that (again) is cross-browser/platform/device, bearing in mind good integration with the media server itself for adaptive resolution streaming? It would also be nice if the player had broad customization options in terms of appearance (for instance, thumbnail display when hovering over the timeline).
Do you know a better solution for video transcoding than CloudTranscode, bearing in mind the Docker setup and an easy-to-use API (some on-the-fly transcoding would be nice here, so the worker wouldn't need to wait for the whole file to be uploaded)?
What happens if I use the autoscaling functionality on EC2, and more instances of the media server are started automatically? Let's say we have instance 1 (I1) and instance 2 (I2). Some user starts broadcasting on I1, and 1000 users are watching the stream, which is the instance's limit because it's running out of resources. Next, another couple of users try to view the stream, so the AWS load balancer connects them to I2. How does that work with a live stream? Sorry, but I am a total newbie to this concept, so again, forgive me for elementary questions.
So far I was able to find a few media servers that may be relevant to my needs, including:
Wowza Media Server (paid)
Red5 media server (free)
Kurento Media Server (free)
My application is written in Laravel, ergo I need some PHP integration with the media server.
Obviously free solutions are the most welcome; however, I do not mind paying as long as the paid solution covers my needs.
Any input here will be much appreciated, even partial solutions or suggestions. I'm kind of stuck, so any suggestion that brings me closer to a solution is very welcome!
Best regards
In case anyone needs this information: I ended up using the Nginx Plus media server functionality. It can serve both live and VOD streams, it has an out-of-the-box load balancer to distribute traffic across multiple container instances, and it has many other great features. They also have images you can deploy directly from the AWS Marketplace, and the license is paid hourly while the EC2 instance is running. Of course there is a free version as well, but I am really satisfied with Nginx Plus support.
I implemented capturing the live stream from the user with getUserMedia() in JS. There are still minor glitches, but I will get it to work (the problems are related to the WebM chunks that the MediaRecorder API produces, but I'm almost done here, using a piece of Python code that modifies each chunk on the server side).
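The post does not show that Python fix. One commonly described approach, offered here only as an assumption-laden sketch and not as the author's code: only MediaRecorder's first chunk carries the WebM initialization data (EBML header, Segment info, Tracks), so the server keeps everything before the first Cluster and prepends it to later chunks. This only yields playable pieces if each later chunk begins at a Cluster boundary, and the plain byte search used here is naive.

```python
EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # EBML header element ID: first bytes of a WebM file
CLUSTER_ID = b"\x1f\x43\xb6\x75"  # Matroska/WebM Cluster element ID


def init_segment(first_chunk):
    """Return everything before the first Cluster in MediaRecorder's first chunk."""
    if not first_chunk.startswith(EBML_MAGIC):
        raise ValueError("first chunk does not start with an EBML header")
    idx = first_chunk.find(CLUSTER_ID)  # naive: assumes the ID doesn't occur earlier
    if idx == -1:
        raise ValueError("no Cluster found in first chunk")
    return first_chunk[:idx]


def make_standalone(header, chunk):
    """Prepend the init segment to a later chunk that starts at a Cluster boundary."""
    if not chunk.startswith(CLUSTER_ID):
        raise ValueError("chunk does not start at a Cluster boundary")
    return header + chunk


def standalone_chunks(chunks):
    """Yield the first chunk as-is and every later chunk with the header prepended."""
    header = None
    for i, chunk in enumerate(chunks):
        if i == 0:
            header = init_segment(chunk)
            yield chunk
        else:
            yield make_standalone(header, chunk)
```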
If anyone needs help I will be happy to help.
I have written code to play back WAV files using the waveOutXxx APIs. It is well documented that the waveOutXxx APIs open their streams in the default session. Now, for certain reasons, I'm trying to control the session opened by the waveOutXxx APIs using the IAudioSessionControl API (Windows Core Audio Interfaces).
Can you tell me if this is actually possible? My code does not involve inter-process communication for this, since everything is handled in the same code. MSDN says this is possible (http://msdn.microsoft.com/en-us/library/dd371428(v=vs.85).aspx), but I don't see how to do it and can't find any examples. It would be very kind if someone could point me to something relevant.
Thanks.
Yes, it is possible. Look at this example.
I want to do stress/performance testing for my content management site, especially the hosted streaming video part. I am using IIS to host the videos; more specifically, the new Windows Server 2008 x64 and IIS 7.0.
My confusion is:
I plan to write code that starts a lot of threads; in each thread I will send a web request to the video URL and read the response stream from the server. But I am not sure whether this behaves the same as a real user using a player to render the video (in my code, I just read the stream without actually playing it or writing it anywhere). I want the test to be as close to the real scenario as possible;
I also plan to use a real Media Player (or whatever media player) to render the video, but my concern is this: if I start multiple Media Players on my test machine, and Media Player uses some hardware or other resources (video card memory?) to decode/render the video (I'm not sure; I need a guru to check and confirm), is there any potential hardware or resource contention between the players? If there is contention, that is also not an actual end-user scenario, i.e. few users will start 100 players on their machine. :-)
Does anyone have any advice to me?
BTW: I prefer to use any .Net based solution, but not a must.
thanks in advance,
George
You should use mplayer. It has a lot of command line options. I don't know whether all these options are available under Windows, but under Linux something like this is possible:
mplayer some_url -dumpvideo -dumpfile some_file
It will behave the same as a "normal" player, I think, and your test machine won't need to handle hundreds of decompression threads, so it fits your needs 1 and 2.
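To run many such clients at once, you could start one MPlayer process per simulated viewer. A minimal sketch, assuming `mplayer` is on the PATH and that your build supports `-dumpvideo` and `-dumpfile` (check its man page); the stream URL and dump file names are hypothetical.

```python
import subprocess


def dump_command(url, dump_file):
    """MPlayer argv that pulls the stream and dumps it to a file instead of decoding it."""
    return ["mplayer", url, "-dumpvideo", "-dumpfile", dump_file]


def launch_clients(url, count):
    """Start `count` concurrent MPlayer processes, each dumping to its own file."""
    return [subprocess.Popen(dump_command(url, f"client_{i}.dump"),
                             stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            for i in range(count)]
```

Usage: `procs = launch_clients("http://yourserver/video.wmv", 50)`, then `for p in procs: p.wait()`. Giving each process its own dump file avoids clients overwriting each other's output.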
If you know the bit rate of your video stream, you can pace your download requests to simulate video player clients. The bit rate can be calculated from the information carried in the stream, but that's a little more complicated. There is also software for stress-testing video servers, such as this IP Video Monitor.
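The pacing idea can be sketched as below: each simulated client reads the HTTP response no faster than the stream's bit rate, sleeping whenever it gets ahead of where a player playing at that rate would be. This is a minimal sketch in Python (a .NET version would follow the same logic); the bit rate and chunk size are assumptions, and real players buffer ahead in bursts rather than reading perfectly evenly. The clock and sleep functions are injectable so the pacing can be checked without a network.

```python
import threading
import time
import urllib.request


def paced_read(stream, bitrate_bps, chunk_size=64 * 1024, clock=time.monotonic, sleep=time.sleep):
    """Read `stream` to the end no faster than `bitrate_bps`; return bytes read."""
    start = clock()
    total = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            return total
        total += len(data)
        # Moment at which a player consuming bitrate_bps would have used `total` bytes.
        due = start + total * 8 / bitrate_bps
        delay = due - clock()
        if delay > 0:
            sleep(delay)


def simulated_client(url, bitrate_bps, results, index):
    """One viewer: open the video URL and consume it at playback speed."""
    with urllib.request.urlopen(url) as resp:
        results[index] = paced_read(resp, bitrate_bps)


def run_clients(url, count, bitrate_bps):
    """Run `count` paced viewers in parallel threads; return bytes read by each."""
    results = [0] * count
    threads = [threading.Thread(target=simulated_client, args=(url, bitrate_bps, results, i))
               for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```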