OpenTok real-time audio transcription

I am trying to transcribe the audio in an OpenTok session in real time. The OpenTok API does not seem to have that feature. Is there any way I can capture the data in some form and push it to another script/tool that does the transcription?
The issue is not the transcription itself; the issue is accessing the live audio stream data and using it in real time.

You can get access to the video/audio stream (a MediaStream) with OT.getUserMedia() in the client SDK: https://tokbox.com/developer/sdks/js/reference/OT.html#getUserMedia
You can manipulate the audio using the APIs in the Web Audio spec.
Publish audio from an audio MediaStreamTrack object. For example, you can use the AudioContext object and the Web Audio API to dynamically generate audio. You can then call createMediaStreamDestination().stream.getAudioTracks()[0] on the AudioContext object to get the audio MediaStreamTrack object to use as the audioSource property of the options object you pass into the OT.initPublisher() method. For a basic example, see the Stereo-Audio sample in the OpenTok-web-samples repo on GitHub.
The GitHub example above is about injecting your own audio stream; however, you can also extract/capture the audio before injecting it. See the Web Audio API documentation for details:
https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
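
To illustrate that flow, here is a minimal sketch, assuming opentok.js and the Web Audio API, that taps the microphone samples for a transcription service while also publishing the same audio to the session. The WebSocket endpoint, the 'publisher' element ID, and the buffer size are hypothetical placeholders.

```js
// Minimal sketch: capture mic audio, tap it for transcription, publish it.
// 'wss://example.com/transcribe' and 'publisher' are placeholders.
OT.getUserMedia({ videoSource: null }).then((micStream) => {
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(micStream);

  // Tap the raw samples so they can be pushed to a transcription service.
  // (ScriptProcessorNode is deprecated; an AudioWorklet is the modern route.)
  const tap = audioCtx.createScriptProcessor(4096, 1, 1);
  const socket = new WebSocket('wss://example.com/transcribe'); // placeholder
  tap.onaudioprocess = (event) => {
    const samples = event.inputBuffer.getChannelData(0); // Float32Array
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(samples.buffer.slice(0)); // send a copy as binary data
    }
  };

  // Route the same audio on into the OpenTok publisher.
  const dest = audioCtx.createMediaStreamDestination();
  source.connect(tap);
  tap.connect(dest);
  OT.initPublisher('publisher', {
    audioSource: dest.stream.getAudioTracks()[0],
    videoSource: null,
  });
});
```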

Related

Record Agora Group video call in single screen in AWS S3 bucket

I have built a video calling application in Flutter with a backend API in Laravel.
On the Flutter side I'm using Agora_UIkit for the video call, and I'm recording the video with the help of the Agora REST API for cloud recording.
Currently I can record the video call session, but the problem I'm facing is that Agora stores the video call in 2 separate mp4 files, and I want a single video file where the user can view the video of both sides.
I was just passing the Agora ID received from the Agora REST API; instead, you have to pass unique keys for the recording session (if none are passed, Agora will pick a unique key itself).
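
For reference, a hedged sketch of what the cloud-recording REST calls can look like with fetch. The paths follow Agora's Cloud Recording API (acquire, then start); "mix" mode composites all participants into a single recording, unlike "individual" mode, which writes one file per user. APP_ID, AUTH_HEADER, CHANNEL, RECORDING_UID, and the storage values are placeholders; verify the exact fields (including the S3 vendor/region codes) against the Agora docs.

```js
// Hedged sketch of Agora Cloud Recording over REST, using fetch.
const APP_ID = '<your-app-id>';                       // placeholder
const AUTH_HEADER = 'Basic <base64 key:secret>';      // placeholder
const CHANNEL = 'demo-channel';                       // placeholder
const RECORDING_UID = '527841';                       // uid reserved for the recorder

async function startCompositeRecording() {
  const base = `https://api.agora.io/v1/apps/${APP_ID}/cloud_recording`;
  const headers = { Authorization: AUTH_HEADER, 'Content-Type': 'application/json' };

  // 1. Acquire a resource ID for this recording session.
  const { resourceId } = await fetch(`${base}/acquire`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ cname: CHANNEL, uid: RECORDING_UID, clientRequest: {} }),
  }).then((res) => res.json());

  // 2. Start in "mix" (composite) mode so all participants end up in one
  //    file, rather than "individual" mode, which writes one file per user.
  return fetch(`${base}/resourceid/${resourceId}/mode/mix/start`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      cname: CHANNEL,
      uid: RECORDING_UID,
      clientRequest: {
        storageConfig: {
          vendor: 1,            // 1 = Amazon S3 in Agora's vendor list (verify)
          region: 0,            // placeholder region code
          bucket: 'my-bucket',  // placeholder
          accessKey: '...',
          secretKey: '...',
        },
      },
    }),
  }).then((res) => res.json());
}
```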

How do I route audio output to a selected audio endpoint/device in Windows?

TL;DR: When playing audio using Windows UWP MediaPlayer, how do I route audio to a specific audio device/endpoint?
Full Context
I'm working on an app to place calls. Some requirements are:
Play audio sounds at different points (e.g. when the call hangs up)
Allow users to change in-call audio output to different endpoints (not an issue)
Ensure that when in-call audio has been routed to a different "default" endpoint, any other sounds that are played are routed to the same endpoint (this is what I need help with)
Currently, when I route audio to a different endpoint, other sounds that are played with Windows UWP MediaPlayer do not get routed to the same "new" endpoint. This makes sense since we aren't changing application-wide settings.
My question is: How do I route audio to the same endpoint that the call audio is going through, given that I'm using Windows UWP MediaPlayer and given that I can get device information?
When playing audio using Windows UWP MediaPlayer, how do I route audio to a specific audio device/endpoint?
Please check the Output to a specific audio endpoint document. By default, the audio output from a MediaPlayer is routed to the default audio endpoint for the system, but you can specify a specific audio endpoint that the MediaPlayer should use for output.
You could use GetAudioRenderSelector to get the render selector, then use FindAllAsync to enumerate the render devices, then pass the specific device to the MediaPlayer's AudioDevice property.
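
A minimal sketch of that sequence, written against the WinRT JavaScript projection (a UWP JS app, where the Windows.* namespaces are available globally) so it stays in one language with the other examples here; the equivalent C# members are MediaDevice.GetAudioRenderSelector, DeviceInformation.FindAllAsync, and MediaPlayer.AudioDevice. The device choice and the sound URI are placeholders.

```js
// Minimal sketch (WinRT JavaScript projection in a UWP app).
const selector = Windows.Media.Devices.MediaDevice.getAudioRenderSelector();

Windows.Devices.Enumeration.DeviceInformation.findAllAsync(selector)
  .then((devices) => {
    // Placeholder: select the endpoint your in-call audio already uses,
    // e.g. by matching DeviceInformation.id against the call's device ID.
    const target = devices[0];

    const player = new Windows.Media.Playback.MediaPlayer();
    player.audioDevice = target; // route this player's output to that endpoint
    player.source = Windows.Media.Core.MediaSource.createFromUri(
      new Windows.Foundation.Uri('ms-appx:///Assets/hangup.mp3')); // placeholder asset
    player.play();
  });
```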

Is there an option to create a Snapchat-like features addon to interact with video calls?

Is there a way to create an add-on "Snapchat-like" video feature for video calls?
I want to build an app that gets and manipulates the user's camera stream source; the user will then decide if and when to share it on their MS Teams calls.
Currently we only have Calls and online meetings support for bots. There is no support for processing the video call stream yet.
As of now we don't have more details to share on this.

How do I stream an mp3 file instead of microphone data in an OpenTok session

We need to stream an audio file the user selects instead of the default microphone. Is there any option in the SDK to do so?
If you are using a native SDK such as iOS, Android, or Windows, you should build your own audio driver.
See our samples:
iOS: https://github.com/opentok/opentok-ios-sdk-samples-swift/tree/master/Custom-Audio-Driver
Android: https://github.com/opentok/opentok-android-sdk-samples/tree/master/Custom-Audio-Driver
That audio driver will open the mp3 file and will send it over the OpenTok session.
If you are using the opentok.js SDK, then you can use the Fetch API to fetch the mp3 file and the Web Audio API to decode the audio data and create a media stream destination out of it. Then you can take the audio track from that stream and pass it as the audioSource in OT.initPublisher.
Here is a sample that loads an mp3 file into a session.
https://github.com/opentok/opentok-web-samples/tree/master/Stereo-Audio
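
A minimal sketch of that opentok.js approach, assuming an already-connected session object; the file name and element ID are placeholders:

```js
// Minimal sketch: fetch an mp3, decode it, and publish it as the audio source.
// 'music.mp3' and 'publisher' are placeholders.
async function publishMp3(session) {
  const audioCtx = new AudioContext();

  // Fetch and decode the mp3 into an AudioBuffer.
  const response = await fetch('music.mp3');
  const encoded = await response.arrayBuffer();
  const audioBuffer = await audioCtx.decodeAudioData(encoded);

  // Play the buffer into a MediaStream destination node.
  const source = audioCtx.createBufferSource();
  source.buffer = audioBuffer;
  const dest = audioCtx.createMediaStreamDestination();
  source.connect(dest);
  source.start();

  // Use the resulting audio track as the publisher's audio source.
  const publisher = OT.initPublisher('publisher', {
    audioSource: dest.stream.getAudioTracks()[0],
    videoSource: null,
  });
  session.publish(publisher);
}
```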

What type of message can a WebSocket send?

I am trying to use a WebSocket to send an audio message. What type of message should I change the audio stream into so that I can send it over the socket?
If I directly use websocket.send(audio), I get a "DOMException" error. Should I change it into binary data, and how?
I am totally new to programming, so please help!
The WebSocket API supports sending Blob and ArrayBuffer binary types.
If your browser supports Mozilla's Audio Data API, then you can read out the data from an audio tag as an ArrayBuffer and send that. However, this API is unlikely to be standardized.
If your browser supports the Web Audio API (proposed by Google), then you can also extract an ArrayBuffer of the data and send that. The Web Audio API will likely see greater adoption. Here is an intro to the Web Audio API. This FAQ has an answer that describes how to use the Web Audio API to read from normal audio/video tags.
Updated:
The Aurora.js library mentioned in this blog post looks like it might be helpful in dealing with audio and binary data.
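
To make the binary part concrete, a small sketch that decodes an audio file with the Web Audio API and sends the raw samples over a WebSocket as an ArrayBuffer; the endpoint and file name are placeholders:

```js
// Minimal sketch: send decoded audio samples over a WebSocket as binary.
// 'wss://example.com/audio' and 'clip.mp3' are placeholders.
const socket = new WebSocket('wss://example.com/audio');
socket.binaryType = 'arraybuffer'; // controls how *received* binary arrives

socket.onopen = async () => {
  const audioCtx = new AudioContext();
  const encoded = await fetch('clip.mp3').then((res) => res.arrayBuffer());
  const buffer = await audioCtx.decodeAudioData(encoded);

  // Send one channel of raw Float32 samples; a Blob would also be accepted.
  const samples = buffer.getChannelData(0);
  socket.send(samples.buffer);
};
```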
