When I use NAudio to capture a microphone, every callback delivers exactly the same amount of buffer data. But when I use WASAPI loopback capture, I get many packets that are larger than I would expect. With audioBufferMillisecondsLength set to 500 ms, I would expect 192,000 bytes per callback event, but instead I see sizes like:
192000
195840
195840
195840
218880
218880
218880
218880
218880
218880
215040
226560
215040
230400
215040
218880
222720
192000
192000
195840
192000
192000
195840
...
Over long recordings, I find the average to be 195,951.2095 bytes per event, and this is making it impossible to synchronize the recording with other events in time.
What is causing this variability, and how can I align such a recording with anything if its actual length is this hard to predict?
Loopback capture exposes data copied from the playback pipeline, which in turn has its own buffer granularity. Also, loopback capture does not expose silence, so once in a while you will see partial buffers too.
Variable buffer sizes are okay, and you have an exact way to synchronize the data: if there is a gap between buffers, a discontinuity flag indicates it; otherwise the data is continuous, no matter what the buffer sizes are.
I'm new to FFmpeg. While learning it with the nice repo (https://github.com/leandromoreira/ffmpeg-libav-tutorial), in the hello_world example I found that avcodec_receive_frame doesn't return the first I-frame until it gets the third packet, as the following screenshot shows:
I'm wondering why additional packets are needed to receive an I frame.
Most modern video codecs use I/P/B frames, which introduces the distinction between the decoding time stamp (DTS) and the presentation time stamp (PTS). So, what hello_world does with FFmpeg's libraries is the following:
Demuxing (av_read_frame)
Demuxes packets based on the file format (mp4/avi/mkv, etc.) until you have a packet for the stream that you want (e.g. video). (We might say NAL units as an example here; I'm not sure.)
Feeds the decoder with the packet (avcodec_send_packet)
Starts the decoding process until the decoder has enough packets to give you the first frame (it decodes in DTS order).
Checks whether a frame is ready to be presented (avcodec_receive_frame)
Asks the decoder whether it has a frame ready to present after being fed. It might not be ready yet, in which case you need to feed it again, or it might even give you more than one frame at once. (Frames come out in PTS order.)
I was reading about the -re option in ffmpeg.
From the docs:
-re (input)
Read input at the native frame rate. Mainly used to simulate a grab device, or live input stream (e.g. when reading from a file). Should not be used with actual grab devices or live input streams (where it can cause packet loss). By default ffmpeg attempts to read the input(s) as fast as possible. This option will slow down the reading of the input(s) to the native frame rate of the input(s). It is useful for real-time output (e.g. live streaming).
My doubt is basically about the part of the description above that I highlighted. It suggests not using the option with live input streams, yet at the end it says the option is useful for real-time output.
Considering a situation where both the input and output are in rtmp format, should I use it or not?
Don't use it. -re is useful for real-time output when ffmpeg is able to process a source faster than real time. In that scenario, ffmpeg may send output at that faster rate, and the receiver may not be able, or may not want, to buffer and queue its input.
It (-re) is suitable for streaming from offline files: it reads them at their native speed (e.g. 25 fps). Otherwise, FFmpeg may output hundreds of frames per second, and this may cause problems.
I've been struggling with the following problem and can't figure out a solution. The provided Java server application sends PCM audio data in chunks over a WebSocket connection. There are no headers, etc. My task is to play these raw chunks of audio data in the browser without any delay. In an earlier version I used audioContext.decodeAudioData, because I was getting the full array with the 44-byte header at the beginning. Now there is no header, so decodeAudioData cannot be used. I'll be very grateful for any suggestions and tips. Maybe I have to use some JS decoding library; any example or link will help me a lot.
Thanks.
1) Your requirement to "play these raw chunks of audio data in the browser without any delay" is not possible. There is always some amount of time needed to send audio, receive it, and play it. Read about the term "latency." First you must set a realistic requirement: it might be 1 second or 50 milliseconds, but it needs to be realistic.
2) WebSockets use TCP. TCP is designed for reliable communication, congestion control, etc. It is not designed for fast, low-latency communication.
3) Give more information about your problem. Are your client and server communicating over the Internet or over a local LAN? This will hugely affect your performance and design.
4) The 44-byte header was a WAV file header. It describes the type of data (sample rate, mono/stereo, bits per sample). You must know this information to be able to play the audio. If you know the PCM format, you could insert the header yourself and use your decoder as you did before. Otherwise, you need to construct an audio player manually.
Streaming audio over networks is not a trivial task.
I am trying to use FFmpeg and have been experimenting for the last month, but I have not been able to get through. Is it really that difficult to use FFmpeg?
My requirement is simple, as described below.
Can you please tell me whether ffmpeg is suitable, or whether I have to implement this on my own (using the available codec libraries)?
I have a webm file (containing VP8 and OPUS frames)
I will read the encoded data and send it to a remote guy
The remote guy will read the encoded data from the socket
The remote guy will write it to a file (can we avoid decoding?)
Then the remote guy should be able to play the file using ffplay or any other player.
Now I will take a specific example.
Say I have a file small.webm, containing VP8 and OPUS frames.
I am reading only the audio frames (OPUS) using the av_read_frame API (then checking the stream index and keeping audio frames only)
So now I have the encoded data buffer as packet.data and its size as packet.size (please correct me if I'm wrong)
Here is my first doubt: the audio packet size is not the same every time. Why the difference? Sometimes the packet size is as low as 54 bytes and sometimes as high as 420 bytes. Does the OPUS frame size vary from time to time?
Next, say I somehow extract a single frame (I really do not know how to extract a single frame) from the packet and send it to the remote guy.
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as an argument. I can create an AVPacket and set its data and size members, then call av_write_frame. But that does not work. The reason may be that one should also set the other members of the packet, like dts, pts, etc., but I do not have that information to set.
Can somebody help me figure out whether FFmpeg is the right choice, or should I write custom logic to parse an Opus stream frame by frame?
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as an argument. I can create an AVPacket and set its data and size members, then call av_write_frame. But that does not work. The reason may be that one should also set the other members of the packet, like dts, pts, etc., but I do not have that information to set.
Yes, you do have that information: it was in the original packet you received from the demuxer on the sender side. You need to serialize all the information in that packet and set each value accordingly on the receiver.
I have a 4-mic array connected to an iMac via an external audio interface (RME Fireface). I need to record from all 4 mics simultaneously, get the individual signals, perform some operations, and play an output sound, all in real time.
First of all, the audio inputs in the Mac System Preferences do not show 4 separate devices, just one RME Fireface. How, then, can I find the port addresses for each mic?
Secondly, the main question: can I use Audio Queues for this purpose? The Audio Queue documentation has no clear information on multi-channel audio input and signal processing.
Audio Queues does not support input from 4 channels simultaneously, based on [this thread](http://www.mailinglistarchive.com/html/coreaudio-api#lists.apple.com/2011-01/msg00174.html).
You probably have to go with CoreAudio, which is a lower level API but claims "scalable multichannel input and output" which is exactly what you want.