Play a PCM stream sampled at 16 kHz - websocket

I get an input frame stream through a socket; it is a mono 32-bit IEEE floating-point PCM stream sampled at 16 kHz.
Here is an audio file sample of what I receive. With Audacity I can visualize it, and I see regular cuts in my audio stream.
I get this with the following code:
var audioCtx = new(window.AudioContext || window.webkitAudioContext)();
var audioBuffer = audioCtx.createBuffer(1, 256, 16000);
var BufferfloatArray;
var source = audioCtx.createBufferSource();
source.buffer = audioBuffer;
var gainNode = audioCtx.createGain();
gainNode.gain.value = 0.1;
gainNode.connect(audioCtx.destination);
source.connect(gainNode);
source.start(0);
socket.on('audioFrame', function(raw) {
    var context = audioCtx;
    BufferfloatArray = new Float32Array(raw);
    var src = context.createBufferSource();
    audioBuffer.getChannelData(0).set(BufferfloatArray);
    src.buffer = audioBuffer;
    src.connect(gainNode);
    src.start(0);
});
I think it is because the sample rate of my raw buffer (16000) is different from the sample rate of my AudioContext (44100). What do you think?

This is not a sample rate problem, because the AudioBufferSourceNode resamples the audio to the AudioContext's rate when playing.
What you should do here is keep a small queue of buffers that you fill from the network, and then play them as you already do, but from that queue, taking extra care to schedule each one (using the first parameter of the AudioBufferSourceNode's start method) at the right time, so that the end of the previous buffer is exactly the start of the next one. You can use the AudioBuffer.duration property to achieve this (duration is in seconds).
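A minimal sketch of that idea, reusing the socket, audioCtx and gainNode from the question (the queue here is implicit: each incoming chunk is wrapped in its own AudioBuffer and scheduled to start exactly when the previous one ends):
// Time, on the AudioContext clock, at which the next chunk should start.
var nextStartTime = 0;
socket.on('audioFrame', function(raw) {
    // Interpret the raw bytes as 32-bit float PCM and wrap them in a 16 kHz buffer.
    var samples = new Float32Array(raw);
    var buffer = audioCtx.createBuffer(1, samples.length, 16000);
    buffer.getChannelData(0).set(samples);
    var src = audioCtx.createBufferSource();
    src.buffer = buffer;
    src.connect(gainNode);
    // If we have fallen behind (first chunk, or a network stall), start now.
    if (nextStartTime < audioCtx.currentTime) {
        nextStartTime = audioCtx.currentTime;
    }
    src.start(nextStartTime);
    nextStartTime += buffer.duration; // duration is in seconds
});
If the network is bursty, you may want to buffer a couple of chunks before the first start() so that nextStartTime stays slightly ahead of audioCtx.currentTime.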

Related

Can't get the right formula to set frame pts for a stream using libav

I'm trying to save a stream of frames as mp4.
The source framerate is not fixed and stays in the range [15, 30].
Encoder params:
...
eCodec.time_base = AVRational(1,3000);
eCodec.framerate = AVRational(30, 1);
...
Stream params:
eStream = avformat_new_stream(eFormat, null);
eStream.codecpar = codecParams;
eStream.time_base = eCodec.time_base;
Decoder time_base is 0/1 and it marks each frame with a pts like:
480000
528000
576000
...
PTS(f) is always == PTS(f-1)+48000
Encoding (dFrame is the received frame, micro the elapsed time in microseconds):
av_frame_copy(eFrame, dFrame);
eFrame.pts = micro*3/1000;
This makes the video play too fast. (With a 1/3000 time base, t microseconds should correspond to t · 3000 / 10⁶ = 3t/1000 ticks, which is exactly what that expression computes.)
I can't understand why, but changing micro*3/1000 to micro*3*4/1000 makes the video play at the correct speed (checked against a clock after many minutes of varying fps).
What am I missing?

How to get the sample rate from a mediaDevices.getUserMedia stream

Firefox is limited in its audio resampling ability for audio MediaStreams. If the input media stream's sample rate is not the same as the AudioContext's, it complains:
DOMException: AudioContext.createMediaStreamSource: Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported.
For example, if we get an audio stream like so:
navigator.mediaDevices.getUserMedia(constraints).then(stream => {
    let context = new (window.AudioContext || window.webkitAudioContext)({sampleRate: 48000});
    let audioInput = context.createMediaStreamSource(stream);
});
Firefox will complain about mismatched sample rates if the audio context's rate differs from the hardware device's setting in the audio subsystem.
I can't find a way to get the sample rate from the audio track in the stream. I've tried:
let tracks = stream.getAudioTracks();
let settings = tracks[0].getSettings();
let constraints = tracks[0].getConstraints();
But none of these objects have the stream's sampleRate in them.
Is there another way to query an audio track's/stream's sample rate?
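One workaround I can sketch (it sidesteps the question rather than reading the rate from the track, and assumes the context's default rate follows the audio subsystem): don't force a sampleRate when creating the AudioContext, then read context.sampleRate back afterwards.
navigator.mediaDevices.getUserMedia(constraints).then(stream => {
    // No sampleRate option: the context uses the system's default rate.
    let context = new (window.AudioContext || window.webkitAudioContext)();
    console.log('AudioContext sample rate:', context.sampleRate);
    // With the rates left to match, createMediaStreamSource should not throw.
    let audioInput = context.createMediaStreamSource(stream);
});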

How to use "kAudioUnitSubType_VoiceProcessingIO" subtype of core audio API in mac os?

I'm looking for an example of a simple play-through application using the built-in mic/speaker with the kAudioUnitSubType_VoiceProcessingIO subtype (not kAudioUnitSubType_HALOutput) on macOS. The comments in the Core Audio API say that kAudioUnitSubType_VoiceProcessingIO is available on the desktop and with iPhone 3.0 or greater, so I think there must be an example for macOS somewhere.
Do you have any idea where such a sample is? Or does anyone know how to use the kAudioUnitSubType_VoiceProcessingIO subtype on macOS? I already tried the same approach I used on iOS, but it didn't work.
I discovered a few things while enabling this IO unit.
The stream format is really picky. It has to be:
LinearPCM
FlagsCanonical
32 bits per channel
1 channel (it might work with more)
sample rate 44100 (might work with others, might not)
You don't set EnableIO on it. IO is enabled by default and that property is not writable.
Set stream format before initialization.
As with other Core Audio work, you just need to check the error status of every single function call, determine what the errors are, and make small changes at each step until you finally get it to work.
I had two different kAudioUnitProperty_StreamFormat setups, based on the number of channels.
size_t bytesPerSample = sizeof (AudioUnitSampleType);
stereoStreamFormat.mFormatID = kAudioFormatLinearPCM;
stereoStreamFormat.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical;
stereoStreamFormat.mBytesPerPacket = bytesPerSample;
stereoStreamFormat.mFramesPerPacket = 1;
stereoStreamFormat.mBytesPerFrame = bytesPerSample;
stereoStreamFormat.mChannelsPerFrame = 2;
stereoStreamFormat.mBitsPerChannel = 8 * bytesPerSample;
stereoStreamFormat.mSampleRate = graphSampleRate;
and
size_t bytesPerSample = sizeof (AudioUnitSampleType);
monoStreamFormat.mFormatID = kAudioFormatLinearPCM;
monoStreamFormat.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical;
monoStreamFormat.mBytesPerPacket = bytesPerSample;
monoStreamFormat.mFramesPerPacket = 1;
monoStreamFormat.mBytesPerFrame = bytesPerSample;
monoStreamFormat.mChannelsPerFrame = 1; // 1 indicates mono
monoStreamFormat.mBitsPerChannel = 8 * bytesPerSample;
monoStreamFormat.mSampleRate = graphSampleRate;
I used these audio stream formats when configuring the I/O unit as kAudioUnitSubType_VoiceProcessingIO:
AudioComponentDescription iOUnitDescription;
iOUnitDescription.componentType = kAudioUnitType_Output;
iOUnitDescription.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
iOUnitDescription.componentManufacturer = kAudioUnitManufacturer_Apple;
iOUnitDescription.componentFlags = 0;
iOUnitDescription.componentFlagsMask = 0;
I can clearly see an interruption in the audio output, as the buffer size was smaller than the one from this AudioUnit.
Switching back to the kAudioUnitSubType_RemoteIO
iOUnitDescription.componentSubType = kAudioUnitSubType_RemoteIO;
That interruption disappears.
I'm processing audio input from the microphone and applying some real-time calculations to the audio buffers.
In these methods, graphSampleRate is the AVAudioSession sample rate:
graphSampleRate = [[AVAudioSession sharedInstance] sampleRate];
and maybe this is where I'm wrong.
In the end, the configuration parameter values are the following:
The stereo stream format:
Sample Rate: 44100
Format ID: lpcm
Format Flags: 3116
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
The mono stream format:
Sample Rate: 44100
Format ID: lpcm
Format Flags: 3116
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 1
Bits per Channel: 32
Thanks to the SO post here, I realized I should have used this flag:
audioFormat.mFormatFlags = kAudioFormatFlagsCanonical;

encapsulating H.264 streams variable framerate in MPEG2 transport stream

Imagine I have H.264 Annex B frames coming in from a real-time conversation. What is the best way to encapsulate them in an MPEG2 transport stream while maintaining the timing information for subsequent playback?
I am using the libavcodec and libavformat libraries. When I obtain a pointer to an object (*pcc) of type AVCodecContext, I set the following:
pcc->codec_id = CODEC_ID_H264;
pcc->bit_rate = br;
pcc->width = 640;
pcc->height = 480;
pcc->time_base.num = 1;
pcc->time_base.den = fps;
When I receive NAL units, I create an AVPacket and call av_interleaved_write_frame():
AVPacket pkt;
av_init_packet( &pkt );
pkt.flags |= AV_PKT_FLAG_KEY;
pkt.stream_index = pst->index;
pkt.data = (uint8_t*)p_NALunit;
pkt.size = len;
pkt.dts = AV_NOPTS_VALUE;
pkt.pts = AV_NOPTS_VALUE;
av_interleaved_write_frame( fc, &pkt );
I basically have two questions:
1) For variable framerate, is there a way to not specify the following
pcc->time_base.num = 1;
pcc->time_base.den = fps;
and replace it with something to indicate variable framerate?
2) While submitting packets, what "timestamps" should I assign to
pkt.dts and pkt.pts?
Right now, when I play the output using ffplay, it plays at the constant framerate (fps) that I use in the above code.
I would also love to know how to accommodate varying spatial resolution. In the stream that I receive, each keyframe is preceded by SPS and PPS, so I know whenever the spatial resolution changes.
Is there a way to not have to specify
pcc->width = 640;
pcc->height = 480;
upfront? In other words, indicate that the spatial resolution can change mid-stream.
Thanks a lot,
Eddie
DTS and PTS are measured in a 90 kHz clock. See ISO 13818 part 1, section 2.4.3.6, way down below the syntax table.
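For example, a frame presented t seconds into the stream gets PTS = 90000 × t, so a frame at 2 s has PTS = 180000; if your timestamps are in microseconds, the conversion is micro × 90000 / 1 000 000 = micro × 9 / 100.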
As for the variable frame rate, your framework may or may not have a way to generate this (vui_parameters.fixed_frame_rate_flag=0). Whether the playback software handles it is an ENTIRELY different question. Most players assume a fixed frame rate regardless of PTS or DTS. mplayer can't even compute the frame rate correctly for a fixed-rate transport stream generated by ffmpeg.
I think if you're going to change the resolution you need to end the stream (nal_unit_type 10 or 11) and start a new sequence. It can be in the same transport stream (assuming your client's not too simple).

is it possible to change the playback pitch of an audioqueue

This is supposed to be possible on Mac OS X by overwriting the sample rate in the AudioStreamBasicDescription and then creating a new output queue.
I've been able to retrieve the default sample rate and write a new one (i.e. replace 44100 with 48000), but this does not result in any pitch change in the output signal.
err = AudioFileGetProperty(mAudioFile, kAudioFilePropertyDataFormat, &size, &mDataFormat);
if (err != noErr)
    NSLog(@"Couldn't determine the audio file format");
Float64 mySampleRate = mDataFormat.mSampleRate; // the initial rate
if (inRate != 1) {
    // write a new value
    mDataFormat.mSampleRate = inRate;
    // then
    err = AudioQueueNewOutput(...); // etc.
}
Any suggestions would be greatly appreciated.
Changing the sample rate doesn't change the pitch of the audio. You may perceive that something playing back faster has a higher pitch. However, that's perception rather than reality.
To change pitch, you'll need to process the audio data through a Digital Signal Processing (DSP) library. Alternatively, take a look at running it through an AudioUnit:
Audio Unit Programming Guide
