Accurately compute the pts and dts of a video packet - ffmpeg

Is there a way to compute AVPacket.pts and AVPacket.dts for an encoded packet when the packet doesn't contain a duration?
I tried computing them by starting the timestamp at 0 and then increasing it by the computed duration of each video frame. My computation of the duration is below:
if (video_ts < 0)
    video_ts = 0;
else
    video_ts += (int64_t)last_duration;

compressed_video.pts = video_ts;
compressed_video.dts = video_ts;

// packet duration estimated in milliseconds from the packet size and the bit rate
last_duration = ((compressed_video.size * 8) / (double)out_videocc->bit_rate) * 1000;
This works to a degree, but it is not exact and the playback stutters.
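For reference, a common alternative (not from the original post) is to derive the timestamps from a frame counter and the encoder time base instead of from the bit rate. A minimal sketch, assuming a constant frame rate with out_videocc->time_base set to 1/fps and a hypothetical output stream out_stream:
// Illustrative sketch, not the original code: count timestamps in
// out_videocc->time_base units (e.g. time_base = {1, 25} means one tick per frame).
static int64_t frame_index = 0;

compressed_video.pts = frame_index;            // one time_base tick per frame
compressed_video.dts = compressed_video.pts;   // valid only when no B-frames are used
compressed_video.duration = 1;
frame_index++;

// If the muxer's stream uses a different time base, rescale before writing
// ("out_stream" is an assumed output AVStream):
av_packet_rescale_ts(&compressed_video, out_videocc->time_base, out_stream->time_base);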

Related

AudioUnit output buffer and input buffer

My question is: what should I do when I apply real-time time stretching?
I understand that changing the rate changes the number of output samples.
For example, if I stretch audio with a 2.0 coefficient, the output buffer is twice as big.
So what should I do if I implement reverb, delay, or real-time time stretching?
For example, my input buffer is 1024 samples. After stretching with a 2.0 coefficient, my buffer is 2048 samples.
In the code below, which uses the Superpowered time stretch, everything works as long as I do not change the rate. When I change the rate, the output is distorted and the speed does not actually change.
return ^AUAudioUnitStatus(AudioUnitRenderActionFlags *actionFlags,
                          const AudioTimeStamp *timestamp,
                          AVAudioFrameCount frameCount,
                          NSInteger outputBusNumber,
                          AudioBufferList *outputBufferListPtr,
                          const AURenderEvent *realtimeEventListHead,
                          AURenderPullInputBlock pullInputBlock) {

    pullInputBlock(actionFlags, timestamp, frameCount, 0, renderABLCapture);

    Float32 *sampleDataInLeft   = (Float32 *)renderABLCapture->mBuffers[0].mData;
    Float32 *sampleDataInRight  = (Float32 *)renderABLCapture->mBuffers[1].mData;
    Float32 *sampleDataOutLeft  = (Float32 *)outputBufferListPtr->mBuffers[0].mData;
    Float32 *sampleDataOutRight = (Float32 *)outputBufferListPtr->mBuffers[1].mData;

    SuperpoweredAudiobufferlistElement inputBuffer;
    inputBuffer.samplePosition = 0;
    inputBuffer.startSample = 0;
    inputBuffer.samplesUsed = 0;
    inputBuffer.endSample = frameCount;
    inputBuffer.buffers[0] = SuperpoweredAudiobufferPool::getBuffer(frameCount * 8 + 64);
    inputBuffer.buffers[1] = inputBuffer.buffers[2] = inputBuffer.buffers[3] = NULL;

    SuperpoweredInterleave(sampleDataInLeft, sampleDataInRight, (Float32 *)inputBuffer.buffers[0], frameCount);

    timeStretch->setRateAndPitchShift(1.0f, -2);
    timeStretch->setSampleRate(48000);
    timeStretch->process(&inputBuffer, outputBuffers);

    if (outputBuffers->makeSlice(0, outputBuffers->sampleLength)) {
        int numSamples = 0;
        int samplesOffset = 0;

        while (true) {
            Float32 *timeStretchedAudio = (Float32 *)outputBuffers->nextSliceItem(&numSamples);
            if (!timeStretchedAudio) break;

            SuperpoweredDeInterleave(timeStretchedAudio, sampleDataOutLeft + samplesOffset, sampleDataOutRight + samplesOffset, numSamples);
            samplesOffset += numSamples;
        }

        outputBuffers->clear();
    }

    return noErr;
};
So, how can I write my Audio Unit render block when my input and output buffers contain different numbers of samples (reverb, delay, or time stretch)?
If your process creates more samples than fit in the audio callback's output buffer, you have to save those samples and play them later, mixing them into the output of a subsequent audio unit callback if necessary.
Often circular buffers are used to decouple input, processing, and output sample rates or buffer sizes.
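A minimal sketch of such a circular buffer (illustrative only; the type and function names are made up, and thread safety is ignored since both sides here run inside the same render block). The render block writes however many samples the stretcher produced and reads back exactly frameCount samples for the output, filling any shortfall with silence:
#include <stddef.h>

#define RING_CAPACITY 16384            // interleaved stereo samples; sized for the worst case

typedef struct {
    float  data[RING_CAPACITY];
    size_t readPos;
    size_t writePos;
    size_t count;
} RingBuffer;

// Append n samples; excess samples are dropped when the ring is full.
static void ring_write(RingBuffer *rb, const float *src, size_t n) {
    for (size_t i = 0; i < n && rb->count < RING_CAPACITY; i++) {
        rb->data[rb->writePos] = src[i];
        rb->writePos = (rb->writePos + 1) % RING_CAPACITY;
        rb->count++;
    }
}

// Read up to n samples; returns how many were actually available.
// The caller fills the remainder of the output with silence.
static size_t ring_read(RingBuffer *rb, float *dst, size_t n) {
    size_t read = 0;
    while (read < n && rb->count > 0) {
        dst[read++] = rb->data[rb->readPos];
        rb->readPos = (rb->readPos + 1) % RING_CAPACITY;
        rb->count--;
    }
    return read;
}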

libx264 encode error Input picture width (40) is greater than stride (0)

I used libx264 through ffmpeg to encode video, with the configuration below.
enCodecContext->bit_rate = 300000;
enCodecContext->width = 80;
enCodecContext->height = 60;
enCodecContext->time_base = (AVRational) {1, 25};
enCodecContext->gop_size = 10;
enCodecContext->max_b_frames = 1;
enCodecContext->pix_fmt = PIX_FMT_YUV420P;
enCodecContext->qcompress = 0.6;
av_opt_set(enCodecContext->priv_data, "preset", "slow", 0);
But when I called avcodec_encode_video2 with enCodecContext, I got the error Input picture width (40) is greater than stride (0).
avcodec_encode_video2(enCodecContext, &filteredAVPacket, pFilteredAVFrame, &got_packet_ptr);
pFilteredAVFrame->width and pFilteredAVFrame->height are 80 and 60, respectively.
Did I miss something when configuring libx264? How can I get a working configuration for libx264 to encode my video?
x264 is fine. You must fill in the AVFrame.linesize array (AVPicture.linesize in the older API) for your image planes. The stride describes how each row of the image is laid out in memory, and it must be at least as big as the image width. In the case of YUV 4:2:0, the stride of the second and third (chroma) planes must be at least half the width.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa473780(v=vs.85).aspx
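For reference, a minimal sketch of one way to fill in both the plane pointers and the strides consistently, using libavutil's av_image_alloc (the 32-byte alignment is an assumption):
#include <libavutil/imgutils.h>

// Allocate the planes and fill pFilteredAVFrame->linesize so the encoder
// sees valid strides for all three YUV 4:2:0 planes.
pFilteredAVFrame->format = enCodecContext->pix_fmt;
pFilteredAVFrame->width  = enCodecContext->width;
pFilteredAVFrame->height = enCodecContext->height;

int ret = av_image_alloc(pFilteredAVFrame->data, pFilteredAVFrame->linesize,
                         pFilteredAVFrame->width, pFilteredAVFrame->height,
                         enCodecContext->pix_fmt, 32);
if (ret < 0) {
    // handle the allocation failure
}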

Why does the packet number in my audio queue input callback vary?

I use Audio Queue Services to record PCM audio data on Mac OS X. It works but the number of frames I get in my callback varies.
static void MyAQInputCallback(void *inUserData,
                              AudioQueueRef inQueue,
                              AudioQueueBufferRef inBuffer,
                              const AudioTimeStamp *inStartTime,
                              UInt32 inNumPackets,
                              const AudioStreamPacketDescription *inPacketDesc)
On each call of my audio input queue I want to get 5 ms (240 frames/inNumPackets, 48 kHz) of audio data.
This is the audio format I use:
AudioStreamBasicDescription recordFormat = {0};
memset(&recordFormat, 0, sizeof(recordFormat));
recordFormat.mFormatID = kAudioFormatLinearPCM;
recordFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
recordFormat.mBytesPerPacket = 4;
recordFormat.mFramesPerPacket = 1;
recordFormat.mBytesPerFrame = 4;
recordFormat.mChannelsPerFrame = 2;
recordFormat.mBitsPerChannel = 16;
I have two buffers of 960 bytes enqueued:
for (int i = 0; i < 2; ++i) {
    AudioQueueBufferRef buffer;
    AudioQueueAllocateBuffer(queue, 960, &buffer);
    AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
}
My problem: for every 204 callbacks that deliver 240 frames (inNumPackets), the callback is called once with only 192 frames.
Why does that happen and is there something I can do to get 240 frames constantly?
Audio Queues run on top of Audio Units. The Audio Unit buffers are very likely configured by the OS to be a power-of-two in size, and your returned Audio Queue buffers are chopped out of the larger Audio Unit buffers.
204 * 240 + 192 = 49152 frames, i.e. 12 Audio Unit buffers of 4096 frames each.
If you want fixed length buffers that are not a power-of-two, your best bet is to have the app re-buffer the incoming buffers (save up until you have enough data) to your desired length. A lock-free circular fifo/buffer might be suitable for this purpose.
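As an illustrative sketch of that re-buffering idea (hypothetical names, not part of the Audio Queue API): append whatever each callback delivers to an accumulator and hand off exactly 240-frame chunks as soon as enough data is available:
#include <stdint.h>
#include <string.h>

#define CHUNK_FRAMES     240
#define BYTES_PER_FRAME    4            // 2 channels * 16-bit, as in recordFormat

// Accumulator sized for the largest expected backlog (an assumption).
static uint8_t accum[CHUNK_FRAMES * BYTES_PER_FRAME * 4];
static size_t  accumBytes = 0;

static void handleIncoming(const void *audioData, size_t numFrames,
                           void (*emitChunk)(const uint8_t *bytes, size_t length)) {
    // Append the incoming (variable-sized) buffer.
    memcpy(accum + accumBytes, audioData, numFrames * BYTES_PER_FRAME);
    accumBytes += numFrames * BYTES_PER_FRAME;

    // Emit fixed 240-frame chunks while enough data is buffered.
    while (accumBytes >= CHUNK_FRAMES * BYTES_PER_FRAME) {
        emitChunk(accum, CHUNK_FRAMES * BYTES_PER_FRAME);
        accumBytes -= CHUNK_FRAMES * BYTES_PER_FRAME;
        memmove(accum, accum + CHUNK_FRAMES * BYTES_PER_FRAME, accumBytes);
    }
}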

Fade out function of audio between samplerate changes

I'm trying to create a simple function that will decrease audio volume in a buffer (like a fade out) each iteration through the buffer. Here's my simple function.
double iterationSum = 1.0;

double iteration(double sample)
{
    iterationSum *= 0.9;
    // ...and then multiply that sum with the current sample
    sample *= iterationSum;
    return sample;
}
This works fine at a 44100 Hz sample rate. The problem is that if the sample rate is changed to, say, 88200 Hz, the function should only reduce the volume by half that step each call, because there are twice as many samples per second; otherwise the "fade out" finishes in half the time. I tried using a factor like 44100 / 88200 = 0.5, but that does not halve the step.
I'm stuck on this simple problem and need some guidance: what can I do to halve the step per iteration when the sample rate is changed at runtime?
Regards, Morgan
The most robust way to fade out independent of sample rate is to keep track of the time since the fadeout started, and use an explicit fadeout(time) function.
If for some reason you can't do that, you can set your exponential decay rate based on the sample rate, as follows:
double decay_time = 0.01; // time to fall to ~37% of original amplitude
double sample_time = 1.0 / sampleRate;
double natural_decay_factor = exp(- sample_time / decay_time);
...
double iteration(double sample) {
    iterationSum *= natural_decay_factor;
    ...
}
The reason for the ~37% is because exp(x) = e^x, where e is the "natural log" base, and 1/e ~ 0.3678.... If you want a different decay factor for your decay time, you need to scale it by a constant:
// for decay to 50% amplitude (~ -6dB) over the given decay_time:
double halflife_decay_factor = exp(- log(2) * sample_time / decay_time);
// for decay to 10% amplitude (-20dB) over the given decay_time:
double db20_decay_factor = exp(- log(10) * sample_time / decay_time);
I'm not sure if I understood, but what about something like this:
public void fadeOut(double sampleRate)
{
    // run 1 iteration per second?
    int defaultIterations = 10;
    double decrement = calculateIteration(sampleRate, defaultIterations);
    for (int i = 0; i < defaultIterations; i++)
    {
        // maybe run each of these loops every x ms?
        sampleRate = processIteration(sampleRate, decrement);
    }
}

public double calculateIteration(double sampleRate, int numIterations)
{
    return sampleRate / numIterations;
}

private double processIteration(double sampleRate, double decrement)
{
    return sampleRate -= decrement;
}

Upscaling images on Retina devices

I know images upscale by default on retina devices, but the default scaling makes the images blurry.
I was wondering if there is a way to scale it in nearest-neighbor mode, where no blurry interpolated pixels are created, but rather each pixel is duplicated into four, so it looks like it would on a non-Retina device.
Example of what I'm talking about can be seen in the image below.
example http://cclloyd.com/downloads/sdfsdf.png
CoreGraphics will not do a 2x scale like that; you need to write a bit of explicit pixel-mapping logic to do something like this. The following is some code I used for this operation. You would of course need to fill in the details, as it operates on an input buffer of pixels and writes to an output buffer of pixels that is 2x larger.
// Use special case "DOUBLE" logic that will simply duplicate the exact
// RGB value from the indicated pixel into the 2x sized output buffer.
int numOutputPixels = resizedFrameBuffer.width * resizedFrameBuffer.height;
uint32_t *inPixels32  = (uint32_t *)cgFrameBuffer.pixels;
uint32_t *outPixels32 = (uint32_t *)resizedFrameBuffer.pixels;
int outRow = 0;
int outColumn = 0;

for (int i = 0; i < numOutputPixels; i++) {
    if ((i > 0) && ((i % resizedFrameBuffer.width) == 0)) {
        outRow += 1;
        outColumn = 0;
    }

    // Divide by 2 to get the column/row in the input framebuffer
    int inColumn = outColumn / 2;
    int inRow = outRow / 2;

    // Get the pixel for the row and column this output pixel corresponds to
    int inOffset = (inRow * cgFrameBuffer.width) + inColumn;
    uint32_t pixel = inPixels32[inOffset];
    outPixels32[i] = pixel;

    //fprintf(stdout, "Wrote 0x%.10X for 2x row/col %d %d (%d), read from row/col %d %d (%d)\n", pixel, outRow, outColumn, i, inRow, inColumn, inOffset);

    outColumn += 1;
}
This code of course depends on you creating a buffer of pixels and then wrapping it back up into a CGImageRef. But you can find all the code to do that kind of thing easily.
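As a hedged sketch of that last step (assuming 32-bit BGRA pixels with premultiplied alpha; the function name is made up), the 2x output buffer can be wrapped into a CGImageRef with a bitmap context:
#include <CoreGraphics/CoreGraphics.h>

CGImageRef CreateImageFromPixels(uint32_t *pixels, size_t width, size_t height)
{
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(pixels,
                                                 width, height,
                                                 8,            // bits per component
                                                 width * 4,    // bytes per row
                                                 colorSpace,
                                                 kCGImageAlphaPremultipliedFirst |
                                                     kCGBitmapByteOrder32Little);
    CGImageRef image = CGBitmapContextCreateImage(context);
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);
    return image;      // the caller releases it with CGImageRelease()
}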
