pts & pcr values limit in mpegts? - ffmpeg

As I know PCR stored in 42bits and PTS stored in 33bits in mpegts container
So,
Max value for PCR is 2^42 = 4398046511104
Max value for PTS is 2^33 = 8589934592
PCR (sec) = 4398046511104 / 27 000 000 Hz = 162890,6 seconds (45 hours)
PTS (sec) = 8589934592 / 90 000 Hz = 95443,7 seconds (26,5 hours)
So,
what I must to do, if PTS or PCR reach one of this max values ?
This can be happening in iptv for continuous stream

Max value for PCR is 2^42 = 4398046511104
This is not true. Please refer: https://stackoverflow.com/a/36810049/6244249

Just let it overflow and continue to write the low 33 bits. The demuxer will know how to handle it.

Related

Is 1KB = 1.024 bytes OR 1.000 bytes?

Everyone keeps saying that
1'000 KB = 1 MB (decimal) -> Megabyte (MB) 10^6 Byte = 1 000 000 Byte
1'024 KB = 1 MiB (binary) -> Mebibyte (MiB) 2^20 Byte = 1 048 576 Byte
But when you look at windows properties of a folder, you get:
Which clearly uses 1KB = 1024bytes (and it's not this "MiB" but still uses 1024). So, what's the veredict?
Is 1KB = 1.024 bytes OR 1.000 ?
Yes, in causal discourse, it is either one - context sensitive.
Memory size tends to use 1024.
File size tends to use 1000.
Exceptions are common.
Else see Kilobyte.
Pedantic concerns like this include using K, when k should be used as in 1kB.

MFCC feature extraction, Librosa

I want to extract mfcc features of an audio file sampled at 8000 Hz with the frame size of 20 ms and of 10 ms overlap. What must be the parameters for librosa.feature.mfcc() function. Does the code written below specify 20ms chunks with 10ms overlap?
import librosa as l
x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000)
mfccs = l.feature.mfcc(x, sr=sr, n_mfcc = 24, hop_length = 160)
The audio file is of 1800 seconds. Does that mean I would get 24 mfccs for all (1800/0.01)-1 chunks of the audio?
1800 seconds at 8000 Hz are obviously 1800 * 8000 = 14400000 samples.
If your hop length is 160, you get roughly 14400000 / 160 = 90000 MFCC values with 24 dimensions each. So this is clearly not (1800 / 0.01) - 1 = 179999 (off by a factor of roughly 2).
Note that I used roughly in my calculation, because I only used the hop length and ignored the window length. Hop length is the number of samples the window is moved with each step. How many hops you can fit depends on whether you pad somehow or not. And if you decide not to pad, the number of frames also depends on your window size.
To get back to your question: You have to ask yourself how many samples are 10 ms?
If 1 s contains 8000 samples (that's what 8000 Hz means), how many samples are in 0.01 s? That's 8000 * 0.01 = 80 samples.
This means you have a hop length of 80 samples and a window length of 160 samples (0.02 s—twice as long).
Now you should tell librosa to use this info, like this:
import librosa as l
x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000)
n_fft = int(sr * 0.02) # window length: 0.02 s
hop_length = n_fft // 2 # usually one specifies the hop length as a fraction of the window length
mfccs = l.feature.mfcc(x, sr=sr, n_mfcc=24, hop_length=hop_length, n_fft=n_fft)
# check the dimensions
print(mfccs.shape)
Hope this helps.

Why average loss goes up when training using Vowpal Wabbit

I tried to use VW to train a regression model on a small set of examples (about 3112). I think I'm doing it correctly, yet it showed me weird results. Dug around but didn't find anything helpful.
$ cat sh600000.feat | vw --l1 1e-8 --l2 1e-8 --readable_model model -b 24 --passes 10 --cache_file cache
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.040000 0.040000 1 1.0 -0.2000 0.0000 79
0.051155 0.062310 2 2.0 0.2000 -0.0496 79
0.046606 0.042056 4 4.0 0.4100 0.1482 79
0.052160 0.057715 8 8.0 0.0200 0.0021 78
0.064936 0.077711 16 16.0 -0.1800 0.0547 77
0.060507 0.056079 32 32.0 0.0000 0.3164 79
0.136933 0.213358 64 64.0 -0.5900 -0.0850 79
0.151692 0.166452 128 128.0 0.0700 0.0060 79
0.133965 0.116238 256 256.0 0.0900 -0.0446 78
0.179995 0.226024 512 512.0 0.3700 -0.0217 79
0.109296 0.038597 1024 1024.0 0.1200 -0.0728 79
0.579360 1.049425 2048 2048.0 -0.3700 -0.0084 79
0.485389 0.485389 4096 4096.0 1.9600 0.3934 79 h
0.517748 0.550036 8192 8192.0 0.0700 0.0334 79 h
finished run
number of examples per pass = 2847
passes used = 5
weighted example sum = 14236
weighted label sum = -155.98
average loss = 0.490685 h
best constant = -0.0109567
total feature number = 1121506
$ wc model
41 48 657 model
Questions:
Why is the number of features in the output (readable) model less than the number of actual features? I counted that the training data contains 78 features (plus the bias that's 79 as shown during the training). The number of feature bits is 24, which should be far more than enough to avoid collision.
Why does the average loss actually go up in the training as you can see in the above example?
(Minor) I tried to increase the number of feature bits to 32, and it output an empty model. Why?
EDIT:
I tried to shuffle the input file, as well as using --holdout_off, as suggested. But the result is still almost the same - the average loss go up.
$ cat sh600000.feat.shuf | vw --l1 1e-8 --l2 1e-8 --readable_model model -b 24 --passes 10 --cache_file cache --holdout_off
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.040000 0.040000 1 1.0 -0.2000 0.0000 79
0.051155 0.062310 2 2.0 0.2000 -0.0496 79
0.046606 0.042056 4 4.0 0.4100 0.1482 79
0.052160 0.057715 8 8.0 0.0200 0.0021 78
0.071332 0.090504 16 16.0 0.0300 0.1203 79
0.043720 0.016108 32 32.0 -0.2200 -0.1971 78
0.142895 0.242071 64 64.0 0.0100 -0.1531 79
0.158564 0.174232 128 128.0 0.0500 -0.0439 79
0.150691 0.142818 256 256.0 0.3200 0.1466 79
0.197050 0.243408 512 512.0 0.2300 -0.0459 79
0.117398 0.037747 1024 1024.0 0.0400 0.0284 79
0.636949 1.156501 2048 2048.0 1.2500 -0.0152 79
0.363364 0.089779 4096 4096.0 0.1800 0.0071 79
0.477569 0.591774 8192 8192.0 -0.4800 0.0065 79
0.411068 0.344567 16384 16384.0 0.0700 0.0450 77
finished run
number of examples per pass = 3112
passes used = 10
weighted example sum = 31120
weighted label sum = -105.5
average loss = 0.423404
best constant = -0.0033901
total feature number = 2451800
The training examples are unique to each other so I doubt there is over-fitting problem (which, as I understand it, usually happens when the number of input is too small comparing the number of features).
EDIT2:
Tried to print the average loss for every pass of examples, and see that it mostly remains constant.
$ cat dist/sh600000.feat | vw --l1 1e-8 --l2 1e-8 -f dist/model -P 3112 --passes 10 -b 24 --cache_file dist/cache
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
final_regressor = dist/model
using cache_file = dist/cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.498822 0.498822 3112 3112.0 0.0800 0.0015 79 h
0.476677 0.454595 6224 6224.0 -0.2200 -0.0085 79 h
0.466413 0.445856 9336 9336.0 0.0200 -0.0022 79 h
0.490221 0.561506 12448 12448.0 0.0700 -0.1113 79 h
finished run
number of examples per pass = 2847
passes used = 5
weighted example sum = 14236
weighted label sum = -155.98
average loss = 0.490685 h
best constant = -0.0109567
total feature number = 1121506
Also another try without the --l1, --l2 and -b parameters:
$ cat dist/sh600000.feat | vw -f dist/model -P 3112 --passes 10 --cache_file dist/cacheNum weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
final_regressor = dist/model
using cache_file = dist/cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.520286 0.520286 3112 3112.0 0.0800 -0.0021 79 h
0.488581 0.456967 6224 6224.0 -0.2200 -0.0137 79 h
0.474247 0.445538 9336 9336.0 0.0200 -0.0299 79 h
0.496580 0.563450 12448 12448.0 0.0700 -0.1727 79 h
0.533413 0.680958 15560 15560.0 -0.1700 0.0322 79 h
0.524531 0.480201 18672 18672.0 -0.9800 -0.0573 79 h
finished run
number of examples per pass = 2801
passes used = 7
weighted example sum = 19608
weighted label sum = -212.58
average loss = 0.491739 h
best constant = -0.0108415
total feature number = 1544713
Does that mean it's normal for average loss to go up during one pass, but as long as multiple pass gets the same loss then it's fine?
Model file stores only non-zero weights. So most likely others got nulled especially if you are using --l1
It may be caused by many reasons. Perhaps your dataset isn't shuffled well enough. If you sort your dataset so examples labeled -1 will be in first half and examples labeled 1 will be in second then your model will show very good convergence on first half, but you'll see avg loss bump as it reaches 2nd half. So it may be disbalance in dataset. As for last two losses - these are holdout losses (marked with 'h' at end of line) and may point that model is overfitted. Pls refer to my other answer.
Well, in master branch usage of -b 32 is even currently blocked. You shall use up to -b 31. On practice -b 24-28 is usually enough even for dozens of thousands of features.
I would recommend you to get up-to-date VW version from github

Finding duration of a song

This was a midterm question and I do not know how to calculate this.
A CD quality stereo song has been saved on your computer, occupying 35.28
MBytes of storage. The CD quality mandates that we have 16-bit quantization as
well as a uniform sampling of 44.1 KHz (samples/second). Find the duration of
this song (Hint: 1 Bytes=8 Bits).
Hint: follow the units.
Edit: forgot about stereo.
(total size = bits) / (channels) / (quantization = bits/sample) / (sampling rate = samples/second)
= 35.28 MB / (2 channels for stereo) (16 bits/sample) / (44.1 k samples/second) = (35.28 * 8000000) / (16 * 44100)
= x seconds (assuming MB = 8000000 bits, not MiB)

calculate MPEG frame length (ms)

I'm looking all over the Internet for information in regards to calculating the frame length and it's been hard... I was able to successfully calculate the frame length in ms of MPEG-4, AAC, using:
frameLengthMs = mSamplingRate/1000
This works since there is one sample per frame on AAC. For MPEG-1 or MPEG-2 I'm confused. There are 1152 samples per frame, ok, so what do I do with that? :P
Frame sample:
MPEGDecoder(23069): mSamplesPerFrame: 1152
MPEGDecoder(23069): mBitrateIndex: 7
MPEGDecoder(23069): mFrameLength: 314
MPEGDecoder(23069): mSamplingRate: 44100
MPEGDecoder(23069): mMpegAudioVersion 3
MPEGDecoder(23069): mLayerDesc 1
MPEGDecoder(23069): mProtectionBit 1
MPEGDecoder(23069): mBitrateIndex 7
MPEGDecoder(23069): mSamplingRateFreqIndex 0
MPEGDecoder(23069): mPaddingBit 1
MPEGDecoder(23069): mPrivateBit 0
MPEGDecoder(23069): mChannelMode 1
MPEGDecoder(23069): mModeExtension 2
MPEGDecoder(23069): mCopyright 0
MPEGDecoder(23069): mOriginal 1
MPEGDecoder(23069): mEmphasis 0
MPEGDecoder(23069): mBitrate: 96kbps
The duration of an MPEG audio frame is a function of the sampling rate and the number of samples per frame. The formula is:
frameTimeMs = (1000/SamplingRate) * SamplesPerFrame
In your case this would be
frameTimeMs = (1000/44100) * 1152
Which yields ~26ms per frame. For a different sampling rate you would get a different duration. The key is MPEG audio always represents a fixed number of samples per frame, but the time duration of each sample is dependent on the sampling rate.

Resources