Converting stereo file to mono but preserving peak and RMS loudness - ffmpeg

I am converting stereo audio files to mono using ffmpeg.
ffmpeg -i "$1" -ac 1 -ab 192k "mono_$1"
However, after conversion, the RMS and peak loudness levels are not the same.
Tests-iMac:auditions test$ ./rms.sh mono_test.mp3
mean_volume: -20.1 dB
max_volume: -0.2 dB
Peak level dB: -0.150201
RMS level dB: -20.138039
RMS peak dB: -10.650649
RMS trough dB: -94.923318
Flat factor: 0.000000
Peak count: 2.000000
Bit depth: 32/32
Number of samples: 5800320
Number of NaNs: 0.000000
Number of Infs: 0.000000
Number of denormals: 0.000000
Tests-iMac:auditions test$ ./rms.sh test.mp3
mean_volume: -22.9 dB
max_volume: -2.9 dB
Peak level dB: -2.896314
RMS level dB: -22.883812
RMS peak dB: -13.397327
RMS trough dB: -95.943631
Flat factor: 0.000000
Peak count: 2.000000
Bit depth: 32/32
Number of samples: 5800320
Number of NaNs: 0.000000
Number of Infs: 0.000000
Number of denormals: 0.000000
The first output is for the mono file, which is louder (by about 2.75 dB in both peak and RMS) than the stereo file, listed second. How can I preserve the peak and RMS values while converting to mono? I don't mind scripting to obtain the stereo loudness values and pass them to the mono conversion.
Thanks!

I just needed to reduce the volume by 2.7 dB with an audio filter:
ffmpeg -i "$1" -ac 1 -af "volume=-2.7dB" -ab 192k "mono_$1"
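Since the asker is happy to script it, the -2.7 dB can be derived from the measurements instead of hard-coded. Because a volume change scales every sample by the same factor, peak and RMS both move by the same number of dB (here -2.746 and -2.746 dB), so a single correction fixes both. This is a minimal sketch that assumes the reports look like the rms.sh output above (rms.sh itself is not shown); `read_level`, `correction_db` and `mono_command` are hypothetical helper names. Re-encoding to MP3 is lossy, so the result may not match to the last decimal.

```python
import re
import shlex


def read_level(report: str, key: str) -> float:
    """Return the number after 'key:' in a report like the rms.sh output."""
    match = re.search(rf"^{re.escape(key)}:\s*(-?\d+(?:\.\d+)?)", report, re.MULTILINE)
    if match is None:
        raise ValueError(f"{key!r} not found in report")
    return float(match.group(1))


def correction_db(stereo_report: str, mono_report: str, key: str = "RMS level dB") -> float:
    """Gain in dB to apply to the mono file so its level matches the stereo original."""
    return read_level(stereo_report, key) - read_level(mono_report, key)


def mono_command(src: str, gain_db: float) -> list:
    """ffmpeg arguments for a mono downmix followed by a fixed volume correction."""
    return ["ffmpeg", "-i", src, "-ac", "1",
            "-af", f"volume={gain_db:.2f}dB",
            "-ab", "192k", f"mono_{src}"]


# Values copied from the two reports in the question.
stereo = "Peak level dB: -2.896314\nRMS level dB: -22.883812"
mono = "Peak level dB: -0.150201\nRMS level dB: -20.138039"
gain = correction_db(stereo, mono)
print(round(gain, 2))  # -2.75
print(shlex.join(mono_command("test.mp3", gain)))
```

In a real script the two reports would come from running the measurement on the stereo input and on a first mono pass; passing the argument list to `subprocess.run` avoids the quoting problems of building a shell string.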

Related

Problems with Gnuplot Gif

I am trying to make a gif of the solution of a partial differential equation. In some related posts I have found that I should split my data as follows:
-1.000000 0.000000
-0.600000 0.000000
-0.200000 0.654508
0.200000 0.654508
0.600000 0.000000
1.000000 0.000000
1.400000 0.000000
1.800000 0.000000
2.200000 0.000000
2.600000 0.000000
3.000000 0.000000


-1.000000 0.000000
-0.600000 0.000000
-0.200000 0.163627
0.200000 0.654508
0.600000 0.490881
1.000000 0.000000
1.400000 0.000000
1.800000 0.000000
2.200000 0.000000
2.600000 0.000000
3.000000 0.000000
...
and then I have read that something like that should work:
set terminal gif animate delay 100
set output 'name.gif'
stats 'data.dat' nooutput
do for [i=1:int(STATS_blocks)]{plot 'data.dat' every i using 1:2 with lines notitle}
but the resulting animation looks wrong, whereas plotting each data chunk on its own gives something completely different. What is wrong with my Gnuplot code?
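I think you want index i rather than every i. every i plots every i-th point of the whole file, while index i selects the i-th data set (data sets are separated by two blank lines). Since index counts from 0, the loop should also start at 0:
do for [i=0:int(STATS_blocks)-1] {plot 'data.dat' index i using 1:2 with lines notitle}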
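To make the data layout concrete, here is a small sketch that writes frames in the shape gnuplot's index expects (data sets separated by two blank lines) and splits them back the way index numbers them, starting at 0. The function names are hypothetical, and the parser is a simplified model of gnuplot's behaviour (a single blank line does not start a new data set), not gnuplot's own code. The sample frames are the first three rows of the two time steps in the question.

```python
def format_datasets(frames):
    """Join frames (lists of (x, y)) into gnuplot data text.

    Consecutive frames are separated by two blank lines, so `index k`
    in gnuplot addresses frame k (zero-based).
    """
    chunks = ["\n".join(f"{x:f} {y:f}" for x, y in frame) for frame in frames]
    return "\n\n\n".join(chunks) + "\n"


def split_datasets(text):
    """Group rows into data sets the way `index` counts them.

    Two or more consecutive blank lines end a data set; a single blank
    line does not. Lines starting with '#' are treated as comments.
    """
    datasets, current, blanks = [], [], 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            blanks += 1
            continue
        if stripped.startswith("#"):
            continue
        if blanks >= 2 and current:
            datasets.append(current)
            current = []
        blanks = 0
        x, y = (float(v) for v in stripped.split()[:2])
        current.append((x, y))
    if current:
        datasets.append(current)
    return datasets


frames = [
    [(-1.0, 0.0), (-0.6, 0.0), (-0.2, 0.654508)],
    [(-1.0, 0.0), (-0.6, 0.0), (-0.2, 0.163627)],
]
text = format_datasets(frames)
print(text)
print(split_datasets(text)[1])  # the rows that `index 1` would plot
```

Writing `text` to data.dat gives a file where STATS_blocks is the number of frames and `index i` for i from 0 to STATS_blocks-1 walks through them in order.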

deeplearning and deepwater models give very different logloss (0.4 vs 0.6)

In AWS, I followed the instructions here and launched a g2.2xlarge EC2 instance using the community AMI ami-97591381 (h2o version: 3.13.0.356).
This is my code, which you can run, as I have made the S3 links public:
library(h2o)
library(jsonlite)
library(curl)
localH2O = h2o.init()
df.truth <- h2o.importFile("https://s3.amazonaws.com/nw.data.test.us.east/df.truth.zeroed", header = T, sep=",")
df.truth$isFemale <- h2o.asfactor(df.truth$isFemale)
hotnames.truth <- fromJSON("https://s3.amazonaws.com/nw.data.test.us.east/hotnames.json", simplifyVector = T)
# Training and validation sets
splits <- h2o.splitFrame(df.truth, c(0.9), seed=1234)
train.truth <- h2o.assign(splits[[1]], "train.truth.hex")
valid.truth <- h2o.assign(splits[[2]], "valid.truth.hex")
# Train a model using non-GPU deeplearning
dl.2 <- h2o.deeplearning(
training_frame = train.truth, model_id="dl.2",
validation_frame = valid.truth,
x=setdiff(hotnames.truth[1:(length(hotnames.truth)/2)], c("isFemale", "nwtcs")),
y="isFemale", stopping_metric = "AUTO", seed = 1,
sparse = F, mini_batch_size = 20)
# Train a model using GPU-enabled deepwater
dw.2 <- h2o.deepwater(
training_frame = train.truth, model_id="dw.2",
validation_frame = valid.truth,
x=setdiff(hotnames.truth[1:(length(hotnames.truth)/2)], c("isFemale", "nwtcs")),
y="isFemale", stopping_metric = "AUTO", seed = 1,
sparse = F, mini_batch_size = 20)
When I inspected the two models, I was surprised to see a large difference in logloss:
Non-GPU
print(dl.2)
Model Details:
==============
H2OBinomialModel: deeplearning
Model ID: dl.2
Status of Neuron Layers: predicting isFemale, 2-class classification, bernoulli distribution, CrossEntropy loss, 160,802 weights/biases, 2.0 MB, 1,041,465 training samples, mini-batch size 1
  layer units      type dropout       l1       l2 mean_rate rate_rms momentum
1     1   600     Input  0.00 %
2     2   200 Rectifier  0.00 % 0.000000 0.000000  0.104435 0.102760 0.000000
3     3   200 Rectifier  0.00 % 0.000000 0.000000  0.031395 0.055490 0.000000
4     4     2   Softmax         0.000000 0.000000  0.001541 0.001438 0.000000
  mean_weight weight_rms mean_bias bias_rms
1
2    0.018904   0.144034  0.150630 0.415525
3   -0.023333   0.081914  0.545394 0.251275
4    0.029091   0.295439 -0.004396 0.357609
H2OBinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on temporary training frame with 9877 samples **
MSE: 0.1213733
RMSE: 0.3483868
LogLoss: 0.388214
Mean Per-Class Error: 0.2563669
AUC: 0.8433182
Gini: 0.6866365
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error       Rate
0      6546 1079 0.141508 =1079/7625
1       836 1416 0.371226  =836/2252
Totals 7382 2495 0.193885 =1915/9877
H2OBinomialMetrics: deeplearning
** Reported on validation data. **
** Metrics reported on full validation frame **
MSE: 0.126671
RMSE: 0.3559087
LogLoss: 0.4005941
Mean Per-Class Error: 0.2585051
AUC: 0.8309913
Gini: 0.6619825
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
           0    1    Error        Rate
0      11746 3134 0.210618 =3134/14880
1       1323 2995 0.306392  =1323/4318
Totals 13069 6129 0.232160 =4457/19198
GPU-enabled
print(dw.2)
Model Details:
==============
H2OBinomialModel: deepwater
Model ID: dw.2b
Status of Deep Learning Model: MLP: [200, 200], 630.8 KB, predicting isFemale, 2-class classification, 1,708,160 training samples, mini-batch size 20
  input_neurons     rate momentum
1           600 0.000369 0.900000
H2OBinomialMetrics: deepwater
** Reported on training data. **
** Metrics reported on temporary training frame with 9877 samples **
MSE: 0.1615781
RMSE: 0.4019677
LogLoss: 0.629549
Mean Per-Class Error: 0.3467246
AUC: 0.7289561
Gini: 0.4579122
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error       Rate
0      4843 2782 0.364852 =2782/7625
1       740 1512 0.328597  =740/2252
Totals 5583 4294 0.356586 =3522/9877
H2OBinomialMetrics: deepwater
** Reported on validation data. **
** Metrics reported on full validation frame **
MSE: 0.1651776
RMSE: 0.4064205
LogLoss: 0.6901861
Mean Per-Class Error: 0.3476629
AUC: 0.7187362
Gini: 0.4374724
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error        Rate
0      8624 6256 0.420430 =6256/14880
1      1187 3131 0.274896  =1187/4318
Totals 9811 9387 0.387697 =7443/19198
As seen above, the difference in logloss is huge between non-GPU and GPU models:
Logloss
+-----------------+---------+------+
|                 | non-GPU | GPU  |
+-----------------+---------+------+
| training data   |  0.39   | 0.63 |
+-----------------+---------+------+
| validation data |  0.40   | 0.69 |
+-----------------+---------+------+
I understand that due to the stochastic nature of training I will get different results, but I wouldn't expect such a huge difference between non-GPU and GPU.
h2o.deeplearning is H2O's built-in deep-learning algorithm. It parallelizes very well, works well with large data, but does not use GPUs.
h2o.deepwater is a wrapper around (probably) TensorFlow, and is (probably) using your GPU (though it can also use the CPU, and it supports different back-ends).
In other words, this is not a difference in using the CPU or using the GPU: you are using two different implementations of deep learning.
BTW, I'd suggest you increase the number of epochs (from the default of 10 to something like 200, bearing in mind this means it will take 20x longer to run) and see if the difference is still there. Or compare the score history charts and see whether TensorFlow is getting there but just needs, say, 50% more epochs to reach the same logloss score.
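One way to put the logloss numbers in perspective is to compare them with a model that ignores the inputs and always predicts the validation base rate. The class counts below are read off the validation confusion matrices (14880 zeros, 4318 ones); `log_loss` is a plain binary cross-entropy written out for illustration, not H2O's implementation, though it is the standard definition of the metric.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy, clipping probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Validation-frame class counts from the confusion matrices above.
negatives, positives = 14880, 4318
base_rate = positives / (positives + negatives)

# A constant "model" that always outputs the base rate.
y = [1] * positives + [0] * negatives
baseline = log_loss(y, [base_rate] * len(y))
print(f"base rate {base_rate:.3f}, constant-prediction logloss {baseline:.3f}")
```

The constant predictor scores about 0.533. On these numbers the deeplearning model (0.40) beats it, while the deepwater model (0.69, close to ln 2 = 0.693, the loss of always predicting 0.5) does worse, which suggests the deepwater run has not yet learned much beyond the class balance; that fits the advice above to train longer and compare score histories.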

Error while using .cache file with vowpal wabbit

I am trying the examples given in the vowpal-wabbit tutorial, but I get an error when I use a *.cache file for training:
Error: 6 is too many tokens for a simple label: 8.3.0c<binary cache data>
second_house<binary cache data>
third_house<binary cache data>
V$ cat house_dataset
0 | price:.23 sqft:.25 age:.05 2006
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924
V$ ls -lrth
total 4.0K
-rw-r--r-- 1 A users 144 May 3 06:28 house_dataset
V$ vw --version
8.3.0
V$ vw house_dataset -c
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
creating cache_file = house_dataset.cache
Reading datafile = house_dataset
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000 0.000000 1 1.0 0.0000 0.0000 5
0.666667 1.000000 2 3.0 1.0000 0.0000 5
finished run
number of examples per pass = 4
passes used = 1
weighted example sum = 5.000000
weighted label sum = 2.000000
average loss = 0.600000
best constant = 0.500000
best constant's loss = 0.250000
total feature number = 16
V$ vw house_dataset.cache
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = house_dataset.cache
num sources = 1
average since example example current current current
loss last counter weight label predict features
Error: 6 is too many tokens for a simple label: 8.3.0c<binary cache data>
second_house<binary cache data>
third_house<binary cache data>
0.000000 0.000000 1 1.0 unknown 0.0000 1
0.000000 0.000000 2 2.0 unknown 0.0000 1
finished run
number of examples per pass = 2
passes used = 1
weighted example sum = 2.000000
weighted label sum = 0.000000
average loss = 0.000000
total feature number = 2
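It should be:
$ vw --cache_file house_dataset.cache
Passed as a plain positional argument, house_dataset.cache is treated as a text data file (note "using no cache" and "Reading datafile = house_dataset.cache" in your log), so vw tries to parse the binary cache as text examples. You can check the command-line argument descriptions here.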

Finding duration of a song

This was a midterm question and I don't know how to calculate it.
A CD-quality stereo song has been saved on your computer, occupying 35.28 MBytes of storage. CD quality mandates 16-bit quantization and uniform sampling at 44.1 kHz (samples/second). Find the duration of this song (Hint: 1 byte = 8 bits).
Hint: follow the units.
Edit: forgot about stereo.
duration = (total size in bits) / (channels) / (quantization = bits/sample) / (sampling rate = samples/second)
= (35.28 MB * 8000000 bits/MB) / (2 channels for stereo) / (16 bits/sample) / (44.1 k samples/second) = (35.28 * 8000000) / (2 * 16 * 44100)
= x seconds (assuming 1 MB = 8000000 bits, not MiB)
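Following the units as in the answer, the arithmetic can be checked with a few lines. This sketch assumes, like the answer, that 1 MB = 1,000,000 bytes; `pcm_duration_seconds` is a hypothetical helper name.

```python
def pcm_duration_seconds(size_bytes, channels, bits_per_sample, sample_rate_hz):
    """Duration of uncompressed PCM: total bits / (channels * bits/sample * samples/s)."""
    total_bits = size_bytes * 8
    bits_per_second = channels * bits_per_sample * sample_rate_hz
    return total_bits / bits_per_second


# 35.28 MB of CD-quality stereo, with 1 MB = 1,000,000 bytes (not MiB).
seconds = pcm_duration_seconds(35_280_000, channels=2, bits_per_sample=16, sample_rate_hz=44_100)
print(seconds)  # 200.0
```

That is 282,240,000 bits at 1,411,200 bits per second, or 3 minutes 20 seconds. Forgetting the factor of 2 for stereo would double the answer.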

calculate MPEG frame length (ms)

I've been looking all over the Internet for information on calculating the frame length, and it's been hard to find. I was able to calculate the frame length in ms for MPEG-4 AAC using:
frameLengthMs = mSamplingRate/1000
This works since there is one sample per frame in AAC. For MPEG-1 or MPEG-2 I'm confused: there are 1152 samples per frame, OK, but what do I do with that? :P
Frame sample:
MPEGDecoder(23069): mSamplesPerFrame: 1152
MPEGDecoder(23069): mBitrateIndex: 7
MPEGDecoder(23069): mFrameLength: 314
MPEGDecoder(23069): mSamplingRate: 44100
MPEGDecoder(23069): mMpegAudioVersion 3
MPEGDecoder(23069): mLayerDesc 1
MPEGDecoder(23069): mProtectionBit 1
MPEGDecoder(23069): mBitrateIndex 7
MPEGDecoder(23069): mSamplingRateFreqIndex 0
MPEGDecoder(23069): mPaddingBit 1
MPEGDecoder(23069): mPrivateBit 0
MPEGDecoder(23069): mChannelMode 1
MPEGDecoder(23069): mModeExtension 2
MPEGDecoder(23069): mCopyright 0
MPEGDecoder(23069): mOriginal 1
MPEGDecoder(23069): mEmphasis 0
MPEGDecoder(23069): mBitrate: 96kbps
The duration of an MPEG audio frame is a function of the sampling rate and the number of samples per frame. The formula is:
frameTimeMs = (1000/SamplingRate) * SamplesPerFrame
In your case this would be
frameTimeMs = (1000/44100) * 1152
Which yields ~26 ms per frame. For a different sampling rate you would get a different duration. The key is that MPEG audio always represents a fixed number of samples per frame, but the duration of each sample depends on the sampling rate.
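The formula translates directly into code. As a cross-check against the decoder log, this sketch also computes the frame size in bytes with the usual MPEG-1 Layer III formula (144 * bitrate / sampling rate + padding, where 144 = 1152 samples / 8 bits per byte), assuming mMpegAudioVersion 3 and mLayerDesc 1 are the raw header fields for MPEG-1 and Layer III; with 96 kbps, 44.1 kHz and the padding bit set, it reproduces mFrameLength: 314, which suggests that field is a size in bytes rather than a duration. The same duration formula applies to AAC, whose (AAC-LC) frames carry 1024 samples. Function names are hypothetical.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Time covered by one frame: (1000 / sample rate) * samples per frame."""
    return 1000.0 * samples_per_frame / sample_rate_hz


def mpeg1_layer3_frame_bytes(bitrate_bps, sample_rate_hz, padding):
    """Frame size in bytes for MPEG-1 Layer III (144 = 1152 samples / 8 bits)."""
    return 144 * bitrate_bps // sample_rate_hz + padding


print(round(frame_duration_ms(1152, 44_100), 3))  # 26.122 (MPEG-1 Layer III)
print(round(frame_duration_ms(1024, 44_100), 2))  # 23.22 (AAC-LC, 1024 samples)
print(mpeg1_layer3_frame_bytes(96_000, 44_100, 1))  # 314, matching mFrameLength
```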
