expected audio sample rate doesn't match actual? - bash

I am trying to use PocketSphinx to transcribe audio files:
pocketsphinx_continuous -infile 116-288045-0005.flac.wav
but I am getting these errors:
ERROR: "continuous.c", line 136: Input audio file has sample rate [44100],
but decoder expects [16000]
FATAL: "continuous.c", line 165: Failed to process file '116-288045-0005.flac.wav'
due to format mismatch.
Here's one of the audio files I need to transcribe: Download from GitHub
Eventually I will batch-transcribe over 5 hours of audio files like these; currently they all throw the same error.
Here are some stats for one of the files I'm trying to transcribe:
$ soxi 116-288045-0000.flac.wav
Input File : '116-288045-0000.flac.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Duration : 00:00:10.65 = 469665 samples = 798.75 CDDA sectors
File Size : 939k
Bit Rate : 706k
Sample Encoding: 16-bit Signed Integer PCM
There might be a problem with this file's configuration; I've done some pre-processing on it, including merging it with MP3s and converting from FLAC to WAV, among other things.
What's the easiest way now for me to get the transcription working?
Is it possible without resampling the files back down to 16 kHz? Originally the FLAC files had a sample rate of 16 kHz, but I had to merge them with 44.1 kHz MP3 files, so there is now some high-frequency information in them that may be lost if resampled to 16 kHz.

Resample the audio to 16000 Hz, then try again.
You can resample like this:
sox file.wav -r 16000 file-16000.wav
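Since the goal is to batch-transcribe several hours of audio, the same sox call can be run in a loop. A minimal sketch, assuming all the files sit in one directory and end in .wav (the output naming convention here is made up):

```shell
# Resample every .wav in the current directory to 16 kHz mono,
# which is what the pocketsphinx decoder expects.
for f in *.wav; do
  sox "$f" -r 16000 -c 1 "${f%.wav}-16000.wav"
done
```

The `${f%.wav}` expansion strips the extension so each output file gets a distinct `-16000.wav` suffix instead of overwriting the original.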

Related

Invalid data error during ffmpeg .m4a conversion

I wanted to edit my .m4a voice recording from Samsung Voice Recorder using ffmpeg 2.2.2, but I got the error Invalid data found when processing input. I tried to open it in Audacity, but it returned an error claiming that the ffmpeg library is missing, which is definitely not the case. Eventually I tried online .m4a to .mp3 converters, but they all returned errors, so I assume there may be an issue with the encoding of the original file and that ffmpeg should be configured accordingly. What settings should I use? (The original file plays on the phone without any problem.)
ffmpeg -ss 00:00:19 -i "C:\Your\Folder\original.m4a" edited.m4a

Ffmpeg pkt_pos vs. hls byterange differs

I have a single ts file and created a single-file m3u8 using ffmpeg. It looks like the following
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-TARGETDURATION:1
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:0.000000,
#EXT-X-BYTERANGE:22184@0
video.ts
#EXTINF:1.000667,
#EXT-X-BYTERANGE:713836@22184
video.ts
#EXTINF:1.000667,
#EXT-X-BYTERANGE:549336@736020
video.ts
#EXTINF:1.000667,
#EXT-X-BYTERANGE:568324@1285356
video.ts
#EXTINF:1.000667,
#EXT-X-BYTERANGE:569264@1853680
video.ts
...
The m3u8 file works perfectly but in its creation, ffmpeg re-creates the ts file. I wanted to avoid this and thought I could simply create the m3u8 file myself. I used the following command to get the byte offset of keyframes. However, none of the keyframe locations agrees with the offsets in the m3u8 file.
ffprobe -loglevel error -skip_frame nokey -select_streams v:0 -show_entries frame=pkt_pos -of compact video.m3u8
frame|pkt_pos=22560
frame|pkt_pos=736396
frame|pkt_pos=1285732
...
All of the offsets disagree by 376 bytes. That number is twice the MPEG-TS packet size (which is 188 bytes). Both locations contain the ASCII character "G" (0x47), which is the MPEG-TS packet sync byte.
How can I get the correct offset positions using ffprobe so that I can create an HLS playlist? Does ffmpeg just subtract 2 packets for safety, and does it matter?
ffmpeg re-creates the ts file (which is a byte-wise copy of the original)
No, it's not a byte-wise copy: ffmpeg still parses and repackages the file. If you need an exact copy, download it with curl or wget.
The packet the offset points to is a PAT (Program Association Table), followed by a PMT (Program Map Table). Each packet takes 188 bytes, which makes a total of 376. After this metadata, the actual keyframe starts.
In simple cases the m3u8 offset can point to the keyframe directly and the file will play correctly. In the general case, though, it makes sense for the decoder to be given the list of programs right away when seeking into the middle of a transport stream.
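If you accept that explanation (a PAT + PMT pair of 188-byte packets preceding each keyframe), the ffprobe keyframe offsets can simply be shifted back by 376 bytes when building the playlist. A sketch, assuming the same video.ts file as in the question; the awk step is the only part that does the adjustment:

```shell
# Print keyframe byte offsets, then shift each one back by two
# 188-byte TS packets (PAT + PMT) to get the playlist byterange offsets.
ffprobe -loglevel error -skip_frame nokey -select_streams v:0 \
        -show_entries frame=pkt_pos -of csv=p=0 video.ts |
  awk '{ print $1 - 376 }'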

FFMPEG DASH - Live Streaming a Sequence of MP3 Clips

I am attempting to create a online radio application using FFMPEG - an audio only DASH stream.
I have a directory of mp3 clips (all of the same bitrate and sample size) which I am encoding to the AAC format and outputting to a mpd.
This is the current command I am working with to stream a single mp3 file:
ffmpeg -re -i <input>.mp3 -c:a aac -use_timeline 1 -use_template 1 -window_size 5 -f dash <out>.mpd
(Input and output paths have been substituted with <input>.mp3 and <out>.mpd in this snippet.)
I am running a web server and have made the mpd accessible on it. I am testing the stream using VLC player at the moment.
The problem:
Well, the command works, but only for one clip at a time. When the next command is run immediately following the completion of the first, VLC player halts and I need to refresh the player to continue.
I'm aiming for an uninterrupted stream wherein the clips play in sequence.
I imagine the problem is that a new mpd is being created with no reference to the previous one, and what I ought to be doing is appending segments to the existing mpd - but I don't know how to do that using FFMPEG.
The question: Is there a command to append segments to a previously existing mpd file in ffmpeg? Or am I coming at this problem all wrong? Perhaps I should be using ffmpeg to format the clips into segments, but then adjusting the mpd file manually.
Any help or suggestions would be very much appreciated!
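No answer is recorded here, but one common approach to this kind of problem (an untested sketch; file names and the playlist format are assumptions) is to avoid per-clip ffmpeg invocations entirely and feed all the clips through ffmpeg's concat demuxer, so a single encoder session produces one continuous mpd:

```shell
# Build a concat-demuxer playlist: one "file '<name>'" line per clip,
# in play order. Clip names are placeholders.
printf "file '%s'\n" clip1.mp3 clip2.mp3 clip3.mp3 > playlist.txt

# One continuous DASH session covering all clips; -re paces the read
# at real time for live-style streaming.
ffmpeg -re -f concat -safe 0 -i playlist.txt -c:a aac \
       -use_timeline 1 -use_template 1 -window_size 5 -f dash out.mpd
```

Because only one ffmpeg process writes the mpd, the manifest is updated continuously and the player never sees a fresh, unrelated manifest between clips.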

play stat -freq: What does the output mean?

What does the output of play $file stat -freq mean?
I recently ran the command, here's a sample of the output:
$ play 44100Hz/3660/6517/3660-6517-0024.flac stat -freq
44100Hz/3660/6517/3660-6517-0024.flac:
File Size: 214k Bit Rate: 325k
Encoding: FLAC Info: Processed by SoX
Channels: 1 # 16-bit
Samplerate: 44100Hz
Replaygain: off
Duration: 00:00:05.28
In:0.00% 00:00:00.00 [00:00:05.28] Out:0 [ | ] Clip:0 0.000000 0.412632
10.766602 0.430416
21.533203 0.750785
32.299805 0.839694
43.066406 0.989763
53.833008 0.435572
64.599609 0.404773
75.366211 0.048392
86.132812 0.025195
96.899414 0.011314
...
In:3.52% 00:00:00.19 [00:00:05.09] Out:4.10k [ | ] Clip:0 0.000000 0.889006
10.766602 0.092675
21.533203 0.785106
32.299805 1.693663
43.066406 0.990839
53.833008 0.044969
64.599609 0.096066
75.366211 0.121797
86.132812 0.256809
96.899414 0.122486
107.666016 0.019195
...
How am I meant to understand this?
I hope that this is some Fourier transform and the above output represents a table like
Frequency | Level
But I don't know if that's really the case, or what level would be measured in if it were.
And what do the lines starting with In:…% and ending with Clip:0 … mean?
Please can someone explain the output of this command to me.
From the man page:
The −freq option calculates the input's power spectrum (4096 point DFT) instead of the statistics listed above. This should only be used with a single channel audio file.
As you said, it is a Frequency / Level table.
So the last frequency is more or less half of your sampling rate.
I tried it with a pure tone (generated in Audacity) and it works quite well.
Be careful: if the file length exceeds 4096 samples per channel, you will see several sets of DFT output concatenated, since each DFT window is 4096 samples long.
I don't get any '%' lines. Did you convert your audio file to mono, as the documentation says?
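The frequency column can be checked against this directly: with a 4096-point DFT, the bins are spaced at sample rate / 4096 Hz apart. For the 44100 Hz file in the question, a quick check (values taken from the question):

```shell
# DFT bin spacing and Nyquist frequency for a 44100 Hz file
# analysed with SoX's 4096-point DFT.
awk 'BEGIN { sr = 44100; n = 4096
             printf "bin width: %.6f Hz\n", sr / n
             printf "Nyquist:   %d Hz\n",   sr / 2 }'
```

The bin width comes out as 10.766602 Hz, which matches the step between successive frequencies in the output (0.000000, 10.766602, 21.533203, …), and the table tops out near the 22050 Hz Nyquist frequency.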
From the man page:
stat [-s scale] [-rms] [-freq] [-v] [-d]
Display time and frequency domain statistical information about the audio. Audio is passed unmodified through the SoX processing chain.
The information is output to the 'standard error' (stderr) stream and is calculated as follows, where n is the duration of the audio in samples, c is the number of audio channels, r is the audio sample rate, and x_k represents the PCM value (in the range -1 to +1 by default) of each successive sample in the audio:
...
The -freq option calculates the input's power spectrum (4096 point DFT) instead of the statistics listed above.
...

DTS Discontinuity Error while Playing Media File with RTSP Url

I am playing a media file over RTSP by fetching the streams directly from a server, and I am getting a "DTS discontinuity in stream" error. I have tried both ffmpeg and ffplay.
FFMPEG
I am using the following ffmpeg command:
ffmpeg -i rtsp://media:123456#10.10.167.20/41415308b3839f2 -f wav test.wav
As an output of this command, I am getting the following error:
FFPLAY
I am using the following ffplay command:
ffplay rtsp://media:123456#10.10.167.20/41415308b3839f2
As an output of this command, I am getting the following error:
Can anyone please tell me when this error usually occurs? Is there a reason behind it, and is there any workaround?
From the libavformat/utils.c, avformat_find_stream_info function:
/* Check for a discontinuity in dts. If the difference in dts
* is more than 1000 times the average packet duration in the
* sequence, we treat it as a discontinuity. */
Also note that RTP does not define any mechanism for recovering from packet loss.
So, if you lose packets in such a way that the dts difference between two read packets is more than 1000 times the average packet duration, you get the foregoing warning.
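As a rough illustration of that threshold (the numbers here are made up, not taken from any real stream), the check amounts to comparing the dts gap against 1000 times the average packet duration:

```shell
# Hypothetical values: average packet duration 23 ms, and a 40 s
# dts gap between two consecutively read packets.
awk 'BEGIN { avg = 0.023; gap = 40.0
             if (gap > 1000 * avg) print "discontinuity"
             else                  print "ok" }'
```

With these numbers, 1000 × 0.023 s = 23 s, so a 40 s gap is flagged as a discontinuity, while anything shorter would pass silently.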