FFmpeg gap in concatenated audio after splitting audio - ffmpeg

Not sure if it's a gap or just misaligned audio samples, but when I split an audio file in two, like this:
ffmpeg -ss 0 -t 00:00:15.00 -i song.mp3 seg1.mp3
and
ffmpeg -ss 00:00:15.00 -t 15 -i song.mp3 seg2.mp3
and then combine them again with the concat protocol:
ffmpeg -i 'concat:seg1.mp3|seg2.mp3' out.mp3
There is a distinct "pop" between the segments. How can I make this seamless?
I see this on seg2.mp3:
Duration: 00:00:15.05, start: 0.025057, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s
Why is "start" not 0? That could be the gap.

If you want to eliminate the gap I recommend using the atrim and concat filters:
ffmpeg -i input -filter_complex \
"[0:a]atrim=end=15[a0]; \
[0:a]atrim=start=15:end=30[a1]; \
[a0][a1]concat=n=2:v=0:a=1" \
output.mp3
Note that MP3 files may have silence/delay (encoder padding) at the beginning and end, so concatenating individually encoded segments is not ideal.
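If you want to confirm where the offset comes from, ffprobe can print a segment's start time and duration; a quick check of the second segment, for example:
ffprobe -v error -show_entries format=start_time,duration -of default=noprint_wrappers=1 seg2.mp3
A non-zero start_time on an MP3 segment is usually the encoder delay mentioned above.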

Related

Concatenating audio files with ffmpeg results in a wrong total duration

With "wrong total duration" I mean a total duration different from the sum of individual duration of audio files.
sum_duration_files != duration( concatenation of files )
In particular, I am concatenating 2 OGG audio files with this command:
ffmpeg -safe 0 -loglevel quiet \
-f concat -segment_time_metadata 1 -i {m3u_file_name} \
-vf select=concatdec_select \
-af aselect=concatdec_select,aresample=async=1 \
{ogg_file_name}
And I get the following:
# Output of: ffprobe <FILE>.ogg
======== files_in
Input #0, ogg, from 'f1.ogg':
Duration: 00:00:04.32, start: 0.000000, bitrate: 28 kb/s
Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
Input #0, ogg, from 'f2.ogg':
Duration: 00:00:00.70, start: 0.000000, bitrate: 68 kb/s
Stream #0:0: Audio: vorbis, 44100 Hz, mono, fltp, 160 kb/s
Metadata:
ENCODER : Lavc57.107.100 libvorbis
Note durations: 4.32 and 0.7 sec
And this is the output file.
========== files out (concatenation of files_in)
Input #0, ogg, from 'f_concat_v1.ogg':
Duration: 00:00:04.61, start: 0.000000, bitrate: 61 kb/s
Stream #0:0: Audio: vorbis, 48000 Hz, mono, fltp, 80 kb/s
Metadata:
ENCODER : Lavc57.107.100 libvorbis
Duration: 4.61 sec
As 4.61 sec != 4.32 + 0.7 sec I have a problem.
The issue here is using the wrong concatenation approach for these files. As the FFmpeg wiki article suggests, file-level concatenation (-f concat) requires all files in the listing to have exactly the same codec parameters. In your case, only the number of channels (mono) and the sample format (fltp) are common between them. On the other hand, the codec (Opus vs. Vorbis) and the sample rate (48000 vs. 44100 Hz) are different.
-f concat grabs the first set of parameters and runs with it. In your case, it uses 48000 samples/s for all the files. Although the second file is 44100 samples/s, it is treated as 48 kHz (so it plays back faster than it should). I don't know how the difference in codec played out in the output.
So, a standard approach is to use -filter_complex concat=n=2:v=0:a=1 with these files given as separate inputs.
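As a sketch, with the two files from your listing given as separate inputs and re-encoded to Vorbis like your output, that could look like:
ffmpeg -i f1.ogg -i f2.ogg -filter_complex "[0:a][1:a]concat=n=2:v=0:a=1" -c:a libvorbis f_concat.ogg
Because the concat filter works on decoded audio, the filter graph converts both inputs to a common sample rate and format before joining them.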
Out of curiosity, have you listened to the wrong-duration output file? [edit: never mind, your self-answer indicates one of them is a silent track]
I don't know WHY it happens, but I know how to avoid the problem in my particular case.
My case:
I am mixing (concatenating) different audio files generated by one single source with silence files generated by me.
Initially I generated the silence files with
# x is a float from python
ffmpeg -f lavfi -i anullsrc=r=44100:cl=mono -t {x:2.1f} -q:a 9 -acodec libvorbis silence-{x:2.1f}.ogg
Trying to resolve the issue, I re-created those silences with the SAME parameters as the audio files I was mixing with, that is, mono at 48 kHz:
ffmpeg -f lavfi -i anullsrc=r=48000:cl=mono -t {x:2.1f} -c:a libvorbis silence-{x:2.1f}.ogg
And now ffprobe shows the expected result.
========== files out (concatenation of files_in)
Input #0, ogg, from 'f_concat_v2.ogg':
Duration: 00:00:05.02, start: 0.000000, bitrate: 56 kb/s
Stream #0:0: Audio: vorbis, 48000 Hz, mono, fltp, 80 kb/s
Metadata:
ENCODER : Lavc57.107.100 libvorbis
Duration: 5.02 = 4.32 + 0.70
If you want to avoid problems when concatenating silence with other sounds, do create the silence with the SAME parameters as the sound you will mix it with (mono/stereo and sample rate).
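A quick way to sanity-check this is to compare the individual durations against the concatenated file (file names are illustrative; the duration field is standard ffprobe output):
for f in f1.ogg f2.ogg f_concat_v2.ogg; do ffprobe -v error -show_entries format=duration -of csv=p=0 "$f"; done
The first two numbers should add up to the third.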
==== Update 2022-03-08
Using the info provided by @kesh, I have recreated the silent ogg files using
ffmpeg -f lavfi -i anullsrc=r=48000:cl=mono -t 5.8 -c:a libopus silence-5.8.ogg
And now the
ffmpeg -safe 0 -f concat -segment_time_metadata 1
-i {m3u_file_name}
-vf select=concatdec_select
-af aselect=concatdec_select,aresample=async=1 {ogg_file_name}
no longer throws this error (which previously appeared multiple times):
[opus @ 0x558b2c245400] Error parsing the packet header.
Error while decoding stream #0:0: Invalid data found when processing input
I must say that the error was not actually causing me any problem, because the output was what I expected, but now I feel better without it.

ffmpeg: how to add header info into pcm?

I use this command to convert s16le to pcm_u8, but it loses the header info.
ffmpeg -i s16le.wav -f u8 pcmu8.wav
ffmpeg -i pcmu8.wav
# pcmu8.wav: Invalid data found when processing input
I want to know: how do I add this header info into pcmu8.wav?
It should look like this:
ffmpeg -i pcmu8.wav
#Input #0, wav, from 'pcmu8.wav':
# Duration: 00:13:39.20, bitrate: 64 kb/s
# Stream #0:0: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 8000 Hz, mono, u8, 64 kb/s
Your first command is outputting a raw bitstream, not a WAV, so adding a header won't help. Instead, use:
ffmpeg -i s16le.wav -c:a pcm_u8 pcmu8.wav
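If you already have the headerless file produced by the first command and want to salvage it, you can also tell ffmpeg how to read the raw data; a sketch, assuming the audio really is 8000 Hz mono as in the expected ffprobe output above (the output name is illustrative):
ffmpeg -f u8 -ar 8000 -ac 1 -i pcmu8.wav -c:a pcm_u8 pcmu8_fixed.wav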

ignore "channel_layout" when working with multichannel audio in ffmpeg

I'm working with multichannel audio files (higher-order ambisonics), that typically have at least 16 channels.
Sometimes I'm only interested in a subset of the audio channels (e.g. the first 25 channels of a file that contains even more channels).
For this I have a script like the following, that takes a multichannel input file, an output file and the number of channels I want to extract:
#!/bin/sh
infile=$1
outfile=$2
channels=$3
channelmap=$(seq -s"|" 0 $((channels-1)))
ffmpeg -y -hide_banner \
-i "${infile}" \
-filter_complex "[0:a]channelmap=${channelmap}" \
-c:a libopus -mapping_family 255 -b:a 160k -sample_fmt s16 -vn -f webm -dash 1 \
"${outfile}"
The actual channel extraction is done via the channelmap filter, which is invoked with something like -filter_complex "[0:a]channelmap=0|1|2|3"
This works great with 1, 2, 4 or 16 channels.
However, it fails with 9 channels, and with 17 and 25 (and generally with any channel count that has no predefined layout).
The error I get is:
$ ffmpeg -y -hide_banner -i input.wav -filter_complex "[0:a]channelmap=0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16" -c:a libopus -mapping_family 255 -b:a 160k -sample_fmt s16 -vn -f webm -dash 1 output.webm
Input #0, wav, from 'input.wav':
Duration: 00:00:09.99, bitrate: 17649 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 25 channels, s16, 17640 kb/s
[Parsed_channelmap_0 @ 0x5568874ffbc0] Output channel layout is not set and cannot be guessed from the maps.
[AVFilterGraph @ 0x5568874fff40] Error initializing filter 'channelmap' with args '0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16'
Error initializing complex filters.
Invalid argument
So ffmpeg cannot guess the channel layout for a 17-channel file.
ffmpeg -layouts only lists channel layouts with 1, 2, 3, 4, 5, 6, 7, 8 and 16 channels.
However, I really don't care about the channel layout. The entire concept of a "channel layout" is centered around the idea that each audio channel should go to a different speaker.
But my audio channels are not speaker feeds at all.
So I tried providing explicit channel layouts, with something like -filter_complex "[0:a]channelmap=map=0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16:channel_layout=unknown", but this results in an error when parsing the channel layout:
$ ffmpeg -y -hide_banner -i input.wav -filter_complex "[0:a]channelmap=map=0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16:channel_layout=unknown" -c:a libopus -mapping_family 255 -b:a 160k -sample_fmt s16 -vn -f webm -dash 1 output.webm
Input #0, wav, from 'input.wav':
Duration: 00:00:09.99, bitrate: 17649 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 25 channels, s16, 17640 kb/s
[Parsed_channelmap_0 @ 0x55b60492bf80] Error parsing channel layout: 'unknown'.
[AVFilterGraph @ 0x55b604916d00] Error initializing filter 'channelmap' with args 'map=0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16:channel_layout=unknown'
Error initializing complex filters.
Invalid argument
I also tried values like any, all, none, 0x0 and 0xFF with the same result.
I tried using mono (as the channels are kind-of independent), but ffmpeg is trying to be clever and tells me that a mono layout must not have 17 channels.
I know that ffmpeg can handle multi-channel files without a layout.
E.g. converting a 25-channel file without the -filter_complex "..." works without problems, and ffprobe gives me an unknown channel layout.
So: how do I tell ffmpeg to just not care about the channel_layout when creating an output file that only contains a subset of the input channels?
Based on Audio Channel Manipulation you could try splitting the input into n separate mono streams and then amerge them back together:
-filter_complex "\
[0:a]pan=mono|c0=c0[a0];\
[0:a]pan=mono|c0=c1[a1];\
[0:a]pan=mono|c0=c2[a2];\
[0:a]pan=mono|c0=c3[a3];\
[0:a]pan=mono|c0=c4[a4];\
[0:a]pan=mono|c0=c5[a5];\
[0:a]pan=mono|c0=c6[a6];\
[0:a]pan=mono|c0=c7[a7];\
[0:a]pan=mono|c0=c8[a8];\
[0:a]pan=mono|c0=c9[a9];\
[0:a]pan=mono|c0=c10[a10];\
[0:a]pan=mono|c0=c11[a11];\
[0:a]pan=mono|c0=c12[a12];\
[0:a]pan=mono|c0=c13[a13];\
[0:a]pan=mono|c0=c14[a14];\
[0:a]pan=mono|c0=c15[a15];\
[0:a]pan=mono|c0=c16[a16];\
[a0][a1][a2][a3][a4][a5][a6][a7][a8][a9][a10][a11][a12][a13][a14][a15][a16]amerge=inputs=17"
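If the channel count varies, the same chain can be generated in the shell, mirroring the seq approach from the question; a sketch, with illustrative variable and file names:
#!/bin/sh
channels=17
filter=""; labels=""
for i in $(seq 0 $((channels-1))); do
  filter="${filter}[0:a]pan=mono|c0=c${i}[a${i}];"
  labels="${labels}[a${i}]"
done
filter="${filter}${labels}amerge=inputs=${channels}"
ffmpeg -y -i input.wav -filter_complex "${filter}" -c:a libopus -mapping_family 255 -b:a 160k output.webm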
Building on the answer from @aergistal, and working with an MXF file with 10 audio streams, I had to modify the filter in order to specify the input stream for every pan filter. With "pan=mono" each filter only uses one channel, identified as c0, of its own input stream.
-filter_complex "\
[0:a:0]pan=mono|c0=c0[a0];\
[0:a:1]pan=mono|c0=c0[a1];\
[0:a:2]pan=mono|c0=c0[a2];\
[0:a:3]pan=mono|c0=c0[a3];\
[0:a:4]pan=mono|c0=c0[a4];\
[0:a:5]pan=mono|c0=c0[a5];\
[0:a:6]pan=mono|c0=c0[a6];\
[0:a:7]pan=mono|c0=c0[a7];\
[0:a:8]pan=mono|c0=c0[a8];\
[0:a:9]pan=mono|c0=c0[a9];\
[a0][a1][a2][a3][a4][a5][a6][a7][a8][a9]amerge=inputs=10"
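For completeness, the chain above is applied like any other filtergraph; a sketch with illustrative file names (shortened to two streams here for brevity):
FILTER='[0:a:0]pan=mono|c0=c0[a0];[0:a:1]pan=mono|c0=c0[a1];[a0][a1]amerge=inputs=2'
ffmpeg -i input.mxf -filter_complex "$FILTER" -c:a pcm_s16le merged_audio.wav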

convert raw audio to mp3(wav) with adpcm_ima_wav codec

I want to convert raw audio (binary) to an audio file (mp3, wav, etc.) with the same audio info as the original's.
Here's a video (mp4) file that has an audio stream, and the following is the audio stream info pulled out by ffmpeg.
Stream #0:1(eng): Audio: adpcm_ima_wav (ms[0][17] / 0x1100736D), 32000 Hz, 2 channels, s16p, 256 kb/s (default)
I used:
ffmpeg.exe -f s16le -ar 32000 -ac 1 -i raw_audio.raw -acodec copy output.wav
The conversion process seems to finish okay, but the problem is that if I listen to output.wav, there is loud noise in it. Also, it's not the same audio as in the original video.
I tried specifying the "adpcm_ima_wav" codec with the "-f" switch, but it doesn't work.
Any suggestion, please?
By the way, I know how to extract audio from video with ffmpeg; I just want to convert RAW audio binary data to .WAV or .MP3.
(ffmpeg.exe -i test.mp4 -map 0:a:0 audio.mp3)

How do I get audio files of a specific file size?

Is there any way to use ffmpeg to accurately break audio files into smaller files of a specific file size, or pull a specific number of samples from a file?
I'm working with a speech-to-text API that needs audio chunks in exactly 160,000 bytes, or 80,000 16-bit samples.
I have a video stream, and I have an ffmpeg command to extract audio from it:
ffmpeg -i "rtmp://MyFMSWorkspace/ingest/test/mp4:test_1000 live=1" -ar 16000 -f segment -segment_time 10 out%04d.wav
So now I have ~10 second audio chunks with a sample rate of 16 kHz. Is there any way to break this into exactly 160,000-byte, 5-second files using ffmpeg?
I tried this:
ffmpeg -t 00:00:05.00 -i out0000.wav outCropped.wav
But the output was this:
Input #0, wav, from 'out0000.wav':
Metadata:
encoder : Lavf56.40.101
Duration: 00:00:10.00, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
Output #0, wav, to 'outCropped.wav':
Metadata:
ISFT : Lavf56.40.101
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc56.60.100 pcm_s16le
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
size= 156kB time=00:00:05.00 bitrate= 256.1kbits/s
But now the size is 156 kB.
EDIT:
My finished command is:
ffmpeg -i "url" -map 0:1 -af aresample=16000,asetnsamples=16000 -f segment -segment_time 5 -segment_format sw out%04d.sw
That output looks perfectly right. The size ffmpeg reports is expressed in KiB although it says kB: 160000 bytes is 156.25 KiB plus some header data, and ffmpeg shows the size with the fractional part hidden. If you want a raw file with no headers, output to .raw instead of .wav.
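For reference, the arithmetic behind the 5-second target:
16000 samples/s * 2 bytes/sample * 1 channel * 5 s = 160000 bytes
160000 bytes / 1024 = 156.25 KiB, which ffmpeg truncates and prints as "156kB"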
For people converting video files to MP3s split into 30-minute segments:
ffmpeg -i "something.MP4" -q:a 0 -map a -f segment -segment_time 1800 FileNumber%04d.mp3
The -q option can only be used with libmp3lame and corresponds to the LAME -V option (source)
