ffmpeg produced .wav reads only zeros with scipy.io.wavfile

Hi everyone and thanks for reading.
I wanted to do some analysis on a song using Python's scipy.io.wavfile. Since I only have the song as .mp3 I converted the file to .wav using ffmpeg the following way:
ffmpeg -i test.mp3 test.wav
The .wav file plays perfectly well with vlc player, but wavfile shows only zeroes when reading it:
from scipy.io import wavfile as wf
data = wf.read("test.wav")
C:\Program Files\Anaconda\lib\site-packages\scipy\io\wavfile.py:42: WavFileWarning: Unknown wave file format
warnings.warn("Unknown wave file format", WavFileWarning)
data
(44100, array([[0, 0],
[0, 0],
[0, 0],
...,
[0, 0],
[0, 0],
[0, 0]], dtype=int16))
Before that, I tried reading the data with Python's built-in wave module, with the same result (only zeros).
I am using the 64bit version of ffmpeg (ffmpeg-20140218-git-61d5970-win64-static).
Any help is appreciated :-)
Edit: Included .wav header and tried forcing ffmpeg output format
I guess the header information of the .wav file is included here:
ffmpeg -i .\test.wav
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from '.\test.wav':
Metadata:
artist : Joe Cocker
copyright : (C) 1987 Capitol Records, Inc.
date : 1987
genre : Pop
title : Unchain My Heart
album : Unchain My Heart
track : 1/10
encoder : Lavf55.33.100
Duration: 00:05:04.33, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
If I try to specify the ffmpeg output format explicitly for the .mp3 conversion:
ffmpeg -i .\test.mp3 -f s16le -ar 44100 -ac 2 test.wav
Input #0, mp3, from '.\test.mp3':
Metadata:
title : Unchain My Heart
artist : Joe Cocker
album : Unchain My Heart
genre : Pop
composer : Bobby Sharp
track : 1/10
disc : 1/1
album_artist : Joe Cocker
copyright : (C) 1987 Capitol Records, Inc.
date : 1987
Duration: 00:05:04.35, start: 0.025056, bitrate: 240 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 235 kb/s
Stream #0:1: Video: mjpeg, yuvj420p(pc), 600x600 [SAR 1:1 DAR 1:1], 90k tbr, 90k tbn, 90k tbc
Metadata:
title :
comment : Cover (front)
Output #0, s16le, to 'test.wav':
Metadata:
title : Unchain My Heart
artist : Joe Cocker
album : Unchain My Heart
genre : Pop
composer : Bobby Sharp
track : 1/10
disc : 1/1
album_artist : Joe Cocker
copyright : (C) 1987 Capitol Records, Inc.
date : 1987
encoder : Lavf55.33.100
Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (mp3 -> pcm_s16le)
Press [q] to stop, [?] for help
video:0kB audio:52425kB subtitle:0 data:0 global headers:0kB muxing overhead 0.000000%
size= 52425kB time=00:05:04.32 bitrate=1411.2kbits/s
But in this case (forced format), neither ffmpeg nor wavfile can read the file:
ffmpeg -i .\test.wav
.\test.wav: Invalid data found when processing input
and
data = wf.read("test2.wav")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-fbbd84cb966b> in <module>()
----> 1 data = wf.read("test2.wav")
C:\Program Files\Anaconda\lib\site-packages\scipy\io\wavfile.pyc in read(filename, mmap)
152
153 try:
--> 154 fsize = _read_riff_chunk(fid)
155 noc = 1
156 bits = 8
C:\Program Files\Anaconda\lib\site-packages\scipy\io\wavfile.pyc in _read_riff_chunk(fid)
98 _big_endian = True
99 elif str1 != b'RIFF':
--> 100 raise ValueError("Not a WAV file.")
101 if _big_endian:
102 fmt = '>I'
ValueError: Not a WAV file.

I encountered the same problem. It appears to be a bug in FFmpeg that was introduced in October 2011 and fixed on April 29, 2014 (5e7d21c7ad02e37caa1bcb50ab8ad64e7d7fb86c). FFmpeg versions more recent than 2.3 (July 16, 2014) should write WAVs that scipy.io.wavfile can read without error.
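To compare what two FFmpeg builds actually wrote, it can help to list the top-level RIFF chunks of the .wav file. The following is a minimal sketch that assumes only the standard RIFF layout (a 12-byte header, then chunks with a 4-byte ID and a little-endian 4-byte size). It does not reproduce the FFmpeg bug itself, and list_riff_chunks and build_demo_wav are hypothetical helper names, not part of scipy or FFmpeg.

```python
import io
import struct

def list_riff_chunks(stream):
    """Return [(chunk_id, size), ...] for the top-level chunks of a RIFF/WAVE stream."""
    riff, _total, wave_tag = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave_tag != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    chunks = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file
        chunk_id, size = struct.unpack("<4sI", header)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        # Chunk bodies are padded to an even number of bytes.
        stream.seek(size + (size & 1), io.SEEK_CUR)
    return chunks

def build_demo_wav():
    """Build a tiny 16-bit stereo PCM WAV in memory (illustrative data only)."""
    fmt = struct.pack("<HHIIHH", 1, 2, 44100, 44100 * 4, 4, 16)
    data = struct.pack("<4h", 1, -1, 2, -2)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    body += b"data" + struct.pack("<I", len(data)) + data
    return b"RIFF" + struct.pack("<I", len(body)) + body

demo_chunks = list_riff_chunks(io.BytesIO(build_demo_wav()))
print(demo_chunks)
```

For a real file, open("test.wav", "rb") can be passed in place of the in-memory stream. A header-less file, such as the one produced with -f s16le above, raises the ValueError.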

Related

Using ffmpeg to copy file with all streams intact (to fix any issues)

I am trying to use ffmpeg to copy files with all streams intact in order to fix various issues with them. ffmpeg is pretty good at reconstructing files and can remove the oddities that some encoders insert. The problem is that it fails on files with cover art.
I am currently using:
ffmpeg -i input.ogg -map 0 -c copy output.ogg
But for example on this file it fails with:
Input #0, ogg, from 'KR369.ogg':
Duration: 01:32:53.80, start: 0.000000, bitrate: 164 kb/s
Stream #0:0: Audio: vorbis, 44100 Hz, stereo, fltp, 152 kb/s
Metadata:
ALBUM : Küchenradio.org
ENCODED-BY : auphonic.com
ARTIST : Philip Banse
TITLE : KR369 Hotel Berlin
PUBLISHER : Küchenstud.io
URL : http://kuechenstud.io/
DATE : 2014
GENRE : Podcast
RIGHTS-DATE : 2014
RIGHTS : 2014 CC BY SA
LICENSE : http://creativecommons.org/licenses/by-sa/3.0/de/
RIGHTS-URI : http://creativecommons.org/licenses/by-sa/3.0/de/
ENCODED_BY : auphonic.com
comment : DocPhil, Cindy, Frau Katja und Onkel Andi zu Besuch bei Susanne DeOcampo-Herrmann, der Chefköchin im Hotel Berlin. Wir waren lange nicht mehr zusammen und hatten wie gewohnt alle auf einmal viele Fragen zur: Küchenplanung, Hotelgeschichte, Filetier-Tech
Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 500x500 [SAR 1:1 DAR 1:1], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
comment : Cover (front)
[ogg @ 0x55db241c2e80] Unsupported codec id in stream 1
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Last message repeated 1 times
There seems to be a known issue about it but I am trying to see if there is any workaround. Maybe to do something with multiple passes, first manually extracting everything and then reconstructing? Or if there is another tool to use?
You need to "unmap" the video stream. I believe that -map 0 -map -0:v will unmap the "video stream" (the cover), so that you're left with only the audio.
See https://trac.ffmpeg.org/wiki/Map for more information.
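As a sketch of how the suggested mapping fits into the original copy command, the following builds the argument list in Python. The helper name audio_only_copy_args is hypothetical, the file names are the ones from the question, and running the command assumes ffmpeg is on the PATH.

```python
import subprocess

def audio_only_copy_args(src, dst):
    """Build an ffmpeg argument list that copies every stream except video."""
    return [
        "ffmpeg", "-i", src,
        "-map", "0",      # start by selecting every stream of input 0
        "-map", "-0:v",   # then deselect its video streams (the cover art)
        "-c", "copy",     # stream copy, no re-encoding
        dst,
    ]

args = audio_only_copy_args("input.ogg", "output.ogg")
print(" ".join(args))

# To actually run it (requires ffmpeg to be installed):
# subprocess.run(args, check=True)
```

The order matters here: the negative mapping only removes streams that an earlier -map has already selected.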

Add coverart into ogg containing an opus audio stream with ffmpeg without re-encoding the audio stream

I'm trying to add cover art to an ogg file with ffmpeg.
Here are my source.ogg and source.jpg files:
$ ffprobe -hide_banner source.ogg
Input #0, ogg, from 'source.ogg':
Duration: 00:03:02.45, start: 0.007500, bitrate: 73 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
$ identify source.jpg
source.jpg JPEG 480x360 480x360+0+0 8-bit DirectClass 15.1KB 0.000u 0:00.000
I tried this :
$ ffmpeg -hide_banner -i source.ogg -i source.jpg -map 0 -map 1 -c:a copy -c copy -map_metadata 0 dest.ogg -y && echo && ffprobe -hide_banner dest.ogg
Input #0, ogg, from 'source.ogg':
Duration: 00:03:02.45, start: 0.007500, bitrate: 73 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
Input #1, image2, from 'source.jpg':
Duration: 00:00:00.04, start: 0.000000, bitrate: 3023 kb/s
Stream #1:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 480x360 [SAR 1:1 DAR 4:3], 25 tbr, 25 tbn, 25 tbc
[ogg @ 0x5655578064c0] Unsupported codec id in stream 1
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #1:0 -> #0:1 (copy)
Last message repeated 1 times
[ogg @ 0x5655577e8540] Format ogg detected only with low score of 1, misdetection possible!
dest.ogg: End of file
I've also found this answer but it does not explain how to do it with ffmpeg.
I've read about a "METADATA_BLOCK_PICTURE" metadata in the ogg container that might contain the picture in base64, so I tried this :
$ ffmpeg -hide_banner -i source.ogg -map 0 -c:a copy -c copy -metadata METADATA_BLOCK_PICTURE="$(base64 source.jpg)" dest.ogg
Input #0, ogg, from 'source.ogg':
Duration: 00:03:02.45, start: 0.007500, bitrate: 73 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
File 'dest.ogg' already exists. Overwrite ? [y/N] y
Output #0, ogg, to 'dest.ogg':
Metadata:
METADATA_BLOCK_PICTURE: /9j/4AAQSkZJRgABAQAAAQABAAD/2wCEABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkz
: ODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2MBERISGBUYLxoaL2NCOEJjY2NjY2NjY2Nj
..............................................................................
: nVmaS2E/urUWVbH6ORI9z2l8zyRfFpkLooIHSBuk9lFFoC6OBnP1SON8rEooqM2WOVHDdRRAAUVK
: KiiCWRRRRBJ//9k=
encoder : Lavf58.20.100
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
METADATA_BLOCK_PICTURE: /9j/4AAQSkZJRgABAQAAAQABAAD/2wCEABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkz
: ODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2MBERISGBUYLxoaL2NCOEJjY2NjY2NjY2Nj
: Y2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY//AABEIAWgB4AMBIgACEQED
..............................................................................
: nVmaS2E/urUWVbH6ORI9z2l8zyRfFpkLooIHSBuk9lFFoC6OBnP1SON8rEooqM2WOVHDdRRAAUVK
: KiiCWRRRRBJ//9k=
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size= 1658kB time=00:03:02.41 bitrate= 74.5kbits/s speed=1.01e+03x
video:0kB audio:1624kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.100392%
It kinda "worked", but neither ffplay nor mpv can parse the cover art :
$ ffplay -hide_banner dest.ogg
[ogg @ 0x5655577e8540] Failed to parse cover art block.
Input #0, ogg, from 'dest.ogg':
Duration: 00:03:02.44, start: 0.000000, bitrate: 74 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
3.95 M-A: -0.000 fd= 0 aq= 14KB vq= 0KB sq= 0B f=0/0
$ mpv dest.ogg
Playing: dest.ogg
[ffmpeg/demuxer] ogg: Failed to parse cover art block.
(+) Audio --aid=1 (opus 2ch 48000Hz)
AO: [pulse] 48000Hz stereo 2ch float
A: 00:00:03 / 00:03:02 (2%)
Exiting... (Quit)
I also tried -metadata:s:a together with base64's --wrap 0 option (which I had forgotten to specify, oops :) ):
$ ffmpeg -i source.ogg -map 0 -c:a copy -c copy -metadata:s:a METADATA_BLOCK_PICTURE="$(base64 --wrap 0 source.jpg)" dest.ogg
Input #0, ogg, from 'source.ogg':
Duration: 00:03:02.45, start: 0.007500, bitrate: 73 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
File 'dest.ogg' already exists. Overwrite ? [y/N] y
Output #0, ogg, to 'dest.ogg':
Metadata:
encoder : Lavf58.20.100
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
METADATA_BLOCK_PICTURE: /9j/4AAQSkZJRgABAQAAAQABAAD/2wCEABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2MBERISGBUYLxoaL2NCOEJjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY//AABEIAWgB4AMBIgACEQEDEQH/xAAaAAACAwEBAAAAAAAAAAA
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size= 1658kB time=00:03:02.41 bitrate= 74.5kbits/s speed=1.22e+03x
video:0kB audio:1624kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.084397%
But still the dest.ogg jpg coverart cannot be read properly :
$ ffprobe -hide_banner dest.ogg
[ogg @ 0x5655577e8540] Invalid picture type: -2555936.
[ogg @ 0x5655577e8540] Could not read mimetype from an attached picture.
Input #0, ogg, from 'dest.ogg':
Duration: 00:03:02.44, start: 0.000000, bitrate: 74 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
Metadata:
DURATION : 00:03:02.441000000
ENCODER : Lavf58.20.100
Can you please help me ?
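The "Invalid picture type: -2555936" message is consistent with the tag holding a bare JPEG: the first four bytes of a JPEG file (FF D8 FF E0), read as a signed big-endian integer, give exactly that number. A METADATA_BLOCK_PICTURE comment is expected to carry a base64-encoded FLAC picture block, in which the image data is preceded by a picture type, a MIME type, a description and the image dimensions. The following is a minimal sketch of that layout. The helper name flac_picture_block is hypothetical, the JPEG bytes are a placeholder rather than a real image, and whether a given FFmpeg build writes the resulting tag intact through -metadata is not verified here.

```python
import base64
import struct

def flac_picture_block(image_bytes, mime, width, height,
                       depth=24, picture_type=3, description=""):
    """Pack image data into a FLAC picture block (all integers big-endian)."""
    mime_b = mime.encode("ascii")
    desc_b = description.encode("utf-8")
    block = struct.pack(">II", picture_type, len(mime_b)) + mime_b  # 3 = front cover
    block += struct.pack(">I", len(desc_b)) + desc_b
    block += struct.pack(">IIII", width, height, depth, 0)  # 0 = not an indexed-colour image
    block += struct.pack(">I", len(image_bytes)) + image_bytes
    return block

# Placeholder standing in for the contents of source.jpg (not a real image).
jpeg_stub = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"\x00" * 8

# What a parser sees when the tag holds the bare JPEG instead of a picture block.
misread_type = struct.unpack(">i", jpeg_stub[:4])[0]
print(misread_type)

# Value for the METADATA_BLOCK_PICTURE tag: base64 of the whole block, on one line.
tag_value = base64.b64encode(
    flac_picture_block(jpeg_stub, "image/jpeg", 480, 360)
).decode("ascii")
```

The width and height passed in (480x360) are the ones reported by identify for source.jpg above.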
FFmpeg version 4.4 automatically supports embedding album art into Ogg containers with the Theora video codec (see "Ogg codecs" on Wikipedia for a list of supported codecs, although they may not all be supported by FFmpeg).
This is not the same as MP3 files, which store album art as binary encoded strings in special purpose tags. This allows media players to correctly detect it as an audio file (e.g. with mpv's --audio-display option) and prevent frame redrawing during playback. Ogg containers do not support this functionality, so FFmpeg simply adds a regular video stream to the file. The framerate of this video stream is set (at least for JPEGs) to 90000, resulting in a harmless warning.
This does not decrease performance at least with mpv, which only redraws as fast as the screen refresh rate allows. Only a single frame is encoded in the video stream, which can be manually verified by running ffprobe -v error -select_streams v:0 -count_packets -show_entries stream=nb_read_packets -of csv=p=0 input.ogg as suggested in this answer. The framerate can be manually set to 1 with the -r:v 1 option if desired. See the comments for additional discussion.
Here's an example converting an MP3 file with a video track containing album art to an Ogg file with Opus encoded audio and Theora encoded video:
$ ffprobe -hide_banner '01 - State of Grace.mp3'
[mp3 @ 0x5594cbafe320] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '01 - State of Grace.mp3':
Metadata:
lyrics-eng :
copyright : š 2012 Big Machine Records, LLC.
title : State of Grace
album_artist : Taylor Swift
album : Red (Deluxe Version)
date : 2012
track : 01/22
genre : Country
composer : Taylor Swift
disc : 1/1
comment : Taylor Swift
Duration: 00:04:55.81, start: 0.000000, bitrate: 321 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 320 kb/s
Stream #0:1: Video: mjpeg (Baseline), yuvj444p(pc, bt470bg/unknown/unknown), 600x600 [SAR 72:72 DAR 1:1], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
title : Cover
comment : Cover (front)
$ ffmpeg -hide_banner -i '01 - State of Grace.mp3' -c:a libopus -b:a 128000 -c:v libtheora -q:v 10 '01 - State of Grace.ogg'
[mp3 @ 0x55ebe6d3cc40] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '01 - State of Grace.mp3':
Metadata:
lyrics-eng :
copyright : š 2012 Big Machine Records, LLC.
title : State of Grace
album_artist : Taylor Swift
album : Red (Deluxe Version)
date : 2012
track : 01/22
genre : Country
composer : Taylor Swift
disc : 1/1
comment : Taylor Swift
Duration: 00:04:55.81, start: 0.000000, bitrate: 321 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 320 kb/s
Stream #0:1: Video: mjpeg (Baseline), yuvj444p(pc, bt470bg/unknown/unknown), 600x600 [SAR 72:72 DAR 1:1], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
title : Cover
comment : Cover (front)
Stream mapping:
Stream #0:1 -> #0:0 (mjpeg (native) -> theora (libtheora))
Stream #0:0 -> #0:1 (mp3 (mp3float) -> opus (libopus))
Press [q] to stop, [?] for help
[swscaler @ 0x55ebe6db69e0] deprecated pixel format used, make sure you did set range correctly
[ogg @ 0x55ebe6d44c80] Frame rate very high for a muxer not efficiently supporting it.
Please consider specifying a lower framerate, a different muxer or -vsync 2
Output #0, ogg, to '01 - State of Grace.ogg':
Metadata:
lyrics-eng :
copyright : š 2012 Big Machine Records, LLC.
title : State of Grace
album_artist : Taylor Swift
album : Red (Deluxe Version)
date : 2012
track : 01/22
genre : Country
composer : Taylor Swift
disc : 1/1
comment : Taylor Swift
encoder : Lavf58.76.100
Stream #0:0: Video: theora, yuv444p(tv, bt470bg/unknown/unknown, progressive), 600x600 [SAR 1:1 DAR 1:1], q=2-31, 200 kb/s, 90k fps, 90k tbn (attached pic)
Metadata:
title : Cover
DESCRIPTION : Cover (front)
encoder : Lavc58.134.100 libtheora
lyrics-eng :
copyright : š 2012 Big Machine Records, LLC.
ALBUMARTIST : Taylor Swift
album : Red (Deluxe Version)
date : 2012
TRACKNUMBER : 01/22
genre : Country
composer : Taylor Swift
DISCNUMBER : 1/1
Stream #0:1: Audio: opus, 48000 Hz, stereo, flt, 128 kb/s
Metadata:
encoder : Lavc58.134.100 libopus
lyrics-eng :
copyright : š 2012 Big Machine Records, LLC.
title : State of Grace
ALBUMARTIST : Taylor Swift
album : Red (Deluxe Version)
date : 2012
TRACKNUMBER : 01/22
genre : Country
composer : Taylor Swift
DISCNUMBER : 1/1
DESCRIPTION : Taylor Swift
[mp3float @ 0x55ebe6d96360] Header missing time=00:04:31.63 bitrate= 0.1kbits/s speed=59.8x 64x
Error while decoding stream #0:0: Invalid data found when processing input
frame= 1 fps=0.2 q=-0.0 Lsize= 4929kB time=00:04:55.79 bitrate= 136.5kbits/s speed=59.8x
video:58kB audio:4830kB subtitle:0kB other streams:0kB global headers:3kB muxing overhead: 0.845459%
$ mpv '01 - State of Grace.ogg'
(+) Video --vid=1 'Cover' (theora 600x600)
(+) Audio --aid=1 'State of Grace' (opus 2ch 48000Hz)
AO: [alsa] 48000Hz stereo 2ch float
VO: [gpu] 600x600 yuv444p
(Paused) AV: -00:00:00 / 00:04:55 (0%)
Exiting... (Quit)
$
Note that the -q:v 10 Theora video codec option is used for the highest possible video quality. Without this option the album art is extremely low resolution by default, and the size difference when using the highest quality is negligible since only a single frame is being encoded.
This requires FFmpeg to be built with libtheora (and libopus for Opus encoded audio). Here is the output of ffmpeg -codecs with unrelated codecs removed and better formatting:
$ ffmpeg -codecs
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11.1.0
configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64
--docdir=/usr/share/doc/ffmpeg-4.4.1-r1/html --mandir=/usr/share/man
--enable-shared --cc=x86_64-pc-linux-gnu-gcc
--cxx=x86_64-pc-linux-gnu-g++ --ar=x86_64-pc-linux-gnu-ar
--nm=x86_64-pc-linux-gnu-nm --ranlib=x86_64-pc-linux-gnu-ranlib
--pkg-config=x86_64-pc-linux-gnu-pkg-config --optflags='-O2 -pipe
-march=native -ggdb3' --extra-libs= --enable-static --enable-avfilter
--enable-avresample --disable-stripping --disable-optimizations
--disable-libcelt --enable-nonfree --disable-indev=v4l2
--disable-outdev=v4l2 --disable-indev=oss --disable-indev=jack
--disable-indev=sndio --disable-outdev=oss --disable-outdev=sndio
--enable-bzlib --enable-runtime-cpudetect --disable-debug
--disable-gcrypt --enable-gnutls --disable-gmp --enable-gpl
--disable-hardcoded-tables --enable-iconv --disable-libxml2 --enable-lzma
--enable-network --disable-opencl --enable-openssl --enable-postproc
--disable-libsmbclient --disable-ffplay --disable-sdl2 --disable-vaapi
--disable-vdpau --disable-vulkan --enable-xlib --enable-libxcb
--enable-libxcb-shm --enable-libxcb-xfixes --enable-zlib
--disable-libcdio --disable-libiec61883 --disable-libdc1394
--disable-libcaca --enable-openal --enable-opengl --disable-libv4l2
--disable-libpulse --disable-libdrm --disable-libjack
--disable-libopencore-amrwb --disable-libopencore-amrnb
--disable-libcodec2 --enable-libdav1d --disable-libfdk-aac
--disable-libopenjpeg --disable-libbluray --disable-libgme
--disable-libgsm --disable-libaribb24 --disable-mmal --disable-libmodplug
--enable-libopus --disable-libilbc --disable-librtmp --disable-libssh
--disable-libspeex --disable-libsrt --disable-librsvg --disable-ffnvcodec
--disable-libvorbis --disable-libvpx --disable-libzvbi --disable-appkit
--disable-libbs2b --disable-chromaprint --disable-cuda-llvm
--disable-libflite --disable-frei0r --disable-libfribidi
--enable-fontconfig --disable-ladspa --disable-libass
--disable-libtesseract --disable-lv2 --disable-libfreetype
--disable-libvidstab --disable-librubberband --disable-libzmq
--disable-libzimg --disable-libsoxr --enable-pthreads
--disable-libvo-amrwbenc --disable-libmp3lame --disable-libkvazaar
--enable-libaom --disable-libopenh264 --disable-librav1e
--disable-libsnappy --enable-libtheora --disable-libtwolame
--disable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid
--disable-gnutls --disable-armv5te --disable-armv6 --disable-armv6t2
--disable-neon --disable-vfp --disable-vfpv3 --disable-armv8
--disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu --disable-altivec
--disable-vsx --disable-power8 --disable-amd3dnow --disable-amd3dnowext
--disable-aesni --disable-avx --disable-avx2 --disable-fma3
--disable-fma4 --disable-sse3 --disable-ssse3 --disable-sse4
--disable-sse42 --disable-xop --cpu=host --disable-doc
--disable-htmlpages --enable-manpages
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Codecs:
D..... = Decoding supported
.E.... = Encoding supported
..V... = Video codec
..A... = Audio codec
..S... = Subtitle codec
...I.. = Intra frame-only codec
....L. = Lossy compression
.....S = Lossless compression
-------
[...]
DEV.L. theora Theora (encoders: libtheora )
[...]
DEAIL. opus Opus (Opus Interactive Audio Codec)
(decoders: opus libopus ) (encoders: opus libopus )
[...]
$
FFmpeg can also add album art (or any video track) from a separate file, instead of directly mapping the original album art to the output. Here's an example of how to extract the original MJPEG album art as a separate file and then pass it back in, using the -map option to take only the audio track from the MP3 and the video track from the MJPEG (I removed most of the output of the commands since it is basically the same):
$ ffmpeg -i '01 - State of Grace.mp3' -map 0:v -c:v copy '01 - State of Grace.jpg'
[...]
$ ffmpeg -i '01 - State of Grace.mp3' -i '01 - State of Grace.jpg' -map 0:a -map 1:v '01 - State of Grace.ogg'
[...]
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (mp3float) -> flac (native))
Stream #1:0 -> #0:1 (mjpeg (native) -> theora (libtheora))
[...]
I also omitted the audio and video codecs and their options (which I wouldn't suggest) so FFmpeg used FLAC as the default audio codec and Theora as the default video codec for an Ogg container.
Hope this helps!
This works for me:
ffmpeg -i mysong.ogg -i coverart.jpg song_with_art.ogg

Get lengths of intermediate concatenated files in ffmpeg filter-complex

I'm writing a media Electron app that occasionally needs to individually trim => individually normalize => concatenate => convert a varying number of WAV files into MP3.
I've successfully used FFMPEG (via Fluent-ffmpeg) to do so (command wrapped for visibility):
ffmpeg -i 3.301_to_8.752_Careful.wav -i 8.752_to_18.751_Careful.wav -y
-filter_complex
[0]silenceremove=start_periods=1:start_threshold=-50dB[mid];[mid]loudnorm=I=-16:TP=-1.5:LRA=11[out];
[1]silenceremove=start_periods=1:start_threshold=-50dB[mid];[mid]loudnorm=I=-16:TP=-1.5:LRA=11[b];
[out][b]concat=v=0:a=1[out]
-b:a 128k -ac 1 -acodec libmp3lame -f mp3 -map [out] -y Careful_Merged.mp3
Here are the relevant parts of the output:
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '3.301_to_8.752_Careful.wav':
Duration: 00:00:05.50, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Guessed Channel Layout for Input Stream #1.0 : mono
Input #1, wav, from '8.752_to_18.751_Careful.wav':
Duration: 00:00:10.30, bitrate: 705 kb/s
Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Stream mapping:
Stream #0:0 (pcm_s16le) -> silenceremove
Stream #1:0 (pcm_s16le) -> silenceremove
concat -> Stream #0:0 (libmp3lame)
Press [q] to stop, [?] for help
Output #0, mp3, to 'Careful_Merged.mp3':
Metadata:
TSSE : Lavf58.28.101
Stream #0:0: Audio: mp3 (libmp3lame), 48000 Hz, mono, fltp, 128 kb/s (default)
Metadata:
encoder : Lavc58.53.101 libmp3lame
size= 246kB time=00:00:15.69 bitrate= 128.4kbits/s speed=33.8x
video:0kB audio:246kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.170563%
The benefit is that the process chains nicely, but the downside is that I don't know the length of each intermediate stream after the automagic trim. My app needs a "table of contents" showing the start and end of each segment, but I can't figure out how to export the duration of each trimmed input between the trim/normalize step
[0]silenceremove=start_periods=1:start_threshold=-50dB[mid];[mid]loudnorm=I=-16:TP=-1.5:LRA=11[out] and the concatenation step [out][b]concat=v=0:a=1[out].
It's so elegant as it is, I'd prefer not to save the intermediate files to disk or trim the audio twice to get the info I need. All I need is a duration for each, and I can do the math.
Is there a filter that I can put inline to export the intermediate duration, or a way to get a log of the concatenation action?
Any ideas?
If you run the command with -loglevel verbose or -v 40, concat will log segment EOF events with timings in microseconds e.g.
[Parsed_concat_0 @ 00000000037e3440] EOF on in0:v0, 0 streams left in segment.
[Parsed_concat_0 @ 00000000037e3440] Segment finished at pts=5500000
[Parsed_concat_0 @ 00000000037e3440] EOF on in1:v0, 0 streams left in segment.
[Parsed_concat_0 @ 00000000037e3440] Segment finished at pts=6533333
The first segment was 5.5s long, and the 2nd was 6533333 us - 5500000 us = 1.03s long.
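The arithmetic above can be automated by scanning the verbose log for the "Segment finished" lines. The following is a minimal sketch that assumes the log text has already been captured as a string and that the lines look like the ones quoted above. segment_table is a hypothetical helper name, and the sample text is just the two log lines from this answer.

```python
import re

# Matches the concat filter's "Segment finished" lines; pts is in microseconds.
SEGMENT_RE = re.compile(r"\[Parsed_concat_\d+ [^\]]*\] Segment finished at pts=(\d+)")

def segment_table(log_text):
    """Return [(start_s, end_s), ...] for each concat segment found in the log."""
    ends = [int(m.group(1)) / 1_000_000 for m in SEGMENT_RE.finditer(log_text)]
    starts = [0.0] + ends[:-1]  # each segment starts where the previous one ended
    return list(zip(starts, ends))

sample_log = """\
[Parsed_concat_0 @ 00000000037e3440] EOF on in0:v0, 0 streams left in segment.
[Parsed_concat_0 @ 00000000037e3440] Segment finished at pts=5500000
[Parsed_concat_0 @ 00000000037e3440] EOF on in1:v0, 0 streams left in segment.
[Parsed_concat_0 @ 00000000037e3440] Segment finished at pts=6533333
"""

table = segment_table(sample_log)
durations = [round(end - start, 6) for start, end in table]
print(table)
print(durations)
```

The start and end pairs are the "table of contents" entries the question asks for. The durations follow by subtraction.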

ffmpeg: "Referenced QT chapter track not found"

Using ffmpeg to replace audio in a QuickTime with audio from a WAV.
Anyone know why I'm getting Referenced QT chapter track not found?
Command:
$ ffmpeg \
-i "$video" -t 25 \
-i "$audio" -map 0:v -c:v copy -map 1:a -c:a pcm_s24le -ar 48000 \
-hide_banner "$output"
Output:
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7faf62010600] Referenced QT chapter track not found
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mov':
Metadata:
major_brand : qt
minor_version : 537199360
compatible_brands: qt
creation_time : 2018-11-06T09:27:43.000000Z
Duration: 00:00:25.00, start: 0.000000, bitrate: 186987 kb/s
Stream #0:0(eng): Video: prores (apch / 0x68637061), yuv422p10le(bt709, progressive), 1920x1080, 185115 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc (default)
Metadata:
creation_time : 2018-11-06T09:27:43.000000Z
handler_name : Apple Alias Data Handler
encoder : Apple ProRes 422 (HQ)
timecode : 00:00:00:00
Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
creation_time : 2018-11-06T09:27:43.000000Z
handler_name : Apple Alias Data Handler
timecode : 00:00:00:00
Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
Metadata:
creation_time : 2018-11-06T09:27:43.000000Z
handler_name : Apple Alias Data Handler
timecode : 00:00:00:00
Guessed Channel Layout for Input Stream #1.0 : stereo
Input #1, wav, from 'audio.wav':
Metadata:
encoded_by : Pro Tools
originator_reference: aaOpKJaTN7Nk
date : 2018-11-08
creation_time : 13:53:50
time_reference : 166698000
Duration: 00:00:25.00, bitrate: 2128 kb/s
Stream #1:0: Audio: pcm_s24le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s32 (24 bit), 2116 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #1:0 -> #0:1 (pcm_s24le (native) -> pcm_s24le (native))
Press [q] to stop, [?] for help
Output #0, mov, to 'test19.mov':
Metadata:
major_brand : qt
minor_version : 537199360
compatible_brands: qt
encoder : Lavf58.12.100
Stream #0:0(eng): Video: prores (apch / 0x68637061), yuv422p10le(bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 185115 kb/s, 0.04 fps, 25 tbr, 12800 tbn, 25 tbc (default)
Metadata:
creation_time : 2018-11-06T09:27:43.000000Z
handler_name : Apple Alias Data Handler
encoder : Apple ProRes 422 (HQ)
timecode : 00:00:00:00
Stream #0:1: Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz, stereo, s32 (24 bit), 2304 kb/s
Metadata:
encoder : Lavc58.18.100 pcm_s24le
frame= 625 fps=277 q=-1.0 Lsize= 566343kB time=00:00:24.96 bitrate=185876.0kbits/s speed=11.1x
video:564928kB audio:1406kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001496%
Same error with -map 0:v:0
Output looks to be created without errors.
The error means that the MOV header indicates that a text track with chapter titles and timestamps is present, but FFmpeg can't actually find that track in the file.
Adding -ignore_chapters 1 before -i "$video" will stop ffmpeg from looking for that track.
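As a sketch of where the option goes, the following rebuilds the command from the question as a Python argument list, with -ignore_chapters 1 placed before the first -i so that it applies to the video input. The helper name replace_audio_args and the file names are hypothetical. Everything else is taken from the question's command.

```python
def replace_audio_args(video, audio, output):
    """Build the question's ffmpeg command with chapter lookup disabled for the video input."""
    return [
        "ffmpeg",
        "-ignore_chapters", "1",  # input option: must come before the -i it applies to
        "-i", video,
        "-t", "25",
        "-i", audio,
        "-map", "0:v", "-c:v", "copy",
        "-map", "1:a", "-c:a", "pcm_s24le", "-ar", "48000",
        "-hide_banner", output,
    ]

cmd = replace_audio_args("video.mov", "audio.wav", "output.mov")
print(" ".join(cmd))
```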
While traversing my personal video collection with ffprobe, I ran into several such errors, including:
[mov,mp4,m4a,3gp,3g2,mj2 @ 000000000039a980] Referenced QT chapter track not found
and...
[mp3float @ 000000000079bb80] Header missing
From what I can tell, these are non-fatal errors that simply indicate that the video's original encoder did a bad or incomplete job. For a video collection that's large enough and drawn from enough disparate sources, running into at least a few of these errors while analysing the entire collection is to be expected. It still makes sense to want to suppress them, since they aren't needed and clutter the terminal output.
There seems to be no such -ignore-chapters option for ffprobe, but I was able to suppress all of these non-fatal errors by adding -v fatal, which changes the command's loglevel to only display fatal errors (the default shows fatal and non-fatal errors, warnings and extra information). This option doesn't suppress ffprobe's output, which is printed as normal.
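When walking a collection from a script, the same loglevel can be combined with ffprobe's JSON output. The following is a minimal sketch that assumes ffprobe is on the PATH. ffprobe_args and probe are hypothetical helper names, and "video.mov" is a placeholder path.

```python
import json
import subprocess

def ffprobe_args(path):
    """Build an ffprobe command that logs only fatal errors and prints JSON."""
    return [
        "ffprobe",
        "-v", "fatal",            # loglevel: hide non-fatal errors and warnings
        "-print_format", "json",  # machine-readable output on stdout
        "-show_format", "-show_streams",
        path,
    ]

def probe(path):
    """Run ffprobe and return its parsed JSON output (requires ffprobe to be installed)."""
    result = subprocess.run(ffprobe_args(path), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

probe_cmd = ffprobe_args("video.mov")
print(" ".join(probe_cmd))
```

ffprobe writes its log messages to stderr and the requested data to stdout, so the JSON can be parsed even if a few log lines do get through.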

ffmpeg for android, out of memory on decoding rotate mp4 file with openh264

Version
ffmpeg : 4.0.2
openh264: 1.8.0
Problem
I am trying to trim an .mp4 file whose metadata contains rotation info, but it fails with the error shown below.
The file stream info :
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2018-10-09T09:40:53.000000Z
location : +39.8983+116.4145/
location-eng : +39.8983+116.4145/
com.android.version: 6.0
Duration: 00:00:10.56, start: 0.000000, bitrate: 8671 kb/s
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720, 8563 kb/s, SAR 1:1 DAR 16:9, 30.01 fps, 30 tbr, 90k tbn, 180k tbc (default)
Metadata:
rotate : 180
creation_time : 2018-10-09T09:40:53.000000Z
handler_name : VideoHandle
Side data:
displaymatrix: rotation of -180.00 degrees
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
creation_time : 2018-10-09T09:40:53.000000Z
handler_name : SoundHandle
ffmpeg cmd
ffmpeg -y -i 1.mp4 -threads 4 -b:v 2000k -vcodec libopenh264 -acodec copy -ss 0 -t 3 -f mp4 -movflags faststart -strict -2 ./output.mp4
result
Error reinitializing filters!
Failed to inject frame into filter network: Out of memory
Error while processing the decoded data for stream #0:0
Then I found this answer: ffmpeg-for-android-out-of-memory. After I added the -noautorotate option to my command, the video was trimmed successfully.
If I use -vcodec copy instead of -vcodec libopenh264, the result is also OK. I wonder whether there is a bug when libopenh264 is used together with ffmpeg's autorotate function.
If I wipe the video's rotation info from the metadata with the -metadata:s:v:0 option, the new video can be trimmed successfully with the original command :(
The error message is misleading. Due to your usage of --disable-filters you need to manually enable the hflip/vflip filters:
--enable-filter=aresample,crop,hflip,scale,transpose,vflip
Some filters (such as the format filter) will be automatically enabled in this case.
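To check a custom build before shipping it, the filter list printed by ffmpeg -filters can be compared against the filters that autorotation relies on. The following is a minimal sketch that assumes each filter line has the layout "flags name inputs->outputs description". The sample text imitates that layout for a build that has only aresample, crop and scale, and is not captured output. available_filters is a hypothetical helper name, and the required set is taken from the flip and transpose filters named in the --enable-filter list above.

```python
# Filters from the answer's list that handle rotation (assumed to be the ones autorotate needs).
REQUIRED = {"hflip", "vflip", "transpose"}

def available_filters(filters_text):
    """Extract filter names from text laid out like the output of `ffmpeg -filters`."""
    names = set()
    for line in filters_text.splitlines():
        parts = line.split()
        # Filter lines have an "in->out" column in third position; legend lines do not.
        if len(parts) >= 3 and "->" in parts[2]:
            names.add(parts[1])
    return names

# Illustrative text only, imitating the layout of `ffmpeg -filters`.
sample = """\
Filters:
  T.. = Timeline support
 ... aresample         A->A       Resample audio data.
 ... crop              V->V       Crop the input video.
 ... scale             V->V       Scale the input video size and/or convert the image format.
"""

missing = sorted(REQUIRED - available_filters(sample))
print(missing)
```

An empty result would mean the build has every filter in the required set.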
