Google Speech-to-Text API gives a 00:00 audio length - google-api

I have an audio clip that is about 40 minutes long. I uploaded it to GCS and used the URI in the audio configuration. The audio assessment gave an estimated duration of 00:00, which is clearly wrong, and the transcription result was empty as well.
The file was originally in .m4a format. I converted it to other formats (.wav and .flac), but they also gave a 00:00 length. The API only worked when I trimmed the audio file to the first 40 or 100 seconds; it failed when I trimmed it to the first 10 minutes.
Please advise if you have any idea about this problem. Thanks!

I solved my problem by converting the .m4a file to FLAC with an online tool, e.g. convertio.co. Then the API works like a charm.
Originally, I had simply changed the extension from .m4a to .flac on my Windows computer. It turns out that this does not do the trick, since renaming changes neither the container nor the codec.
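The same conversion can also be done locally with ffmpeg; a minimal sketch, assuming ffmpeg is installed and using placeholder filenames:

```shell
# Re-encode rather than rename: this rewrites both the container and the codec.
# -ac 1 downmixes to mono, which a single-channel recognition config expects.
cmd='ffmpeg -i recording.m4a -ac 1 recording.flac'
echo "$cmd"   # run this command where ffmpeg is available
```

Renaming the extension leaves the original AAC bitstream untouched, which is why the API still sees a 00:00 duration.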

Related

Inexperienced with videos and looking for advice for dealing with incorrect avi framerate and possible alternatives

Hi there, I am aiming to record 1 hr videos at 500x375 from a Raspberry Pi (running 64-bit Bullseye) which need to be recorded in such a way that they can endure unexpected program termination or system shutdown.
Currently I am using a bash script utilising libcamera-vid and libav:
libcamera-vid -t $filmDuration --framerate 5 --width 500 --height 375 --nopreview --codec libav --libav-format avi -o "$(date +%Y%m%d_%H%M.avi)" --tuning-file /usr/share/libcamera/ipa/raspberrypi/imx219_noir.json
I initially encoded H.264 as mp4 but found that any interruption of the script would corrupt the file, and I lack the understanding to work around this (though I suspect a method exists). The avi format, on the other hand, seems more robust, so I moved to using it, but I am having a fairly serious issue whereby the file reports the video as running at 600 fps rather than 5.
As far as I can tell this is not actually the case: there has been no loss in video duration, which I would expect if the frames were being condensed. However, the machine learning toolkit (built on OpenCV) these videos are recorded for takes the fps information as part of its video analysis, effectively making it unable to analyse them.
I am not sure why exactly this is occurring or how to fix it but any advice would be very welcome; including any suggestions for other encoding software or solutions to recording to mp4 in a way that avoids corruption.
Not resolved as such, but after I opened an issue at the libcamera-apps repo this behaviour was replicated and confirmed to be unintended.
While a similar issue that was affecting the mkv format (incorrectly reporting its fps as 30, according to ffprobe) has been fixed, the issue with avi files incorrectly reporting fps currently has not.
Edit: A new update to libcamera-apps has now fixed the avi issue as well, according to the latest commit.
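For anyone stuck on an older build, one possible workaround is to remux while overriding the input framerate. A sketch, assuming ffmpeg and placeholder filenames (`-r` before `-i` tells ffmpeg to ignore the stored timestamps and assume a constant 5 fps; this may not always combine cleanly with stream copy, in which case re-encoding is the fallback):

```shell
# Placeholder filenames; -c copy remuxes without re-encoding.
cmd='ffmpeg -r 5 -i broken.avi -c copy fixed.avi'
echo "$cmd"   # run where ffmpeg is installed
```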

Determining the original recording date of a video file

I received a video file and am trying to determine when the video was originally recorded. The file is a .mov video muxed with an audio track. The EXIF data shows multiple creation/modification dates, which I recognize as the dates the file was saved to my local computer, but the last creation date shown is an earlier date adjusted to Eastern Time (UTC-4), which I believe may be the date the file was originally recorded. However, that earliest creation date carries Lavf58.20.100 as the Encoder tag, so I am unsure whether it is just the date the video file was muxed or in fact the date the video was originally recorded.
I used exiftool v12.44 to view the EXIF data from the video and attempted to validate the results against other known video and image files. In doing so, the last creation dates displayed on my known files were consistent with the dates the original files were recorded; however, they lacked the Lavf58.20.100 Encoder tag. I ran additional files that were muxed using FFmpeg, which do show the Lavf Encoder tag, but those files did not return Creation Dates. Included is a screen capture of the metadata from the .mov file whose original recording date I am trying to determine.
First a minor nitpick. Not all metadata is EXIF data. Your output shows mostly Quicktime data (and file system data), which is the standard used in video. EXIF data can exist in a video file, but it is non-standard.
Because a lot of video editing programs do not copy or save metadata when creating new files, it can be hard, if not impossible, to know the original date of a video file. For example, using ffmpeg at any point in the workflow without including the -map_metadata option will strip the file of all embedded metadata.
The fact that CreationDate still exists, and that the Track*Date and Media*Date tags, as well as the CreateDate/ModifyDate tags, have values at all, indicates a better quality program was used with this file. But that still depends upon what happened further upstream in the workflow.
Your output is missing a lot of data due to the fact that many video tags are duplicates with the same name. This especially pertains to the Track*Date and Media*Date tags, as there would be copies of these for every track, meaning there would be at least one set for the video track and one set for the audio track, more if there are additional tracks.
Run this command on your file to see all the date/time related tags in the file, including duplicates and the groups they belong to. It is a variation of the command from exiftool FAQ #3 (always use the FAQ #3 command). From there you can try to determine what the original date was, if that is possible. Also note that most of these timestamps are supposed to be set to UTC, though the accuracy of that depends upon the program that created the file in the first place.
exiftool -time:all -G1 -a -s file.mov

Concatenating Smooth Streaming output to a single MP4 file - problems with A/V sync. What is CodecPrivateData?

I have a video in fragmented form which is an output of an Azure Media Services Live Event (Smooth Streaming).
I'm trying to concatenate the segments to get a single MP4 file, but I've run into an A/V sync problem: no matter what I do (time-shifting, speeding up, slowing down, using FFmpeg filters), the audio delay is always drifting. To get the output MP4 file, I tried concatenating the segments for the video and audio streams (both at the OS file level and with FFmpeg) and then muxing with FFmpeg.
I've tried everything I found on the web and I always end up with exactly the same result. What's important, when I play the source from the manifest file, it's all good. That made me skim through the manifest once again, and I realized there's a CodecPrivateData value which I'm not using anywhere in the process. What is it? Could it somehow help solve my problem?
Mystery solved: the manifest file contains the list of stream discontinuities, which need to be taken into account when concatenating the streams.
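In practice that means concatenating the segments per continuous section and then joining the sections, e.g. with ffmpeg's concat demuxer. A sketch with hypothetical section names:

```shell
# One remuxed MP4 per continuous section, listed in playback order.
printf "file '%s'\n" part1.mp4 part2.mp4 > sections.txt
cat sections.txt
# Then join without re-encoding:
# ffmpeg -f concat -safe 0 -i sections.txt -c copy joined.mp4
```

The concat demuxer regenerates timestamps across the listed files, which is what sidesteps the drift introduced by the discontinuities.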

movie atom problem in mp4 conversion

In our project, we convert any given video file into mp4 file which works fine when we publish it via our site.
But when we publish the stream link in our iTunes RSS and try to download and play the files in iTunes or QuickTime, we get an error on the movie atom in some of the movies, and those don't play once they're downloaded to the local machine.
After some research, we found that the problem is in the framerate value; to be more specific, it is related to 32-bit vs 64-bit value differences. As far as we have found, the conversion should be done with the following formula:
newFrameRate = (int(oldFrameRate) + 1) * (1000 / 1001)
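As a worked example of that formula, a stored integer rate of 29 maps back to the NTSC rate:

```shell
# (29 + 1) * 1000 / 1001 = 30000/1001 ≈ 29.97 fps
awk 'BEGIN { printf "%.2f\n", (29 + 1) * 1000 / 1001 }'   # prints 29.97
```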
We tried to determine the framerate value through ffmpeg and movieinfo, but the results were always different and not accurate.
What's your suggestion to solve this issue?
Tolga
I found one useful way to solve this problem and wanted to report.
I installed MP4Box and used
mp4box -frag 1000
which solves all the moov-atom related problems.
I tried other values for fragmentation, but with larger values the second half of the movie loses its video track and turns white.
FYI,
Tolga

Server side video mixing

I have a series of video files encoded in MPEG-2 (I can change this encoding), and I have to produce a movie in Flash FLV (this is a requirement; I can't change that encoding).
One destination movie is a compilation of different source video files.
I have a playlist defining the destination movie. For example:
Video file     Position  Offset  Length
little_gnomes  0         0       8.5
fairies        5.23      0.12    12.234
pixies         14        0       9.2
Video file is the name of the file, Position is when the file should start (on the master timeline), Offset is the offset within the video file, and Length is the length of the video to play. The numbers are seconds (as doubles).
This would result in something like that (final movie timeline):
0--5.23|--8.5|--14|--17.464|--23.2|
little_gnomes **************
fairies *********************
pixies *****************
Where videos overlap, the last video added overrides the previous one; the audio should be mixed.
The resulting video track would be:
0--5.23|--8.5|--14|--17.464|--23.2|
little_gnomes *******
fairies ***********
pixies *****************
While the resulting audio would be:
0--5.23|--8.5|--14|--17.464|--23.2|
little_gnomes 11111112222222
fairies 222222211112222222222
pixies 22222222221111111
Where 1 or 2 is the number of mixed audio tracks.
There can be a maximum of 3 audio tracks.
I need to write a program which takes the playlist as input and produce the flv file. I'm open to any solution (must be free/open source).
An existing tool that can do this would be the simplest option, but I found none. As for making my own solution, I found only ffmpeg; I was able to do basic things with it, but the documentation is terribly lacking.
It can be any language, it doesn't have to be super fast (if it takes 30 minutes to build a 1h movie it's fine).
The solution will run on opensolaris based x64 servers. If I have to use linux, this would work too. But windows is out of the question.
I finally ended up writing my own solution from scratch, using the ffmpeg library. It's a lot of boilerplate code, but in the end the logic is not complicated.
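For comparison, the same trim/delay/mix logic can be expressed with the ffmpeg command line. A sketch for the first two playlist entries, using filters from ffmpeg's standard filter set; the filenames and output codecs are assumptions:

```shell
# fairies starts at 5.23s: shift its video timestamps, overlay it on top of
# little_gnomes (later clip wins where they overlap), and mix both audio tracks.
cmd='ffmpeg -i little_gnomes.mpg -i fairies.mpg -filter_complex
  "[1:v]setpts=PTS-STARTPTS+5.23/TB[v1];
   [0:v][v1]overlay=eof_action=pass[v];
   [1:a]adelay=5230|5230[a1];
   [0:a][a1]amix=inputs=2[a]"
  -map "[v]" -map "[a]" -c:v flv -c:a libmp3lame out.flv'
echo "$cmd"   # run where ffmpeg is available
```

Extending this to three inputs and per-clip offsets/lengths (via the trim/atrim filters) is mechanical, which is essentially what the ffmpeg library code ends up doing.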
I found the MLT framework which helped me greatly.
Here are two related questions:
Command line video editing tools
https://superuser.com/questions/74028/linux-command-line-tool-for-video-editing
Avisynth sounds as if it might do what you want, but it's Windows-only.
You may very well end up writing your own application using the FFmpeg library. You're right, the documentation could be better... but the tutorial by Stephen Dranger is a good place to start (if you don't know it yet).
Well, if you prefer Java, I've written several similar programs using Xuggler's API.
If your videos/images are already online, you may use the Stupeflix API to create the final videos. You can change the soundtrack, add filters to the video, and much more. Here are the documentation and an online demo: https://developer.stupeflix.com/documentation/
