libav gives audio duration as negative - gcc

I am trying to make a simple av player, and in some cases I am getting values correctly as below:
checking /media/timecapsule/Music/02 Baawre.mp3
[mp3 # 0x7f0698005660] Skipping 0 bytes of junk at 2102699.
dur is 4396400640
duration is 311
However, in other places, I am getting negative durations:
checking /media/timecapsule/Music/01 Just Chill.mp3
[mp3 # 0x7f0694005f20] Skipping 0 bytes of junk at 1318922.
dur is -9223372036854775808
duration is -653583619391
I am not sure what's causing the duration to end up negative only in some audio files. Any ideas to where I might be wrong are welcome!
Source code here https://github.com/heroic/musika/blob/master/player/library.c

I would suggest two things:
1. Make sure that failed files are not corrupt, i.e. you can use ffmpeg command line tool to dump details.
2. Break this in 2 if conditions to avoid order of operation and ensure open succeeded.
if(!(avformat_open_input(&container, name, NULL, NULL) < 0 && avformat_find_stream_info(container, NULL) < 0)) {
Also you can use av_dump_format to ensure that it headers are correct. See ex - https://www.ffmpeg.org/doxygen/2.8/avio_reading_8c-example.html#a24
Ketan

Related

Using PyAV to encode mono audio to file, params match docs, but still causes Errno 22

While trying to use PyAV to encode live mono audio from a microphone to a compressed audio stream (using mp2 or flac as encoder), the program kept raising an exception ValueError: [Errno 22] Invalid argument.
To remove the live microphone source as a cause of the problem, and to make the problematic code easier for others to run/test, I have removed the mic source and now just generate a pure tone as a sequence of input buffers.
All attempts to figure out the missing or mismatched or incorrect argument have just resulted in seeing documentation and examples that are the same as my code.
I would like to know from someone who has used PyAV successfully for mono audio what the correct method and parameters are for encoding mono frames into the mono stream.
The package used is av 10.0.0 installed with
pip3 install av --no-binary av
so it uses my package-manager provided ffmpeg library, which is version 4.2.7.
The problematic python code is:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Recreating an error 22 when encoding sound with PyAV.
Created on Sun Feb 19 08:10:29 2023
#author: andrewm
"""
import typing
import sys
import math
import fractions
import av
from av import AudioFrame
""" Ensure some PyAudio constants are still defined without changing
the PyAudio recording callback function and without depending
on PyAudio simply for reproducing the PyAV bug [Errno 22] thrown in
File "av/filter/context.pyx", line 89, in av.filter.context.FilterContext.push
"""
class PA_Stub():
paContinue = True
paComplete= False
pyaudio = PA_Stub()
"""Generate pure tone at given frequency with amplitude 0...1.0 at
sampling frewuency fs and beginning at phase offset 'phase'.
Returns the new phase after the sinusoid has cycled over the
sampling window length.
"""
def generate_tone(
freq:int, phase:float, amp:float, fs, samp_fmt, buffer:bytearray
) -> float:
assert samp_fmt == "s16", "Only s16 supported atm"
samp_size_bytes = 2
n_samples = int(len(buffer)/samp_size_bytes)
window = [int(0) for i in range(n_samples)]
theta = phase
phase_inc = 2*math.pi * freq / fs
for i in range(n_samples):
v = amp * math.sin(theta)
theta += phase_inc
s = int((2**15-1)*v)
window[i] = s
for sample_i in range(len(window)):
byte_i = sample_i * samp_size_bytes
enc = window[sample_i].to_bytes(
2, byteorder=sys.byteorder, signed=True
)
buffer[byte_i] = enc[0]
buffer[byte_i+1] = enc[1]
return theta
channels = 1
fs = 44100 # Record at 44100 samples per second
fft_size_samps = 256
chunk_samps = fft_size_samps * 10 # Record in chunks that are multiples of fft windows.
# print(f"fft_size_samps={fft_size_samps}\nchunk_samps={chunk_samps}")
seconds = 3.0
out_filename = "testoutput.wav"
# Store data in chunks for 3 seconds
sample_limit = int(fs * seconds)
sample_len = 0
frames = [] # Initialize array to store frames
ffmpeg_codec_name = 'mp2' # flac, mp3, or libvorbis make same error.
sample_size_bytes = 2
buffer = bytearray(int(chunk_samps*sample_size_bytes))
chunkperiod = chunk_samps / fs
total_chunks = int(math.ceil(seconds / chunkperiod))
phase = 0.0
### uncomment if you want to see the synthetic data being used as a mic input.
# with open("test.raw","wb") as raw_out:
# for ci in range(total_chunks):
# phase = generate_tone(2600, phase, 0.8, fs, "s16", buffer)
# raw_out.write(buffer)
# print("finished gen test")
# sys.exit(0)
# #----
# Using mp2 or mkv as the container format gets the same error.
with av.open(out_filename+'.mp2', "w", format="mp2") as output_con:
output_con.metadata["title"] = "My title"
output_con.metadata["key"] = "value"
channel_layout = "mono"
sample_fmt = "s16p"
ostream = output_con.add_stream(ffmpeg_codec_name, fs, layout=channel_layout)
assert ostream is not None, "No stream!"
cctx = ostream.codec_context
cctx.sample_rate = fs
cctx.time_base = fractions.Fraction(numerator=1,denominator=fs)
cctx.format = sample_fmt
cctx.channels = channels
cctx.layout = channel_layout
print(cctx, f"layout#{cctx.channel_layout}")
# Define PyAudio-style callback for recording plus PyAV transcoding.
def rec_callback(in_data, frame_count, time_info, status):
global sample_len
global ostream
frames.append(in_data)
nsamples = int(len(in_data) / (channels*sample_size_bytes))
frame = AudioFrame(format=sample_fmt, layout=channel_layout, samples=nsamples)
frame.sample_rate = fs
frame.time_base = fractions.Fraction(numerator=1,denominator=fs)
frame.pts = sample_len
frame.planes[0].update(in_data)
print(frame, len(in_data))
for out_packet in ostream.encode(frame):
output_con.mux(out_packet)
for out_packet in ostream.encode(None):
output_con.mux(out_packet)
sample_len += nsamples
retflag = pyaudio.paContinue if sample_len<sample_limit else pyaudio.paComplete
return (in_data, retflag)
print('Beginning')
### some e.g. PyAudio code which starts the recording process normally.
# istream = p.open(
# format=sample_format,
# channels=channels,
# rate=fs,
# frames_per_buffer=chunk_samps,
# input=True,
# stream_callback=rec_callback
# )
# print(istream)
# Normally at this point you just sleep the main thread while
# PyAudio calls back with mic data, but here it is all generated.
for ci in range(total_chunks):
phase = generate_tone(2600, phase, 0.8, fs, "s16", buffer)
ret_data, ret_flag = rec_callback(buffer, ci, {}, 1)
print('.', end='')
print(" closing.")
# Stop and close the istream
# istream.stop_stream()
# istream.close()
If you uncomment the RAW output part you will find the generated data can be imported as PCM s16 Mono 44100Hz into Audacity and plays the expected tone, so the generated audio data does not seem to be the problem.
The normal program console output up until the exception is:
<av.AudioCodecContext audio/mp2 at 0x7f8e38202cf0> layout#4
Beginning
<av.AudioFrame 0, pts=0, 2560 samples at 44100Hz, mono, s16p at 0x7f8e38202eb0> 5120
.<av.AudioFrame 0, pts=2560, 2560 samples at 44100Hz, mono, s16p at 0x7f8e382025f0> 5120
The stack trace is:
Traceback (most recent call last):
File "Dev/multichan_recording/av_encode.py", line 147, in <module>
ret_data, ret_flag = rec_callback(buffer, ci, {}, 1)
File "Dev/multichan_recording/av_encode.py", line 121, in rec_callback
for out_packet in ostream.encode(frame):
File "av/stream.pyx", line 153, in av.stream.Stream.encode
File "av/codec/context.pyx", line 484, in av.codec.context.CodecContext.encode
File "av/audio/codeccontext.pyx", line 42, in av.audio.codeccontext.AudioCodecContext._prepare_frames_for_encode
File "av/audio/resampler.pyx", line 101, in av.audio.resampler.AudioResampler.resample
File "av/filter/graph.pyx", line 211, in av.filter.graph.Graph.push
File "av/filter/context.pyx", line 89, in av.filter.context.FilterContext.push
File "av/error.pyx", line 336, in av.error.err_check
ValueError: [Errno 22] Invalid argument
edit: It's interesting that the error happens on the 2nd AudioFrame, as apparently the first one was encoded okay, because they are given the same attribute values aside from the Presentation Time Stamp (pts), but leaving this out and letting PyAV/ffmpeg generate the PTS by itself does not fix the error, so an incorrect PTS does not seem the cause.
After a brief glance in av/filter/context.pyx the exception must come from a bad return value from res = lib.av_buffersrc_write_frame(self.ptr, frame.ptr)
Trying to dig into av_buffersrc_write_frame from the ffmpeg source it is not clear what could be causing this error. The only obvious one is a mismatch between channel layouts, but my code is setting the layout the same in the Stream and the Frame. That problem had been found by an old question pyav - cannot save stream as mono and their answer (that one parameter required is undocumented) is the only reason the code now has the layout='mono' argument when making the stream.
The program output shows layout #4 is being used, and from https://github.com/FFmpeg/FFmpeg/blob/release/4.2/libavutil/channel_layout.h you can see this is the value for symbol AV_CH_FRONT_CENTER which is the only channel in the MONO layout.
The mismatch is surely some other object property or an undocumented parameter requirement.
How do you encode mono audio to a compressed stream with PyAV?

Snakemake - parameter file treated as a wildcard

I have written a pipeline in Snakemake. It's an ATAC-seq pipeline (bioinformatics pipeline to analyze genomics data from a specific experiment). Basically, until merging alignment step I use {sample_id} wildcard, to later switch to {sample} wildcard (merging two or more sample_ids into one sample).
working DAG here (for simplicity only one sample shown; orange and blue {sample_id}s are merged into one green {sample}
Tha all rule looks as follows:
configfile: "config.yaml"
SAMPLES_DICT = dict()
with open(config['SAMPLE_SHEET'], "r+") as fil:
next(fil)
for lin in fil.readlines():
row = lin.strip("\n").split("\t")
sample_id = row[0]
sample_name = row[1]
if sample_name in SAMPLES_DICT.keys():
SAMPLES_DICT[sample_name].append(sample_id)
else:
SAMPLES_DICT[sample_name] = [sample_id]
SAMPLES = list(SAMPLES_DICT.keys())
SAMPLE_IDS = [sample_id for sample in SAMPLES_DICT.values() for sample_id in sample]
rule all:
input:
# FASTQC output for RAW reads
expand(os.path.join(config['FASTQC'], '{sample_id}_R{read}_fastqc.zip'),
sample_id = SAMPLE_IDS,
read = ['1', '2']),
# Trimming
expand(os.path.join(config['TRIMMED'],
'{sample_id}_R{read}_val_{read}.fq.gz'),
sample_id = SAMPLE_IDS,
read = ['1', '2']),
# Alignment
expand(os.path.join(config['ALIGNMENT'], '{sample_id}_sorted.bam'),
sample_id = SAMPLE_IDS),
# Merging
expand(os.path.join(config['ALIGNMENT'], '{sample}_sorted_merged.bam'),
sample = SAMPLES),
# Marking Duplicates
expand(os.path.join(config['ALIGNMENT'], '{sample}_sorted_md.bam'),
sample = SAMPLES),
# Filtering
expand(os.path.join(config['FILTERED'],
'{sample}.bam'),
sample = SAMPLES),
expand(os.path.join(config['FILTERED'],
'{sample}.bam.bai'),
sample = SAMPLES),
# multiqc report
"multiqc_report.html"
message:
'\n#################### ATAC-seq pipeline #####################\n'
'Running all necessary rules to produce complete output.\n'
'############################################################'
I know it's too messy, I should only leave the necessary bits, but here my understanding of snakemake fails cause I don't know what I have to keep and what I should delete.
This is working, to my knowledge exactly as I want.
However, I added a rule:
rule hmmratac:
input:
bam = os.path.join(config['FILTERED'], '{sample}.bam'),
index = os.path.join(config['FILTERED'], '{sample}.bam.bai')
output:
model = os.path.join(config['HMMRATAC'], '{sample}.model'),
gappedPeak = os.path.join(config['HMMRATAC'], '{sample}_peaks.gappedPeak'),
summits = os.path.join(config['HMMRATAC'], '{sample}_summits.bed'),
states = os.path.join(config['HMMRATAC'], '{sample}.bedgraph'),
logs = os.path.join(config['HMMRATAC'], '{sample}.log'),
sample_name = '{sample}'
log:
os.path.join(config['LOGS'], 'hmmratac', '{sample}.log')
params:
genomes = config['GENOMES'],
blacklisted = config['BLACKLIST']
resources:
mem_mb = 32000
message:
'\n######################### Peak calling ########################\n'
'Peak calling for {output.sample_name}\n.'
'############################################################'
shell:
'HMMRATAC -Xms2g -Xmx{resources.mem_mb}m '
'--bam {input.bam} --index {input.index} '
'--genome {params.genome} --blacklist {params.blacklisted} '
'--output {output.sample_name} --bedgraph true &> {log}'
And into the rule all, after filtering, before multiqc, I added:
# Peak calling
expand(os.path.join(config['HMMRATAC'], '{sample}.model'),
sample = SAMPLES),
Relevant config.yaml fragments:
# Path to blacklisted regions
BLACKLIST: "/mnt/data/.../hg38.blacklist.bed"
# Path to chromosome sizes
GENOMES: "/mnt/data/.../hg38_sizes.genome"
# Path to filtered alignment
FILTERED: "alignment/filtered"
# Path to peaks
HMMRATAC: "peaks/hmmratac"
This is the error* I get (It goes on for every input and output of the rule). *Technically it's a warning but it halts execution of snakemake so I am calling it an error.
File path alignment/filtered//mnt/data/.../hg38.blacklist.bed.bam contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
WARNING:snakemake.logging:File path alignment/filtered//mnt/data/.../hg38.blacklist.bed.bam contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
It isn't actually ... - I just didn't feel safe providing an absolute path here.
For a couple of days, I have struggled with this error. Looked through the documentation, listened to the introduction. I understand that the above description is far from perfect (it is huge bc I don't even know how to work it down to provide minimal reproducible example...) but I am desperate and hope you can be patient with me.
Any suggestions as to how to google it, where to look for an error would be much appreciated.
Technically it's a warning but it halts execution of snakemake so I am calling it an error.
It would be useful to post the logs from snakemake to see if snakemake terminated with an error and if so what error.
However, in addition to Eric C.'s suggestion to use wildcards.sample instead of {sample} as file name, I think that this is quite suspicious:
alignment/filtered//mnt/data/.../hg38.blacklist.bed.bam
/mnt/ is usually at the root of the file system and you are prepending to it a relative path (alignment/filtered). Are you sure it is correct?

Compile Issue under ProCOBOL due to SQLBEX

Have been trying to compile an existing Pro*COBOL program after making few changes, have verified the syntax and Non-Printable character which may cause issue for compilation.
But at the end, compilation for Pro*COBOL failing due one of .(dot) appearing under CALL to SQLBEX for the embedded SQL (Line Number 18106 under the listing).
See below code for more details, code snippet has been taken from Pro*COBOL code and the listing generated during compilation.
For other instances where SQLBEX is being called, the .(dot) doesn't appear. Would really appreciate any help.
Code under the listing:
18085 IF SQLCODE IN SQLCA = WS-DEADLOCK-WAIT-FOR-RESRC 26825000
18086 SET DEADLOCK TO TRUE 26826000
18087
18088* EXEC SQL 26827000
18089* COMMIT 26828000
18090* END-EXEC
18091 MOVE 1 TO SQL-ITERS
18092 MOVE 2914 TO SQL-OFFSET
* Micro Focus COBOL for UNIX V4.0 revision 004 18-Jan-17 07:31 Page 313
* cmcomc23.cob
18093 MOVE 0 TO SQL-OCCURS
18094 CALL "SQLADR" USING
18095 SQLCUD
18096 SQL-CUD
18097 CALL "SQLADR" USING
18098 SQLCA
18099 SQL-SQLEST
18100 MOVE 256 TO SQL-SQLETY
18101
18102 CALL "SQLBEX" USING
18103 SQLCTX
18104 SQLEXD
18105 SQLFPN
18106 .
18107 26829000
18108 DISPLAY 'DEAD LOCK OCCURED ' 26829100
18109 GO TO 9000-EXIT 26829200
18110 ELSE 26829300
* 562-S****************************************************************( 308)**
** An "ELSE" phrase did not have a matching IF and was discarded.
18111 SET NO-DEADLOCK TO TRUE 26829400
18112 END-IF. 26829500
* 564-S********** ( 313)**
** A scope-delimiter did not have a matching verb and was discarded.
Original Code under Program:
268210******************************************************************26821000
268221 9000-SQL-ERROR SECTION. 26822100
268230******************************************************************26823000
268250 EXEC SQL 26824000
268250 WHENEVER SQLERROR CONTINUE 26824000
268250 END-EXEC. 26824000
268240 26824000
268250 IF SQLCODE IN SQLCA = WS-DEADLOCK-WAIT-FOR-RESRC 26825000
268260 SET DEADLOCK TO TRUE 26826000
268270 EXEC SQL 26827000
268280 COMMIT 26828000
268290 END-EXEC 26829000
268291 DISPLAY 'DEAD LOCK OCCURED ' 26829100
268292 GO TO 9000-EXIT 26829200
268293 ELSE 26829300
268294 SET NO-DEADLOCK TO TRUE 26829400
268295 END-IF. 26829500
268296 26829600
268297 MOVE 'E' TO WS-ERR-SEVERITY-CD. 26829700

Meaning of yywrap() in flex

What does this instructions mean in flex (lex) :
#define yywrap() 1
and this [ \t]+$
i find it in the code below:
(%%
[ \t]+ putchar('_');
[ \t]+%
%%
input "hello world"
output "hello_world"
)
According to The Lex & Yacc Page :
When the scanner receives an end-of-file indication from YY_INPUT, it then checks the yywrap() function. If yywrap() returns false (zero), then it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.
The #define is used to simplify building the program (so that no -ll linkage option is needed).
Further reading:
What are lex and yacc?
Routines to reprocess input
6. How do Lex and YACC work internally (Lex and YACC primer/HOWTO)

TokyoCabinet's Ruby C interface can't bzip

I'm using the Ruby official Ruby C interface and am not able to bzip working. It did build with bzip support, ./configure said:
checking bzlib.h usability... yes
checking bzlib.h presence... yes
checking for bzlib.h... yes
So I wrote this example program that just writes an entry to two files, one supposedly bzip'd and one not. Neither is compressed; aside from the simple file size test at the end I can edit the with_bzip.tcb
file and see the raw string text there.
require 'tokyocabinet'
include TokyoCabinet
def write filename, options
File.unlink filename if File.exists? filename
bdb = BDB::new
bdb.tune(0, 0, 0, -1 -1, options) or raise "Couldn't tune"
bdb.open(filename, BDB::OWRITER | BDB::OCREAT | BDB::OLCKNB) or raise "Couldn't open"
bdb["test"] = "This string should be compressed and not appear raw.\n" * 10000
bdb.close
end
write 'without_bzip.tcb', 0
write 'with_bzip.tcb', BDB::TBZIP
puts "Not actually compressed" unless File.size('with_bzip.tcb') < File.size('without_bzip.tcb')
What makes it worse is that if I try a preview release of Oklahoma Mixer (example following - though I don't have the reputation to add the new tag), it compresses fine. When I tucked some debugging into its try() call, it seems to be making the same call to tune(0, 0, 0, -1, -1, 4). I'm pretty completely stumped - can anyone tell me what my code above is doing wrong?
require 'oklahoma_mixer'
OklahomaMixer.open("minimal_om.tcb", :opts => 'lb') do |db|
db["test"] = "This string should be compressed and not appear raw.\n" * 10000
end
It is an evil, subtle bug. I left out a comma in the tune() call and wrote -1 -1 instead of -1, -1. All the arguments are optional, so it was quietly not bzipping. Argh.

Resources