Modifying HEVC HM reference software - visual-studio

I'm very new to the HEVC codec. I'm using the HM reference software, version 10.1. My task is to support larger block sizes, up to 128x128. In the configuration file I set the parameters MaxCUWidth and MaxCUHeight to 128 and 128 respectively, set the depth to 5, and set
QuadtreeLog2MinSize = 2
QuadtreeLog2MaxSize = 6
This should set the maximum CU size to 128x128. However, the encoder aborts with the errors:
{
Error: Minimum CU width must be greater than minimum transform size
Error: Minimum CU Height must be greater than minimum transform size
Error: QuadtreeLog2MaxSize must be 5 or greater
}
My problem is that I can't figure out where and how the code needs to be changed so that it does not affect other parameters. Any kind of help with this would be really valuable to me.

Thanks for the help, it works now. For HEVC HM version 10.1, the depth only needs to be incremented by 1 while the CU width and CU height are both set to 128. To make that work I changed the log2BlkSize<=7 check in
TComPattern.cpp
and changed MAX_CU_DEPTH in
TComRom.h
After that I got CU block sizes (dimensions) of 128x128.

Related

Measuring decibels (dB) using soundmeter

I'm able to use soundmeter to measure the "root-mean-square (RMS) of sound fragments". Now I want to get the decibels dB measurement from this value somehow.
As far as I can tell, the formula is something like:
dB = 20 * log(RMS * P, 10)
where log is base 10 and P is some unknown scaling factor, which (as far as I can tell from https://en.wikipedia.org/wiki/Decibel) depends on the microphone that is used.
Now if I use a sound level app on my iPhone, I see that the average noise in the room is 68 dB, and the measurements I get from soundmeter --collect --seconds 10 are:
Collecting RMS values...
149 Timeout Collected result:
min: 97
max: 149
avg: 126
Is something wrong with this logic? And how can I determine what value of P to use without deriving it from measurements (which I'm tempted to do, and which seems to work)? I'd assume I'd have to look it up online in some specs page, but that seems quite difficult, and on OS X I'm not sure how to figure out what the value of P would be. It also seems to depend on the microphone volume level setting in OS X.
soundmeter is not returning the RMS in the unit one would normally expect, which would be calibrated such that a full-scale digital sine wave is 1.0 and silence is 0.0.
I found these snippets of code:
In https://github.com/shichao-an/soundmeter/blob/master/soundmeter/meter.py
data = self.output.getvalue()
segment = pydub.AudioSegment(data)
rms = segment.rms
which calls https://github.com/jiaaro/pydub/blob/master/pydub/audio_segment.py
def rms(self):
    if self.sample_width == 1:
        return self.set_sample_width(2).rms
    else:
        return audioop.rms(self._data, self.sample_width)
In the function immediately below it, you can see that the RMS is divided by the maximum sample value to give the desired scale. I assume ratio_to_db is 20*log10(x).
def dBFS(self):
    rms = self.rms
    if not rms:
        return -float("infinity")
    return ratio_to_db(self.rms / self.max_possible_amplitude)
In your particular case, divide the collected RMS level by 2^(N-1), where N is the number of bits per sample (this is the largest amplitude of a signed sample, and it is what max_possible_amplitude returns), to get the scaled RMS level, and then convert that to dB. This number is in dBFS, decibels relative to digital full scale, and will lie between 0 and -inf. To get a positive dBSPL value you need to know the sensitivity of your microphone. You can find it by looking up the specs or by calibrating against a known reference. If you trust the app on your iPhone, and it reports room noise of 68 dBSPL while your program reads -40 dBFS, then the conversion is simple arithmetic: add the difference between the two (108) to the dBFS number you get.
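The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, 16-bit signed samples are an assumption (check what your soundmeter configuration actually records), and the 68 dBSPL / -40 dBFS pair is the hypothetical calibration point from the answer, not a measurement.

```python
import math

def rms_to_dbfs(rms, bits_per_sample=16):
    """Convert a raw integer RMS value to dBFS (0 dBFS = digital full scale)."""
    if rms <= 0:
        return float("-inf")  # silence
    # Assumption: signed PCM, so full scale is 2^(N-1), e.g. 32768 for 16 bits.
    full_scale = 2 ** (bits_per_sample - 1)
    return 20 * math.log10(rms / full_scale)

def calibration_offset(reference_dbspl, measured_dbfs):
    """Offset to add to dBFS readings, from one trusted reference reading."""
    return reference_dbspl - measured_dbfs

def dbfs_to_dbspl(dbfs, offset):
    """Apply a previously determined calibration offset."""
    return dbfs + offset

# Hypothetical calibration point from the answer: 68 dBSPL reads as -40 dBFS.
offset = calibration_offset(68, -40)
print(offset)                             # calibration offset in dB
print(rms_to_dbfs(126))                   # dBFS for the average RMS in the question
print(dbfs_to_dbspl(rms_to_dbfs(126), offset))
```

The offset is only valid for the microphone and input gain setting it was measured with. Changing the input volume changes the offset.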

Modifying HEVC HM reference codec

I'm very new to the HEVC codec. I'm using the HM reference software, version 10.1. My task is to support block sizes up to 128x128. In the configuration file I set the parameters MaxCUWidth and MaxCUHeight to 128 and 128 respectively.
This should set the maximum CU size to 128x128. However, the encoder aborts with the errors:
Error: Minimum CU width must be greater than minimum transform size
Error: Minimum CU Height must be greater than minimum transform size
My problem is that I can't figure out where and how the code needs to be changed so that it does not affect other parameters. Any kind of help with this would be really valuable to me.
The minimum CU width/height is derived from the MaxCUWidth/MaxCUHeight and MaxPartitionDepth parameters. MaxPartitionDepth prescribes how many times a CTU can be split. So if you also increase MaxPartitionDepth by 1, it should work.
Alternatively, you can increase the parameter QuadtreeTULog2MinSize by 1 to increase the minimum transform size, but I would recommend the first approach, as it only increases the CTU size without changing the rest of the configuration.
Thanks for the help, it works now. For HEVC HM version 10.1, the depth only needs to be incremented by 1 while the CU width and CU height are both set to 128. To make that work I changed the log2BlkSize<=7 check in
TComPattern.cpp
and changed MAX_CU_DEPTH in
TComRom.h
After that I got CU block sizes (dimensions) of 128x128.

Why does the GIF spec require at least 2-bits for the initial LZW code size?

I've been trying to figure out why the GIF89a spec requires the initial LZW code size to be at least 2 bits, even when encoding 1-bit (black & white) images. Appendix F of the spec says the following:
ESTABLISH CODE SIZE
The first byte of the Compressed Data stream is a value indicating the minimum number of bits required to represent the set of actual pixel values. Normally this will be the same as the number of color bits. Because of some algorithmic constraints however, black & white images which have one color bit must be indicated as having a code size of 2.
I'm curious as to what these algorithmic constraints are. What would prevent the variant of LZW used in GIF from using a code size of 1? Was this just a limitation of the early encoders or decoders? Or is there some weird edge case that can manifest itself with just the right combination of bits? Or is there something completely different going on here?
In addition to the codes for 0 and 1, you also have a clear code and an end of information code.
Quoting from the spec:
The output codes are of variable length, starting at <code size>+1 bits per
code, up to 12 bits per code. This defines a maximum code value of 4095
(0xFFF). Whenever the LZW code value would exceed the current code length, the
code length is increased by one.
If you start with a code size of 1, the code size needs to be increased immediately by this rule.
This restriction removes one special case (one extra if) from implementations: with codesize==1 the code for the first dictionary phrase would have width==codesize+2, whereas in all other cases width==codesize+1.
The drawback is a very small decrease in compression ratio for 2-color pictures.
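The arithmetic behind these two answers can be checked with a short sketch. This is illustrative Python, not code from any GIF library. The function name and dictionary keys are invented for this example, and a minimum code size of 1 is evaluated only hypothetically, since the spec does not allow it.

```python
def initial_lzw_layout(min_code_size):
    """Code assignments at the start of a GIF-style LZW stream."""
    clear_code = 1 << min_code_size   # first code after the pixel values
    end_code = clear_code + 1         # end of information code
    first_free = clear_code + 2       # first code available for a new phrase
    return {
        "clear": clear_code,
        "end": end_code,
        "first_free": first_free,
        "initial_width": min_code_size + 1,
        # Bits needed just to represent the first free code.
        "width_for_first_free": first_free.bit_length(),
    }

for n in (1, 2, 8):
    print(n, initial_lzw_layout(n))
```

With a code size of 1, the values 0 and 1 plus the clear and end codes already fill all four 2-bit codes, so the first new phrase (code 4) needs 3 bits, which is codesize+2. With a code size of 2 or more, the first free code still fits in codesize+1 bits.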

Integer Time Series compression

Is there a well known documented algorithm for (positive) integer streams / time series compression, that would:
have variable bit length
work on deltas
My input data is a stream of temperature measurements from a sensor (more specifically a TMP36 read out by an Arduino). It is physically impossible that big jumps occur between measurements (time constant of sensor). I therefore think my compression algorithm should work on deltas (set a base on stream start and then only difference to next value). Because gaps are limited, I want variable bit length, because differences lower than 4 fit on 2 bits, lower than 8 on 3 bits and so on... But there is a dilemma between telling in stream the bit size of the next delta and just working on, say, 3 bit deltas and telling size only when bigger for instance.
Any idea what algorithm solves that one?
Use variable-length integers to code the deltas between values, and feed that to zlib to do the compression.
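A minimal sketch of that suggestion, under some assumptions the answer does not spell out: deltas can be negative, so they are first mapped to non-negative integers with a zigzag mapping (my choice here), and the variable-length integer format is the common 7-bits-per-byte form with a continuation bit. All function names and the sample readings are made up for this example.

```python
import zlib

def zigzag(n):
    """Map a signed integer to a non-negative one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(z):
    """Inverse of zigzag."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)

def encode_varint(value, out):
    """Append value to out, 7 bits per byte, high bit set on all but the last byte."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)

def decode_varints(data):
    """Decode a byte string made of back-to-back varints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values

def compress(samples):
    out = bytearray()
    previous = 0  # the first "delta" is the first sample itself
    for sample in samples:
        encode_varint(zigzag(sample - previous), out)
        previous = sample
    return zlib.compress(bytes(out), 9)

def decompress(blob):
    samples, previous = [], 0
    for z in decode_varints(zlib.decompress(blob)):
        previous += unzigzag(z)
        samples.append(previous)
    return samples

readings = [153, 153, 154, 154, 153, 152, 152, 151, 300, 20]  # made-up ADC values
blob = compress(readings)
print(len(blob), decompress(blob) == readings)
```

On very short inputs the zlib header and trailer may outweigh the savings, so this kind of pipeline is more likely to pay off on longer streams.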
First of all, there are different formats in existence. The first thing I would do is get rid of the sign. A sign is usually a distraction when thinking about compression. I usually use the scheme where every non-negative value v becomes 2*v and every negative value becomes 2*(-v)-1. So 0 = 0, -1 = 1, 1 = 2, -2 = 3, 2 = 4, and so on.
Since with that scheme nothing like 0b11111111 = -1 occurs, the leading one bits are gone. Now you can think about how to compress those symbols / numbers. One thing you can do is create a representative sample and use it to train a static Huffman code. This should be possible within your on-chip constraints. Another, simpler approach is to use Huffman codes for the bit lengths and write the value bits to the stream. So the mapped value 0 has bit length 0, 1 (that is, -1) has bit length 1, 2 and 3 have bit length 2, and so on. By using Huffman codes to describe the bit length you get quite compact literals.
I usually use a mixture. I encode the most frequent symbols / values directly and encode the less frequent numbers as a bit length plus the bit pattern of the actual value. This way you stay compact and do not have to deal with excessive tables (only 64 bit-length symbols are possible for 64-bit values).
There are also other schemes, such as a continuation bit, where for example the first (or highest) bit of every byte marks whether the value continues: as long as the bit is set, another byte of the integer follows. If it is zero, it is the last byte of the value.
I usually train a static Huffman code for such purposes. It's easy, and you can even generate the encoder and decoder as source code (simply create if/switch statements and write your tables as arrays in your code).
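The sign mapping and the bit-length symbols described in this answer can be sketched as follows. This is an illustration only: the function names and the sample deltas are invented, and the histogram at the end is what one would hand to a Huffman code builder, which is not shown.

```python
from collections import Counter

def zigzag(n):
    """Sign-free mapping from the answer: v -> 2*v, -v -> 2*v - 1."""
    return 2 * n if n >= 0 else -2 * n - 1

def bit_length_symbol(delta):
    """Return (bit length, mapped value) for a signed delta."""
    z = zigzag(delta)
    return z.bit_length(), z  # int.bit_length() is 0 for the value 0

deltas = [0, 0, 1, -1, 0, 2, -2, 1, 0, 0]  # made-up, smooth sensor deltas
symbols = [bit_length_symbol(d) for d in deltas]

# Frequency of each bit-length symbol: the input for training a static
# Huffman code over bit lengths.
histogram = Counter(length for length, _ in symbols)
print(symbols)
print(histogram)
```

For smooth data most deltas fall into the first few bit-length classes, which is why a short Huffman code for the length followed by the raw value bits can stay compact.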
You can use Integer compression methods with delta or delta of delta encoding like used in TurboPFor Integer Compression. Gamma coding can be also used if the deltas have very small values.
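As a rough illustration of gamma coding, here is a sketch of Elias gamma coding in Python. It is not taken from TurboPFor, the function names are invented, and bits are kept in a string of '0' and '1' characters purely for readability. Elias gamma only encodes integers >= 1, so a sign-free delta that can be 0 would have to be shifted by one first.

```python
def gamma_encode(n):
    """Elias gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma encodes integers >= 1 only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def gamma_decode_all(bits):
    """Decode a string of concatenated Elias gamma codes."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":      # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

values = [1, 2, 3, 4, 5, 17]
stream = "".join(gamma_encode(v) for v in values)
print(stream, gamma_decode_all(stream) == values)
```

Small values get short codes (1 takes a single bit, 2 and 3 take three bits), which is why this suits deltas that are almost always tiny.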
The current state of the art for this problem is Quantile Compression. It compresses numerical sequences such as integers and typically achieves 35% higher compression ratio than other approaches. It has delta encoding as a built-in feature.
CLI example:
cargo run --release compress \
--csv my.csv \
--col-name my_col \
--level 6 \
--delta-order 1 \
out.qco
Rust API example:
let my_nums: Vec<i64> = ...
let compressor = Compressor::<i64>::from_config(CompressorConfig {
    compression_level: 6,
    delta_encoding_order: 1,
});
let bytes: Vec<u8> = compressor.simple_compress(&my_nums);
println!("compressed down to {} bytes", bytes.len());
It does this by describing each number with a Huffman code for a range (a [lower, upper] bound) followed by an exact offset into that range.
By strategically choosing the ranges based on your data, it comes close to the Shannon entropy of the data distribution.
Since your data comes from a temperature sensor, your data should be very smooth, and you may even consider delta orders higher than 1 (e.g. delta order 2 is "delta-of-deltas").

How the slice is enlarged by append? Is the capacity always doubled?

When appending to a slice, the slice may be enlarged if necessary. Because the spec doesn't specify the algorithm, I am curious about it.
I tried to find the append implementation in the Go source code, but couldn't find it.
Could anyone explain the algorithm used for enlarging a slice? Is the capacity always doubled? Or could anyone point me to where append lives in the source code? I can check it myself.
The code responsible for growing slices in append can be found here.
As of 2014-2020 the implemented rules are:
If appending to the slice will increase its length by more than double, the new capacity is set to the new length.
Otherwise, double the capacity if the current length is less than 1024, or increase it by 25% if the length is 1024 or more. Repeat this step until the new capacity fits the desired length.
Presumably this isn't part of the specification so the heuristics can be changed in future if needed. You can check the most current version of this implementation on the master branch.
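The rules listed above can be simulated with a small sketch. This is a Python model of the description in this answer, not the Go runtime code. The function names are invented, and the real runtime additionally rounds the result up to an allocator size class, which this sketch ignores, so observed capacities can be larger.

```python
def grow_pre_1_18(old_len, old_cap, needed_cap):
    """Model of the pre-1.18 growth rule as described in the answer."""
    double_cap = old_cap * 2
    if needed_cap > double_cap:
        return needed_cap          # more than double: jump straight to the need
    if old_len < 1024:
        return double_cap          # small slices double
    new_cap = old_cap
    while new_cap < needed_cap:    # large slices grow by 25% per step
        new_cap += new_cap // 4
    return new_cap

def capacities_seen(n_appends):
    """Capacities chosen while appending one element at a time to an empty slice."""
    length = cap = 0
    seen = []
    for _ in range(n_appends):
        if length + 1 > cap:
            cap = grow_pre_1_18(length, cap, length + 1)
            seen.append(cap)
        length += 1
    return seen

print(capacities_seen(10))
print(grow_pre_1_18(1024, 1024, 1025))
```

So the capacity is not always doubled: a large single append sets it to exactly what is needed, and slices past the 1024 threshold grow more slowly.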
In Go 1.18 it changed.
https://github.com/golang/go/commit/2dda92ff6f9f07eeb110ecbf0fc2d7a0ddd27f9d
starting cap    growth factor
256             2.0
512             1.63
1024            1.44
2048            1.35
4096            1.30
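The growth factors in this table are consistent with a formula that blends from 2x towards 1.25x above a threshold of 256. The sketch below models that formula in Python as I understand it from the linked commit. Treat it as an approximation rather than the runtime source: the names are mine, and size-class rounding is again left out.

```python
THRESHOLD = 256  # capacity below which the slice still doubles

def grow_1_18(old_cap, needed_cap):
    """Model of the Go 1.18 growth rule (without allocator size-class rounding)."""
    double_cap = old_cap * 2
    if needed_cap > double_cap:
        return needed_cap
    if old_cap < THRESHOLD:
        return double_cap
    new_cap = old_cap
    while new_cap < needed_cap:
        # Smooth transition from 2x growth to 1.25x growth.
        new_cap += (new_cap + 3 * THRESHOLD) // 4
    return new_cap

for cap in (256, 512, 1024, 2048, 4096):
    print(cap, grow_1_18(cap, cap + 1) / cap)
```

The printed ratios match the table to within rounding, and for very large capacities the factor approaches 1.25.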
