Modifying HEVC HM reference codec

I'm very new to the field of HEVC codecs. I'm using the HM reference code, version 10.1. My task is to allow block sizes up to 128x128, so in the configuration file I set the parameters MaxCUWidth and MaxCUHeight to 128.
This ensures that the maximum CU size is 128x128. However, the code crashes with the errors:
Error: Minimum CU width must be greater than minimum transform size
Error: Minimum CU Height must be greater than minimum transform size
My problem is that I'm not able to figure out where and how the code needs to be changed so that it does not affect other parameters. Any kind of help regarding this will be really valuable to me.

The minimum CU width/height is derived from the MaxCUWidth/MaxCUHeight and MaxPartitionDepth parameters. MaxPartitionDepth prescribes how often a CTU can be split, so if you also increase MaxPartitionDepth by 1, the minimum CU size stays the same and it should work.
Alternatively, you can increase the parameter QuadtreeTULog2MinSize by 1 in order to increase the minimum transform size, but I would recommend the first approach, as it only increases the CTU size without changing the rest of the configuration.
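A minimal sketch of the relevant configuration lines, assuming the common default of MaxPartitionDepth 4 (check your own cfg file for its actual value):

MaxCUWidth        : 128   # CTU width
MaxCUHeight       : 128   # CTU height
MaxPartitionDepth : 5     # one extra split level keeps the minimum CU size unchanged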

Thanks for helping me, it's done now. For HEVC HM version 10.1 only the depth needs to be incremented by 1, while MaxCUWidth and MaxCUHeight must be equal to 128. For that purpose I changed the log2BlkSize <= 7 check in
TComPattern.cpp
and changed MAX_CU_DEPTH in
TComRom.h
After that I got CU block sizes (dimensions) of 128x128.


Random number generation requires too many iterations

I am running simulations in AnyLogic and I'm trying to calibrate the following distribution:
Jump = normal(coef1, coef2, -1, 1);
However, I keep getting the following message as soon as I start the calibration (experimentation):
Random number generation requires too many iterations (> 10000)
I tried to replace -1 and 1 with other values and kept getting the same thing.
I also tried to change the bounds of coef1 and coef2 to things like [0, 1], but I still get the same error.
I don't get it.
Any ideas?
The four-parameter normal method is not deprecated and is not a "calibration where coef1 and coef2 are the coefficients to be solved for". Where did you get that understanding from? Or are you saying that you're using your AnyLogic experiment (possibly a multi-run or optimisation experiment) to 'calibrate' that distribution? In that case you need to explain what you mean by 'calibrate' here: what is your desired outcome?
If you look in the API reference (AnyLogic classes and functions --> API Reference --> com.xj.anylogic.engine --> Utilities), you'll see that it's a method that samples a truncated normal distribution.
public double normal(double min,
                     double max,
                     double shift,
                     double stretch)
The first two parameters are the min and max (it will sample repeatedly, discarding values outside the [min, max] range); the last two are effectively the mean and standard deviation. So you will get the error you mentioned if min and max force it to sample too many times to get a value in range.
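If the intent was a normal distribution with mean coef1 and standard deviation coef2, truncated to [-1, 1], then the arguments are presumably just in the wrong order, and the call would need to be:

Jump = normal(-1, 1, coef1, coef2);

With the original order, -1 and 1 are read as the mean and standard deviation while coef1 and coef2 become the truncation bounds, so the calibration can easily drive the bounds into a region the distribution almost never hits, which triggers the iteration limit.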
API reference details below:
Generates a sample of truncated Normal distribution. Distribution normal(1, 0) is stretched by stretch coefficient, then shifted to the right by shift, after that it is truncated to fit in [min, max] interval. Truncation is performed by discarding every sample outside this interval and taking subsequent try. For more details see normal(double, double)
Parameters:
min - the minimum value that this function will return. The distribution is truncated to return values above this. If the sample (stretched and shifted) is below this value it will be discarded and another sample will be drawn. Use -infinity for "No limit".
max - the maximum value that this function will return. The distribution is truncated to return values below this. If the sample (stretched and shifted) is bigger than this value it will be discarded and another sample will be drawn. Use +infinity for "No limit".
shift - the shift parameter that indicates how much the (stretched) distribution will be shifted to the right = mean value
stretch - the stretch parameter that indicates how much the distribution will be stretched = standard deviation
Returns:
the generated sample
Note that, depending on the version, AnyLogic's documentation may not list a version of normal that takes four arguments. Also note that the two-argument normal(sigma, mean) uses an order that is unusual (to probabilists/statisticians), putting the standard deviation before the mean.

Why does the GIF spec require at least 2-bits for the initial LZW code size?

I've been trying to figure out why the GIF89a spec requires the initial LZW code size to be at least 2 bits, even when encoding 1-bit (black & white) images. Appendix F of the spec says the following:
ESTABLISH CODE SIZE
The first byte of the Compressed Data stream is a value indicating the minimum number of bits required to represent the set of actual pixel values. Normally this will be the same as the number of color bits. Because of some algorithmic constraints however, black & white images which have one color bit must be indicated as having a code size of 2.
I'm curious as to what these algorithmic constraints are. What would prevent the variant of LZW used in GIF from using a code size of 1? Was this just a limitation of early encoders or decoders? Or is there some weird edge case that can manifest itself with just the right combination of bits? Or is there something completely different going on here?
In addition to the codes for 0 and 1, you also have a clear code and an end of information code.
Quoting from the spec:
The output codes are of variable length, starting at <code size>+1 bits per code, up to 12 bits per code. This defines a maximum code value of 4095 (0xFFF). Whenever the LZW code value would exceed the current code length, the code length is increased by one.
If you start with a code size of 1, the code size needs to be increased immediately by this rule.
This requirement removes one special case from implementations: with a code size of 1, the first dictionary (phrase) code would need a width of code size + 2 bits, whereas in every other case the initial width is code size + 1.
The drawback is a very small decrease in compression ratio for two-color pictures.
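To make that concrete, here is a small Go sketch (variable names are mine, not from the spec) that derives the special codes for both cases:

package main

import "fmt"

func main() {
    for _, codeSize := range []int{1, 2} {
        clear := 1 << codeSize // clear code
        eoi := clear + 1       // end-of-information code
        first := clear + 2     // first dictionary (phrase) code
        width := codeSize + 1  // initial output code width in bits
        fmt.Printf("code size %d: clear=%d eoi=%d first=%d initial width=%d bits\n",
            codeSize, clear, eoi, first, width)
    }
}

With a code size of 1 this prints first=4, which does not fit in the initial 2-bit width, so the width would have to jump to code size + 2 right away; with a code size of 2, first=6 still fits in the initial 3 bits.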

How is a slice enlarged by append? Is the capacity always doubled?

When appending to a slice, the slice may be enlarged if necessary. Because the spec doesn't specify the algorithm, I am curious about it.
I tried to find the append implementation in the Go source code, but couldn't find it.
Could anyone explain the algorithm for enlarging slices? Is the capacity always doubled? Or could anyone point me to the source location of append so I can check it myself?
The code responsible for growing slices in append can be found here (growslice in runtime/slice.go).
As of 2014-2020 the implemented rules are:
If appending to the slice will increase its length to more than double the current capacity, the new capacity is set to the new length.
Otherwise, double the capacity if the current length is less than 1024, or grow it by 25% if it is larger, repeating this step until the new capacity fits the desired length. For example, appending one element to a full slice of length 1024 yields a new capacity of 1280 (the runtime may round capacities up further to a convenient allocation size).
Presumably this isn't part of the specification so the heuristics can be changed in future if needed. You can check the most current version of this implementation on the master branch.
In Go 1.18 it changed.
https://github.com/golang/go/commit/2dda92ff6f9f07eeb110ecbf0fc2d7a0ddd27f9d
starting cap    growth factor
256             2.0
512             1.63
1024            1.44
2048            1.35
4096            1.30
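If you want to observe the behaviour of your own toolchain, a minimal probe like this prints every capacity change as a slice grows (the exact numbers depend on the Go version and element type):

package main

import "fmt"

func main() {
    var s []int
    oldCap := -1
    for i := 0; i < 5000; i++ {
        s = append(s, i)
        if cap(s) != oldCap { // capacity jumped: append reallocated
            fmt.Printf("len=%4d cap=%4d\n", len(s), cap(s))
            oldCap = cap(s)
        }
    }
}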

Does Global Work Size Need to be Multiple of Work Group Size in OpenCL?

Does the global work size (in each dimension) need to be a multiple of the work group size (in each dimension) in OpenCL?
If so, is there a standard way of handling matrices that are not a multiple of the work group dimensions? I can think of two possibilities:
Dynamically set the size of the work group dimensions to a factor of the global work dimensions. (This would incur the overhead of finding a factor and possibly set the work group to a non-optimal size.)
Increase the dimensions of the global work to the nearest multiple of the work group dimensions, keeping all input and output buffers the same but checking bounds in the kernel to avoid segfaulting, i.e. do nothing on the work items out of bounds of the desired output. (This seems like the better way.)
Would the second way work? Is there a better way? (Or is it not necessary because work group dimensions need not divide global work dimensions?)
Thanks!
Thanks for the link, Chad. But actually, if you read on:
If local_work_size is specified, the values specified in global_work_size[0], ... global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0], ... local_work_size[work_dim - 1].
So YES, the global work size must be a multiple of the local work size.
I also think that rounding the global work size up to the nearest multiple and being careful about bounds should work; I'll post a comment when I get around to trying it.
This seems to be an old post, but let me update this post with some new information. Hopefully, it could help someone else.
Does Global Work Size (Dimensions) Need to be Multiple of Work Group Size (Dimensions) in OpenCL?
Answer: True up to OpenCL 2.0. Before CL2.0, your global work size must be a multiple of the local work size, otherwise you will get an error (CL_INVALID_WORK_GROUP_SIZE) when you execute clEnqueueNDRangeKernel.
From CL2.0 onwards this is no longer required: you can use whatever global work size fits your application's dimensions. However, the hardware implementation might still use the "old" way internally, i.e. pad the global work size, which makes performance highly dependent on the hardware architecture; you may see quite different performance on different hardware/platforms. You may also want your application to stay backward compatible with platforms that only support CL up to version 1.2. So I think this CL2.0 feature is mainly for programming convenience; for more controllable performance and backward compatibility I suggest you still use the method you mentioned:
Increase the dimensions of the global work to be the nearest multiple of the work group dimensions, keeping all input and output buffers the same but checking bounds in the kernel to avoid segfaulting, i.e. do nothing on the work items out of bound of the desired output. (This seems like the better way.)
Answer: you are absolutely right; this is the right way to handle such cases. Carefully design the local work group size (considering factors such as register usage, cache hits/misses, and memory access patterns), then pad your global work size to a multiple of the local work size, and you are good to go.
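A minimal sketch of the padding approach (kernel and variable names here are illustrative, not from the question):

// OpenCL C kernel: one bounds check guards the padded region.
__kernel void scale(__global float *data, const uint n)
{
    size_t gid = get_global_id(0);
    if (gid >= n)   // work-items in the padding do nothing
        return;
    data[gid] *= 2.0f;
}

// Host side (C): round the global size up to a multiple of the local size.
size_t local_size  = 64;  // chosen work-group size
size_t global_size = ((n + local_size - 1) / local_size) * local_size;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                       &global_size, &local_size, 0, NULL, NULL);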
Another thing to consider: if there is a lot of boundary-checking work in your kernel, you can store the data in an image object instead of a buffer. For images, the boundary check is done automatically by the hardware, with almost no overhead in most implementations. So pad your global work size, store your data in an image object, and you can write your code normally without worrying about boundary checking.
According to the standard, it doesn't have to be, from what I saw. I think I would handle it with a branch, but I don't know exactly what kind of matrix operation you are doing.
http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf#page=131
global_work_size points to an array of work_dim unsigned values that describe the number of global work-items in work_dim dimensions that will execute the kernel function. The total number of global work-items is computed as global_work_size[0] * ... * global_work_size[work_dim - 1].
The values specified in global_work_size + corresponding values specified in global_work_offset cannot exceed the range given by the sizeof(size_t) for the device on which the kernel execution will be enqueued. The sizeof(size_t) for a device can be determined using CL_DEVICE_ADDRESS_BITS in table 4.3. If, for example, CL_DEVICE_ADDRESS_BITS = 32, i.e. the device uses a 32-bit address space, size_t is a 32-bit unsigned integer and global_work_size values must be in the range 1 .. 2^32 - 1. Values outside this range return a CL_OUT_OF_RESOURCES error.
