General approach to temporal dithering in Processing

I'm looking for some input on a general approach to implementing temporal dithering in Processing.
Currently I have a Processing sketch which generates a hex file that can be sent to an APA102 LED strip over SPI. The frame rate I can achieve should be high enough to implement temporal dithering and increase the dynamic range of the LEDs, mainly at lower brightness. I looked into FastLED and Fadecandy to try to understand how it is done, but I can't really figure it out. Using these libraries is not an option, as the animation should be 'hardcoded' in the hex file.
Could someone point me in the right direction?
edit:
I currently implemented the following: First, I calculate the achievable frame rate of the LEDs, based on the number of LEDs in my string and the SPI clock speed; this gives me the number of dither frames I can insert. The LED strip can update at 420 fps, so I have 7 'virtual' frames per base frame while still keeping a 60 fps base refresh rate.
I then calculate a 7x7 lookup table which looks like this:
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 1 0 0 1 0
0 1 0 1 0 1 0
0 1 0 1 1 0 1
1 1 0 1 1 1 0
0 1 1 1 1 1 1
I do all the gamma and color correction calculations with floats, and every row in the lookup table corresponds to a step of 1/7 between two adjacent values. The table entries are then added to the floored RGB values to achieve the dithering.
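In simplified form, the application step looks roughly like this (one channel; ditherTable is the 7x7 table above and frame cycles 0..6; names are mine):

    // Sketch of my scheme: add this dither frame's bit from the table
    // to the floored channel value.
    int ditheredValue(float target, int frame, int[][] ditherTable) {
        int base = (int) Math.floor(target);    // floored channel value
        float frac = target - base;             // remainder in [0, 1)
        int row = (int) (frac * 7);             // which 1/7 step, row 0..6
        return base + ditherTable[row][frame];  // add 0 or 1 this frame
    }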
However, all this does not really change much visually. Compared to the animation without dithering I don't see a difference.
I was hoping to see something like https://www.youtube.com/watch?v=1-_JtRl2ks0

You can read the code that FastLED uses for dithering in master/controller.h -- search for init_binary_dithering. From reading this, I gather that they just check how much time has elapsed since the last update to estimate how many "bits" of virtual dithering they can get.
Since you didn't provide working code, I'm not sure why you're not seeing a difference. But, I can work through an example to understand what temporal dithering is supposed to be doing.
Suppose your global brightness is set to 8. That means all RGB values are divided by 32 (256/8) before being displayed. For example 255,255,255 will actually display as 7,7,7.
(For now I'll just ignore "G" and "B" - let's pretend it's just "R".)
How is 32 displayed? It's displayed as 1.
How is 0 displayed? It's displayed as 0.
Now, how is 16 displayed? It was going to be displayed as 0, but this is where temporal dithering can be useful. What we really want to do is display it as 0 half the time and 1 half the time.
How is 24 displayed? It would also be displayed as 0, but with temporal dithering, we should display it as 0 one quarter of the time and 1 the other three quarters of the time.
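In code, that idea amounts to something like this (a minimal sketch for one channel, names mine; a real implementation such as FastLED's binary dithering spreads the "on" frames out evenly rather than bunching them as this sketch does):

    // Temporal dithering for one channel at global brightness 8 (divide by 32).
    // value: requested 8-bit level; frame: free-running frame counter.
    int displayed(int value, int frame) {
        int base = value / 32;       // 16 -> 0, 24 -> 0, 32 -> 1
        int remainder = value % 32;  // how many frames out of 32 show base + 1
        // 16 shows a 1 on half the frames, 24 on three quarters of them
        return base + ((frame % 32) < remainder ? 1 : 0);
    }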
Again from your description I'm not certain why you're not seeing the desired effect.

Related

Restricted VECM and unrestricted VECM coefficient manipulation

I estimated a VECM where I have 2 cointegrating relations among 4 variables.
I get the following equations:
       V1   V2         V3            V4             V5
r1      1    0  1.9321781   0.21719257  -0.002287466
r2      0    1  0.0695936  -0.01783993  -0.001467253
I now want to be able to manipulate the 1 and 0 coefficients and place them under the variables I choose. I understand that this will give me different results; however, I want to test the effect.
I did some reading and found that I should be using the cajorls function, where I should supply a matrix including my restrictions, but I can't make it work.
Can someone show me an example of the matrix I should be using? In that matrix, some variables will have 1 and 0 coefficients, but others should be left free for the model to estimate. How can I build a matrix for this purpose?
Thanks

How to add a LUT in VHDL to generate a sine

I've made an I2S transmitter to generate a "sound" out of my FPGA. The next step I would like to take is to create a sine wave. I've made 16 samples in a LUT. My question is how to implement something like this in VHDL, and also how to load the samples in sequence. Has anyone tried this already who could share their knowledge?
I've made a Lookup table with 16 samples:
 0                    0
 0.382683432          π/8
 0.707106781          π/4
 0.923879533          3π/8
 1                    π/2
 0.923879533          5π/8
 0.707106781          3π/4
 0.382683432          7π/8
 3.23114E-15 (≈0)     π
-0.382683432          9π/8
-0.707106781          5π/4
-0.923879533          11π/8
-1                    3π/2
-0.923879533          13π/8
-0.707106781          7π/4
-0.382683432          15π/8
-6.46228E-15 (≈0)     2π
The simplest solution is to make a ROM, which is just a big case statement.
FPGA synthesis tools will map this onto one or more LUTs.
Note that for bigger tables only 1/4 of the wave is stored; the other values are derived from it.
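To illustrate the quarter-wave idea, the index arithmetic looks like this (sketched in plain Java just to show the logic; in VHDL the same thing falls out of the top two address bits plus a sign flip):

    // Store only the rising quarter of the sine (0..pi/2, N/4 + 1 entries);
    // derive the other three quarters by mirroring and negating.
    static double sineSample(double[] quarter, int k) {
        int N = (quarter.length - 1) * 4;   // samples per full period
        int i = k % N;
        if (i <= N / 4)          return  quarter[i];          // first quarter: direct
        else if (i <= N / 2)     return  quarter[N / 2 - i];  // second: mirrored
        else if (i <= 3 * N / 4) return -quarter[i - N / 2];  // third: negated
        else                     return -quarter[N - i];      // fourth: mirrored, negated
    }

With the 16-sample table above, quarter would hold just the five values from 0 up to 1.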
I would like to send out 24-bit samples. Do you also know how to do that with this data (in binary)?
24 bits (signed) means you have to convert your floating-point values to integer values in the range -8388608..8388607. (For symmetry reasons you would use -8388607..8388607.)
Thus multiply the sine values (which you know are in the range -1..1) by 8388607.
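For example (a small Java sketch; the table is generated offline, so any language works):

    // Scale one period of 16 sine samples to 24-bit signed integers,
    // using 8388607 = 2^23 - 1 to keep the range symmetric.
    public class SineTable24 {
        public static void main(String[] args) {
            for (int k = 0; k < 16; k++) {
                double angle = k * Math.PI / 8.0;  // k * 2*pi/16
                int sample = (int) Math.round(Math.sin(angle) * 8388607);
                // the low 24 bits are the two's-complement pattern to transmit
                System.out.printf("%2d: %8d  0x%06X%n", k, sample, sample & 0xFFFFFF);
            }
        }
    }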
The frequency of the sine depends on how fast (how many samples per second) you send them.

Is it possible to solve this with a solution which has time complexity better than linear?

N light bulbs are connected by a wire. Each bulb has a switch associated with it; however, due to faulty wiring, a switch also changes the state of all the bulbs to the right of the current bulb. Given an initial state of all bulbs, find the minimum number of switches you have to press to turn on all the bulbs. You can press the same switch multiple times.
Note : 0 represents the bulb is off and 1 represents the bulb is on.
Input : [0 1 0 1]
Steps:
press switch 0 : [1 0 1 0]
press switch 1 : [1 1 0 1]
press switch 2 : [1 1 1 0]
press switch 3 : [1 1 1 1]
Return : 4
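For reference, the obvious linear solution (which the steps above follow) is a single greedy scan, counting presses and tracking their parity:

    // O(N) greedy: press whenever the current bulb, after all earlier
    // presses (each one flips everything from here rightward), is still off.
    static int minPresses(int[] bulbs) {
        int presses = 0;
        for (int b : bulbs) {
            if ((b ^ (presses & 1)) == 0) presses++;  // bulb still off: press
        }
        return presses;
    }

minPresses(new int[]{0, 1, 0, 1}) returns 4, matching the steps above.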
This is called a "Lights Out" riddle:
http://mathworld.wolfram.com/LightsOutPuzzle.html
One speed improvement I could think of would be to parallelise the setting of all the bulbs to the right. In particular, a GPU might be able to do that effectively (I am not sure, as you need to change which elements are affected in each loop).
Maybe make it a proper boolean array and bitwise-XOR a pattern onto it?
Unless the real process is a lot more complex than XOR-ing boolean values, memory speed will be the bottleneck here, not CPU time.
Unless this is for purely academic purposes, the performance rant probably applies:
http://ericlippert.com/2012/12/17/performance-rant/

Should I eliminate inputs in a logic circuit design?

Recently I had an exam where we were tested on logic circuits. I encountered something on that exam that I had never seen before. Forgive me, for I do not remember the exact problem given and we have not received our grades for it; however, I will describe the problem.
The problem had 3 or 4 inputs. We were told to simplify and then draw a logic circuit for that simplification. However, when I simplified, I ended up eliminating all the other inputs and was left with literally just
A
I had another problem like this as well, where there were 4 inputs and, when I simplified, I ended up with three. My question is:
What do I do with the eliminated inputs? Do I just not have it on the circuit? How would I draw it?
Typically an output is a requirement which would not be eliminated, even if it ends up being dependent on a single input. If input A flows through to output Y, just connect A to Y in the diagram. If output Y is always 0 or 1, connect an incoming 0 or 1 to output Y.
On the other hand, inputs are possible, not required, factors in the definition of the problem. Inputs that have no bearing on the output need not be shown in the circuit diagram at all.
It's not that inputs are eliminated; the resulting expression is the simplified outcome, which is what you need to implement as a logic circuit.
As an example, if you are given an expression with 3 inputs A, B and C, there are 2^3 = 8 possible input combinations, 000 through 111. When your simplification leads to just A, that means that for every one of those 8 combinations the output equals the value of A.
The truth table of an example boolean expression that simplifies to just A looks as follows:
A B | Output = A
------------------
0 0 | 0
0 1 | 0
1 0 | 1
1 1 | 1

Is this a clever or stupid way to do an integer divide function?

I'm a Computer Science major, interested in how assembly languages handle an integer divide function. It seems that simply adding up to the numerator, while giving both the quotient and the remainder, is way too impractical, so I came up with another way to divide using bit shifting, subtraction, and 2 lookup tables.
Basically, the function takes the denominator and makes "blocks" sized to the number of bits in the denominator. So dividing by 15 makes binary blocks of 4 bits, dividing by 5 makes binary blocks of 3 bits, etc. Then generate the first 2^blocksize multiples of the denominator. For each multiple, write the value of the bits AFTER the first block into the lookup table, keyed by the value of the first block.
Example: Multiples of 5 in binary - block size 3 (octal)
000 000 **101** - 5 maps to 0
000 001 **010** - 2 maps to 1
000 001 **111** - 7 maps to 1
000 010 **100** - 4 maps to 2
000 011 **001** - 1 maps to 3
000 011 **110** - 6 maps to 3
000 100 **011** - 3 maps to 4
000 101 **000** - 0 maps to 5
So the actual procedure involves getting the first block, bit-shifting the number past that first block, and subtracting the value that the block maps to. If the resulting number comes out to 0, the original is perfectly divisible; if the value becomes negative, it's not.
If you add another, enumeration lookup table, where you map the values to a counter as they come in, you can calculate the result of the division!
Example: Multiples of 5 again
5 maps to 1
2 maps to 2
7 maps to 3
4 maps to 4
1 maps to 5
6 maps to 6
3 maps to 7
0 maps to 8
Then all that's left is mapping every block to the counter-table, and you have your answer.
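Put together, here is a sketch of the whole procedure for a denominator of 5 (Java for readability; the table and variable names are mine):

    // Divide by 5 using 3-bit blocks and the two lookup tables described above.
    static int divideBy5(int n) {
        int[] after = new int[8];  // block -> bits AFTER the first block of that multiple
        int[] count = new int[8];  // block -> enumeration counter (quotient digit)
        for (int q = 1; q <= 8; q++) {
            int m = 5 * q;
            after[m & 7] = m >> 3;
            count[m & 7] = q;      // note: block 0 maps to 8, as in the tables
        }
        int quotient = 0, place = 1;
        while (n > 0) {
            int b = n & 7;             // grab the first (lowest) block
            n = (n >> 3) - after[b];   // shift past it, subtract the mapped value
            if (n < 0) return -1;      // not evenly divisible (the "junk" case)
            quotient += count[b] * place;
            place <<= 3;               // next base-8 digit of the quotient
        }
        return quotient;
    }

For example, divideBy5(45) walks the tables twice and returns 9, while divideBy5(13) goes negative on the second step and bails out.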
There are a few problems with this method.
If the number isn't perfectly divisible, then the function returns junk.
For large integer values this won't work, because a block size of 3 (for dividing by 5) doesn't fit evenly into a 32-bit or 64-bit integer, so the last block gets truncated.
It's about 100 times slower than the standard division in C.
If the denominator shares a factor with the block size (i.e. it is even), then your blocks must map to multiple values, and you need even more tables. This can be solved with prime factorization, but all the methods I've read about for easy/quick prime factorization involve dividing, which defeats the purpose.
So I have 2 questions: First, is there an algorithm similar to this out there already? I've looked around, and I can't seem to find anything like it. Second, how do actual assembly languages handle integer division?
Sorry if there are any formatting mistakes; this is my first time posting to Stack Overflow.
Sorry I'm answering so late. OK, first, regarding the commenters on your question: they think you are trying to do what the assembly mnemonics DIV and IDIV achieve by using different instructions in assembly. To me it seems you want to know how the operations selected by DIV and IDIV achieve division in hardware. To my knowledge, Intel uses the SRT algorithm (which uses a lookup table) and AMD uses the Goldschmidt algorithm. I think what you are doing is similar to SRT. You can take a look at both of them here:
http://en.wikipedia.org/wiki/Division_%28digital%29

Resources