I am building a model which requires me to find the maximum of a set of 8 signals, and also the index of that maximum value.
How can I build such a model in Simulink (Xilinx library)?
I am guessing a Compare block followed by a counter block, but somehow I am not able to fit all the pieces together.
Thanks
One way which gets it all done in parallel:
You need to build a tree of comparators and multiplexers:
Start with a block which takes in two values and two indices and passes out the value and index of the larger. That's one comparator and two muxes per block.
On the first level of your tree you have 4 of these blocks feeding into
a second level of 2 of these blocks, the results of which feed into
a final block which produces your answers
This can be pipelined so you can pour data through it as fast as you like, but it will use a fair amount of resources. How wide are your signals? Each comparator costs 1 LUT4 per bit, and each 2:1 mux costs 1 LUT4 per bit.
Alternatively, you use a counter to select each of your values in turn. If it's bigger than the current biggest, latch the value into your "biggest" register and latch the counter into a "biggest index" register. Reset the "biggest" register to the smallest value each time your counter resets.
This will take as many clock cycles as you have signals (8 in your case).
This is a two-part question:
I have a fluid flow sensor connected to an NI-9361 on my DAQ. For those who don't know, that's a pulse counter card. Nonetheless, from the data read from the card, I'm able to calculate the fluid flow through the device in gallons per hour, minute, second, etc. But what I need to do is the following:
Calculate total number of gallons of fluid that has flowed through the sensor since the loop began running
if possible, store that total so that it can be incremented next time the program runs
I know how to calculate it by hand; I'm just not sure how to achieve the running summation required to calculate the total amount of fluid that has passed through the sensor, or how to store the variable being incremented for the next program execution. I'm presuming the latter would involve writing a TDMS file, then opening and reading back the data, unless there's a better way?
Edit:
Below is the code used to determine GPM flow through my sensor. This setup is in accordance with the 9361 manual; it executes and yields proper results.
See this link for details:
http://zone.ni.com/reference/en-XX/help/373197L-01/criodevicehelp/crio-9361/
I can extrapolate how many gallons flow per second, or per sample period. The 1526.99 scalar is the fluid flow manufacturer's constant: the number of pulses per gallon passing through the sensor. The 9361 is set to frequency/period mode, so I'm calculating cycles per second and dividing by the constant for cycles per gallon to get gallons per second/minute.
I suppose I could get a time reference by looking at the sample period, so I guess the better question is, how do I keep an incrementing sum?
Hi, I'm reading a textbook that describes the pipelined design of a CPU.
I don't understand why we still need clocked registers. For example, as the picture below shows:
If we can remove all three registers, we can save 60 ps, because we just need the processor to continually execute instructions: when
a combinational logic block finishes, that's when the next instruction should start to execute. Why do we need the clock cycle to manually control the beginning of instruction execution?
You can begin to understand the need for latches by imagining that they are removed.
The secret is to realize that it takes each block 100 picoseconds to produce valid results. Before that time, the output is invalid, i.e. junk, and not, as you might think, the previous result. Remember, these are combinational blocks that have no memory.
Now imagine that we place new data on the inputs of Block A every 100 picoseconds.
What will the output look like? Well as soon as the new data is presented to the inputs, the outputs of that block are invalid. This means that Block B has invalid inputs and cannot begin processing data until they are valid.
Now, after 100 picoseconds, Block A has valid data going out and Block B can finally begin. But no: the input to Block A changes, and Block B has invalid inputs again. The only way to get a valid result through all three blocks is to hold the inputs steady for the whole 300 picoseconds needed to get through them.
With latches, the valid results from each block are latched and do not change with changing inputs. Thus we can present new data every 100 + 20 picoseconds (assuming a 20 ps latch delay) versus every 300 picoseconds. In other words, with pipeline latches the circuit runs 2.5 times faster.
What is the best practice for accessing a changing 32-bit register (like a counter) through a 16-bit data bus?
I suppose I have to 'freeze' or copy the 32-bit value on a read of the LSB until the MSB is also read, and vice versa on a write, to avoid data corruption if the LSB overflows into the MSB between the two accesses.
Is there a standard approach to this?
As suggested in both the question and Morten's answer, a second register to hold the value at the time of the read of the first half is a common method. In some MCUs this register is common to multiple devices, meaning you need to either disable interrupts across the two accesses or ensure ISRs don't touch the extra register. Writes are similarly handled, frequently in the opposite order (write second word temporary storage, then write first word on device thus triggering the device to read the second word simultaneously).
There have also been cases where you just can't access the register atomically. In such cases, you might need to implement additional logic to figure out the true value. An example of such an algorithm assuming three reads take much less than 1<<15 counter ticks might be:
uint16_t earlyMSB = highreg;   /* read the high word first */
uint16_t midLSB   = lowreg;    /* then the low word        */
uint16_t lateMSB  = highreg;   /* then the high word again */
/* If midLSB is in the lower half of its range, any wrap happened before it
   was read, so the later MSB matches it; otherwise the earlier MSB does. */
uint32_t fullword = ((uint32_t)(midLSB < 0x8000 ? lateMSB : earlyMSB) << 16) | midLSB;
Other variants might use an overflow flag to signal that the more significant word needs an increment (frequently used to implement that part of the counter in software).
There is no standard way, but an often-used approach is to make a read of one address return the first 16 bits, while the remaining 16 bits are captured at the same time and read later from another address.
I have implemented a frequency divider for powers of 2. Now I am interested in building a divider for any integer from 1 to 16. Yes, I have tried, but I have no ideas yet. How can I approach this problem?
I want to use common elements like multiplexers, flip-flops and so on. I'm not asking for a complete solution, even though it would be great.
That is normally the job of a PLL; many FPGAs have PLLs on chip.
Or try a counter that resets when a limit (0-15) is reached.
Each time the limit is reached, toggle the clock output.
The value for a 1:1 clock needs special handling, maybe a clock bypass.
A better way would be to run the counter at double the frequency to avoid the mux.
Instead of an incrementing counter, a decrementing counter that loads the configured value on zero would do as well.
Sorry in advance if I have some of this wrong. I may edit to correct later if it's not too disruptive.
When multiple variables are declared in adjacent memory, as I understand it, at a very low level, registers are created that encapsulate a number of bytes, commonly 1, 2, 4 or 8. This allows those bit ranges to be binary-rotated, as well as treated by the processor as numbers and so mutated with simple mathematics such as add, subtract, multiply and divide.
There may be abstraction reasons for not overlapping these ranges, but as many languages consider instructions to occur in a well-defined sequential order that the coder will be aware of, are there any performance reasons not to overlap one or more of them in adjacent bytes of allocated memory?
For example, in a block of allocated memory where every bit starts as 0, bytes 0 to 3 could be in use as an int, as well as bytes 1 to 4. The first could be set to a value before the second range was multiplied by 3.
If there are performance reasons not to, are they outweighed by otherwise having to copy values in and out of completely new variables and perform more complicated processes to achieve certain algorithms that could otherwise be done at a very low level?
There is nothing wrong with this trick when it is done in assembly: optimizers have routinely made use of knowing where the parts of an integer lie to save CPU cycles and reduce code size. For example, when a 32-bit integer variable is initialized to a value that fits in only 16 bits, optimizing compilers will replace the instruction that stores a 32-bit value in memory with a faster instruction that stores a 16-bit value to the lower bits of the variable and clears the upper 16 bits. Moreover, many optimizers go even further: if a constant is divisible by 2^16, they store the value divided by 2^16 to the upper 16 bits and clear the lower 16 bits.
Some architectures restrict such manipulations to addresses with certain properties, for example by requiring all 4-byte memory load/store instructions to use addresses divisible by four. These restrictions may reduce the applicability of partial-value-writing tricks.