Register variables in Verilog using Quartus and FPGA - CPU

I wonder what the process is behind the scenes when we use a reg variable in Verilog. We know that in Quartus, targeting any FPGA, we can declare and handle many registers. We also know that recent i7 (64-bit) CPUs have just 16 general-purpose registers. When we compile and simulate code that has, for example, more than 50 registers, it works. Are these registers stored in some memory block? If so, why do we declare the variable as reg? I mean, when we declare a variable as reg in Verilog, the registers should (am I wrong?) be handled in CPU registers, but the CPU has no more than 16 registers. I am confused.

Are these registers stored in some memory block?
No. Without a special implementation, the Quartus tool uses the registers (flip-flops) in the FPGA's logic cells.
Some Altera devices have UFM (User Flash Memory). You can use it to hold data:
https://www.altera.com/en_US/pdfs/literature/an/an489.pdf
https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug_alt_ufm.pdf
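As a minimal sketch (the module and signal names are illustrative), every bit of a Verilog reg that holds state across clock edges is synthesized to a flip-flop in the FPGA fabric, not to a CPU register:

```verilog
// 8-bit counter: 'count' is declared reg, so synthesis infers
// eight flip-flops in the FPGA's logic cells to store it.
module counter (
    input  wire       clk,
    input  wire       reset,
    output reg  [7:0] count
);
    always @(posedge clk) begin
        if (reset)
            count <= 8'd0;
        else
            count <= count + 8'd1;
    end
endmodule
```

Simulation on a PC is a separate matter: the simulator keeps the register values in ordinary memory, which is why a design with 50 or more registers simulates fine on a CPU with only 16 architectural registers.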


Why does my kernel module perform float division perfectly? [duplicate]

I'm running on an x86 CentOS 6.3 (kernel v2.6.32) system.
I compiled the following function into a bare-bones character driver module as an experiment to see how the Linux kernel reacts to floating point operations.
static unsigned floatstuff(void)
{
    float x = 3.14;
    x *= 2.5;
    return x;
}
...
unsigned x = floatstuff();
printk(KERN_INFO "x: %u", x);
The code compiled (which I wasn't expecting), so I inserted the module and checked the log with dmesg. The log showed: x: 7.
This seems strange; I thought you couldn't perform floating point operations in the Linux kernel -- save some exceptions such as kernel_fpu_begin(). How did the module perform the floating point operation?
Is this because I'm on an x86 processor?
I thought you couldn't perform floating point operations in the Linux kernel
You can't safely: failure to use kernel_fpu_begin() / kernel_fpu_end() doesn't mean FPU instructions will fault (not on x86 at least).
Instead it will silently corrupt user-space's FPU state. This is bad; don't do that.
The compiler doesn't know what kernel_fpu_begin() means, so it can't check / warn about code that compiles to FPU instructions outside of FPU-begin regions.
There may be a debug mode where the kernel does disable SSE, x87, and MMX instructions outside of kernel_fpu_begin / end regions, but that would be slower and isn't done by default.
It is possible, though: setting CR0::TS = 1 makes x87 instructions fault, so lazy FPU context switching is possible, and there are other bits for SSE and AVX.
There are many ways for buggy kernel code to cause serious problems. This is just one of many. In C, you pretty much always know when you're using floating point (unless a typo results in a 1. constant or something in a context that actually compiles).
Why is the FP architectural state different from integer?
Linux has to save/restore the integer state any time it enters/exits the kernel. All code needs to use integer registers (except for a giant straight-line block of FPU computation that ends with a jmp instead of a ret (ret modifies rsp).)
But kernel code avoids FPU generally, so Linux leaves the FPU state unsaved on entry from a system call, only saving before an actual context switch to a different user-space process or on kernel_fpu_begin. Otherwise, it's common to return to the same user-space process on the same core, so FPU state doesn't need to be restored because the kernel didn't touch it. (And this is where corruption would happen if a kernel task actually did modify the FPU state. I think this goes both ways: user-space could also corrupt your FPU state).
The integer state is fairly small: only 16x 64-bit registers plus RFLAGS and segment regs. FPU state is more than twice as large even without AVX: 8x 80-bit x87 registers, and 16x XMM or YMM, or 32x ZMM registers (plus MXCSR, and the x87 status and control words). Also the MPX bnd0-3 registers are lumped in with "FPU". At this point "FPU state" just means all non-integer registers. On my Skylake, dmesg says x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
See Understanding FPU usage in linux kernel; modern Linux doesn't do lazy FPU context switches by default for context switches (only for kernel/user transitions). (But that article explains what Lazy is.)
Most processes use SSE for copying/zeroing small blocks of memory in compiler-generated code, and most library string/memcpy/memset implementations use SSE/SSE2. Also, hardware supported optimized save/restore is a thing now (xsaveopt / xrstor), so "eager" FPU save/restore may actually do less work if some/all FP registers haven't actually been used. e.g. save just the low 128b of YMM registers if they were zeroed with vzeroupper so the CPU knows they're clean. (And mark that fact with just one bit in the save format.)
With "eager" context switching, FPU instructions stay enabled all the time, so bad kernel code can corrupt them at any time.
Don't do that!
In kernel space, FPU mode is disabled for several reasons:
It allows Linux to run on architectures that do not have an FPU
It avoids saving and restoring the whole set of FPU registers on every kernel/user-space transition (which could double the cost of a context switch)
Basically all kernel functions use integers, even for representing decimal numbers -> you probably don't need floating point
In Linux, preemption is disabled while kernel-space code runs in FPU mode
Floating point numbers are evil and may generate very unexpected behaviour
If you really want to use FP numbers (and you should not), you must use the kernel_fpu_begin and kernel_fpu_end primitives to avoid breaking the user-space registers, and you should take into account all of the possible problems (security included) of dealing with FP numbers.
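The pattern looks like this (a sketch only, assuming a recent kernel where the primitives are declared in <asm/fpu/api.h>; this fragment only builds as part of a kernel module, not as a normal program):

```c
#include <linux/kernel.h>
#include <asm/fpu/api.h>    /* kernel_fpu_begin / kernel_fpu_end */

static unsigned floatstuff_safe(void)
{
    unsigned result;

    kernel_fpu_begin();     /* saves user FPU state, disables preemption */
    {
        float x = 3.14f;
        x *= 2.5f;
        result = x;
    }
    kernel_fpu_end();       /* restores state, re-enables preemption */

    return result;
}
```

Note that sleeping is not allowed between kernel_fpu_begin and kernel_fpu_end, precisely because preemption is disabled in that region.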
Not sure where this perception is coming from. But the kernel executes on the same processor as the user-mode code, and therefore has access to the same instruction set. If the processor can do floating point (directly or via a co-processor), the kernel can too.
Maybe you are thinking of cases where floating point arithmetic is emulated in software. But even so, it would be available in the kernel (well, unless disabled somehow).
I am curious where this perception is coming from. Maybe I am missing something.
Found this. Seems to be a good explanation.
The OS kernel may simply turn the FPU off in kernel mode.
For a floating point operation, the kernel can turn the FPU on, perform the operation, and then turn the FPU off again.
But you cannot print the result, since printk has no floating point format specifiers.

Confusion with instruction register and counter register

I have difficulty understanding the workings of the instruction register and the counter register in an AVR micro-controller like the atmega128p.
Is there a way to explain what they do in a simple way?
I suppose you are interested in the difference between the Instruction Register (IR) and the Program Counter (PC).
Their working principles are pretty simple.
IR: Holds the instruction that is currently being executed (or is about to be).
PC: Holds the address of the instruction that should be loaded into the IR from flash memory in the next clock cycle.
Note that AVRs normally use a 2-stage pipeline (fetch + execute) therefore the IR is double buffered (inside a clock cycle the next instruction is loaded from flash and the current instruction is executed).

Design a 256x8 bit RAM using 64 rows and 32 columns programmatically using VHDL

I am new to VHDL programming, and I am going to do a project on Built-In Self-Repair. In this project I am going to design RAMs of different sizes (256 B, 8 kB, 16 kB, 32 kB, etc.), and those RAMs have to be tested using BIST and then repaired. So please help me with an example of how to design a RAM with 'n' rows and columns.
Start by drawing a block diagram of the RAM at the level of abstraction you want (probably gate-level). Then use VHDL to describe the block diagram.
You should probably limit yourself to a behavioral description, i.e., don't expect to be able to synthesize it. Synthesis for FPGAs usually expects a register-transfer-level description, and synthesis for ASICs is not something I would recommend for a VHDL beginner.
I will assume you want to work with SRAM, since this is the simplest case. Also, let's suppose you want to model a RAM with RAM_DEPTH words, and each word is RAM_DATA_WIDTH bits wide. One possible approach is to structure your solution in three modules:
One module that holds the RAM bits. This module should have the typical ports for a RAM: clock, reset (optional), write_enable, data_in, data_out. Note that each RAM word should be wide enough to hold the data bits plus the parity bits, which are redundant bits that will allow you to correct any errors. You can read about Hamming codes used for memory correction here: http://bit.ly/1dKrjV5. You can see a RAM modeling example from Doulos here: http://bit.ly/1aq1tn9.
A second module that loops through all memory locations, fixing them as needed. This should happen right after reset. Note that this will probably take many clock cycles (at least RAM_DEPTH clock cycles). Also note that it won't be implemented as a loop in VHDL. You could implement it using a counter, then use the count value as a read address, pass the data value through an EDC function, and then write the corrected value back to the RAM module.
A top-level entity (optional) that instantiates modules (1) and (2) and coordinates the process. This module could have an 'init_done' pin that will be asserted after the verification and correction take place. This pin should be checked by the modules that use your RAM, so they know when it is safe to start using the RAM.
To summarize, you could loop through all memory locations upon reset, fixing them as needed using an error-correcting code. After making sure all memory locations are ok, just assert an 'init_done' signal.
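Module (1) can be sketched in behavioral VHDL along these lines (a sketch only: the generic names RAM_DEPTH and RAM_DATA_WIDTH come from the description above; the entity and port names are illustrative):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_block is
  generic (
    RAM_DEPTH      : integer := 256;
    RAM_DATA_WIDTH : integer := 8    -- widen this to add the parity bits
  );
  port (
    clk          : in  std_logic;
    write_enable : in  std_logic;
    addr         : in  integer range 0 to RAM_DEPTH - 1;
    data_in      : in  std_logic_vector(RAM_DATA_WIDTH - 1 downto 0);
    data_out     : out std_logic_vector(RAM_DATA_WIDTH - 1 downto 0)
  );
end entity;

architecture behavioral of ram_block is
  type ram_type is array (0 to RAM_DEPTH - 1)
    of std_logic_vector(RAM_DATA_WIDTH - 1 downto 0);
  signal ram : ram_type;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if write_enable = '1' then
        ram(addr) <= data_in;
      end if;
      data_out <= ram(addr);  -- synchronous read
    end if;
  end process;
end architecture;
```

Module (2) would then drive addr from a counter after reset, read each word, run it through the EDC function, and write the corrected word back before asserting init_done.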

InterlockedExchange on two CPU cores

I have a Windows 7 driver where I want to synchronize access to a variable. Can I use InterlockedExchange for it?
My current understanding is that InterlockedExchange is implemented via compiler intrinsics, meaning the read (InterlockedExchange returns the old value) and the write are done in one clock cycle. The interlocked functions are atomic only when the variable is always accessed via an interlocked function.
But what happens in this case:
CPU1: InterlockedExchange(&Adapter->StatusVariable, 5);
CPU2: InterlockedExchange(&Adapter->StatusVariable, 3);
StatusVariable is written in the same clock cycle on two CPU cores. Does the function notice that the variable is accessed and defer the write to a different clock cycle? Or is it undefined which value the variable has after the write? Is it also possible that the variable contains garbage?
Edit: I am on x86 or x64.
InterlockedExchange generates an xchg instruction, which has an implicit memory barrier.
The Intel Instruction set reference is your friend :) See Chapter 8 for more information on how locks work.
From the XCHG instruction:
The exchange instructions swap the contents of one or more operands and, in some cases, perform additional operations such as asserting the LOCK signal or modifying flags in the EFLAGS register.
The XCHG (exchange) instruction swaps the contents of two operands. This instruction takes the place of three MOV instructions and does not require a temporary location to save the contents of one operand location while the other is being loaded. When a memory operand is used with the XCHG instruction, the processor's LOCK signal is automatically asserted. This instruction is thus useful for implementing semaphores or similar data structures for process synchronization. See "Bus Locking" in Chapter 8, "Multiple-Processor Management," of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for more information on bus locking.
If you have any questions about the reference just ask.
I have a Windows 7 driver where I want to synchronize access to a variable. Can I use InterlockedExchange for it?
Maybe. Maybe not. It depends on what you are trying to do, what the variable represents and what your expectations are when you say "synchronize access".
With that said, I suspect the answer is no because I can't see how what you are doing counts as synchronization.
That means, the read (InterlockedExchange returns the old value) and the write is done in one clock cycle.
Not exactly. The interlocked functions ensure that the operation happens atomically. How many clock cycles that takes is another issue. Forget about clock cycles.
The interlocked functions are atomic only when the variable is always accessed via an interlocked function.
What does that even mean?
Does the function notice that the variable is accessed and defer the write to a different clock cycle?
It's more accurate to say that the processor notices, which it does. Whether it defers one write to a different clock cycle, why do you care? Maybe it does, maybe it doesn't. It's none of your business what the processor does.
All the compiler and processor will guarantee in your example, and all that you need to know, is that:
after the statement InterlockedExchange(&Adapter->StatusVariable, 3); the value of Adapter->StatusVariable will either be 3 or 5; and
after the statement InterlockedExchange(&Adapter->StatusVariable, 5); the value of Adapter->StatusVariable will either be 3 or 5.
It will have one of those two values and no other values. You just cannot know which of those values it will have and it should be obvious to see why that is.
Or is it undefined which value the variable has after the write?
That depends on your definition of "undefined", I guess. It's unclear which of the two values it will have, but it will have either 3 or 5, assuming no other threads change the value after that point.
Is it also possible that the variable contains garbage?
If by 'garbage' you mean something other than either 3 or 5 then, in the absence of any other code that messes with the value, the answer is an unequivocal no. The variable will contain either the value 3 or the value 5.

Tristate buffers in Quartus II

I need to clear up a problem with an external input to a CPLD by putting it through a tristate buffer. I know Quartus II has a tristate-buffer megafunction, but I am curious: if I simply tell it to output Z on the specific pin, will it automatically synthesize so that the tristate buffer is enabled on that pin, or do I have to implement the megafunction / write a buffer myself?
Chapter 10 – Recommended HDL coding style – in the Quartus manual will tell you everything you need to know: http://www.altera.com/literature/hb/qts/qts_qii51007.pdf
In summary, tri-state buffers will be inferred on output ports if you drive them with a 'Z'.
You can do it either way. If you assign 'Z' to the pin (NOTE: it has to be an upper-case Z, lower-case confuses Quartus) a tri-state buffer will be inferred. Alternately, you can directly instantiate various low-level I/O primitives which have a tri-state enable pin (including various DDR I/O primitives).
I have generally allowed Quartus to infer the tri-state buffers on 'normal' I/O pins, and used the low-level primitives when timing is critical and I want to force use of the I/O ring flip-flops, use the DDR I/O features, etc.
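For the inference route, a minimal Verilog sketch of what Quartus expects (the module and signal names are illustrative):

```verilog
// Quartus infers a tri-state output buffer on 'pin' because the
// port is driven with an upper-case 'Z' when 'enable' is low.
module tristate_out (
    input  wire data,
    input  wire enable,
    output wire pin
);
    assign pin = enable ? data : 1'bZ;
endmodule
```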
