Linux Kernel - Clock Framework - What is the role of clk_prepare/unprepare? - linux-kernel

I am reading the following article about the new Clock Framework present in the Linux Kernel:
http://lwn.net/Articles/489668/
What is unclear to me is the usage of the new APIs clk_prepare/unprepare, which complement the clk_enable/disable APIs.
It is also mentioned that while clk_enable/disable can be called from an atomic context, this does not apply to clk_prepare/unprepare (which can sleep). Why is there this separation of functionality and behavior?
I am keen to understand what it is about clocks that requires us to prepare/unprepare them?
Thanks,
~vj

Clocks may need PLLs to be set up and locked, a voltage OPP to be set, or other prerequisite actions completed before clk_enable. For example:
drivers/clk/clk-highbank.c clk_pll_prepare()
This routine has wait loops that spin until the hardware PLL shows lock. Can't do that from an atomic context. Another LWN article speaks a bit to the prepare() vs enable() separation.
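As a rough illustration of how a driver typically splits the two phases (the "my_uart" names below are made up; only the clk_* calls are the real API):

    /* Sketch of a driver using the common clock framework. */
    #include <linux/clk.h>
    #include <linux/err.h>
    #include <linux/platform_device.h>

    static struct clk *uart_clk;

    static int my_uart_probe(struct platform_device *pdev)
    {
            uart_clk = devm_clk_get(&pdev->dev, NULL);
            if (IS_ERR(uart_clk))
                    return PTR_ERR(uart_clk);

            /* May sleep: PLL lock loops, I2C-programmed clock chips, ... */
            return clk_prepare(uart_clk);
    }

    /* Called from atomic context (spinlock held, IRQ handler): must not
     * sleep, so only the cheap gate bit is touched here. */
    static void my_uart_start_tx(void)
    {
            clk_enable(uart_clk);
            /* ... start the transmitter ... */
    }

    static void my_uart_stop_tx(void)
    {
            clk_disable(uart_clk);
    }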
PLL and clock details are specific to the processor / SoC in question. A block diagram would show the clock tree: SoC input pins leading to various PLLs, then various clocks driven from each PLL (there may also be power domains that can be turned on/off), with individual clocks enabled once the "prepare" step is done. Long story, but I hope the above is helpful.

Related

Connection of external crystal oscillators for FPGA

I am designing a triple modular redundancy (TMR) processor system to synthesize on an Altera DE10-Lite FPGA board. Its purpose is to demonstrate reliability of computation in the presence of various faults. I need advice on how to connect three external crystal oscillators (instead of the on-board crystal), with the same ratings, to drive the three processors inside the FPGA. I will be using a synchronization voting scheme to sync all three signals. Can this task be done?
Clock distribution triplication
I have read the following relevant links that describe using PLLs. Is this the correct way?
https://www.altera.com/documentation/mcn1395213337540.html#mcn1395213788377
No, that's unlikely to work.
If you run each soft CPU with a separate crystal, they will drift out of synchronization due to slight variations in frequency between the crystals.
If you try to use a majority voting scheme to create a single clock signal from three input clocks, you'll end up with a very weird, irregular clock signal which will probably cause faults in the logic driven by it.
Use one clock source at a time. If you're convinced you need to resist failures of an external clock, consider implementing some way to detect a failure in the current clock and switch to another one. (Keep in mind that this logic will need to still work without a functional clock… which may be difficult.)

How to interact between Nios and FPGA?

Example:
Let's assume there is a Nios running on an FPGA that sends a string at random intervals (or every second) to an attached display over the SPI interface. On the other side there is FPGA code that monitors a pushbutton. Every press of this button should send a string to the same attached display.
Question:
How does the interaction (or communication) between the FPGA and the Nios work, in general or in the case described? How is it possible to 'inform' the Nios that the pushbutton is pressed when that logic is running in the FPGA fabric? Maybe there is documentation about this topic to get an idea of how it works...
Thanks in advance
Low speed or high speed?
For low speed, plug a GPIO core with enough I/O "pins" into the NIOS system and rebuild it. Wire your hardware to those pins, and use the GPIO driver code to access them. Done. Buttons count as low speed. SPI can too, though you'll probably find a much better SPI peripheral for NIOS, so I'd use that.
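For the low-speed button case, the software side might look roughly like the sketch below, assuming Altera's PIO core and its HAL macros; BUTTON_PIO_BASE is a made-up name standing in for whatever your generated system.h provides:

    #include "system.h"                     /* generated; defines the *_BASE names */
    #include "altera_avalon_pio_regs.h"     /* IORD/IOWR macros for the PIO core   */

    /* Poll a pushbutton wired to a PIO (GPIO) core.
     * BUTTON_PIO_BASE is a hypothetical component name. */
    int button_pressed(void)
    {
        /* Dev-board buttons are often active low; adjust to your wiring. */
        return (IORD_ALTERA_AVALON_PIO_DATA(BUTTON_PIO_BASE) & 0x1) == 0;
    }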
For high speed, you need to design a peripheral (IP core) that interfaces to whatever bus the NIOS system uses, and provides all the registers, memory, interrupt sources etc you need to interface to your VHDL hardware. There are plenty of example peripherals you can use as a starting point. Then you get to write the driver software to access that peripheral, again, starting from sample code.
This is a much more complex project, and while it's much faster than GPIO, you find "high speed" is relative; any embedded CPU is appallingly slow compared to custom hardware. We're not talking about factors of 2 here but orders of magnitude.
EDIT: Whichever approach you use, as described above, interacting with the hardware from the software side is best done through the driver software.
If you're in the situation where you have to write your own driver, then you declare variables to match each accessible register or memory block (represented by an array variable). Often the vendor tools can create a skeleton driver for you, from either the VHDL code or some other description. I don't know how Altera/Nios tools are set up but they surely have tutorials to teach you their approach.
If you have an Ada compiler you can declare these variables at package scope, to maintain proper abstraction and information hiding. But if you have to use C, with no packages, you are probably stuck with global variables.
You fix each variable at whatever physical address your hardware maps them to, and you must declare them "volatile" so that accesses to them are never optimised into registers.
If your hardware can interrupt the CPU, you have to write an interrupt handler function, with pragmas to tell the compiler which interrupt vector it should be connected to. You'll need to get the exact details from your own compiler documentation and examples of driver code for other peripherals.
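Putting the last few paragraphs together, a bare-bones driver skeleton might look something like this; the register layout, the base address and the way the ISR gets attached are all invented for illustration:

    #include <stdint.h>

    /* Hypothetical register layout of a custom peripheral. */
    typedef struct {
        volatile uint32_t control;   /* start/stop bits, interrupt enable */
        volatile uint32_t status;    /* busy/done flags                   */
        volatile uint32_t data;      /* data in / data out                */
    } my_periph_regs_t;

    /* Fix the block at the physical address your hardware maps it to
     * (0x80000000 is made up). "volatile" keeps accesses from being
     * optimised into CPU registers. */
    #define MY_PERIPH ((my_periph_regs_t *)0x80000000u)

    /* Interrupt handler. How it is attached to a vector is compiler/BSP
     * specific (a pragma, an attribute, or a HAL registration call). */
    void my_periph_isr(void)
    {
        uint32_t status = MY_PERIPH->status;   /* read/acknowledge the event */

        if (status & 0x1)                      /* hypothetical "done" flag */
            MY_PERIPH->control |= 0x2;         /* hypothetical "ack" bit   */
    }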
I would start here:
https://www.altera.com/support/support-resources/design-examples/intellectual-property/embedded/nios-ii/exm-developing-hal-drivers.html
with sample code and a short "Guidelines" document
and use the NIOS software handbook for more depth.
To help find what you're looking for: apparently Altera uses the term "HAL" (Hardware Abstraction Layer) for the part of the driver that directly accesses the hardware, and "BSP" (Board Support Package) for the facilities that let you describe your hardware to the tools - and to your software team. Any tools to build a skeleton driver will be associated with the BSP: I see a section called "Creating a new BSP" in the software handbook.

Why use multiple clocks of the same speed in an FPGA design?

I very recently began experimenting with FPGAs. In researching things around the net I've noticed in several places that designs might use multiple separate PLL clocks of the exact same speed. Why is that?
One example I will give is this site: Parallella Linux Quick Start
They have their FCLK_CLK1 and FCLK_CLK2 both at 200MHz. Why is this recommended rather than a single 200MHz clock for both? Is it just customary to give each major component its own clock even if it is the same? Or am I missing something?
Besides the reasons already mentioned, there are several other reasons why two PLL clocks of the same speed might exist.
Even if the frequency is exactly the same, differences might exist in clock phase or jitter. Using one PLL with fixed clock phase and another one with adjustable clock phase can be useful for proper sampling of external input signals or maintaining the correct phase difference between clock and output data. Techniques like that were especially popular before components such as IDELAY and ODELAY were widely available.
Crystal oscillators also have small deviations from their marked value. If you have a communication link between two boards and both boards have their own oscillator, then one board's main clock might run at 200.01 MHz and the other board's at 199.99 MHz. In many cases both FPGAs will then use their locally generated low-jitter clock as their main clock, but will also use the remote clock to sample incoming data. You can see this in Ethernet PHYs: a 100 Mbit PHY usually has a 25 MHz receive clock recovered from the input signal and a locally generated 25 MHz transmit clock.
There are many reasons to use multiple clocks of the exact same speed, so I will just state a few. However, I don't have any deep knowledge of your example.
Magic on FPGA.
As stated in the comments, an FPGA is a highly complex device. Only the vendor knows exactly what is happening inside, so they might give you advice that can seem weird.
Clock distribution.
If you have a design with just one clock source, it is critical to route the clock correctly. The clock has to trigger everywhere at the same time, which is hard for the PnR tool to manage. Today's FPGAs don't usually have this issue.
Different IPs on one FPGA.
If you have different IPs/designs which you fuse onto one FPGA, the IPs can use different clocks. If you want to split them apart again later, you will need multiple clock sources anyway. Besides, you are forced to implement some registers when you cross a clock domain, so while merging your IPs you don't mix everything up, which is good design style. This may also be the case in your example:
HDMI support is provided by an IP core from Analog Devices ...
Output.
Maybe the additional clock is only used as an output at some I/O Port.
Low-Power.
In today's CMOS technology, most power is wasted on transitions (transistor switching) and static leakage (the damn things are so small, they just leak current). With multiple clock domains, you have the opportunity to have fewer transitions per second, or you can switch off parts of your device completely.

What is the advantage of using a GPIO as an IRQ?

I know that we can convert a GPIO into an IRQ, but I want to understand what the advantage of doing so is.
If we need an interrupt, why can't we have a dedicated interrupt line in the first place and use it directly as an interrupt?
What is the advantage of using GPIO as IRQ?
If I get your question, you are asking why even bother having a GPIO? The other answers show that someone may not even want the IRQ feature of an interrupt. Typical GPIO controllers can configure an I/O as either an input or an output.
Many GPIO pads have the flexibility to be open drain. With an open-drain configuration, you may have a bidirectional 'bus' and data can be both sent and received. Here you need to change from an input to an output. You can imagine this if you bit-bang I2C communications. This type of use may be fine if the I2C is only used to initialize some other interface at boot.
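For example, reading the ACK bit in bit-banged I2C forces exactly that input/output flip on the SDA line. The GPIO helpers below are hypothetical stand-ins for whatever your driver provides:

    #include <stdbool.h>

    /* Hypothetical GPIO/timing helpers, standing in for your GPIO driver. */
    extern void gpio_dir_input(int pin);
    extern void gpio_dir_output(int pin);
    extern void gpio_write(int pin, bool level);
    extern bool gpio_read(int pin);
    extern void i2c_delay(void);

    #define SDA_PIN 4   /* made-up pin numbers */
    #define SCL_PIN 5

    /* After clocking out 8 data bits, release SDA and sample the slave's ACK. */
    bool i2c_read_ack(void)
    {
        gpio_dir_input(SDA_PIN);          /* release the open-drain line          */
        gpio_write(SCL_PIN, true);
        i2c_delay();
        bool ack = !gpio_read(SDA_PIN);   /* slave pulls SDA low to acknowledge   */
        gpio_write(SCL_PIN, false);
        gpio_dir_output(SDA_PIN);         /* take the line back for the next byte */
        return ack;
    }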
Even if the interface is not bi-directional, you might wish to capture on each edge. Various peripherals use zero crossings and a timer to decode a signal. For example, a laser bar-code reader, a magnetic stripe reader, or a bit-banged UART might look at the time between zero crossings. Is the time double a bit width? Is the line high or low? Then shift the previous value and add two bits. In these cases you have to look at the signal to see whether the line is high or low. This matters even where polarity shouldn't, as short noise pulses can cause confusion.
So even for the case where you have only the input as an interrupt, the current level of the signal is often very useful. If this GPIO interrupt happens to be connected to an Ethernet controller and active high means data is ready, then you don't need to have the 'I/O' feature. However, this case is using the GPIO interrupt feature as glue logic. Often this signalling will be integrated into a dedicated module. The case where you only need the interrupt is typically some custom hardware to detect a signal (case open, power disconnect, etc) which is not industry standard.
The ARM SoC vendor has no idea which of the cases above the OEM might use. The SoC vendor provides lots of flexibility, as the transistors on the die are cheap compared to the wire bonds/pins on the package. It means that you, who only use the interrupt feature, get economies of scale (and a cheaper part) because others might be using those features, and the ARM SoC vendor gets to spread the NRE cost across more customers.
In a perfect world, there is maybe no need for this. Not so long ago, when transistors were more expensive, some lines behaved only as interrupts (some M68k CPUs have this). Historically the ARM has only a single interrupt line with one common routine (the Cortex-M is different), so the interrupt source has to be determined by reading another register. As the hardware needs to capture the state of the line on the ARM anyway, it is almost free to add the 'input controller' portion.
Also, for this reason, all of the ARM Linux GPIO drivers have a macro to convert from a GPIO pin to an interrupt number, as they are usually one-to-one mapped. There is usually a single 'GIC' interrupt for the GPIO controller, and a 'GPIO' interrupt controller which forms a tree of interrupt controllers with the GIC as the root. Typically, the GPIO IRQ numbers are max GIC IRQ + port * 32 + pin, so the GPIO IRQ numbers are just appended to the 'GIC' IRQ numbers.
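A sketch of the kind of mapping meant here (the constants are illustrative, not from any particular SoC):

    /* Illustration of appending GPIO IRQ numbers after the GIC's own range.
     * NR_GIC_IRQS and GPIOS_PER_PORT are made-up values. */
    #define NR_GIC_IRQS     160
    #define GPIOS_PER_PORT  32

    static inline int example_gpio_to_irq(int port, int pin)
    {
        return NR_GIC_IRQS + port * GPIOS_PER_PORT + pin;
    }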
If you were designing a bespoke ASIC for one specific system you could indeed do precisely that - only implement exactly what you need.
However, most processors/SoCs are produced as commodity products, so more flexibility allows them to be integrated in a wider variety of systems (and thus sell more). Given modern silicon processes, chip size tends to be constrained by the physical packaging, so pin count is at an absolute premium. Therefore, allowing pins to double up as either I/O or interrupt sources depending on the needs of the user offers more functionality in a given space, or the same functionality in less space, depending on which way you look at it.
It is not about "converting" anything - on a typical processor or microcontroller, a number of peripherals are connected to an interrupt controller; GPIO is just one of those peripherals. It is also by no means universally true; different devices have different capabilities, but in any case you are simply configuring a GPIO pin to generate an interrupt - that's a normal function of the GPIO not a "conversion".
Prior to ARM Cortex, ARM did not define an interrupt controller, and the core itself had only two interrupt sources (IRQ and FIQ). A vendor-defined interrupt controller was required to multiplex the single IRQ over multiple peripherals. ARM Cortex defines an interrupt controller and a more flexible interrupt architecture; it is possible to achieve zero-latency interrupts from a GPIO, so there is no real advantage in a dedicated interrupt pin. Using one might even mean adding external signal-conditioning circuitry that is often already incorporated in the GPIO block on the die.

Bus protocol for a microcontroller in VHDL

I am designing a microcontroller in VHDL. I am at the point where I understand the role of each component (ALU/memory...), and I have some ideas on how to realise them. I basically want to implement a Von Neumann architecture.
But here is what I don't get: how do the components communicate? I don't know how to design my bus (buses?). I am therefore looking for a simple bus implementation and protocol.
My unresolved questions :
Is it simpler to have one bus for everything, or to separate the different kinds of data?
How does each component know when to "listen" and when to "write"?
The emphasis is on the simplicity of the design (and thus of the implementation). I do not care about speed. I want to do everything from scratch (ie. no pre-made softcore).
I don't know if this is of importance at this stage, but it will not need to run "real" compiled code, or have any kind of compatibility with anything existing. Also, at which point do I begin to think about my 'assembly' instructions? I think that I will load them directly into memory.
Thank you for your help.
EDIT :
I ended up drawing (a lot of) inspiration from the PicoBlaze, because it is:
simple to understand
under a BSD Licence
Specifically, I started by adding a few instructions to it.
Since your main concern seems to be learning about microcontroller design, a good approach could be to take a look at some of the earlier microprocessor models. Take, for instance, the Z80:
(Z80 block diagram - source: http://landley.net/history/mirror/cpm/z80.html)
Another good Z80 HW description: http://www.msxarchive.nl/pub/msx/mirrors/msx2.com/zaks/z80prg02.htm
To answer your first question (single vs. multiple buses): this chip uses a single bus for everything, and it has a very simple design. You could probably use something similar. To make the terminology clear, a single system bus may be composed of sub-buses (which are also called buses). The linked diagram shows a system bus composed of a bidirectional data bus (8 bits wide) and an address bus (16 bits wide).
To answer your second question (how do components know when they are active): in the Z80 diagram linked above you can see two distinct signals, memory request and I/O request. Only one will be active at a time, and when I/O request is active, that's when a peripheral could potentially be accessed.
If you don't have many peripherals, you don't need to use all 16 address lines (some Z80s have an 8-bit I/O space). Each peripheral would be accessed through some addresses in this space. For instance, in a very simple system:
a timer peripheral could use addresses from 00h to 03h
a UART could use addresses from 08h to 0Fh
In this simple example, you need to provide two circuits: one would detect when the address is within the range 00h-03h, and another would do the same for 08h-0Fh. If you do a logical AND between the output of each detector and the I/O request signal, then you have two signals indicating when each of the peripherals is being accessed. Your peripheral hardware should primarily listen to this signal.
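The same decode, written out in C purely to make the combinational logic explicit (in the actual design it would be a couple of lines of VHDL):

    #include <stdbool.h>
    #include <stdint.h>

    /* Address decoders for the example map above; in hardware these are
     * small combinational circuits, not software. */
    bool timer_selected(bool io_request, uint8_t addr)
    {
        return io_request && (addr & 0xFCu) == 0x00u;   /* ports 00h-03h */
    }

    bool uart_selected(bool io_request, uint8_t addr)
    {
        return io_request && (addr & 0xF8u) == 0x08u;   /* ports 08h-0Fh */
    }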
Finally, regarding your question about instructions, the dataflow inside your microprocessor would have several stages. This is usually called a processor's datapath. It is common to divide the stages into:
FETCH: read an instruction from program memory
DECODE: check specific bits within the instructions, and decide what type of instruction it is
EXECUTE: take the actions required by the instruction (e.g., ALU operations)
MEMORY: for some instructions, you need to do a data read or write
WRITE BACK: update your CPU registers with new values affected by the instruction
(Five-stage DLX datapath diagram - source: https://www.cs.umd.edu/class/fall2001/cmsc411/projects/DLX/proj.html)
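To make the stages concrete, here is a toy, one-instruction-per-call interpretation of that loop in C; the 8-bit instruction format and opcodes are invented for illustration, and your own encoding will differ:

    #include <stdint.h>

    /* Toy sequential datapath; every field width and opcode below is made up. */
    #define MEM_SIZE 256

    static uint8_t mem[MEM_SIZE];   /* unified program/data memory (Von Neumann) */
    static uint8_t regs[4];         /* tiny register file */
    static uint8_t pc;              /* program counter    */

    void step(void)
    {
        /* FETCH: read an instruction from memory */
        uint8_t instr = mem[pc++];

        /* DECODE: split the instruction into fields (2-bit opcode, two 2-bit
           register numbers, 2-bit immediate; an invented layout) */
        uint8_t op  = (instr >> 6) & 0x3;
        uint8_t rd  = (instr >> 4) & 0x3;
        uint8_t rs  = (instr >> 2) & 0x3;
        uint8_t imm =  instr       & 0x3;

        /* EXECUTE / MEMORY / WRITE BACK */
        switch (op) {
        case 0: regs[rd] = regs[rd] + regs[rs];            break; /* ADD   */
        case 1: regs[rd] = mem[(uint8_t)(regs[rs] + imm)]; break; /* LOAD  */
        case 2: mem[(uint8_t)(regs[rs] + imm)] = regs[rd]; break; /* STORE */
        case 3: if (regs[rd] == 0) pc = regs[rs];          break; /* BEQZ  */
        }
    }

In hardware the same stages become registers and multiplexers controlled by a state machine, but the division of work is the same.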
Most of your job of dealing with individual instructions would be done in the DECODE and EXECUTE stages. As for the datapath control, you will need a state machine that controls the sequence of operations through the 5 stages. This functional block is usually called a Control Unit. Here you have a few choices:
Your state machine could go through all stages sequentially, one at a time. An instruction would take several clock cycles to execute.
Similar to the choice above, but combining two or more stages in a single cycle if you want to make things simpler and faster.
Pipeline the execution of instructions. This can give a great speed boost, but maybe it's better left for later because things can get quite complex.
As for the implementation, I recommend keeping the functional blocks as separate entities, and make sure you write a testbench for each block. Your job will go faster if you write those testbenches.
As for the blocks, the Register File is pretty easy to code. The Instruction Decoder is also easy if you have a clear idea of your instruction layout and opcodes. And the ALU is also easy if you know the operations it needs to perform.
I would start by writing testbenches for the Instruction Decoder and the Register File. Then I would write a script that runs all the testbenches and checks their results automatically. Only then I would focus on the implementation of the functional blocks themselves.
Basically, on-chip buses use parallel buses for address and data input and output. Usually there will be some kind of arbiter which decides which component is allowed to write to the bus. So a common approach is:
The component that wants to write asserts a request line connected to the arbiter to signal that it wants to access the bus.
The arbiter decides who gets access to the bus.
The arbiter asserts the chip select of the component that is allowed to access the bus next.
Usually your on-chip bus will use a master/slave concept, so only masters can actively initiate accesses on the bus. The slaves only wait for requests from the masters.
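As a software analogy of the grant decision (the real thing is combinational logic, not C), a simple fixed-priority arbiter boils down to:

    #include <stdint.h>

    /* Fixed-priority arbiter: grant the lowest-numbered requesting master.
     * Each bit of "request" is one master's request line; the return value
     * has at most one bit set (the grant / chip select). Purely illustrative. */
    uint8_t arbiter_grant(uint8_t request)
    {
        for (int m = 0; m < 8; m++)
            if (request & (1u << m))
                return (uint8_t)(1u << m);   /* grant exactly one master */
        return 0;                            /* nobody is requesting     */
    }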
I, for one, like the AMBA AHB/APB design, but this might be a little over the top for your application. You can have a look at this book for ideas on how to implement your bus.
