FPGA logic cells

I have a small presentation about FPGA technology. My question is: if your FPGA has 85k logic cells, does this mean it can run 85k operations simultaneously?
What I am trying to achieve is to shock the audience with some crazy illustrative facts about FPGA technology. The people listening know very little about FPGAs, so I want to impress them.

What's inside a 'cell' can vary per manufacturer, but the Xilinx definition (using this manufacturer as an example, as these are the devices that I'm familiar with) is one four-input look-up table and one register. Xilinx devices are made up of a number of 'slices', and these contain a number of functional elements. These might include:
Look-up tables
Registers
Multiplexers
Logic for use in carry chains
etc
As an example, a Spartan6 LX4 has 600 slices, and the marketing material claims that this is equivalent to 3840 'logic cells'. You can look in the user guide for a device to determine exactly what is contained inside a slice.
In addition to this, there are other resources such as multipliers, memories, PLLs, etc.
I suppose you could say that one logic cell can perform one operation, but a single cell is only capable of very simple operations, for example an AND gate, 2:1 multiplexer, etc.
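To make the "simple operations" point concrete, here is a minimal C++ sketch (my own illustration, not vendor code) of a four-input LUT: the inputs form an index into a 16-bit truth table, so any Boolean function of up to four inputs - an AND gate, a 2:1 multiplexer - fits in a single cell.

```cpp
#include <cstdint>
#include <iostream>

// A 4-input LUT is a 16-entry truth table: the four inputs form an index,
// and the stored bit at that index is the output.
struct Lut4 {
    std::uint16_t truth;
    bool eval(bool a, bool b, bool c, bool d) const {
        unsigned idx = (unsigned(d) << 3) | (unsigned(c) << 2)
                     | (unsigned(b) << 1) |  unsigned(a);
        return (truth >> idx) & 1u;
    }
};

int main() {
    Lut4 and2{0x8888};   // out = a AND b (c and d are don't-cares)
    Lut4 mux21{0xCACA};  // out = c ? b : a, i.e. a 2:1 mux with select c
    std::cout << and2.eval(true, true, false, false) << '\n';   // prints 1
    std::cout << mux21.eval(false, true, true, false) << '\n';  // prints 1 (selects b)
}
```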

I would say no, but it depends on what you mean by an operation. A logic cell has the capability to implement a number of logical functions (AND/OR/XOR), and it has the ability to hold state with storage elements. These two functions are how every digital system under the sun operates. Even addition and subtraction are higher-level constructs built on top of logical functions. As noted in other answers, FPGA manufacturers publish guides on what is inside their logic cells. It is this fundamental cell that is stamped repeatedly across the die to create the "array" in Field Programmable Gate "Array".

This yields a distinctly "more or less" answer. The logic blocks can be used in multiple modes, and you may even be able to pack more than one function into a single block (including with two independent outputs), but you must also be able to transport meaningful data for them to work on. It sounds like you have a Zynq-7020 as an example. You may want to note that besides those logic cells, it also has 220 hardware multiply-add blocks (DSP slices). That number is not arbitrary: the surrounding logic is sized so that, in favourable cases, they can be kept fed every cycle. Looking in the 7 Series FPGAs Configurable Logic Block User Guide (UG474), we find that the "logic cells" number given is an estimate of equivalent 4-input-LUT-plus-flip-flop configurations. The reason this number is lower than the number of flip-flops (106k) is that the two 5-input LUTs into which you can split a 6-input LUT must share their inputs.
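As a rough sanity check of that estimate (assuming the commonly quoted Xilinx conversion factor of about 1.6 logic cells per 6-input LUT): the Zynq-7020 has 53,200 6-input LUTs, and 53,200 x 1.6 = 85,120, which matches the ~85k logic cells in the marketing figure; with two flip-flops per LUT, 2 x 53,200 = 106,400 gives the 106k flip-flop count.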


What are the differences in delay times of the basic AND, OR, NOT, NAND, NOR, XOR, XNOR gates?

1-1 What are the differences in delay times of the basic logic gates?
I found that NAND and NOR gates are preferred in digital circuit design for their shorter delay times, and that AND and OR gates might even be implemented with NOT and NAND/NOR gates.
1-2 Are there set or known differences in delay time between AND, OR, and NOT gates?
For a typical FPGA (LUT-based logic elements) there's no difference at all.
A single cell can implement a complex function based on its resulting truth table, and multiple expressions might be folded into a single cell, so you wouldn't even find individual AND/OR/NOT "gates".
It might be different for an ASIC, I don't know. But in a typical FPGA you don't have gates; there are RAM-based lookup tables implementing complex functions of their inputs - 4-6 inputs, not just 2.
You'll find that in a big enough design the routing costs are much higher than the delays in a single logic cell.
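To make the folding point concrete, here is a small C++ sketch (my own illustration of roughly what synthesis does, not any vendor's algorithm): precompute the truth table of a whole expression and store it as a single LUT mask, so the individual AND/OR/NOT gates simply disappear.

```cpp
#include <cstdint>
#include <cstdio>

// Build a 4-LUT mask by evaluating an expression for all 16 input
// combinations - several "gates" collapse into one table.
int main() {
    std::uint16_t mask = 0;
    for (unsigned i = 0; i < 16; ++i) {
        bool a = i & 1, b = i & 2, c = i & 4, d = i & 8;
        if ((a && b) || (!c && d))   // (a AND b) OR (NOT c AND d)
            mask |= 1u << i;
    }
    std::printf("LUT mask: 0x%04X\n", mask);  // one cell, four "gates"
}
```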
If you look at how these different gates are constructed you can see some of the reasons for differences. An inverter consists of one pull-up transistor and one pull down transistor. This is the simplest gate and is therefore potentially the fastest. A NAND has two pull-down devices in series and two pull-up transistors in parallel. The NOR is basically the opposite of the NAND. And yes: AND is usually just NAND + inverter.
The on-resistance of a path will be higher with two transistors in series (making it slower), and the number of transistors connected to a single node will increase the capacitive load (making it slower). You can make things faster by using larger transistors (with lower on-resistance), but that increases the load on whatever cell is driving it, which slows that cell down.
It is a big optimization problem which you probably shouldn't try to solve yourself. That is what the EDA tools are for.
Like most answers in life, it depends. There are many ways to build each type of logic gate, and different types of transistors can be used to make each type of gate. You can build all gates from universal gates like NAND and NOR, so those composite gates would have a larger delay time. BJTs will have a larger delay than MOSFETs. You can also use Schottky transistors to reduce delays compared to plain BJTs. If you use an IC there are lots of components within the chip, some of which may reduce delays and some of which may increase them. So you really have to compare what you are working with. Here is a video that shows the design of logic gates at the transistor level. https://youtu.be/nB6724G3b3E

What is the Random Control Logic in the 6502?

I am currently developing a subset of the 6502 in LogiSim and at the current stage I am determining which parts to implement and what can be cut out. One of my main resources is Hanson's Block Diagram.
I am currently trying to determine how exactly the instructions are decoded into the control lines. In the diagram below, there are two parts, the Decode ROM and the Random Control Logic.
How exactly does the 6502 decode the program instructions into control lines? As a follow-up, is it possible to simplify this area to eliminate the Random Control Logic and create the decoding with only one ROM?
I'm on the outskirts of my knowledge here, but my understanding is that the PLA decode ROM outputs its 130 control signals as a function of opcode and cycle, and the random logic is a purely combinational unit that takes the PLA output as input in order to control the rest of the chip. I think you could combine the two into a single ROM. From looking at the die shot, the random logic is about twice as large as the PLA, so my guess is that time/cost considerations led to the combined approach - possibly including intelligent task subdivision, and almost certainly including a calculation of debugging time, since the 6502 was laid out literally by hand, using pen and ruler.
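To make the "single ROM" idea concrete, here is a toy C++ sketch (names and sizes are mine; this is not the actual 6502 netlist): index a table by opcode and timing state, and store the fully decoded control lines directly, folding what the random logic computes into the table entries.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Hypothetical single-ROM decoder: 256 opcodes x 8 timing states, each
// entry holding the final control lines (squeezed into 32 bits here;
// the real 6502 drives on the order of 130 lines).
using ControlWord = std::uint32_t;
static std::array<ControlWord, 256 * 8> decode_rom{};

ControlWord decode(std::uint8_t opcode, unsigned cycle) {
    return decode_rom[opcode * 8u + (cycle & 7u)];
}

int main() {
    decode_rom[0xA9 * 8 + 1] = 0x00000042;  // made-up entry: LDA #imm, cycle 1
    std::printf("%08X\n", static_cast<unsigned>(decode(0xA9, 1)));
}
```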

Layers and Neurons of a Neural Network

I would like to know a bit more about neural networks. I'm developing a C++ program to implement a NN, but I'm stuck on the backpropagation algorithm; sorry for not offering some working code.
I know that there are many libraries for creating a NN in many languages, but I prefer to make one myself. The point is that I don't know how many layers and how many neurons would be necessary to achieve a particular goal such as pattern recognition, or function approximation, or whatever.
My questions are: if I'd like to recognize some particular patterns, as in image detection, how many layers and neurons per layer would be necessary? Let's say my images are all 8x8 pixels; I would naturally start with an input layer of 64 neurons, but I have no idea how many neurons to put in the hidden layers, or in the output layer. Let's say I have to distinguish between cats and dogs, or whatever you may think of - what should the output layer look like? I can imagine an output layer with a single neuron outputting a value between 0 and 1 via the classical logistic function (1/(1+exp(-x))), where a value near 0 means the input was a cat and a value approaching 1 means it was a dog, but ... is it correct? What if I add a new pattern like a fish? And what if the input contains both a dog and a cat ( ..and a fish)? This makes me think that the logistic function in the output layer is not very suitable for this kind of pattern recognition, if only because 1/(1+exp(-x)) has a range of (0,1). Do I have to change the activation function, or maybe add some other neurons to the output layer? Are there other activation functions better suited to this? Does every neuron in every layer have the same activation function, or does it differ from layer to layer?
Sorry for all of these questions, but this topic is not very clear to me.
I have read a lot around the internet, and I found libraries that are already implemented and hard to learn from, and many explanations of what a NN can do, but not how it does it.
I read a lot from https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ and http://neuralnetworksanddeeplearning.com/chap1.html, and there I understood how to approximate a function (because every neuron in a layer can be thought of as a step function with a particular step determined by its weights and bias) and how the backpropagation algorithm works, but other tutorials and the like were more focused on pre-existing libraries. I also read this question, Determining the proper amount of Neurons for a Neural Network, but I would also like to cover the activation functions of a NN - which is best, and for what.
Thanks in advance for your answers!
Your questions are quite general, so I can only give some general recommendations:
The number of layers you need depends on the complexity of the problem you want to solve. The more calculation is required to obtain an output from a given input, the more layers you need.
Only very simple problems can be solved with a single-layer network. These are called linearly separable and are usually trivial. With two layers it gets better, and with three layers, at least in theory, all kinds of classification tasks can be performed if you have enough cells within the layers. In practice, however, it is often better to add a 4th or 5th layer to the network while reducing the number of cells within a single layer.
Be aware that the standard backpropagation algorithm performs badly with more than 4 or 5 layers. If you need more layers, have a look at Deep Learning.
The number of cells within each layer depends mainly on the number of inputs and, if you are solving a classification task, on the number of classes you want to detect. In practice it is quite common to reduce the number of cells from layer to layer, but there are exceptions.
Concerning your question about the output function: In most cases you should stick with one type of sigmoid function. The case you describe is not really an issue because you could add another output cell for your "fish" class. The choice of a specific activation function is not that critical. Basically you use one whose values and derivative can be calculated efficiently.
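To illustrate the "values and derivative can be calculated efficiently" point, here is a minimal C++ sketch of the logistic function from the question (function names are my own):

```cpp
#include <cmath>

// Logistic activation. Its derivative can be expressed in terms of the
// already-computed output y = sigmoid(x): sigmoid'(x) = y * (1 - y),
// which keeps the backward pass cheap.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double sigmoid_deriv_from_output(double y) { return y * (1.0 - y); }
```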
@Frank Puffer has already provided some nice information, but let me add my two cents. First off, much of what you're asking about falls in the area of hyperparameter optimization. Although there are various "rules of thumb", the reality is that determining the optimal architecture (number/size of layers, connectivity structure, etc.) and other parameters like the learning rate typically requires extensive experimentation. The good news is that parameterizing these hyperparameters is among the simplest aspects of implementing a neural network. So I would recommend building your software such that the number of layers, size of layers, learning rate, etc., are all easily configurable.
Now you specifically asked about detecting patterns in an image. It's worth mentioning that using standard multi-layer perceptrons (MLPs) to perform classification on raw image data can be computationally expensive, especially for larger images. It's common to use architectures that are designed to extract useful, spatially local features (i.e. Convolutional Neural Networks, or CNNs).
You could still use standard MLPs for this, but the computational complexity can make it an untenable solution. The sparse connectivity of CNNs, for example, dramatically reduces the number of parameters requiring optimization and simultaneously builds a conceptual hierarchy of representations better suited to classifying images.
Regardless, I would recommend implementing backpropagation using stochastic gradient descent for optimization. This is still the approach typically used for training neural nets, CNNs, RNNs, etc.
Regarding the number of output neurons, this is one question that does have a simple answer: use "one-hot" encoding. For each class you want to recognize, you have an output neuron. In your example of the dog, cat, and fish classes, you have three neurons. For an input image representing a dog, you would expect a value of 1 for the "dog" neuron, and 0 for all the others. Then, during inference, you can interpret the output as a probability distribution reflecting the confidence of the NN. For example, if you get output dog:0.70, cat:0.25, fish:0.05, then you have a 70% confidence that the image is a dog, and so on.
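Here is a sketch of reading a one-hot output layer as confidences, using a softmax to normalize the raw outputs (the softmax is my addition here; it is one common way to get the probability-like interpretation described above):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Softmax: exponentiate and normalize so the class outputs sum to 1
// and can be read as confidences.
std::vector<double> softmax(const std::vector<double>& z) {
    double maxz = *std::max_element(z.begin(), z.end());  // for numerical stability
    std::vector<double> out(z.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        out[i] = std::exp(z[i] - maxz);
        sum += out[i];
    }
    for (double& v : out) v /= sum;
    return out;
}

int main() {
    auto p = softmax({2.0, 0.9, -0.7});  // raw outputs for dog, cat, fish
    std::printf("dog %.2f  cat %.2f  fish %.2f\n", p[0], p[1], p[2]);
}
```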
For activation functions, the most recent research I've seen seems to indicate that Rectified Linear Units are generally a good choice since they're easy to differentiate and compute, and they avoid a problem that plagues deeper networks called the "vanishing gradient problem".
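And the rectified linear unit mentioned above, which is even cheaper to evaluate: its derivative is simply 0 or 1, which is part of what mitigates the vanishing gradient problem in deeper networks.

```cpp
// ReLU activation and its derivative (the derivative at exactly 0 is
// conventionally taken as 0 here).
double relu(double x)       { return x > 0.0 ? x : 0.0; }
double relu_deriv(double x) { return x > 0.0 ? 1.0 : 0.0; }
```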
Best of luck!

Method for runtime comparison of two programs' objects

I am working through a particular type of code testing that is rather nettlesome and could be automated, yet I'm not sure of the best practices. Before describing the problem, I want to make clear that I'm looking for the appropriate terminology and concepts, so that I can read more about how to implement it. Suggestions on best practices are welcome, certainly, but my goal is specific: what is this kind of approach called?
In the simplest case, I have two programs that take in a bunch of data, produce a variety of intermediate objects, and then return a final result. When tested end-to-end, the final results differ, hence the need to find out where the differences occur. Unfortunately, even intermediate results may differ, but not always in a significant way (i.e. some discrepancies are tolerable). The final wrinkle is that intermediate objects may not necessarily have the same names between the two programs, and the two sets of intermediate objects may not fully overlap (e.g. one program may have more intermediate objects than the other). Thus, I can't assume there is a one-to-one relationship between the objects created in the two programs.
The approach that I'm thinking of taking to automate this comparison of objects is as follows (it's roughly inspired by frequency counts in text corpora; a rough code sketch follows the steps):
1. For each program, A and B: create a list of the objects created throughout execution, which may be indexed in a very simple manner, such as a001, a002, a003, a004, ... and similarly for B (b001, ...).
2. Let Na = the number of unique object names encountered in A; similarly for Nb and the objects of B.
3. Create two tables, TableA and TableB, with Na and Nb columns, respectively. Entries will record a value for each object at each trigger (i.e. for each row, defined next).
4. For each assignment in A, the simplest approach is to capture the hash value of all of the Na items; of course, one can use LOCF (last observation carried forward) for those items that don't change, and any as-yet unobserved objects are simply given a NULL entry. Repeat this for B.
5. Match entries in TableA and TableB via their hash values. Ideally, objects will arrive into the "vocabulary" in approximately the same order, so that order and hash value will allow one to identify the sequences of values.
6. Find discrepancies between the objects in A and B based on where their hash-value sequences diverge.
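A rough C++ sketch of steps 4-6 (all names are hypothetical; the tolerance-aware fingerprint addresses the numerical-precision concern discussed below):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One trace entry per assignment: which object changed, and a fingerprint
// of its value at that point (step 4).
struct TraceEntry {
    std::string object;   // program-local index such as "a001"
    std::size_t hash;     // fingerprint of the object's value
};

// Quantize floating-point values before hashing so discrepancies near the
// machine tolerance level do not produce spurious divergences.
std::size_t fingerprint(double v, double tol = 1e-9) {
    return std::hash<long long>{}(static_cast<long long>(v / tol));
}

// Walk the two traces in order and report the first position where the
// fingerprint sequences diverge (step 6); -1 means no divergence found.
long first_divergence(const std::vector<TraceEntry>& a,
                      const std::vector<TraceEntry>& b) {
    std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        if (a[i].hash != b[i].hash) return static_cast<long>(i);
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```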
Now, this is a simple approach and could work wonderfully if the data were simple, atomic, and not susceptible to numerical precision issues. However, I believe that numerical precision may cause hash values to diverge even when the discrepancies are insignificant, i.e. approximately at the machine tolerance level.
First: What is a name for such types of testing methods and concepts? An answer need not necessarily be the method above, but reflects the class of methods for comparing objects from two (or more) different programs.
Second: What standard methods exist for what I describe in steps 3 and 4? For instance, the "value" need not only be a hash: one might also store the sizes of the objects - after all, two objects cannot be the same if they are massively different in size.
In practice, I tend to compare a small number of items, but I suspect that when automated this need not involve a lot of input from the user.
Edit 1: This paper is related in terms of comparing execution traces; it mentions "code comparison", which is related to my interest, though I'm more concerned with the data (i.e. objects) than with the actual code that produces the objects. I've just skimmed it, but will review it more carefully for methodology. More importantly, this suggests that methods for comparing code traces may be extended to comparing data traces. This paper analyzes some comparisons of code traces, albeit in a wholly unrelated area of security testing.
Perhaps data-tracing and stack-trace methods are related. Checkpointing is slightly related, but its typical use (i.e. saving all of the state) is overkill.
Edit 2: Other related concepts include differential program analysis and monitoring of remote systems (e.g. space probes) where one attempts to reproduce the calculations using a local implementation, usually a clone (think of a HAL-9000 compared to its earth-bound clones). I've looked down the routes of unit testing, reverse engineering, various kinds of forensics, and whatnot. In the development phase, one could ensure agreement with unit tests, but this doesn't seem to be useful for instrumented analyses. For reverse engineering, the goal can be code & data agreement, but methods for assessing fidelity of re-engineered code don't seem particularly easy to find. Forensics on a per-program basis are very easily found, but comparisons between programs don't seem to be that common.
(Making this answer community wiki, because dataflow programming and reactive programming are not my areas of expertise.)
The area of data flow programming appears to be related, and thus debugging of data flow programs may be helpful. This paper from 1981 gives several useful high level ideas. Although it's hard to translate these to immediately applicable code, it does suggest a method I'd overlooked: when approaching a program as a dataflow, one can either statically or dynamically identify where changes in input values cause changes in other values in the intermediate processing or in the output (not just changes in execution, if one were to examine control flow).
Although dataflow programming is often related to parallel or distributed computing, it seems to dovetail with Reactive Programming, which is how the monitoring of objects (e.g. the hashing) can be implemented.
This answer is far from adequate, hence the CW tag, as it doesn't really name the debugging method that I described. Perhaps this is a form of debugging for the reactive programming paradigm.
[Also note: although this answer is CW, if anyone has a far better answer in relation to dataflow or reactive programming, please feel free to post a separate answer and I will remove this one.]
Note 1: Henrik Nilsson and Peter Fritzson have a number of papers on debugging for lazy functional languages, which are somewhat related: the debugging goal is to assess values, not the execution of code. This paper seems to have several good ideas, and their work partially inspired this paper on a debugger for a reactive programming language called Lustre. These references don't answer the original question, but may be of interest to anyone facing this same challenge, albeit in a different programming context.

PLC Ladder Logic Outputs

On a single ladder rung, how many outputs can you have? If you have more than one, would it be AND logic or OR logic - series or parallel? I'm trying to make six lights flash using timer-on-delay instructions with a closed input instruction. I will be using an Allen-Bradley SLC 500 series PLC.
In a ControlLogix or CompactLogix PLC, a ladder logic rung may have as many outputs (OTE) as you like, both at the right-hand end of the rung and even in the middle of it.
Each output is controlled only by the logic leading up to it. If you have multiple outputs at the same point in the rung, they will all have the output reflecting the logic condition from the rung start up to that point. This is a common method used to drive several outputs with the same signal at once.
If you have multiple outputs at different points in the rung, each will have outputs that correspond to the logic leading to that output. Logic downstream from an OTE acts as if the OTE wasn't present.
Now, you may have complex devices (e.g., a Timer) controlled by logic within a rung.
Obviously, further logic that depends on the output of the complex device (e.g., Timer Done) will not be independent of the behaviour of that device. But just like OTEs, you may have lots of complex devices in a rung.
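To illustrate how mid-rung outputs behave, here is a plain C++ analogy (not PLC code; the variable names are mine): each OTE latches the rung state accumulated so far, and evaluation continues past it as if it weren't there.

```cpp
#include <iostream>

int main() {
    bool input = true;     // contact at the start of the rung
    bool cond2 = false;    // a second contact further along the rung

    bool rung = input;     // rung state after the first contact
    bool out1 = rung;      // OTE in the middle: sees logic up to here
    rung = rung && cond2;  // logic continues as if the OTE weren't present
    bool out2 = rung;      // OTE at the end: sees the whole rung

    std::cout << out1 << ' ' << out2 << '\n';  // prints: 1 0
}
```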
If you are programming an SLC 500 then you cannot have an OTE in the middle of a rung. It must be at the very right-hand side of the rung. You may, however (and it is common practice), create a branch around the OTE and have another OTE (or OTU, or OTL, or any other output) on its own branch (again at the very right of the branch).
So using this method you can have as many OTEs as you like on any given rung. However, a best practice is to limit the number (to, say, 10 or 20 per rung) for readability, and split them onto several rungs as necessary.
