Related
What is FLOPS in field of deep learning? Why we don't use the term just FLO?
We use the term FLOPS to measure the number of operations of a frozen deep learning network.
Following Wikipedia, FLOPS = floating point operations per second. When we test computing units, we should consider of the time. But in case of measuring deep learning network, how can I understand this concept of time? Shouldn't we use the term just FLO(floating point operations)?
Why do people use the term FLOPS? If there is anything I don't know, what is it?
==== attachment ===
Frozen deep learning networks that I mentioned is just a kind of software. It's not about hardware. In the field of deep learning, people use the term FLOPS to measure how many operations are needed to run the network model. In this case, in my opinion, we should use the term FLO. I thought people confused about the term FLOPS and I want to know if others think the same or if I'm wrong.
Please look at these cases:
how to calculate a net's FLOPs in CNN
https://iq.opengenus.org/floating-point-operations-per-second-flops-of-machine-learning-models/
Confusingly both FLOPs, floating point operations, and FLOPS, floating point operations per second, are used in reference to machine learning. FLOPs are often used to describe how many operations are required to run a single instance of a given model, like VGG19. This is the usage of FLOPs in both of the links you posted, though unfortunately the opengenus link incorrectly mistakenly uses 'Floating point operations per second' to refer to FLOPs.
You will see FLOPS used to describe the computing power of given hardware like GPUs which is useful when thinking about how powerful a given piece of hardware is, or conversely, how long it may take to train a model on that hardware.
Sometimes people write FLOPS when they mean FLOPs. It is usually clear from the context which one they mean.
I not sure my answer is 100% correct. but this is what i understand.
FLOPS = Floating point operations per second
FLOPs = Floating point operations
FLOPS is a unit of speed. FLOPs is a unit of amount.
What is FLOPS in field of deep learning? Why we don't use the term just FLO?
FLOPS (Floating Point Operations Per Second) is the same in most fields - its the (theoretical) maximum number of floating point operations that the hardware might (if you're extremely lucky) be capable of.
We don't use FLO because FLO would always be infinity (given an infinite amount of time hardware is capable of doing an infinite amount of floating point operations).
Note that one "floating point operation" is one multiplication, one division, one addition, ... Typically (for modern CPUs) FLOPS is calculated from repeated use of a "fused multiply then add" instruction, so that one instruction counts as 2 floating point operations. When combined with SIMD a single instruction (doing 8 "multiple and add" in parallel) might count as 16 floating point instructions. Of course this is a calculated theoretical value, so you ignore things like memory accesses, branches, IRQs, etc. This is why "theoretical FLOPs" is almost never achievable in practice.
Why do people use the term FLOPS? If there is anything I don't know, what is it?
Primarily it's used to describe how powerful hardware is for marketing purposes (e.g. "Our new CPU is capable of 5 GFLOPS!").
1-1 What are the difference in delay times of the basic logic gates?
I found that NAND and NOR gates are preferred in digital circuit design for shorter delay time and that AND and OR gates might even be implemented with NOT and NAND/NOR gates.
1-2 Are there set or known difference in delay time between AND, OR, NOT gates?
For a typical fpga (LUT-based logical elements) there's no difference at all.
Single cell can implement a complex function based on its resulting truth table, and multiple expressions might be folded into single cell, so you wouldn't even find individual and/or/not "gates".
It might be different for ASIC, I don't know. But in a typical fpga you don't have gates, there are ram-based lookup tables, implementing complex functions of its inputs - 4-6 inputs, not just 2.
You'll find that in a big enough design the routing costs are much higher than delays in a single logical cell.
If you look at how these different gates are constructed you can see some of the reasons for differences. An inverter consists of one pull-up transistor and one pull down transistor. This is the simplest gate and is therefore potentially the fastest. A NAND has two pull-down devices in series and two pull-up transistors in parallel. The NOR is basically the opposite of the NAND. And yes: AND is usually just NAND + inverter.
The on resistance of a path will be higher with two transistors in series (making it slower), and the number of transistors connected to a single node will increase the captive load (making it slower). You can make things faster by using larger transistors (with lower on resistance) but that increases the load of whatever cell is driving it, which slows that cell down.
It is a big optimization problem which you probably shouldn't try to solve yourself. That is what the EDA tools are for.
Like most answers in life, it depends. There are many ways to build each type of logic gate and different types of transistors can be used to make each type of gate. You can build all gates from multiple universal gates like NAND and NOR. So the other gates would have a larger delay time. BJT transistors will have a larger delay than MOFET transistors. You can also use Schottky transistors to reduce delays compared to BJT. If you use an IC there are lots of components within the chip, some which may reduce delays and some that may increase delays. So you really have to compare what you are working with. Here is a video that shows the design of logic gates at the transistor level. https://youtu.be/nB6724G3b3E
As a firmware engineer, how do you compare the computation cost of the following operations (maybe amount of resources needed)
addition/subtraction
multiplication
division
trigonometric functions such as cosine
square root
For FLOATING point with 32 bits calculation.
Answer from a hardware (not firmware) point of view: unfortunately there is no simple answer to your question. Each function you listed has many different hardware implementations, usually between small-slow and large-fast. Moreover it depends on your target FPGA because some of them embed these functions as hard macros, so the question is not any more what do they cost, but do I have enough of them in this FPGA?
As a partial answer you can take this: with the most straightforward, combinatorial implementation, not using any hard macro, an integer or fixed-point N-bits adder/subtracter costs O(N) while a N x N-bits multiplier costs O(NxN). Roughly.
For the other functions, it is extremely difficult to answer, there are far too many variants to consider (fixed/floating point, latency, throughput, accuracy, range...). Assuming you make similar choices for all of them, I would say: add < mul < div < sqrt < trig. But honestly, do not take this for granted. If you are working with floating point numbers, for instance, the adder might be closer or even larger than the multiplier because it requires mantissas alignment, that is a barrel shifter.
A firmware engineer with a little hardware design knowledge would likely use premade cores for each of those functions. Xilinx, Altera, and others provide parametrizeable cores that can perform most of the functions that you specified. The parameters for the core will likely include input size and number of cycles taken for the specified operation, among many others. The resources used by each of these cores will vary significantly based on the configuration and also what vendor and corresponding FPGA family that the core is being created for ( for instance Altera Cyclone 5 or Xilinx Artix 7). The latency that the cores will have will need to be a factor in your decision as well, higher speed families of FPGAs can operate at faster clock rates decreasing the time that operations take.
The way to get real estimates is to install the vendors tools ( for instance Xilinx's Vivado or Altera's Quartus) and use those tools to generate each of the types of cores required. You will need pick an exact FPGA part, in the family that you are evaluating. Choose one with many pins to avoid running out of I/O on the FPGA. Then synthesize and implement a design for every of the cores you are evaluating. The simplest design will have just the core that you are evaluating and map it's inputs and outputs to the FPGAs I/O pins. The implementation results will give you the number of resources that the design takes and also the maximum clock frequency that the design can run at.
What are typical means by which a random number can be generated in an embedded system? Can you offer advantages and disadvantages for each method, and/or some factors that might make you choose one method over another?
First, you have to ask a fundamental question: do you need unpredictable random numbers?
For example, cryptography requires unpredictable random numbers. That is, nobody must be able to guess what the next random number will be. This precludes any method that seeds a random number generator from common parameters such as the time: you need a proper source of entropy.
Some applications can live with a non-cryptographic-quality random number generator. For example, if you need to communicate over Ethernet, you need a random number generator for the exponential back-off; statistic randomness is enough for this¹.
Unpredictable RNG
You need an unpredictable RNG whenever an adversary might try to guess your random numbers and do something bad based on that guess. For example, if you're going to generate a cryptographic key, or use many other kinds of cryptographic algorithms, you need an unpredictable RNG.
An unpredictable RNG is made of two parts: an entropy source, and a pseudo-random number generator.
Entropy sources
An entropy source kickstarts the unpredictability. Entropy needs to come from an unpredictable source or a blend of unpredictable sources. The sources don't need to be fully unpredictable, they need to not be fully predictable. Entropy quantifies the amount of unpredictability. Estimating entropy is difficult; look for research papers or evaluations from security professionals.
There are three approaches to generating entropy.
Your device may include some non-deterministic hardware. Some devices include a dedicated hardware RNG based on physical phenomena such as unstable oscillators, thermal noise, etc. Some devices have sensors which capture somewhat unpredictable values, such as the low-order bits of light or sound sensors.
Beware that hardware RNG often have precise usage conditions. Most methods require some time after power-up before their output is truly random. Often environmental factors such as extreme temperatures can affect the randomness. Read the RNG's usage notes very carefully. For cryptographic applications, it is generally recommended to make statistical tests the HRNG's output and refuse to operate if these tests fail.
Never use a hardware RNG directly. The output is rarely fully unpredictable — e.g. each bit may have a 60% probability of being 1, or the probability of two consecutive bits being equal may be only 48%. Use the hardware RNG to seed a PRNG as explained below.
You can preload a random seed during manufacturing and use that afterwards. Entropy doesn't wear off when you use it²: if you have enough entropy to begin with, you'll have enough entropy during the lifetime of your device. The danger with keeping entropy around is that it must remain confidential: if the entropy pool accidentally leaks, it's toast.
If your device has a connection to a trusted third party (e.g. a server of yours, or a master node in a sensor network), it can download entropy from that (over a secure channel).
Pseudo-random number generator
A PRNG, also called deterministic random bit generator (DRBG), is a deterministic algorithm that generates a sequence of random numbers by transforming an internal state. The state must be seeded with sufficient entropy, after which the PRNG can run practically forever. Cryptographic-quality PRNG algorithms are based on cryptographic primitives; always use a vetted algorithm (preferably some well-audited third-party code if available).
The PRNG needs to be seeded with entropy. You can choose to inject entropy once during manufacturing, or at each boot, or periodically, or any combination.
Entropy after a reboot
You need to take care that the device doesn't boot twice in the same RNG state: otherwise an observer can repeat the same sequence of RNG calls after a reset and will know the RNG output the second time round. This is an issue for factory-injected entropy (which by definition is always the same) as well as for entropy derived from sensors (which takes time to accumulate).
If possible, save the RNG state to persistent storage. When the device boots, read the RNG state, apply some transformation to it (e.g. by generating one random word), and save the modified state. After this is done, you can start returning random numbers to applications and system services. That way, the device will boot with a different RNG state each time.
If this is not possible, you ned to be very careful. If your device has factory-injected entropy plus a reliable clock, you can mix the clock value into the RNG state to achieve unicity; however, beware that if your device loses power and the clock restarts from some fixed origin (blinking twelve), you'll be in a repeatable state.
Predictable RNG state after a reset or at the first boot is a common problem with embedded devices (and with servers). For example, a study of RSA public keys showed that many had been generated with insufficient entropy, resulting in many devices generating the same key³.
Statistical RNG
If you can't achieve a cryptographic quality, you can fall back to a less good RNG. You need to be aware that some applications (including a lot of cryptography) will be impossible.
Any RNG relies on a two-part structure: a unique seed (i.e. an entropy source) and a deterministic algorithm based on that seed.
If you can't gather enough entropy, at least gather as much as possible. In particular, make sure that no two devices start from the same state (this can usually be achieved by mixing the serial number into the RNG seed). If at all possible, arrange for the seed not to repeat after a reset.
The only excuse not to use a cryptographic DRBG is if your device doesn't have enough computing power. In that case, you can fall back to faster algorithm that allow observers to guess some numbers based on the RNG's past or future output. The Mersenne twister is a popular choice, but there have been improvements since its invention.
¹ Even this is debatable: with non-crypto-quality random backoff, another device could cause a denial of service by aligning its retransmission time with yours. But there are other ways to cause a DoS, by transmitting more often.
² Technically, it does, but only at an astronomical scale.
³ Or at least with one factor in common, which is just as bad.
One way to do it would be to create a Pseudo Random Bit Sequence, just a train of zeros and ones, and read the bottom bits as a number.
PRBS can be generated by tapping bits off a shift register, doing some logic on them, and using that logic to produce the next bit shifted in. Seed the shift register with any non zero number. There's a math that tells you which bits you need to tap off of to generate a maximum length sequence (i.e., 2^N-1 numbers for an N-bit shift register). There are tables out there for 2-tap, 3-tap, and 4-tap implementations. You can find them if you search on "maximal length shift register sequences" or "linear feedback shift register.
from: http://www.markharvey.info/fpga/lfsr/
HOROWITZ AND HILL gave a great part of a chapter on this. Most of the math surrounds the nature of the PRBS and not the number you generate with the bit sequence. There are some papers out there on the best ways to get a number out of the bit sequence and improving correlation by playing around with masking the bits you use to generate the random number, e.g., Horan and Guinee, Correlation Analysis of Random Number Sequences based on Pseudo Random Binary Sequence Generation, In the Proc. of IEEE ISOC ITW2005 on Coding and Complexity; editor M.J. Dinneen; co-chairs U. Speidel and D. Taylor; pages 82-85
An advantage would be that this can be achieved simply by bitshifting and simple bit logic operations. A one-liner would do it. Another advantage is that the math is pretty well understood. A disadvantage is that this is only pseudorandom, not random. Also, I don't know much about random numbers, and there might be better ways to do this that I simply don't know about.
How much energy you expend on this would depend on how random you need the number to be. If I were running a gambling site, and needed random numbers to generate deals, I wouldn't depend on Pseudo Random Bit Sequences. In those cases, I would probably look into analog noise techniques, maybe Johnson Noise around a big honking resistor or some junction noise on a PN junction, amplify that and sample it. The advantages of that are that if you get it right, you have a pretty good random number. The disadvantages are that sometimes you want a pseudorandom number where you can exactly reproduce a sequence by storing a seed. Also, this uses hardware, which someone must pay for, instead of a line or two of code, which is cheap. It also uses A/D conversion, which is yet another peripheral to use. Lastly, if you do it wrong -- say make a mistake where 60Hz ends up overwhelming your white noise-- you can get a pretty lousy random number.
What are typical means by which a random number can be generated in an embedded system?
Giles indirectly stated this: it depends on the use.
If you are using the generator to drive a simulation, then all you need is a uniform distribution and a linear congruential generator (LCG) will work fine.
If you need a secure generator, then its a trickier problem. I'm side-stepping what it means to be secure, but from 10,000 feet think "wrap it in a cryptographic transformation", like a SHA-1/HMAC or SHA-512/HMAC. There are others ways, like sampling random events, but they may not be viable.
When you need secure random numbers, some low resource devices are notoriously difficult to work with. See, for example, Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices and Traffic sensor flaw that could allow driver tracking fixed. And a caveat for Linux 3.0 kernel users: the kernel removed a couple of entropy sources, so entropy depletion and starvation might have gotten worse. See Appropriate sources of entropy on LWN.
If you have a secure generator, then your problem becomes getting your hands on a good seed (or seeds over time). One of the better methods I have seen for environments that are constrained is Hedging. Hedging was proposed for Virtual Machines where a program could produce the same sequence after a VM reset.
The idea for hedging is to extract the randomness provided by your peer, and use it to keep you secure generator fit. For example, in the case of TLS, there is a client_random and a server_random. If the device is a server, then it would stir in the client_random. If the device is a client, then it would stir in server_random.
You can find the two papers of interest that address hedging at:
When Good Randomness Goes Bad: Virtual Machine Reset
Vulnerabilities and Hedging Deployed Cryptography
When Virtual is Harder than Real: Resource Allocation Challenges in
Virtual Machine Based IT Environments
Using client_random and a server_random is consistent with Peter Guttman's view on the subject: "mix every entropy source you can get your hands on into your PRNG, including less-than-perfect ones". Gutmann is the author of Engineering Security.
Hedging only solves part of the problem. You will still need to solve other problems, like how to bootstrap the entropy pool, how to regenerate system key pairs when the pool is in a bad state, and how persist the entropy across reboots when there's no filesystem.
Although it may not be the most complex or sound method, it can be fun to use external stimuli as your seed for random number generation. Consider using analogue input from a photodiode, or a thermistor. Even random noise from a floating pin could potentially yield some interesting results.
For example you measure the data coming from some device, it can be a mass of the object moving on the bridge. Because it is moving the mass will give data which will vibrate in some amplitude depending on the mass of the object. Bigger the mass - bigger the vibrations.
Are there any methods for filtering such kind of noise from that data?
May be using some formulas of vibrations? Have no idea what kind of formulas or algorithms (filters) can be used here. Please suggest anything.
EDIT 2:
Better picture, I just draw it for better understanding:
Not very good picture. From that graph you can see that the frequency is the same every
time, but the amplitude chanbges periodically. Something like that I have when there are no objects on the moving road. (conveyer belt). vibrating near zero value.
When the object moves, I there are the same waves with changing amplitude.
The graph can tell that there may be some force applying to the system and which produces forced occilations. So I am interested in removing such kind of noise. I do not know what force causes such occilations. Soon I hope I will get some data on the non moving road with and without object on it for comparison with moving road case.
What you have in your last plot is basically an amplitude modulated oscillation coming from a function like:
f[x] := 10 * (4 + Sin[x]) * Sin[80 * x]
The constants have been chosen to match your plot (using just a rule of thumb)
The Plot of this function is
That isn't "noise" (although may be some noise is there too), but can be filtered easily.
Let's see your data for the static and moving payloads ....
Edit
Based on your response to several comments, and based in my previous experience with weighting devices:
You are interfacing the physical world, not just getting input from a mouse and keyboard. It is very important for you understand the device, how it works and how it is designed.
You need a calibration procedure. You have to use several master weights to be sure that the device is working properly and linearly in the whole scale, and that the static case is measured much better than your dynamic needs.
You'll not be able to predict if you can measure with several loads in the conveyor until you do some experiments and look very carefully at the resulting plots
You need to be sure that a load placed anywhere in the conveyor shows the same reading. Or at least you should be able to correlate reading and position.
As I said before, you need a lot of info, and it seems that is not available. I always worked as a team with the engineers designing the device.
Don't hesitate to add more info ...
Have you tried filters with lowpass characteristics? There are different approaches for smoothing data (i.e. Savitzky-Golay, Gauss, moving average) but often, a simple N-point median filter is already sufficient.
It really depends on what you're after.
Take a look at this book:
The Scientist and Engineer's Guide to Digital Signal Processing
You can download it for free. In particular, check chapters 14 and 15.
If the frequency changes with mass and you're trying to measure mass, why not measure the frequency of the oscillations and use that as your primary measure?
Otherwise you need a notch filter which is tunable - figure out the frequency of the "noise" and tune the notch filter to that.
Another book to try is Lyons Understanding Digital Signal Processing
In order to smooth the signal, I'd average the previous 2 * n samples where n is the maximum expected wavelength of the vibrations.
This should cause most of the noise to be eliminated.
If you have some idea of the range of frequencies, you could do a simple average as long as the measurement period were sufficiently long to give you the level of accuracy you want to achieve. The more wavelengths worth of data you average against, the smaller the ratio of contributed error from a partial wavelength.
I'd suggest first simulating/modeling this in software like Matlab.
Data you'll need to consider:
The expected range of vibration frequencies
The measurement accuracy you want to achieve
The expected range of mass you'll want to measure
The function of mass to vibration amplitude
You should be able to apply the same principles as noise-cancelling microphones: put two sensors out, then subtract the secondary sensor's (farther away from the good signal source) signal from the primary sensor's (closer to the good signal source) signal.
Obviously, this works best if the "noise" will reach both sensors fairly equally while the "signal" reaches the primary sensor much more strongly.
For things like sound, this is pretty easy to do in the sensor itself, which makes your software a lot easier and more performant. Depending on what you're measuring, this might be easier to do with multiple sets of hardware and doing the cancellation in software.
If you can characterize the frequency spectra of the unwanted vibration noise, you might be able to synthesize a set of (near) minimum phase notch or band reject filter(s) to allow you to acquire your desired signal at your desired S/N ratio with minimized latency or data set size.
Filtering noisy digital signals is straight forward, as previous posters have noted. There are lots of references. You have not however stated what your objectives are clearly, so we cannot point you into a good direction. Are you looking for a single measurement of a single object on a bridge? [Then see other answers].
Are you monitoring traffic on this bridge and weighing each entity as it passes by? Then you need to determine when entities are on the sensor and when they are not. Typically, as long as the sensor's noise floor is significantly lower than the signal you're measuring this can be accomplished by simple thresholding.
Are you trying to measure the vibrations of the bridge caused by other vehicles? In which case you need either a more expensive sensor if you're having problems doing this, or a clearer measuring objective.