I'm studying computer performance and I can't understand why
clock cycle time = 1 / clock rate
Why is this obvious?
Clock rate is usually expressed in hertz (Hz).
1 Hz = 1 cycle/second.
It's just simple mathematics. To get how many seconds one cycle takes, you just invert the rate.
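For example (a quick Python sketch; the 2.5 GHz figure is just an illustrative value, not from the question):

    # Clock period is the reciprocal of clock rate.
    clock_rate_hz = 2.5e9             # example: a 2.5 GHz clock
    cycle_time_s = 1 / clock_rate_hz  # seconds per cycle
    print(cycle_time_s)               # 4e-10 s, i.e. 0.4 ns per cycle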
So I am going through some tutorials, and they seem to use "instructions" and "cycles" interchangeably, so now I am confused about what is actually measured in hertz (at the most basic level, without going into what modern processors can do in parallel etc.; trying to learn the basics here).
Say, the program is as follows: load two numbers, add them, store result.
So there will be 4 cycles:
load number A [fetch-decode-execute]
load number B [fetch-decode-execute]
add A and B [fetch-decode-execute]
store result [fetch-decode-execute]
What is a cycle here, and what is an instruction?
There are 4 cycles, or 12 instructions, correct?
Say it takes the CPU 1 sec to run this program.
What will be the CPU clock speed? 12 instructions/1 sec or 4 cycles/1 sec?
If the former one, then is the clock speed of the CPU 12 Hertz?
If the latter one, then is the clock speed of the CPU 4 Hertz?
From helpful comments by @Nate Eldredge:
"A fetch-decode-execute cycle is one instruction cycle, but three clock cycles.
The clock speed measures the number of clock cycles per second."
Thus, if the program is executed within 1 second, and it takes 12 clock cycles, the clock speed of that particular CPU is 12 Hz.
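To make the distinction concrete, here is a minimal sketch in Python of that accounting (the 3-clocks-per-instruction figure is the simplified fetch-decode-execute model from the comment above; real CPUs differ):

    instructions = 4                  # load A, load B, add, store
    clocks_per_instruction = 3        # fetch, decode, execute
    clock_cycles = instructions * clocks_per_instruction  # 12 clock cycles
    elapsed_seconds = 1.0             # the program takes 1 second

    clock_speed_hz = clock_cycles / elapsed_seconds
    print(clock_speed_hz)             # 12.0 -> a 12 Hz clock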
From what I understand, to calculate CPI you multiply the percentage of each instruction type by its cycle count and sum the results, right? Does the type of machine play any part in this calculation whatsoever?
I have a problem that asks me if a change should be recommended.
Machine 1: 40% R - 5 cycles, 30% lw - 6 cycles, 15% sw - 6 cycles, 15% beq - 3 cycles, on a 2.5 GHz machine
Machine 2: 40% R - 5 cycles, 30% lw - 6 cycles, 15% sw - 6 cycles, 15% beq - 4 cycles, on a 2.7 GHz machine
By my calculations, machine 1 has 5.15 CPI while machine 2 has 5.3 CPI. Is it okay to ignore the GHz of the machine and say that the change would not be a good idea or do I have to factor the machine in?
I think the point is to evaluate a design change that makes an instruction take more clocks but allows you to raise the clock frequency. (i.e. leaning towards a speed-demon design like the Pentium 4, instead of a brainiac like Apple's A7/A8 ARM cores. http://www.lighterra.com/papers/modernmicroprocessors/)
So you need to calculate instructions per second to see which one will get more work done in the same amount of real time, i.e. (clocks/sec) / (clocks/insn) = insn/sec, cancelling the clocks out of the units.
Your CPI calculation looks ok; I didn't check it, but yes a weighted average of the cycles according to the instruction mix.
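As a quick sanity check, here is that weighted average and the resulting instructions-per-second comparison as a Python sketch (the mix and clock speeds are taken from the question):

    # Instruction mix: (fraction, cycles on machine 1, cycles on machine 2)
    mix = [(0.40, 5, 5),   # R-type
           (0.30, 6, 6),   # lw
           (0.15, 6, 6),   # sw
           (0.15, 3, 4)]   # beq

    cpi1 = sum(f * c1 for f, c1, _ in mix)   # 5.15
    cpi2 = sum(f * c2 for f, _, c2 in mix)   # 5.30

    # (clocks/sec) / (clocks/insn) = insn/sec
    ips1 = 2.5e9 / cpi1                      # ~485 million insn/sec
    ips2 = 2.7e9 / cpi2                      # ~509 million insn/sec
    print(cpi1, cpi2, ips1, ips2)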
These numbers are obviously super simplified; any CPU worth building at 2.5 GHz would have some kind of branch prediction, so the cost of a branch isn't just a 3- or 4-cycle bubble. And taking ~5 cycles per instruction on average is pathetic. (Most pipelined designs aim for at least 1 instruction per clock.)
Caches and superscalar CPUs also lead to complex interactions between instructions depending on whether they depend on earlier results or not.
But this is sort of like what you might do if considering increasing the L1d cache load-use latency by 1 cycle (for example), if that took it off the critical path and let you raise the clock frequency. Or vice versa, tightening up the latency or reducing the number of pipeline stages on something at the cost of reducing frequency.
Cycles per instruction is a count of cycles; GHz doesn't matter as far as that average goes. But having said that, we can see from your numbers that one instruction takes more clocks on one machine, and the processors run at different speeds.
So while it takes more cycles to do the same job on the faster processor, the speed of the processor DOES compensate for that, so it seems clear this is a question about whether the processor speed makes up for the extra clock.
5.15 cycles/instruction / 2.5 gigacycles/second: the cycles cancel out and you get
2.06 seconds per giga-instruction, i.e. 2.06 nanoseconds per instruction.
5.30 / 2.7 = 1.96296 nanoseconds per instruction.
The faster one takes slightly less time per instruction, so it will run the program faster.
Another way to see this is to check the math.
Count the clocks for 100 instructions with that mix. On the slower machine that is 40 R-type at 5 clocks, 30 lw at 6, 15 sw at 6, and 15 beq at 3: 200 + 180 + 90 + 45 = 515 clocks. The same 15 beq instructions take 60 clocks instead of 45 on the faster machine, so 530 clocks total for the same instructions.
515 cycles at 2.5 GHz vs 530 cycles at 2.7 GHz
we want the amount of time
Hz is cycles/second and we want seconds on the top
so we want
cycles / (cycles/second) to have the cycles cancel out and leave seconds on the top
1/2.5 = 0.400 nanoseconds per cycle (400 picoseconds)
1/2.7 = 0.370 nanoseconds per cycle
0.400 * 515 = 206.0 ns
0.370 * 530 = 196.3 ns
So despite taking 15 more cycles per 100 instructions, the processor speed difference is enough to compensate.
2.7/2.5 = 1.08
530/515 = 1.029
so 2.5 * 1.029 = 2.573, meaning a processor at about 2.573 GHz or faster would run this program faster.
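The same check as a small Python sketch (counting clocks per 100 instructions of the given mix):

    # Clocks for 100 instructions of the given mix.
    clocks1 = 40*5 + 30*6 + 15*6 + 15*3   # 515 on machine 1
    clocks2 = 40*5 + 30*6 + 15*6 + 15*4   # 530 on machine 2

    t1 = clocks1 / 2.5e9        # seconds for 100 instructions at 2.5 GHz
    t2 = clocks2 / 2.7e9        # seconds at 2.7 GHz
    print(t1 * 1e9, t2 * 1e9)   # ~206.0 ns vs ~196.3 ns

    # Break-even frequency for machine 2:
    print(2.5e9 * clocks2 / clocks1)   # ~2.573 GHz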
Now, what were the rules for changing computers? Is less time defined as a reason to change? What is the definition of better? How much more power does the faster one consume? It might take less time, but power consumption might not be linear, so it may use more watts despite taking less time. I assume the question is not that detailed, meaning it is vague, meaning it is a poorly written question on its own; so it comes down to what the textbook or lecture defined as the threshold for changing to the other processor.
Disclaimer: don't blame me if you miss this question on your homework/test.
Outside an academic exercise like this, the real world is full of pipelined processors (not all processors, but the ones most folks write programs for), and you basically can't put a number on clock cycles per instruction type in a way that lets you do this calculation, because of a laundry list of factors. Make sure you understand that: it's a nice exercise, but that specific exercise is difficult and dangerous to attempt on real-world processors. Dangerous in that, as hard as you work, you may be measuring something incorrectly and jumping to the wrong conclusions, and as a result making bad recommendations. At the same time, there is very much the reality that a faster GHz improves some percentage of the execution while another percentage suffers, and the question is whether there is a net gain or loss. Or a new processor design, faster or slower, may have features that perform better than an older processor, but not all features will be better; there is a trade-off, and then we get into what "better" means.
I have been stuck on this for the past day. I'm not sure how to calculate the CPU utilization percentage for processes using the round-robin algorithm.
Let's say we have this data with a time quantum of 1: job letter followed by arrival time and burst time. How would I go about calculating the CPU utilization? I believe the formula is
total burst time / (total burst time + idle time). I know idle time means the time when the CPU is not busy, but I'm not sure how to actually calculate it for the processes. If anyone can walk me through it, it would be greatly appreciated.
Job  Arrival  Burst
A    2        6
B    3        1
C    5        9
D    6        7
E    7        10
Well, the formula is correct, but in order to know the total time you need to know the idle time of the CPU. And when does your CPU become idle? During a context switch it becomes idle, and how long that takes depends on how much time the short-term scheduler needs to assign the next process to the CPU.
With a time quantum of 10-100 milliseconds, the context-switch time of around 10 microseconds is a very small factor; now you can guess its impact with a time quantum of 1 millisecond. Each switch will still be negligible, but such a small quantum also results in too many context switches.
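Here is a rough round-robin simulation in Python that measures idle time directly. It assumes context-switch overhead is zero (your course may count it), so with these arrival times the only idle time is before the first job arrives at t = 2:

    from collections import deque

    # (job, arrival, burst) from the question; time quantum = 1
    jobs = [("A", 2, 6), ("B", 3, 1), ("C", 5, 9), ("D", 6, 7), ("E", 7, 10)]
    quantum = 1

    remaining = {j: b for j, _, b in jobs}
    arrivals = sorted(jobs, key=lambda x: x[1])
    ready, t, idle, i = deque(), 0, 0, 0

    while any(remaining.values()):
        # Admit every job that has arrived by time t.
        while i < len(arrivals) and arrivals[i][1] <= t:
            ready.append(arrivals[i][0]); i += 1
        if not ready:
            t += 1; idle += 1          # nothing to run: CPU sits idle
            continue
        job = ready.popleft()
        run = min(quantum, remaining[job])
        t += run
        remaining[job] -= run
        # Jobs arriving during this slice go ahead of the preempted job.
        while i < len(arrivals) and arrivals[i][1] <= t:
            ready.append(arrivals[i][0]); i += 1
        if remaining[job]:
            ready.append(job)

    busy = sum(b for _, _, b in jobs)   # total burst time = 33
    print(busy / (busy + idle))         # 33 / (33 + 2) ~= 0.943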
I have implemented the SHA3 algorithm in 2 ways: combinational and sequential.
The sequential design (the one with a clock), when synthesized, gives a design summary of:
minimum clock period 1.275 ns and maximum frequency 784.129 MHz.
While the combinational one, which is designed without a clock and has been placed between input and output registers, gives a synthesis report of:
minimum clock period 1701.691 ns and maximum frequency 0.588 MHz.
So I want to ask: is it correct that the combinational design has a lower frequency than the sequential one?
As far as theory is concerned, a combinational design should be faster than a sequential one. But the simulation result I am getting for the sequential design comes after 30 clock cycles, whereas for the combinational one there is no delay in the output, as there is no clock. In that way the combinational design is faster, since we get the output instantly, so why is the frequency of operation of the combinational one lower than that of the sequential one? Can anyone explain why this design is slow?
The design has been simulated in Xilinx ISE
Now I have applied pipelining to the combinational logic by inserting registers between the 5 main blocks which do the computation. These registers are controlled by the clock, so this pipelined design gives a design summary of:
clock period 1.575 ns and frequency 634.924 MHz
(minimum period 1.718 ns and frequency 581.937 MHz).
So this 1.575 ns is the delay between any 2 adjacent registers, not the propagation delay of the entire algorithm, so how can I calculate the propagation delay of the entire pipelined algorithm?
What you are seeing is pipelining and its performance benefits. The combinational circuit makes each input go through the propagation delays of the entire algorithm, which can take up to 1701.691 ns on the FPGA you are working with, because the slowest critical path in the combinational circuitry needed to calculate the result takes that long. Your simulator is not telling you everything: a behavioral simulation does not model gate propagation delays, so you just see the instant calculation of your combinational function in your simulation.
In the sequential design, you have multiple smaller steps, the slowest of which takes 1.275ns in the worst case. Each of those steps might be easier to place-and-route efficiently, meaning that you get overall better performance because of the improved routing of each step. However, you will need to wait 30 cycles for a result, simply because the steps are part of a synchronous pipeline. With the correct design, you could improve this and get one output per clock cycle, with a 30-cycle delay, by having a full pipeline and passing data through it at every clock cycle.
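As for the follow-up question about the propagation delay of the whole pipelined design: a rough model is the number of stages times the clock period. Here is a sketch in Python, assuming 5 register-to-register stages and the 1.718 ns minimum period from the synthesis report (both assumptions; your actual stage count may differ):

    # Rough latency/throughput model for a synchronous pipeline.
    stages = 5          # assumed: registers between the 5 main blocks
    period_ns = 1.718   # minimum period from the synthesis report

    latency_ns = stages * period_ns     # time for one input to reach the output
    throughput_mhz = 1000 / period_ns   # one result per clock once the pipe is full
    print(latency_ns, throughput_mhz)   # ~8.59 ns latency, ~582 MHz throughput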
I need to divide a 50 MHz clock down to 1.5 MHz with a 50% duty cycle using VHDL. For that I wanted to use a counter to count the number of 50 MHz clock pulses up to half of the 1.5 MHz clock period, i.e. 16.6666, which is not an integer, and I don't know how to implement that in the code. Can anyone please help me with it?
Thank you very much.
I'm not an expert in VHDL but I don't think you can get exactly to 1.5 MHz with synthesizable code. You could maybe get an average of 1.5 MHz if you vary how many clock cycles you wait on the 50 MHz clock until you raise the 1.5 MHz one.
If you only want code that can be simulated then of course you can just use arbitrary delays.
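To illustrate the averaging idea, here is a quick Python model of the counting scheme (not synthesizable VHDL, just arithmetic): alternating half-period lengths of 17, 17, and 16 input cycles makes three output half-periods span exactly 50 input cycles, which averages out to exactly 1.5 MHz:

    # Model of a fractional divider: 50 MHz in, ~1.5 MHz out.
    # Ideal half-period = 50 / 1.5 / 2 = 16.666... input cycles, so we
    # alternate counts of 17, 17, 16 (summing to 50 per 3 half-periods).
    f_in_mhz = 50.0
    pattern = [17, 17, 16]          # input cycles per output half-period

    total_in = sum(pattern) * 1000  # simulate 3000 half-periods
    half_periods = len(pattern) * 1000
    out_periods = half_periods / 2

    f_out_mhz = f_in_mhz * out_periods / total_in
    print(f_out_mhz)                # 1.5 exactly, on average

The individual output periods are 33 or 34 input cycles long, so the duty cycle wobbles slightly around 50%; only the long-run average hits the target.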
The approach you are suggesting will result in an approximation, which may of course be OK. If not, I would suggest you instantiate a PLL or clock manager specific to the target technology.