An engineer is designing a two-lane rural roadway
that has a traffic growth rate of 4%. The design
period is 10 years and the annual equivalent 80 kN
single axle load (ESALs) is 8,000 (mostly cars). It is
known that 6 years later, suburban sprawl will cause
the traffic growth rate to increase to 12%. What is
the total ESALs for the design period?
Answer with explanation.
Related
In a particular scenario I found that a code has taken 20 CPU Years and 4 real Months time. My goal is to approximate the amount of processing power utilized considering the fact that all the processors were on 100% usage all the time. So, my approach is as follows,
20 CPU Years = 20 * 365 * 24 CPU Hours = 175,200 CPU Hours.
Now, 1 CPU Year means 1 GFLOP machine working for 1 real Hour. Which means, in this case, the work done is, 1 GFLOP machine working for 175,200 real Hours. But in reality it took 4 * 30 * 24 = 2,880 real hours. So, approximately 175,200/2,880 =(approx.) 61 GLFOP machine.
My question is am I doing the approximation correctly or misunderstanding some particular term as per the calculations given above ? Or I am mixing GFLOPS and GFLOP together ?
Definitions
My question is am I doing the approximation correctly or misunderstanding some particular term as per the calculations given above ?
"100% usage" may mean the CPU spent 20% of its time doing nothing waiting for data to be transferred to/from RAM (and/or branch mispredictions or other stalls), 10% of its time running faster than normal because other CPUs where actually doing nothing, and 15% of its time running slower than normal for power/temperature management reasons; and (depending on where you got that "100% usage" statistic) "100% usage" may be significantly more confusing (e.g. http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html ).
Depending on context; GFLOPS is either "theoretical maximum under perfect conditions that will never occur in practice" (worthless marketing hype); or a direct measurement of a specific case that ignores most of the work a CPU did (everything involving integers, all control flow, all data transfer, all memory management, ...)
In a particular scenario I found that a code has taken 20 CPU Years and 4 real Months time. My goal is to approximate the amount of processing power utilized.
From this; you might (or might not) be able to say "most of the work that CPUs did was discarded due to lockless algorithm retries and/or transactions that couldn't be committed; and (partly because the bottleneck was RAM bandwidth and partly because of the way SMT works on this system) it would have been 4 times as fast if half as many CPUs were used."
TL;DR: Approximating processor power is just an inconvenient way to obfuscate the (more useful) information that you started with (e.g. that a specific piece of code running on a specific piece of hardware that was working on a specific piece of data happened to take 4 months of real time).
Your Calculation:
Yes; you're mixing GFLOP and GFLOPS (e.g. GFLOPS = GFLOP per second; and a "1 GFLOP machine" is a computer that can do a billion floating point operations in an infinite amount of time, which is every computer), and the web page you linked to is making the same mistake (e.g. saying "a 1 GFLOP reference machine" when it should be saying "a 1 GFLOPS reference machine").
Note that there's no need to care about GFLOPS or GFLOP for the calculation you're doing: If something was supposed to take 20 "reference CPU years" and actually took 4 months (or 4/12 years); then you'd say that your hardware is equivalent to "20 / (4/12) = 60 reference CPUs". Of course this is horribly silly and it'd make more sense to say that your hardware happened to achieve 60 GFLOPS without bothering with the misleading "reference CPU" nonsense.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
Improve this question
The goal is to determine metrics of an UDP protocol performance, specifically:
Minimal possible Theoretical RTT (round-trip time, ping)
Maximal possible Theoretical PPS of 1-byte-sized UDP Packets
Maximal possible Theoretical PPS of 64-byte-sized UDP Packets
Maximal and minimal possible theoretical jitter
This could and should be done without taking in account any slow software-caused issues(like 99% cpu usage by side process, inefficiently-written test program), or hardware (like busy channel, extremely long line, so on)
How should I go with estimating these best-possible parameters on a "real system"?
PS. I would offer a prototype, of what I call "a real system".
Consider 2 PCs, PC1 and PC2. They both are equipped with:
modern fast processors(read "some average typical socket-1151 i7 CPU"), so processing speed and single-coreness are not an issues.
some typical DDR4 #2400mhz..
average NICs (read typical Realteks/Intels/Atheroses, typically embedded in mobos), so there is no very special complicated circuitry.
a couple meters of ethernet 8 pair cable that connects their NICs, having established GBIT connection. So no internet, no traffic between them, other that generated by you.
no monitors
no any other I/O devices
single USB flash per PC, that booted their initramfs to the RAM, and used to mount and store program output after test program finishes
lightest possible software stack - There is probably busy box, running on top of latest Linux kernel, all libs are up-to-date. So virtually no software(read "busyware") runs on them.
And you run a server test program on PC1, and a client - on PC2. After program runs, USB stick is mounted and results are dumped to file, and system powers down then. So, I've described some ideal situation. I can't imagine more "sterile" conditions for such an experiment..
For the PPS calculations take the total size of the frames and divide it into the Throughput of the medium.
For IPv4:
Ethernet Preamble and start of frame and the interframe gap 7 + 1 + 12 = 20 bytes.(not counted in the 64 byte minimum frame size)
Ethernet II Header and FCS(CRC) 14 + 4 = 18 bytes.
IP Header 20 bytes.
UDP Header 8 bytes.
Total overhead 46 bytes(padded to min 64 if payload is less than ) + 20 bytes "more on the wire"
Payload(Data)
1 byte payload - becomes 18 based on 64 byte minimum + wire overhead. Totaling 84 bytes on the wire.
64 byte - 48 + 64 = 112 + 20 for the wire overhead = 132 bytes.
If the throughput of the medium is 125000000 bytes per second(1 Gb/s).
1-18 bytes of payload = 1.25e8 / 84 = max theoretical 1,488,095 PPS.
64 bytes payload = 1.25e8 / 132 = max theoretical 946,969 PPS.
These calculations assume a constant stream: The network send buffers are filled constantly. This is not an issue given your modern hardware description. If this were 40/100 Gig Ethernet CPU, bus speeds and memory would all be factors.
Ping RTT time:
To calculate the time it takes to transfer data through a medium divide the data transferred by the speed of the medium.
This is harder since the ping data payload could be any size 64 - MTU(~1500 bytes). ping typically uses the min frame size (64 bytes total frame size + 20 bytes wire overhead * 2 = 168 bytes) Network time(0.001344 ms) + Process response and reply time combined estimated between 0.35 and 0.9 ms. This value depends on too many internal CPU and OS factors, L1-3 caching, branch predictions, ring transitions (0 to 3 and 3 to 0) required, TCP/IP stack implemented, CRC calculations, interrupts processed, network card drivers, DMA, validation of data(skipped by most implementations)...
Max time should be < 1.25 ms based on anecdotal evidence.(My best eval was 0.6ms on older hardware(I would expect a consistent average of 0.7 ms or less on the hardware as described)).
Jitter:
The only inherent theoretical reason for network jitter is the asynchronous nature of transport which is resolved by the preamble. Max < (8 bytes)0.000512 ms. If sync is not established in this time the entire frame is lost. This is possibility that needs to be taken into account. Since UDP is best effort delivery.
As evidenced by the description of RTT: The possible variances in the CPU time in executing of identical code, as well as OS scheduling, and drivers makes this impossible to evaluate effectively.
If I had to estimate, I would design for a maximum of 1 ms jitter, with provisions for lost packets. It would be unwise to design a system intolerant of faults. Even for a "Perfect Scenario" as described faults will occur (a nearby lightening strike induces spurious voltages on the wire). UDP has no inherent method for tolerating lost packets.
From what I understand, to calculate CPI, it's the percentage of the type of instruction multiplied by the number of cycles right? Does the type of machine have any part of this calculation whatsoever?
I have a problem that asks me if a change should be recommended.
Machine 1: 40% R - 5 Cycles, 30% lw - 6 Cycles, 15% sw - 6 Cycles, 15% beq 3 - Cycles, on a 2.5 GHz machine
Machine 2: 40% R - 5 Cycles, 30% lw - 6 Cycles, 15% sw - 6 Cycles, 15% beq 4 - Cycles, on a 2.7 GHz machine
By my calculations, machine 1 has 5.15 CPI while machine 2 has 5.3 CPI. Is it okay to ignore the GHz of the machine and say that the change would not be a good idea or do I have to factor the machine in?
I think the point is to evaluate a design change that makes an instruction take more clocks, but allows you to raise the clock frequency. (i.e. leaning towards a speed-demon design like Pentium 4, instead of brainiac like Apple's A7/A8 ARM cores. http://www.lighterra.com/papers/modernmicroprocessors/)
So you need to calculate instructions per second to see which one will get more work done in the same amount of real time. i.e. (clock/sec) / (clocks/insn) = insn/sec, cancelling out the clocks from the units.
Your CPI calculation looks ok; I didn't check it, but yes a weighted average of the cycles according to the instruction mix.
These numbers are obviously super simplified; any CPU worth building at 2.5GHz would have some kind of branch prediction so the cost of a branch isn't just a 3 or 4 instruction bubble. And taking ~5 cycles per instruction on average is pathetic. (Most pipelined designs aim for at least 1 instruction per clock.)
Caches and superscalar CPUs also lead to complex interactions between instructions depending on whether they depend on earlier results or not.
But this is sort of like what you might do if considering increasing the L1d cache load-use latency by 1 cycle (for example), if that took it off the critical path and let you raise the clock frequency. Or vice versa, tightening up the latency or reducing the number of pipeline stages on something at the cost of reducing frequency.
Cycles per instruction a count of cycles. ghz doesnt matter as far as that average goes. But saying that we can see from your numbers that one instruction is more clocks but the processors are a different speed.
So while it takes more cycles to do the same job on the faster processor the speed of the processor DOES compensate for that so it seems clear this is a question about does the processor speed account for the extra clock?
5.15 cycles/instruction / 2.5 (giga) cycles/second, cycles cancels out you get
2.06 seconds/(giga) instruction or (nano) seconds/ instruction
5.30 / 2.7 = 1.96296 (nano) seconds / instruction
The faster one takes a slightly less amount of time so it will run the program faster.
Another way to see this to check the math.
For 100 clock cycles on the slower machine 15% of those are beq. So 15 of the 100 clocks, which is 5 beq instructions. The same 5 beq instructions take 20 clocks on the faster machine so 105 clocks total for the same instructions on the faster machine.
100 cycles at 2.5ghz vs 105 at 2.7ghz
we want the amount of time
hz is cycles / second we want seconds on the top
so we want
cycles / (cycles/second) to have cycles cancel out and have seconds on the top
1/2.5 = 0.400 (400 picoseconds)
1/2.7 = 0.370
0.400 * 100 = 40.00 units of time
0.370 * 105 = 38.85 units of time
So despite taking 5 more cycles the processor speed differences is fast enough to compensate.
2.7/2.5 = 1.08
105/100 = 1.05
so 2.5 * 1.05 = 2.625 so a processor 2.625ghz or faster would run that program faster.
Now what were the rules for changing computers, is less time defined as a reason to change computers? What is the definition of better? How much more power does the faster one consume it might take less time but the power consumption might not be linear so it may take more watts despite taking less time. I assume the question is not that detailed, meaning it is vague meaning it is a poorly written question on its own, so it goes to what the textbook or lecture defined as the threshold for change to the other processor.
Disclaimer, dont blame me if you miss this question on your homework/test.
Outside an academic exercise like this, the real world is full of pipelined processors (not all but most of the folks writing programs are writing programs for) and basically you cant put a number on clock cycles per instruction type in a way that you can do this calculation because of a laundry list of factors. Make sore you understand that, nice exercise, but that specific exercise is difficult and dangerous to attempt on real world processors. Dangerous in that as hard as you work you may be incorrectly measuring something and jumping to the wrong conclusions and as a result making bad recommendations. At the same time there is very much the reality that faster ghz does improve some percentage of the execution, but another percentage suffers, and is there a net gain or loss. Or a new processor design faster or slower may have features that perform better than an older processor, but not all feature will be better, there is a tradeoff and then we get into what "better" means.
I have been stuck on this for the past day. Im not sure how to calculate cpu utilization percentage for processes using round robin algorithm.
Let say we have these datas with time quantum of 1. Job Letter followed by arrival and burst time. How would i go about calculating the cpu utilization? I believe the formula is
total burst time / (total burst time + idle time). I know idle time means when the cpu are not busy but not sure how to really calculate it the processes. If anyone can walk me through it, it is greatly appreciated
A 2 6
B 3 1
C 5 9
D 6 7
E 7 10
Well,The formula is correct but in order to know the total-time you need to know the idle-time of CPU and you know when your CPU becomes idle? During the context-swtich it becomes idlt and it depends on short-term-scheduler how much time it take to assign the next proccess to CPU.
In 10-100 milliseconds of time quantua , context swtich time is arround 10 microseconds which is very small factor , now you can guess the context-switch time with time quantum of 1 millisecond. It will be ignoreable but it also results in too many context-switches.
My beacons have advertisement interval of 330ms. I use an iOS device to scan the advertisement packet whose scanning rate is 1 scan per second on average. I want to use the moving average filter to smooth the fluctuating RSSI values. Considering the walking speed of 1.2 m/s and the advertisement interval of 330 ms, what should be the size of a window in the moving average filter? Is there any mathematical relationship between them?
Thank you.
There is no one correct answer here. It is a trade-off between noise in the distance estimate and lag time.
The large (and longer) your statistical sample, the more lag time there will be in a running average. A 20 second window will tell you where you were on average over the last 20 seconds, and filter out a lot of noise. A 5 second running average will tell you where you were on average over the last 5 seconds, but with much more noise on the calculation.
How much lag you can tolerate and how much noise you can tolerate all depend on your use case. Use cases that are very time sensitive may sacrifice accuracy for the sake of less lag. Conversely use cases needing greater accuracy may accept more lag to filter out more noise on the estimate.