Why sometimes the mapreduce Average Reduce Time is a negative number? - hadoop

I ran a MapReduce job on a Hadoop cluster. The job's running times, as shown in the browser at master:8088 and at master:19888 (the job history server web UI), are below:
[screenshot of the master:8088 web UI]
[screenshot of the master:19888 web UI]
I have two questions:
Why are the elapsed times in the two pictures different?
Why sometimes the Average Reduce Time is a negative number?

It looks like the Average Reduce Time is based on the times the previous tasks (shuffle/merge) took to finish and not necessarily the amount of time the reduce actually took to run.
Looking at this source code you can see the relevant calculations occurring around line 300.
if (attempt.getState() == TaskAttemptState.SUCCEEDED) {
  numReduces++;
  avgShuffleTime += (attempt.getShuffleFinishTime() - attempt.getLaunchTime());
  avgMergeTime += attempt.getSortFinishTime() - attempt.getShuffleFinishTime();
  avgReduceTime += (attempt.getFinishTime() - attempt.getSortFinishTime());
}
Followed by:
if (numReduces > 0) {
  avgReduceTime = avgReduceTime / numReduces;
  avgShuffleTime = avgShuffleTime / numReduces;
  avgMergeTime = avgMergeTime / numReduces;
}
Looking at your numbers, they seem to be generally in line with this approach to calculating the run times (everything converted to seconds):
Total pre-reduce time = Map Run Time + Avg Shuffle + Avg Merge
143 = 43 + 83 + 17
Avg Reduce Time = Elapsed Time - Total pre-reduce time
-10 = 133 - 143
So, comparing how long the Map, Shuffle, and Merge took with the Elapsed time, we end up with a negative number close to your -8.

This is a partial answer, only for question 1!
I see a difference of 8 seconds between "Submitted" and "Started" in the second picture, while the "Started" time in the first picture is equal to the "Submitted" time in the second. I guess this accounts for the 8-second difference that you see in the "Elapsed" times.
I am very curious about the second question as well, but it may not be a coincidence that it is also 8 seconds.

Related

Interview q: Data structure and algorithm for O(1) retrieval of avg. response time in client server architecture

Interview question:
In a client-server architecture, there are multiple requests from multiple clients to the server. The server should maintain the response times of all the requests from the previous hour. What data structure and algorithm should be used for this? Also, the average response time needs to be maintained and has to be retrievable in O(1).
My take:
Algorithm: maintain a running mean:
mean = (mean_prev * n + current_response_time) / (n + 1)
Data structure: a set (using an order statistic tree).
My question is whether there is a better answer. I felt that my answer was very trivial, while the answers to the interview questions before and after this one were non-trivial.
EDIT:
Based on what amit suggested:
cleanup():
    while (curr_time - queue.front().timestamp > 1 hour):
        (timestamp, val) = queue.pop()
        sum = sum - val
        n = n - 1

insert(timestamp, val):
    queue.push(timestamp, val)
    sum = sum + val
    n = n + 1
    cleanup()

query_average():
    cleanup()
    return sum / n
And if we can ensure that cleanup() is triggered once every hour or half hour, then query_average() will not take very long. But if someone were to implement a timer-triggered call to such a function, how would they do it?
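One hedged possibility for the timer part (my own sketch, not from the original question or answer) is to schedule the cleanup with Java's ScheduledExecutorService:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicCleanup {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Stand-in for the cleanup() routine from the pseudocode above.
        Runnable cleanup = () -> System.out.println("cleanup() would run here");
        // First run after 30 minutes, then every 30 minutes.
        scheduler.scheduleAtFixedRate(cleanup, 30, 30, TimeUnit.MINUTES);
    }
}

That said, the lazy cleanup inside insert() and query_average() already keeps each operation O(1) amortized, so the timer is optional.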
The problem with your solution is that it maintains the average over everything since the beginning of time, and not over the last hour, as you are supposed to.
To do so, you need to maintain 2 variables and a queue of entries (timestamp,value).
The two variables will be n (the number of elements that are relevant to the last hour) and sum (the sum of the elements from the last hour).
When a new element arrives:
    queue.add(timestamp, value)
    sum = sum + value
    n = n + 1
When you have a query for the average:
    while (queue.front().timestamp < currentTimestamp() - 1 hour):
        (timestamp, value) = queue.pop()
        sum = sum - value
        n = n - 1
    return sum / n
Note that the above is still O(1) on average, because each element inserted into the queue is deleted at most once. You might add the above loop to the insertion procedure as well.
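A minimal Java sketch of this scheme (the class, method, and field names are my own; it assumes record() is called with non-decreasing timestamps):

import java.util.ArrayDeque;

public class SlidingWindowAverage {
    private static final long WINDOW_MILLIS = 60L * 60 * 1000; // one hour

    private final ArrayDeque<long[]> queue = new ArrayDeque<>(); // {timestampMillis, value}
    private long sum = 0;
    private int n = 0;

    public void record(long timestampMillis, long value) {
        queue.addLast(new long[] { timestampMillis, value });
        sum += value;
        n++;
        cleanup(timestampMillis);
    }

    public double average(long nowMillis) {
        cleanup(nowMillis);
        return n == 0 ? 0.0 : (double) sum / n;
    }

    // Drop entries older than one hour. Each entry is removed at most once,
    // so the cost is O(1) amortized per operation.
    private void cleanup(long nowMillis) {
        while (!queue.isEmpty() && queue.peekFirst()[0] < nowMillis - WINDOW_MILLIS) {
            long[] old = queue.pollFirst();
            sum -= old[1];
            n--;
        }
    }

    public static void main(String[] args) {
        SlidingWindowAverage stats = new SlidingWindowAverage();
        long now = System.currentTimeMillis();
        stats.record(now - 90 * 60 * 1000, 999); // 90 minutes ago, will be dropped
        stats.record(now - 30 * 60 * 1000, 200); // 30 minutes ago
        stats.record(now, 100);
        System.out.println(stats.average(now)); // 150.0: only the last hour counts
    }
}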

2 billion increments in 2.3 Ghz cpu

The output of this simple code snippet is in the range of 11-13 milliseconds. Now, assuming for the sake of the question that an increment of x is just a single instruction, the 2.3 GHz CPU of mine should take roughly a second to execute it, since the value of Integer.MAX_VALUE is close to 2 billion. Why is the answer on the order of a few milliseconds (11-13 ms) rather than on the order of a second (900-1100 ms)?
long time1 = System.currentTimeMillis();
int x = 0;
while (x < Integer.MAX_VALUE) {
    x++;
}
System.out.println(System.currentTimeMillis() - time1);
In theory, this can be optimized down to x = Integer.MAX_VALUE, and then, if x is not used afterwards, removed completely.
It is getting harder and harder to measure how long things take, because they get optimized out, especially if the result is unused.
Try setting x to different start values and using x after the timer, in a print or in some other calculation that leads to a print.
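Following that suggestion, a hedged variant of the snippet (my own, not from the question) takes the start value from the command line and prints x afterwards, so the loop has an observable result the JIT cannot simply discard:

public class IncrementTiming {
    public static void main(String[] args) {
        // A non-constant start value, taken from outside the program.
        int x = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        long time1 = System.currentTimeMillis();
        while (x < Integer.MAX_VALUE) {
            x++;
        }
        long elapsed = System.currentTimeMillis() - time1;
        // Using x after the timer keeps the loop's result live.
        System.out.println("x = " + x + ", elapsed = " + elapsed + " ms");
    }
}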

Compute arrival date given departure date and duration

I am trying to write a program that calculates when a train arrives at b after leaving a.
I have the time the train leaves and the time it takes to travel the distance from a to b.
I need help with an algorithm that handles the case where the arrival time passes midnight (goes past 24:00).
For example, with these times: the train leaves at 20:55 and the ride takes 11:40.
The result should be 08:35, but how could I get it?
program troleibusai;
var
  xxx: integer;
  f, g: text;
  a: real;
  left_hour, left_minute, ride_hour, ride_minute: integer;
Begin
  Assign(F, 'train_times');
  Reset(F);
  Assign(G, 'results.txt');
  Rewrite(G);
  Read(F, left_hour);
  Read(F, left_minute);
  Read(F, ride_hour);
  Read(F, ride_minute);
Here's the code.
Have a look at the between functions in unit dateutils,
e.g. http://www.freepascal.org/docs-html/rtl/dateutils/minutesbetween.html
Alternatively, calculate left_hour * 60 + left_minute + ride_hour * 60 + ride_minute, take the total mod (24 * 60) so it wraps around midnight, and then take div 60 for the hour and mod 60 for the minutes.
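For the numbers in the question, a minimal sketch of that arithmetic (written in Java here rather than Pascal, purely to show the wrap-around):

public class ArrivalTime {
    public static void main(String[] args) {
        int leftHour = 20, leftMinute = 55; // train leaves at 20:55
        int rideHour = 11, rideMinute = 40; // the ride takes 11:40
        // Total minutes, wrapped around midnight: (1255 + 700) mod 1440 = 515
        int total = (leftHour * 60 + leftMinute + rideHour * 60 + rideMinute) % (24 * 60);
        System.out.printf("%02d:%02d%n", total / 60, total % 60); // prints 08:35
    }
}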

Scala Futures are slow with many cores

For a study project I have written a Scala application that uses a bunch of futures to do a parallel computation. I noticed that on my local machine (4 cores) the code runs faster than on the many-core server of our computer science institute (64 cores). Now I want to know why this is.
Task in Detail
The task was to create random boolean k-CNF formulas with n different variables randomly distributed over m clauses, and then see at which m/n combination the probability that a formula is solvable drops below 50% for different random distributions. For this I have implemented a probabilistic k-SAT algorithm, a clause generator, and some other code. The core is a function that takes n and m as well as the generator function, runs 100 futures, and waits for the result. The function looks like this:
Code in question
def avgNonvalidClauses(n: Int, m: Int)(implicit clauseGenerator: ClauseGenerator) = {
  val startTime = System.nanoTime

  /** how many iterations to build the average **/
  val TRIES = 100

  // do TRIES iterations in parallel
  val tasks = for (i <- 0 until TRIES) yield future[Option[Config]] {
    val clause = clauseGenerator(m, n)
    val solution = CNFSolver.probKSat(clause)
    solution
  }

  /* wait for all threads to finish and collect the results. We will only wait
   * at most TRIES * 100 ms (note: flatten filters out all None's) */
  val results = awaitAll(100 * TRIES, tasks: _*).asInstanceOf[List[Option[Option[Config]]]].flatten

  val millis = Duration(System.nanoTime - startTime, NANOSECONDS).toMillis
  val avg = (results count (_.isDefined)) / results.length.toFloat
  println(s"n=$n, m=$m => $avg ($millis ms)")
  avg
}
Problem
On my local machine I get these results
[info] Running Main
n=20, m=120 => 0.0 (8885 ms)
n=21, m=121 => 0.0 (9115 ms)
n=22, m=122 => 0.0 (8724 ms)
n=23, m=123 => 0.0 (8433 ms)
n=24, m=124 => 0.0 (8544 ms)
n=25, m=125 => 0.0 (8858 ms)
[success] Total time: 53 s, completed Jan 9, 2013 8:21:30 PM
On the 64-core server I get:
[info] Running Main
n=20, m=120 => 0.0 (43200 ms)
n=21, m=121 => 0.0 (38826 ms)
n=22, m=122 => 0.0 (38728 ms)
n=23, m=123 => 0.0 (32737 ms)
n=24, m=124 => 0.0 (41196 ms)
n=25, m=125 => 0.0 (42323 ms)
[success] Total time: 245 s, completed 09.01.2013 20:28:22
However, I see full load on both machines (the server averages a load of around 60 to 65), so enough threads are running. Why is this? Am I doing something completely wrong?
My local machine has an "AMD Phenom(tm) II X4 955 Processor" CPU; the server uses an "AMD Opteron(TM) Processor 6272". The local CPU has 6800 bogomips, the server's 4200. So, while the local CPU is about 1/3 faster, there are 12 times more cores on the server.
Additional
I have a trimmed-down example of my code pushed to GitHub so you can try it yourself if you are interested: https://github.com/Blattlaus/algodemo (it's an sbt project using Scala 2.10).
Updates
I've eliminated any randomness by seeding the random number generators with 42. This changes nothing.
I've changed the test set. Now the results are even more astonishing (the server is 5 times slower!). Note: all outputs for the average percentage of non-solvable clauses are zero because of the input. This is normal and expected.
Added info about the CPUs.
I've noticed that calls to Random.nextInt() are a factor of 10 slower on the server. I have wrapped all calls in a helper that measures the runtime and prints it to the console if it is slower than 10 ms. On my local machine I get a few, and they are typically around 10-20 ms. On the server I get many more, and they tend to be above 100 ms. Could this be the issue?
You have already figured out the answer: the problem is Random.nextInt(), which uses an AtomicLong for its seed. If this is accessed frequently from different threads then you will get cache thrashing, which will be worse on your 64-core computer because the caches are further apart (electrically) and hence it takes longer to acquire the necessary cache-line locks.
See this Stack Overflow answer for more details, and for the solution to avoid the problem (which is basically to use a thread-local random number generator): Contention in concurrent use of java.util.Random
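A minimal Java sketch of that fix (the class and loop are my own illustration, not the poster's Scala code): each worker thread draws from ThreadLocalRandom instead of a shared java.util.Random, so no AtomicLong seed is contended across cores.

import java.util.concurrent.ThreadLocalRandom;

public class PerThreadRandomDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                // ThreadLocalRandom.current() returns a generator owned by this thread.
                sum += ThreadLocalRandom.current().nextInt(2);
            }
            System.out.println(Thread.currentThread().getName() + ": " + sum);
        };
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(work);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}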
Operations on denormalized floating-point numbers can take an order of magnitude longer on the x86 architecture. See:
Why does changing 0.1f to 0 slow down performance by 10x?
I haven't examined your code, but given that it returns NaN, that might be the case. Try removing the randomness from your test to verify that hypothesis.

How to find maximum task running at x time?

The problem description is as follows:
There are n events for a particular day d, each having a start time and a duration. Example:
e1 10:15:06 11ms (ms = milliseconds)
e2 10:16:07 12ms
......
I need to find the time x and the count of events running then, where x is the time at which the maximum number of events were executing.
The first solution I am thinking of is scanning every millisecond of day d, but that requires 86,400,000 * n calculations in total. Example:
Check at 00:00:00.001 how many events are running
Check at 00:00:00.002 how many events are running
...
Then take the max over the whole range (00:00:00.000 to 23:59:59.999).
The second solution I am thinking of is:
for each event_i in all events:
    running_events = 1
    for each event_j in all events where event_j != event_i:
        if event_j.start_time in range (event_i.start_time, event_i.start_time + event_i.execution_time):
            running_events++
Then take the max of running_events.
Is there any better solution for this?
This can be solved in O(n log n) time (a Java sketch of the sweep follows this outline):
Make an array of all event boundaries (each event contributes a start and an end). This array is already partially sorted: O(n)
Sort the array: O(n log n); your library should be able to make use of the partial sortedness (timsort does that very well); look into distribution-based sorting algorithms for better expected running time.
    Sort event boundaries ascending w.r.t. the boundary time
    Sort event ends before event starts if touching intervals are considered non-overlapping
    (Sort event ends after event starts if touching intervals are considered overlapping)
Initialise running = 0, running_best = 0, best_at = 0
For each event boundary:
    If it is the start of an event, increment running
    If running > running_best, set running_best = running and best_at = the current boundary time
    If it is the end of an event, decrement running
Output best_at
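Here is the sketch of that sweep in Java (names and sample data are mine, with event times in milliseconds; ends sort before starts at equal times, i.e. touching intervals are treated as non-overlapping):

import java.util.Arrays;

public class MaxConcurrentEvents {
    public static void main(String[] args) {
        // (startMillis, durationMillis) pairs for each event
        long[][] events = { {500, 1000}, {0, 1200}, {1000, 2000} };

        // Each event contributes a (+1) boundary at its start and a (-1) boundary at its end.
        long[][] boundaries = new long[events.length * 2][];
        for (int i = 0; i < events.length; i++) {
            boundaries[2 * i]     = new long[] { events[i][0], +1 };
            boundaries[2 * i + 1] = new long[] { events[i][0] + events[i][1], -1 };
        }
        // Sort by time; at equal times the -1 (end) entries come first.
        Arrays.sort(boundaries, (a, b) -> a[0] != b[0]
                ? Long.compare(a[0], b[0])
                : Long.compare(a[1], b[1]));

        int running = 0, runningBest = 0;
        long bestAt = 0;
        for (long[] boundary : boundaries) {
            running += (int) boundary[1];
            if (running > runningBest) {
                runningBest = running;
                bestAt = boundary[0];
            }
        }
        System.out.println("max " + runningBest + " events running at t=" + bestAt + " ms");
    }
}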
You could reduce the number of points you check by checking only the endpoints of the intervals: for each interval (task) I that lasts from t1 to t2, you only need to check how many tasks are running at t1 and at t2 (assuming the task runs from t1 to t2 inclusive; if it is exclusive, check t1-EPSILON, t1+EPSILON, t2-EPSILON and t2+EPSILON instead).
It is easy to see (convince yourself why) that checking any other points cannot give you a better answer than these candidates cover.
Example:
tasks run in `[0.5,1.5],[0,1.2],[1,3]`
candidates: 0,0.5,1,1.2,1.5,3
0 -> 1 task
0.5 -> 2 tasks
1 -> 3 tasks
1.2 -> 3 tasks (assuming inclusive, end of interval)
1.5 -> 2 tasks (assuming inclusive, end of interval)
3 -> 1 task (assuming inclusive, end of interval)
