I'm really confused about this: how do you compute an algorithm's processing time from its complexity?
the question is:
Let the algorithms A of complexity O(n^1.5) and B of complexity O(n log n) process a list of 100 records in TA(100) = 1 and TB(100) = 20 microseconds, respectively. Find their processing times, TA(n) and TB(n), for n records and decide which of them will process a list of n = 100,000,000 records faster.
Anyone keen to help??
First we can deal with A. Its running time grows as n^1.5 (between linear and quadratic), and we must find the constant C from the initial conditions we are given: 1 microsecond (1e-6 seconds) at 100 records.
Therefore
1 µs = C * 100^1.5
1 µs = C * 1000
C = 1e-3 µs
We can then substitute in n = 100,000,000 records and the value we found for C.
This results in a time of 1e-3 * (1e8)^1.5 = 1,000,000,000 µs, or 1000 seconds.
Try the same process with the second algorithm to tell which one processes the records faster.
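As a sanity check, the arithmetic for both algorithms can be sketched in a few lines. (The choice of log base for B is mine; it cancels out when the constant is fitted and the formula is evaluated with the same base, so the result does not depend on it.)

```python
import math

# Fit the constants from the given data points (times in microseconds).
C_A = 1 / 100 ** 1.5                # T_A(100) = 1 µs   =>  C_A = 1e-3
C_B = 20 / (100 * math.log2(100))   # T_B(100) = 20 µs

def T_A(n):
    return C_A * n ** 1.5

def T_B(n):
    return C_B * n * math.log2(n)

n = 100_000_000
print(T_A(n))   # 1e9 µs = 1000 s
print(T_B(n))   # 8e7 µs = 80 s  -> B is faster at this size
```

Note that TB(1e8) = 20 µs * (1e8/100) * (log(1e8)/log(100)) = 20 * 1e6 * 4 = 8e7 µs regardless of the log base, which is why the base doesn't matter.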
I am trying to understand how to compute the constant, c, when given the data. Before showing the data, I will inform you that I have already graphed the data with a linear trend on Excel. I am still quite baffled as to what I should use to calculate c.
Key question: how do you find a constant c that makes the O(g(n)) bound hold, i.e. T(n) ≤ c·g(n)?
Presumably you do not need to find T(n) exactly; the graphs you create should be sufficient.
Data for HeapSort (n elements, time in seconds):

    n           t
    1           0
    5           0
    10          0
    50          0
    100         0
    500         0
    1000        0
    5000        0
    10,000      0.01
    50,000      0.04
    100,000     0.1
    500,000     0.484
    1,000,000   1.346
    5,000,000   6.596667
    10,000,000  14.854
Generally, this sort of problem is solved by fitting the data to an expected function (such as t = cn + b, or t = cn log n + b) using a least-squares method. Assuming that the "c" you are requesting is the constant factor in front of the main term of your runtime, you will get c with that method.
The value of c will of course be dependent on the particular code that is running and the particular machine on which it is running.
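For the one-parameter case t ≈ c·n·log2(n) (no intercept), the least-squares fit has a closed form; a minimal sketch using the non-zero HeapSort timings above (variable names are my own):

```python
import math

# One-parameter least squares: minimizing sum((t - c*x)^2) over c
# gives c = sum(t*x) / sum(x*x), where x = n*log2(n).
data = [(10_000, 0.01), (50_000, 0.04), (100_000, 0.1),
        (500_000, 0.484), (1_000_000, 1.346),
        (5_000_000, 6.596667), (10_000_000, 14.854)]

xs = [n * math.log2(n) for n, _ in data]
ts = [t for _, t in data]
c = sum(t * x for t, x in zip(ts, xs)) / sum(x * x for x in xs)
print(c)   # on this data, roughly 6e-8 seconds per n*log2(n) unit
```

The largest inputs dominate the fit, which is usually what you want, since the small-n timings are below the clock resolution anyway.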
I have to write a small program to implement the following algorithm:
Assume you have a search algorithm which, at each level of recursion, excludes half of the data from consideration when searching for a specific data item. Search stops only when one data item is left. How many levels of recursion are required when the number of elements in the data is 1024?
Does anybody have an idea about how to analyze this, or any suggestion on how to start?
You need to find the minimal value of d such that:
1 * 2 * 2 * 2 * .... * 2 = 1024
    \__________________/
       d factors of 2
The above is true because each multiplication by 2 is one level up in the recursion: you go up from the stop condition of 1 element until you reach the initial data size, which is 1024.
The above equation is actually 2^d = 1024
And it is solved easily with extracting log_2 from both sides:
log_2(2^d) = log_2(1024)
d = 10
P.S. Note that the above is the number of recursive calls, exclusive of the initial call, so the total number of calls to the method is d+1 = 11: one from the calling environment, and 10 from the method itself.
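The halving argument is easy to verify directly; a small sketch (assuming the data size is a power of two, as in the question):

```python
def levels_of_recursion(n):
    # Count how many times n can be halved until one element remains.
    d = 0
    while n > 1:
        n //= 2
        d += 1
    return d

print(levels_of_recursion(1024))  # 10, i.e. log2(1024)
```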
Interview question:
In a client-server architecture, there are multiple requests from multiple clients to the server. The server should maintain the response times of all the requests in the previous hour. What data structure and algo will be used for this? Also, the average response time needs to be maintained and has to be retrieved in O(1).
My take:
algo: maintain a running mean
mean = (mean_prev * n + current_response_time) / (n + 1)
DS: a set (using order statistic tree).
My question is whether there is a better answer. I felt that my answer was very trivial, whereas the answers to the interview questions before and after this one were non-trivial.
EDIT:
Based on what amit suggested:
cleanup()
    while (curr_time - queue.front().timestamp > 1hr)
        (timestamp, val) = queue.pop()
        sum = sum - val
        n = n - 1

insert(timestamp, val)
    queue.push(timestamp, val)
    sum = sum + val
    n = n + 1
    cleanup()

query_average()
    cleanup()
    return sum / n
And if we can ensure that cleanup() is triggered once every hour or half hour, then query_average() will not take very long. But if someone were to implement a timer trigger for a function call, how would they do it?
The problem with your solution is that it takes the total average since the beginning of time, not over the last hour as you are supposed to.
To do so, you need to maintain 2 variables and a queue of entries (timestamp,value).
The 2 variables will be n (the number of elements that are relevant to the last hour) and sum (the sum of the elements from the last hour).
When a new element arrives:
queue.add(timestamp,value)
sum = sum + value
n = n+1
When you have a query for average:
while (queue.front().timestamp < currentTimestamp() - 1 hour):
    (timestamp, value) = queue.pop()
    sum = sum - value
    n = n - 1
return sum/n
Note that the above is still O(1) on average, because every element is inserted into the queue exactly once and deleted at most once, so each insertion pays for at most one deletion. You might add the above loop to the insertion procedure as well.
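Putting the pieces together, a minimal sketch of the queue-plus-running-sum idea (class and method names are my own; timestamps are passed explicitly so the eviction logic is easy to exercise):

```python
import collections

class SlidingAverage:
    """Average of values seen within the last `window` seconds:
    a queue of (timestamp, value) pairs plus a running sum and count."""

    def __init__(self, window=3600.0):
        self.window = window
        self.q = collections.deque()   # (timestamp, value) pairs, oldest first
        self.total = 0.0
        self.n = 0

    def _evict(self, now):
        # Drop entries older than the window. Each entry is popped at most
        # once, so the amortized cost per insertion is O(1).
        while self.q and self.q[0][0] < now - self.window:
            _, v = self.q.popleft()
            self.total -= v
            self.n -= 1

    def insert(self, value, now):
        self._evict(now)
        self.q.append((now, value))
        self.total += value
        self.n += 1

    def average(self, now):
        self._evict(now)
        return self.total / self.n if self.n else 0.0
```

A usage example: after `insert(10, now=0)` and `insert(20, now=100)`, `average(now=200)` is 15; once the clock passes one hour beyond those timestamps, they are evicted and no longer affect the mean.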
I was surprised to find the following difference cost between running the MATLAB for loops:
ksize = 100;
klist = 1:ksize;

tic
for m = 1:100000
    for k = 1:ksize
    end
end
toc

tic
for m = 1:100000
    for k = klist
    end
end
toc
The only difference being the way the index list is created. I would have suspected the second version to be faster, but lo!
Elapsed time is 0.055400 seconds.
Elapsed time is 1.695904 seconds.
My question is twofold: what is responsible for the above result, and where else does this nuance (or similar ones) occur in MATLAB programming? I hope to be able to better spot these inefficiencies in the future. Thanks all.
The documentation for for() states:
for index = values
...
end
where values has one of the following forms:
...
valArray: creates a column vector index from subsequent columns of array valArray on each iteration. For example, on the first iteration, index = valArray(:,1). The loop executes for a maximum of n times, where n is the number of columns of valArray, given by numel(valArray(1,:)). The input valArray can be of any MATLAB data type, including a string, cell array, or struct.
Therefore, I assume there is significant overhead, and the compiler does not check whether 1:ksize == klist in order to exploit the faster implementation. In other words, per Eitan's comment, the JIT acceleration applies only to the first two forms of accepted values.
The whole problem is related to the following indexing task (column vs element):
tic
for m = 1:100000
    for k = 1:ksize
        klist(:,k);
    end
end
toc

tic
for m = 1:100000
    for k = 1:ksize
        klist(k);
    end
end
toc
Index column: ~2.9 sec
Index element: ~0.28 sec
You can see how klist(:,k) effectively slows down the faster loop indicating that the issue in for k = klist is related to the column indexing used in this case.
For additional details see this lengthy discussion on (inefficient) indexing.
My answer is speculation (because only the MathWorks guys know the implementation of their product), but I think the first k loop is optimized so that the actual array of indices is never created; the values are just scanned one by one, because the 1:ksize expression explicitly shows how they are built. The second k loop cannot be optimized this way, because the interpreter doesn't know beforehand whether the contents of the index array grow uniformly. So on each iteration it has to access the original klist, and that's why you see the performance penalty.
Later edit: another performance penalty might come from indexed access into the klist array, compared to creating the index values "on the fly."
A few months ago I asked a question about an "Algorithm to find factors for primes in linear time" on StackOverflow.
From the replies it was clear that my assumptions were wrong and the algorithm cannot find factors in linear time.
However, I would like to know whether the algorithm is a unique way to do division and find factors; that is, is any similar or identical way of doing division known? I am posting the algorithm here again:
Input: a number (whose factors are to be found)
Output: two factors of the number. If one of the factors found is 1, it can be concluded that the number is prime.
Integer N, mL, mR, r;
Integer temp1;                // used for temporary data storage

mR = mL = square root of (N);

/* Check if N is a perfect square */
temp1 = mL * mR;
if temp1 equals N then
{
    r = 0;                    // answer is found
    End;
}

mR = N / mL;                  // now mL <= mR
r  = N % mL;

while r not equals 0 do
{
    mL = mL - 1;
    r  = r + mR;
    temp1 = r / mL;
    mR = mR + temp1;
    r  = r % mL;
}
End;                          // mL and mR hold the answer
Let me know your inputs. The question is purely out of personal interest: I want to know whether a similar algorithm to do division and find factors exists, as I have not been able to find one.
I understand and appreciate that you may need to work through my funny algorithm to give answers! :)
Further explanation:
Yes, it does work on numbers above 10 (which I tested), and on all positive integers.

The algorithm depends on the remainder r to proceed. I basically formed the idea that, for a number, its factors give the sides of a rectangle whose area is the number itself. For any pair of sides that are not factors, there is a remainder left over; in other words, the rectangle cannot be completed.

Thus the idea is that for each decrease of mL we can increase r by mR (basically shifting one mR from the product mL·mR into r), and then this enlarged r is divided by mL to see by how much mR can be increased for that one decrease of mL. The remaining r is r mod mL.

I have counted the number of while-loop iterations it takes to find the factors, and it comes out at or below 5·N for all the numbers I tested. Trial division takes more.
Thanks for your time, Harish
The main loop is equivalent to the following C code:
mR = mL = sqrt(N);
...
mR = N/mL; // have the value of mL less than mR
r = N%mL;
while (r) {
mL = mL-1;
r += mR;
mR = mR + r/mL;
r = r%mL;
}
Note that after each r += mR statement, the value of r is r%(mL-1)+mR. Since r%(mL-1) < mL, the value of r/mL in the next statement is either mR/mL or 1 + mR/mL. I agree (as a result of numerical testing) that it works out that mR*mL = N when you come out of the loop, but I don't understand why. If you know why, you should explain why, if you want your method to be taken seriously.
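That numerical testing is easy to reproduce; here is a direct Python transcription of the loop above (a sketch, using integer division throughout):

```python
import math

def factor(N):
    # Direct transcription of the loop above; on exit mL * mR == N.
    mL = mR = math.isqrt(N)
    if mL * mR == N:            # perfect square
        return mL, mR
    mR = N // mL                # now mL <= mR
    r = N % mL
    while r != 0:
        mL -= 1
        r += mR
        mR += r // mL
        r %= mL
    return mL, mR

print(factor(21))    # (3, 7)
print(factor(13))    # (1, 13) -> 13 is prime
```

One way to see why it terminates correctly: the quantity mL*mR + r equals N before the loop and is preserved by each iteration, so when r reaches 0 we have mL*mR = N.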
In terms of efficiency, your method uses the same number of loop iterations as Fermat factorization, although the inner loop of Fermat factorization can be written without any divisions, whereas your method uses two division operations (r/mL and r%mL) inside its inner loop. In the worst case for both methods, the inner loop runs about sqrt(N) times.
There are other methods too, for example Pollard's rho algorithm, and GNFS, which you were already told about in the previous question.