Comparing complexity of Binary Indexed Tree operations with normal approach - algorithm

I was going through this article to understand BIT: https://www.hackerearth.com/practice/notes/binary-indexed-tree-or-fenwick-tree/#c217533
In it, the author says the following at one point:
If we look at the for loop in update() operation, we can see that the loop runs at most the number of bits in index x which is restricted to be less or equal to n (the size of the given array), so we can say that the update operation takes at most O(log2(n)) time
My question: if the loop can run up to n times (the size of the given array), how is the time complexity any different from the normal approach he mentions at the start? There, update is O(1) and prefixsum(int k) can take up to n steps.

The key is that you don't advance by 1 in the loop, but by a step of size x & -x.
This is equivalent to going upwards in the tree, to the next node whose range needs to include the current one, and thus gives you a worst case of O(log n).
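For concreteness, a minimal sketch of the two loops (Python used for illustration; bit is 1-indexed, with bit[0] unused):

def update(bit, x, delta):
    # Add delta at index x. Each step x += x & -x removes the lowest set
    # bit of x and carries upward, so x passes through at most log2(n)
    # values before exceeding n -- the loop never runs n times.
    while x < len(bit):
        bit[x] += delta
        x += x & -x

def prefix_sum(bit, x):
    # Sum of the first x elements. Each step strips one set bit from x,
    # so this also takes at most log2(n) iterations.
    total = 0
    while x > 0:
        total += bit[x]
        x -= x & -x
    return total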

Related

Finding a node in singly linked list time complexity

I have this question on my DSA course mid-term test:
Consider a singly linked list containing N nodes (N > 8). A method f1() is designed to
find the 8th node from the beginning, and a method f2() is designed to find the 8th node from the end.
What is the time complexity of f1() and f2()?
Select one:
a. O(N) and O(N)
b. O(1) and O(1)
c. O(1) and O(N)
d. O(N) and O(1)
The correct answer given is c, O(1) and O(N). However, I think the correct answer is a. I know that if N = 8 it would take O(1) time to find the 8th node from the beginning (just return the tail node), but in this case N > 8. Could anyone explain this for me, please?
Thank you in advance for any help you can provide.
O(1) implies constant running time. In other words, it doesn't depend on the input size.
When you apply that definition here, you can see that fetching the 8th element is always a constant-time operation. This is because, regardless of the size of the input (e.g. 10, 100, 1000, ...), the operation get(8) will always take the same time. Also, since we know for sure that N > 8, there's no chance that trying to fetch the 8th element will go beyond the size of the input.
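A sketch (with a hypothetical Node class) that makes the difference concrete:

class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def f1(head):
    # 8th node from the beginning: always exactly 7 hops, so O(1).
    node = head
    for _ in range(7):
        node = node.next
    return node

def f2(head):
    # 8th node from the end: advance a lead pointer 8 nodes ahead, then
    # move both pointers until the lead falls off the list. The lead
    # pointer traverses all N nodes, so this is O(N).
    lead = head
    for _ in range(8):
        lead = lead.next
    trail = head
    while lead is not None:
        lead, trail = lead.next, trail.next
    return trail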

How can the worst case for an algorithm have different bounds?

I've been trying to figure this out all day. Some other threads address this, but I really don't understand the answers. There are also many answers that contradict one another.
I understand that an algorithm will never take longer than its upper bound and never be faster than its lower bound. However, I didn't know an upper bound existed for best-case time and a lower bound existed for worst-case time. This question really threw me for a loop. I can't wrap my head around this... a given running time can have a different upper and lower bound?
For example, if someone asked: "Show that the worst-case running time of some algorithm on a heap of size n is Big Omega(lg(n))". How do you possibly get a lower bound, any bound for that matter, when given a run time?
So, in summation: an algorithm's worst-case upper bound can be different from its worst-case lower bound? How can this be? Once given the case, don't bounds become irrelevant? I'm trying to study algorithms independently and I really need to wrap my head around this first.
The meat of my accepted answer to that question is a function whose running time oscillates between n^2 and n^3 depending on whether n is odd. The point that I was trying to make is that sometimes bounds of the form O(n^k) and Omega(n^k) aren't sufficiently descriptive, even though the worst case running time is a perfectly well defined function (which, like all functions, is its own best lower and upper bound). This happens with more natural functions like n log n, which is Omega(n^k) but not O(n^k) for k ≤ 1, and O(n^k) but not Omega(n^k) for k > 1 (and hence not Theta(n^k) regardless of how we choose a constant k).
Suppose you write a program like this to find the smallest prime factor of an integer:
function lpf(n):
    for i = 2 to n:
        if n % i == 0 then return i
If you run the function on the number 10^11 + 3, it will take 10^11 + 2 steps. If you run it on the number 10^11 + 4 it will take just one step. So the function's best-case time is O(1) steps and its worst-case time is O(n) steps.
Big O notation describes efficiency in terms of runtime iterations, generally as a function of the size of the input data set.
The notation is written in its simplest form, ignoring constant multiples and additive terms but keeping the dominant growth term. If you have an operation that is O(1), it executes in constant time, no matter the input data.
However, if you have something such as O(N) or O(log(N)), it will take different amounts of time depending on the input data.
The upper and lower bounds describe the most and the fewest iterations, respectively, that an algorithm can take.
Example: for O(N), the upper bound corresponds to the largest input and the lower bound to the smallest.
Extra sources:
Big O Cheat Sheet and MIT Lecture Notes
UPDATE:
Looking at the Stack Overflow question mentioned above, that algorithm is broken into three parts with three possible types of runtime, depending on the data. Really, these are three different algorithms designed to handle different data values. An algorithm is generally classified with just one notation of efficiency, namely the one that holds for ALL possible values of N.
In the case of O(N^2), larger inputs will take quadratically longer, while smaller ones will proceed quickly. The algorithm determines how quickly a data set will be processed, yet bounds are given for the range of data the algorithm is designed to handle.
I will try to explain it using the quicksort algorithm.
In quicksort you have an array and choose an element as the pivot. The next step is to partition the input array into two arrays: the first one containing elements < pivot and the second one elements > pivot.
Now assume you apply quicksort to an already sorted list and always choose the last element of the array as the pivot. The result of partitioning will be an array of size n-1 and an array of size 1 (the pivot element). This results in a runtime of O(n*n). Now assume instead that the pivot always splits the array into two equally sized arrays, so in every step the array size is halved. This results in O(n log n). I hope this example, and the sketch below, make it a bit clearer for you.
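A minimal sketch of that pivot choice (Python for illustration, assuming distinct elements):

def quicksort(a):
    # Quicksort with the last element as pivot, as in the example above.
    if len(a) <= 1:
        return a
    pivot = a[-1]
    smaller = [v for v in a[:-1] if v < pivot]   # elements < pivot
    larger = [v for v in a[:-1] if v > pivot]    # elements > pivot
    return quicksort(smaller) + [pivot] + quicksort(larger)

# On an already sorted list the pivot is always the maximum, so larger is
# empty and smaller has n-1 elements: n levels of recursion doing O(n)
# work each gives O(n*n). A pivot that splits the array into two equal
# halves gives log n levels and O(n log n) overall.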
Another well-known sorting algorithm is mergesort. Mergesort always has a runtime of O(n log n). In mergesort you cut the array down until only one element is left, and then climb up the call stack, merging the size-one arrays, after that the arrays of size two, and so on.
Let's say you implement a set using an array. To insert an element you simply put it in the next available bucket. If there is no available bucket, you increase the capacity of the array by a value m.
For the insert algorithm, "there is not enough space" is the worst case.
insert(S, e)
    if size(S) >= capacity(S)
        reserve(S, size(S) + m)
    put(S, e)
Assume we never delete elements. By keeping track of the last available position, put, size, and capacity are Θ(1) in time and space.
What about reserve? If it is implemented like realloc in C, in the best case you just allocate new memory at the end of the existing memory (the best case for reserve), and otherwise you have to move all existing elements as well (the worst case for reserve).
The worst-case lower bound for insert is the best case of reserve(), which is linear in m if we don't nitpick: insert in the worst case is Ω(m) in space and time.
The worst-case upper bound for insert is the worst case of reserve(), which is linear in m + n: insert in the worst case is O(m + n) in space and time.

Range query for a semigroup operator (union)

I'm looking to implement an algorithm which, given an array of integers and a list of ranges (intervals) in that array, returns the number of distinct elements in each interval. That is, given the array A and a range [i,j], it returns the size of the set {A[i],A[i+1],...,A[j]}.
Obviously, the naive approach (iterate from i to j and count, ignoring duplicates) is too slow. Range-sum tricks seem inapplicable, since (A U B) - B isn't always equal to A.
I've looked up Range Queries in Wikipedia, and it hints that Yao (in '82) showed an algorithm that does this for semigroup operators (which union seems to be) with linear preprocessing time and space and almost constant query time. The article, unfortunately, is not available freely.
Edit: it appears this exact problem is available at http://www.spoj.com/problems/DQUERY/
There's a rather simple algorithm which uses O(N log N) time and space for preprocessing and O(log N) time per query. First, create a persistent segment tree for answering range-sum queries (initially it should contain zeroes at all positions). Then iterate through the elements of the given array, storing the latest position of each number. At each iteration create a new version of the persistent segment tree, putting 1 at the latest position of the element (at each iteration only one element's position is updated, so only one position's value in the segment tree changes and the update can be done in O(log N)). To answer a query (l, r), you just need to find the sum on the (l, r) segment in the version of the tree that was created when iterating through the r-th element of the initial array.
Hope this algorithm is fast enough.
Upd. There's a little mistake in my explanation: at each step, at most two positions' values in the segment tree might change (because it's necessary to put 0 at the previous latest position of a number when it's updated). However, this doesn't change the complexity.
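A compact sketch of this approach (Python for illustration; positions are 0-based and queries (l, r) are inclusive):

class Node:
    # One immutable node of the persistent segment tree.
    __slots__ = ("left", "right", "total")
    def __init__(self, left=None, right=None, total=0):
        self.left, self.right, self.total = left, right, total

def empty(lo, hi):
    # Build the initial all-zero version.
    if lo == hi:
        return Node()
    mid = (lo + hi) // 2
    return Node(empty(lo, mid), empty(mid + 1, hi))

def updated(node, lo, hi, pos, delta):
    # Return a new version with delta added at pos; creates O(log N) nodes.
    if lo == hi:
        return Node(total=node.total + delta)
    mid = (lo + hi) // 2
    if pos <= mid:
        return Node(updated(node.left, lo, mid, pos, delta),
                    node.right, node.total + delta)
    return Node(node.left,
                updated(node.right, mid + 1, hi, pos, delta),
                node.total + delta)

def range_sum(node, lo, hi, l, r):
    # Sum over [l, r] in one particular version.
    if r < lo or hi < l:
        return 0
    if l <= lo and hi <= r:
        return node.total
    mid = (lo + hi) // 2
    return (range_sum(node.left, lo, mid, l, r)
            + range_sum(node.right, mid + 1, hi, l, r))

def preprocess(a):
    # versions[i] marks the latest occurrence of each value in a[0..i-1].
    n, last = len(a), {}
    root = empty(0, n - 1)
    versions = [root]
    for i, v in enumerate(a):
        if v in last:                          # zero out the old position
            root = updated(root, 0, n - 1, last[v], -1)
        root = updated(root, 0, n - 1, i, +1)  # mark the new latest position
        last[v] = i
        versions.append(root)
    return versions

def distinct(versions, n, l, r):
    # Number of distinct values in a[l..r]: query the version built
    # right after processing the r-th element.
    return range_sum(versions[r + 1], 0, n - 1, l, r)

For example, with a = [1, 2, 1], distinct(preprocess(a), 3, 0, 2) returns 2.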
You can answer any of your queries in constant time by performing a quadratic-time precomputation:
For every i from 0 to n-1:
    S <- new empty set backed by a hashtable
    C <- 0
    For every j from i to n-1:
        If A[j] does not belong to S, increment C and add A[j] to S
        Store C as the answer for the query associated to interval i..j
This algorithm takes quadratic time since for each interval we perform a bounded number of operations, each one taking constant time (note that the set S is backed by a hashtable), and there's a quadratic number of intervals.
If you don't have additional information about the queries (total number of queries, distribution of intervals), you cannot do essentially better, since the total number of intervals is already quadratic.
You can trade the quadratic precomputation for n linear on-the-fly computations: after receiving a query of the form A[i..j], compute (in O(n) time) the answers for all intervals A[i..k], k >= i. This guarantees that the amortized complexity remains quadratic, without forcing you to perform the complete quadratic precomputation at the beginning. A sketch follows below.
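Here is that lazy variant sketched in Python (the cache dictionary and function names are illustrative):

def make_answerer(A):
    n = len(A)
    cache = {}                      # (i, j) -> number of distinct values
    def answer(i, j):
        if (i, j) not in cache:
            # First query starting at i: one O(n) pass fills the answers
            # for every interval [i, k], k >= i.
            seen, count = set(), 0
            for k in range(i, n):
                if A[k] not in seen:
                    seen.add(A[k])
                    count += 1
                cache[(i, k)] = count
        return cache[(i, j)]
    return answer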
Note that the obvious algorithm (the one you call obvious in the statement) is cubic, since you scan every interval completely.
Here is another approach, which might be quite closely related to the segment tree. Think of the elements of the array as the leaves of a full binary tree. If there are 2^h elements in the array, there are h levels of internal nodes above the leaves. At each internal node of the tree, store the union of the points that lie in the leaves beneath it. Each number in the array appears at most once per level, so the cost in space is a factor of log n.
Consider a range A..B of length K. You can work out the union of points in this range by forming the union of the sets associated with leaves and internal nodes, picking nodes as high up the tree as possible, as long as the subtree beneath each chosen node is entirely contained in the range. If you step along the range picking subtrees that are as large as possible, you will find that the size of the subtrees first increases and then decreases, and the number of subtrees required grows only with the logarithm of the size of the range: once you have taken a subtree of size 2^k, the range covered so far ends on a boundary divisible by 2^(k+1), so you have the chance of taking a subtree of size at least 2^(k+1) as the next step, if the range is big enough.
So the number of semigroup operations required to answer a query is O(log n) - but note that the semigroup operations may be expensive as you may be forming the union of two large sets.
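A sketch of both parts (Python; assumes len(a) is a power of two, padding otherwise):

def build_union_tree(a):
    # Level 0 holds singleton sets for the leaves; each higher level
    # stores the union of two children. Space is O(n log n) set entries.
    levels = [[{x} for x in a]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] | prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def range_union(levels, l, r):
    # Union over a[l..r] (inclusive), touching only O(log n) stored sets:
    # a node is taken whenever its whole subtree lies inside the range.
    result, lvl = set(), 0
    while l <= r:
        if l % 2 == 1:               # l is a right child: take it alone
            result |= levels[lvl][l]
            l += 1
        if r % 2 == 0:               # r is a left child: take it alone
            result |= levels[lvl][r]
            r -= 1
        l, r, lvl = l // 2, r // 2, lvl + 1
    return result

The number of distinct elements in a[l..r] is then len(range_union(levels, l, r)).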

Finding a specific ratio in an unsorted array. Time complexity

This is a homework assignment.
The goal is to present an algorithm in pseudocode that will search an array of numbers (it doesn't specify whether they are integers or > 0) and check if the ratio of any two numbers equals a given x. The time complexity must be at most O(n log n).
My idea was to mergesort the array (O(n log n) time) and then, if |x| > 1, check each number in descending order using a binary search. Each check takes O(log n) time, and with a worst case of n checks this gives a total of O(n log n). If I am not missing anything, this gives a worst case of O(n log n) + O(n log n) = O(n log n), within the parameters of the assignment.
I realize that it doesn't really matter where I start checking the ratios after sorting, but the time cost is cut roughly in half.
Is my logic correct? Is there a faster algorithm?
An example in case it isn't clear:
Given an array { 4, 9, 2, 1, 8, 6 }
If we want to search for a ratio of 2:
Mergesort: { 9, 8, 6, 4, 2, 1 }
Since the given ratio is > 1, we search from left to right.
2a. First number is 9. Check 9/4 > 2; check 9/6 < 2. Next number.
2b. Second number is 8. Check 8/4 = 2. DONE
The analysis you have presented is correct and is a perfectly good way to solve this problem. Sorting works in time O(n log n), and 2n binary searches also take O(n log n) time. That said, I don't think you want to use the term "amortized" here, since that refers to a different type of analysis.
As a hint for how to speed up your solution a bit, the general idea of your solution is to make it possible to efficiently query, for any number, whether that number exists in the array. That way, you can just loop over all numbers and look for anything that would make the ratio work. However, if you use an auxiliary data structure outside the array that supports fast access, you can possibly whittle down your runtime at the cost of increasing the memory usage. Try thinking about what data structures support very fast access (say, O(1) lookups) and see if you can use any of them here.
Hope this helps!
To solve this problem, O(n lg n) is enough.
Step 1: sort the array. That costs O(n lg n).
Step 2: check whether the ratio exists. This step only needs O(n).
You just need two pointers into the sorted array, both starting at the smallest element (assuming positive numbers and a ratio greater than 1).
Calculate the ratio a[second] / a[first].
If the ratio is smaller than the specified one, move the second pointer to its next element.
If the ratio is bigger than the specified one, move the first pointer to its next element.
Repeat the above steps until:
you find the exact ratio, or
the second pointer reaches the end.
Since each pointer only ever moves forward, this pass is O(n); see the sketch below.
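A sketch of this scan (Python; assumes positive numbers and x > 1):

def has_ratio(a, x):
    a = sorted(a)                    # step 1: O(n lg n)
    i, j = 0, 1
    while j < len(a):                # step 2: O(n), pointers only move forward
        if i == j:                   # ratio would be 1 < x: need a bigger a[j]
            j += 1
        elif a[j] == x * a[i]:       # found the exact ratio
            return True
        elif a[j] < x * a[i]:        # ratio too small: advance second pointer
            j += 1
        else:                        # ratio too big: advance first pointer
            i += 1
    return False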
The complexity of your algorithm is O(n²), because after sorting the array, you iterate over each element (up to n times) and in each iteration you execute up to n - 1 divisions.
Instead, after sorting the array, iterate over each element, and in each iteration divide the element by the ratio, then see if the result is contained in the array:
division: O(1)
search in sorted list: O(log n)
repeat for each element: n times
Results in time complexity O(n log n)
In your example:
9/2 = 4.5 (not found)
8/2 = 4 (found)
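A sketch of this approach (Python; assumes nonzero values and x != 1, and note that exact floating-point comparison is fine for an example like this but fragile in general):

from bisect import bisect_left

def has_ratio_sorted(a, x):
    a = sorted(a)                    # O(n log n)
    for v in a:
        target = v / x               # O(1) division
        i = bisect_left(a, target)   # O(log n) search in the sorted list
        if i < len(a) and a[i] == target:
            return True
    return False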
(1) Build a hash set of the elements of this array. Time cost: O(n).
(2) For every element a[i], look up a[i] * x in the hash set. Time cost: O(n) expected.
Total cost: O(n) expected.
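The same idea as a sketch (Python; assumes x != 1 so an element cannot pair with itself):

def has_ratio_hashed(a, x):
    seen = set(a)                          # (1) build the hash set: O(n)
    for v in a:                            # (2) one expected-O(1) lookup each
        if v * x in seen and v * x != v:
            return True
    return False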

Finding the max and min of a BIT in linear or sub-linear time

I have to perform a series of range updates on an array, i.e., adding or subtracting some constant to and from a range. After that I have to find the RANGE of the final array, i.e., (max - min). Initially the numbers are 1 to n.
I'm using a Binary Indexed Tree, so each update is O(log N). I want to know if there is a way to find this RANGE (or the max and min) in O(n) or less time. Conventionally, it takes O(n log n).
You need direct indexed access to the array elements since you need to address them for doing the incremental updates.
You also need to maintain a min-heap and max-heap.
When you update an element, you also need to update the corresponding entries in the two heaps. So, in the array, you need to store pointers to the corresponding entries in the two heaps.
Creating the original heaps is O(n) and any modification is O(lg N).
Why not just sort the array once? Then adding or subtracting a constant from the whole array still gives the same ordering, as does multiplying by a positive number. Maybe there's more to the picture though.
This question is almost 2 years old, hence I am not sure if this answer is going to help much. Anyway...
I have never used a BIT to answer minimum or maximum queries. And here there are range queries, which change a lot of numbers all at once, so the maximums and minimums also get updated. As far as I know, BITs are used for point updates, range-sum queries, and the like, not for this.
In general, segment trees are a better option for searching for minimum and maximum values. After performing all updates, you can find those in O(lg n) time. However, during updates you must update the min and max values for each node, which can be done using lazy propagation; the update cost is O(lg n).
To sum up, if m lg n < n for your application, you can go with a segment tree, albeit with more space. A sketch follows below.
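A sketch of that segment tree (Python; range add with lazy propagation, 0-based inclusive ranges):

class LazySegTree:
    def __init__(self, values):
        self.n = len(values)
        size = 4 * self.n
        self.mn = [0] * size         # subtree minimum
        self.mx = [0] * size         # subtree maximum
        self.lazy = [0] * size       # pending addition for the whole subtree
        self._build(1, 0, self.n - 1, values)

    def _build(self, node, lo, hi, values):
        if lo == hi:
            self.mn[node] = self.mx[node] = values[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, values)
        self._build(2 * node + 1, mid + 1, hi, values)
        self.mn[node] = min(self.mn[2 * node], self.mn[2 * node + 1])
        self.mx[node] = max(self.mx[2 * node], self.mx[2 * node + 1])

    def _push(self, node):
        # Lazy propagation: hand the pending addition down to the children.
        for child in (2 * node, 2 * node + 1):
            self.mn[child] += self.lazy[node]
            self.mx[child] += self.lazy[node]
            self.lazy[child] += self.lazy[node]
        self.lazy[node] = 0

    def add(self, l, r, delta, node=1, lo=0, hi=None):
        # Add delta to every element in [l, r]; O(lg n).
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l:
            return
        if l <= lo and hi <= r:
            self.mn[node] += delta
            self.mx[node] += delta
            self.lazy[node] += delta
            return
        self._push(node)
        mid = (lo + hi) // 2
        self.add(l, r, delta, 2 * node, lo, mid)
        self.add(l, r, delta, 2 * node + 1, mid + 1, hi)
        self.mn[node] = min(self.mn[2 * node], self.mn[2 * node + 1])
        self.mx[node] = max(self.mx[2 * node], self.mx[2 * node + 1])

    def spread(self):
        # max - min over the whole array is just the root: O(1) after updates.
        return self.mx[1] - self.mn[1]

For example, starting from 1..n: t = LazySegTree(list(range(1, n + 1))), then t.add(l, r, c) for each range update, and finally t.spread() for max - min, with no O(n log n) final scan.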
