How to find the number of occurrences of a number in a sorted Array in O(log n) - algorithm

I have a sorted array of integers. Given a number, how can I find the number of occurrences of that number in O(log n), even in the worst case?

Do one binary search for the point where all elements to the left are smaller than your number and all to the right are equal or greater, and one for the point where all elements to the left are smaller or equal and all to the right are greater.
In other words: in one search your "test" is <, while in the other search the test is <=.
Both searches are log n, so that's your total.
For example, in C++ the two functions you'd need are std::lower_bound and std::upper_bound - it seems like the existing Java binary search functions (on Collections) always try to look for a specific value, so you probably have to implement the search yourself if you're using that.
It's important that you use a binary search variant that uses only a binary predicate! A variant that checks whether it hit a specific key "by accident" will sometimes terminate too soon for this specific task.
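For example, a Java sketch of the two predicate-based searches described above (the helpers lowerBound, upperBound, and countOccurrences are illustrative names, not standard library methods):

// Count occurrences of val in a sorted array in O(log n):
// lowerBound = first index with a[i] >= val, upperBound = first index with a[i] > val.
static int lowerBound(int[] a, int val) {
    int lo = 0, hi = a.length;          // search in the half-open range [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < val) lo = mid + 1; // the test is <
        else hi = mid;
    }
    return lo;
}

static int upperBound(int[] a, int val) {
    int lo = 0, hi = a.length;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= val) lo = mid + 1; // the test is <=
        else hi = mid;
    }
    return lo;
}

static int countOccurrences(int[] a, int val) {
    return upperBound(a, val) - lowerBound(a, val);
}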

Binary search for the number, then binary search for the start and the end of the run.

Search for the insertion points of number + 0.5 and number - 0.5 using two binary searches. The number of elements with value number in the list is the difference of the positions (indices) of these two insertion points.

Run the function below once as it is and once with the if condition changed to if(a[mid]<=val) (with the recursive calls swapped accordingly). The first call returns the index of the first occurrence of the value; the modified call returns the index just past its last occurrence, so the difference of the two results is the number of occurrences.
int binmin(int a[], int start, int end, int val)
{
    if (start > end)                 // empty range: start is the insertion point
        return start;
    int mid = (start + end) / 2;
    if (a[mid] >= val)
        return binmin(a, start, mid - 1, val);   // first occurrence is at mid or to its left
    else
        return binmin(a, mid + 1, end, val);     // first occurrence is to the right of mid
}

Related

How to construct the longest palindrome from a list of numbers?

I am trying to solve a question which says that we need to write a function that, given a list of numbers, finds the longest palindrome we can form using only the numbers in the list.
For eg:
If the given list is : [3,47,6,6,5,6,15,22,1,6,15]
The longest palindrome that we can return is one of length 9, such as [6,15,6,3,47,3,6,15,6].
Additionally, we have the following constraints:
We can only use an array queue, an array stack, a chaining hash map, and the list we are supposed to return; the function must run in linear time and use only constant additional space.
My approach was the following:
Since a palindrome can be formed if we have an even number of certain values, we can iterate over all the elements in the list and store, in a chaining hash map, the number of times each number appears in the list. This should take O(N) time, since each lookup in the chaining hash map takes constant time and iterating over the list takes linear time.
Then we can iterate over all the numbers in the chaining hash map, to see which numbers appear an even number of times, and accordingly, just make a palindrome. In the worst case, this will take O(n) linear time.
Now there are two things I am wondering:
How should I make the actual palindrome? Like how do I use the data structures that I am being allowed to use in order to make a palindrome? I am thinking that since the queue is a FIFO data structure and the stack is LIFO, for each number that occurs an even number of times, we add it once to the queue and once to the stack, and so on and so forth. And finally, we can just dequeue everything from the queue, pop everything from the stack, and add it all to the list!
It seems that with my approach, it is taking me two linear runs to solve the question. I am wondering if there is a faster way to do this.
Any and all help will be appreciated. Thanks!
It is not possible to get a better algorithm than one that is O(n), as every number in the input has to be inspected, as it might provide a possibility for a longer palindrome. If indeed the output must be a longest palindrome itself (and not only its length), then producing that output itself represents O(n).
You have also omitted one additional thing you have to do in your algorithm: there can be one value in the final palindrome that occurs only once (in the centre). So whenever you encounter a value that occurs an odd number of times, you may reserve one occurrence of that value for putting in the middle of an odd-length palindrome. The even remainder of the occurrences can be used as usual.
As to your questions:
How should I make the actual palindrome?
There are many ways to do it. But don't forget that if you have an even number of occurrences, you should use all those occurrences, not just two. So add half of them to the queue and half of them to the stack. When the frequency is odd, then still do the same (rounded down), and log the number also as a potential centre value.
When you have done this for all values, then dump the queue and stack together in the result list as you suggested, but don't forget to put the centre value in between the two, if you identified such a centre value (i.e. when not all occurrences were even).
It seems that with my approach, it is taking me two linear runs to solve the question.
You cannot do this better than with a linear time complexity. You can save a bit of time if you use the stack also for the result, and just dump the queue onto the stack (after potentially pushing the centre value).
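For illustration, a rough Java sketch of that construction (using ArrayDeque and HashMap rather than the array-based structures required in the question; names and structure are illustrative, not a definitive implementation):

import java.util.*;

static List<Integer> longestPalindrome(List<Integer> numbers) {
    Map<Integer, Integer> counts = new HashMap<>();
    for (int num : numbers) counts.merge(num, 1, Integer::sum);   // O(n) counting pass

    Deque<Integer> queue = new ArrayDeque<>();   // holds the first half, in order
    Deque<Integer> stack = new ArrayDeque<>();   // holds the mirror of the first half
    Integer centre = null;                       // one value with an odd count may sit in the middle
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
        int value = e.getKey(), count = e.getValue();
        for (int i = 0; i < count / 2; i++) {    // use all the pairs, not just one
            queue.addLast(value);
            stack.push(value);
        }
        if (count % 2 != 0) centre = value;      // remember a potential centre value
    }

    List<Integer> result = new ArrayList<>();
    while (!queue.isEmpty()) result.add(queue.removeFirst());
    if (centre != null) result.add(centre);      // odd-length palindrome gets a middle element
    while (!stack.isEmpty()) result.add(stack.pop());
    return result;
}

Because the stack pops in the reverse of the order the queue was filled, the second half mirrors the first, so the result is a palindrome.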
I've got a solution where it's a palindrome only of the numbers and not the digits.
For the input: [51,15]
we will return [15] or [51], and not [51,15] => (5,1,1,5);
Furthermore, your example has a problem: 3 doesn't appear twice (yet it appears in the answer),
or maybe I didn't understand the question.
public static int[] polidrom(int[] numbers) {
    HashMap<Integer/*the number*/, Integer/*how many times it appeared*/> hash = new HashMap<>();
    boolean middleFree = false;
    int middleNumber = 0;
    Stack<Integer> stack = new Stack<>();
    for (Integer num : numbers) { // count how many times each number appears
        if (hash.containsKey(num)) { hash.replace(num, 1 + hash.get(num)); }
        else { hash.put(num, 1); }
    }
    for (Integer num : hash.keySet()) { // how many pairs of each number we can use
        int count = hash.get(num);
        int pairs = count / 2;
        if (count % 2 != 0) { middleNumber = num; middleFree = true; } // odd count: potential middle
        for (int i = 0; i < pairs; i++) { // push one copy per pair; it is mirrored below
            stack.push(num);
        }
    }
    int half = stack.size();
    int length = 2 * half + (middleFree ? 1 : 0); // the middle doesn't need a pair
    int[] answer = new int[length];
    int startPointer = 0, endPointer = length - 1;
    while (!stack.isEmpty()) { // write each popped value to both ends of the answer
        int num = stack.pop();
        answer[startPointer] = num;
        answer[endPointer] = num;
        startPointer++;
        endPointer--;
    }
    if (middleFree) { answer[length / 2] = middleNumber; }
    return answer;
}
Space: O(n) => {stack, hashMap, answer array}
Complexity: O(n)
You can skip the part where I used the stack and build the answer array in the same loop.
And I can't think of a way in which you would not iterate at least twice;
Hope I've helped

Algorithm to find string which is longest and contains lowest possible numbers?

I have to find the string out of these which is longest and contains the lowest possible numbers.
10,20
10,20,30
10,30,40
30,40,50
Answer: 10,20,30
I am trying to think of an algorithm which can be used to find such a record.
A record will not contain duplicate numbers; for example, 10,10,20 won't be a case.
Based on clarification of requirements:
split strings by delimiters and convert to lists of (unique) numbers
determine the max # of elements, M, in any list
get the set of lists with M elements
for each list, find the min element
pick the list with the lowest min
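A small Java sketch of those steps (assuming comma-separated strings as in the question; pickRecord and its tie-breaking are illustrative, not a definitive implementation):

// Among comma-separated records, keep the longest list; break ties by the lowest minimum element.
static String pickRecord(String[] records) {
    String best = null;
    int bestLen = -1;
    int bestMin = Integer.MAX_VALUE;
    for (String record : records) {
        String[] parts = record.split(",");            // split by the delimiter
        int min = Integer.MAX_VALUE;
        for (String p : parts) {
            min = Math.min(min, Integer.parseInt(p.trim()));
        }
        if (parts.length > bestLen || (parts.length == bestLen && min < bestMin)) {
            best = record;                             // more elements, or same length with a lower min
            bestLen = parts.length;
            bestMin = min;
        }
    }
    return best;
}

With the four example strings this returns "10,20,30" (the first of the longest lists having the lowest minimum).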
There is a general solution for problems where you need to find a string out of a collection of strings.
In case your data can be sorted then use a PriorityQueue.
From the docs:
An unbounded priority queue based on a priority heap. The elements of
the priority queue are ordered according to their natural ordering, or
by a Comparator
PriorityQueue<String> a = new PriorityQueue<>((o1, o2) -> {
    String[] n1 = o1.split(","), n2 = o2.split(",");
    int compare = Integer.compare(n2.length, n1.length);   // longer lists come first
    if (compare != 0) {
        return compare;
    }
    for (int i = 0; i < n1.length; i++) {                   // same length: lower numbers first
        int result = Integer.compare(Integer.parseInt(n1[i]), Integer.parseInt(n2[i]));
        if (result != 0) {
            return result;
        }
    }
    return 0;
});
a.add("10,20");
a.add("10,20,30");
a.add("10,30,40");
a.add("30,40,50");
System.out.println(a.remove()); // 10,20,30
In case it's not, use any List and do the filtering one by one.

Binary search with gaps

Let's imagine two arrays like this:
[8,2,3,4,9,5,7]
[0,1,1,0,0,1,1]
How can I perform a binary search only in numbers with a 1 below them, ignoring the rest?
I know this can be done in O(log n) comparisons, but my current method is slower because it has to go through all the 0s until it hits a 1.
If you hit a number with a 0 below, you need to scan in both directions for a number with a 1 below until you find it -- or the local search space is exhausted. As the scan for a 1 is linear, the ratio of 0s to 1s determines whether the resulting algorithm can still be faster than linear.
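For example, a rough Java sketch of that idea (illustrative names; this simplified variant scans only toward the low end of the current range, and the worst case is still linear when there are long runs of 0s):

// Sketch: values[] is sorted wherever valid[] is 1; returns an index of target among valid entries, or -1.
static int searchWithGaps(int[] values, int[] valid, int target) {
    int lo = 0, hi = values.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int m = mid;
        while (m >= lo && valid[m] == 0) { // scan toward lo for the nearest valid entry
            m--;
        }
        if (m < lo) {
            lo = mid + 1;                  // no valid entry in [lo, mid]; look right
        } else if (values[m] == target) {
            return m;
        } else if (values[m] < target) {
            lo = mid + 1;                  // everything in (m, mid] is invalid, so skip past mid
        } else {
            hi = m - 1;                    // target must be to the left of m
        }
    }
    return -1;
}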
This question is very old, but I've just discovered a wonderful little trick to solve this problem in most cases where it comes up. I'm writing this answer so that I can refer to it elsewhere:
Fast Append, Delete, and Binary Search in a Sorted Array
The need to dynamically insert or delete items from a sorted collection, while preserving the ability to search, typically forces us to switch from a simple array representation using binary search to some kind of search tree -- a far more complicated data structure.
If you only need to insert at the end, however (i.e., you always insert a largest or smallest item), or you don't need to insert at all, then it's possible to use a much simpler data structure. It consists of:
A dynamic (resizable) array of items, the item array; and
A dynamic array of integers, the set array. The set array is used as a disjoint set data structure, using the single-array representation described here: How to properly implement disjoint set data structure for finding spanning forests in Python?
The two arrays are always the same size. As long as there have been no deletions, the item array just contains the items in sorted order, and the set array is full of singleton sets corresponding to those items.
If items have been deleted, though, items in the item array are only valid if there is a root set at the corresponding position in the set array. All sets that have been merged into a single root will be contiguous in the set array.
This data structure supports the required operations as follows:
Append (O(1))
To append a new largest item, just append the item to the item array, and append a new singleton set to the set array.
Delete (amortized effectively O(log N))
To delete a valid item, first call search to find the adjacent larger valid item. If there is no larger valid item, then just truncate both arrays to remove the item and all adjacent deleted items. Since merged sets are contiguous in the set array, this will leave both arrays in a consistent state.
Otherwise, merge the sets for the deleted item and adjacent item in the set array. If the deleted item's set is chosen as the new root, then move the adjacent item into the deleted item's position in the item array. Whichever position isn't chosen will be unused from now on, and can be nulled-out to release a reference if necessary.
If less than half of the item array is valid after a delete, then deleted items should be removed from the item array and the set array should be reset to an all-singleton state.
Search (amortized effectively O(log N))
Binary search proceeds normally, except that we need to find the representative item for every test position:
int find(item_array, set_array, itemToFind) {
    int pos = 0;
    int limit = item_array.length;
    while (pos < limit) {
        int testPos = pos + floor((limit-pos)/2);
        if (item_array[find_set(set_array, testPos)] < itemToFind) {
            pos = testPos + 1; //testPos is too low
        } else {
            limit = testPos; //testPos is not too low
        }
    }
    if (pos >= item_array.length) {
        return -1; //not found
    }
    pos = find_set(set_array, pos);
    return (item_array[pos] == itemToFind) ? pos : -1;
}
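The find_set helper used above is the usual disjoint-set root lookup. A minimal Java version over a parent-pointer array (assuming, as one possible convention, that a root points to itself, which may differ from the linked representation) might look like:

// Follow parent pointers to the set's root, compressing the path along the way.
static int find_set(int[] set_array, int i) {
    int root = i;
    while (set_array[root] != root) {   // a root is its own parent (assumed convention)
        root = set_array[root];
    }
    while (set_array[i] != root) {      // path compression: point everything at the root
        int next = set_array[i];
        set_array[i] = root;
        i = next;
    }
    return root;
}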

minimum interval of an array of unique elements

How can I find the minimum interval of an integer array in which all the unique elements of that array are present?
For example my array is : 1 1 1 2 3 1 1 4 3 3 3 2 1 2 2 4 1
The minimum interval is from index 3 to index 7.
I'm looking for an algorithm of O(n log n) or less (n <= 100000).
The strategy is to iterate from the end to the start, remembering when you last saw each integer. E.g. somewhere in the middle, you last saw 1 at index 15, 2 at index 20, and 3 at index 17. The interval length is the maximum index at which you last saw something, minus your current index.
To find the maximum index easily, you should use a self-balancing binary search tree (BST), because it has O(log n) insert and removal time, and constant lookup time for the largest index.
For example, if you have to update the index you last saw a 1, you remove the current last seen index (the 15), and insert the new last seen index.
By updating the self balancing BST with all the end indices allowed by each integer type, we can pick the largest, and say that we can end there.
The exact code depends on how the input is defined (e.g. whether you know what all the integers are, i.e. you know that all integers between 1 and 4 exist in the array; then the code is simplified).
Iteration is O(n), the BST is O(log n). Overall is O(n log n).
Implementation Details
Implementation of this takes a little bit of work.
Initialize:
the interval length for each starting index.
an array for when you last saw a certain integer. (If you don't know what possible integers might be in the array, instead of using a normal array, use an associative array (eg. map<> in C++)).
a priority queue-like type heap, where the top of the queue is the maximum integer in it. You need to be able to easily remove stuff from it, so use a self-balancing binary search tree
Now inside the loop (looping index from end of input array to start of input array),
You can update your last seen array for this particular index.
Just check what integer you see, and update the entry in the index last seen array.
Using before and after in the last seen array, update the BST (remove old end index, add new index)
Update interval length for this starting index, based on largest end index required (from BST).
If you see an integer you haven't seen before, invalidate all interval lengths for starting indices above this index (or just avoid updating interval length until all integers have been seen at least once).
C++ code implementation
Assuming all integers 0-(k-1) are found in input array
Disclaimer: untested
ignores #include and main function
Code:
int n=10,k=3;
int input[n]=?;
unsigned int interval[n];
for (int i=0;i<n;i++) interval[i]=-1; // initialize interval to very large number
int lastseen[k];
for (int i=0;i<k;i++) lastseen[i]=-1; // initialize lastseen
multiset<int> pq;
for (int i=n-1;i>=0;i--) {
    if (lastseen[input[i]] != -1) // if lastseen[] already has index
        pq.erase(pq.find(lastseen[input[i]])); // erase single copy
    lastseen[input[i]]=i; // update last seen
    pq.insert(i); // put last seen index into BST
    if (pq.size()==k) { // if all integers seen (nothing missing)
        // get (maximum of endindex requirements) - current index
        interval[i] = (*pq.rbegin())-i+1;
    }
}
// find best answer
unsigned int minlength=-1;
int startindex;
for (int i=0;i<n;i++) {
    if (minlength>interval[i]) { // better answer?
        minlength=interval[i];
        startindex=i;
    }
}
// Your answer is [startindex,startindex+minlength)

What is the difference between Linear search and Binary search?

What is the difference between Linear search and Binary search?
A linear search looks down a list, one item at a time, without jumping. In complexity terms this is an O(n) search - the time taken to search the list gets bigger at the same rate as the list does.
A binary search is when you start with the middle of a sorted list, and see whether that's greater than or less than the value you're looking for, which determines whether the value is in the first or second half of the list. Jump to the half way through the sublist, and compare again etc. This is pretty much how humans typically look up a word in a dictionary (although we use better heuristics, obviously - if you're looking for "cat" you don't start off at "M"). In complexity terms this is an O(log n) search - the number of search operations grows more slowly than the list does, because you're halving the "search space" with each operation.
As an example, suppose you were looking for U in an A-Z list of letters (index 0-25; we're looking for the value at index 20).
A linear search would ask:
list[0] == 'U'? No.
list[1] == 'U'? No.
list[2] == 'U'? No.
list[3] == 'U'? No.
list[4] == 'U'? No.
list[5] == 'U'? No.
...
list[20] == 'U'? Yes. Finished.
The binary search would ask:
Compare list[12] ('M') with 'U': Smaller, look further on. (Range=13-25)
Compare list[19] ('T') with 'U': Smaller, look further on. (Range=20-25)
Compare list[22] ('W') with 'U': Bigger, look earlier. (Range=20-21)
Compare list[20] ('U') with 'U': Found it! Finished.
Comparing the two:
Binary search requires the input data to be sorted; linear search doesn't
Binary search requires an ordering comparison; linear search only requires equality comparisons
Binary search has complexity O(log n); linear search has complexity O(n) as discussed earlier
Binary search requires random access to the data; linear search only requires sequential access (this can be very important - it means a linear search can stream data of arbitrary size)
Think of it as two different ways of finding your way in a phonebook. A linear search is starting at the beginning, reading every name until you find what you're looking for. A binary search, on the other hand, is when you open the book (usually in the middle), look at the name at the top of the page, and decide if the name you're looking for comes before or after the one on the page. If the name you're looking for comes after, then you continue searching the later part of the book in this very fashion.
A linear search works by looking at each element in a list of data until it either finds the target or reaches the end. This results in O(n) performance on a given list.
A binary search comes with the prerequisite that the data must be sorted. We can leverage this information to decrease the number of items we need to look at to find our target. We know that if we look at a random item in the data (let's say the middle item) and that item is greater than our target, then all items to the right of that item will also be greater than our target. This means that we only need to look at the left part of the data. Basically, each time we search for the target and miss, we can eliminate half of the remaining items. This gives us a nice O(log n) time complexity.
Just remember that sorting data, even with the most efficient algorithm, will always be slower than a linear search (the fastest sorting algorithms are O(n * log n)). So you should never sort data just to perform a single binary search later on. But if you will be performing many searches (say at least O(log n) searches), it may be worthwhile to sort the data so that you can perform binary searches. You might also consider other data structures such as a hash table in such situations.
A linear search starts at the beginning of a list of values, and checks 1 by 1 in order for the result you are looking for.
A binary search starts in the middle of a sorted array, and determines which side (if any) the value you are looking for is on. That "half" of the array is then searched again in the same fashion, dividing the remaining search space in half each time.
Make sure to deliberate about whether the win of the quicker binary search is worth the cost of keeping the list sorted (to be able to use the binary search). I.e. if you have lots of insert/remove operations and only an occasional search the binary search could in total be slower than the linear search.
Try this: Pick a random name "Lastname, Firstname" and look it up in your phonebook.
1st time: start at the beginning of the book, reading names until you find it, or else find the place where it would have occurred alphabetically and note that it isn't there.
2nd time: Open the book at the half way point and look at the page. Ask yourself, should this person be to the left or to the right. Whichever one it is, take that 1/2 and find the middle of it. Repeat this procedure until you find the page where the entry should be and then either apply the same process to columns, or just search linearly along the names on the page as before.
Time both methods and report back!
[also consider what approach is better if all you have is a list of names, not sorted...]
Linear search, also referred to as sequential search, looks at each element in sequence from the start to see if the desired element is present in the data structure. When the amount of data is small, this search is fast. It's easy, but the work needed is in proportion to the amount of data to be searched. Doubling the number of elements will double the time to search if the desired element is not present.
Binary search is efficient for larger arrays. In this we check the middle element. If the value is bigger than what we are looking for, then look in the first half; otherwise, look in the second half. Repeat this until the desired item is found. The table must be sorted for binary search. It eliminates half the data at each iteration. It's logarithmic.
If we have 1000 elements to search, binary search takes about 10 steps, linear search 1000 steps.
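(Roughly: 2^10 = 1024 >= 1000, so about ten halvings are enough.)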
Binary search runs in O(log n) time whereas linear search runs in O(n) time, thus binary search has better performance.
Linear Search looks through items until it finds the searched value.
Efficiency: O(n)
Example Python Code:
test_list = [1, 3, 9, 11, 15, 19, 29]
test_val1 = 25
test_val2 = 15
def linear_search(input_array, search_value):
    # the list is sorted, so we can stop as soon as we pass where the value would be
    index = 0
    while (index < len(input_array)) and (input_array[index] < search_value):
        index += 1
    if index >= len(input_array) or input_array[index] != search_value:
        return -1
    return index

print(linear_search(test_list, test_val1))
print(linear_search(test_list, test_val2))
Binary Search finds the middle element of the array and checks whether that middle value is greater or lower than the search value. If the search value is smaller, it takes the left side of the array and finds the middle element of that part; if it is greater, it takes the right part of the array. It repeats the operation until it finds the searched value, or finishes the search if the value is not in the array.
Efficiency: O(log n)
Example Python Code:
test_list = [1, 3, 9, 11, 15, 19, 29]
test_val1 = 25
test_val2 = 15
def binary_search(input_array, value):
    low = 0
    high = len(input_array) - 1
    while low <= high:
        mid = (low + high) // 2   # integer division so mid stays a valid index
        if input_array[mid] == value:
            return mid
        elif input_array[mid] < value:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(binary_search(test_list, test_val1))
print(binary_search(test_list, test_val2))
Also you can see visualized information about Linear and Binary Search here: https://www.cs.usfca.edu/~galles/visualization/Search.html
For a clear understanding, please take a look at my codepen implementations https://codepen.io/serdarsenay/pen/XELWqN
The biggest difference is the need to sort your sample before applying binary search; therefore, for most "normal sized" (meaning to be argued) samples, it will be quicker to search with a linear search algorithm.
Here is the JavaScript code; for the HTML and CSS and a full running example, please refer to the above codepen link.
var unsortedhaystack = [];
var haystack = [];

function init() {
  unsortedhaystack = document.getElementById("haystack").value.split(' ');
}

function sortHaystack() {
  var t = timer('sort benchmark');
  haystack = unsortedhaystack.sort();
  t.stop();
}

var timer = function(name) {
  var start = new Date();
  return {
    stop: function() {
      var end = new Date();
      var time = end.getTime() - start.getTime();
      console.log('Timer:', name, 'finished in', time, 'ms');
    }
  }
};

function lineerSearch() {
  init();
  var t = timer('lineerSearch benchmark');
  var input = this.event.target.value;
  for (var i = 0; i < unsortedhaystack.length; i++) {
    if (unsortedhaystack[i] === input) {
      document.getElementById('result').innerHTML = 'result is... "' + unsortedhaystack[i] + '", on index: ' + i + ' of the unsorted array. Found' + ' within ' + i + ' iterations';
      console.log(document.getElementById('result').innerHTML);
      t.stop();
      return unsortedhaystack[i];
    }
  }
}

function binarySearch() {
  init();
  sortHaystack();
  var t = timer('binarySearch benchmark');
  var firstIndex = 0;
  var lastIndex = haystack.length - 1;
  var input = this.event.target.value;
  // currently point in the half of the array
  var currentIndex = (haystack.length - 1) / 2 | 0;
  var iterations = 0;
  while (firstIndex <= lastIndex) {
    currentIndex = (firstIndex + lastIndex) / 2 | 0;
    iterations++;
    if (haystack[currentIndex] < input) {
      firstIndex = currentIndex + 1;
      //console.log(currentIndex + " added, fI:"+firstIndex+", lI: "+lastIndex);
    } else if (haystack[currentIndex] > input) {
      lastIndex = currentIndex - 1;
      //console.log(currentIndex + " substracted, fI:"+firstIndex+", lI: "+lastIndex);
    } else {
      document.getElementById('result').innerHTML = 'result is... "' + haystack[currentIndex] + '", on index: ' + currentIndex + ' of the sorted array. Found' + ' within ' + iterations + ' iterations';
      console.log(document.getElementById('result').innerHTML);
      t.stop();
      return true;
    }
  }
}
