binary search in array that contains range - algorithm

Let's say we have an ordered array that contains elements like this:
[1, 2-5, 6, 8-9, 11-13], where 2-5 is a range that represents 2, 3, 4 and 5. If we want to find "4", then index 1 (counting from 0) is the answer we need.
Is it possible to apply binary search to this type of element with constant space and O(log n) time?

You can just use binary search, the concept will also work with the ranges like a charm. Actually this is a concept commonly used to reduce time and space complexity, for example in gap encoding.
However you need to write it on your own instead of using any library as the library-method will probably not accept the ranges.
Let us briefly go through the execution of a binary search on your given input of [1, 2-5, 6, 8-9, 11-13] searching for the value 4 which is at index 1.
The array [1, 2-5, 6, 8-9, 11-13] has length 5, so we pick the middle index, which is 2. The element there is 6. We are searching for 4, so we continue the search to the left.
We have now reduced the search interval to [1, 2-5, 6], which has length 3, and we pick the middle index 1. The element there is 2-5. As 4 lies inside that range, we are finished and return index 1 as the result.
If, for example, the element there were 5-7, we would continue the search to the left, as 4 is smaller than every value in 5-7. Analogously, we would continue the search to the right if it were 1-3.
Here is an explanation of binary search with some pseudo-code: Binary search algorithm at Wikipedia
If you have problems implementing it, then just edit your question and show us what you have done so far; we will then adapt and help.
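For illustration, here is a minimal sketch of the search described above. It assumes each element is stored as a (low, high) tuple, with a single value such as 1 written as (1, 1), and that the ranges are sorted and do not overlap. The name find_in_ranges is made up for this example.

```python
def find_in_ranges(ranges, target):
    """Return the index of the range containing target, or -1 if none does.

    Assumes ranges is a sorted list of non-overlapping (low, high) tuples.
    """
    left, right = 0, len(ranges) - 1
    while left <= right:
        mid = (left + right) // 2
        low, high = ranges[mid]
        if target < low:
            right = mid - 1   # target can only be to the left
        elif target > high:
            left = mid + 1    # target can only be to the right
        else:
            return mid        # low <= target <= high
    return -1


ranges = [(1, 1), (2, 5), (6, 6), (8, 9), (11, 13)]
print(find_in_ranges(ranges, 4))  # index of the range 2-5
```

Only two indices are kept besides the input, so the extra space is constant, and the interval halves on every step.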

Related

Longest substring in which all bits are the same (DP algorithm)

You're given a bit string that consists of 1 <= n <= 32 bits.
You are also given a sequence of changes that invert some of the bits. If the original string is 001011, and the change is "3", then the changed bit string would be 000011. (The 3rd bit from the right was flipped)
I have to find after each change, the length of the longest substring in which each bit is the same. Therefore, for 000011, the answer would be 4.
I think brute force would just be a sliding window that starts at size of the string and shrinks until the first instance where all the bits in the window are the same.
How would that be altered for a dynamic programming solution?
You can solve this by maintaining a list of indices at which the bit flips. You start by creating that list: shift the sequence by one bit (losing the one on the end), and compare to the original. Any mismatch here is a bit flip. Make a list of those indices:
001011
01011
-234--
In this case, you have bit flips after locations 2, 3, and 4.
Now, you need to develop a simple function to process your change operation. For a change of bit n, you toggle the membership of indices n-1 and n in the list: if an index is not in the list, add it; if it is in the list, remove it. In the case of changing bit 3, both 2 and 3 are in the list, so you remove them:
---4--
Any time you want to check the longest substring, you merely need to check adjacent indices for the largest difference. Include 0 and the string length as endpoints. Thus, when you have the list [0, 2, 3, 4, 6], you have a maximum difference of 2 at 2-0 and 6-4. After the change, with the list [0, 4, 6], you have the maximum difference of 4 at 4-0.
If you have a large list with many indices, you can simply maintain differences, altering only the adjacent intervals affected by the single change.
This should get you going; I leave the implementation details to the student. :-)
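One possible sketch of the idea, for reference. It assumes positions are counted from 1 starting at the left, which matches the index list in the example above, and it keeps the flip locations in a Python set. The function names are invented for this illustration.

```python
def transitions(bits):
    """Indices i (1-based) such that bit i differs from bit i+1."""
    return {i for i in range(1, len(bits)) if bits[i - 1] != bits[i]}


def flip(trans, n, k):
    """Invert bit k of an n-bit string by toggling indices k-1 and k."""
    for idx in (k - 1, k):
        if 1 <= idx <= n - 1:                        # skip the string ends
            trans.symmetric_difference_update({idx})  # add if absent, else remove


def longest_run(trans, n):
    """Largest gap between adjacent boundaries, with 0 and n as endpoints."""
    bounds = [0] + sorted(trans) + [n]
    return max(b - a for a, b in zip(bounds, bounds[1:]))


t = transitions("001011")     # flips after locations 2, 3 and 4
before = longest_run(t, 6)
flip(t, 6, 3)                 # 001011 becomes 000011
after = longest_run(t, 6)
print(before, after)
```

Sorting on every query is fine for at most 32 bits; for long strings you would keep the gaps in a structure that supports local updates instead.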

Algorithm for finding change in progression of two sets of arrays

I'm looking for an algorithm that will search for a change in the progression of two sets of numbers with the same length. The set starts at the same number all the time. For example:
Assumptions:
1. Arrays 1 and 2 are the same length
2. Progressions are not available at the start, and need to be computed. But computing it will be expensive with resources.
Array 1 [1, 3, 5, 7, 10]
Progression: +2, +2, +2, +3
Array 2 [1, 2, 4, 6, 5]
Progression: +1, +2, +2, -1
Result: The arrays deviate on the first progression by -1 and on the last progression by -4.
Is there a way to do this without resorting to any sort of linear search?
Since you are only concerned with the first and last deviations you can search from the front of the list until you find the first deviation (if you don't find one then you obviously don't need to do the second step), and then search from the end of the list until you either find the last one, or reach the position where you found the first one (in which case there is only 1 deviation).
However, I don't believe you can do it without any sort of linear search, and in the worst case you would end up having to check every single deviation.
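As a rough sketch of the two-ended scan, assuming both arrays have the same length and that a "deviation" is the difference between the two step sizes at the same position. Steps are computed lazily, so nothing past the first and last deviation is evaluated; the function name is hypothetical.

```python
def first_and_last_deviation(a, b):
    """Return ((index, delta), (index, delta)) for the first and last steps
    where the progressions of a and b differ, or None if they never differ."""
    steps = len(a) - 1

    def delta(i):
        # step of b minus step of a at position i
        return (b[i + 1] - b[i]) - (a[i + 1] - a[i])

    first = next((i for i in range(steps) if delta(i) != 0), None)
    if first is None:
        return None
    # Scan backwards; stops at `first` at the latest, so a match always exists.
    last = next(i for i in range(steps - 1, first - 1, -1) if delta(i) != 0)
    return (first, delta(first)), (last, delta(last))


result = first_and_last_deviation([1, 3, 5, 7, 10], [1, 2, 4, 6, 5])
print(result)
```

In the worst case (a single deviation in the middle, or none at all) this still touches every step, as noted above.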

canceling arrays by number of items that I am ready to lose

We are writing a C# program that will help us remove some unnecessary data repeaters, and we have already found some repeaters to remove with the help of this: Finding overlapping data in arrays. Now we want to check whether we can cancel some repeaters by another criterion. The question is:
We have arrays of numbers
{1, 2, 3, 4, 5, 6, 7, ...}, {4, 5, 10, 100}, {100, 1, 20, 50}
Some numbers can be repeated in other arrays, and some numbers can be unique and belong only to a specific array. We want to remove some arrays, given that we are ready to lose up to N numbers from the arrays.
Explanation:
{1, 2}
{2, 3, 4, 5}
{2, 7}
We are ready to lose up to 3 numbers from these arrays. This means we can remove array 1, because we will lose only the number "1", its only unique number. We can also remove arrays 1 and 3, because we will lose the numbers "1" and "7", or just array 3, because we will lose only the number "7"; each is fewer than 3 numbers.
In our output we want the maximum number of arrays that can be removed under the condition that we lose fewer than N numbers, where N is the number of items we are ready to lose.
This problem is equivalent to the Set Cover problem (e.g.: take N=0) and thus efficient, exact solutions that work in general are unlikely. However, in practice, heuristics and approximations are often good enough. Given the similarity of your problem with Set Cover, the greedy heuristic is a natural starting point. Instead of stopping when you've covered all elements, stop when you've covered all but N elements.
You first need to get a number for each array that tells you how many numbers are unique to that particular array.
An easy way to do this is O(n²) since for each element, you need to check through all arrays if it's unique.
You can do this much more efficiently by having sorted arrays, sorting first or using a heap-like data structure.
After that, you only have to find a sum so that the numbers for a certain amount of arrays sum up to N. That's similar to the subset sum problem, but much less complex because N > 0 and all your numbers are > 0.
So you simply have to sort these numbers from smallest to greatest and then iterate over the sorted array and take the numbers as long as the sum < N.
Finally, you can remove every array that corresponds to a number which you were able to fit into N.
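A sketch of the counting approach from the last few paragraphs, written in Python for brevity rather than C#. Two assumptions to flag: it counts occurrences with a hash-based Counter instead of sorted arrays or a heap, and it only accounts for numbers unique to a single array, so a number shared exclusively among the removed arrays is not counted as lost. The function name is invented.

```python
from collections import Counter


def removable_arrays(arrays, budget):
    """Return indices of arrays to remove, taking the smallest unique counts
    first while the running total of lost numbers stays below budget."""
    # How many arrays contain each number (duplicates inside one array ignored).
    owners = Counter(x for arr in arrays for x in set(arr))
    unique_counts = [
        sum(1 for x in set(arr) if owners[x] == 1) for arr in arrays
    ]
    order = sorted(range(len(arrays)), key=lambda i: unique_counts[i])
    removed, lost = [], 0
    for i in order:
        if lost + unique_counts[i] >= budget:  # keep "sum < N", as stated above
            break
        lost += unique_counts[i]
        removed.append(i)
    return removed


arrays = [[1, 2], [2, 3, 4, 5], [2, 7]]
print(removable_arrays(arrays, 3))  # 0-based indices of removable arrays
```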

List value reduction algorithm

Forgive me, but I am very confused and I cannot find any sources that point me in the right direction.
Given list of n elements:
[3, 6, 5, 1]
Reduce the values to be no larger than the size of the list while keeping prioritization values relative to one another (In their original order).
Constraints:
Order must be maintained
Elements are >= 0
Distinct values
I am trying to stay away from sorting and creating a new list, but modifying the list in-place.
What my expected outcome should be:
[1, 3, 2, 0]
Is there an algorithm that exists for this problem?
You could do this in O(n^2).
Just go through the list n times, each time setting the minimum element that is >= i to i, where i starts at 0 and increments to n-1.
I suspect you're looking for something better than that, but I'm not sure how much better you can do in-place.
Example:
Input: 3 6 5 1
3 6 5 0*
1* 6 5 0
1 6 2* 0
1 3* 2 0
Note: this assumes elements are >= 0 and distinct
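A minimal sketch of that O(n^2) pass, modifying the list in place. It relies on the same assumptions as the note above (non-negative, distinct values); the function name is made up.

```python
def reduce_in_place(values):
    """Replace each value by its rank (0 .. n-1), keeping positions fixed."""
    n = len(values)
    for i in range(n):
        min_idx = None
        for j, v in enumerate(values):
            # Already-assigned ranks are all < i, so `v >= i` skips them.
            if v >= i and (min_idx is None or v < values[min_idx]):
                min_idx = j
        values[min_idx] = i


data = [3, 6, 5, 1]
reduce_in_place(data)
print(data)
```

The check `v >= i` works because, with distinct non-negative values, every element that has not yet been ranked in pass i is at least i.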
There may be one, but you don't need it if you think about the steps needed to take to solve this problem.
First, you know that each value in the array cannot be greater than 4, since that is the size in this particular example.
You need to go through each number in the array and, with an if condition, check whether the number is greater; if it is, then you'll need to decrement it until it meets the correct condition (in this case, that it is less than 4).
Perform these steps for each index of the array. As far as order, don't swap any indices, since you must retain the original order. Hope that helps!

Subset calculation of list of integers

I'm currently implementing an algorithm where one particular step requires me to calculate subsets in the following way.
Imagine I have sets (possibly millions of them) of integers, where each set could potentially contain around 1000 elements:
Set1: [1, 3, 7]
Set2: [1, 5, 8, 10]
Set3: [1, 3, 11, 14, 15]
...,
Set1000000: [1, 7, 10, 19]
Imagine a particular input set:
InputSet: [1, 7]
I now want to quickly determine which sets this InputSet is a subset of. In this particular case, it should return Set1 and Set1000000.
Now, brute-forcing it takes too much time. I could also parallelise via Map/Reduce, but I'm looking for a more intelligent solution. Also, to a certain extent, it should be memory-efficient. I already optimised the calculation by making use of BloomFilters to quickly eliminate sets to which the input set could never be a subset.
Any smart technique I'm missing out on?
Thanks!
Well - it seems that the bottleneck is the number of sets, so instead of finding a set by iterating over all of them, you could enhance performance by mapping from elements to all sets containing them, and return the sets containing all the elements you searched for.
This is very similar to what is done in an AND query when searching the inverted index in the field of information retrieval.
In your example, you will have:
1 -> [set1, set2, set3, ..., set1000000]
3 -> [set1, set3]
5 -> [set2]
7 -> [set1, set1000000]
8 -> [set2]
...
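A small sketch of such an element-to-sets mapping and the AND query over it. The dictionary layout and the function names are assumptions made for this example; an empty query is treated as matching nothing.

```python
from collections import defaultdict


def build_index(sets):
    """Map each element to the names of all sets that contain it."""
    index = defaultdict(set)
    for name, elems in sets.items():
        for e in elems:
            index[e].add(name)
    return index


def supersets_of(index, query):
    """Names of the sets that contain every element of query (AND query)."""
    postings = [index.get(e, set()) for e in query]
    if not postings:
        return set()
    postings.sort(key=len)      # start with the shortest list
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result


sets = {
    "Set1": [1, 3, 7],
    "Set2": [1, 5, 8, 10],
    "Set3": [1, 3, 11, 14, 15],
    "Set1000000": [1, 7, 10, 19],
}
index = build_index(sets)
print(supersets_of(index, [1, 7]))
```

Intersecting from the shortest list first keeps the intermediate result small, which matters when one element (like 1 here) occurs in almost every set.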
EDIT:
In an inverted index in IR, to save space we sometimes use d-gaps, meaning we store the offset between documents and not the actual number. For example, [2,5,10] will become [2,3,5]. Doing so and using delta encoding to represent the numbers tends to help a lot when it comes to space.
(Of course there is also a downside: you need to read the entire list in order to find out whether a specific set/document is in it, and you cannot use binary search, but it is sometimes worth it, especially if it is the difference between fitting the index into RAM or not.)
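To make the d-gap idea concrete, here is a tiny sketch that converts a sorted list of ids to gaps and back. The function names are invented, and the variable-length encoding of the gaps themselves is left out.

```python
from itertools import accumulate


def to_gaps(ids):
    """Store each id as the difference to the previous one (first id as is)."""
    return [cur - prev for prev, cur in zip([0] + ids[:-1], ids)]


def from_gaps(gaps):
    """Rebuild the original ids with a running sum."""
    return list(accumulate(gaps))


gaps = to_gaps([2, 5, 10])
print(gaps, from_gaps(gaps))
```

Decoding needs the running sum from the start of the list, which is why binary search on the encoded form is not possible.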
How about storing a list of the sets which contain each number?
1 -- 1, 2, 3, 1000000
3 -- 1, 3
5 -- 2
etc.
Extending amit's solution, instead of storing the actual numbers, you could just store intervals and their associated sets.
For example, using an interval size of 5:
(1-5): [1,2,3,1000000]
(6-10): [1,2,1000000]
(11-15): [3]
(16-20): [1000000]
In the case of (1,7) you should consider intervals (1-5) and (6-10) (which can be determined simply by knowing the size of the interval). Intersecting the set lists of those intervals gives you [1,2,1000000]. A binary search of those sets then shows that (1,7) is contained in sets 1 and 1000000, while set 2 is ruled out because it does not contain 7.
Though you'll want to check the min and max values for each set to get a better idea of what the interval size should be. For example, 5 is probably a bad choice if the min and max values go from 1 to a million.
You should probably keep it so that a binary search can be used to check for values, so the subset range should be something like (min + max)/N, where 2N is the max number of values that will need to be binary searched in each set. For example, "does set 3 contain any values from 5 to 10?" is answered by finding the values closest to 5 (3) and 10 (11); in this case, no, it does not. You would have to go through each set and do binary searches for the interval values that could be within the set. This means ensuring that you don't go searching for 100 when the set only goes up to 10.
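The "does this set contain any value from lo to hi?" check can be sketched with the standard bisect module, assuming each set is kept as a sorted list. The helper name is hypothetical.

```python
from bisect import bisect_left


def has_value_in_range(sorted_values, lo, hi):
    """True if some element x of sorted_values satisfies lo <= x <= hi."""
    i = bisect_left(sorted_values, lo)   # first element that is >= lo
    return i < len(sorted_values) and sorted_values[i] <= hi


set3 = [1, 3, 11, 14, 15]
print(has_value_in_range(set3, 5, 10))
```

One binary search is enough here: if the first element at or above lo is already past hi, nothing in the set falls inside the interval.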
You could also just store the range (min and max). However, the issue is that I suspect your numbers are going to be clustered, thus not providing much use. Although, as mentioned, it'll probably be useful for determining how to set up the intervals.
It'll still be troublesome to pick what range to use: too large, and it'll take a long time to build the data structure (1000 * million * log(N)); too small, and you'll start to run into space issues. The ideal size of the range is probably such that the number of sets related to each range is approximately equal, while also ensuring that the total number of ranges isn't too high.
Edit:
One benefit is that you don't actually need to store all intervals, just the ones you need. Although, if you have too many unused intervals, it might be wise to increase the interval size and split the current intervals to ensure that the search is fast. This is especially true if processing time isn't a major issue.
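A rough sketch of the interval variant, assuming fixed-size buckets (1-5, 6-10, ...) and storing only the buckets that are actually used. The candidates found through the buckets are verified with a plain membership test here instead of a binary search; all names are made up for the example.

```python
from collections import defaultdict


def bucket_of(value, size):
    """Bucket number for a value: 1..size -> 0, size+1..2*size -> 1, ..."""
    return (value - 1) // size


def build_bucket_index(sets, size):
    """Map each used bucket to the names of the sets with a value in it."""
    index = defaultdict(set)
    for name, elems in sets.items():
        for e in elems:
            index[bucket_of(e, size)].add(name)
    return index


def find_supersets(sets, index, query, size):
    """Intersect the buckets hit by query, then verify each candidate."""
    candidates = None
    for q in query:
        names = index.get(bucket_of(q, size), set())
        candidates = set(names) if candidates is None else candidates & names
    if not candidates:
        return set()
    wanted = set(query)
    return {name for name in candidates if wanted <= set(sets[name])}


sets = {
    1: [1, 3, 7],
    2: [1, 5, 8, 10],
    3: [1, 3, 11, 14, 15],
    1000000: [1, 7, 10, 19],
}
index = build_bucket_index(sets, 5)
print(find_supersets(sets, index, [1, 7], 5))
```

The verification step is needed because sharing a bucket only says a set has some value in that interval, not the exact value searched for.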
Start by searching for the biggest number (7) of the input set and
eliminate the other sets (Set1 and Set1000000 will be returned).
Then search for the other input elements (1) in the remaining sets.

Resources