In an interview today I was asked the question below. I gave an O(n log n) solution, but the interviewer wanted O(n) and I could not come up with one. Can you help?
An input array is given, e.g. [1,2,4]. Every element of it is doubled and appended to the array, so it becomes [1,2,4,2,4,8]. This array is then randomly shuffled; one possible arrangement is [4,8,2,1,2,4]. Given this shuffled array, recover the original array [1,2,4] in O(n) time.
The original array can be returned in any order. How can I do it?
Here's an O(N) Java solution. It could be improved by first making sure that the array is of the proper form; for example, it shouldn't accept [0] as input:
import java.util.*;
class Solution {
    public static int[] findOriginalArray(int[] changed) {
        if (changed.length % 2 != 0)
            return new int[] {};
        // set the Map size to an optimal value to avoid rehashes
        Map<Integer, Integer> count = new HashMap<>(changed.length * 100 / 75);
        int[] original = new int[changed.length / 2];
        int pos = 0;
        // count the frequency of each number
        for (int n : changed) {
            count.put(n, count.getOrDefault(n, 0) + 1);
        }
        // now decide which numbers go into the answer
        for (int n : changed) {
            // walk down the chain n, n/2, n/4, ... to its smallest live member
            int smallest = n;
            for (int m = n; m > 0 && count.getOrDefault(m, 0) > 0; m = m / 2) {
                // System.out.println("finding smallest " + m);
                smallest = m;
                if (m % 2 != 0) break; // an odd number cannot be halved further
            }
            // trickle up from smallest to largest while count > 0
            for (int m = smallest, mm = 2 * m; count.getOrDefault(mm, 0) > 0; m = mm, mm = 2 * mm) {
                int ct = count.getOrDefault(mm, 0);
                while (count.get(m) > 0 && ct > 0) {
                    // System.out.println("adding " + m);
                    original[pos++] = m;
                    count.put(mm, ct - 1);
                    count.put(m, count.get(m) - 1);
                    ct = count.getOrDefault(mm, 0);
                }
            }
        }
        // check for incorrect format: every count must have been consumed
        if (count.values().stream().anyMatch(x -> x > 0)) {
            return new int[] {};
        }
        return original;
    }
    public static void main(String[] args) {
        int[] changed = {1, 2, 4, 2, 4, 8};
        System.out.println(Arrays.toString(changed));
        System.out.println(Arrays.toString(findOriginalArray(changed)));
    }
}
But I've tried to keep it simple.
The output is NOT guaranteed to be sorted. If you want it sorted, it's going to cost O(N log N) inevitably, unless you use a radix sort or something similar (which would make it O(N log E), where E is the max value being sorted and log E the number of bits needed).
Runtime
This may not look like O(N), but it is: for every element it finds the lowest number in the chain ONCE, then trickles up the chain ONCE, consuming counts as it goes. Said another way, each pass does O(X) work to process X elements, leaving O(N-X) elements for the remaining passes. Therefore, even though there are fors inside fors, it is still O(N).
An example execution can be seen with [64,32,16,8,4,2].
If this were not O(N), then printing each value traversed while finding the smallest would show values appearing over and over again (for example N*(N+1)/2 times).
But instead you see them only once:
finding smallest 64
finding smallest 32
finding smallest 16
finding smallest 8
finding smallest 4
finding smallest 2
adding 2
adding 8
adding 32
If you're familiar with the Heapify algorithm you'll recognize the approach here.
from typing import List
class Solution:
    def findOriginalArray(self, changed: List[int]) -> List[int]:
        size = len(changed)
        ans = []
        left_elements = size // 2
        # if the size is odd, no solution is possible
        if size % 2 != 0:
            return ans
        # frequency dictionary: given the array [0,0,2,1] the map will be {0: 2, 2: 1, 1: 1}
        d = {}
        for i in changed:
            if i in d:
                d[i] += 1
            else:
                d[i] = 1
        # check the edge case of 0 (0 is its own double)
        if 0 in d:
            count = d[0]
            half = count // 2
            if (count % 2 != 0) or (half > left_elements):
                return ans
            left_elements -= half
            ans = [0 for _ in range(half)]
        # check the rest of the cases, considering the values will be at most 10**5
        for i in range(1, 50001):
            if i in d and d[i] > 0:
                count = d[i]
                if count > left_elements:
                    ans = []
                    break
                left_elements -= count
                for _ in range(count):
                    ans.append(i)
                if 2 * i in d:
                    if d[2 * i] < count:
                        ans = []
                        break
                    else:
                        d[2 * i] -= count
                else:
                    ans = []
                    break
        return ans
I have a simple idea which might not be the best, but I could not think of a case where it would not work. Given the array A with the doubled elements randomly shuffled, keep a helper map. Process each element of the array and, each time you find a new element, add it to the map with the value 0. When an element i is processed, increment map[i] and decrement map[2*i]. At the end, iterate over the map and output the elements that have a value greater than zero.
A simple example, say that the vector is:
[1, 2, 3]
And the doubled/shuffled version is:
A = [3, 2, 1, 4, 2, 6]
When processing 3, first add the keys 3 and 6 to the map with value zero. Increment map[3] and decrement map[6]. This way, map[3] = 1 and map[6] = -1. Then for the next element map[2] = 1 and map[4] = -1 and so forth. The final state of the map in this example would be map[1] = 1, map[2] = 1, map[3] = 1, map[4] = -1, map[6] = 0, map[8] = -1, map[12] = -1.
Then you just process the keys of the map and, for each key with a value greater than zero, add it to the output. There are certainly more efficient solutions, but this one is O(n).
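A direct Python transcription of this bookkeeping (the function name is mine, and I emit each key as many times as its final count, so repeated values keep their multiplicity):
from collections import defaultdict
def recover(doubled):
    m = defaultdict(int)
    for x in doubled:
        m[x] += 1      # x may be an original element...
        m[2 * x] -= 1  # ...in which case one copy of its double is spoken for
    out = []
    for k in sorted(m):
        if m[k] > 0:
            out.extend([k] * m[k])
    return out
print(recover([3, 2, 1, 4, 2, 6]))  # [1, 2, 3]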
In C++, you can try this.
The time is O(N + K log K), where N is the length of the input and K is the number of unique elements in the input.
class Solution {
public:
    vector<int> findOriginalArray(vector<int>& input) {
        if (input.size() % 2) return {};
        unordered_map<int, int> m;
        for (int n : input) m[n]++;
        vector<int> nums;
        for (auto [n, cnt] : m) nums.push_back(n);
        sort(begin(nums), end(nums));
        vector<int> out;
        for (int n : nums) {
            if (m[2 * n] < m[n]) return {};
            for (int i = 0; i < m[n]; ++i, --m[2 * n]) out.push_back(n);
        }
        return out;
    }
};
Not so clear about the space complexity required in the question, so this is my top-of-the-mind attempt, assuming O(n) time complexity is required.
If the length of the input array is not even, then it's wrong!!
Create a map and add the elements of the input array to it.
Divide each element in the input array by 2 and check if that value exists in the map. If it exists, add it to the array (slice) orig.
There is a chance we have added duplicate values to this original array; clean it!!
Here is a sample Go code:
https://go.dev/play/p/w4mm-rloHyi
I am sure we can optimize this code in a lot of ways for space complexity. But it's O(n) time complexity.
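Sketching those steps in Python (this is my rough transcription of the list above, not the Go code behind the link):
def recover_original(changed):
    if len(changed) % 2 != 0:
        return []                       # odd length: it's wrong
    present = set(changed)              # the map of input elements
    orig = []
    for x in changed:
        if x % 2 == 0 and x // 2 in present:
            orig.append(x // 2)         # x / 2 exists, so x may be its double
    return sorted(set(orig))            # clean the duplicates
print(recover_original([1, 2, 4, 2, 4, 8]))  # [1, 2, 4]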
This is a problem about substrings that I created. I am wondering how to implement an O(n log(n)) solution, because the naive approach is pretty easy. Here is how it goes. You have a string S. S has many substrings. In some substrings, the first character and the last character each occur more than once. Count the substrings in which the first and the last character each occur more than once.
Input: "ABCDCBE"
Expected output: 2
Explanation: "BCDCB" and "CDC" are two such substrings
That test case explanation only lists "BCDCB" and "CDC", where the first and last characters happen to be the same.
The first and last characters do not have to be equal, though: in a substring like "ABABCAC", the first character "A" appears 3 times and the last character "C" appears twice, so it qualifies. "AAAABB" is another qualifying substring.
"AAAAB" does not satisfy the condition.
One O(n log(n)) technique I have learned, which might or might not contribute to a solution, is the Binary Indexed Tree; it can somehow be used to solve this. There is also sorting and binary search, but first I want to focus especially on Binary Indexed Trees.
I am looking for a space complexity of O(n log(n)) or better.
Also, the characters are UTF-16.
The gist of my solution is as follows:
Iterate over the input array, and, for each position, compute the amount of 'valid' substrings that end on that position. The sum of these values is the total amount of valid substrings. We achieve this by counting the amount of valid starts to a substring, that come before the current position, using a Binary Indexed Tree.
Now for the full detail:
As we iterate over the array we think of the current element as the end of a substring, and we say that a position is a valid start if its value appears again between that position and the one we are currently iterating over (i.e. the value at the start of the substring appears at least twice in it).
For example:
current index              V
data  = [1, 2, 3, 4, 1, 4, 3, 2]
valid = [1, 0, 1, 1, 0, 0, 0, 0]
         0  1  2  3  4  5  6  7
The first 1 (at index 0) is a valid start, because there is another 1 (at index 4) after it, but before the current index (index 6).
Now, counting the amount of valid starts that come before the current index gives us something pretty close to what we wanted, except that we may grab some substrings that don't have two appearances of the last value of the substring (i.e. the one we are currently iterating over)
For example:
current index              V
data  = [1, 2, 3, 4, 1, 4, 3, 2]
valid = [1, 0, 1, 1, 0, 0, 0, 0]
         0  1  2  3  4  5  6  7
                  ^--------^
Here, the 4 is marked as a valid start (because there is another 4 that comes after it), but the corresponding substring does not have two 3s.
To fix this, we shall only consider valid starts up to the previous appearance of the current value. (this means that the substring will contain both the current value, and its previous appearance, thus, the last element will be in the substring at least twice)
The pseudocode goes as follows:
fn solve(arr) {
    answer := 0
    for i from 1 to length(arr) {
        previous_index := find_previous(arr, i)
        if there is a previous_index {
            arr[previous_index].is_valid_start = true
            answer += count_valid_starts_up_to_and_including(arr, previous_index)
        }
    }
    return answer
}
To implement these operations efficiently, we use a hash table for looking up the previous position of a value, and a Binary Indexed Tree (BIT) to keep track of and count the valid positions.
Thus, a more fleshed out pseudocode would look like
fn solve(arr) {
    n := length(arr)
    prev := hash_table{}
    bit := bit_indexed_tree{length = n}
    answer := 0
    for i from 1 to length(arr) {
        value := arr[i]
        previous_index := prev[value]
        if there is a previous_index {
            bit.update(previous_index, 1)
            answer += bit.query(previous_index)
        }
        prev[value] = i
    }
    return answer
}
Finally, since a pseudocode is not always enough, here is an implementation in C++, where the control flow is a bit munged, to ensure efficient usage of std::unordered_map (C++'s built-in hash table)
class Bit {
    std::vector<int> m_data;
public:
    // initialize BIT of size `n` with all 0s
    Bit(int n);
    // add `value` to index `i`
    void update(int i, int value);
    // sum from index 0 to index `i` (inclusive)
    int query(int i);
};
long long solve(std::vector<int> const& arr) {
    int const n = arr.size();
    std::unordered_map<int, int> prev_index;
    Bit bit(n);
    long long answer = 0;
    int i = 0;
    for (int value : arr) {
        auto insert_result = prev_index.insert({value, i});
        if (!insert_result.second) { // there is a previous index
            int j = insert_result.first->second;
            bit.update(j, 1);
            answer += bit.query(j);
            insert_result.first->second = i;
        }
        ++i;
    }
    return answer;
}
EDIT: For transparency, here is the Fenwick tree implementation I used to test this code
struct Bit {
    std::vector<int> m_data;
    Bit(int n) : m_data(n + 2, 0) { }
    int query(int i) {
        int res = 0;
        for (++i; i > 0; i -= i & -i) res += m_data[i];
        return res;
    }
    void update(int i, int x) {
        for (++i; i < (int)m_data.size(); i += i & -i) m_data[i] += x;
    }
};
I only heard of this question, so I don't know the exact limits. You are given a list of positive integers. Each two consecutive values form a closed interval. Find the number that appears in the most intervals; if two values appear in the same number of intervals, select the smallest one.
Example: [4, 1, 6, 5] results in the intervals [1, 4], [1, 6], [5, 6], with each of 1, 2, 3, 4, 5, 6 showing up twice. The correct answer would be 1, since it's the smallest.
I unfortunately have no idea how this can be done without going for an O(n^2) approach. The only optimisation I could think of was merging consecutive descending or ascending intervals, but this doesn't really work, since [4, 3, 2] would count 3 twice.
Edit: Someone commented (but then deleted) a solution with this link http://www.zrzahid.com/maximum-number-of-overlapping-intervals/. I find this one the most elegant, even though it doesn't take into account the fact that some elements in my input can be both the beginning of one interval and the end of another.
Sort the intervals by their starting value, then run a sweep line from the global minimum value to the global maximum. At each event point (the start or end of an interval), update the count of intervals intersecting the sweep line (O(log n) per event if the events are kept sorted). The time complexity of this algorithm is O(n log n), where n is the number of intervals.
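A minimal Python sketch of that sweep (assuming the consecutive pairs have already been turned into (lo, hi) intervals with lo <= hi):
def most_covered(intervals):
    events = []
    for lo, hi in intervals:
        events.append((lo, +1))       # interval opens at lo
        events.append((hi + 1, -1))   # and closes just after hi
    events.sort()
    best_val, best_cnt, cnt = None, 0, 0
    for pos, delta in events:
        cnt += delta
        if cnt > best_cnt:            # strict '>' keeps the smallest position
            best_cnt, best_val = cnt, pos
    return best_val
print(most_covered([(1, 4), (1, 6), (5, 6)]))  # 1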
The major observation is that the result will be one of the numbers in the input (proof left to the reader as a simple exercise, yada yada).
My solution is inspired by #Prune's solution. The important step is mapping the input numbers to their order within all the distinct numbers in the input.
I will work with the C++ std library. We can first load all the numbers into a set, and from it create a map from each number to its order among all the distinct numbers.
int solve(const vector<int>& input) {
    set<int> vals;
    for (int n : input) {
        vals.insert(n);
    }
    map<int, int> numberOrder;
    int order = 0;
    for (int n : vals) { // values in a set are ordered
        numberOrder[n] = order++;
    }
We then create the process array (similar to #Prune's solution).
    vector<int> process(numberOrder.size() + 1, 0); // adding past-the-end element
    for (size_t i = 1; i < input.size(); ++i) {
        int last = input[i - 1];
        int curr = input[i];
        process[numberOrder[min(last, curr)]]++;
        process[numberOrder[max(last, curr)] + 1]--;
    }
    int appear = 0;
    int maxAppear = 0;
    int maxOrder = 0;
    for (size_t i = 0; i < process.size(); ++i) {
        appear += process[i];
        if (appear > maxAppear) {
            maxAppear = appear;
            maxOrder = (int)i;
        }
    }
Last, we need to look up the value with that order in the map.
    for (pair<int, int> a : numberOrder) {
        if (a.second == maxOrder) {
            return a.first;
        }
    }
    return -1; // not reached for non-empty input
}
This solution has O(n log(n)) time complexity and O(n) space complexity, which is independent of the maximum input value (unlike some other solutions).
If the maximum number in the range array is less than the maximum size limit of an array, my solution works with complexity O(n).
1- I created a new array to process the ranges and used it to find the numbers that appear in the most intervals. For simplicity, let's use your example: the input is [1, 4], [1, 6], [5, 6]. Let's call the new array process, give it length 8 (one past the largest value), and initialize it with 0s: process = [0,0,0,0,0,0,0,0].
2- Then loop through all the intervals and mark each start with (+1) and the cell immediately after each range end with (-1):
after range [1,4]: process = [0,1,0,0,0,-1,0,0]
after range [1,6]: process = [0,2,0,0,0,-1,0,-1]
after range [5,6]: process = [0,2,0,0,0,0,0,-2]
3- The process array now works as an accumulative array. Initialize a variable, let's call it appear = process[0]. Go through process and keep accumulating. What can you notice? Elements 1, 2, 3, 4, 5, 6 each reach appear = 2, because each of them appears twice in the given ranges.
4- Maximize while you loop through the process array and you will find the solution.
public class Test {
    public static void main(String[] args) {
        int[] arr = new int[] { 4, 1, 6, 5 };
        System.out.println(solve(arr));
    }
    public static int solve(int[] range) {
        // I assume that the values are smaller than 1e8
        int size = (int) 1e8;
        int[] process = new int[size];
        // fill the process array
        for (int i = 0; i < range.length - 1; ++i) {
            int start = Math.min(range[i], range[i + 1]);
            int end = Math.max(range[i], range[i + 1]);
            process[start]++;
            if (end + 1 < size)
                process[end + 1]--;
        }
        // find the number that appears in most intervals (smallest one)
        int appear = process[0];
        int max = appear;
        int solu = 0;
        for (int i = 1; i < size; ++i) {
            appear += process[i];
            if (appear > max) {
                solu = i;
                max = appear;
            }
        }
        return solu;
    }
}
Think of these as parentheses: ( to start an interval, ) to end one. Now check the bounds for each pair [a, b] and tally interval start/end markers for each position: the lower number gets an interval start at its own position; the larger number gets a close marker just to its right. For the given input:
Process [4, 1]
result: [0, 1, 0, 0, 0, -1]
Process [1, 6]
result: [0, 2, 0, 0, 0, -1, 0, -1]
Process [6, 5]
result: [0, 2, 0, 0, 0, 0, 0, -2]
Now, merely make a cumulative sum of this list; the position of the largest value is your desired answer.
result: [0, 2, 0, 0, 0, 0, 0, -2]
cumsum: [0, 2, 2, 2, 2, 2, 2, 0]
Note that the running sum must end at 0 and can never be negative. The largest value is 2, which first appears at position 1. Thus, 1 is the lowest integer that appears the maximum quantity (2) of times.
Now, that's one pass on the input and one pass on the range of numbers. Note that with a simple table of (value, delta) pairs, you can save storage. The processing table would look something like:
[(1, 2)
(4, -1)
(5, 1)
(6, -2)]
If you have input with intervals both starting and stopping at a number, then you need to handle the starts first. For instance, [4, 3, 2] (the intervals [3, 4] and [2, 3]) would look like
[(2, 1)
(3, 1)
(3, -1)
(4, -1)]
NOTE: maintaining a sorted insert list is O(n^2) time in the size of the input; sorting the list afterward is O(n log n). Either is O(n) space.
My first suggestion, indexing on the number itself, is O(n) time, but O(r) space, where r is the range of input values.
How can I find duplicates in an array when there is more than one duplicated element?
When the array has only one duplicated element (for example: 1, 2, 3, 4, 4, 4, 5, 6, 7), it is very easy:
int duplicate(int* a, int s)
{
    int x = a[0];
    for (int i = 1; i < s; ++i)
    {
        x = x ^ a[i];
    }
    for (int i = 1; i <= a[s - 1]; ++i) // XOR out each value 1..max; a[s-1] is the max, as a is sorted
    {
        x = x ^ i;
    }
    return x;
}
But if the input array contains more than one duplicated element (for example: 1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 7), the above won't work. How can we solve this problem in O(n) time?
If space is no concern or the maximal number is quite low, you can simply use a kind of bit-array and mark every number that has already occurred by setting the bit at the position of the number.
It's a kind of HashSet with a trivial (identity) hash function.
Test and set each cost O(1) time.
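For integer inputs with a known, small maximum, this can be as simple as the following sketch (bits packed into a bytearray):
def find_duplicates(arr, max_value):
    bits = bytearray((max_value >> 3) + 1)   # one bit per possible value
    dups = []
    for x in arr:
        byte, mask = x >> 3, 1 << (x & 7)
        if bits[byte] & mask:                # bit already set: seen before
            dups.append(x)
        else:
            bits[byte] |= mask               # mark x as occurred
    return dups
print(find_duplicates([1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 7], 7))  # [2, 2, 4, 4]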
Using a set is one of the possible generic solutions. Example in C++:
#include <unordered_set>
template <typename T>
void filter_duplicates(T* arr, int length) {
    std::unordered_set<T> set;
    for (int i = 0; i < length; ++i) {
        if (set.count(arr[i]) > 0) {
            // arr[i] was seen before: it's a duplicate; report or collect it here
        }
        set.insert(arr[i]);
    }
    // the set now contains all the items, de-duplicated
}
As unordered_set is implemented as a hash table, insertion and lookup have amortized constant complexity. As a set can only contain unique keys, this effectively de-duplicates the items. We could finally convert the set back to an array. We could also use a map to count the occurrences.
If the array elements are integers and the maximum possible value is known, and fairly low, then the set can be replaced by a simple array: either of booleans, or of integers if we want to count the number of occurrences.
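For example, the counting variant is nearly a one-liner with a hash map (a Python sketch of the same idea):
from collections import Counter
def duplicates(arr):
    counts = Counter(arr)    # value -> number of occurrences
    return [x for x, c in counts.items() if c > 1]
print(duplicates([1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 7]))  # [2, 4]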
How can we find the longest increasing subsequence starting at each position of an array in O(n log n) time? I have seen techniques to find the longest increasing subsequence ending at each position, but I am unable to find the other way round.
e.g.
for the sequence " 3 2 4 4 3 2 3 "
the output must be " 2 2 1 1 1 2 1 "
I made a quick and dirty JavaScript implementation (note: it is O(n^2)):
function lis(a) {
    var tmpArr = Array(),
        result = Array(),
        i = a.length;
    while (i--) {
        var theValue = a[i],
            longestFound = tmpArr[theValue] || 1;
        for (var j = theValue + 1; j < tmpArr.length; j++) {
            if (tmpArr[j] >= longestFound) {
                longestFound = tmpArr[j] + 1;
            }
        }
        result[i] = tmpArr[theValue] = longestFound;
    }
    return result;
}
jsFiddle: http://jsfiddle.net/Bwj9s/1/
We run through the array right-to-left, keeping previous calculations in a separate temporary array for subsequent lookups.
The tmpArr array contains the previously found subsequences beginning with any given value, so tmpArr[n] will represent the longest subsequence found (to the right of the current position) beginning with the value n.
The loop goes like this: for every index, we look up the value (and all higher values) in tmpArr to see if we already found a subsequence which the value could be prepended to. If we find one, we simply add 1 to that length, update tmpArr for the value, and move to the next index. If we don't find a working (higher) subsequence, we set tmpArr for the value to 1 and move on.
In order to make it O(n log n), we observe that tmpArr will always be a decreasing array -- so it can and should use a binary search rather than a partial loop.
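That binary-search refinement is left as stated; for reference, here is a different standard O(n log n) route (my sketch, not the code above): scan right-to-left and keep a max-Fenwick tree over the compressed values, so each position asks for the longest chain starting with a strictly larger value.
def lis_from_each_position(a):
    # rank 0 = largest value, so a prefix-max over ranks < rank(v)
    # covers exactly the values strictly greater than v
    order = {v: r for r, v in enumerate(sorted(set(a), reverse=True))}
    n = len(order)
    tree = [0] * (n + 1)          # Fenwick tree storing prefix maxima
    def update(i, val):
        i += 1
        while i <= n:
            tree[i] = max(tree[i], val)
            i += i & -i
    def query(i):                 # max over ranks 0 .. i-1
        best = 0
        while i > 0:
            best = max(best, tree[i])
            i -= i & -i
        return best
    result = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        r = order[a[i]]
        result[i] = query(r) + 1  # extend the best chain of larger values
        update(r, result[i])
    return result
print(lis_from_each_position([3, 2, 4, 4, 3, 2, 3]))  # [2, 2, 1, 1, 1, 2, 1]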
EDIT: I didn't read the post completely, sorry. I thought you needed the longest increasing sub-sequence for all sequence. Re-edited the code to make it work.
I think it is possible to do it in linear time, actually. Consider this code:
int a[10] = {4, 2, 6, 10, 5, 3, 7, 5, 4, 10};
int maxLength[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; // array of zeros
int n = 10; // size of the array
int b = 0;
while (b != n) {
    int e = b;
    while (++e < n && a[e - 1] < a[e]) {} // while the run keeps increasing, ++e
    while (b != e) {
        maxLength[b] = e - b; // length of the increasing run starting at b
        ++b;
    }
}