Neat way of computing functions on key-value pairs


Suppose you have a list of key-value pairs. Neither the keys,
nor the values, nor the pairs are required to be unique.
The following example
a -> 1
b -> 2
c -> 3
a -> 3
b -> 1
would be valid.
Now suppose I want to associate to any key value pair (k->v) another value V,
which has the following properties:
it is the same for two pairs, if their keys are identical
it is uniquely determined by the set of key-value pairs in the entire list
This sounds abstract, but for example the sum, the maximum and counting function qualify as examples
Pair     SUM  MAX  COUNT
a -> 1   4    3    2
b -> 2   3    2    2
c -> 3   3    3    1
a -> 3   4    3    2
b -> 1   3    2    2
I am looking for fast methods/data structures to compute such functions on the entire list.
If the keys can be sorted, one can simply sort the list, then iterate through the sorted list, and compute the function V in each block with identical keys.
I am asking whether there are nice methods to do this, if the values are not comparable or one does
not want to change the order of the entries.
Some thoughts:
Of course, one could apply a hash function to the keys, in order to obtain sortable keys.
Of course, one could also store the original position of each pair, then do the sorting, then compute
the function, and finally undo the sorting.
So essentially the question is already answered. However, I am interested in whether
there are more elegant solutions, maybe using some adapted data structure.
EDIT: To clarify, in response to Sunny Agrawal's comment, what I mean by "associate": this is also part of the question of how to arrange the data structure nicely.
In my example, I would get another list/map with (k->v) as key and V as value. However, it might make sense to not arrange the data that way. I require, that V is stored in such a way that for given k it needs constant time to obtain V.

Maintain two data structures:
1. List< Pair< Key_Type, Value_Type > >
2. Map<Key_Type, Stats>
where Stats is a struct as follows:
struct Stats
{
    int Sum;
    int Count;
    int Max;
};
The first data structure contains all your (key, value) pairs in the order in which you want to store them.
The second maintains the statistics for each key, as shown in your example.
Insert will work as follows (pseudo C++ code):
void Insert(key, val)
{
    list.insert(Pair(key, val));   // keep the pair at its position in the list
    Stats curr;
    if (map.contains(key))
    {
        // key seen before: update its running statistics
        curr = map[key];
        curr.Max = Max(curr.Max, val);
        curr.Count++;
        curr.Sum += val;
    }
    else
    {
        // first occurrence of this key
        curr.Max = val;
        curr.Count = 1;
        curr.Sum = val;
    }
    map[key] = curr;
}
Complexity is O(1) for updating the list and O(log M) for updating the map,
where M is the number of unique keys.
If N is the total number of pairs in the list,
the total time for all inserts is O(N) + O(N log M) = O(N log M).
Note: this works if there are only inserts. With deletions, updating Max becomes difficult, because removing the current maximum requires knowing the next-largest value for that key.
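The list-plus-map idea above can be sketched in Python roughly as follows. This is only an illustration: the names Stats, PairStore, insert and annotated are made up for this sketch, and it uses a dict (a hash map, expected O(1) per update) where the answer assumes a tree map with O(log M) updates.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    total: int      # SUM of the values seen for a key
    count: int      # COUNT of pairs with that key
    maximum: int    # MAX of the values seen for a key


class PairStore:
    """Hypothetical container: pairs in insertion order plus per-key statistics."""

    def __init__(self):
        self.pairs = []   # (key, value) pairs, original order preserved
        self.stats = {}   # key -> Stats, looked up in expected O(1)

    def insert(self, key, value):
        self.pairs.append((key, value))
        s = self.stats.get(key)
        if s is None:
            # first occurrence of this key
            self.stats[key] = Stats(value, 1, value)
        else:
            s.total += value
            s.count += 1
            s.maximum = max(s.maximum, value)

    def annotated(self):
        """Return each pair together with the statistics of its key."""
        return [(k, v, self.stats[k]) for k, v in self.pairs]


store = PairStore()
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 3), ("b", 1)]:
    store.insert(k, v)
```

Since the statistics are stored per key rather than per pair, a lookup for a given k touches a single map entry, which matches the constant-time requirement in the question when a hash map is used.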

Related

Which data structure supports given operations efficiently

I need to think of a data structure, which supports the following operations efficiently:
1) Add an integer x
2) Delete an integer with maximum frequency (if there are more than one element with the same maximum frequency delete all of them).
I am thinking of implementing a segment tree where each node stores the index of its child having largest frequency.
Any ideas or suggestions on how to approach this problem or how should it be implemented would be kindly appreciated.

We can use a combination of data structures. A hash_map to maintain the frequency mappings, where the key is the integer, and value a pointer to a "frequency" node representing the frequency value and the set of integers having the same frequency. The frequency nodes will be maintained in a list ordered by the values of the frequencies.
The Frequency node can be defined as
class Freq {
    int frequency;
    Set<Integer> values_with_frequency;
    Freq prev;
    Freq next;
}
The elements HashMap would then contain entries of the form
Entry<Integer, Freq>
So, for a snapshot of the dataset such as
a,b,c,b,d,d,a,e,a,f,b where the letters denote integers, the following would be how the data structure would look like.
c --+
e --+--> (1, [c, e, f])
f --+

a --+--> (3, [a, b])
b --+

d -----> (2, [d])
The Freq nodes would be maintained in a linked list, say freq_nodes, sorted by the frequency value. Note that, as explained below, there wouldn't be any log(n) operation needed for keeping the list sorted on the add/delete operations.
The way the add(x), and delete_max_freq() operations could be implemented is as follows
add(x) :
If x is not found in the elements map, check if the first element of the freq_nodes contains the Freq object with frequency 1. If so, add x to the values_with_frequency set of the Freq object. Otherwise, create a new Freq object with 1 as the frequency value and x added to the (now only single element) wrapped set values_with_frequency
Otherwise (i.e. if x is already in the elements map), follow the pointer stored for x in elements to its Freq object in freq_nodes and note the current frequency of x, which is elements.get(x).frequency (call this value F). Remove x from the values_with_frequency set of that Freq object. If the next Freq node in the freq_nodes linked list has the frequency F+1, just add x to the values_with_frequency set of that next node. Otherwise create a new Freq node with frequency F+1 and insert it right after the current node, as was done above when no Freq node with frequency 1 existed. Finally, if the removal left the current node's values_with_frequency set empty, delete that node from the freq_nodes linked list.
Finally, add the entry (x, Freq) to the elements map.
Note that this whole add(x) operation is going to be O(1) in time.
Here's an example of a sequence of add() operations with the subsequent state of the data structure.
add(a)
a -> N1 : freq_nodes : |N1 (1, {a}) | ( N1 is actually a Freq object)
add(b)
a -> N1 : freq_nodes : |N1 (1, {a, b}) |
b -> N1
add(a)
At this point ‘a’ points to N1, however, its current frequency is 2, so we need to insert a node N2 next to N1 in the DLL, after removing it from N1’s values_with_frequency set {a,b}
a -> N2 : freq_nodes : |N1 (1, {b}) | --> |N2 (2, {a}) |
b -> N1
The interesting thing to note here is that any time we increase the frequency of an existing element from F to say F+1, we need to do the following
if (next node has a higher frequency than F+1 or we have reached the end of the list):
    create a new Freq node with frequency equal to F+1 (as is done above)
    and insert it next to the current node
else :
    add ‘a’ (the input to the add() operation) to the values_with_frequency set of the next node
The delete_max_freq() operation would just involve removing the last entry of the linked list freq_nodes, and iterating over the keys in the wrapped set values_with_frequency to remove the corresponding keys from the elements map. This operation would take O(k) time where k is the number of elements with maximum frequency.
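Here is a minimal Python sketch of the structure described above, in case it helps to see the pointer updates spelled out. The class names FreqNode and MaxFreqStore and the two sentinel nodes (a head with frequency 0 and a tail with infinite frequency) are choices made for this sketch, not part of the description; the sentinels only remove the special cases at the ends of the list.

```python
class FreqNode:
    """One node of the frequency list: a frequency and the values having it."""

    def __init__(self, frequency):
        self.frequency = frequency
        self.values = set()
        self.prev = None
        self.next = None


class MaxFreqStore:
    def __init__(self):
        self.elements = {}                   # value -> FreqNode holding it
        self.head = FreqNode(0)              # sentinel: "frequency 0"
        self.tail = FreqNode(float("inf"))   # sentinel: end of the list
        self.head.next = self.tail
        self.tail.prev = self.head

    def _insert_after(self, node, new_node):
        new_node.prev, new_node.next = node, node.next
        node.next.prev = new_node
        node.next = new_node

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def add(self, x):
        """Raise the frequency of x by one in O(1) expected time."""
        current = self.elements.get(x, self.head)
        target = current.frequency + 1
        if current.next.frequency != target:
            # no node for frequency F+1 yet: create it right after the current one
            self._insert_after(current, FreqNode(target))
        nxt = current.next
        nxt.values.add(x)
        self.elements[x] = nxt
        if current is not self.head:
            current.values.discard(x)
            if not current.values:
                self._unlink(current)        # drop the emptied node

    def delete_max_freq(self):
        """Remove and return all values with the maximum frequency, O(k)."""
        node = self.tail.prev
        if node is self.head:
            return set()                     # nothing stored
        self._unlink(node)
        for x in node.values:
            del self.elements[x]
        return node.values


store = MaxFreqStore()
for ch in "abcbddaeafb":                     # the snapshot used in the answer
    store.add(ch)
```

With the snapshot from the answer, successive calls to delete_max_freq() peel off the frequency-3 group, then the frequency-2 group, then the frequency-1 group.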

Assuming "efficient" refers to how the complexity of those operations scales, big-O style, I'd consider something consisting of:
a hashmap with the integers as keys and their frequencies as values
a tree structure (for example a balanced binary search tree) whose nodes each hold a frequency and a hashset of the numbers which have that frequency.
When a number is inserted:
1. Look up the number in the hashmap to find its frequency. (O(1))
2. Look up the frequency in the tree (O(log N)). Remove the number from its collection (O(1)). If the collection is empty, remove the frequency from the tree (O(log N)).
3. Increment the number's frequency. Set that value in the hashmap (O(1)).
4. Look up its new frequency in the tree (O(log N)). If it's there, add the number to the collection there (O(1)). If not, add a new node with the number in its collection (O(log N)).
When deleting items with the maximum frequency:
1. Remove the highest-valued node from the tree (O(log N)).
2. For each number in that node's collection, remove that number's entry from the hashmap (O(1) for each number removed).
If you have N numbers to add and remove, your worst-case scenario should be O(N log N) regardless of the actual distribution of frequencies or the order in which numbers are added and removed.
If you know of any assumptions you can make about the numbers being added, it's possible you could make further enhancements like using an indexed array rather than an ordered tree. But if your inputs are fairly unbounded, this seems like a pretty good structure to handle all the operations you want without getting into O(n²) territory.

My thoughts:
You will need 2 maps.
Map 1: Integer as key with frequency as value.
Map 2: Have a map of frequencies as keys and list of integers as values.
Add Integer: Add the integer to map 1. Get the frequency. Add it to the list of frequency key in map 2.
Delete Integer : We can obviously maintain maximum frequency in a variable across these operations. Now, remove the key from map2 which has this max frequency and decrement max frequency.
So, adding and deleting performance should be O(1) on average.
In the above scenario, map 1 will still contain integers that were removed from map 2 by a delete, so their stored frequency is stale. We handle this with an on-demand update in map 1 when the same integer gets added again: if the frequency stored in map 1 no longer appears in map 2 for this integer, it means the integer was deleted and we reset its frequency to 1.
Implementation:
import java.util.*;
class Foo{
    Map<Integer,Integer> map1;       // integer -> its current frequency
    Map<Integer,Set<Integer>> map2;  // frequency -> integers with that frequency
    int max_freq;
    Foo(){
        map1 = new HashMap<>();
        map2 = new HashMap<>();
        map2.put(0,new HashSet<>());
        max_freq = 0;
    }
    public void add(int x){
        map1.putIfAbsent(x,0);
        int curr_f = map1.get(x);
        // x is new, or its map1 entry is stale because delete() removed x from map2
        if(!map2.containsKey(curr_f) || !map2.get(curr_f).contains(x)){
            map1.put(x,1);
        }else{
            map1.merge(x,1,Integer::sum);
        }
        map2.putIfAbsent(map1.get(x),new HashSet<>());
        map2.get(map1.get(x)-1).remove(x); // remove from previous frequency list
        map2.get(map1.get(x)).add(x);// add to current frequency list
        max_freq = Math.max(max_freq,map1.get(x));
        printState();
    }
    public List<Integer> delete(){
        if(max_freq == 0) return new ArrayList<>(); // nothing stored, keep the frequency-0 bucket
        List<Integer> ls = new ArrayList<>(map2.get(max_freq));
        map2.remove(max_freq);
        max_freq--;
        while(max_freq > 0 && map2.get(max_freq).size() == 0) max_freq--;
        printState();
        return ls;
    }
    public void printState(){
        System.out.println(map1.toString());
        System.out.println("Maximum frequency: " + max_freq);
        for(Map.Entry<Integer,Set<Integer>> m : map2.entrySet()){
            System.out.println(m.getKey() + " " + m.getValue().toString());
        }
        System.out.println("----------------------------------------------------");
    }
}
Demo: https://ideone.com/tETHKV
Note: The cost of delete() is O(1) only in the amortized sense.

Grouping numbers in a list

I came across the following question,
You are given an array A of n elements. These elements are now added to a new list L which is initially empty , in a certain order based on the given q queries.
In each query you are given an integer i that corresponds to A[i] in the array A. This means that you have to add the element A[i] to the list L.
After each element is added to the list L, make groups among the elements in the list L. Two elements will be in same group if their indexes in the array A are consecutive.
For each group we define the group’s value as axb where a is the largest value in that group and b is the size of that group.
Print the maximum group value among all the groups that are formed after each element is added to the list L.
My approach was to use a map<int,vector<int>> where the key is the group number and the value is a vector containing the group's size and the group's maximum. I also had an array g, where g[i] indicates the group number of a[i], or -1 if it is not in any group. The code below is a part of my implementation, but I'm sure there are better ways to solve this question, as this solution of mine gave TLE and WA in some cases, and I can't seem to figure out the correct approach. Please suggest an optimal way to solve this.
int g[a.size()+2]; //+2 because queries start with index 1, and g[i] corresponds to a[i-1]
for(int i=0;i<a.size()+2;i++)
    g[i]=-1;
int gno=1;
map<int,vector<int> > m;
vector<int> ans;
int mx=0;
for(unsigned int i=0;i<queries.size();i++){
    int q = queries[i];
    if(g[q-1]==-1 && g[q+1]==-1){
        //create new group with current element as first element
        g[q] = gno; //gno is the group number.
        vector<int> v;
        v.push_back(1);
        v.push_back(a[q-1]);
        m[gno]=v;
        mx = max(mx,m[gno][0]*m[gno][1]);
        gno++;
    }
    else if(g[q-1]!=-1 && g[q+1]==-1){
        //join current element to left group
        g[q] = g[q-1];
        m[g[q]][0]++;
        m[g[q]][1] = max(m[g[q]][1],a[q-1]);
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    else if(g[q-1]==-1 && g[q+1]!=-1){
        //join current element to right group
        g[q] = g[q+1];
        m[g[q]][0]++;
        m[g[q]][1] = max(m[g[q]][1],a[q-1]);
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    else{
        //join both groups to left and right
        g[q]=g[q-1];
        int g1 = g[q];
        int i;
        m[g[q]][0] += 1 + m[g[q+1]][0];
        m[g[q]][1] = max(m[g[q]][1],max(a[q-1],m[g[q+1]][1]));
        for(i=q+1;g[i]==g[i+1];i++){
            g[i]=g1;
        }
        g[i]=g1;
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    ans.push_back(mx);
}

I would not actually build list L. It may be too costly in time to find what to do with a new value: is it a new group on itself, does it extend an existing group, do two groups need to merge into one? If the first values are all far apart, you'll have many groups, and you need to iterate them with each new incoming value: this is not efficient.
I would just collect all the values first and only then see how they fit in groups.
There are two ways to collect the values:
Store them in a list, and when all values have been collected, sort the list in ascending order
Flag the entry in an array of booleans of size n. This way you do not have to sort it, but afterwards you do need to iterate the whole array to find the values in ascending order.
Method 1 will be the best when q is a lot less than n. Method 2 will be better for greater q.
With both methods you'll be able to iterate over the found values in ascending order, and while doing so you can identify the groups, their value, and also keep track of the largest group-value. Only one sweep is needed to find the answer.

Let's start with two simplifying assumptions:
no duplicates. Once a given index i has been "queried", it will never be queried again.
no negative numbers. All elements are positive or zero, so the largest value in a group is always positive or zero, so expanding a group (or merging two groups) will never cause the overall "maximum group value" to decrease.
(Further below I'll show how to not require those assumptions, but for now this will simplify the picture.)
So, whenever we "query" an index i, there are four cases:
i-1 is currently the right-endpoint of a group (by which I mean its greatest index) and i+1 is currently the left-endpoint of another group.
In this case, we need to merge the two groups into a single group, with i bridging the gap between them.
i-1 is currently the right-endpoint of a group, but i+1 is not currently in any group.
In this case we need to extend the group to cover i.
i-1 is not currently in any group, but i+1 is currently the left-endpoint of a group.
In this case, as in the previous case, we need to extend the group to cover i.
Neither i-1 nor i+1 is in a group.
In this case, we have a new group with just one element.
In all cases, the key thing to note is that we're only interested in the endpoints of groups. So we don't need a general mapping from indices to their groups . . . which is good, because when we merge two groups, it would be expensive to then go and update every single index from one group to point to the other.
So we just need three mappings:
std::unordered_map<int, int> map_from_left_endpoint_to_right_endpoint;
std::unordered_map<int, int> map_from_right_endpoint_to_left_endpoint;
std::unordered_map<int, int> map_from_left_endpoint_to_largest_value;
To distinguish the four cases, we use e.g. map_from_right_endpoint_to_left_endpoint.find(i - 1) (which returns an iterator pointing to the left-endpoint of the group that i-1 is the right-endpoint of, if applicable; otherwise it returns map_from_right_endpoint_to_left_endpoint.end()). We then delete entries as they become no-longer-applicable (due to groups being extended or merged in a given direction), in addition to (obviously) inserting new entries, and updating the values of existing entries.
In addition to those values, we also need an
int maximum_group_value = 0;
and whenever we create a group, extend a group, or merge two groups, we check whether the value of the resulting group (meaning its largest_value * (right_endpoint - left_endpoint + 1)) is greater than maximum_group_value. If so, we update maximum_group_value and return it; if not, we return maximum_group_value as-is.
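As a rough illustration of the endpoint bookkeeping described so far, here is a Python sketch under the two simplifying assumptions above (no index is queried twice, no negative values). The function name max_group_values and the use of 0-based query indices are assumptions of this sketch; the three dicts correspond to the three mappings listed above.

```python
def max_group_values(a, queries):
    """Return the maximum group value after each query (0-based indices).

    Assumes no index is queried twice and all values are non-negative.
    """
    right_of = {}   # left endpoint  -> right endpoint
    left_of = {}    # right endpoint -> left endpoint
    largest = {}    # left endpoint  -> largest value in that group
    best = 0
    results = []
    for i in queries:
        left, right, top = i, i, a[i]
        if i - 1 in left_of:
            # a group ends at i-1: absorb it on the left
            left = left_of.pop(i - 1)
            del right_of[left]
            top = max(top, largest.pop(left))
        if i + 1 in right_of:
            # a group starts at i+1: absorb it on the right
            right = right_of.pop(i + 1)
            del left_of[right]
            top = max(top, largest.pop(i + 1))
        # record the new, extended, or merged group by its endpoints only
        right_of[left] = right
        left_of[right] = left
        largest[left] = top
        best = max(best, top * (right - left + 1))
        results.append(best)
    return results


answers = max_group_values([3, 1, 2, 5], [0, 2, 1, 3])
```

Each query does a constant number of hash-map operations, so q queries take O(q) expected time, and no per-index group labels ever need to be rewritten on a merge.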
Now, what if duplicates are allowed, such that a given index i might be "queried" after it already belongs to a group?
The simplest approach is to simply keep track of which i-s have already been queried; but a more elegant approach, if desired, might be to change map_from_left_endpoint_to_right_endpoint from a std::unordered_map to a std::map, and then use something like this:
bool is_already_in_a_group(
        std::map<int, int> const & map_from_left_endpoint_to_right_endpoint,
        int const index) {
    // get iterator to first element *after* index (or to 'end()' if no such):
    auto iter = map_from_left_endpoint_to_right_endpoint.upper_bound(index);
    // if that iterator points to 'begin()', then there are no elements
    // at or before index:
    if (iter == map_from_left_endpoint_to_right_endpoint.begin()) {
        return false;
    }
    // otherwise, move iterator to point to the last element whose key is
    // less than or equal to index:
    --iter;
    // . . . and check whether the value of that element is greater than
    // or equal to index (meaning that [key, value] spans index):
    return iter->second >= index;
}
to check if the greatest key in map_from_left_endpoint_to_right_endpoint that is less than or equal to i is mapped to a value that is greater than or equal to i.
This adds a fifth case to our case analysis above — "if i is already inside a group, just do nothing and return maximum_group_value" — but other than that, has no effect.
Note that this same approach also lets us eliminate map_from_right_endpoint_to_left_endpoint, if we want: the above function could easily be tweaked to int get_left_endpoint_for_right_endpoint by changing its return statement to return iter->second == index ? iter->first : -1;.
At this point it becomes sensible to define a Group class with three fields (left_endpoint, right_endpoint, and largest_value), and just keep a single map_from_left_endpoint_to_group.
Lastly — what if negative values are allowed, such that the "maximum group value" can actually decrease as the result of a query? (For example, if the array elements are [-1, -10] and the queries are i=0, i=1, then the results are maximum_group_value=-1, maximum_group_value=-2.) In such a case, we need to keep track of the values of all current groups, because any one of them might suddenly become the maximum.
For that, instead of storing a single int maximum_group_value, we can maintain a heap of groups, ordered by value, that we push into every time we create/extend/merge groups. (We can just use a std::vector<Group> for this, plus std::push_heap with an appropriate comparator, or with an appropriate definition for operator<(Group const &, Group const &).) After each query, we check if the top group on the heap (the first element in the vector) is still a group that actually exists; if so, we return its value, otherwise we pop it (using std::pop_heap) and repeat.
As an optimization, we can also store int maximum_group_value, and eliminate the heap once we've encountered a nonnegative array-element (since as soon as a given group contains a nonnegative array-element, its value can never decrease again, and obviously the maximum group value will be the value of one of those groups).

Hash function required for custom data structure containing 12 integers

I have a custom structure that holds 12 integer values, x1,y1,x2,y2,x3,y3,x4,y4,x5,y5,x6,y6.
The range of the numbers is between 1 and 5 inclusive and every structure is guaranteed to have different combinations i.e NO two structures can have all the values of x1,y1,x2,y2,x3,y3,x4,y4,x5,y5,x6,y6 same as the respective values of other.
I need a good hash function to perform O(1) operations.
The requirement is to find out a structure with specific x1,y1....x6,y6 values
Right now I am using the following:-
struct Hash_6
{
    size_t operator () ( const Node& n ) const
    {
        int result=17;
        result=31*result+n.x1;
        result=31*result+n.x2;
        result=31*result+n.x3;
        result=31*result+n.x4;
        result=31*result+n.x5;
        result=31*result+n.x6;
        result=31*result+n.y1;
        result=31*result+n.y2;
        result=31*result+n.y3;
        result=31*result+n.y4;
        result=31*result+n.y5;
        result=31*result+n.y6;
        return result;
    }
};
I want to know if there is any better more efficient hash function out there which I could use for this specific case.

If the values are always between one and five inclusive, then you can get a unique hash within a 32-bit value.
That's because five (the values) to the power of twelve (the number of variables) is 244,140,625, a value that can be represented in 28 bits.
Hence your hash function becomes (pseudo-code):
def hasher(s):
    res = s.x1 - 1
    for val in s.x2, s.x3, s.x4, s.x5, s.x6, s.y1, s.y2, s.y3, s.y4, s.y5, s.y6:
        res = res * 5 + val - 1
    return res
With your constraints, you get a unique value out of that hash function.
If you wanted to use that hash for bucket selection (such as used in a set or dictionary), you would probably want to reduce it with a modulus to a more suitable value (introducing collisions as part of the process).
But it's unclear whether you're needing a hash for identification (leave as is) or bucketing (reduce it). If the latter, and values are reasonably evenly distributed, that would be along the lines of:
bucket_to_use = hasher(item) modulo num_buckets
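To make the uniqueness claim concrete, here is a small runnable Python sketch of the same base-5 encoding, together with a decoder that inverts it. The names perfect_hash and decode are made up for this sketch, and the 12 values are passed as a plain list in the order x1..x6, y1..y6 used in the pseudo-code above.

```python
def perfect_hash(values):
    """Encode values (each in 1..5) as a single base-5 number."""
    code = 0
    for v in values:
        if not 1 <= v <= 5:
            raise ValueError("every value must be between 1 and 5")
        code = code * 5 + (v - 1)
    return code


def decode(code, count=12):
    """Invert perfect_hash; an invertible encoding cannot have collisions."""
    digits = []
    for _ in range(count):
        code, digit = divmod(code, 5)
        digits.append(digit + 1)
    return digits[::-1]


node = [1, 5, 2, 4, 3, 3, 5, 1, 2, 2, 4, 1]   # x1..x6 followed by y1..y6
code = perfect_hash(node)
```

Because every code can be decoded back to exactly one combination, two different structures can never share a code, and the largest possible code is 5^12 - 1, which is below 2^28.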

Is there an efficient data structure for row and column swapping?

I have a matrix of numbers and I'd like to be able to:
Swap rows
Swap columns
If I were to use an array of pointers to rows, then I can easily switch between rows in O(1) but swapping a column is O(N) where N is the amount of rows.
I have a distinct feeling there isn't a win-win data structure that gives O(1) for both operations, though I'm not sure how to prove it. Or am I wrong?

Without having thought this entirely through:
I think your idea with the pointers to rows is the right start. Then, to be able to "swap" the column I'd just have another array with the size of number of columns and store in each field the index of the current physical position of the column.
m =
[0] -> 1 2 3
[1] -> 4 5 6
[2] -> 7 8 9
c[] {0,1,2}
Now to exchange column 1 and 2, you would just change c to {0,2,1}
When you then want to read row 1 you'd do
for (i=0; i < colcount; i++) {
    print m[1][c[i]];
}
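The indirection idea can be sketched in Python as below, applied to both rows and columns. The class name SwappableMatrix and its method names are invented for this sketch; row_map plays the role of the array of row pointers and col_map the role of c[].

```python
class SwappableMatrix:
    """Matrix with O(1) row and column swaps via two index maps."""

    def __init__(self, rows):
        self.data = [list(r) for r in rows]    # physical storage, never reordered
        self.row_map = list(range(len(self.data)))
        self.col_map = list(range(len(self.data[0]))) if self.data else []

    def get(self, r, c):
        # translate logical coordinates to physical ones
        return self.data[self.row_map[r]][self.col_map[c]]

    def set(self, r, c, value):
        self.data[self.row_map[r]][self.col_map[c]] = value

    def swap_rows(self, r1, r2):
        self.row_map[r1], self.row_map[r2] = self.row_map[r2], self.row_map[r1]

    def swap_cols(self, c1, c2):
        self.col_map[c1], self.col_map[c2] = self.col_map[c2], self.col_map[c1]

    def row(self, r):
        """Read one logical row, as in the loop above."""
        return [self.get(r, c) for c in range(len(self.col_map))]


m = SwappableMatrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
m.swap_cols(1, 2)          # c becomes {0, 2, 1}
row_after_col_swap = m.row(1)
m.swap_rows(0, 2)
row_after_row_swap = m.row(0)
```

The price is one extra lookup per coordinate on every access; the swaps themselves only exchange two integers.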

Just a random thought here (no experience of how well this really works, and it's a late night without coffee):
What I'm thinking is for the internals of the matrix to be a hashtable as opposed to an array.
Every cell within the array has three pieces of information:
The row in which the cell resides
The column in which the cell resides
The value of the cell
In my mind, this is readily represented by the tuple ((i, j), v), where (i, j) denotes the position of the cell (i-th row, j-th column), and v is its value.
This would be a somewhat normal representation of a matrix. But let's abstract the ideas here. Rather than i denoting the row as a position (i.e. 0 before 1 before 2 before 3 etc.), let's just consider i to be some sort of canonical identifier for its corresponding row. Let's do the same for j. (While in the most general case, i and j could then be unrestricted, let's assume a simple case where they will remain within the ranges [0..M-1] and [0..N-1] for an M x N matrix, but don't denote the actual coordinates of a cell).
Now, we need a way to keep track of the identifier for a row, and the current index associated with the row. This clearly requires a key/value data structure, but since the number of indices is fixed (matrices don't usually grow/shrink), and only deals with integral indices, we can implement this as a fixed, one-dimensional array. For a matrix of M rows, we can have (in C):
int RowMap[M];
For the m-th row, RowMap[m] gives the identifier of the row in the current matrix.
We'll use the same thing for columns:
int ColumnMap[N];
where ColumnMap[n] is the identifier of the n-th column.
Now to get back to the hashtable I mentioned at the beginning:
Since we have complete information (the size of the matrix), we should be able to generate a perfect hashing function (without collision). Here's one possibility (for modestly-sized arrays):
int Hash(int row, int column)
{
    return row * N + column;
}
If this is the hash function for the hashtable, we should get zero collisions for most sizes of arrays. This allows us to read/write data from the hashtable in O(1) time.
The cool part is interfacing the index of each row/column with the identifiers in the hashtable:
// row and column are given in the usual way, in the range [0..M-1] and [0..N-1]
// These parameters are really just used as handles to the internal row and
// column indices
int MatrixLookup(int row, int column)
{
    // Get the canonical identifiers of the row and column, and hash them.
    int canonicalRow = RowMap[row];
    int canonicalColumn = ColumnMap[column];
    int hashCode = Hash(canonicalRow, canonicalColumn);
    return HashTableLookup(hashCode);
}
Now, since the interface to the matrix only uses these handles, and not the internal identifiers, a swap operation of either rows or columns corresponds to a simple change in the RowMap or ColumnMap array:
// This function simply swaps the values at
// RowMap[row1] and RowMap[row2]
void MatrixSwapRow(int row1, int row2)
{
    int canonicalRow1 = RowMap[row1];
    int canonicalRow2 = RowMap[row2];
    RowMap[row1] = canonicalRow2;
    RowMap[row2] = canonicalRow1;
}
// This function simply swaps the values at
// ColumnMap[column1] and ColumnMap[column2]
void MatrixSwapColumn(int column1, int column2)
{
    int canonicalColumn1 = ColumnMap[column1];
    int canonicalColumn2 = ColumnMap[column2];
    ColumnMap[column1] = canonicalColumn2;
    ColumnMap[column2] = canonicalColumn1;
}
So that should be it - a matrix with O(1) access and mutation, as well as O(1) row swapping and O(1) column swapping. Of course, even an O(1) hash access will be slower than the O(1) of array-based access, and more memory will be used, but at least there is equality between rows/columns.
I tried to be as agnostic as possible when it comes to exactly how you implement your matrix, so I wrote some C. If you'd prefer another language, I can change it (it would be best if you understood it), but I think it's pretty self-descriptive, though I can't ensure its correctness as far as C goes, since I'm actually a C++ guy trying to act like a C guy right now (and did I mention I don't have coffee?). Personally, I think writing in a full OO language would do the entire design more justice, and also give the code some beauty, but like I said, this was a quickly whipped-up implementation.

Find unique common element from 3 arrays

Original Problem:
I have 3 boxes, each containing 200 coins. Exactly one person has made calls from all three boxes, so each box contains one coin with that person's fingerprints, and all the other coins have different fingerprints. You have to find the coin that carries the same fingerprint in all 3 boxes, so that we can find the fingerprint of the person who made calls from all 3 boxes.
Converted problem:
You have 3 arrays containing 200 integers each. Given that there is one and only one common element in these 3 arrays. Find the common element.
Please consider solutions other than the trivial one with O(1) space and O(n^3) time.

Some improvement in Pelkonen's answer:
From converted problem in OP:
"Given that there is one and only one common element in these 3 arrays."
We need to sort only 2 arrays and find common element.

If you sort all the arrays first O(n log n) then it will be pretty easy to find the common element in less than O(n^3) time. You can for example use binary search after sorting them.

Let N = 200, k = 3,
Create a hash table H with capacity ≥ Nk.
For each element X in array 1, set H[X] to 1.
For each element Y in array 2, if Y is in H and H[Y] == 1, set H[Y] = 2.
For each element Z in array 3, if Z is in H and H[Z] == 2, return Z.
throw new InvalidDataGivenByInterviewerException();
O(Nk) time, O(Nk) space complexity.
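The staged marking above might look like this in Python. The function name find_common is made up for this sketch, and a plain ValueError stands in for the exception named in the pseudo-code. Unlike simple occurrence counting, the stage values stay correct even if an array repeats one of its own elements.

```python
def find_common(first, second, third):
    """Return an element present in all three arrays, using staged marks."""
    stage = {x: 1 for x in first}        # 1 = seen in the first array
    for y in second:
        if stage.get(y) == 1:
            stage[y] = 2                 # 2 = seen in the first and second array
    for z in third:
        if stage.get(z) == 2:
            return z                     # also present in the third array
    raise ValueError("no element is common to all three arrays")


common = find_common([1, 5, 2, 3], [3, 9, 14, 4], [1, 9, 11, 3])
```

Each array is scanned once, so the expected running time is linear in the total number of elements.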

Use a hash table for each integer and encode the entries such that you know which array it's coming from - then check for the slot which has entries from all 3 arrays. O(n)

Use a hashtable mapping objects to frequency counts. Iterate through all three lists, incrementing occurrence counts in the hashtable, until you encounter one with an occurrence count of 3. This is O(n), since no sorting is required. Example in Python:
def find_duplicates(*lists):
    num_lists = len(lists)
    counts = {}
    for l in lists:
        for i in l:
            counts[i] = counts.get(i, 0) + 1
            if counts[i] == num_lists:
                return i
Or an equivalent, using sets:
def find_duplicates(*lists):
    intersection = set(lists[0])
    for l in lists[1:]:
        intersection = intersection.intersection(set(l))
    return intersection.pop()

O(N) solution: use a hash table. H[i] = list of all integers in the three arrays that map to i.
For all H[i] > 1 check if three of its values are the same. If yes, you have your solution. You can do this check with the naive solution even, it should still be very fast, or you can sort those H[i] and then it becomes trivial.
If your numbers are relatively small, you can use H[i] = k if i appears k times in the three arrays, then the solution is the i for which H[i] = 3. If your numbers are huge, use a hash table though.
You can extend this to work even if there can be elements common to only two arrays, and also if elements can repeat within one of the arrays. It just becomes a bit more complicated, but you should be able to figure it out on your own.

If you want the fastest* answer:
Sort one array--time is N log N.
For each element in the second array, search the first. If you find it, add 1 to a companion array; otherwise add 0--time is N log N, using N space.
For each non-zero count, copy the corresponding entry into the temporary array, compacting it so it's still sorted--time is N.
For each element in the third array, search the temporary array; when you find a hit, stop. Time is less than N log N.
Here's code in Scala that illustrates this:
import java.util.Arrays
val a = Array(1,5,2,3,14,1,7)
val b = Array(3,9,14,4,2,2,4)
val c = Array(1,9,11,6,8,3,1)
Arrays.sort(a)
val count = new Array[Int](a.length)
for (i <- 0 until b.length) {
  val j = Arrays.binarySearch(a,b(i))
  if (j >= 0) count(j) += 1
}
var n = 0
for (i <- 0 until count.length) if (count(i)>0) { count(n) = a(i); n+= 1 }
for (i <- 0 until c.length) {
  if (Arrays.binarySearch(count,0,n,c(i))>=0) println(c(i))
}
With slightly more complexity, you can either use no extra space at the cost of being even more destructive of your original arrays, or you can avoid touching your original arrays at all at the cost of another N space.
Edit: * as the comments have pointed out, hash tables are faster for non-perverse inputs. This is "fastest worst case". The worst case may not be so unlikely unless you use a really good hashing algorithm, which may well eat up more time than your sort. For example, if you multiply all your values by 2^16, the trivial hashing (i.e. just use the bitmasked integer as an index) will collide every time on lists shorter than 64k....

// Beginner's code using binary search that's pretty easy.
// Note: binary search needs A and C sorted, and the duplicate check needs B sorted.
// bool BS(int arr[],int low,int high,int target)
// {
//     if(low>high)
//         return false;
//     int mid=low+(high-low)/2;
//     if(target==arr[mid])
//         return true;
//     else if(target<arr[mid])
//         return BS(arr,low,mid-1,target);
//     else
//         return BS(arr,mid+1,high,target);
// }
// vector <int> commonElements (int A[], int B[], int C[], int n1, int n2, int n3)
// {
//     vector<int> ans;
//     for(int i=0;i<n2;i++)
//     {
//         if(i>0)
//         {
//             if(B[i-1]==B[i])
//                 continue;
//         }
//         //The above if block is to remove duplicates
//         //In the below code we are searching an element from array B in both the arrays A and C;
//         if(BS(A,0,n1-1,B[i]) && BS(C,0,n3-1,B[i]))
//         {
//             ans.push_back(B[i]);
//         }
//     }
//     return ans;
// }
