Unique representation of 2 or more arrays - algorithm

I have several arrays of fixed length whose components are natural numbers. In my program, two arrays count as identical in this simple case:
0001112
1110002
2220001 would also be identical to these two arrays.
My question is: how can I get a unique representation for such arrays?
Cheers

It's not entirely clear how your equivalence relation is defined, but building a set representation out of the arrays satisfies the constraint you've given. There are two ways of doing this:
Convert to an appropriate data structure (sets are built-in in many languages, otherwise a hash table or BST will do).
Sort each array, remove the duplicate elements and truncate them. Since they're fixed-length, you'll have to store the number of distinct elements somewhere, or use -1 to signal "end of elements".
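A minimal Python sketch of the set idea, assuming the equivalence really does ignore both order and multiplicity (the question does not confirm this). The name `set_key` is made up for illustration.

```python
def set_key(values):
    """Return a hashable key that ignores order and multiplicity."""
    return frozenset(values)


a = [0, 0, 0, 1, 1, 1, 2]
b = [1, 1, 1, 0, 0, 0, 2]
c = [2, 2, 2, 0, 0, 0, 1]

# All three arrays from the question collapse to the same key {0, 1, 2}.
print(set_key(a) == set_key(b) == set_key(c))
```

Note that this key also makes 0000012 equal to the arrays above, which may be coarser than what the question intends.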

One way is to store them in a dictionary (hash table) mapping each number to the number of times it appears. Your two arrays would have the same representation:
{0: 3, 1: 3, 2: 1}
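A rough Python sketch of this counting representation. `count_key` and `shape_key` are hypothetical names. `shape_key` is an extra variant, not part of the answer above, that drops the labels and keeps only the sorted counts, on the assumption that 2220001 should match as the question states.

```python
from collections import Counter


def count_key(values):
    """Map each number to how often it appears (order is ignored)."""
    return Counter(values)


def shape_key(values):
    """Assumed variant: keep only the sorted counts, ignoring which number has which count."""
    return sorted(Counter(values).values())


a = [0, 0, 0, 1, 1, 1, 2]
b = [1, 1, 1, 0, 0, 0, 2]
c = [2, 2, 2, 0, 0, 0, 1]

print(count_key(a) == count_key(b))  # the two arrays from the question agree
print(count_key(a) == count_key(c))  # the third array has different labels
print(shape_key(a) == shape_key(c))  # the label-free variant treats them alike
```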

// Returns the lengths of the runs of equal adjacent values,
// e.g. 0001112 -> [3, 3, 1].
public static List<int> GetUniqueRepresentation(int[] array)
{
    int count = 1;
    var output = new List<int>();
    for (int i = 1; i <= array.Length; i++)
    {
        if (i < array.Length && array[i] == array[i - 1])
        {
            count++;
        }
        else
        {
            output.Add(count);
            count = 1;
        }
    }
    return output;
}

Related

find the X most frequent number of an N size array

I'm practicing for a coding interview, and on a website I found some questions the company usually asks new juniors like me. I would like to know if there is a better solution to this one (it's pseudocode):
"Given an array of N integers, find the X most frequent numbers (the data array may contain duplicates)."
V[N] = {...} //Data Array
C1[N] = {...} //Count Array (Store the V[k] number)
C2[N] = {...} //Count Array (Store the V[k] number frequency)
M[X] = {...} //Most Frequent Array
lastFreePositionInC = 0;
//iterate over the data array to count all occurrences of V[k]
for i=0 to N
    indexOfViInC1 = checkIfViExistInC1(V[i],C1); //This iterates over C1 to find V[i]
    if indexOfViInC1 != -1
        C2[indexOfViInC1]++;
    else //Couldn't find the number, must be added to C
        C1[lastFreePositionInC] = V[i]
        C2[lastFreePositionInC++] = 1
findXMostFrequent(M,X,C1,C2); //You can sort the C array so this is just a merge sort
And yes, it's "illegal" to sort the data array to solve the challenge.
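One possible direction, sketched in Python under the assumption that a hash-based counter is allowed: replacing the linear scan over C1 with a hash lookup makes counting a single pass, and a bounded heap then selects the X most frequent values without sorting the data array. `most_frequent` is a made-up name.

```python
import heapq
from collections import Counter


def most_frequent(values, x):
    """Return up to x distinct values, most frequent first."""
    counts = Counter(values)  # one pass; the data array is left untouched
    # nlargest keeps at most x candidates while scanning the distinct values.
    return heapq.nlargest(x, counts, key=counts.get)


print(most_frequent([4, 1, 4, 2, 1, 4, 3], 2))  # [4, 1]
```

With U distinct values the selection costs roughly O(U log X), compared with the O(N) scan per element in the pseudocode above.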

Find elements at indexes after n rotations

This question is from HackerRank.
https://www.hackerrank.com/challenges/circular-array-rotation/problem
Input:
an array (int[]) of n numbers
the number (k) of rotations, 1 <= k <= n
an array (int[]) of size m with indexes into the rotated array
Output:
an array (int[]) of size m with the elements of the rotated array
I have solved it by actually rotating the array, which changes the positions of the array elements.
I was trying to solve it without changing the original array (by computing the rotated index mathematically). Here is the method I wrote; it works for the cases I tried, but on HackerRank one test fails.
private int[] withoutRotateArray(int[] input, int k, int[] checkIndex) {
    for (int i = 0; i < checkIndex.length; i++) {
        if(checkIndex[i] < k) {
            checkIndex[i] = input[input.length - (k - checkIndex[i])];
        } else {
            checkIndex[i] = input[Math.abs(k - (checkIndex[i]))];
        }
    }
    return checkIndex;
}
Can someone help me to understand what is wrong in my method?
Example:
I/P array[1,2,3,4,5]
rotation number: 2
Check Index: [2,4]
O/P array[1,3]
On HackerRank you should take care not to mutate any of the objects/arrays you get as arguments (unless told to do so in the challenge): often the testing code relies on the assumption that your code does not do that.
So instead of storing the result in checkIndex, create a new array for it, and return that.
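A small Python sketch of that advice: the result goes into a new list and neither argument is modified. Reducing k modulo n is an extra assumption on my part, in case k can exceed the array length. `elements_after_rotation` is a hypothetical name.

```python
def elements_after_rotation(values, k, indexes):
    """Return the elements found at `indexes` after k right rotations."""
    n = len(values)
    shift = k % n  # assumption: k may be larger than n
    # Position i of the rotated array holds the original element at (i - shift) mod n.
    return [values[(i - shift) % n] for i in indexes]


data = [1, 2, 3, 4, 5]
queries = [2, 4]
print(elements_after_rotation(data, 2, queries))  # [1, 3]
print(queries)                                    # still [2, 4]
```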

Finding triplicates in 4 lists

I'm trying to find, given 4 arrays of N strings, a string that is common to at least 3 of the arrays in O(N*log(N)) time, and if it exists return the lexicographically first string.
What I tried was creating an array of size 4*N and adding items from the 4 arrays to it while removing the duplicates. Then I did a Quick sort on the big array to find the first eventual triplicate.
Does anyone know a better solution?
You can do this in O(n log n), with constant extra space. It's a standard k-way merge problem, after sorting the individual lists. If the individual lists can contain duplicates, then you'll need to remove the duplicates during the sorting.
So, assuming you have list1, list2, list3, and list4:
Sort the individual lists, removing duplicates
Create a priority queue (min-heap) of length 4
Add the first item from each list to the heap
last-key = ""
last-key-count = 0
while not done
    remove the smallest item from the min-heap
    add to the heap the next item from the list that contained the item you just removed.
    if the item matches last-key
        increment last-key-count
        if last-key-count == 3 then
            output last-key
            exit done
    else
        last-key-count = 1
        last-key = item key
end while
// if you get here, there was no triplicate item
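The merge loop above can be sketched in Python with `heapq.merge`, which performs the k-way merge over already sorted inputs. This is only an illustration: `first_triplicate` is a made-up name, and `sorted(set(...))` uses O(N) extra space rather than the constant extra space mentioned above.

```python
import heapq


def first_triplicate(lists):
    """Return the smallest string present in at least 3 lists, or None."""
    # Sort each list and drop in-list duplicates so a string counts once per list.
    runs = [sorted(set(lst)) for lst in lists]
    last, count = None, 0
    for item in heapq.merge(*runs):  # yields every item in sorted order
        if item == last:
            count += 1
        else:
            last, count = item, 1
        if count == 3:
            return last
    return None


sample = [
    ["xxx", "xxx", "xxx", "zzz", "aaa"],
    ["ttt", "bbb", "ddd", "iii", "aaa"],
    ["sss", "kkk", "uuu", "rrr", "zzz"],
    ["iii", "zzz", "lll", "hhh", "aaa"],
]
print(first_triplicate(sample))  # aaa
```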
An alternate way to do this is to combine all the lists into a single list, then sort it. You can then go through it sequentially to find the first triplicate. Again, if the individual lists can contain duplicates, you should remove them before you combine the lists.
combined = list1.concat(list2.concat(list3.concat(list4)))
sort(combined)
last-key = ""
last-key-count = 0
for i = 0 to combined.length-1
    if combined[i] == last-key
        last-key-count++
        if last-key-count == 3
            output last-key
            exit done
    else
        last-key = combined[i]
        last-key-count = 1
end for
// if you get here, no triplicate was found
Here we have 4 arrays of N strings, where N = 5. My approach to get all triplicates is:
Get the 1st string of the 1st array and add it in a Map< String, Set< Integer > > with the array number in the Set (I'm using a Hash because insertion and search are O(1));
Get the 1st string of the 2nd array and add it in a Map< String, Set< Integer > > with the array number in the Set;
Repeat step 2, but using 3rd and 4th arrays instead of 2nd;
Repeat steps 1, 2 and 3 but using the 2nd string instead of 1st;
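Repeat steps 1, 2 and 3 but using the 3rd string instead of the 1st;
Etc.
In the worst case this performs N*4 map insertions or lookups, each O(1) on average, so building the map takes O(N) expected time; sorting the resulting triplicates keeps the total within O(N*log(N)).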
import java.util.*;
public class Main {
    public static void main(String[] args) {
        String[][] arr = {
            { "xxx", "xxx", "xxx", "zzz", "aaa" },
            { "ttt", "bbb", "ddd", "iii", "aaa" },
            { "sss", "kkk", "uuu", "rrr", "zzz" },
            { "iii", "zzz", "lll", "hhh", "aaa" }};
        List<String> triplicates = findTriplicates(arr);
        Collections.sort(triplicates);
        for (String word : triplicates)
            System.out.println(word);
    }
    public static List<String> findTriplicates(String[][] arr) {
        // Maps each string to the set of array indexes it appears in.
        Map<String, Set<Integer>> map = new HashMap<String, Set<Integer>>();
        List<String> triplicates = new ArrayList<String>();
        final int N = 5;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < 4; j++) {
                String str = arr[j][i];
                if (map.containsKey(str)) {
                    // add() returns true only for a new array index, so a repeat
                    // within one array cannot report the same string twice.
                    if (map.get(str).add(j) && map.get(str).size() == 3)
                        triplicates.add(str);
                } else {
                    Set<Integer> set = new HashSet<Integer>();
                    set.add(j);
                    map.put(str, set);
                }
            }
        }
        return triplicates;
    }
}
Output:
aaa
zzz
OK, if you don't care about the constant factors, this can be done in O(N), where N is the total size of the strings. For practical purposes it is important to distinguish the number of strings from their total size. (At the end I propose an alternative version which is O(N log N), where N is the number of string comparisons.)
You need one map string -> int for count, and one temporary already_counted map string -> bool. The latter is basically a set. The important thing is to use the unordered/hash versions of the associative containers, to avoid log factors.
For each array, for each element, check whether the current element is in the already_counted set. If not, do count[current_string]++ and add it to already_counted. Before going on to the next array, empty the already_counted set.
Now you basically need a min search. Go over each element of count and, if an element has value 3 or more, compare the key associated with it to your current min. Voilà: min is the lowest string with 3 or more occurrences.
You don't need the N log N factor, because you do not need all the triplicates, so no sorting or ordered data structures are needed. You have O(3*N) (again, N is the total size of all strings). This is an overestimate; later I give a more detailed estimate.
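A compact Python sketch of the hash-based counting described so far. Here a per-list `set` stands in for the already_counted structure, and `smallest_in_three_lists` is a hypothetical name.

```python
from collections import Counter


def smallest_in_three_lists(lists):
    """Return the smallest string found in at least 3 lists, or None."""
    count = Counter()
    for lst in lists:
        # set(lst) removes per-list repetitions, so each list adds at most 1 per string.
        count.update(set(lst))
    # Plain min search over the qualifying keys; nothing is sorted.
    return min((s for s, c in count.items() if c >= 3), default=None)


sample = [
    ["xxx", "xxx", "xxx", "zzz", "aaa"],
    ["ttt", "bbb", "ddd", "iii", "aaa"],
    ["sss", "kkk", "uuu", "rrr", "zzz"],
    ["iii", "zzz", "lll", "hhh", "aaa"],
]
print(smallest_in_three_lists(sample))  # aaa
```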
Now, the caveat is that this method is based on string hashing, which is O(S), where S is the size of the string, and it is done twice to deal with per-array repetitions. So, alternatively, it might be faster, at least in a C++ implementation, to actually use the ordered versions of the containers. There are two reasons for this:
Comparing strings might be faster than hashing them. If the strings are different, you get the result of a comparison relatively fast, whereas with hashing you always go over the whole string, and hashing is considerably more complicated.
They are contiguous in memory - cache friendly.
Hashing also has a problem with rehashing, etc.
If the number of strings is not large, or if their size is very big, I would place my bet on the ordered versions. Also, if you have an ordered count, you get an edge in finding the least element because it's the first one with count >= 3, though in the worst case you will get tons of a* with count 1 and z with 3.
So, to sum it all up, let us call n the number of string comparisons and N the number of string hashes.
The hash-based method is O(2 N + n), and with some trickery you can bring the constant factor down by 1, e.g. by reusing the hash for count and already_counted, or by combining both data structures, for example via a bitset. So you would get O(N + n).
A pure string-comparison-based method would be O(2 n log n + n). Maybe it would be possible to use hinting to drop the constant, but I am not sure.
It can be solved in O(N) using a trie.
Loop over the 4 lists one by one and insert the strings of each list into the trie.
When inserting a string s from list L, mark at its final node that s occurs in L. If s is then marked for 3 or more lists and is lexicographically smaller than the current answer, update the answer.
Here is sample C++ code; you can input 4 lists of strings, each containing 5 strings, to test it.
http://ideone.com/fTmKgJ
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<vector<string>> lists;
string ans = "";
struct TrieNode
{
    TrieNode* l[128];  // one child per 7-bit ASCII character
    int n;             // bitmask of the lists containing the string that ends here
    TrieNode()
    {
        memset(l, 0, sizeof(TrieNode*) * 128);
        n = 0;
    }
} *root = new TrieNode();
void add(const string& s, int listID)
{
    TrieNode* p = root;
    for (unsigned char x : s)  // assumes every character code is below 128
    {
        if (!p->l[x]) p->l[x] = new TrieNode();
        p = p->l[x];
    }
    p->n |= (1 << listID);
    // __builtin_popcount is a GCC/Clang builtin that counts the set bits
    if (__builtin_popcount(p->n) >= 3 && (ans == "" || s < ans)) ans = s;
}
int main() {
    for (int i = 0; i < 4; i++) {
        string s;
        vector<string> v;
        for (int j = 0; j < 5; j++) {
            cin >> s;
            v.push_back(s);
        }
        lists.push_back(v);
    }
    for (int i = 0; i < 4; i++) {
        for (const auto& s : lists[i]) {
            add(s, i);
        }
    }
    if (ans == "") cout << "NO ANSWER" << endl;
    else cout << ans << endl;
    return 0;
}

How to count unique items in a list?

How would someone go about counting the number of unique items in a list?
For example, say I have {1, 3, 3, 4, 1, 3} and I want to get the number 3, which represents the number of unique items in the list (namely |A|=3 if A={1, 3, 4}). What algorithm would someone use for this?
I have tried a double loop:
for firstItem to lastItem
    currentItem=a
    for currentItem to lastItem
        currentItem=b
        if a==b then numberOfDuplicates++
uniqueItems=numberOfItems-numberOfDuplicates
That doesn't work, as it counts the duplicates more times than actually needed. With the example at the beginning it would be:
For the first loop it would count +1 duplicates for number 1 in the list.
For the second loop it would count +2 duplicates for number 3 in the list.
For the third loop it would count +1 duplicates for number 3 again (overcounting the last '3'), and that's where the problem comes in.
Any idea on how to solve this?
Add the items to a HashSet, then check the HashSet's size after you finish.
Assuming that you have a good hash function, this is O(n).
You can check whether there are any duplicates following the number. If not, increment uniqueCount:
uniqueCount = 0;
for (i=0;i<size;i++) {
    bool isUnique = true;
    // an element counts only if no equal element follows it
    for (j=i+1;j<size;j++) {
        if (arr[i] == arr[j]) {
            isUnique = false;
            break;
        }
    }
    if(isUnique) {
        uniqueCount ++;
    }
}
The above approach is O(N^2) in time and O(1) in space.
Another approach would be to sort the input array, which puts duplicate elements next to each other, and then compare adjacent array elements. This approach is O(NlgN) in time and O(1) in space if the sort is done in place.
If you are allowed to use additional space you can get this done in O(N) time and O(N) space by using a hash. The keys for the hash are the array elements and the values are their frequencies.
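At the end of hashing, the number of keys in the hash is the number of distinct elements. If you instead want only the elements that occur exactly once, count the hash keys which have a value of 1.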
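A short Python sketch of the frequency-hash approach, showing the two different numbers one can read off the table. The function names are made up; for the list in the question, the number of keys is 3, while only one value (4) occurs exactly once.

```python
from collections import Counter


def count_distinct(items):
    """Number of different values, i.e. the number of keys in the frequency table."""
    return len(Counter(items))


def count_occurring_once(items):
    """Number of values whose frequency is exactly 1."""
    return sum(1 for c in Counter(items).values() if c == 1)


data = [1, 3, 3, 4, 1, 3]
print(count_distinct(data))        # 3
print(count_occurring_once(data))  # 1
```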
Sort it using a decent sorting algorithm like mergesort or heapsort (both have O(n log n) worst-case time) and loop over the sorted list:
sorted_list = sorted(items)
# Start at 1 for the first item (0 for an empty list), then count each change of value.
unique_count = 1 if sorted_list else 0
for prev, item in zip(sorted_list, sorted_list[1:]):
    if item != prev:
        unique_count += 1
Collections.sort(list);
for (i = 0; i < list.size() - 1; i++)
    if (list.get(i).equals(list.get(i+1)))
        duplicates++;
// number of unique items = list.size() - duplicates
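Keep a Dictionary and add to the count in a loop.
This is how it looks in C#:
int[] items = {1, 3, 3, 4, 1, 3};
Dictionary<int,int> dic = new Dictionary<int,int>();
foreach (int item in items)
    // dic[item]++ would throw for a key that is not present yet
    dic[item] = dic.TryGetValue(item, out int count) ? count + 1 : 1;
// dic.Count is the number of unique items
Of course there is a LINQ way in C#, but as I understand it the question is general ;)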

How can I efficiently determine if two lists contain elements ordered in the same way?

I have two ordered lists of the same element type, each list having at most one element of each value (say ints and unique numbers), but otherwise with no restrictions (one may be a subset of the other, they may be completely disjoint, or share some elements but not others).
How do I efficiently determine if A is ordering any two items in a different way than B is? For example, if A has the items 1, 2, 10 and B the items 2, 10, 1, the property would not hold, as A lists 1 before 10 but B lists it after 10. 1, 2, 10 vs 2, 10, 5 would be perfectly valid, however, as A never mentions 5 at all. I cannot rely on any given sorting rule shared by both lists.
You can get O(n) as follows. First, find the intersection of the two sets using hashing. Second, test whether A and B are identical if you only consider elements from the intersection.
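The two steps can be sketched in Python as follows, assuming the elements are hashable and unique within each list as the question states. `same_relative_order` is a hypothetical name.

```python
def same_relative_order(a, b):
    """True if the elements shared by a and b appear in the same order in both."""
    common = set(a) & set(b)  # step 1: intersection via hashing
    # Step 2: restrict both lists to the shared elements and compare them.
    return [x for x in a if x in common] == [x for x in b if x in common]


print(same_relative_order([1, 2, 10], [2, 10, 1]))  # False
print(same_relative_order([1, 2, 10], [2, 10, 5]))  # True
```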
My approach would be to first make sorted copies of A and B which also record the positions of elements in the original lists:
for i in 1 .. length(A):
    Apos[i] = (A[i], i)
sortedApos = sort(Apos[] by first element of each pair)
for i in 1 .. length(B):
    Bpos[i] = (B[i], i)
sortedBpos = sort(Bpos[] by first element of each pair)
Now find those elements in common using a standard list merge that records the positions in both A and B of the shared elements:
i = 1
j = 1
shared = []
while i <= length(A) && j <= length(B)
    if sortedApos[i][1] < sortedBpos[j][1]
        ++i
    else if sortedApos[i][1] > sortedBpos[j][1]
        ++j
    else // They're equal
        append(shared, (sortedApos[i][2], sortedBpos[j][2]))
        ++i
        ++j
Finally, sort shared by its first element (position in A) and check that all its second elements (positions in B) are increasing. This will be the case iff the elements common to A and B appear in the same order:
sortedShared = sort(shared[] by first element of each pair)
for i = 2 .. length(sortedShared)
    if sortedShared[i][2] < sortedShared[i-1][2]
        return DIFFERENT
return SAME
Time complexity: 2*(O(n) + O(nlog n)) + O(n) + O(nlog n) + O(n) = O(nlog n).
General approach: store all the values and their positions in B as keys and values in a HashMap. Iterate over the values in A and look them up in B's HashMap to get their position in B (or null). If this position is before the largest position value you've seen previously, then you know that something in B is in a different order than A. Runs in O(n) time.
Rough, totally untested code:
boolean valuesInSameOrder(int[] A, int[] B)
{
    // Record the position of every value in B.
    Map<Integer, Integer> bMap = new HashMap<Integer, Integer>();
    for (int i = 0; i < B.length; i++)
    {
        bMap.put(B[i], i);
    }
    int maxPosInB = 0;
    for (int i = 0; i < A.length; i++)
    {
        if(bMap.containsKey(A[i]))
        {
            int currPosInB = bMap.get(A[i]);
            if (currPosInB < maxPosInB)
            {
                // B has something in a different order than A
                return false;
            }
            else
            {
                maxPosInB = currPosInB;
            }
        }
    }
    // All values shared by A and B appear in the same order
    return true;
}
