For example, suppose we have some range-value pairs:
<<1, 2>, 65>
<<3, 37>, 75>
<<45, 159>, 12>
<<160, 200>, 23>
<<210, 255>, 121>
These ranges are disjoint.
Given the integer 78, the containing range is <45, 159>, so we output the value 12.
Since the ranges may be very large, I currently use a map to store these range-value pairs.
Every search scans the entire map, which is O(n).
Are there any good approaches other than binary search?
Binary search is definitely your best option. O(log n) is hard to beat!
If you don't want to hand-code the binary search, and you're using C++, you can use the std::map::upper_bound function, which implements binary search:
std::map<std::pair<int, int>, int> my_map;
int val = 78; // value to search for
auto it = my_map.upper_bound(std::make_pair(val, val));
if (it != my_map.begin()) --it;
// guard against an empty map before dereferencing
if (!my_map.empty() && it->first.first <= val && val <= it->first.second) {
    /* Found it, value is it->second */
} else {
    /* Didn't find it */
}
std::map::upper_bound is guaranteed to run in logarithmic time, because the map is inherently sorted (and usually implemented as a red-black tree). It returns the first element greater than the key you're searching for, which is why the returned iterator is decremented to reach the candidate range.
How about a Binary search tree?
You'll probably need a self-balancing search tree: AVL trees and red-black trees are a good start, but you should probably stick with something already in your ecosystem. For .NET that would be a SortedDictionary, probably with a custom IComparer, but that's just an example.
For C++ you should use an ordered map (std::map) with a custom comparator.
Links:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Red-black_tree
http://msdn.microsoft.com/en-us/library/f7fta44c%28v=vs.110%29.aspx
http://msdn.microsoft.com/en-us/library/8ehhxeaf(v=vs.110).aspx
http://www.cplusplus.com/reference/map/map/
Again for C++, I would do something like the following (written without a compiler, so be careful):
class Comparer
{
public:
    bool operator()(pair<int,int> const & one, pair<int,int> const & two) const
    {
        // one sorts before two only if it ends before two begins;
        // overlapping ranges compare "equivalent"
        return one.second < two.first;
    }
};
then you define the map as:
map<pair<int,int>, int, Comparer> myMap;
myMap.insert(make_pair(make_pair(-3, 9), 1));
int num = myMap[make_pair(7, 7)]; // looks up the range containing 7
I have been using Union-Find (Disjoint set) for a lot of graph problems and know how this works. But I have almost always used this data structure with integers or numbers. While solving this leetcode problem I need to group strings and I am thinking of using Union-Find for this. But I do not know how to use this with strings. Looking for suggestions.
TLDR: Use the same union find code you would for an integer/number, but use a hash map instead of an array to store the parent of each element in the union find. This approach generalizes to any data type that can be stored in hash map, not just strings, i.e. in the code below the two unordered maps could have something other than strings or ints as keys.
class UnionFind {
public:
    string find(string s) {
        // Register s the first time we see it; without this, looking up an
        // unknown string would silently create an empty-string parent.
        if (parent.find(s) == parent.end()) {
            parent[s] = s;
            sz[s] = 1;
        }
        string stringForPathCompression = s;
        while (parent[s] != s) s = parent[s];
        // Path compression: point every node on the walk directly at the
        // root, which reduces the time complexity of future finds.
        while (stringForPathCompression != s) {
            string temp = parent[stringForPathCompression];
            parent[stringForPathCompression] = s;
            stringForPathCompression = temp;
        }
        return s;
    }
    void unify(string s1, string s2) {
        string rootS1 = find(s1), rootS2 = find(s2);
        if (rootS1 == rootS2) return;
        // Union by size: attach the smaller component to the bigger one,
        // preserving the bigger component. This also reduces the time complexity.
        if (sz[rootS1] < sz[rootS2]) parent[rootS1] = rootS2, sz[rootS2] += sz[rootS1];
        else parent[rootS2] = rootS1, sz[rootS1] += sz[rootS2];
    }
private:
    // If we were storing numbers in our union find, both of these hash maps
    // could simply be arrays.
    unordered_map<string, int> sz; // component sizes
    unordered_map<string, string> parent;
};
Union Find doesn't really care what kind of data is in the objects. You can decide what strings to union in your main code, and then union find their representative values.
What is the best way to calculate the hash value of a Tree?
I need to compare the similarity between several trees in O(1). Now, I want to precalculate the hash values and compare them when needed. But then I realized, hashing a tree is different than hashing a sequence. I wasn't able to come up with a good hash function.
What is the best way to calculate hash value of a tree?
Note: I will implement the function in C/C++.
Well, hashing a tree means representing it in a unique way, so that we can distinguish other trees from this one using a simple representation or number. In a normal polynomial hash we use number-base conversion: we convert a string or a sequence into a specific prime base, modulo a value which is also a large prime. Using this same technique we can hash a tree.
Now fix the root of the tree at any vertex. Let root = 1 and:
B = The base in which we want to convert.
P[i] = i-th power of B (B^i).
level[i] = Depth of the i-th vertex (distance from the root).
child[i] = Total number of vertices in the subtree of the i-th vertex, including i.
degree[i] = Number of nodes adjacent to vertex i.
Now the contribution of the i-th vertex to the hash value is:
hash[i] = ( (P[level[i]] + degree[i]) * child[i] ) % modVal
And the hash value of the entire tree is the sum of all the vertices' hash values:
(hash[1] + hash[2] + .... + hash[n]) % modVal
If we use this definition of tree equivalence:
T1 is equivalent to T2 iff
all paths to leaves of T1 exist exactly once in T2, and
all paths to leaves of T2 exist exactly once in T1
Hashing a sequence (a path) is straightforward. If h_tree(T) is a hash of all paths-to-leafs of T, where the order of the paths does not alter the outcome, then it is a good hash for the whole of T, in the sense that equivalent trees will produce equal hashes, according to the above definition of equivalence. So I propose:
h_path(path) = an order-dependent hash of all elements in the path.
Requires O(|path|) time to calculate,
but child nodes can reuse the calculation of their
parent node's h_path in their own calculations.
h_tree(T) = an order-independent hashing of all its paths-to-leaves.
Can be calculated in O(|L|), where L is the number of leaves
In pseudo-c++:
struct node {
    int path_hash; // path-to-root hash; only used for building tree_hash
    int tree_hash; // takes children into account; use to compare trees
    int content;
    vector<node> children;

    int update_hash(int parent_path_hash = 1) {
        path_hash = parent_path_hash * PRIME1 + content; // order-dependent
        tree_hash = path_hash;
        for (node &n : children) { // by reference, so the children's hashes persist
            tree_hash += n.update_hash(path_hash) * PRIME2; // order-independent
        }
        return tree_hash;
    }
};
After building two trees, update their hashes and compare away. Equivalent trees should have the same hash, different trees not so much. Note that the path and tree hashes that I am using are rather simplistic, and chosen rather for ease of programming than for great collision resistance...
Child hashes should be successively multiplied by a prime number & added. Hash of the node itself should be multiplied by a different prime number & added.
Cache the hash of the tree overall -- I prefer to cache it outside the AST node, if I have a wrapper object holding the AST.
public class RequirementsExpr {
protected RequirementsAST ast;
protected int hash = -1;
public int hashCode() {
if (hash == -1)
this.hash = ast.hashCode();
return hash;
}
}
public class RequirementsAST {
protected int nodeType;
protected Object data;
// -
protected RequirementsAST down;
protected RequirementsAST across;
public int hashCode() {
int nodeHash = nodeType;
nodeHash = (nodeHash * 17) + (data != null ? data.hashCode() : 0);
nodeHash *= 23; // prime A.
int childrenHash = 0;
for (RequirementsAST child = down; child != null; child = child.across) {
childrenHash *= 41; // prime B.
childrenHash += child.hashCode();
}
int result = nodeHash + childrenHash;
return result;
}
}
The result of this, is that child/descendant nodes in different positions are always multiplied in by different factors; and the node itself is always multiplied in by a different factor from any possible child/descendant nodes.
Note that other primes should also be used in building the nodeHash of the node data, itself. This helps avoid eg. different values of nodeType colliding with different values of data.
Within the limits of 32-bit hashing, this scheme overall gives a very high chance of uniqueness for any differences in tree-structure (eg, transposing two siblings) or value.
Once calculated (over the entire AST) the hashes are highly efficient.
I would recommend converting the tree to a canonical sequence and hashing the sequence. (The details of the conversion depend on your definition of equivalence. For example, if the trees are binary search trees and the equivalence relation is structural, then the conversion could be to enumerate the tree in preorder, as the structure of binary search trees can be recovered from the preorder enumeration.)
Thomas's answer boils down at first glance to associating a multivariable polynomial with each tree and evaluating the polynomial at a particular location. There are two steps that, at the moment, have to be assumed on faith; the first is that the map doesn't send inequivalent trees to the same polynomial, and the second is that the evaluation scheme doesn't introduce too many collisions. I can't evaluate the first step presently, though there are reasonable definitions of equivalence that permit reconstruction from a two-variable polynomial. The second is not theoretically sound but could be made so via Schwartz--Zippel.
I have a simple requirement (perhaps hypothetical):
I want to store an English word dictionary (n words), and given a word (of character length m), the dictionary should be able to tell whether the word exists in it or not.
What would be an appropriate data structure for this?
a balanced binary search tree, as done in C++ STL associative data structures like set and map,
or
a trie on strings
Some complexity analysis:
in a balanced BST, time would be O(m log n) (comparing two strings takes O(m) time, character by character)
in a trie, if at each node we could branch out in O(1) time, we could find the word in O(m). But the assumption that we can branch in O(1) time at each node is not valid: there can be up to 26 branches at a node. To get O(1) branching we could keep a short array indexed by character at each node, but this blows up the space. After a few levels in the trie the branching factor drops, so it is better to keep a linked list of next-node characters and pointers.
What looks more practical? Any other trade-offs?
Thanks,
I'd say use a Trie, or better yet use its more space efficient cousin the Directed Acyclic Word Graph (DAWG).
It has the same runtime characteristics (insert, look up, delete) as a Trie but overlaps common suffixes as well as common prefixes which can be a big saving on space.
If this is C++, you should also consider std::tr1::unordered_set. (If you have C++0x, you can use std::unordered_set.)
This just uses a hash table internally, which I would wager will out-perform any tree-like structure in practice. It is also trivial to implement because you have nothing to implement.
Binary search is going to be easier to implement and it's only going to involve comparing tens of strings at the most. Given you know the data up front, you can build a balanced binary tree so performance is going to be predictable and easily understood.
With that in mind, I'd use a standard binary tree (probably using set from C++ since that's typically implemented as a tree).
A simple solution is to store the dict as sorted, \n-separated words on disk, load it into memory and do a binary search. The only non-standard part here is that you have to scan backwards for the start of a word when you're doing the binary search.
Here's some code! (It assumes globals wordlist, pointing to the loaded dict, and wordlist_end, which points to just after the end of the loaded dict.)
// Return >0 if word > word at position p.
// Return <0 if word < word at position p.
// Return 0 if word == word at position p.
static int cmp_word_at_index(size_t p, const char *word) {
while (p > 0 && wordlist[p - 1] != '\n') {
p--;
}
while (1) {
if (wordlist[p] == '\n') {
if (*word == '\0') return 0;
else return 1;
}
if (*word == '\0') {
return -1;
}
int char0 = toupper(*word);
int char1 = toupper(wordlist[p]);
if (char0 != char1) {
return (int)char0 - (int)char1;
}
++p;
++word;
}
}
// Test if a word is in the dictionary.
int is_word(const char* word_to_find) {
    size_t index_min = 0;
    size_t index_max = wordlist_end - wordlist;
    while (index_min < index_max - 1) {
        size_t index = (index_min + index_max) / 2;
        int c = cmp_word_at_index(index, word_to_find);
        if (c == 0) return 1; // Found word.
        if (c < 0) {
            index_max = index;
        } else {
            index_min = index;
        }
    }
    // The loop never tests position index_min itself (e.g. the very first
    // word in the list), so check it explicitly.
    return cmp_word_at_index(index_min, word_to_find) == 0;
}
A huge benefit of this approach is that the dict is stored in a human-readable way on disk, and that you don't need any fancy code to load it (allocate a block of memory and read() it in in one go).
If you want to use a trie, you could use a packed and suffix-compressed representation. Here's a link to one of Donald Knuth's students, Franklin Liang, who wrote about this trick in his thesis.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.123.7018&rep=rep1&type=pdf
It uses half the storage of the straightforward textual dict representation, gives you the speed of a trie, and you can (like the textual dict representation) store the whole thing on disk and load it in one go.
The trick it uses is to pack all the trie nodes into a single array, interleaving them where possible. As well as a new pointer (and an end-of-word marker bit) in each array location like in a regular trie, you store the letter that this node is for -- this lets you tell if the node is valid for your state or if it's from an overlapping node. Read the linked doc for a fuller and clearer explanation, as well as an algorithm for packing the trie into this array.
The suffix-compression and greedy packing algorithm described there take some care to implement, but they're easy enough.
The industry standard is to store the dictionary in a hashtable and get amortized O(1) lookup time. Space is less of a concern in industry these days, especially given the advances in distributed computing.
A hashtable is also how Google implements its autocomplete feature: store every prefix of a word as a key, with the word as the value.
I have set (s) of unique maps (Java HashMaps currently) and wish to remove from it any maps that are completely contained by some other map in the set (i.e. remove m from s if m.entrySet() is a subset of n.entrySet() for some other n in s.)
I have an n^2 algorithm, but it's too slow. Is there a more efficient way to do this?
Edit:
the set of possible keys is small, if that helps.
Here is an inefficient reference implementation:
public void removeSubmaps(Set<Map> s) {
    Set<Map> toRemove = new HashSet<Map>();
    for (Map a : s) {
        for (Map b : s) {
            // skip a == b, otherwise every map is a subset of itself
            // and the whole set gets removed
            if (a != b && a.entrySet().containsAll(b.entrySet()))
                toRemove.add(b);
        }
    }
    s.removeAll(toRemove);
}
Not sure I can make this anything other than an n^2 algorithm, but I have a shortcut that might make it faster. Make a list of your maps along with the length of each map, and sort it. A proper subset of a map must be shorter than or equal in length to the map you're comparing against, so there's never any need to compare to a map higher on the list.
Here's another stab at it.
Decompose all your maps into a list of key,value,map number. Sort the list by key and value. Go through the list, and for each group of key/value matches, create a permutation of all the map number pairs - these are all potential subsets. When you have the final list of pairs, sort by map numbers. Go through this second list, and count the number of occurrences of each pair - if the number matches the size of one of the maps, you've found a subset.
Edit: My original interpretation of the problem was incorrect, here is new answer based on my re-read of the question.
You can create a custom hash function for HashMap that returns the product of the hash values of all its entries. Sort the list of hash values, and starting from the biggest value, find all divisors among the smaller hash values; these are possible subsets of this hashmap. Use set.containsAll() to confirm before marking them for removal.
This effectively transforms the problem into a mathematical problem of finding possible divisor from a collection. And you can apply all the common divisor-search optimizations.
Complexity is O(n^2), but if many hashmaps are subsets of others, the actual time spent can be a lot better, approaching O(n) in best-case scenario (if all hashmaps are subset of one). But even in worst case scenario, division calculation would be a lot faster than set.containsAll() which itself is O(n^2) where n is number of items in a hashmap.
You might also want to create a simple hash function for hashmap entry objects to return smaller numbers to increase multiply/division performance.
Here's a subquadratic (O(N**2 / log N)) algorithm for finding maximal sets from a set of sets: An Old Sub-Quadratic Algorithm for Finding Extremal Sets.
But if you know your data distribution, you can do much better in average case.
This what I ended up doing. It works well in my situation as there is usually some value that is only shared by a small number of maps. Kudos to Mark Ransom for pushing me in this direction.
In prose: index the maps by key/value pair, so that each key/value pair is associated with a set of maps. Then, for each map, find the smallest set associated with one of its key/value pairs; this set is typically small for my data. Each of the maps in this set is a potential 'supermap'; no other map could be a 'supermap', as it would not contain this key/value pair. Search this set for a supermap. Finally, remove all the identified submaps from the original set.
private <K, V> void removeSubmaps(Set<Map<K, V>> maps) {
    // index the maps by key/value
    List<Map<K, V>> mapList = toList(maps);
    Map<K, Map<V, List<Integer>>> values = LazyMap.create(HashMap.class, ArrayList.class);
    for (int i = 0, uniqueRowsSize = mapList.size(); i < uniqueRowsSize; i++) {
        Map<K, V> row = mapList.get(i);
        Integer idx = i;
        for (Map.Entry<K, V> entry : row.entrySet())
            values.get(entry.getKey()).get(entry.getValue()).add(idx);
    }

    // find submaps
    Set<Map<K, V>> toRemove = Sets.newHashSet();
    for (Map<K, V> submap : mapList) {
        // find the smallest set of maps with a matching key/value
        List<Integer> smallestList = null;
        for (Map.Entry<K, V> entry : submap.entrySet()) {
            List<Integer> list = values.get(entry.getKey()).get(entry.getValue());
            if (smallestList == null || list.size() < smallestList.size())
                smallestList = list;
        }

        // compare with each of the maps in that set
        for (int i : smallestList) {
            Map<K, V> map = mapList.get(i);
            if (isSubmap(submap, map))
                toRemove.add(submap);
        }
    }
    maps.removeAll(toRemove);
}

private <K, V> boolean isSubmap(Map<K, V> submap, Map<K, V> map) {
    if (submap.size() >= map.size())
        return false;
    for (Map.Entry<K, V> entry : submap.entrySet()) {
        V other = map.get(entry.getKey());
        if (other == null)
            return false;
        if (!other.equals(entry.getValue()))
            return false;
    }
    return true;
}
What's the best algorithm for comparing two arrays to see if they have the same members?
Assume there are no duplicates, the members can be in any order, and that neither is sorted.
compare(
[a, b, c, d],
[b, a, d, c]
) ==> true
compare(
[a, b, e],
[a, b, c]
) ==> false
compare(
[a, b, c],
[a, b]
) ==> false
Obvious answers would be:
Sort both lists, then check each element to see if they're identical
Add the items from one array to a hashtable, then iterate through the other array, checking that each item is in the hash
nickf's iterative search algorithm
Which one you'd use would depend on whether you can sort the lists first, and whether you have a good hash algorithm handy.
You could load one into a hash table, keeping track of how many elements it has. Then, loop over the second one checking to see if every one of its elements is in the hash table, and counting how many elements it has. If every element in the second array is in the hash table, and the two lengths match, they are the same, otherwise they are not. This should be O(N).
To make this work in the presence of duplicates, track how many of each element has been seen. Increment while looping over the first array, and decrement while looping over the second array. During the loop over the second array, if you can't find something in the hash table, or if the counter is already at zero, they are unequal. Also compare total counts.
Another method that would work in the presence of duplicates is to sort both arrays and do a linear compare. This should be O(N*log(N)).
Assuming you don't want to disturb the original arrays and space is a consideration, another O(n.log(n)) solution that uses less space than sorting both arrays is:
Return FALSE if arrays differ in size
Sort the first array -- O(n.log(n)) time, extra space required is the size of one array
For each element in the 2nd array, check if it's in the sorted copy of the first array using a binary search -- O(n.log(n)) time
If you use this approach, please use a library routine to do the binary search. Binary search is surprisingly error-prone to hand-code.
[Added after reviewing solutions suggesting dictionary/set/hash lookups:]
In practice I'd use a hash. Several people have asserted O(1) behaviour for hashes, leading them to conclude a hash-based solution is O(N). Typical inserts/lookups may be close to O(1), and some hashing schemes guarantee worst-case O(1) lookup, but worst-case insertion -- in constructing the hash -- isn't O(1). Given any particular hashing data structure, there would be some set of inputs which would produce pathological behaviour. I suspect there exist hashing data structures with the combined worst-case to [insert-N-elements then lookup-N-elements] of O(N.log(N)) time and O(N) space.
You can use a signature (a commutative operation over the array members) to further optimize this in the case where the arrays are usually different, saving the O(n log n) work or the memory allocation.
A signature can take the form of a bloom filter(s), or even a simple commutative operation like addition or xor.
A simple example (assuming a long as the signature and GetHashCode as a good object identifier; if the objects are, say, ints, then their values are better identifiers, and some signatures will need to be larger than a long):
public bool MatchArrays(object[] array1, object[] array2)
{
    if (array1.Length != array2.Length)
        return false;

    long signature1 = 0;
    long signature2 = 0;
    for (int i = 0; i < array1.Length; i++)
    {
        signature1 = CommutativeOperation(signature1, array1[i].GetHashCode());
        signature2 = CommutativeOperation(signature2, array2[i].GetHashCode());
    }
    if (signature1 != signature2)
        return false;
    return MatchArraysTheLongWay(array1, array2);
}
where (using an addition operation; use a different commutative operation if desired, e.g. bloom filters)
public long CommutativeOperation(long oldValue, long newElement) {
return oldValue + newElement;
}
This can be done in different ways:
1 - Brute force: for each element in array1, check that the element exists in array2. Note this requires noting the position/index so that duplicates can be handled properly. This is O(n^2) with much more complicated code; don't even think of it.
2 - Sort both lists, then check each element to see if they're identical. O(n log n) for sorting and O(n) to check, so O(n log n) overall. The sort can be done in place if messing up the arrays is not a problem; if it is, you need 2n extra memory to copy the sorted lists.
3 - Add the items and their counts from one array to a hashtable, then iterate through the other array, checking that each item is in the hashtable; if it is, decrement its count if the count is not zero, otherwise remove it from the hashtable. O(n) to create the hashtable and O(n) to check the other array's items, so O(n) overall. This introduces a hashtable with memory for at most n elements.
4 - "Best of the best" (among the above): subtract each element of one array from the element at the same index in the other, and finally sum up the differences. E.g. for A1={1,2,3}, A2={3,1,2}, Diff={-2,1,1}; now sum up Diff = 0, which means they have the same set of integers. This approach requires O(n) with no extra memory. C# code would look as follows:
public static bool ArrayEqual(int[] list1, int[] list2)
{
if (list1 == null || list2 == null)
{
throw new Exception("Invalid input");
}
if (list1.Length != list2.Length)
{
return false;
}
int diff = 0;
for (int i = 0; i < list1.Length; i++)
{
diff += list1[i] - list2[i];
}
return (diff == 0);
}
Approach 4 doesn't work at all; it's the worst of them. For example, A1={1,4} and A2={2,3} give a difference sum of 0 even though the arrays share no members.
If the elements of an array are given as distinct, then XOR ( bitwise XOR ) all the elements of both the arrays, if the answer is zero, then both the arrays have the same set of numbers. The time complexity is O(n)
I would suggest sorting both arrays first, then comparing the first element of each, then the second, and so on.
If you find a mismatch you can stop.
If you sort both arrays first, you'd get O(N log(N)).
What is the "best" solution obviously depends on what constraints you have. If it's a small data set, sorting, hashing, or brute-force comparison (like nickf posted) will all be pretty similar. Because you know you're dealing with integer values, you can get O(n) sort times (e.g. radix sort), and the hash table will also use O(n) time. As always, there are drawbacks to each approach: sorting will either require you to duplicate the data or destructively sort your array (losing the current ordering) if you want to save space. A hash table will obviously have memory overhead for creating the hash table. If you use nickf's method, you can do it with little to no memory overhead, but you have to accept the O(n^2) runtime. Choose whichever is best for your purposes.
Going into deep waters here, but:
Sorted lists
Sorting can be O(n log n), as pointed out. Just to clarify, it doesn't matter that there are two lists, because O(2*n log n) == O(n log n). Comparing each pair of elements is another O(n), so sorting both and then comparing each element is O(n) + O(n log n), which is O(n log n).
Hash-tables:
Converting the first list to a hash table is O(n) for reading plus the cost of storing in the hash table, which I guess can also be estimated as O(n), giving O(n) overall. Then you have to check the existence of each element of the other list in the produced hash table, which is (at least?) O(n), assuming that checking existence of an element in the hash table is constant time. All in all, we end up with O(n) for the check.
The Java List interface defines equals as each corresponding element being equal.
Interestingly, the Java Collection interface definition almost discourages implementing the equals() function.
Finally, the Java Set interface, per its documentation, implements this very behaviour. The implementation should be very efficient, but the documentation makes no mention of performance. (I couldn't find a link to the source; it's probably too strictly licensed. Download it and look at it yourself; it comes with the JDK.) Looking at the source, HashSet (a commonly used implementation of Set) delegates its equals() implementation to AbstractSet, which uses the containsAll() function of AbstractCollection, which in turn uses contains() from HashSet. So HashSet.equals() runs in O(n), as expected (looping through all elements and looking each up in constant time in the hash table).
Please edit if you know better, to spare me the embarrassment.
Pseudocode:
A : array
B : array
C : hashtable

if A.length != B.length then return false;

foreach objA in A
{
    H = objA;
    if H is not found in C.Keys then
        C.add(H as key, 1 as initial value);
    else
        C.Val[H as key]++;
}

foreach objB in B
{
    H = objB;
    if H is not found in C.Keys then
        return false;
    else
        C.Val[H as key]--;
}

if (C contains a non-zero value)
    return false;
else
    return true;
The best way is probably to use hashmaps. Since insertion into a hashmap is O(1), building a hashmap from one array should take O(n). You then have n lookups, which each take O(1), so another O(n) operation. All in all, it's O(n).
In python:
def comparray(a, b):
sa = set(a)
return len(sa)==len(b) and all(el in sa for el in b)
Ignoring the built-in ways to do this in C#, you could do something like this:
It's O(1) in the best case (when the lengths differ) and O(N) (per list) in the worst case.
public bool MatchArrays(object[] array1, object[] array2)
{
    if (array1.Length != array2.Length)
        return false;

    bool retValue = true;
    // A set is all we need here; Hashtable.Add would require a key and a value.
    HashSet<object> ht = new HashSet<object>();
    for (int i = 0; i < array1.Length; i++)
    {
        ht.Add(array1[i]);
    }
    for (int i = 0; i < array2.Length; i++)
    {
        if (!ht.Contains(array2[i]))
        {
            retValue = false;
            break;
        }
    }
    return retValue;
}
Upon collisions a hashmap is O(n) in most cases, because it uses a linked list to store the collisions. However, there are better approaches, and you should hardly have collisions anyway, because if you did the hashmap would be useless. In all regular cases it's simply O(1). Besides that, it's not likely to have more than a small number of collisions in a single hashmap, so performance wouldn't suck that badly; you can safely say that it's O(1), or almost O(1), because the n is so small it can be ignored.
Here is another option; let me know what you think. It should be T(n) = 2n*log2(n), i.e. O(n log n), in the worst case.
private boolean compare(List listA, List listB) {
    if (listA.isEmpty() && listB.isEmpty()) return true;
    List<MatchingItem> runner = new ArrayList<>();
    List maxList = listA.size() > listB.size() ? listA : listB;
    List minList = listA.size() > listB.size() ? listB : listA;
    int matches = 0;
    List nextList = null;
    int maxLength = maxList.size();
    for (int i = 0; i < maxLength; i++) {
        for (int j = 0; j < 2; j++) {
            // alternate between the two lists on every pass
            nextList = (nextList == null) ? maxList : (maxList == nextList) ? minList : maxList;
            if (i < nextList.size()) {
                MatchingItem nextItem = new MatchingItem(nextList.get(i), nextList);
                int position = runner.indexOf(nextItem);
                if (position < 0) {
                    runner.add(nextItem);
                } else {
                    MatchingItem itemInBag = runner.get(position);
                    if (itemInBag.getList() != nextList) matches++;
                    runner.remove(position);
                }
            }
        }
    }
    return maxLength == matches;
}

public class MatchingItem {
    private Object item;
    private List itemList;

    public MatchingItem(Object item, List itemList) {
        this.item = item;
        this.itemList = itemList;
    }

    public boolean equals(Object other) {
        MatchingItem otherItem = (MatchingItem) other;
        // equal payloads coming from the *other* list count as a match
        return otherItem.item.equals(this.item) && otherItem.itemList != this.itemList;
    }

    public Object getItem() { return this.item; }
    public List getList() { return this.itemList; }
}
The best I can think of is O(n^2), I guess.
function compare($foo, $bar) {
    if (count($foo) != count($bar)) return false;
    foreach ($foo as $f) {
        foreach ($bar as $b) {
            if ($f == $b) {
                // $f exists in $bar, skip to the next $foo
                continue 2;
            }
        }
        return false;
    }
    return true;
}