What is the complexity of comparing 2 ordered sets in C++ using the built-in == operator? (Assuming set size = n) - c++14

Can someone tell me the STL implementation of the == operator for set? Let's say I have 2 sets s1 and s2, and I want to compare them with if(s1==s2) - is that constant time or linear time?

Have a look at operator== for std::set:
template< class Key, class Compare, class Alloc >
bool operator==( const std::set<Key,Compare,Alloc>& lhs,
                 const std::set<Key,Compare,Alloc>& rhs );
which is accompanied by the following explanation:
Compares the contents of two containers.
1-2) Checks if the contents of lhs and rhs are equal, that is, they have the same number of elements and each element in lhs compares equal with the element in rhs at the same position.
That should answer the first part of your question. And with respect to time complexity, further down we have:
Complexity
1-2) Constant if lhs and rhs are of different size, otherwise linear in the size of the container
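That matches how such an operator is typically implemented: a size check first, then an element-by-element comparison. A rough illustrative sketch (not the actual standard library source; set_equal is just a stand-in name):
#include <algorithm>
#include <set>

template< class Key, class Compare, class Alloc >
bool set_equal( const std::set<Key,Compare,Alloc>& lhs,
                const std::set<Key,Compare,Alloc>& rhs )
{
    if (lhs.size() != rhs.size()) return false;              // O(1): different sizes, not equal
    return std::equal(lhs.begin(), lhs.end(), rhs.begin());  // otherwise linear in n
}
So comparing two equal-sized sets of n elements is O(n), and the best case (different sizes) is O(1).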

Related

Linked List - Remove numbers from a specified range

I have a linked list that contains numbers from 0 to 1 and my task is to remove numbers from a given range (x, y) from this list. Do you have any idea how to solve that problem in a reasonable complexity?
Let's first think about how a LinkedList is structured.
Each element in a (doubly) linked list has a pointer to the next (and the previous) element. The Java class LinkedList is for example a doubly-linked list.
In such a list there is no direct access like "give me the index of element B". We just have a head reference (pointing at the start of the list) and a tail reference (pointing at the end). To find element B, we need to start at head (or tail) and walk through the entire list, following the next (or prev) pointers of the elements until we find element B.
So, back to your question: there is no efficient way to remove elements in the range (x, y) from a LinkedList. This can only be done efficiently in sorted structures like a PriorityQueue or a sorted ArrayList (where binary search yields O(log(n))), or in structures with direct access to elements, like a HashSet.
Here is a code snippet in Java that solves your task for LinkedList, however, as stated, it is not efficient and has a running time of O(n) (we need to take a look at each element in order to find out which elements need to be deleted):
LinkedList<Integer> list = ...
// Inclusive lower bound
int lowerBound = ...
// Exclusive upper bound
int upperBound = ...
ListIterator<Integer> listIter = list.listIterator();
while (listIter.hasNext()) {
    int value = listIter.next();
    // Check if the value is inside the bounds
    if (value >= lowerBound && value < upperBound) {
        // Remove the element from the list using the iterator,
        // which prevents ConcurrentModificationException
        listIter.remove();
    }
}
If you think about it, a LinkedList has no getAtIndex method. You can only start from the head and work your way to the tail, or vice versa. The complexity of this would be O(n).

The complexity of the LRU cache algorithm

I have a task to implement an LRU cache, and the longest operation in the system should take O(log(n)). As my cache I use std::map. I still need a second container for storing key + creation time, sorted by time. And when I need to update an entry in the cache it should take roughly:
Find by key: O(log(n)).
Remove via an iterator: O(1).
Insert a new element: O(log(n)).
The oldest element should naturally reside at container.begin().
I can use only STL.
List - does not suit me: finding an element in a list is O(n).
Priority queue - deleting arbitrary items is not supported.
I think it could ideally be stored in a std::set;
std::set<pair<KEY, TIME>>;
Sort std::set:
struct compare
{
    bool operator ()(const pair<K, TIME> &a, const pair<K, TIME> &b)
    {
        return a.second < b.second;
    }
};
And to find a key in the std::set, I would write a function which looks for the first element of the pair in std::set<pair<KEY, TIME>>.
What do you think? Can anyone tell if this suits my specified complexity requirements?
Yes, you can use a map plus a set to get O(log n) complexity for deleting/updating/inserting.
The map stores the key/value pairs.
The set should store (time, key), in this order (you have done the opposite). When the cache is full and you want to evict a key, it will correspond to the element in the set that myset.begin() points to.
Having said that, you can improve performance by using a hash map + doubly linked list.
You can achieve O(1) complexity when you choose the proper data structures:
template<typename Key_t, typename Value_t>
class LruCache {
    ....
    using Order_t = std::list<Key_t>;
    Order_t m_order;
    std::unordered_map<Key_t, std::pair<typename Order_t::iterator, Value_t>> m_container;
};
m_order is a list. You add elements at the beginning or at the end of the list (O(1)).
Removing an item from the list if you have an iterator to it: m_order.erase(it) - O(1).
Removing the least recently used key from the list: pop_front/pop_back - O(1).
When you need to find a key, use the hash map's find (O(1) on average).
When you find a key, you get a value which holds the real value plus an iterator to the proper item in the list.
The whole operation can then be O(1).
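To make that concrete, here is a minimal sketch of the list + unordered_map design (illustrative only; the class and member names are mine, not from any library, and Key is assumed to be hashable). Every operation is O(1) on average, matching the claim above:
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : m_capacity(capacity) {}

    // Returns a pointer to the cached value, or nullptr if the key is absent.
    Value* get(const Key& key) {
        auto it = m_map.find(key);                     // O(1) on average
        if (it == m_map.end()) return nullptr;
        // Mark as most recently used: move its list node to the front in O(1).
        m_order.splice(m_order.begin(), m_order, it->second.first);
        return &it->second.second;
    }

    void put(const Key& key, Value value) {
        auto it = m_map.find(key);
        if (it != m_map.end()) {                       // existing key: update and touch
            it->second.second = std::move(value);
            m_order.splice(m_order.begin(), m_order, it->second.first);
            return;
        }
        if (m_capacity == 0) return;
        if (m_map.size() == m_capacity) {              // full: evict the least recently used
            m_map.erase(m_order.back());
            m_order.pop_back();
        }
        m_order.push_front(key);
        m_map.emplace(key, std::make_pair(m_order.begin(), std::move(value)));
    }

private:
    std::size_t m_capacity;
    std::list<Key> m_order;                            // front = most recent, back = least recent
    std::unordered_map<Key,
        std::pair<typename std::list<Key>::iterator, Value>> m_map;
};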

Lua 5.1 # operator and string comparisons (performance)

For this:
b = #{1,2,3}
c = 'deadbeef' == 'deadbabe'
Does b get computed in O(n) or O(1)? In what scenario? Is the behavior consistent, or context-dependent like the behavior of sparse arrays?
Is string comparison O(1) or O(n)? I know strings are immutable and Lua compares hash values, but what if 2 different strings hash to the same value?
Please, don't answer with "Don't worry about low-level behavior, son". I am interested in low-level behavior. Thank you.
EDIT
3) Is the result of # stored somewhere, or is it calculated each time I call it for the same array?
The length of tables is computed in O(log n). The algorithm is roughly as follows:
Try to find some integer index mapped to nil by probing with a step whose size doubles each time. (If you find a nil value at the end of the array part, you can skip this part.)
When such an index is found, use a divide-and-conquer (binary search) step on the interval between this index and the last known non-nil index to find a non-nil value directly followed by a nil value.
See the details in luaH_getn in ltable.c. This algorithm works well if you have a contiguous sequence of values, but it can produce unexpected results if the array has holes in it.
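The border search itself can be sketched like this (in C++ just to show the idea; this is not Lua's actual ltable.c code, and present(i) stands for "t[i] is not nil"):
#include <cstddef>

// Find some i with present(i) true and present(i + 1) false,
// assuming such a border exists.
template <typename Pred>
std::size_t find_border(Pred present) {
    std::size_t i = 0, j = 1;
    while (present(j)) {            // exponential probing: double the step each time
        i = j;
        j *= 2;
    }
    while (j - i > 1) {             // binary search between the last hit and the first miss
        std::size_t m = i + (j - i) / 2;
        if (present(m)) i = m; else j = m;
    }
    return i;
}
For a table like {1, 2, 3}, present(1..3) is true and present(4) is false, so the border found is 3; the number of probes is O(log n).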
EDIT: The results of the builtin # operator are not cached, so the above algorithm runs every time you use # on a table (without __len metamethod).
Regarding string comparisons (for equality):
Newer Lua versions have two types of strings internally: short strings (usually up to 40 bytes) and long strings. Long strings are compared using memcmp (if the lengths match), so you get O(n). Short strings on the other hand are "interned" meaning that when you create a certain short string in Lua, it is checked whether a string with the same contents already exists. If so the old string object is reused, and no new string is allocated. This means that you can simply compare memory addresses to check for equality of short strings, which is O(1).
Lua strings are stored in a table to avoid creating duplicates of the same strings, so every time a string is created it needs to be hashed and compared to anything with the same hash value as part of its creation.
The comparison of string objects after creation is O(1) as Lua already ensured they reference a unique string so Lua just compares the underlying pointers.
As all strings are internalized, string equality becomes pointer equality:
#define eqstr(a,b) ((a) == (b))   /* from lstring.h */
x = "deadbeef" -- put in string table
y = "deadbabe" -- put in string table
c = x == y -- compared pointers
For the table case you presented:
From the implementation of int luaH_getn(Table *t) in ltable.c:
t = {1, 2, 3} -- requires creating a table, hashing all the values etc.
b = #t -- constant time as array part is full and no hash part (ergo # is the array size)
t[3] = nil
b = #t -- boundary inside array part, binary search in array, b=2
b = #t -- another binary search
t = {1, 2, 3, [1000]=4}
b = #t -- array is full, and 4 is not a key in the hash, b = 3

How to "sort" elements of 2 possible values in place in linear time? [duplicate]

This question already has answers here:
Stable separation for two classes of elements in an array
(3 answers)
Closed 9 years ago.
Suppose I have a function f and array of elements.
The function returns A or B for any element; you could visualize the elements this way ABBAABABAA.
I need to sort the elements according to the function, so the result is: AAAAAABBBB
The number of A values doesn't have to equal the number of B values. The total number of elements can be arbitrary (not fixed). Note that you don't sort chars, you sort objects that have a single char representation.
A few more things:
the sort should take linear time - O(n),
it should be performed in place,
it should be a stable sort.
Any ideas?
Note: if the above is not possible, do you have ideas for algorithms sacrificing one of the above requirements?
If it has to be linear and in-place, you could do a semi-stable version. By semi-stable I mean that A or B could be stable, but not both. Similar to Dukeling's answer, but you move both iterators from the same side:
a = first A
b = first B
loop while next A exists
    if b < a
        swap a,b elements
        b = next B
        a = next A
    else
        a = next A
With the sample string ABBAABABAA, you get:
ABBAABABAA
AABBABABAA
AAABBBABAA
AAAABBBBAA
AAAAABBBBA
AAAAAABBBB
On each turn, if you make a swap you move both iterators; if not, you just move a. This will keep A stable, but B will lose its ordering. To keep B stable instead, start from the end and work your way left.
It may be possible to do it with full stability, but I don't see how.
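In C++ the same semi-stable idea (A's keep their relative order, B's do not) can be sketched with one scanning index and one boundary index; is_a is a placeholder predicate meaning "this element is an A":
#include <cstddef>
#include <utility>
#include <vector>

template <typename T, typename Pred>
void semi_stable_sort(std::vector<T>& v, Pred is_a) {
    std::size_t b = 0;                        // first position that holds a B
    while (b < v.size() && is_a(v[b])) ++b;
    for (std::size_t a = b + 1; a < v.size(); ++a) {
        if (is_a(v[a])) {
            std::swap(v[a], v[b]);            // pull the A back to the boundary
            ++b;                              // the displaced B ends up at position a
        }
    }
}
Run on ABBAABABAA this also produces AAAAAABBBB, with the A's still in their original order.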
A stable sort might not be possible with the other given constraints, so here's an unstable sort that's similar to the partition step of quick-sort.
1. Have 2 iterators, one starting on the left, one starting on the right.
2. While there's a B at the right iterator, decrement the iterator.
3. While there's an A at the left iterator, increment the iterator.
4. If the iterators haven't crossed each other, swap their elements and repeat from step 2 (a C++ sketch follows these steps).
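A sketch of these steps in C++ (is_a is again a placeholder predicate; the result is an unstable, in-place, O(n) partition):
#include <cstddef>
#include <utility>
#include <vector>

template <typename T, typename Pred>
void unstable_sort(std::vector<T>& v, Pred is_a) {
    if (v.empty()) return;
    std::size_t left = 0, right = v.size() - 1;
    while (true) {
        while (left < right && is_a(v[left])) ++left;     // step 3: skip A's on the left
        while (left < right && !is_a(v[right])) --right;  // step 2: skip B's on the right
        if (left >= right) break;                         // iterators met or crossed
        std::swap(v[left], v[right]);                     // a B on the left trades places with an A on the right
    }
}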
Let's say,
Object_Array[1...N]
Type_A objs are A1,A2,...Ai
Type_B objs are B1,B2,...Bj
i+j = N
FOR i = 1 : N
    if Object_Array[i] is of Type_A
        obj_A_count = obj_A_count + 1
    else
        obj_B_count = obj_B_count + 1
LOOP
Fill the resultant array with Type_A and Type_B objects according to their respective counts (all the A's first, then the B's).
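Note that this only reconstructs the array correctly when an element is fully determined by its class (plain characters, for example); for real objects the originals are lost. A sketch of the character case:
#include <algorithm>
#include <string>

void count_and_rewrite(std::string& s) {
    const auto a_count = std::count(s.begin(), s.end(), 'A');   // one pass to count the A's
    std::fill(s.begin(), s.begin() + a_count, 'A');             // first a_count slots become A
    std::fill(s.begin() + a_count, s.end(), 'B');               // the rest become B
}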
The following should work in linear time for a doubly-linked list. Because up to N insertions/deletions are involved, it may take quadratic time for arrays, though.
Find the location where the first B should be after "sorting". This can be done in linear time by counting As.
Start with 3 iterators: iterA starts from the beginning of the container, and iterB starts from the above location where As and Bs should meet, and iterMiddle starts one element prior to iterB.
With iterA skip over As, find the 1st B, and move the object from iterA to iterB->previous position. Now iterA points to the next element after where the moved element used to be, and the moved element is now just before iterB.
Continue with step 3 until you reach iterMiddle. After that all elements between first() and iterB-1 are As.
Now set iterA to iterB-1.
Skip over Bs with iterB. When A is found move it to just after iterA and increment iterA.
Continue step 6 until iterB reaches end().
This would work as a stable sort for any container. The algorithm includes O(N) insertions/deletions, which is linear time for containers with O(1) insertion/deletion but, alas, O(N^2) for arrays. Applicability in your case depends on whether the container is an array rather than a list.
If your data structure is a linked list instead of an array, you should be able to meet all three of your constraints. You just skim through the list, and accumulating and moving the "B"s will be trivial pointer changes. Pseudo code below:
sort(list) {
    node = list.head, blast = null, bhead = null
    while(node != null) {
        nextnode = node.next
        if(node.val == "a") {
            if(blast != null) {
                // move the 'a' to the front of the 'B' list
                bhead.prev.next = node, node.prev = bhead.prev
                blast.next = node.next, node.next.prev = blast
                node.next = bhead, bhead.prev = node
            }
        }
        else if(node.val == "b") {
            if(blast == null)
                bhead = blast = node
            else // accumulate the "b"s..
                blast = node
        }
        node = nextnode
    }
}
So, you can do this with an array too, but the memcopies that emulate the list splicing will make it quite slow for large arrays.
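For what it's worth, in C++ the same pointer-rewiring idea can be written with std::list::splice, which moves nodes in O(1) without copying elements; this sketch (is_a is a placeholder predicate) is stable, linear, and needs only a second list header as extra space:
#include <iterator>
#include <list>

template <typename T, typename Pred>
void stable_two_class_sort(std::list<T>& lst, Pred is_a) {
    std::list<T> bs;                             // collects the B's in their original order
    for (auto it = lst.begin(); it != lst.end(); ) {
        if (!is_a(*it)) {
            auto next = std::next(it);
            bs.splice(bs.end(), lst, it);        // unlink the B node and append it to bs
            it = next;
        } else {
            ++it;
        }
    }
    lst.splice(lst.end(), bs);                   // reattach all the B's after the A's
}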
Firstly, assuming the array of A's and B's is either generated or read in, I wonder why not avoid this problem entirely by simply applying f as the list is being accumulated into memory, splitting it into two lists that would subsequently be merged.
Otherwise, we can posit an alternative solution in O(n) time and O(1) space that may be sufficient depending on Sir Bohumil's ultimate needs:
Traverse the list and sort each segment of 1,000,000 elements in-place using the permutation cycles of the segment (once this step is done, the list could technically be sorted in-place by recursively swapping the inner-blocks, e.g., ABB AAB -> AAABBB, but that may be too time-consuming without extra space). Traverse the list again and use the same constant space to store, in two interval trees, the pointers to each block of A's and B's. For example, segments of 4,
ABBAABABAA => AABB AABB AA + pointers to blocks of A's and B's
Sequential access to A's or B's would be immediately available, and random access would come from using the interval tree to locate a specific A or B. One option could be to have the intervals number the A's and B's; e.g., to find the 4th A, look for the interval containing 4.
For sorting, an array of 1,000,000 four-byte elements (3.8MB) would suffice to store the indexes, using one bit in each element for recording visited indexes during the swaps; and two temporary variables the size of the largest A or B. For a list of one billion elements, the maximum combined interval trees would number 4000 intervals. Using 128 bits per interval, we can easily store numbered intervals for the A's and B's, and we can use the unused bits as pointers to the block index (10 bits) and offset in the case of B (20 bits). 4000*16 bytes = 62.5KB. We can store an additional array with only the B blocks' offsets in 4KB. Total space under 5MB for a list of one billion elements. (Space is in fact dependent on n but because it is extremely small in relation to n, for all practical purposes, we may consider it O(1).)
Time for sorting the million-element segments would be one pass to count and index (here we can also accumulate the intervals and B offsets) and one pass to sort. Constructing the interval tree is O(n log n), but n here is only 4000 (0.00005 of the one-billion list count). Total time: O(2n) = O(n).
This should be possible with a bit of dynamic programming.
It works a bit like counting sort, but with a key difference. Make arrays of size n for both A and B, count_a[n] and count_b[n]. Fill these arrays with how many A's or B's there have been before index i.
After just one loop, we can use these arrays to look up the correct index for any element in O(1). Like this:
int final_index(char id, int pos) {
    // count_a[pos] / count_b[pos] = number of A's / B's before index pos,
    // total_a = total number of A's in the whole array
    if (id == 'A')
        return count_a[pos];
    else
        return total_a + count_b[pos];
}
Finally, to meet the total O(n) requirement, the swapping needs to be done in a smart order. One simple option is to have a recursive swapping procedure that doesn't actually perform any swap until both elements would land in their correct final positions. EDIT: This is actually not necessary; even naive swapping will perform only O(n) swaps. But the recursive strategy gives you the absolute minimum number of required swaps.
Note that in the general case this would be a very bad sorting algorithm, since it has a memory requirement of O(n * element value range).
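A compact way to realize this idea for a character array (the names below are mine): compute each element's final position from the prefix counts, then apply that permutation by following its cycles. It performs at most O(n) swaps but, as noted, needs O(n) extra index memory, so it is not strictly in place:
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

void two_class_stable_sort(std::string& s) {        // elements are 'A' or 'B'
    const std::size_t n = s.size();
    std::vector<int> count_a(n + 1, 0), count_b(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i) {            // prefix counts: A's/B's before index i
        count_a[i + 1] = count_a[i] + (s[i] == 'A');
        count_b[i + 1] = count_b[i] + (s[i] == 'B');
    }
    std::vector<std::size_t> dest(n);                // final position of the element at i
    for (std::size_t i = 0; i < n; ++i)
        dest[i] = (s[i] == 'A') ? count_a[i] : count_a[n] + count_b[i];
    for (std::size_t i = 0; i < n; ++i) {            // apply the permutation cycle by cycle
        while (dest[i] != i) {
            std::swap(s[i], s[dest[i]]);
            std::swap(dest[i], dest[dest[i]]);
        }
    }
}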

Approximate string matching using backtracking

I would like to use backtracking to search for all substrings in a long string allowing for variable length matches - that is matches allowing for a maximum given number of mismatches, insertions, and deletions. I have not been able to locate any useful examples. The closest I have found is this paper here, but that is terribly complex. Anyone?
Cheers,
Martin
Algorithm
The function f() below uses recursion (i.e. backtracking) to solve your problem; ff() is a driver that tries every starting position in the haystack. The basic idea is that at the start of any call to f(), we are trying to match a suffix t of the original "needle" string against a suffix s of the "haystack" string, while allowing only a certain number of each type of edit operation.
#include <stdio.h>

// ss is the start of the haystack, used only for reporting the match endpoints.
void f(char* ss, char* s, char* t, int mm, int ins, int del) {
    while (*s && *s == *t) ++s, ++t;            // OK to always match longest segment
    if (!*t) printf("%d\n", (int)(s - ss));     // Matched; print endpoint of match
    if (mm && *s && *t) f(ss, s + 1, t + 1, mm - 1, ins, del);
    if (ins && *s) f(ss, s + 1, t, mm, ins - 1, del);
    if (del && *t) f(ss, s, t + 1, mm, ins, del - 1);
}
// Find all occurrences of t starting at any position in s, with at most
// mm mismatches, ins insertions and del deletions.
void ff(char* s, char* t, int mm, int ins, int del) {
    for (char* ss = s; *s; ++s) {
        // printf("Starting from offset %d...\n", (int)(s - ss));
        f(ss, s, t, mm, ins, del);
    }
}
Example call:
ff("xxabcydef", "abcdefg", 1, 1, 1);
This outputs:
9
9
because there are two ways to find "abcdefg" in "xxabcydef" with at most 1 of each kind of change, and both of these ways end at position 9:
Haystack: xxabcydef-
Needle:     abc-defg
which has 1 insertion (of y) and 1 deletion (of g), and
Haystack: xxabcyde-f
Needle:     abc-defg
which has 1 insertion (of y), 1 deletion (of f), and 1 substitution of g to f.
Dominance Relation
It may not be obvious why it's actually safe to use the while loop at the top of f() to greedily match as many characters as possible at the start of the two strings. In fact, this may reduce the number of times that a particular end position is reported as a match, but it will never cause an end position to be missed completely. Since we're usually interested only in whether there is a match ending at a given position of the haystack, and since without this while loop the algorithm would always take time exponential in the needle size, this seems a win-win.
It is guaranteed to work because of a dominance relation. To see this, suppose the opposite -- that it is in fact unsafe (i.e. misses some matches). Then there would be some match in which an initial segment of equal characters from both strings are not aligned to each other, for example:
Haystack: abbbbc
Needle:   a-b-bc
However, any such match can be transformed into another match having the same number of operations of each type, and ending at the same position, by shunting the leftmost character following a gap to the left of the gap:
Haystack: abbbbc
Needle:   ab--bc
If you do this repeatedly until it's no longer possible to shunt characters without requiring a substitution, you will get a match in which the largest initial segment of equal characters from both strings are aligned to each other:
Haystack: abbbbc
Needle:   abb--c
My algorithm will find all such matches, so it follows that no match position will be overlooked by it.
Exponential Time
Like any backtracking program, this function will exhibit exponential slowdowns on certain inputs. Of course, it may be that on the inputs you happen to use, this doesn't occur, and it works out faster than particular implementations of DP algorithms.
I would start with Levenshtein's distance algorithm, which is the standard approach when checking for string similarities via mismatch, insertion and deletion.
Since the algorithm uses bottom up dynamic programming, you'll probably be able to find all substrings without having to execute the algorithm for each potential substring.
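A minimal sketch of that DP (levenshtein is my own helper name), using a single rolling row:
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

int levenshtein(const std::string& a, const std::string& b) {
    std::vector<int> dp(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) dp[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        int prev = dp[0];                          // previous row, previous column
        dp[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int cur = dp[j];
            dp[j] = std::min({ dp[j] + 1,                          // deletion
                               dp[j - 1] + 1,                      // insertion
                               prev + (a[i - 1] != b[j - 1]) });   // substitution or match
            prev = cur;
        }
    }
    return dp[b.size()];
}
For the substring-matching variant hinted at above, you would initialize the first row to all zeros so a match may start anywhere in the text; positions where the last row is at most k then mark the ends of approximate occurrences.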
The nicest algorithm I'm aware of for this is A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming by Gene Myers. Given a text to search of length n, a pattern string to search for of length m and a maximum number of mismatches/insertions/deletions k, this algorithm takes time O(mn/w), where w is your computer's word size (32 or 64). If you know much about algorithms on strings, it's actually pretty incredible that an algorithm exists that takes time independent of k -- for a long time, this seemed an impossible goal.
I'm not aware of an existing implementation of the above algorithm. If you want a tool, agrep may be just what you need. It uses an earlier algorithm that takes time O(mnk/w), but it's fast enough for low k -- miles ahead of a backtracking search in the worst case.
agrep is based on the Shift-Or (or "Bitap") algorithm, which is a very clever dynamic programming algorithm that manages to represent its state as bits in an integer and get binary addition to do most of the work of updating the state. This is what speeds up the algorithm by a factor of 32 or 64 over a more typical implementation. :) Myers's algorithm also uses this idea to get its 1/w speed factor.
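To illustrate the bit-parallel state idea, here is a sketch of the Shift-And variant (the un-complemented cousin of Shift-Or) for plain exact matching; agrep and Myers's algorithm extend this to allow up to k errors, which is not shown here, and the function name is my own:
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::size_t> shift_and(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> ends;                      // indexes where a match ends
    if (pat.empty() || pat.size() > 64) return ends;    // one machine word of state
    std::uint64_t mask[256] = {};
    for (std::size_t j = 0; j < pat.size(); ++j)
        mask[static_cast<unsigned char>(pat[j])] |= std::uint64_t{1} << j;
    const std::uint64_t accept = std::uint64_t{1} << (pat.size() - 1);
    std::uint64_t state = 0;                            // bit j set: pat[0..j] matches the text ending here
    for (std::size_t i = 0; i < text.size(); ++i) {
        state = ((state << 1) | 1) & mask[static_cast<unsigned char>(text[i])];
        if (state & accept) ends.push_back(i);          // a full occurrence ends at text index i
    }
    return ends;
}
One word of state is updated per text character, which is where the factor-of-w speedup over a cell-by-cell DP table comes from.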

Resources