Complexity of |= in Ruby - ruby

What is the complexity (Big O), for this operation:
my_array |= [new_element]
Is it O(n) because it needs to go through the existing array checking if new_element exists?

Let's expand upon Wand Maker's comment.
Take a look at
http://ruby-doc.org/core-2.2.3/Array.html#method-i-7C
https://github.com/ruby/ruby/blob/trunk/array.c
Source for rb_ary_or
static VALUE
rb_ary_or(VALUE ary1, VALUE ary2)
{
VALUE hash, ary3;
long i;
ary2 = to_ary(ary2);
hash = ary_make_hash(ary1);
for (i=0; i<RARRAY_LEN(ary2); i++) {
VALUE elt = RARRAY_AREF(ary2, i);
if (!st_update(RHASH_TBL_RAW(hash), (st_data_t)elt, ary_hash_orset, (st_data_t)elt)) {
RB_OBJ_WRITTEN(hash, Qundef, elt);
}
}
ary3 = rb_hash_values(hash);
ary_recycle_hash(hash);
return ary3;
}
I would say that the answer to your question is "yes" (at best -- see cliffordheath's comment), since we have O(n1) for ary_make_hash(ary1) and O(n2) for the for loop.
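To see the semantics (the right-hand element is appended only if it is not already present), here is a small illustrative Ruby snippet:
my_array = [1, 2, 3]
my_array |= [3]   # no change: 3 is already present
my_array |= [4]   # 4 is appended
my_array          # => [1, 2, 3, 4]
Note that even when the right-hand array has a single element, ary_make_hash builds a hash over the entire receiver, so a single |= costs time proportional to my_array.length rather than constant time.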

Related

Complexity of the word break algorithm

My question is similar to one asked on Stack Overflow in the past:
http://www.geeksforgeeks.org/dynamic-programming-set-32-word-break-problem/
I wrote a solution, but since I do not use DP I am not able to understand how my solution handles the overlapping subproblems. I think it does not. Can someone clarify?
The dictionary I use is {"cat", "catdog", "dog", "mouse"} and the test string is "catdogmouse".
Here is the method I wrote:
public static boolean recursiveWordBreak2(String s, int start) {
System.out.println("s is:"+s.substring(start));
if (s.isEmpty() || start >= s.length()) {
return true;
}
for (int i = start; i <= s.length(); i++) {
String str = s.substring(start, i);
System.out.println("substr:" + str);
if (dictSet.contains(str)) {
return recursiveWordBreak2(s, i);
}
}
return false;
}
Your solution uses recursion only. Recognizing that the problem is DP allows you to memoize (remember) previous results so that you can reuse them without doing the recursion again.
In the link you provided, if the dictionary is {a, b, c, d, e} and the input is "abcde", you would need to check whether "cde" is valid twice with the recursive code, whereas a DP solution would remember that "cde" is valid and only have to check once.
Edit: dictionary {a, b, c, d, e} should be {a, ab, cde} to illustrate checking "cde" twice.
Edit 2 (see the comment about the algorithm having a logic issue):
if (dictSet.contains(str)) {
return recursiveWordBreak2(s, i);
}
should be
if (dictSet.contains(str) && recursiveWordBreak2(s, i)) { return true; }
That way, if contains is true but the recursive call returns false, the outer loop continues checking longer substrings instead of returning false immediately.
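To make the memoization concrete, here is a minimal sketch in Ruby rather than Java (DICT and word_break? are illustrative names, not from the original post); it caches the answer for each start index so no suffix is explored twice:
require 'set'

DICT = Set.new(["cat", "catdog", "dog", "mouse"])

def word_break?(s, start = 0, memo = {})
  return true if start >= s.length
  return memo[start] unless memo[start].nil?   # reuse a previously computed answer
  memo[start] = (start + 1..s.length).any? do |i|
    DICT.include?(s[start...i]) && word_break?(s, i, memo)
  end
end

word_break?("catdogmouse")  # => true
With memoization each start index is solved at most once, so in the {a, ab, cde}/"abcde" example above, "cde" is checked a single time.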

Does Ruby uniq preserve ordering?

The documentation doesn't say anything about that (http://www.ruby-doc.org/core-2.2.0/Array.html#method-i-uniq).
Also, is it using a naive O(n^2) search or something else like a hash map? In the latter case, should I understand that my elements must have proper implementations of hash and eql? when I want to deduplicate them?
Given the code (in C) for the Array#uniq
static VALUE
rb_ary_uniq(VALUE ary)
{
VALUE hash, uniq, v;
long i;
if (RARRAY_LEN(ary) <= 1)
return rb_ary_dup(ary);
if (rb_block_given_p()) {
hash = ary_make_hash_by(ary);
uniq = ary_new(rb_obj_class(ary), RHASH_SIZE(hash));
st_foreach(RHASH_TBL(hash), push_value, uniq);
}
else {
hash = ary_make_hash(ary);
uniq = ary_new(rb_obj_class(ary), RHASH_SIZE(hash));
for (i=0; i<RARRAY_LEN(ary); i++) {
st_data_t vv = (st_data_t)(v = rb_ary_elt(ary, i));
if (st_delete(RHASH_TBL(hash), &vv, 0)) {
rb_ary_push(uniq, v);
}
}
}
ary_recycle_hash(hash);
return uniq;
}
In the general case (the else block), it creates a hash from the array (which unifies the keys without keeping the order). Then it creates a new empty array with the right size. Finally it goes through the original array, and whenever it finds an element that is still a key in the hash, it deletes that key and pushes the element onto the new array.
Hence the order is kept.
I'd say the complexity is O(complexity(ary_make_hash) + N) in time, which is probably O(N)
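A short illustrative Ruby session confirming both points (the Point class is just an example, not anything from the question):
[3, 1, 3, 2, 1].uniq   # => [3, 1, 2]  -- first occurrences, original order kept

class Point
  attr_reader :x, :y
  def initialize(x, y)
    @x, @y = x, y
  end
  # uniq deduplicates via an internal hash table, so it relies on hash and eql?
  def hash
    [x, y].hash
  end
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

[Point.new(1, 2), Point.new(1, 2), Point.new(3, 4)].uniq.size  # => 2
Without the hash and eql? overrides, the two Point.new(1, 2) instances would be treated as distinct and the result would be 3.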

What is the time complexity of the Array#uniq method in Ruby?

Can anyone please tell me which algorithm is used internally by Ruby to remove duplicates from a Ruby array with the Array#uniq method?
From the docs:
static VALUE
rb_ary_uniq(VALUE ary)
{
VALUE hash, uniq, v;
long i;
if (RARRAY_LEN(ary) <= 1)
return rb_ary_dup(ary);
if (rb_block_given_p()) {
hash = ary_make_hash_by(ary);
uniq = ary_new(rb_obj_class(ary), RHASH_SIZE(hash));
st_foreach(RHASH_TBL(hash), push_value, uniq);
}
else {
hash = ary_make_hash(ary);
uniq = ary_new(rb_obj_class(ary), RHASH_SIZE(hash));
for (i=0; i<RARRAY_LEN(ary); i++) {
st_data_t vv = (st_data_t)(v = rb_ary_elt(ary, i));
if (st_delete(RHASH_TBL(hash), &vv, 0)) {
rb_ary_push(uniq, v);
}
}
}
ary_recycle_hash(hash);
return uniq;
}
It has O(N) complexity
Amortized O(n) as it uses Hash internally.
This depends on which "internals" you are talking about. There are 7 production-ready Ruby implementations in current use, and the Ruby Language Specification does not prescribe any particular algorithm. So, it really depends on the implementation.
E.g., this is the implementation Rubinius uses:
Rubinius.check_frozen
if block_given?
im = Rubinius::IdentityMap.from(self, &block)
else
im = Rubinius::IdentityMap.from(self)
end
return if im.size == size
array = im.to_array
@tuple = array.tuple
@start = array.start
@total = array.total
self
And this is the one from JRuby:
RubyHash hash = makeHash();
if (realLength == hash.size()) return makeShared();
RubyArray result = new RubyArray(context.runtime, getMetaClass(), hash.size());
int j = 0;
try {
for (int i = 0; i < realLength; i++) {
IRubyObject v = elt(i);
if (hash.fastDelete(v)) result.values[j++] = v;
}
} catch (ArrayIndexOutOfBoundsException aioob) {
concurrentModification();
}
result.realLength = j;
return result;
It compares elements using their hash (provided by the hash method), and when two hashes collide it falls back to comparing the elements with eql?.
Source: https://github.com/ruby/ruby/blob/trunk/array.c#L3976
The time complexity is linear, i.e. O(n), as it uses a Hash for the internal implementation of the algorithm.
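To convince yourself empirically, here is a rough benchmark sketch (timings are illustrative and machine-dependent); the running time should grow roughly in proportion to the array size:
require 'benchmark'

[100_000, 200_000, 400_000].each do |n|
  ary = Array.new(n) { rand(n) }
  puts format("n=%-7d uniq: %.4fs", n, Benchmark.realtime { ary.uniq })
end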

Calculate Space and Time complexity and Improve efficiency of this program

Problem
Find the non-repeating numbers in an array that contains repeated numbers.
My Solution
public static int[] FindNonRepeatedNumber(int[] input)
{
List<int> nonRepeated = new List<int>();
bool repeated = false;
for (int i = 0; i < input.Length; i++)
{
repeated = false;
for (int j = 0; j < input.Length; j++)
{
if ((input[i] == input[j]) && (i != j))
{
//this means the element is repeated.
repeated = true;
break;
}
}
if (!repeated)
{
nonRepeated.Add(input[i]);
}
}
return nonRepeated.ToArray();
}
Time and space complexity
Time complexity = O(n^2)
Space complexity = O(n)
I am not sure about the time complexity calculated above; also, how can I make this program more efficient and faster?
The complexity of the algorithm you provided is O(n^2).
Use a hash map to improve the algorithm. The pseudocode is as follows:
// Java-style sketch; requires java.util.* imports
public static int[] findNonRepeatedNumbers(int[] a)
{
    Hashtable<Integer, Integer> testMap = new Hashtable<>();
    // First pass: count occurrences of each element
    for (int value : a) {
        Integer tmp = testMap.get(value);
        testMap.put(value, tmp == null ? 1 : tmp + 1);
    }
    // Second pass: elements whose count is exactly 1 are not repeated
    List<Integer> nonRepeated = new ArrayList<>();
    for (Map.Entry<Integer, Integer> me : testMap.entrySet()) {
        if (me.getValue() == 1) {
            nonRepeated.add(me.getKey());
        }
    }
    return nonRepeated.stream().mapToInt(Integer::intValue).toArray();
}
Operation:
What I did here is use a hash map whose keys are the elements of the input array. The values act as counters: the first time an element occurs its counter is set to 1, and the counter is incremented on every subsequent occurrence of that element in the input array.
So finally you just walk the hash map and collect the keys whose counter is 1; those are the non-repeated elements. The time complexity of this algorithm is O(k) for building the hash map and O(k) for the final scan, if the input array length is k. This is much faster than O(n^2). The worst case is when there are no repeated elements at all. The pseudocode might be messy, but this approach is the best way I could think of.
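As an aside, the same counting idea is a one-liner in Ruby (2.7+ for tally), shown only to illustrate the approach:
input = [4, 5, 4, 6, 7, 5]
input.tally.select { |_, count| count == 1 }.keys   # => [6, 7]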
To get O(n) time complexity you cannot have a nested loop over the input; a full inner loop over the array makes the algorithm O(n^2).
Use two pointers, beginning and end. Increment beginning when the same value is reached, and store the start position, end position and length for reference; otherwise increment end. Keep doing this until the end of the list, then compare all the stored results and you have the longest continuous run of unique numbers. I hope this is what the question required. Linear algorithm.
/* pseudocode: savetolist, savedlist and maxelementof are helpers assumed by the author */
int longestcontinuousunique(int arr[], int length)
{
    int start = 0;
    int end = 0;
    while (end != length)
    {
        if (arr[start] == arr[end])
        {
            start++;
            savetolist(start, end, end - start);
        }
        else
            end++;
    }
    return maxelementof(savedlist);
}

Remove duplicate items with minimal auxiliary memory?

What is the most efficient way to remove duplicate items from an array under the constraint that auxiliary memory usage must be kept to a minimum, preferably small enough not to require any heap allocations? Sorting seems like the obvious choice, but this is clearly not asymptotically efficient. Is there a better algorithm that can be done in place or close to in place? If sorting is the best choice, what kind of sort would be best for something like this?
I'll answer my own question since, after posting, I came up with a really clever algorithm to do this. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) in time complexity. The algorithm is as follows:
Take the first element of the array, this will be the sentinel.
Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to sentinel.
Move all elements for which the index is equal to the hash to the beginning of the array.
Move all elements that are equal to sentinel, except the first element of the array, to the end of the array.
What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N), provided there is no pathological scenario in the hashing:
Even if there are no duplicates, approximately 2/3 of the elements are eliminated at each recursion. Each level of recursion is O(n), where small n is the number of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
uniqueInPlaceImpl(dataIn, 0);
}
void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
if(dataIn.length - start < 2)
return;
invariant T sentinel = dataIn[start];
T[] data = dataIn[start + 1..$];
static hash_t getHash(T elem) {
static if(is(T == uint) || is(T == int)) {
return cast(hash_t) elem;
} else static if(__traits(compiles, elem.toHash)) {
return elem.toHash;
} else {
static auto ti = typeid(typeof(elem));
return ti.getHash(&elem);
}
}
for(size_t index = 0; index < data.length;) {
if(data[index] == sentinel) {
index++;
continue;
}
auto hash = getHash(data[index]) % data.length;
if(index == hash) {
index++;
continue;
}
if(data[index] == data[hash]) {
data[index] = sentinel;
index++;
continue;
}
if(data[hash] == sentinel) {
swap(data[hash], data[index]);
index++;
continue;
}
auto hashHash = getHash(data[hash]) % data.length;
if(hashHash != hash) {
swap(data[index], data[hash]);
if(hash < index)
index++;
} else {
index++;
}
}
size_t swapPos = 0;
foreach(i; 0..data.length) {
if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
swap(data[i], data[swapPos++]);
}
}
size_t sentinelPos = data.length;
for(size_t i = swapPos; i < sentinelPos;) {
if(data[i] == sentinel) {
swap(data[i], data[--sentinelPos]);
} else {
i++;
}
}
dataIn = dataIn[0..sentinelPos + start + 1];
uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
Keeping auxiliary memory usage to a minimum, your best bet would be to do an efficient sort to get the items in order, then do a single pass of the array with a FROM and a TO index.
You advance the FROM index every time through the loop. You only copy the element from FROM to TO (and increment TO) when the key is different from the last.
With Quicksort, that averages O(n log n) for the sort plus O(n) for the final pass.
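Here is a minimal Ruby sketch of that sort-then-compact idea (the method name is illustrative and assumes comparable elements; Ruby is used here only to show the index walk, not to claim O(1) auxiliary memory for sort! itself):
def unique_in_place!(arr)
  return arr if arr.length < 2
  arr.sort!                        # O(n log n)
  to = 0
  (1...arr.length).each do |from|
    if arr[from] != arr[to]        # copy only when the key changes
      to += 1
      arr[to] = arr[from]
    end
  end
  arr.slice!((to + 1)..-1)         # drop the leftover tail
  arr
end

unique_in_place!([3, 1, 2, 3, 2, 1])  # => [1, 2, 3]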
If you sort the array, you will still need another pass to remove duplicates, so the complexity is O(N*N) in the worst case (assuming Quicksort), or O(N*sqrt(N)) using Shellsort.
You can achieve O(N*N) by simply scanning the array for each element removing duplicates as you go.
Here is an example in Lua:
function removedups (t)
local result = {}
local count = 0
local found
for i,v in ipairs(t) do
found = false
if count > 0 then
for j = 1,count do
if v == result[j] then found = true; break end
end
end
if not found then
count = count + 1
result[count] = v
end
end
return result, count
end
I don't see any way to do this without something like a bubblesort. When you find a dupe, you need to reduce the length of the array. Quicksort is not designed for the size of the array to change.
This algorithm is always O(n^2), but it also uses almost no extra memory -- stack or heap.
// returns the new size
int bubblesqueeze(int* a, int size) {
for (int j = 0; j < size - 1; ++j) {
for (int i = j + 1; i < size; ++i) {
// when a dupe is found, move the end value to index j
// and shrink the size of the array
while (i < size && a[i] == a[j]) {
a[i] = a[--size];
}
if (i < size && a[i] < a[j]) {
int tmp = a[j];
a[j] = a[i];
a[i] = tmp;
}
}
}
return size;
}
If you have two different variables for traversing a dataset instead of just one, then you can limit the output by dismissing all duplicates that are already in the dataset.
Obviously this example in C is not an efficient sorting algorithm, but it is just an example of one way to look at the problem.
You could also blindly sort the data first and then relocate the data for removing dups, but I'm not sure that would be faster.
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LENGTH 15
int stop = 1;
int scan_sort[ARRAY_LENGTH] = {5,2,3,5,1,2,5,4,3,5,4,8,6,4,1};
void step_relocate(char tmp,char s,int *dataset)
{
for(;tmp<s;s--)
dataset[s] = dataset[s-1];
}
int exists(int var,int *dataset)
{
int tmp=0;
for(;tmp < stop; tmp++)
{
if( dataset[tmp] == var)
return 1;/* value exsist */
if( dataset[tmp] > var)
tmp=stop;/* Value not in array*/
}
return 0;/* Value not in array*/
}
void main(void)
{
int tmp1=0;
int tmp2=0;
int index = 1;
while(index < ARRAY_LENGTH)
{
if(exists(scan_sort[index],scan_sort))
;/* Dismiss all values currently in the final dataset */
else if(scan_sort[stop-1] < scan_sort[index])
{
scan_sort[stop] = scan_sort[index];/* Insert the value as the highest one */
stop++;/* One more value adde to the final dataset */
}
else
{
for(tmp1=0;tmp1<stop;tmp1++)/* find where the data shall be inserted */
{
if(scan_sort[index] < scan_sort[tmp1])
{
index = index;
break;
}
}
tmp2 = scan_sort[index]; /* Store in case this value is the next after stop*/
step_relocate(tmp1,stop,scan_sort);/* Relocated data already in the dataset*/
scan_sort[tmp1] = tmp2;/* insert the new value */
stop++;/* One more value adde to the final dataset */
}
index++;
}
printf("Result: ");
for(tmp1 = 0; tmp1 < stop; tmp1++)
printf( "%d ",scan_sort[tmp1]);
printf("\n");
system( "pause" );
}
I liked the problem so I wrote a simple C test prog for it as you can see above. Make a comment if I should elaborate or you see any faults.
