Speed up linear time comparison between two arrays? - algorithm

I have a speed bottleneck in my code right now. The following function compares two arrays (position and size) and produces a new array of position elements that are smaller than size. This runs in O(n) time but is called many times. Is there any way for me to do better for this very specific case?
code:
function findValidDimensions(positions, sizes) {
    var forwardDimensions = [];
    var backwardDimensions = [];
    for (var i = 0; i < sizes.length; i++) {
        if (positions[i] < sizes[i]) {
            forwardDimensions.push(i);
        }
        if (positions[i] > 1) {
            backwardDimensions.push(i);
        }
    }
    // we can go forward or backward
    return {
        "forward": forwardDimensions,
        "backward": backwardDimensions
    };
}

Unless there are elements you can avoid looking at (which, from your sparse description, sounds unlikely), looking at n elements will take O(n) time.

As far as I know, the complexity of your method gets too large if you use a big array; the best searching structures are the heap or the B-tree :)
Some references:
https://en.wikipedia.org/wiki/Heap_%28data_structure%29
https://en.wikipedia.org/wiki/B-tree

Related

Determining Time Complexity with and without Recursion

I'm learning data structures from a book. In the book, they have snippets of pseudocode at the end of the chapter and I'm trying to determine the time complexity. I'm having a little bit of difficulty understanding some concepts in time complexity.
I have two pieces of code that do the same thing, seeing if an element in an array occurs at least 3 times; however, one uses recursion and the other uses loops. I have answers for both; can someone tell me whether or not they're correct?
First way (without recursion):
boolean findTripleA(int[] anArray) {
    if (anArray.length <= 2) {
        return false;
    }
    for (int i = 0; i < anArray.length; i++) {
        // check if anArray[i] occurs at least three times
        // by counting how often it occurs in anArray
        int count = 0;
        for (int j = 0; j < anArray.length; j++) {
            if (anArray[i] == anArray[j]) {
                count++;
            }
        }
        if (count >= 3) {
            return true;
        }
    }
    return false;
}
I thought the first way had a time complexity of O(n^2) in the best and worst case scenarios because there is no way to avoid the inner for loop.
Second way (with recursion):
public static Integer findTripleB(int[] anArray) {
    if (anArray.length <= 2) {
        return null;
    }
    // use insertion sort to sort anArray
    for (int i = 1; i < anArray.length; i++) {
        // insert anArray[i]
        int j = i - 1;
        int element = anArray[i];
        while (j >= 0 && anArray[j] > element) {
            anArray[j + 1] = anArray[j];
            j--;
        }
        anArray[j + 1] = element;
    }
    // check whether anArray contains three consecutive
    // elements of the same value
    for (int i = 0; i < anArray.length - 2; i++) {
        if (anArray[i] == anArray[i + 2]) {
            return new Integer(anArray[i]);
        }
    }
    return null;
}
I thought the second way had a worst case time complexity of O(n^2) and a best case of O(n) if the array is sorted already and insertion sort can be skipped; however, I don't know how recursion comes into play.
The best case for the first one is O(n) - consider what happens when the element appearing first appears 3 times (it will return after just one iteration of the outer loop).
The second one doesn't use recursion, and your running times are correct (technically insertion sort won't 'be skipped', but the running time of insertion sort on an already sorted array is O(n) - just making sure we're on the same page).
Two other ways to solve the problem:
1. Sort using a better sorting algorithm such as mergesort. Would take O(n log n) best and worst case.
2. Insert the elements into a hash map of element to count (a sketch follows after this list). Would take expected O(n) worst case, O(1) best case.
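A minimal Java sketch of the hash-map idea (the method name and the early exit at count 3 are my own choices, not from the answer):
import java.util.HashMap;
import java.util.Map;

static boolean hasTriple(int[] anArray) {
    Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
    for (int value : anArray) {
        // increment this value's count, starting from zero
        Integer seen = counts.get(value);
        int count = (seen == null) ? 1 : seen + 1;
        counts.put(value, count);
        if (count >= 3) {
            return true; // early exit once any element reaches three occurrences
        }
    }
    return false;
}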

Calculate Space and Time complexity and Improve efficiency of this program

Problem
Find the list of non-repeating numbers in an array of repeating numbers.
My Solution
public static int[] FindNonRepeatedNumber(int[] input)
{
    List<int> nonRepeated = new List<int>();
    bool repeated = false;
    for (int i = 0; i < input.Length; i++)
    {
        repeated = false;
        for (int j = 0; j < input.Length; j++)
        {
            if ((input[i] == input[j]) && (i != j))
            {
                // this means the element is repeated.
                repeated = true;
                break;
            }
        }
        if (!repeated)
        {
            nonRepeated.Add(input[i]);
        }
    }
    return nonRepeated.ToArray();
}
Time and space complexity
Time complexity = O(n^2)
Space complexity = O(n)
I am not sure about the time complexity calculated above; also, how can I make this program more efficient and faster?
The complexity of the Algorithm you provided is O(n^2).
Use hash maps to improve the algorithm. The pseudocode, cleaned up into runnable Java, is as follows:
import java.util.*;

public static int[] findNonRepeatedNumbers(int[] a)
{
    Hashtable<Integer, Integer> testMap = new Hashtable<Integer, Integer>();
    // count how often each element of the input occurs
    for (int value : a) {
        Integer tmp = testMap.get(value);
        testMap.put(value, tmp == null ? 1 : tmp + 1);
    }
    // elements that are not repeated are those with a count of 1
    List<Integer> nonRepeated = new ArrayList<Integer>();
    for (Map.Entry<Integer, Integer> me : testMap.entrySet()) {
        if (me.getValue() == 1) {
            System.out.println(me.getKey());
            nonRepeated.add(me.getKey());
        }
    }
    int[] result = new int[nonRepeated.size()];
    for (int i = 0; i < result.length; i++) {
        result[i] = nonRepeated.get(i);
    }
    return result;
}
Operation:
What I did here is use a hash map whose keys are the elements of the input array. The values of the hash map act as counters for each element. So if an element occurs once, the value for that key is 1, and it is subsequently incremented on each recurrence of the element in the input array.
So finally you just check your hash map and display the elements with count 1, which are the non-repeated elements. The time complexity for this algorithm is O(k) for creating the hash map and O(k) for searching, if the input array length is k. This is much faster than O(n^2). The worst case is when there are no repeated elements at all. The pseudocode might be messy, but this approach is the best way I could think of.
Time complexity O(n) means you can't have an inner loop that scans the input; a full inner loop makes it O(n^2).
Two pointers, beginning and end. Increment beginning when the same letters are reached, and store the start and end positions and length for reference; increment end otherwise. Keep doing this until the end of the list, compare all the outputs, and you should have the longest continuous list of unique numbers. I hope this is what the question required. Linear algorithm.
// pseudocode: savetolist, savedlist and maxelementof are assumed helpers
void longestcontinuousunique(int arr[])
{
    int start = 0;
    int end = 0;
    while (end != arr.length())
    {
        if (arr[start] == arr[end])
        {
            start++;
            savetolist(start, end, end - start);
        }
        else
            end++;
    }
    return maxelementof(savedlist);
}

How can we find a repeated number in array in O(n) time and O(1) space complexity

How can we find a repeated number in an array in O(n) time and O(1) space complexity?
eg
array 2,1,4,3,3,10
output is 3
EDIT:
I tried the following way:
I found that if a number is repeated an odd number of times, then we can find it by doing XOR. So I thought of making the element that repeats an odd number of times repeat an even number of times, and every evenly repeating number repeat an odd number of times; but for that I need to extract the unique elements of the input array in O(n), and I couldn't find a way to do that.
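(For reference, a minimal Java sketch of the XOR observation in the question: XOR-ing all elements cancels values that occur an even number of times, leaving the value that occurs an odd number of times, assuming there is exactly one such value.)
static int oddOccurrence(int[] a) {
    int acc = 0;
    for (int value : a) {
        acc ^= value; // pairs cancel out; the odd one remains
    }
    return acc;
}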
Assuming that there is an upper bound for the values of the numbers in the array (which is the case with all built-in integer types in all programming languages I've ever used -- for example, let's say they are 32-bit integers), there is a solution that uses constant space:
Create an array of N elements, where N is the upper bound for the integer values in the input array, and initialize all elements to 0 or false or some equivalent. I'll call this the lookup array.
Loop over the input array, and use each number to index into the lookup array. If the value you find is 1 or true (etc), the current number in the input array is a duplicate.
Otherwise, set the corresponding value in the lookup array to 1 or true to remember that we have seen this particular input number.
Technically, this is O(n) time and O(1) space, and it does not destroy the input array. Practically, you would need things to be going your way to have such a program actually run (e.g. it's out of the question if talking about 64-bit integers in the input).
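A minimal Java sketch of this lookup-array idea (the method name and the -1 "no duplicate found" return are my own choices; it assumes non-negative values up to a known bound):
static int findDuplicate(int[] input, int upperBound) {
    boolean[] seen = new boolean[upperBound + 1]; // the lookup array
    for (int value : input) {
        if (seen[value]) {
            return value; // we have seen this value before: it is a duplicate
        }
        seen[value] = true; // remember that we have seen this value
    }
    return -1; // no duplicate found
}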
Without knowing more about the possible values in the array you can't.
With O(1) space requirement the fastest way is to sort the array so it's going to be at least O(n*log(n)).
Use bit manipulation ... traverse the list in one loop.
Check if the mask bit is 1 by shifting by the value i.
If so, print out the repeated value i.
If the bit is unset, set it.
* If you only want to show each repeated value once, add another integer show and set its bits as well, like in the example below.
** This is in Java. I'm not sure we will reach it, but you might want to also add a check using Integer.MAX_VALUE.
public static void repeated( int[] vals ) {
    int mask = 0;
    int show = 0;
    for( int i : vals ) {
        // get bit in mask
        if( (( mask >> i ) & 1) == 1 &&
            (( show >> i ) & 1) == 0 )
        {
            System.out.println( "\n\tfound: " + i );
            show = show | (1 << i);
        }
        // set mask if not found
        else
        {
            mask = mask | (1 << i);
            System.out.println( "new: " + i );
        }
        System.out.println( "mask: " + mask );
    }
}
This is impossible without knowing some restricting rules about the input array; otherwise, either the memory complexity must have some dependency on the input size, or the time complexity is going to be higher.
The two answers above are in fact the best answers for getting near what you asked; one's trade-off is time, while the second's trade-off is memory, but you can't have it run in O(n) time and O(1) space on SOME UNKNOWN INPUT ARRAY.
I met the problem too, and my solution uses a hash map. The Python version is the following:
def findRepeatNumber(lists):
    hashMap = {}
    for i in xrange(len(lists)):
        if lists[i] in hashMap:
            return lists[i]
        else:
            hashMap[lists[i]] = i + 1
    return
It is possible only if you have specific data, e.g. all numbers are in a small range. Then you can store the repeat info in the source array without affecting the whole scanning and analyzing process.
Simplified example: you know that all the numbers are smaller than 100; then you can mark the repeat count for a number using extra zeroes, like putting 900 instead of 9 when 9 occurs twice. A sketch of a closely related marking trick follows below.
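A minimal Java sketch of this kind of in-array marking, under a stronger assumption than the answer states: all values lie in [0, n) for an array of length n, so cell a[v] can itself carry the "seen" mark for value v (the method name and the -1 sentinel are my own choices):
static int findRepeated(int[] a) {
    int n = a.length;
    for (int i = 0; i < n; i++) {
        int value = a[i] % n;  // strip any mark to recover the original value
        if (a[value] >= n) {
            return value;      // a[value] already marked, so value repeats
        }
        a[value] += n;         // mark value as seen in its own cell
    }
    return -1;                 // no repeated value found
}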
It is easy when NumMax-NumMin is small:
http://www.geeksforgeeks.org/find-the-maximum-repeating-number-in-ok-time/
public static string RepeatedNumber()
{
    int[] input = { 66, 23, 34, 0, 5, 4 };
    int[] indexer = { 0, 0, 0, 0, 0, 0 };
    var found = 0;
    for (int i = 0; i < input.Length; i++)
    {
        var toFind = input[i];
        for (int j = 0; j < input.Length; j++)
        {
            if (input[j] == toFind && (indexer[j] == 1))
            {
                found = input[j];
            }
            else if (input[j] == toFind)
            {
                indexer[j] = 1;
            }
        }
    }
    return $"most repeated item in the array is {found}";
}
You can do this
#include <iostream>
using namespace std;

int main()
{
    int array[5], rep = 0;
    for (int i = 0; i < 5; i++)
    {
        cout << "enter elements" << endl;
        cin >> array[i];
    }
    // note: this only detects duplicates in adjacent positions
    for (int i = 0; i < 4; i++)
    {
        if (array[i] == array[i + 1])
        {
            rep = array[i];
        }
    }
    cout << " repeat value is " << rep;
    return 0;
}

For VS Foreach on Array performance (in AS3/Flex)

Which one is faster? Why?
var messages:Array = [.....]
// 1 - for
var len:int = messages.length;
for (var i:int = 0; i < len; i++) {
    var o:Object = messages[i];
    // ...
}
// 2 - foreach
for each (var o:Object in messages) {
    // ...
}
From where I'm sitting, regular for loops are moderately faster than for each loops in the minimal case. Also, as with AS2 days, decrementing your way through a for loop generally provides a very minor improvement.
But really, any slight difference here will be dwarfed by the requirements of what you actually do inside the loop. You can find operations that will work faster or slower in either case. The real answer is that neither kind of loop can be meaningfully said to be faster than the other - you must profile your code as it appears in your application.
Sample code:
var size:Number = 10000000;
var arr:Array = [];
for (var i:int = 0; i < size; i++) { arr[i] = i; }
var time:Number, o:Object;
// for()
time = getTimer();
for (i = 0; i < size; i++) { arr[i]; }
trace("for test: " + (getTimer() - time) + "ms");
// for() reversed
time = getTimer();
for (i = size - 1; i >= 0; i--) { arr[i]; }
trace("for reversed test: " + (getTimer() - time) + "ms");
// for each
time = getTimer();
for each (o in arr) { o; }
trace("for each test: " + (getTimer() - time) + "ms");
Results:
for test: 124ms
for reversed test: 110ms
for each test: 261ms
Edit: To improve the comparison, I changed the inner loops so they do nothing but access the collection value.
Edit 2: Answers to oshyshko's comment:
The compiler could skip the accesses in my internal loops, but it doesn't. The loops would exit two or three times faster if it did.
The results change in the sample code you posted because in that version, the for loop now has an implicit type conversion. I left assignments out of my loops to avoid that.
Of course one could argue that it's okay to have an extra cast in the for loop because "real code" would need it anyway, but to me that's just another way of saying "there's no general answer; which loop is faster depends on what you do inside your loop". Which is the answer I'm giving you. ;)
When iterating over an array, for each loops are way faster in my tests.
var len:int = 1000000;
var i:int = 0;
var arr:Array = [];
while (i < len) {
    arr[i] = i;
    i++;
}
function forEachLoop():void {
    var t:Number = getTimer();
    var sum:Number = 0;
    for each (var num:Number in arr) {
        sum += num;
    }
    trace("forEachLoop :", (getTimer() - t));
}
function whileLoop():void {
    var t:Number = getTimer();
    var sum:Number = 0;
    var i:int = 0;
    while (i < len) {
        sum += arr[i] as Number;
        i++;
    }
    trace("whileLoop :", (getTimer() - t));
}
forEachLoop();
whileLoop();
This gives:
forEachLoop : 87
whileLoop : 967
Here, probably most of the while loop's time is spent casting the array item to a Number. However, I consider it a fair comparison, since that's what you get in the for each loop.
My guess is that this difference has to do with the fact that, as mentioned, the as operator is relatively expensive and array access is also relatively slow. With a for each loop, both operations are handled natively, I think, as opposed to being performed in ActionScript.
Note, however, that if type conversion actually takes place, the for each version is much slower and the while version is noticeably faster (though, still, for each beats while):
To test, change the array initialization to this:
while (i < len) {
    arr[i] = i + "";
    i++;
}
And now the results are:
forEachLoop : 328
whileLoop : 366
forEachLoop : 324
whileLoop : 369
I've had this discussion with a few colleagues before, and we have all found different results for different scenarios. However, there was one test that I found quite eloquent for comparison's sake:
var array:Array = new Array();
for (var k:uint = 0; k < 1000000; k++) {
    array.push(Math.random());
}
stage.addEventListener("mouseDown", foreachloop);
stage.addEventListener("mouseUp", forloop);
/////// Array /////
/* 49ms */
function foreachloop(e) {
    var t1:uint = getTimer();
    var tmp:Number = 0;
    var i:uint = 0;
    for each (var n:Number in array) {
        i++;
        tmp += n;
    }
    trace("foreach", i, tmp, getTimer() - t1);
}
/***** 81ms ****/
function forloop(e) {
    var t1:uint = getTimer();
    var tmp:Number = 0;
    var l:uint = array.length;
    for (var i:uint = 0; i < l; i++)
        tmp += Number(array[i]);
    trace("for", i, tmp, getTimer() - t1);
}
What I like about this test is that you have a reference for both the key and value in each iteration of both loops (removing the key counter in the "for-each" loop is not that relevant). Also, it operates with Number, which is probably the most common loop that you will want to optimize that much. And most importantly, the winner is the "for-each", which is my favorite loop :P
Notes:
-Referencing the array in a local variable within the function of the "for-each" loop is irrelevant, but in the "for" loop you do get a speed bump (75ms instead of 105ms):
function forloop(e) {
    var t1:uint = getTimer();
    var tmp:Number = 0;
    var a:Array = array;
    var l:uint = a.length;
    for (var i:uint = 0; i < l; i++)
        tmp += Number(a[i]);
    trace("for", i, tmp, getTimer() - t1);
}
-If you run the same tests with the Vector class, the results are a bit confusing :S
for would be faster for arrays...but depending on the situation it can be foreach that is best...see this .net benchmark test.
Personally, I'd use either until I got to the point where it became necessary for me to optimize the code. Premature optimization is wasteful :-)
Maybe in an array where all elements are present and start at zero (0 to X) it would be faster to use a for loop. In all other cases (sparse array) it can be a LOT faster to use for each.
The reason is the usage of two data structures in the array: a hash table and a dense array.
Please read my Array analysis using the Tamarin source:
http://jpauclair.wordpress.com/2009/12/02/tamarin-part-i-as3-array/
The for loop will check at undefined indexes, whereas the for each will skip those, jumping to the next element in the hash table.
Guys!
Especially Juan Pablo Califano.
I've checked your test. The main difference is in obtaining the array item.
If you put var len:int = 40000;, you will see that the 'while' cycle is faster.
But with big array counts it loses to for..each.
Just an add-on:
a for each...in loop doesn't assure you that the elements in the array/vector get enumerated in the ORDER THEY ARE STORED in them (except XMLs).
This IS a vital difference, IMO.
"...Therefore, you should not write code that depends on a for-each-in or for-in loop's enumeration order unless you are processing XML data..." C. Moock
(I hope not to break any law by stating this one phrase...)
Happy benchmarking.
Sorry to prove you guys wrong, but for each is faster, even by a lot; except if you don't want to access the array values, but a) this does not make sense and b) this is not the case here.
As a result of this, I made a detailed post on my super new blog ... :D
greetz
back2dos

Remove duplicate items with minimal auxiliary memory?

What is the most efficient way to remove duplicate items from an array under the constraint that auxiliary memory usage must be kept to a minimum, preferably small enough to not even require any heap allocations? Sorting seems like the obvious choice, but this is clearly not asymptotically efficient. Is there a better algorithm that can be done in place or close to in place? If sorting is the best choice, what kind of sort would be best for something like this?
I'll answer my own question since, after posting, I came up with a really clever algorithm to do this. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) time complexity. The algorithm is as follows:
1. Take the first element of the array; this will be the sentinel.
2. Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to the sentinel.
3. Move all elements for which the index is equal to the hash to the beginning of the array.
4. Move all elements that are equal to the sentinel, except the first element of the array, to the end of the array.
5. What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided no pathological scenario in the hashing:
Even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n) where small n is the amount of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
    uniqueInPlaceImpl(dataIn, 0);
}

void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
    if(dataIn.length - start < 2)
        return;

    invariant T sentinel = dataIn[start];
    T[] data = dataIn[start + 1..$];

    static hash_t getHash(T elem) {
        static if(is(T == uint) || is(T == int)) {
            return cast(hash_t) elem;
        } else static if(__traits(compiles, elem.toHash)) {
            return elem.toHash;
        } else {
            static auto ti = typeid(typeof(elem));
            return ti.getHash(&elem);
        }
    }

    for(size_t index = 0; index < data.length;) {
        if(data[index] == sentinel) {
            index++;
            continue;
        }

        auto hash = getHash(data[index]) % data.length;
        if(index == hash) {
            index++;
            continue;
        }

        if(data[index] == data[hash]) {
            data[index] = sentinel;
            index++;
            continue;
        }

        if(data[hash] == sentinel) {
            swap(data[hash], data[index]);
            index++;
            continue;
        }

        auto hashHash = getHash(data[hash]) % data.length;
        if(hashHash != hash) {
            swap(data[index], data[hash]);
            if(hash < index)
                index++;
        } else {
            index++;
        }
    }

    size_t swapPos = 0;
    foreach(i; 0..data.length) {
        if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
            swap(data[i], data[swapPos++]);
        }
    }

    size_t sentinelPos = data.length;
    for(size_t i = swapPos; i < sentinelPos;) {
        if(data[i] == sentinel) {
            swap(data[i], data[--sentinelPos]);
        } else {
            i++;
        }
    }

    dataIn = dataIn[0..sentinelPos + start + 1];
    uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
Keeping auxiliary memory usage to a minimum, your best bet would be to do an efficient sort to get them in order, then do a single pass of the array with a FROM and TO index.
You advance the FROM index every time through the loop. You only copy the element from FROM to TO (and increment TO) when the key is different from the last.
With Quicksort, that'll average to O(n log n), plus O(n) for the final pass.
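A minimal Java sketch of this sort-then-compact pass (the method name and returning the new logical length are my own choices):
import java.util.Arrays;

static int removeDuplicates(int[] a) {
    if (a.length == 0) return 0;
    Arrays.sort(a);                  // efficient O(n log n) sort groups duplicates
    int to = 1;                      // next write position
    for (int from = 1; from < a.length; from++) {
        if (a[from] != a[to - 1]) {  // copy only when the key differs from the last
            a[to++] = a[from];
        }
    }
    return to;                       // unique elements now occupy a[0..to)
}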
If you sort the array, you will still need another pass to remove duplicates, so the complexity is O(N*N) in the worst case (assuming Quicksort), or O(N*sqrt(N)) using Shellsort.
You can achieve O(N*N) by simply scanning the array for each element removing duplicates as you go.
Here is an example in Lua:
function removedups (t)
    local result = {}
    local count = 0
    local found
    for i,v in ipairs(t) do
        found = false
        if count > 0 then
            for j = 1,count do
                if v == result[j] then found = true; break end
            end
        end
        if not found then
            count = count + 1
            result[count] = v
        end
    end
    return result, count
end
I don't see any way to do this without something like a bubblesort. When you find a dupe, you need to reduce the length of the array. Quicksort is not designed for the size of the array to change.
This algorithm is always O(n^2), but it also uses almost no extra memory -- stack or heap.
// returns the new size
int bubblesqueeze(int* a, int size) {
    for (int j = 0; j < size - 1; ++j) {
        for (int i = j + 1; i < size; ++i) {
            // when a dupe is found, move the end value to index i
            // and shrink the size of the array
            while (i < size && a[i] == a[j]) {
                a[i] = a[--size];
            }
            if (i < size && a[i] < a[j]) {
                int tmp = a[j];
                a[j] = a[i];
                a[i] = tmp;
            }
        }
    }
    return size;
}
If you have two different variables for traversing a dataset instead of just one, then you can limit the output by dismissing all duplicates that are already in the dataset.
Obviously this example in C is not an efficient sorting algorithm, but it is just an example of one way to look at the problem.
You could also blindly sort the data first and then relocate the data to remove dups, but I'm not sure that would be faster.
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LENGTH 15

int stop = 1;
int scan_sort[ARRAY_LENGTH] = {5,2,3,5,1,2,5,4,3,5,4,8,6,4,1};

void step_relocate(char tmp, char s, int *dataset)
{
    for (; tmp < s; s--)
        dataset[s] = dataset[s-1];
}

int exists(int var, int *dataset)
{
    int tmp = 0;
    for (; tmp < stop; tmp++)
    {
        if (dataset[tmp] == var)
            return 1; /* value exists */
        if (dataset[tmp] > var)
            tmp = stop; /* value not in array */
    }
    return 0; /* value not in array */
}

int main(void)
{
    int tmp1 = 0;
    int tmp2 = 0;
    int index = 1;
    while (index < ARRAY_LENGTH)
    {
        if (exists(scan_sort[index], scan_sort))
            ; /* dismiss all values currently in the final dataset */
        else if (scan_sort[stop-1] < scan_sort[index])
        {
            scan_sort[stop] = scan_sort[index]; /* insert the value as the highest one */
            stop++; /* one more value added to the final dataset */
        }
        else
        {
            for (tmp1 = 0; tmp1 < stop; tmp1++) /* find where the data shall be inserted */
            {
                if (scan_sort[index] < scan_sort[tmp1])
                    break;
            }
            tmp2 = scan_sort[index]; /* store in case this value is the next after stop */
            step_relocate(tmp1, stop, scan_sort); /* relocate data already in the dataset */
            scan_sort[tmp1] = tmp2; /* insert the new value */
            stop++; /* one more value added to the final dataset */
        }
        index++;
    }
    printf("Result: ");
    for (tmp1 = 0; tmp1 < stop; tmp1++)
        printf("%d ", scan_sort[tmp1]);
    printf("\n");
    system("pause");
    return 0;
}
I liked the problem, so I wrote a simple C test program for it, as you can see above. Make a comment if I should elaborate or if you see any faults.
