Best way to initialize a list of lists - performance

Is there any difference in initializing a list of lists by
[[0 for i in xrange(n)] for j in xrange(m)]
or
[[0]*n for j in xrange(m)]
From the point of view of time performance the second way is roughly four times faster than the first, and I am wondering whether the second way has any computational/memory-use or allocation drawback.

List comprehensions provide a concise way to create lists. Common
applications are to make new lists where each element is the result of
some operations applied to each member of another sequence or
iterable, or to create a subsequence of those elements that satisfy a
certain condition.
The first method is slower than the second because the inner list comprehension runs an explicit Python-level loop, re-binding the loop variable and appending 0 on every iteration, whereas [0]*n builds the whole row in a single operation.
P.S. If you want to initialise a list of lists with a constant value k, the fastest way would be to use NumPy: np.ones((m, n)) * k, or equivalently np.full((m, n), k).
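For anyone who wants to verify the relative speeds on their own machine, here is a small, hedged timing sketch using the standard timeit module (written for Python 3's range; substitute xrange on Python 2, and expect the absolute numbers to vary):
import timeit

setup = 'm, n = 1000, 1000'
t1 = timeit.timeit('[[0 for i in range(n)] for j in range(m)]', setup=setup, number=10)
t2 = timeit.timeit('[[0] * n for j in range(m)]', setup=setup, number=10)
print('comprehension: %.3fs   multiplication: %.3fs' % (t1, t2))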

Related

Can I predict how many swaps an algorithm will do, knowing how an array was shuffled?

I need to take a snapshot of the array every N swaps performed by a user-defined sorting algorithm. This N depends on the total number of swaps M that the algorithm will perform by the time the array is sorted.
The size of the array can reach millions of elements, so I realized that running the algorithm twice (once to count M, and once to take the snapshots) takes too long with slow algorithms like BubbleSort.
Since I am the one who shuffles the array, I was wondering: is there a way to know how many swaps (or at least an upper bound on them) a given sorting algorithm will do?
N is defined like:
Is it possible for you to modify the class of the object you are working with? You could pass in a user-defined array class which owns a counter. Using operator overloading, you could modify the assignment operator and increment the counter every time
myarray[i] = newvalue
is called.
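A minimal Python sketch of that idea; the class name CountingArray and its attributes are illustrative, and __setitem__ plays the role of the overloaded assignment operator:
class CountingArray:
    def __init__(self, data):
        self.data = list(data)
        self.assignments = 0              # bumped on every item write

    def __getitem__(self, i):
        return self.data[i]

    def __setitem__(self, i, value):      # myarray[i] = newvalue lands here
        self.assignments += 1
        self.data[i] = value

    def __len__(self):
        return len(self.data)

a = CountingArray([3, 1, 2])
a[0], a[1] = a[1], a[0]                   # one swap == two assignments
print(a.assignments // 2)                 # 1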

Best Logic to implement comparison of two arrays

I have two arrays, arr1 and arr2. Both arrays will have some common items and many uncommon items; first, the common items should be removed from both arrays.
Then, each remaining (uncommon) item in arr1 may be the sum of two or more values in arr2, or vice versa. If such a sum is found, those values must be removed from the respective arrays. Finally, the output should be only the unmatched values in both arrays.
I need logic that can do this calculation in a much faster way.
I'm not going to give out code that implements your logic, but I would be happy to point you in the right direction.
I code in C++, so I'm going to answer with respect to that. If you want a different language, you can freely translate the idea.
To remove the common elements:
First sort the arrays arr1 and arr2, then run set_symmetric_difference on them; after the sort, that step runs in linear time. Then create two sets out of the results, say set1 and set2.
To remove pairs that sum up to an element in the other array:
For this, you need to loop through all the possible pairs of elements in arr1 and check if the sum of each pair exists in set2. Do likewise for arr2, and remove the elements when necessary.
This can be done in O(n²). If you don't want to create the extra sets, you can always trade performance for memory by just looping through the pairs of arr1 and checking for the sum in arr2 with a binary search.
The time complexity then shoots up to O(n² log n).
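Since the rest of this page uses Python, here is a hedged Python sketch of the same two steps; the sample arrays are made up, and the exact removal rule follows one reading of the question:
from itertools import combinations

arr1 = [1, 2, 3, 7]
arr2 = [3, 9, 4]

# Step 1: drop the common items from both arrays.
common = set(arr1) & set(arr2)
arr1 = [x for x in arr1 if x not in common]
arr2 = [x for x in arr2 if x not in common]

# Step 2: find pairs in arr1 whose sum appears in arr2 (O(n^2) pairs, each
# checked in O(1) against a set built from arr2); repeat symmetrically for arr2.
set2 = set(arr2)
pairs = [(a, b) for a, b in combinations(arr1, 2) if a + b in set2]
print(pairs)     # [(2, 7)], because 2 + 7 == 9 appears in arr2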

Possible to do quicksort without splitting into separate lists?

In many quicksort implementations, the code places the elements of the array into three groups (less, pivot, more), and sometimes places the groups back together. What if I do not want to use this? Is there a simpler approach to sorting a list with quicksort manually?
Basically, I plan to keep the array as one and swap elements based on a partition (for example, given a list x and pivot position r, we could work with the ranges x[0:r] and x[r:len(x)]). However, as the sorting continues, how do I keep referencing each smaller "subarray"?
So this is my code, but I'm not sure how to continue from here:
x = [4,7,4,2,4,6,5]
#r is pivot POSITION
r = len(x)-1
i = -1
for a in range(0,r+1):
    if x[a] <= x[r]:
        i += 1
        x[i], x[a] = x[a], x[i]
You can implement quicksort purely by swapping the locations of items in a list, rather than actually creating new lists.
But unless this is some sort of homework assignment, the best option is generally to use Python's built-in sort() method (or sorted()), which uses a highly optimized sorting algorithm (Timsort) under the hood.
There's something not quite right here. You need two definitions, one for the partition and one for the quicksort process itself. The quicksort then needs to call itself (or loop) so that it keeps applying the partition to subarrays of the array. The Wikipedia article on quicksort explains how this works.
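Building on the code above, here is a hedged in-place sketch with the two definitions the answer suggests; the names partition and quicksort, and the Lomuto-style layout, are one common choice rather than the only one:
def partition(x, lo, hi):
    pivot = x[hi]                        # pivot is the last element of the range
    i = lo - 1
    for a in range(lo, hi):
        if x[a] <= pivot:
            i += 1
            x[i], x[a] = x[a], x[i]
    x[i + 1], x[hi] = x[hi], x[i + 1]    # put the pivot between the two halves
    return i + 1                         # final position of the pivot

def quicksort(x, lo=0, hi=None):
    if hi is None:
        hi = len(x) - 1
    if lo < hi:
        p = partition(x, lo, hi)
        quicksort(x, lo, p - 1)          # "subarrays" are just index ranges
        quicksort(x, p + 1, hi)

x = [4, 7, 4, 2, 4, 6, 5]
quicksort(x)
print(x)                                 # [2, 4, 4, 4, 5, 6, 7]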

Parallel Subset

The setup: I have two arrays which are not sorted and are not of the same length. I want to see if one of the arrays is a subset of the other. Each array is a set in the sense that there are no duplicates.
Right now I am doing this subset check sequentially in a brute-force manner, so it isn't very fast. I have been having trouble finding any algorithms online that A) go faster and B) run in parallel. Say the maximum size of either array is N; right now it scales roughly like N^2. I was thinking maybe if I sorted them and did something clever I could bring it down to something like N log(N), but I'm not sure.
The main thing is I have no idea how to parallelize this operation at all. I could have each processor look at an equal portion of the first array and compare those entries to all of the second array, but I'd still be doing N^2 work. I guess it would at least run in parallel.
Any ideas on how to improve the work and make it parallel at the same time?
Thanks
Suppose you are trying to decide if A is a subset of B, and let len(A) = m and len(B) = n.
If m is a lot smaller than n, then it makes sense to sort A and then iterate through B, doing a binary search in A for each element to see whether there is a match. You can partition B into k parts and have a separate thread iterate through each part doing the binary searches.
To count the matches you can do two things. Either have a num_matched variable incremented every time you find a match (you would need to guard this variable with a mutex, though, which might hinder your program's concurrency) and check whether num_matched == m at the end of the program. Or keep another array (or bit vector) of size m and have a thread set the k'th entry when it finds a match for the k'th element of A; at the end, you check that this array is all 1's. (On second thought, a bit vector might not work without a mutex, because threads could overwrite each other's updates when they write back the integer containing their bit.) The plain array approach, at least, would not need any mutex that could hinder concurrency.
Sorting costs m log(m), and with a single thread doing the matching the scan costs n log(m). So if n is a lot bigger than m, this is effectively n log(m). Your worst case is still N log(N), but I think concurrency would really help you make this fast.
Summary: Just sort the smaller array.
Alternatively, if you are willing to convert A into a HashSet (or any equivalent set data structure that uses hashing plus probing/chaining to give O(1) lookups), then a single membership check takes O(1) amortized time, so you can do the whole check in O(n) plus the cost of converting A into a set.
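A hedged Python sketch of the chunked binary-search idea above (the names is_subset and scan, the thread count, and the chunking scheme are illustrative; under CPython's GIL this mostly shows the structure, and real speedup would need processes or native threads):
import bisect
from concurrent.futures import ThreadPoolExecutor

def is_subset(A, B, num_threads=4):
    """Return True if every element of A appears somewhere in B."""
    A_sorted = sorted(A)                 # m log(m)
    matched = [False] * len(A_sorted)    # one flag per element of A, no mutex needed

    def scan(chunk):
        for value in chunk:
            i = bisect.bisect_left(A_sorted, value)        # log(m) lookup
            if i < len(A_sorted) and A_sorted[i] == value:
                matched[i] = True

    k = max(1, len(B) // num_threads)
    chunks = [B[i:i + k] for i in range(0, len(B), k)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(scan, chunks))     # wait for every chunk to be scanned
    return all(matched)

print(is_subset([3, 1], [5, 3, 2, 1]))   # True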

Find a common element within N arrays

If I have N arrays, what is the best way (in terms of time complexity; space is not important) to find the common elements? You could just find one element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index, with elements as keys, counts as values. Loop through all values and update the count in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1), combined with looping through all M elements should be O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
if you know that the elements are (positive) integers with a maximum value that is not too high, you could just use a normal array as the "hash" index to keep counts, with the numbers themselves serving as array indices.
I've assumed that in each array each number occurs only once. Adapting it for multiple occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
EDIT when I answered the question, the question did not have the part of "a better way" than hashing in it.
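A hedged Python sketch of that counting idea, using collections.Counter and assuming (as stated above) that each number appears at most once per array; the name find_common is illustrative:
from collections import Counter

def find_common(arrays):
    counts = Counter()
    for arr in arrays:
        counts.update(arr)                     # O(M) over all elements combined
    n = len(arrays)
    return [value for value, c in counts.items() if c == n]

print(find_common([[1, 2, 3], [3, 1, 6], [9, 1, 3]]))   # [1, 3] (order may vary)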
The most direct method is to intersect the first two arrays and then intersect the result with each of the remaining N-2 arrays.
If 'intersection' is not defined in the language you're working in, or you require a more specific answer (i.e. you need the answer to 'how do you do the intersection'), then modify your question accordingly.
Without sorting there isn't a more optimized way to do this based on the information given (i.e. sorting, positioning all elements relative to each other, and then iterating over the arrays checking for elements present in all of them at once).
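In a language with a built-in set type, the iterative intersection described above takes only a few lines; a hedged Python sketch (intersect_all is an illustrative name):
def intersect_all(arrays):
    result = set(arrays[0])
    for arr in arrays[1:]:
        result &= set(arr)       # keep only elements also present in this array
        if not result:           # nothing in common, stop early
            break
    return result

print(intersect_all([[1, 2, 3], [3, 1, 6], [9, 1, 3]]))   # {1, 3}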
The question asks whether there is a better way than hashing. There is no better way (i.e. better time complexity) than using a hash, as the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one to an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, and then there is the log N factor for sorting each array.
Back to hashing, from a performance standpoint, you will get the best empirical performance by not processing each array fully, but processing only a block of elements from each array before proceeding onto the next array. This will take advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases when common elements appear in the same regions of the array (e.g. common elements at the start of all arrays.) Worst case behaviour is no worse than hashing each array in full - merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays
1: {1,1,2,3,4,5}
2: {1,3,6,7}
then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have count 3 and we will never find 1 in this situation.
Also, the problem becomes more complex if we have input like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think we should give the output as 1, 1. The suggested approach fails in both cases.
Solution:
Read the first array and put its elements into a hashtable; if we find the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable contains the common elements, which have a count of 2.
But again, this approach will fail on the second input set which I gave earlier.
I'd first start with the degenerate case, finding common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare it against the next array. This check would be performed N-1 times or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N²).
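A hedged Python sketch of the divide-and-conquer variant described above (common_dc is an illustrative name, the pairwise step reuses set intersection for brevity, and a non-empty list of arrays is assumed):
def common_dc(arrays):
    if len(arrays) == 1:
        return set(arrays[0])
    mid = len(arrays) // 2
    # merge the two halves of the tree with a pairwise intersection
    return common_dc(arrays[:mid]) & common_dc(arrays[mid:])

print(common_dc([[1, 2, 3], [3, 1, 6], [9, 1, 3]]))   # {1, 3}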

Resources