Time Complexity of Finding All Possible Sub Array of an Array - algorithm

Consider the following implementation for finding all possible sub-arrays of a given array:
import java.util.ArrayList;
import java.util.List;

public class AllPossibleSubArray {
    public static void main(String[] args) {
        int[] arr = { 1, 2, 3 };
        List<List<Integer>> result = new ArrayList<>();
        for (int len = 0; len <= arr.length; len++) {
            if (len == 0) {
                result.add(new ArrayList<>());
            } else {
                for (int i = 0; i < arr.length - len + 1; i++) {
                    List<Integer> temp = new ArrayList<>();
                    for (int j = i; j < i + len; j++) {
                        temp.add(arr[j]);
                    }
                    result.add(temp);
                }
            }
        }
        result.forEach(System.out::println);
    }
}
As per my understanding, the time complexity would be O(N^3), since there are three nested for loops.
But this problem is nothing but a power set, i.e. finding all possible subsets of a given set. According to various forums on the web, the time complexity of generating a power set is O(2^N) (by the binomial expansion), which is not the same as O(N^3).
Am I missing something fundamental?

But this problem is nothing but a power set, i.e. finding all possible subsets of a given set.
That's not correct.
The code that you've posted only finds contiguous subarrays, meaning the list of all elements from one index to another index.
The power set, by contrast, would also include discontiguous subsequences, meaning ones that include two elements without including all of the elements between them.
I should also note that there are only O(n^2) subarrays, and if you find a different way to represent them, you can find them in O(n^2) time rather than the O(n^3) time the code that you've posted takes. (Specifically, you need a representation that allows you to reuse the shared parts of the lists, rather than having to copy all the required elements every time.) By contrast, if you stick with the representation in your code where each list is a distinct copy, finding all subsets would actually require O(n·2^n) time, rather than just O(2^n) time.
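One possible representation, sketched below (class name and structure are illustrative, not from the original post), stores each subarray as a (start, end) index pair into the original array: building all pairs is O(n^2), and only materialising a particular subarray for printing costs time proportional to its length.

import java.util.ArrayList;
import java.util.List;

public class SubArrayIndexPairs {
    public static void main(String[] args) {
        int[] arr = { 1, 2, 3 };
        // Each subarray is the half-open index range [start, end) of arr,
        // so no elements are copied: O(n^2) pairs, O(1) work per pair.
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < arr.length; start++) {
            for (int end = start + 1; end <= arr.length; end++) {
                ranges.add(new int[] { start, end });
            }
        }
        // Printing one subarray costs O(its length); only this step exceeds O(1) per subarray.
        for (int[] r : ranges) {
            StringBuilder sb = new StringBuilder("[");
            for (int i = r[0]; i < r[1]; i++) {
                sb.append(arr[i]);
                if (i + 1 < r[1]) sb.append(", ");
            }
            System.out.println(sb.append("]"));
        }
    }
}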

As the answer above said, the optimal time complexity for getting all possible subarrays of an array is O(N^2). I would also like to point out that, by definition, a subarray is contiguous. Check the following link for the definition of a subarray:
https://www.geeksforgeeks.org/subarraysubstring-vs-subsequence-and-programs-to-generate-them/amp/

Related

Variation of subset sum

Given an array of numbers, I want to find a subset of numbers whose sum is a multiple of a given number.
I know this is a variation of subset sum. But the problem is that there are infinitely many multiples of a number, so I can't see how to set up a dynamic programming solution.
How can the subset sum problem be extended to handle this?
Pseudo polynomial DP solution to subset sum uses the DP state:
DP(n, s) = Number of ways of getting a sum of s using first n elements of the set
And takes O(ns) time. If I want to find all the multiples of d, I am only interested in the remainders of subset sums modulo d. Remember that modulo distributes over addition: (a + b) mod d = ((a mod d) + (b mod d)) mod d. Therefore, I change the DP state to
DP(n, m) = Number of subsets whose sum = m mod d using the first n elements
Space is reduced to O(nd) and time is also O(nd).
One convention followed in the actual pseudopolynomial solution is to traverse the DP array from the end, allowing you to use only O(s) space. That cannot be done here, because adding an element moves counts between remainder classes in both directions (the sum wraps around modulo d). The best you can do is keep two rows of size d (the previous and the current DP array), i.e. O(d) extra memory.
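As an illustration of that state, here is a minimal sketch (class and method names are illustrative, not from the original answer) that keeps only the previous and the current row, indexed by remainder modulo d:

public class SubsetSumMultiples {
    // Counts subsets whose sum is a multiple of d; the empty subset is excluded at the end.
    static long countMultiples(int[] set, int d) {
        // dp[m] = number of subsets of the elements seen so far whose sum is congruent to m (mod d)
        long[] dp = new long[d];
        dp[0] = 1; // the empty subset
        for (int value : set) {
            long[] next = dp.clone(); // previous row kept separately: O(d) extra memory
            for (int m = 0; m < d; m++) {
                int shifted = ((m + value % d) % d + d) % d; // works for negative values too
                next[shifted] += dp[m];                      // put 'value' on top of every old subset
            }
            dp = next;
        }
        return dp[0] - 1; // drop the empty subset
    }

    public static void main(String[] args) {
        // Subsets of {1, 2, 3, 4} whose sum is divisible by 3: {3}, {1,2}, {2,4}, {1,2,3}, {2,3,4}
        System.out.println(countMultiples(new int[] { 1, 2, 3, 4 }, 3)); // prints 5
    }
}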
Although there are infinitely many multiples of every (nonzero) number, there are only finitely many multiples of a number that will be less than the sum of all the elements in your set. In other words, you can always upper-bound the maximum multiple that could be generated by the sum of the elements of the set. This should enable you to use standard pseudopolynomial-time DP techniques to solve the problem.
Hope this helps!
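A hedged sketch of that bounded approach (assuming positive element values; class and method names are illustrative): run the standard pseudopolynomial subset-sum count up to the total of all elements, then add up the counts at every multiple of d.

public class SubsetSumBoundedMultiples {
    static long countMultiples(int[] set, int d) {
        int total = 0;
        for (int v : set) total += v;       // no multiple larger than this can be reached
        long[] ways = new long[total + 1];  // ways[s] = number of subsets with sum exactly s
        ways[0] = 1;
        for (int v : set) {
            for (int s = total; s >= v; s--) { // traverse from the end so each element is used at most once
                ways[s] += ways[s - v];
            }
        }
        long count = 0;
        for (int s = d; s <= total; s += d) count += ways[s]; // sum the counts at the multiples of d
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMultiples(new int[] { 1, 2, 3, 4 }, 3)); // prints 5, same subsets as above
    }
}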
Here is code for finding the number of ways the target sum can be formed from the set elements.
public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);
    int n = scan.nextInt(); // number of elements in the set
    int m = scan.nextInt(); // target sum
    scan.nextLine();
    int[] setValue = new int[n];
    // setSplit[i][j] = number of subsets of the first i elements whose sum is exactly j
    long[][] setSplit = new long[n + 1][m + 1];
    for (int i = 0; i < n; i++) {
        setValue[i] = scan.nextInt();
    }
    setSplit[0][0] = 1;
    // a sum of 0 can always be reached (by the empty subset)
    for (int i = 1; i < n + 1; i++) {
        setSplit[i][0] = 1;
    }
    // a positive sum cannot be reached with zero elements
    for (int j = 1; j < m + 1; j++) {
        setSplit[0][j] = 0;
    }
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j < m + 1; j++) {
            setSplit[i][j] = setSplit[i - 1][j]; // skip element i
            if (j >= setValue[i - 1]) {
                // take element i (each element is used at most once)
                setSplit[i][j] += setSplit[i - 1][j - setValue[i - 1]];
            }
        }
    }
    // System.out.println(Arrays.deepToString(setSplit));
    System.out.println(setSplit[n][m]); // number of ways the sum can be formed
}

Minimal Number of Extract + Inserts required to sort a list

Context
This problem arises from trying to minimize the number of expensive function calls.
Problem Definition
Please note that extract_and_insert != swap. In particular, we take the element from position "from", insert it at position "to", and SHIFT all intermediate elements.
int n;
int A[n]; // all elements are integer and distinct

function extract_and_insert(from, to) {
    int old_value = A[from]
    if (from < to) {
        for (int i = from; i < to; ++i)
            A[i] = A[i+1];
        A[to] = old_value;
    } else {
        for (int i = from; i > to; --i)
            A[i] = A[i-1];
        A[to] = old_value;
    }
}
Question
We know there are O(n log n) algorithms for sorting a list of numbers.
Now: is there an O(n log n) function, which returns the minimum number of calls to extract_and_insert required to sort the list?
The answer is Yes.
This problem is essentially equivalent to finding the longest increasing subsequence (LIS) in an array, and you can use well-known O(n log n) algorithms for LIS to solve it.
Why is this question equivalent to longest increasing subsequence?
Because each extract_and_insert operation will, at its most effective use, correct the relative position of exactly one element in the array. In other words, when we consider the length of the longest increasing subsequence of the array, each operation will increase that length by 1. So, the minimum number of required calls is:
length_of_array - length_of_LIS
and therefore by finding the length of LIS, we will be able to find the minimum number of operations required.
Do read the Wikipedia page on the longest increasing subsequence to see how to implement the algorithm.
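For illustration, here is a hedged sketch (class and method names are illustrative) that computes the answer as length_of_array - length_of_LIS, using the O(n log n) patience-style LIS:

import java.util.Arrays;

public class MinExtractAndInsert {
    // Minimum number of extract_and_insert calls needed to sort 'a' (elements assumed distinct).
    static int minCalls(int[] a) {
        int[] tails = new int[a.length]; // tails[k] = smallest tail of an increasing subsequence of length k+1
        int lis = 0;
        for (int x : a) {
            int pos = Arrays.binarySearch(tails, 0, lis, x);
            if (pos < 0) pos = -(pos + 1); // insertion point keeps tails[] sorted
            tails[pos] = x;
            if (pos == lis) lis++;         // extended the longest subsequence found so far
        }
        return a.length - lis;
    }

    public static void main(String[] args) {
        System.out.println(minCalls(new int[] { 3, 1, 2, 5, 4 })); // prints 2 (LIS = {1, 2, 4} has length 3)
    }
}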

How to increment all values in an array interval by a given amount

Suppose I have an array A of length L. I will be given n intervals (i, j) and I have to increment all values between A[i] and A[j]. Which data structure would be most suitable for these operations?
The intervals are known beforehand.
You can get O(N + M). Keep an extra increment array B the same size as A, initially empty (filled with 0). If you need to increment the range (i, j) by value k, then do B[i] += k and B[j + 1] -= k.
Now do a partial-sum transformation on B, assuming 0-based indexing:
for (int i = 1; i < N; ++i) B[i] += B[i - 1];
And now the final values of A are A[i] + B[i]
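A minimal, self-contained sketch of that approach, assuming inclusive intervals (i, j) each carrying its own increment k (class and method names are illustrative):

public class RangeIncrement {
    // Applies all range increments {i, j, k} to 'a' in O(N + M) total.
    static void applyAll(int[] a, int[][] updates) {
        int n = a.length;
        int[] b = new int[n + 1];      // difference array, one extra slot so j + 1 is always in bounds
        for (int[] u : updates) {      // u = {i, j, k}
            b[u[0]] += u[2];
            b[u[1] + 1] -= u[2];
        }
        int running = 0;
        for (int i = 0; i < n; i++) {  // partial-sum transformation
            running += b[i];
            a[i] += running;
        }
    }

    public static void main(String[] args) {
        int[] a = { 1, 1, 1, 1, 1 };
        applyAll(a, new int[][] { { 0, 2, 5 }, { 1, 4, 2 } });
        System.out.println(java.util.Arrays.toString(a)); // [6, 8, 8, 3, 3]
    }
}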
Break all intervals into start and end indexes: s_i and e_i for the i-th interval, which starts at s_i (inclusive) and ends at e_i (exclusive).
Sort all the s_i values into an array S.
Sort all the e_i values into an array E.
Set an increment counter to zero.
Start a linear scan of the input, adding the current increment to every element.
In each iteration, if the next s_i equals the current index, increase the increment counter; if the next e_i equals the current index, decrease it.
inc = 0
s = <PriorityQueue of interval start indexes>
e = <PriorityQueue of interval end indexes>
for (i = 0; i < n; i++) {
    if (inc == 0) {
        // skip ahead over elements that receive no increment
        i = min(s.peek(), e.peek())
    }
    while (s.peek() == i) {
        s.pop();
        inc++;
    }
    while (e.peek() == i) {
        e.pop();
        inc--;
    }
    a[i] += inc;
}
Complexity (without skipping non-incremented elements): O(n + m*log(m)), where m is the number of intervals.
If n >> m, then it is O(n).
Complexity when skipping elements: O(min(n, sum over i of length(I_i))), where length(I_i) = e_i - s_i.
There are three main approaches that I can think of:
Approach 1
This is the simplest one, where you just keep the array as is, and do the naive thing for increment.
Pros: Querying is constant time
Cons: Increment can be linear time (and hence pretty slow if L is big)
Approach 2
This one is a little more complicated, but is better if you plan on incrementing a lot.
Store the elements in a binary tree so that an in-order traversal accesses the elements in order. Each node (aside from the normal left and right children) also stores an extra int addOn, which will be "add me when you query any node in this subtree".
For querying elements, do the normal binary search on index to find the element, adding up all of the values of the addOn variables as you go. Add those to the A[i] at the node you want, and that's your value.
For increments, traverse down into the tree, updating all of these new addOns as necessary. Note that if you add the incremented value to an addOn for one node, you do not update it for the two children. The runtime for each increment is then O(log L), since the only times you ever have to "branch off" into the children is when the first or last element in the interval is in your range. Hence, you branch off at most 2 log L times, and access a constant factor more in elements.
Pros: Increment is now O(log L), so now things are much faster than before if you increment a ton.
Cons: Queries take longer (also O(log L)), and the implementation is much trickier.
Approach 3
Use an interval tree.
Pros: Just like approach 2, this one can be much faster than the naive approach
Cons: Not doable if you don't know what the intervals are going to be beforehand. Also tricky to implement.
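As a concrete alternative to the hand-rolled addOn tree described in Approach 2, a Fenwick (binary indexed) tree over the difference array gives the same O(log L) bounds for both range update and point query. A hedged sketch (names are illustrative, positions are 1-based):

public class FenwickRangeUpdate {
    private final long[] tree; // 1-based Fenwick tree storing the difference array

    FenwickRangeUpdate(int size) {
        tree = new long[size + 1];
    }

    // Add delta at one position of the underlying difference array: O(log L)
    private void add(int i, long delta) {
        for (; i < tree.length; i += i & -i) tree[i] += delta;
    }

    // Increment every element in the inclusive range [l, r] by delta: O(log L)
    void rangeAdd(int l, int r, long delta) {
        add(l, delta);
        if (r + 1 < tree.length) add(r + 1, -delta);
    }

    // Total increment applied so far at position i (a prefix sum of differences): O(log L)
    long pointQuery(int i) {
        long sum = 0;
        for (; i > 0; i -= i & -i) sum += tree[i];
        return sum;
    }

    public static void main(String[] args) {
        FenwickRangeUpdate f = new FenwickRangeUpdate(10);
        f.rangeAdd(2, 5, 3);
        f.rangeAdd(4, 9, 2);
        System.out.println(f.pointQuery(4)); // 5
        System.out.println(f.pointQuery(6)); // 2
    }
}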
Solve the problem for a single interval. Then iterate over all intervals and apply the single-interval solution for each. The best data structure depends on the language. Here's a Java example:
public class Interval {
    int i;
    int j;
}

public void increment(int[] array, Interval interval) {
    for (int i = interval.i; i < interval.j; ++i) {
        ++array[i];
    }
}

public void increment(int[] array, Interval[] intervals) {
    for (Interval interval : intervals) {
        increment(array, interval);
    }
}
Obviously you could nest one loop inside the other if you wanted to reduce the amount of code. However, a single-interval method might be useful in its own right.
EDIT
If the intervals are known beforehand, then you can improve things a bit. You can modify the Interval structure to maintain an increment amount (which defaults to 1). Then preprocess the set of intervals S as follows:
Initialize a second set of intervals T to the empty set
For each interval I in S: if I does not overlap any interval in T, add I to T; otherwise:
For each interval J in T that overlaps I, remove J from T, form new intervals K1...Kn from I and J such that there are no overlaps (n can be from 1 to 3), and add K1...Kn to T
When this finishes, use the intervals in T with the earlier code (modified as described). Since there are no overlaps, no element of the array will be incremented more than once. For a fixed set of intervals, this is a constant time algorithm, regardless of the array length.
For N intervals, the splitting process can probably be designed to run in something close to O(N log N) by keeping T ordered by interval start index. But if the cost is amortized among many array increment operations, this isn't all that important to the overall complexity.
A possible implementation of the O(M+N) algorithm suggested by Adrian Budau:
import java.util.Scanner;

class Interval {
    int i;
    int j;
}

public class IncrementArray {
    public static void main(String[] args) {
        int k = 5; // increase array elements by this value
        Scanner sc = new Scanner(System.in);
        int intervalNo = sc.nextInt(); // number of intervals
        sc.nextLine();
        Interval[] interval = new Interval[intervalNo]; // array containing the ranges/intervals
        for (int i = 0; i < intervalNo; i++) {
            interval[i] = new Interval();
            String s = sc.nextLine(); // i and j for one interval, separated by a space on one line
            String[] s1 = s.split(" ");
            interval[i].i = Integer.parseInt(s1[0]);
            interval[i].j = Integer.parseInt(s1[1]);
        }
        int[] arr = new int[10]; // array whose values need to be incremented
        for (int i = 0; i < arr.length; ++i)
            arr[i] = i + 1; // initialising the array
        int[] temp = new int[10]; // difference array
        for (int i = 0; i < intervalNo; i++) {
            Interval run = interval[i];
            temp[run.i] += k;
            if (run.j + 1 < 10) // stay within array bounds
                temp[run.j + 1] -= k;
        }
        for (int i = 1; i < 10; i++)
            temp[i] += temp[i - 1]; // partial-sum transformation
        for (int i = 0; i < 10; i++) {
            arr[i] += temp[i];
            System.out.print(" " + arr[i]); // printing results
        }
    }
}

Why should Insertion Sort be used after threshold crossover in Merge Sort

I have read everywhere that for divide and conquer sorting algorithms like Merge-Sort and Quicksort, instead of recursing until only a single element is left, it is better to switch to Insertion-Sort when a certain threshold, say 30 elements, is reached. That is fine, but why only Insertion-Sort? Why not Bubble-Sort or Selection-Sort, both of which have similar O(N^2) performance? Insertion-Sort should only come in handy when many elements are pre-sorted (although that advantage should also come with Bubble-Sort), but otherwise, why should it be more efficient than the other two?
And secondly, at this link, in the 2nd answer and its accompanying comments, it says that O(N log N) performs poorly compared to O(N^2) up to a certain N. How come? N^2 should always perform worse than N log N, since N > log N for all N >= 2, right?
If you bail out of each branch of your divide-and-conquer Quicksort when it hits the threshold, your data looks like this:
[the least 30-ish elements, not in order] [the next 30-ish ] ... [last 30-ish]
Insertion sort has the rather pleasing property that you can call it just once on that whole array, and it performs essentially the same as it does if you call it once for each block of 30. So instead of calling it in your loop, you have the option to call it last. This might not be faster, especially since it pulls the whole data through cache an extra time, but depending how the code is structured it might be convenient.
Neither bubble sort nor selection sort has this property, so I think the answer might quite simply be "convenience". If someone suspects selection sort might be better then the burden of proof lies on them to "prove" that it's faster.
Note that this use of insertion sort also has a drawback -- if you do it this way and there's a bug in your partition code then provided it doesn't lose any elements, just partition them incorrectly, you'll never notice.
Edit: apparently this modification is by Sedgewick, who wrote his PhD on QuickSort in 1975. It was analyzed more recently by Musser (the inventor of Introsort). Reference https://en.wikipedia.org/wiki/Introsort
Musser also considered the effect on caches of Sedgewick's delayed small sorting, where small ranges are sorted at the end in a single pass of insertion sort. He reported that it could double the number of cache misses, but that its performance with double-ended queues was significantly better and should be retained for template libraries, in part because the gain in other cases from doing the sorts immediately was not great.
In any case, I don't think the general advice is "whatever you do, don't use selection sort". The advice is, "insertion sort beats Quicksort for inputs up to a surprisingly non-tiny size", and this is pretty easy to prove to yourself when you're implementing a Quicksort. If you come up with another sort that demonstrably beats insertion sort on the same small arrays, none of those academic sources is telling you not to use it. I suppose the surprise is that the advice is consistently towards insertion sort, rather than each source choosing its own favorite (introductory teachers have a frankly astonishing fondness for bubble sort -- I wouldn't mind if I never hear of it again). Insertion sort is generally thought of as "the right answer" for small data. The issue isn't whether it "should be" fast, it's whether it actually is or not, and I've never particularly noticed any benchmarks dispelling this idea.
One place to look for such data would be in the development and adoption of Timsort. I'm pretty sure Tim Peters chose insertion for a reason: he wasn't offering general advice, he was optimizing a library for real use.
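To make the "call it last" variant described above concrete, here is a hedged sketch (the cutoff of 30 and all names are illustrative, not taken from any particular library): Quicksort stops recursing below the cutoff, and a single final insertion-sort pass finishes the nearly sorted array.

import java.util.Random;

public class QuicksortWithCutoff {
    static final int CUTOFF = 30; // illustrative threshold

    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
        insertionSort(a); // one pass over the whole, nearly sorted array
    }

    // Leaves blocks smaller than CUTOFF unsorted; every element is already in its final block.
    private static void quicksort(int[] a, int lo, int hi) {
        if (hi - lo + 1 < CUTOFF) return;
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }

    private static int partition(int[] a, int lo, int hi) { // Lomuto partition, last element as pivot
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    private static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int v = a[i], j = i;
            while (j > 0 && a[j - 1] > v) { a[j] = a[j - 1]; j--; } // shift, no swaps
            a[j] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = new Random(1).ints(1000, 0, 10_000).toArray();
        sort(a);
        boolean ok = true;
        for (int i = 1; i < a.length; i++) ok &= a[i - 1] <= a[i];
        System.out.println("sorted: " + ok); // true
    }
}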
Insertion sort is faster in practice than bubble sort, at least. Their asymptotic running time is the same, but insertion sort has better constants (fewer/cheaper operations per iteration). Most notably, it requires only a linear number of swaps of pairs of elements, and in each inner loop it performs comparisons between each of n/2 elements and a "fixed" element that can be stored in a register (while bubble sort has to read values from memory). I.e. insertion sort does less work in its inner loop than bubble sort.
The answer claims that 10000 n lg n > 10 n² for "reasonable" n. This is true up to about 14000 elements.
I am surprised no-one's mentioned the simple fact that insertion sort is simply much faster for "almost" sorted data. That's the reason it's used.
The easier one first: why insertion sort over selection sort? Because insertion sort is in O(n) for optimal input sequences, i.e. if the sequence is already sorted. Selection sort is always in O(n^2).
Why insertion sort over bubble sort? Both need only a single pass on already sorted input sequences, but insertion sort degrades more gracefully. To be more specific, insertion sort usually performs better than bubble sort when there is a small number of inversions. This can be explained because bubble sort always iterates over N-i elements in pass i, while insertion sort works more like a "find" and, on average, only needs to iterate over (N-i)/2 elements (in pass N-i-1) to locate the insertion position. So, insertion sort is expected to be about two times faster than bubble sort on average.
Here is an empirical proof that insertion sort is faster than bubble sort (for 30 elements, on my machine, with the attached implementation, using Java).
I ran the attached code and found that bubble sort ran in 6338.515 ns on average, while insertion sort took 3601.0 ns.
I used the Wilcoxon signed-rank test to check the probability that this is a mistake and that they should actually be the same - but the result is below the range of the numerical error (effectively p-value ~= 0).
private static void swap(int[] arr, int i, int j) {
    int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}

public static void insertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int j = i;
        while (j > 0 && arr[j-1] > arr[j]) {
            swap(arr, j, j-1);
            j--;
        }
    }
}

public static void bubbleSort(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        boolean bool = false;
        for (int j = 0; j < arr.length - i; j++) {
            if (j + 1 < arr.length && arr[j] > arr[j+1]) {
                bool = true;
                swap(arr, j, j+1);
            }
        }
        if (!bool) break;
    }
}

public static void main(String... args) throws Exception {
    Random r = new Random(1);
    int SIZE = 30;
    int N = 1000;
    int[] arr = new int[SIZE];
    int[] millisBubble = new int[N];
    int[] millisInsertion = new int[N];
    System.out.println("start");
    // warm up:
    for (int t = 0; t < 100; t++) {
        insertionSort(arr);
    }
    for (int t = 0; t < N; t++) {
        arr = generateRandom(r, SIZE);
        int[] tempArr = Arrays.copyOf(arr, arr.length);
        long start = System.nanoTime();
        insertionSort(tempArr);
        millisInsertion[t] = (int)(System.nanoTime()-start);
        tempArr = Arrays.copyOf(arr, arr.length);
        start = System.nanoTime();
        bubbleSort(tempArr);
        millisBubble[t] = (int)(System.nanoTime()-start);
    }
    int sum1 = 0;
    for (int x : millisBubble) {
        System.out.println(x);
        sum1 += x;
    }
    System.out.println("end of bubble. AVG = " + ((double)sum1)/millisBubble.length);
    int sum2 = 0;
    for (int x : millisInsertion) {
        System.out.println(x);
        sum2 += x;
    }
    System.out.println("end of insertion. AVG = " + ((double)sum2)/millisInsertion.length);
    System.out.println("bubble took " + ((double)sum1)/millisBubble.length + " while insertion took " + ((double)sum2)/millisBubble.length);
}

private static int[] generateRandom(Random r, int size) {
    int[] arr = new int[size];
    for (int i = 0; i < size; i++)
        arr[i] = r.nextInt(size);
    return arr;
}
EDIT:
(1) Optimizing the bubble sort (updated above) reduced its average time to 6043.806 ns, which is not enough to make a significant difference. The Wilcoxon test is still conclusive: insertion sort is faster.
(2) I also added a selection sort test (code attached) and compared it against insertion sort. The results: selection took 4748.35 ns while insertion took 3540.114 ns.
The p-value for the Wilcoxon test is still below the range of numerical error (effectively ~= 0).
Code for the selection sort used:
public static void selectionSort(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        int min = arr[i];
        int minElm = i;
        for (int j = i+1; j < arr.length; j++) {
            if (arr[j] < min) {
                min = arr[j];
                minElm = j;
            }
        }
        swap(arr, i, minElm);
    }
}
EDIT: As IVlad points out in a comment, selection sort does only n swaps (and therefore only 3n writes) for any dataset, so insertion sort is very unlikely to beat it on account of doing fewer swaps -- but it will likely do substantially fewer comparisons. The reasoning below better fits a comparison with bubble sort, which will do a similar number of comparisons but many more swaps (and thus many more writes) on average.
One reason why insertion sort tends to be faster than the other O(n^2) algorithms like bubble sort and selection sort is that, in the latter algorithms, every single data movement requires a swap, which can be up to 3 times as many memory copies as necessary if the other end of the swap needs to be swapped again later.
With insertion sort OTOH, if the next element to be inserted isn't already the largest element, it can be saved into a temporary location, and all lower elements shunted forward by starting from the right and using single data copies (i.e. without swaps). This opens up a gap to put the original element.
C code for insertion-sorting integers without using swaps:
void insertion_sort(int *v, int n) {
    int i = 1;
    while (i < n) {
        int temp = v[i];     // Save the current element here
        int j = i;
        // Shunt everything forwards
        while (j > 0 && v[j - 1] > temp) {
            v[j] = v[j - 1]; // Look ma, no swaps! :)
            --j;
        }
        v[j] = temp;
        ++i;
    }
}

order of complexity of the algorithm in O notation

Can anyone tell me the order of complexity of the algorithm below? The algorithm does the following:
Given an unsorted array of integers with duplicate numbers, write the most efficient code to print out the unique values in the array.
I would also like to know:
What are some pros and cons of this implementation in the context of hardware usage?
private static void IsArrayDuplicated(int[] a)
{
    int size = a.Length;
    BitArray b = new BitArray(a.Max() + 1);
    for (int i = 0; i < size; i++)
    {
        b.Set(a[i], true);
    }
    for (int i = 0; i < b.Count; i++)
    {
        if (b.Get(i))
        {
            System.Console.WriteLine(i.ToString());
        }
    }
    Console.ReadLine();
}
You have two for loops, one of length a.Length and one of length (if I understand the code correctly) a.Max() + 1. So your algorithmic complexity is O(a.Length + a.Max())
The complexity of the algorithm is linear:
Finding the maximum is linear.
Setting the bits is linear.
However, the algorithm is also wrong, unless your integers can be assumed to be non-negative.
It also has a problem with large integers - do you really want to allocate MAX_INT/8 bytes of memory?
The name, btw, makes me cringe: IsXYZ() should always return a bool.
I'd say, try again.
Correction - pavpanchekha has the correct answer.
O(n) is probably only possible for a finite/small domain of integers - think of bucket sort. The hashmap approach is basically not O(n) but O(n^2), since worst-case insertion into a hashmap is O(n) and NOT constant.
How about sorting the list in O(n log(n)) and then going through it, printing each value only once? This results in O(n log(n)), which is probably the true complexity of the problem.
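A minimal sketch of that sort-then-scan idea (class and method names are illustrative); it handles negative and arbitrarily large values, since its memory use does not depend on the value range:

import java.util.Arrays;

public class PrintUniqueValues {
    // Sort a copy, then print each value once: O(n log n) time.
    static void printUnique(int[] a) {
        int[] sorted = Arrays.copyOf(a, a.length);
        Arrays.sort(sorted);
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || sorted[i] != sorted[i - 1]) { // skip repeats of the previous value
                System.out.println(sorted[i]);
            }
        }
    }

    public static void main(String[] args) {
        printUnique(new int[] { -1, 0, -2, 2, 10, 2, 10 }); // prints -2, -1, 0, 2, 10
    }
}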
HashSet<int> mySet = new HashSet<int>(new int[] { -1, 0, -2, 2, 10, 2, 10 });
foreach (var item in mySet)
{
    Console.WriteLine(item);
}
// A HashSet guarantees unique values
You have two loops, each based on the size of n. I agree with whaley, but that should give you a good start on it.
O(n) on a.length
The complexity of your algorithm is O(N), but the algorithm is not correct:
If the numbers are negative it will not work.
In the case of large numbers you will have problems with memory.
I suggest you use this approach instead:
private static void IsArrayDuplicated(int[] a) {
    int size = a.length;
    Set<Integer> b = new HashSet<Integer>();
    for (int i = 0; i < size; i++) {
        b.add(a[i]);
    }
    Integer[] T = b.toArray(new Integer[0]);
    for (int i = 0; i < T.length; i++) {
        System.out.println(T[i]);
    }
}
