Big O of union method of dynamic connectivity problem - algorithm

i faced a paradox to analyse this function, Why the time complexity of this function is N^2 and not N?
public void union(int a, int b) {
int aid = ids[a];
int bid = ids[b];
for (int i = 0; i < ids.length; i++) {
if (ids[i] == aid) {
ids[i] = bid;
}
}
}
Its an implementation of eager approach, to solve dynamic connectivity problem , complete code is:
// Union method has N^2 time complexity!!
class EagerApproach extends UnionFind {
protected int[] ids;
EagerApproach(int[] input) {
super(input);
ids = new int[input.length];
System.arraycopy(input, 0, ids, 0, input.length);
}
public boolean connected(int a, int b) {
return ids[a] == ids[b];
}
public void union(int a, int b) {
int aid = ids[a];
int bid = ids[b];
for (int i = 0; i < ids.length; i++) {
if (ids[i] == aid) {
ids[i] = bid;
}
}
}
public int[] getIds() {
return ids;
}
}

Provided your array access ids[x] is in constant time O(1), the time complexity of the union method is linear in the length of the array ids. So
O(ids.length)
or O(n) if we define n as ids.length.
Be careful with the definition of n and ids though. If, in your specific application, n was defined as ids.length = n * n, then this is obviously O(n^2) with n being sqrt(ids.length).

Related

Optimizing quick sort

I am implementing quick sort algorithm in java and here is the code :
public class quickSort {
private int array[];
private int length;
public void sort(int[] inputArr) {
if (inputArr == null || inputArr.length == 0) {
return;
}
this.array = inputArr;
length = inputArr.length;
quickSorter(0, length - 1);
}
private void quickSorter(int lowerIndex, int higherIndex) {
int i = lowerIndex;
int j = higherIndex;
// calculate pivot number, I am taking pivot as middle index number
int pivot = array[lowerIndex+(higherIndex-lowerIndex)/2];
// Divide into two arrays
while (i <= j) {
while (array[i] < pivot) {
i++;
}
while (array[j] > pivot) {
j--;
}
if (i <= j) {
exchangeNumbers(i, j);
//move index to next position on both sides
i++;
j--;
}
}
// call quickSort() method recursively
if (lowerIndex < j)
quickSorter(lowerIndex, j);
if (i < higherIndex)
quickSorter(i, higherIndex);
}
private void exchangeNumbers(int i, int j) {
int temp = array[i];
array[i] = array[j];
array[j] = temp;
}
}
Then I implement it with (median of three)
public class quickSort {
private int array[];
private int length;
public void sort(int[] inputArr) {
if (inputArr == null || inputArr.length == 0) {
return;
}
this.array = inputArr;
length = inputArr.length;
quickSorter(0, length - 1);
}
private void quickSorter(int lowerIndex, int higherIndex) {
int i = lowerIndex;
int j = higherIndex;
int mid = lowerIndex+(higherIndex-lowerIndex)/2;
if (array[i]>array[mid]){
exchangeNumbers( i, mid);
}
if (array[i]>array[j]){
exchangeNumbers( i, j);
}
if (array[j]<array[mid]){
exchangeNumbers( j, mid);
}
int pivot = array[mid];
// Divide into two arrays
while (i <= j) {
while (array[i] < pivot) {
i++;
}
while (array[j] > pivot) {
j--;
}
if (i <= j) {
exchangeNumbers(i, j);
//move index to next position on both sides
i++;
j--;
}
}
// call quickSort() method recursively
if (lowerIndex < j)
quickSorter(lowerIndex, j);
if (i < higherIndex)
quickSorter(i, higherIndex);
}
private void exchangeNumbers(int i, int j) {
int temp = array[i];
array[i] = array[j];
array[j] = temp;
}
}
and the testing main :
public static void main(String[] args) {
File number = new File ("f.txt");
final int size = 10000000;
try{
quickSortOptimize opti = new quickSortOptimize();
quickSort s = new quickSort();
PrintWriter printWriter = new PrintWriter(number);
for (int i=0;i<size;i++){
printWriter.println((int)(Math.random()*100000));
}
printWriter.close();
Scanner in = new Scanner (number);
int [] arr1 = new int [size];
for (int i=0;i<size;i++){
arr1[i]=Integer.parseInt(in.nextLine());
}
long a=System.currentTimeMillis();
opti.sort(arr1);
long b=System.currentTimeMillis();
System.out.println("Optimaized quicksort: "+(double)(b-a)/1000);
in.close();
int [] arr2 = new int [size];
Scanner in2= new Scanner(number);
for (int i=0;i<size;i++){
arr2[i]=Integer.parseInt(in2.nextLine());
}
long c=System.currentTimeMillis();
s.sort(arr2);
long d=System.currentTimeMillis();
System.out.println("normal Quicksort: "+(double)(d-c)/1000);
}catch (Exception ex){ex.printStackTrace();}
}
The problem is that this method of optimization should improve performance by 5%
but, what happens actually is that I have done this test many times and almost always getting better result on normal quicksort that optimized one
so what is wrong with the second implementation
A median of three (or more) will usually be slower for input that's randomly ordered.
A median of three is intended to help prevent a really bad case from being quite as horrible. There are ways of making it pretty bad anyway, but at least avoids the problem for a few common orderings--e.g., selecting the first element as the pivot can produce terrible results if/when (most of) the input is already ordered.

Puzzle: Find the order of n persons standing in a line (based on their heights)

Saw this question on Careercup.com:
Given heights of n persons standing in a line and a list of numbers corresponding to each person (p) that gives the number of persons who are taller than p and standing in front of p. For example,
Heights: 5 3 2 6 1 4
InFronts:0 1 2 0 3 2
Means that the actual actual order is: 5 3 2 1 6 4
The question gets the two lists of Heights and InFronts, and should generate the order standing in line.
My solution:
It could be solved by first sorting the list in descending order. Obviously, to sort, we need to define an object Person (with two attributes of Height and InFront) and then sort Persons based on their height. Then, I would use two stacks, a main stack and a temp one, to build up the order.
Starting from the tallest, put it in the main stack. If the next person had an InFront value of greater than the person on top of the stack, that means the new person should be added before the person on top. Therefore, we need to pop persons from the main stack, insert the new person, and then return the persons popped out in the first step (back to the main stack from temp one). I would use a temp stack to keep the order of the popped out persons. But how many should be popped out? Since the list is sorted, we need to pop exactly the number of persons in front of the new person, i.e. corresponding InFront.
I think this solution works. But the worst case order would be O(n^2) -- when putting a person in place needs popping out all previous ones.
Is there any other solutions? possibly in O(n)?
The O(nlogn) algoritm is possible.
First assume that all heights are different.
Sort people by heights. Then iterate from shortest to tallest. In each step you need an efficient way to put the next person to the correct position. Notice that people we've already placed are not taller that the current person. And the people we place after are taller than the current. So we have to find a place such that the number of empty positions in the front is equal to the inFronts value of this person. This task can be done using a data structure called interval tree in O(logn) time. So the total time of an algorithm is O(nlogn).
This algorithm works well in case where there's no ties. As it may be safely assumed that empty places up to front will be filled by taller people.
In case when ties are possible, we need to assure that people of the same height are placed in increasing order of their positions. It can be achieved if we will process people by non-decreasing inFronts value. So, in case of possible ties we should also consider inFronts values when sorting people.
And if at some step we can't find a position for next person then the answer it "it's impossible to satisfy problem constraints".
There exists an algorithm with O(nlogn) average complexity, however worst case complexity is still O(n²).
To achieve this you can use a variation of a binary tree. The idea is, in this tree, each node corresponds to a person and each node keeps track of how many people are in front of him (which is the size of the left subtree) as nodes are inserted.
Start iterating the persons array in decreasing height order and insert each person into the tree starting from the root. Insertion is as follows:
Compare the frontCount of the person with the current node's (root at the beginning) value.
If it is smaller than it insert the node to the left with value 1. Increase the current node's value by 1.
Else, descend to the right by decreasing the person's frontCount by current node's value. This enables the node to be placed in the correct location.
After all nodes finished, an inorder traversal gives the correct order of people.
Let the code speak for itself:
public static void arrange(int[] heights, int[] frontCounts) {
Person[] persons = new Person[heights.length];
for (int i = 0; i < persons.length; i++)
persons[i] = new Person(heights[i], frontCounts[i]);
Arrays.sort(persons, (p1, p2) -> {
return Integer.compare(p2.height, p1.height);
});
Node root = new Node(persons[0]);
for (int i = 1; i < persons.length; i++) {
insert(root, persons[i]);
}
inOrderPrint(root);
}
private static void insert(Node root, Person p) {
insert(root, p, p.frontCount);
}
private static void insert(Node root, Person p, int value) {
if (value < root.value) { // should insert to the left
if (root.left == null) {
root.left = new Node(p);
} else {
insert(root.left, p, value);
}
root.value++; // Increase the current node value while descending left!
} else { // insert to the right
if (root.right == null) {
root.right = new Node(p);
} else {
insert(root.right, p, value - root.value);
}
}
}
private static void inOrderPrint(Node root) {
if (root == null)
return;
inOrderPrint(root.left);
System.out.print(root.person.height);
inOrderPrint(root.right);
}
private static class Node {
Node left, right;
int value;
public final Person person;
public Node(Person person) {
this.value = 1;
this.person = person;
}
}
private static class Person {
public final int height;
public final int frontCount;
Person(int height, int frontCount) {
this.height = height;
this.frontCount = frontCount;
}
}
public static void main(String[] args) {
int[] heights = {5, 3, 2, 6, 1, 4};
int[] frontCounts = {0, 1, 2, 0, 3, 2};
arrange(heights, frontCounts);
}
I think one approach can be the following. Although it again seems to be O(n^2) at present.
Sort the Height array and corresponding 'p' array in ascending order of heights (in O(nlogn)). Pick the first element in the list. Put that element in the final array in the position given by the p index.
For example after sorting,
H - 1, 2, 3, 4, 5, 6
p - 3, 2, 1, 2, 0, 0.
1st element should go in position 3. Hence final array becomes:
---1--
2nd element shall go in position 2. Hence final array becomes:
--21--
3rd element should go in position 1. Hence final array becomes:
-321--
4th element shall go in position 2. This is the position among the empty ones. Hence final array becomes:
-321-4
5th element shall go in position 0. Hence final array becomes:
5321-4
6th element should go in position 0. Hence final array becomes:
532164
I think the approach indicated above is correct. However a critical piece missing in the solutions above are.
Infronts is the number of taller candidate before the current person. So after sorting the persons based on height(Ascending), when placing person 3 with infront=2, if person 1 and 2 was in front placed at 0, 1 position respectively, you need to discount their position and place 3 at position 4, I.E 2 taller candidates will take position 2,3.
As some indicated interval tree is the right structure. However a dynamic sized container, with available position will do the job.(code below)
struct Person{
int h, ct;
Person(int ht, int c){
h = ht;
ct = c;
}
};
struct comp{
bool operator()(const Person& lhs, const Person& rhs){
return (lhs.h < rhs.h);
}
};
vector<int> heightOrder(vector<int> &heights, vector<int> &infronts) {
if(heights.size() != infronts.size()){
return {};
}
vector<int> result(infronts.size(), -1);
vector<Person> persons;
vector<int> countSet;
for(int i= 0; i< heights.size(); i++){
persons.emplace_back(Person(heights[i], infronts[i]));
countSet.emplace_back(i);
}
sort(persons.begin(), persons.end(), comp());
for(size_t i=0; i<persons.size(); i++){
Person p = persons[i];
if(countSet.size() > p.ct){
int curr = countSet[p.ct];
//cout << "the index to place height=" << p.h << " , is at pos=" << curr << endl;
result[curr] = p.h;
countSet.erase(countSet.begin() + p.ct);
}
}
return result;
}
I'm using LinkedList for the this. Sort the tallCount[] in ascending order and accordingly re-position the items in heights[]. This is capable of handling the duplicate elements also.
public class FindHeightOrder {
public int[] findOrder(final int[] heights, final int[] tallCount) {
if (heights == null || heights.length == 0 || tallCount == null
|| tallCount.length == 0 || tallCount.length != heights.length) {
return null;
}
LinkedList list = new LinkedList();
list.insertAtStart(heights[0]);
for (int i = 1; i < heights.length; i++) {
if (tallCount[i] == 0) {
Link temp = list.getHead();
while (temp != null && temp.getData() <= heights[i]) {
temp = temp.getLink();
}
if (temp != null) {
if (temp.getData() <= heights[i]) {
list.insertAfterElement(temp.getData(), heights[i]);
} else {
list.insertAtStart(heights[i]);
}
} else {
list.insertAtEnd(heights[i]);
}
} else {
Link temp = list.getHead();
int pos = tallCount[i];
while (temp != null
&& (temp.getData() <= heights[i] || pos-- > 0)) {
temp = temp.getLink();
}
if (temp != null) {
if (temp.getData() <= heights[i]) {
list.insertAfterElement(temp.getData(), heights[i]);
} else {
list.insertBeforeElement(temp.getData(), heights[i]);
}
} else {
list.insertAtEnd(heights[i]);
}
}
}
Link fin = list.getHead();
int i = 0;
while (fin != null) {
heights[i++] = fin.getData();
fin = fin.getLink();
}
return heights;
}
public class Link {
private int data;
private Link link;
public Link(int data) {
this.data = data;
}
public int getData() {
return data;
}
public void setData(int data) {
this.data = data;
}
public Link getLink() {
return link;
}
public void setLink(Link link) {
this.link = link;
}
#Override
public String toString() {
return this.data + " -> "
+ (this.link != null ? this.link : "null");
}
}
public class LinkedList {
private Link head;
public Link getHead() {
return head;
}
public void insertAtStart(int data) {
if (head == null) {
head = new Link(data);
head.setLink(null);
} else {
Link link = new Link(data);
link.setLink(head);
head = link;
}
}
public void insertAtEnd(int data) {
if (head != null) {
Link temp = head;
while (temp != null && temp.getLink() != null) {
temp = temp.getLink();
}
temp.setLink(new Link(data));
} else {
head = new Link(data);
}
}
public void insertAfterElement(int after, int data) {
if (head != null) {
Link temp = head;
while (temp != null) {
if (temp.getData() == after) {
Link link = new Link(data);
link.setLink(temp.getLink());
temp.setLink(link);
break;
} else {
temp = temp.getLink();
}
}
}
}
public void insertBeforeElement(int before, int data) {
if (head != null) {
Link current = head;
Link previous = null;
Link ins = new Link(data);
while (current != null) {
if (current.getData() == before) {
ins.setLink(current);
break;
} else {
previous = current;
current = current.getLink();
if (current != null && current.getData() == before) {
previous.setLink(ins);
ins.setLink(current);
break;
}
}
}
}
}
#Override
public String toString() {
return "LinkedList [head=" + this.head + "]";
}
}
}
As people already corrected for original input:
Heights : A[] = { 5 3 2 6 1 4 }
InFronts: B[] = { 0 1 2 0 3 2 }
Output should look like: X[] = { 5 3 1 6 2 4 }
Here is the O(N*logN) way to approach solution (with assumption that there are no ties).
Iterate over array B and build chain of inequalities (by placing items into a right spot on each iteration, here we can use hashtable for O(1) lookups):
b0 > b1
b0 > b1 > b2
b3 > b0 > b1 > b2
b3 > b0 > b1 > b4 > b2
b3 > b0 > b5 > b1 > b4 > b2
Sort array A and reverse it
Initialize output array X, iterate over chain from #1 and fill array X by placing items from A into a position defined in a chain
Steps #1 and #3 are O(N), step #2 is the most expensive O(N*logN).
And obviously reversing sorted array A (in step #2) is not required.
This is the implementation for the idea provided by user1990169. Complexity being O(N^2).
public class Solution {
class Person implements Comparator<Person>{
int height;
int infront;
public Person(){
}
public Person(int height, int infront){
this.height = height;
this.infront = infront;
}
public int compare(Person p1, Person p2){
return p1.height - p2.height;
}
}
public ArrayList<Integer> order(ArrayList<Integer> heights, ArrayList<Integer> infronts) {
int n = heights.size();
Person[] people = new Person[n];
for(int i = 0; i < n; i++){
people[i] = new Person(heights.get(i), infronts.get(i));
}
Arrays.sort(people, new Person());
Person[] rst = new Person[n];
for(Person p : people){
int count = 0;
for(int i = 0; i < n ; i++){
if(count == p.infront){
while(rst[i] != null && i < n - 1){
i++;
}
rst[i] = p;
break;
}
if(rst[i] == null) count++;
}
}
ArrayList<Integer> heightrst = new ArrayList<Integer>();
for(int i = 0; i < n; i++){
heightrst.add(rst[i].height);
}
return heightrst;
}
}
Was solving this problem today, here is what I came up with:
The idea is to sort the heights array in descending order. Once, we have this sorted array - pick up an element from this element and place it in the resultant array at the corresponding index (I am using an ArrayList for the same, it would be nice to use LinkedList) :
public class Solution {
public ArrayList<Integer> order(ArrayList<Integer> heights, ArrayList<Integer> infronts) {
Person[] persons = new Person[heights.size()];
ArrayList<Integer> res = new ArrayList<>();
for (int i = 0; i < persons.length; i++) {
persons[i] = new Person(heights.get(i), infronts.get(i));
}
Arrays.sort(persons, (p1, p2) -> {
return Integer.compare(p2.height, p1.height);
});
for (int i = 0; i < persons.length; i++) {
//System.out.println("adding "+persons[i].height+" "+persons[i].count);
res.add(persons[i].count, persons[i].height);
}
return res;
}
private static class Person {
public final int height;
public final int count;
public Person(int h, int c) {
height = h;
count = c;
}
}
}
I found this kind of problem on SPOJ. I created a binary tree with little variation. When a new height is inserted, if the front is smaller than the root's front then it goes to the left otherwise right.
Here is the C++ implementation:
#include<bits/stdc++.h>
using namespace std;
struct TreeNode1
{
int val;
int _front;
TreeNode1* left;
TreeNode1*right;
};
TreeNode1* Add(int x, int v)
{
TreeNode1* p= (TreeNode1*) malloc(sizeof(TreeNode1));
p->left=NULL;
p->right=NULL;
p->val=x;
p->_front=v;
return p;
}
TreeNode1* _insert(TreeNode1* root, int x, int _front)
{
if(root==NULL) return Add(x,_front);
if(root->_front >=_front)
{
root->left=_insert(root->left,x,_front);
root->_front+=1;
}
else
{
root->right=_insert(root->right,x,_front-root->_front);
}
return root;
}
bool comp(pair<int,int> a, pair<int,int> b)
{
return a.first>b.first;
}
void in_order(TreeNode1 * root, vector<int>&v)
{
if(root==NULL) return ;
in_order(root->left,v);
v.push_back(root->val);
in_order(root->right,v);
}
vector<int>soln(vector<int>h, vector<int>in )
{
vector<pair<int , int> >vc;
for(int i=0;i<h.size();i++) vc.push_back( make_pair( h[i],in[i] ) );
sort(vc.begin(),vc.end(),comp);
TreeNode1* root=NULL;
for(int i=0;i<vc.size();i++)
root=_insert(root,vc[i].first,vc[i].second);
vector<int>v;
in_order(root,v);
return v;
}
int main()
{
int t;
scanf("%d",&t);
while(t--)
{
int n;
scanf("%d",&n);
vector<int>h;
vector<int>in;
for(int i=0;i<n;i++) {int x;
cin>>x;
h.push_back(x);}
for(int i=0;i<n;i++) {int x; cin>>x;
in.push_back(x);}
vector<int>v=soln(h,in);
for(int i=0;i<n-1;i++) cout<<v[i]<<" ";
cout<<v[n-1]<<endl;
h.clear();
in.clear();
}
}
Here is a Python solution that uses only elementary list functions and takes care of ties.
def solution(heights, infronts):
person = list(zip(heights, infronts))
person.sort(key=lambda x: (x[0] == 0, x[1], -x[0]))
output = []
for p in person:
extended_output = output + [p]
extended_output.sort(key=lambda x: (x[0], -x[1]))
output_position = [p for p in extended_output].index(p) + p[1]
output.insert(output_position, p)
for c, p in enumerate(output):
taller_infronts = [infront for infront in output[0:c] if infront[0] >= p[0]]
assert len(taller_infronts) == p[1]
return output
Simple O(n^2) solution for this in Java:
Algorith:
If the position of the shortest person is i, i-1 taller people will be in front of him.
We fix the position of shortest person and then move to second shortest person.
Sort people by heights. Then iterate from shortest to tallest. In each step you need an efficient way to put the next person to the correct position.
We can optimise this solution even more by using segment tree. See this link.
class Person implements Comparable<Person>{
int height;
int pos;
Person(int height, int pos) {
this.height = height;
this.pos = pos;
}
#Override
public int compareTo(Person person) {
return this.height - person.height;
}
}
public class Solution {
public int[] order(int[] heights, int[] positions) {
int n = heights.length;
int[] ans = new int[n];
PriorityQueue<Person> pq = new PriorityQueue<Person>();
for( int i=0; i<n; i++) {
pq.offer(new Person(heights[i], positions[i]) );
}
for(int i=0; i<n; i++) {
Person person = pq.poll();
int vacantTillNow = 0;
int index = 0;
while(index < n) {
if( ans[index] == 0) vacantTillNow++;
if( vacantTillNow > person.pos) break;
index++;
}
ans[index] = person.height;
}
return ans;
}
}
Segment tree can be used to solve this in O(nlog n) if there are no ties in heights.
Please look for approach 3 in this link for a clear explanation of this method.
https://www.codingninjas.com/codestudio/problem-details/order-of-people-heights_1170764
Below is my code for the same approach in python
def findEmptySlot(tree, root, left, right, K, result):
tree[root]-=1
if left==right:
return left
if tree[2*root+1] >= K:
return findEmptySlot(tree, 2*root+1, left, (left+right)//2, K, result)
else:
return findEmptySlot(tree, 2*root+2, (left+right)//2+1, right, K-tree[2*root+1], result)
def buildsegtree(tree, pos, start, end):
if start==end:
tree[pos]=1
return tree[pos]
mid=(start+end)//2
left = buildsegtree(tree, 2*pos+1,start, mid)
right = buildsegtree(tree,2*pos+2,mid+1, end)
tree[pos]=left+right
return tree[pos]
class Solution:
# #param A : list of integers
# #param B : list of integers
# #return a list of integers
def order(self, A, B):
n=len(A)
people=[(A[i],B[i]) for i in range(len(A))]
people.sort(key=lambda x: (x[0], x[1]))
result=[0]*n
tree=[0]*(4*n)
buildsegtree(tree,0, 0, n-1)
for i in range(n):
idx=findEmptySlot(tree, 0, 0, n-1, people[i][1]+1, result)
result[idx]=people[i][0]
return result

Interview - Oracle

In a game the only scores which can be made are 2,3,4,5,6,7,8 and they can be made any number of times
What are the total number of combinations in which the team can play and the score of 50 can be achieved by the team.
example 8,8,8,8,8,8,2 is valid 8,8,8,8,8,4,4,2 is also valid. etc...
The problem can be solved with dynamic programming, with 2 parameters:
i - the index up to which we have considered
s - the total score.
f(i, s) will contain the total number of ways to achieve score s.
Let score[] be the list of unique positive scores that can be made.
The formulation for the DP solution:
f(0, s) = 1, for all s divisible to score[0]
f(0, s) = 0, otherwise
f(i + 1, s) = Sum [for k = 0 .. floor(s/score[i + 1])] f(i, s - score[i + 1] * k)
This looks like a coin change problem. I wrote some Python code for it a while back.
Edited Solution:
from collections import defaultdict
my_dicto = defaultdict(dict)
def row_analysis(v, my_dicto, coins):
temp = 0
for coin in coins:
if v >= coin:
if v - coin == 0: # changed from if v - coin in (0, 1):
temp += 1
my_dicto[coin][v] = temp
else:
temp += my_dicto[coin][v - coin]
my_dicto[coin][v] = temp
else:
my_dicto[coin][v] = temp
return my_dicto
def get_combs(coins, value):
'''
Returns answer for coin change type problems.
Coins are assumed to be sorted.
Example:
>>> get_combs([1,2,3,5,10,15,20], 50)
2955
'''
dicto = defaultdict(dict)
for v in xrange(value + 1):
dicto = row_analysis(v, dicto, coins)
return dicto[coins[-1]][value]
In your case:
>>> get_combs([2,3,4,5,6,7,8], 50)
3095
It is like visit a 7-branches decision tree.
The code is:
class WinScore{
static final int totalScore=50;
static final int[] list={2,3,4,5,6,7,8};
public static int methodNum=0;
static void visitTree( int achieved , int index){
if (achieved >= totalScore ){
return;
}
for ( int i=index; i< list.length; i++ ){
if ( achieved + list[i] == totalScore ) {
methodNum++;
}else if ( achieved + list[i] < totalScore ){
visitTree( achieved + list[i], i );
}
}
}
public static void main( String[] args ){
visitTree(0, 0);
System.out.println("number of methods are:" + methodNum );
}
}
output:
number of methods are:3095
Just stumbled on this question - here's a c# variation which allows you to explore the different combinations:
static class SlotIterator
{
public static IEnumerable<string> Discover(this int[] set, int maxScore)
{
var st = new Stack<Slot>();
var combinations = 0;
set = set.OrderBy(c => c).ToArray();
st.Push(new Slot(0, 0, set.Length));
while (st.Count > 0)
{
var m = st.Pop();
for (var i = m.Index; i < set.Length; i++)
{
if (m.Counter + set[i] < maxScore)
{
st.Push(m.Clone(m.Counter + set[i], i));
}
else if (m.Counter + set[i] == maxScore)
{
m.SetSlot(i);
yield return m.Slots.PrintSlots(set, ++combinations, maxScore);
}
}
}
}
public static string PrintSlots(this int[] slots, int[] set, int numVariation, int maxScore)
{
var sb = new StringBuilder();
var accumulate = 0;
for (var j = 0; j < slots.Length; j++)
{
if (slots[j] <= 0)
{
continue;
}
var plus = "+";
for (var k = 0; k < slots[j]; k++)
{
accumulate += set[j];
if (accumulate == maxScore) plus = "";
sb.AppendFormat("{0}{1}", set[j], plus);
}
}
sb.AppendFormat("={0} - Variation nr. {1}", accumulate, numVariation);
return sb.ToString();
}
}
public class Slot
{
public Slot(int counter, int index, int countSlots)
{
this.Slots = new int[countSlots];
this.Counter = counter;
this.Index = index;
}
public void SetSlot(int index)
{
this.Slots[index]++;
}
public Slot Clone(int newval, int index)
{
var s = new Slot(newval, index, this.Slots.Length);
this.Slots.CopyTo(s.Slots, 0);
s.SetSlot(index);
return s;
}
public int[] Slots { get; private set; }
public int Counter { get; set; }
public int Index { get; set; }
}
Example:
static void Main(string[] args)
{
using (var sw = new StreamWriter(#"c:\test\comb50.txt"))
{
foreach (var s in new[] { 2, 3, 4, 5, 6, 7, 8 }.Discover(50))
{
sw.WriteLine(s);
}
}
}
Yields 3095 combinations.

Finding the longest repeated substring

What would be the best approach (performance-wise) in solving this problem?
I was recommended to use suffix trees. Is this the best approach?
Check out this link: http://introcs.cs.princeton.edu/java/42sort/LRS.java.html
/*************************************************************************
* Compilation: javac LRS.java
* Execution: java LRS < file.txt
* Dependencies: StdIn.java
*
* Reads a text corpus from stdin, replaces all consecutive blocks of
* whitespace with a single space, and then computes the longest
* repeated substring in that corpus. Suffix sorts the corpus using
* the system sort, then finds the longest repeated substring among
* consecutive suffixes in the sorted order.
*
* % java LRS < mobydick.txt
* ',- Such a funny, sporty, gamy, jesty, joky, hoky-poky lad, is the Ocean, oh! Th'
*
* % java LRS
* aaaaaaaaa
* 'aaaaaaaa'
*
* % java LRS
* abcdefg
* ''
*
*************************************************************************/
import java.util.Arrays;
public class LRS {
// return the longest common prefix of s and t
public static String lcp(String s, String t) {
int n = Math.min(s.length(), t.length());
for (int i = 0; i < n; i++) {
if (s.charAt(i) != t.charAt(i))
return s.substring(0, i);
}
return s.substring(0, n);
}
// return the longest repeated string in s
public static String lrs(String s) {
// form the N suffixes
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++) {
suffixes[i] = s.substring(i, N);
}
// sort them
Arrays.sort(suffixes);
// find longest repeated substring by comparing adjacent sorted suffixes
String lrs = "";
for (int i = 0; i < N - 1; i++) {
String x = lcp(suffixes[i], suffixes[i+1]);
if (x.length() > lrs.length())
lrs = x;
}
return lrs;
}
// read in text, replacing all consecutive whitespace with a single space
// then compute longest repeated substring
public static void main(String[] args) {
String s = StdIn.readAll();
s = s.replaceAll("\\s+", " ");
StdOut.println("'" + lrs(s) + "'");
}
}
Have a look at http://en.wikipedia.org/wiki/Suffix_array as well - they are quite space-efficient and have some reasonably programmable algorithms to produce them, such as "Simple Linear Work Suffix Array Construction" by Karkkainen and Sanders
Here is a simple implementation of longest repeated substring using simplest suffix tree. Suffix tree is very easy to implement in this way.
#include <iostream>
#include <vector>
#include <unordered_map>
#include <string>
using namespace std;
class Node
{
public:
char ch;
unordered_map<char, Node*> children;
vector<int> indexes; //store the indexes of the substring from where it starts
Node(char c):ch(c){}
};
int maxLen = 0;
string maxStr = "";
void insertInSuffixTree(Node* root, string str, int index, string originalSuffix, int level=0)
{
root->indexes.push_back(index);
// it is repeated and length is greater than maxLen
// then store the substring
if(root->indexes.size() > 1 && maxLen < level)
{
maxLen = level;
maxStr = originalSuffix.substr(0, level);
}
if(str.empty()) return;
Node* child;
if(root->children.count(str[0]) == 0) {
child = new Node(str[0]);
root->children[str[0]] = child;
} else {
child = root->children[str[0]];
}
insertInSuffixTree(child, str.substr(1), index, originalSuffix, level+1);
}
int main()
{
string str = "banana"; //"abcabcaacb"; //"banana"; //"mississippi";
Node* root = new Node('#');
//insert all substring in suffix tree
for(int i=0; i<str.size(); i++){
string s = str.substr(i);
insertInSuffixTree(root, s, i, s);
}
cout << maxLen << "->" << maxStr << endl;
return 1;
}
/*
s = "mississippi", return "issi"
s = "banana", return "ana"
s = "abcabcaacb", return "abca"
s = "aababa", return "aba"
*/
the LRS problem is one that is best solved using either a suffix tree or a suffix array. Both approaches have a best time complexity of O(n).
Here is an O(nlog(n)) solution to the LRS problem using a suffix array. My solution can be improved to O(n) if you have a linear construction time algorithm for the suffix array (which is quite hard to implement). The code was taken from my library. If you want more information on how suffix arrays work make sure to check out my tutorials
/**
* Finds the longest repeated substring(s) of a string.
*
* Time complexity: O(nlogn), bounded by suffix array construction
*
* #author William Fiset, william.alexandre.fiset#gmail.com
**/
import java.util.*;
public class LongestRepeatedSubstring {
// Example usage
public static void main(String[] args) {
String str = "ABC$BCA$CAB";
SuffixArray sa = new SuffixArray(str);
System.out.printf("LRS(s) of %s is/are: %s\n", str, sa.lrs());
str = "aaaaa";
sa = new SuffixArray(str);
System.out.printf("LRS(s) of %s is/are: %s\n", str, sa.lrs());
str = "abcde";
sa = new SuffixArray(str);
System.out.printf("LRS(s) of %s is/are: %s\n", str, sa.lrs());
}
}
class SuffixArray {
// ALPHABET_SZ is the default alphabet size, this may need to be much larger
int ALPHABET_SZ = 256, N;
int[] T, lcp, sa, sa2, rank, tmp, c;
public SuffixArray(String str) {
this(toIntArray(str));
}
private static int[] toIntArray(String s) {
int[] text = new int[s.length()];
for(int i=0;i<s.length();i++)text[i] = s.charAt(i);
return text;
}
// Designated constructor
public SuffixArray(int[] text) {
T = text;
N = text.length;
sa = new int[N];
sa2 = new int[N];
rank = new int[N];
c = new int[Math.max(ALPHABET_SZ, N)];
construct();
kasai();
}
private void construct() {
int i, p, r;
for (i=0; i<N; ++i) c[rank[i] = T[i]]++;
for (i=1; i<ALPHABET_SZ; ++i) c[i] += c[i-1];
for (i=N-1; i>=0; --i) sa[--c[T[i]]] = i;
for (p=1; p<N; p <<= 1) {
for (r=0, i=N-p; i<N; ++i) sa2[r++] = i;
for (i=0; i<N; ++i) if (sa[i] >= p) sa2[r++] = sa[i] - p;
Arrays.fill(c, 0, ALPHABET_SZ, 0);
for (i=0; i<N; ++i) c[rank[i]]++;
for (i=1; i<ALPHABET_SZ; ++i) c[i] += c[i-1];
for (i=N-1; i>=0; --i) sa[--c[rank[sa2[i]]]] = sa2[i];
for (sa2[sa[0]] = r = 0, i=1; i<N; ++i) {
if (!(rank[sa[i-1]] == rank[sa[i]] &&
sa[i-1]+p < N && sa[i]+p < N &&
rank[sa[i-1]+p] == rank[sa[i]+p])) r++;
sa2[sa[i]] = r;
} tmp = rank; rank = sa2; sa2 = tmp;
if (r == N-1) break; ALPHABET_SZ = r + 1;
}
}
// Use Kasai algorithm to build LCP array
private void kasai() {
lcp = new int[N];
int [] inv = new int[N];
for (int i = 0; i < N; i++) inv[sa[i]] = i;
for (int i = 0, len = 0; i < N; i++) {
if (inv[i] > 0) {
int k = sa[inv[i]-1];
while( (i + len < N) && (k + len < N) && T[i+len] == T[k+len] ) len++;
lcp[inv[i]-1] = len;
if (len > 0) len--;
}
}
}
// Finds the LRS(s) (Longest Repeated Substring) that occurs in a string.
// Traditionally we are only interested in substrings that appear at
// least twice, so this method returns an empty set if this is not the case.
// #return an ordered set of longest repeated substrings
public TreeSet <String> lrs() {
int max_len = 0;
TreeSet <String> lrss = new TreeSet<>();
for (int i = 0; i < N; i++) {
if (lcp[i] > 0 && lcp[i] >= max_len) {
// We found a longer LRS
if ( lcp[i] > max_len )
lrss.clear();
// Append substring to the list and update max
max_len = lcp[i];
lrss.add( new String(T, sa[i], max_len) );
}
}
return lrss;
}
public void display() {
System.out.printf("-----i-----SA-----LCP---Suffix\n");
for(int i = 0; i < N; i++) {
int suffixLen = N - sa[i];
String suffix = new String(T, sa[i], suffixLen);
System.out.printf("% 7d % 7d % 7d %s\n", i, sa[i],lcp[i], suffix );
}
}
}
public class LongestSubString {
public static void main(String[] args) {
String s = findMaxRepeatedString("ssssssssssss this is a ddddddd word with iiiiiiiiiis and loads of these are ppppppppppppps");
System.out.println(s);
}
private static String findMaxRepeatedString(String s) {
Processor p = new Processor();
char[] c = s.toCharArray();
for (char ch : c) {
p.process(ch);
}
System.out.println(p.bigger());
return new String(new char[p.bigger().count]).replace('\0', p.bigger().letter);
}
static class CharSet {
int count;
Character letter;
boolean isLastPush;
boolean assign(char c) {
if (letter == null) {
count++;
letter = c;
isLastPush = true;
return true;
}
return false;
}
void reassign(char c) {
count = 1;
letter = c;
isLastPush = true;
}
boolean push(char c) {
if (isLastPush && letter == c) {
count++;
return true;
}
return false;
}
#Override
public String toString() {
return "CharSet [count=" + count + ", letter=" + letter + "]";
}
}
static class Processor {
Character previousLetter = null;
CharSet set1 = new CharSet();
CharSet set2 = new CharSet();
void process(char c) {
if ((set1.assign(c)) || set1.push(c)) {
set2.isLastPush = false;
} else if ((set2.assign(c)) || set2.push(c)) {
set1.isLastPush = false;
} else {
set1.isLastPush = set2.isLastPush = false;
smaller().reassign(c);
}
}
CharSet smaller() {
return set1.count < set2.count ? set1 : set2;
}
CharSet bigger() {
return set1.count < set2.count ? set2 : set1;
}
}
}
I had an interview and I needed to solve this problem. This is my solution:
public class FindLargestSubstring {
public static void main(String[] args) {
String test = "ATCGATCGA";
System.out.println(hasRepeatedSubString(test));
}
private static String hasRepeatedSubString(String string) {
Hashtable<String, Integer> hashtable = new Hashtable<>();
int length = string.length();
for (int subLength = length - 1; subLength > 1; subLength--) {
for (int i = 0; i <= length - subLength; i++) {
String sub = string.substring(i, subLength + i);
if (hashtable.containsKey(sub)) {
return sub;
} else {
hashtable.put(sub, subLength);
}
}
}
return "No repeated substring!";
}}
There are way too many things that affect performance for us to answer this question with only what you've given us. (Operating System, language, memory issues, the code itself)
If you're just looking for a mathematical analysis of the algorithm's efficiency, you probably want to change the question.
EDIT
When I mentioned "memory issues" and "the code" I didn't provide all the details. The length of the strings you will be analyzing are a BIG factor. Also, the code doesn't operate alone - it must sit inside a program to be useful. What are the characteristics of that program which impact this algorithm's use and performance?
Basically, you can't performance tune until you have a real situation to test. You can make very educated guesses about what is likely to perform best, but until you have real data and real code, you'll never be certain.

O(1) lookup in non-contiguous memory?

Is there any known data structure that provides O(1) random access, without using a contiguous block of memory of size O(N) or greater? This was inspired by this answer and is being asked for curiosity's sake rather than for any specific practical use case, though it might hypothetically be useful in cases of a severely fragmented heap.
Yes, here's an example in C++:
template<class T>
struct Deque {
struct Block {
enum {
B = 4*1024 / sizeof(T), // use any strategy you want
// this gives you ~4KiB blocks
length = B
};
T data[length];
};
std::vector<Block*> blocks;
T& operator[](int n) {
return blocks[n / Block::length]->data[n % Block::length]; // O(1)
}
// many things left out for clarity and brevity
};
The main difference from std::deque is this has O(n) push_front instead of O(1), and in fact there's a bit of a problem implementing std::deque to have all of:
O(1) push_front
O(1) push_back
O(1) op[]
Perhaps I misinterpreted "without using a contiguous block of memory of size O(N) or greater", which seems awkward. Could you clarify what you want? I've interpreted as "no single allocation that contains one item for every item in the represented sequence", such as would be helpful to avoid large allocations. (Even though I do have a single allocation of size N/B for the vector.)
If my answer doesn't fit your definition, then nothing will, unless you artificially limit the container's max size. (I can limit you to LONG_MAX items, store the above blocks in a tree instead, and call that O(1) lookup, for example.)
You can use a trie where the length of the key is bounded. As lookup in a trie with a key of length m is O(m), if we bound the length of the keys then we bound m and now lookup is O(1).
So think of the a trie where the keys are strings on the alphabet { 0, 1 } (i.e., we are thinking of keys as being the binary representation of integers). If we bound the length of the keys to say 32 letters, we have a structure that we can think of as being indexed by 32-bit integers and is randomly-accessible in O(1) time.
Here is an implementation in C#:
class TrieArray<T> {
TrieArrayNode<T> _root;
public TrieArray(int length) {
this.Length = length;
_root = new TrieArrayNode<T>();
for (int i = 0; i < length; i++) {
Insert(i);
}
}
TrieArrayNode<T> Insert(int n) {
return Insert(IntToBinaryString(n));
}
TrieArrayNode<T> Insert(string s) {
TrieArrayNode<T> node = _root;
foreach (char c in s.ToCharArray()) {
node = Insert(c, node);
}
return _root;
}
TrieArrayNode<T> Insert(char c, TrieArrayNode<T> node) {
if (node.Contains(c)) {
return node.GetChild(c);
}
else {
TrieArrayNode<T> child = new TrieArray<T>.TrieArrayNode<T>();
node.Nodes[GetIndex(c)] = child;
return child;
}
}
internal static int GetIndex(char c) {
return (int)(c - '0');
}
static string IntToBinaryString(int n) {
return Convert.ToString(n, 2);
}
public int Length { get; set; }
TrieArrayNode<T> Find(int n) {
return Find(IntToBinaryString(n));
}
TrieArrayNode<T> Find(string s) {
TrieArrayNode<T> node = _root;
foreach (char c in s.ToCharArray()) {
node = Find(c, node);
}
return node;
}
TrieArrayNode<T> Find(char c, TrieArrayNode<T> node) {
if (node.Contains(c)) {
return node.GetChild(c);
}
else {
throw new InvalidOperationException();
}
}
public T this[int index] {
get {
CheckIndex(index);
return Find(index).Value;
}
set {
CheckIndex(index);
Find(index).Value = value;
}
}
void CheckIndex(int index) {
if (index < 0 || index >= this.Length) {
throw new ArgumentOutOfRangeException("index");
}
}
class TrieArrayNode<TNested> {
public TrieArrayNode<TNested>[] Nodes { get; set; }
public T Value { get; set; }
public TrieArrayNode() {
Nodes = new TrieArrayNode<TNested>[2];
}
public bool Contains(char c) {
return Nodes[TrieArray<TNested>.GetIndex(c)] != null;
}
public TrieArrayNode<TNested> GetChild(char c) {
return Nodes[TrieArray<TNested>.GetIndex(c)];
}
}
}
Here is sample usage:
class Program {
static void Main(string[] args) {
int length = 10;
TrieArray<int> array = new TrieArray<int>(length);
for (int i = 0; i < length; i++) {
array[i] = i * i;
}
for (int i = 0; i < length; i++) {
Console.WriteLine(array[i]);
}
}
}
Well, since I've spent time thinking about it, and it could be argued that all hashtables are either a contiguous block of size >N or have a bucket list proportional to N, and Roger's top-level array of Blocks is O(N) with a coefficient less than 1, and I proposed a fix to that in the comments to his question, here goes:
int magnitude( size_t x ) { // many platforms have an insn for this
for ( int m = 0; x >>= 1; ++ m ) ; // return 0 for input 0 or 1
return m;
}
template< class T >
struct half_power_deque {
vector< vector< T > > blocks; // max log(N) blocks of increasing size
int half_first_block_mag; // blocks one, two have same size >= 2
T &operator[]( size_t index ) {
int index_magnitude = magnitude( index );
size_t block_index = max( 0, index_magnitude - half_first_block_mag );
vector< T > &block = blocks[ block_index ];
size_t elem_index = index;
if ( block_index != 0 ) elem_index &= ( 1<< index_magnitude ) - 1;
return block[ elem_index ];
}
};
template< class T >
struct power_deque {
half_power_deque forward, backward;
ptrdiff_t begin_offset; // == - backward.size() or indexes into forward
T &operator[]( size_t index ) {
ptrdiff_t real_offset = index + begin_offset;
if ( real_offset < 0 ) return backward[ - real_offset - 1 ];
return forward[ real_offset ];
}
};
half_power_deque implements erasing all but the last block, altering half_first_block_mag appropriately. This allows O(max over time N) memory use, amortized O(1) insertions on both ends, never invalidating references, and O(1) lookup.
How about a map/dictionary? Last I checked, that's O(1) performance.

Resources