I've recently learned about the flood-fill algorithm, which can take a graph and assign each node a component number in O(N) time.
For example, a common problem that can be solved efficiently with flood fill is finding the largest region in an N*N board, where a region is a set of cells with the same ID connected through direct up, down, left, or right adjacency.
In this board, the largest regions would both be of size 3, made up of all 1s and all 9s respectively.
However, I recently started wondering if we could extend this problem: specifically, could we find the largest region in a graph such that every node in the region has one of two possible IDs? In the above board, the largest such region is made up of 1s and 9s, and has a size of 7.
Here was my thought process in trying to solve this problem:
Thought 1: O(N^4) Algorithm
We can solve this in O(N^4) time using a basic flood-fill algorithm. We do this by testing all O(N^2) pairs of horizontally or vertically adjacent squares. For every pair of squares, if they have different IDs, then we run a flood-fill from one of the two squares.
Then, by modifying the flood-fill algorithm so that it travels to squares with one of the two possible IDs, we can test each pair in O(N^2) time --> O(N^2) pairs * O(N^2) flood fill per pair = O(N^4) algorithm.
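For concreteness, a minimal sketch of this pair-testing approach might look like the following (Python; names are illustrative):
# Sketch of Thought 1: for every adjacent pair with different IDs, run a
# flood fill that may walk through cells carrying either of the two IDs.
def region_size(board, start, allowed_ids):
    n = len(board)
    seen = {start}
    stack = [start]
    size = 0
    while stack:
        x, y = stack.pop()
        size += 1
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in seen \
                    and board[nx][ny] in allowed_ids:
                seen.add((nx, ny))
                stack.append((nx, ny))
    return size

def largest_two_id_region(board):
    n = len(board)
    best = 0
    for x in range(n):
        for y in range(n):
            for nx, ny in ((x + 1, y), (x, y + 1)):        # each adjacent pair once
                if nx < n and ny < n and board[x][y] != board[nx][ny]:
                    ids = {board[x][y], board[nx][ny]}
                    best = max(best, region_size(board, (x, y), ids))
    return best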
Then, I had an insight: A Possibly O(N^2) Algorithm
First, we run a regular flood-fill through the board and separate the board into a "component graph" (where each component in the original graph is reduced to a single node).
Now, we do a flood-fill through the edges of the component graph instead of the nodes. We mark each edge with a pair of integers signifying the two IDs inside the two components which it connects, before flood-filling through the edges as if they themselves were nodes.
I believe that this, if implemented correctly, would result in an O(N^2) algorithm, because an upper bound for the number of edges in an N*N board is 4*N*N.
Now, my question is, is my thought process logically sound? If not, can somebody suggest another algorithm to solve this problem?
Here is the algorithm that I wrote to solve your problem. It expands on your idea to flood-fill through the edges (great idea, by the way) and is able to output the correct answer for a 250*250 grid in less than 300ms, with less than 30 megabytes of memory allocated.
Here is the problem that I managed to find online that matches your question exactly, and it is also where I tested the validity of my algorithm:
USACO Problem
Note that the USACO Problem requires us to find the largest single-id component before finding the largest double-id component. In my algorithm, the first step is actually necessary in order to reduce the whole board into a component graph.
Here's my commented C++ Code:
#include <iostream>
#include <fstream>
#include <cmath>
#include <algorithm>
#include <vector>
#include <unordered_set>
using namespace std;
// board to hold square ids and comp[][] to mark component numbers
vector <vector<int>> board, comp;
vector <int> comp_size = {-1}; // size of those components
vector <int> comp_id = {-1}; // id contained within those components
vector <unordered_set <int>> adj = {{}}; // component graph adjacency list
vector <bool> visited; // component graph visited array
void dfs(int x, int y, int N, int id, int curr_comp){
    if(x < 0 || x >= N || y < 0 || y >= N){return;}
    else if(board[x][y] != id){
        if(comp[x][y] == 0){return;}
        // construct component graph adjacency list during the first flood-fill
        adj[comp[x][y]].insert(curr_comp);
        adj[curr_comp].insert(comp[x][y]);
        // this is why we use an unordered_set: it automatically eliminates
        // duplicate edges
        return;
    }
    else if(comp[x][y]){return;}
    ++comp_size[curr_comp];
    comp[x][y] = curr_comp;
    dfs(x-1, y, N, id, curr_comp);
    dfs(x+1, y, N, id, curr_comp);
    dfs(x, y-1, N, id, curr_comp);
    dfs(x, y+1, N, id, curr_comp);
}
void dfs2(int curr, int id1, int id2, int &size){
    visited[curr] = true;
    // recurse from all valid and adjacent components to curr
    vector <int> to_erase;
    for(int item : adj[curr]){
        if(visited[item]){continue;}
        if(comp_id[item] == id1 || comp_id[item] == id2){
            to_erase.push_back(item);
            size += comp_size[item];
            dfs2(item, id1, id2, size);
        }
    }
    // we erase all edges connecting the current component AT THE SAME TIME to
    // prevent std::unordered_set iterators from being invalidated, which would
    // happen if we erased items as we iterated through adj[curr]
    for(int item : to_erase){
        adj[curr].erase(item);
        adj[item].erase(curr);
    }
    return;
}
int main()
{
    ifstream fin("multimoo.in");
    ofstream fout("multimoo.out");
    int N;
    fin >> N;
    board = vector <vector<int>> (N, vector <int> (N));
    for(int i = 0; i < N; ++i){
        for(int j = 0; j < N; ++j){
            fin >> board[i][j];
        }
    }
    // Input Done
    comp = vector <vector<int>> (N, vector <int> (N, 0)); // note that comp[i][j] = 0 means not visited yet
    // regular flood-fill through all the nodes
    for(int i = 0, curr_comp = 1; i < N; ++i){
        for(int j = 0; j < N; ++j){
            if(comp[i][j]){continue;}
            // add information about the current component
            comp_size.push_back(0);
            comp_id.push_back(board[i][j]);
            adj.push_back({});
            dfs(i, j, N, board[i][j], curr_comp++);
        }
    }
    fout << *max_element(comp_size.begin(), comp_size.end()) << endl;
    int ANS = 0;
    for(unsigned int i = 1; i < comp_size.size(); ++i){
        // no range-for loop here as we erase elements while iterating, which
        // may invalidate unordered_set iterators; instead, we use a while-loop
        while(!adj[i].empty()){
            int size = comp_size[i], curr = *(adj[i].begin());
            visited = vector <bool> (comp_size.size(), false); // reset visited
            dfs2(i, comp_id[i], comp_id[curr], size);
            ANS = max(ANS, size);
        }
    }
    fout << ANS << endl;
    return 0;
}
As for the time complexity, I personally am not very sure. If somebody could help analyze this algorithm to determine its complexity, I'd greatly appreciate it!
Your algorithm works...
As far as I can tell, flood filling over your induced graph indeed gives all possible components, after which it's simple to find the largest one.
...but I'm not sure about the runtime
You correctly say that there are O(N^2) edges in the original graph, and therefore O(N^2) nodes in the induced graph. However, these nodes are no longer guaranteed to be in a nice grid, which may leave more than O(N^2) induced edges.
For example, consider the large "1-block" in your example. This block has 6 edges, which will give a complete graph with 6 vertices, as all these edges-turned-vertices are connected. This may give you an induced graph with more than O(N^2) edges, making it impossible to find components in O(N^2) time.
Therefore, I believe that the algorithm will not run in O(N^2), but I'm unsure of the actual runtime, as it will depend on what exactly the algorithm does at this point. The question only mentions flood fill, but I don't think it anticipated this situation.
Consider the following 9x9 grid:
232323232
311111113
212313212
313212313
212313212
313212313
212313212
313212313
212313212
The idea is simple: it's a single large component designed to border as many small components as possible. The induced graph here would be a single almost-complete graph with O(N^2) vertices and O(N^4) edges. Alternatively, if we only link the (1,2) edges with other (1,2) edges, and similarly the (1,3) edges with other (1,3) edges, we get a slightly less connected graph, but it would still consist of two components with O(N^4) edges each, albeit with a lower constant.
Therefore, creating this graph would take at least O(N^4) time, as would traversing it. This is the time I would argue that the algorithm takes, but I cannot prove that there are no possible optimizations that improve upon this.
We could achieve the optimal O(N^2) complexity by doing our BFS smartly from each pivot component.
Explanation:
1. First create the set of same-valued components and a relationship map to their neighbours.
2. Notice that for the flood zone we are looking for at most 2 distinct values.
3. Let's say we consider the point (i, j) and look at its neighbours.
4. For each 2-value pair, say [v_ij, v_neighbour], do a BFS from this (i, j) pivot point while only collecting nodes whose value is one of [v_ij, v_neighbour].
5. Notice that each component is visited only a constant number of times during the BFS (we ensure that by deleting the reverse edge from child to parent while doing the BFS).
6. Because of (5), our complexity remains O(N^2).
Working code in Python:
from queue import Queue

class Comp:
    def __init__(self, point, value):
        self.members = {point}
        self.value = value
        self.neighbours = set()
        self.pivot = point

    def can_add_member(self, value):
        return value == self.value

    def add_member(self, point):
        self.members.add(point)

    def add_neighbour(self, neighbour_comp):
        self.neighbours.add(neighbour_comp)

    def __str__(self):
        return '[M:%d, V:%d, N:%d]' % (len(self.members), self.value, len(self.neighbours))

def find_largest_flood_region(D):
    point_to_comp_map = {}
    N, M = len(D), len(D[0])
    # Step-1: Create same-value connected-components:
    for x in range(N):
        for y in range(M):
            if (x, y) in point_to_comp_map:
                continue
            comp_xy = Comp((x, y), D[x][y])
            point_to_comp_map[(x, y)] = comp_xy
            pq = Queue()
            pq.put((x, y))
            while pq.qsize() > 0:
                i, j = pq.get()
                for l, m in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]:
                    if 0 <= l < N and 0 <= m < M and (l, m) not in point_to_comp_map and D[l][m] == D[x][y]:
                        comp_xy.add_member((l, m))
                        point_to_comp_map[(l, m)] = comp_xy
                        pq.put((l, m))
    # Step-2: Create the relationship-map between the components created above
    for x in range(N):
        for y in range(M):
            comp_xy: Comp = point_to_comp_map[(x, y)]
            for i, j in [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]:
                if 0 <= i < N and 0 <= j < M and D[i][j] != D[x][y]:
                    comp_ij: Comp = point_to_comp_map[(i, j)]
                    comp_xy.add_neighbour(comp_ij)
                    comp_ij.add_neighbour(comp_xy)
    # Do BFS one by one on each unique component:
    unique_comps = set(point_to_comp_map.values())
    max_region = 0
    for comp in unique_comps:
        potential_values = set([neigh_comp.value for neigh_comp in comp.neighbours])
        for value in potential_values:
            value_set = {value, comp.value}
            region_value = 0
            pq = Queue()
            pq.put(comp)
            while pq.qsize() > 0:
                comp_xy: Comp = pq.get()
                region_value += len(comp_xy.members)
                for ncomp in comp_xy.neighbours:
                    if ncomp.value in value_set:
                        if comp_xy in ncomp.neighbours:
                            ncomp.neighbours.remove(comp_xy)
                            pq.put(ncomp)
            max_region = max(max_region, region_value)
    return max_region

D = [
    [9,2,7,9],
    [1,1,9,9],
    [3,1,4,5],
    [3,5,6,6]
]
print(find_largest_flood_region(D))
Output:
7
We can show that solving this in O(n), where n is the number of elements in the matrix, is possible with two passes of a flood-fill union-find routine without a depth-first search.
Given
9 2 7 9
1 1 9 9
3 1 4 5
3 5 6 6
after we label with flood fill, we have:
A B C D
E E D D
F E G H
F I J J
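Here is a minimal sketch of that first labeling pass, using a standard union-find with path compression (names are illustrative):
# Pass 1: label same-valued, 4-connected components of the grid with union-find.
def label_components(grid):
    n, m = len(grid), len(grid[0])
    parent = list(range(n * m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i in range(n):
        for j in range(m):
            if i + 1 < n and grid[i][j] == grid[i + 1][j]:
                union(i * m + j, (i + 1) * m + j)
            if j + 1 < m and grid[i][j] == grid[i][j + 1]:
                union(i * m + j, i * m + j + 1)

    labels = {}
    out = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            r = find(i * m + j)
            out[i][j] = labels.setdefault(r, chr(ord('A') + len(labels)))
    return out

for row in label_components([[9,2,7,9],[1,1,9,9],[3,1,4,5],[3,5,6,6]]):
    print(' '.join(row))
Running it on the grid above reproduces exactly the labeling shown.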
Now that we know each component's size, we can restrict each cell to testing its best connection to a different component to the left or up. We only need to check one field in a map on the component (the one pointing to the same number), and then either potentially create a new component of reference or merge two.
In the following example, we'll label components with more than one value with two letters unrelated to their original components. Each cell visited can generate at most two new components and updates at most two components so the complexity remains O(n).
Iterating left to right, top to bottom:
A0: {⊥: 1, 2: AA, 1: DD}
B0: {⊥: 1, 9: AA, 7: BB, 1: EE}
AA = {size: 2}
C0: {⊥: 1, 2: BB, 9: CC}
BB = {size: 2}
D0: {⊥: 3, 7: CC}
CC = {size: 4}
E0: {⊥: 3, 9: DD}
DD = {size: 4}
E1: {⊥: 3, 9: DD, 2: EE}
EE = {size: 4}
D1: {⊥: 3, 7: CC, 1: DD}
DD updates to size 7
D2: {⊥: 3, 7: CC, 1: DD}
F0: {⊥: 2, 1: FF}
FF = {size: 5}
... etc.
I recently encountered this question in an interview. I couldn't really come up with an algorithm for this.
Given an array of unsorted integers, we have to find the minimum cost at which this array can be converted to an arithmetic progression, where a cost of 1 unit is incurred for each element that is changed in the array. Also, the values of the elements range over (-inf, inf).
I sort of realised that DP can be used here, but I couldn't work out the recurrence. There were some constraints on the values, but I don't remember them. I am just looking for high-level pseudocode.
EDIT
Here's a correct solution; unfortunately, while simple to understand, it's not very efficient at O(n^3).
function costAP(arr) {
    if(arr.length < 3) { return 0; }
    var minCost = arr.length;
    for(var i = 0; i < arr.length - 1; i++) {
        for(var j = i + 1; j < arr.length; j++) {
            var delta = (arr[j] - arr[i]) / (j - i);
            var cost = 0;
            for(var k = 0; k < arr.length; k++) {
                if(k == i) { continue; }
                if((arr[k] + delta * (i - k)) != arr[i]) { cost++; }
            }
            if(cost < minCost) { minCost = cost; }
        }
    }
    return minCost;
}
Find the relative delta between every distinct pair of indices in the array
Use the relative delta to test the cost of transforming the whole array to AP using that delta
Return the minimum cost
Louis Ricci had the right basic idea of looking for the largest existing arithmetic progression, but assumed that it would have to appear in a single run, when in fact the elements of this progression can appear in any subset of the positions, e.g.:
1 42 3 69 5 1111 2222 8
requires just 4 changes:
42 69 1111 2222
1 3 5 8
To calculate this, notice that every AP has a rightmost element. We can suppose each element i of the input vector to be the rightmost AP position in turn, and for each such i consider all positions j to the left of i, determining the step size implied by each (i, j) combination. When this step size is an integer (indicating a valid AP), we add one to the number of elements that imply this step size and end at position i, since all such elements belong to the same AP. The overall maximum is then the longest AP:
struct solution {
    int len;
    int pos;
    int step;
};

solution longestArithProg(vector<int> const& v) {
    solution best = { -1, 0, 0 };
    for (int i = 1; i < v.size(); ++i) {
        unordered_map<int, int> bestForStep;
        for (int j = 0; j < i; ++j) {
            int step = (v[i] - v[j]) / (i - j);
            if (step * (i - j) == v[i] - v[j]) {
                // This j gives an integer step size: record that j lies on this AP
                int len = ++bestForStep[step];
                if (len > best.len) {
                    best.len = len;
                    best.pos = i;
                    best.step = step;
                }
            }
        }
    }
    ++best.len;    // We never counted the final element in the AP
    return best;
}
The above C++ code uses O(n^2) time and O(n) space, since it loops over every pair of positions i and j, performing a single hash read and write for each. To answer the original problem:
int howManyChangesNeeded(vector<int> const& v) {
    return v.size() - longestArithProg(v).len;
}
This problem has a simple geometric interpretation, which shows that it can be solved in O(n^2) time and probably can't be solved any faster than that (reduction from 3SUM). Suppose our array is [1, 2, 10, 3, 5]. We can write that array as a sequence of points
(0,1), (1,2), (2,10), (3,3), (4,5)
in which the x-value is the index of the array item and the y-value is the value of the array item. The question now becomes one of finding a line that passes through the maximum possible number of points in that set. The cost of converting the array is the number of points not on the line, which is minimized when the number of points on a line is maximized.
A fairly definitive answer to that question is given in this SO posting: What is the most efficient algorithm to find a straight line that goes through most points?
The idea: for each point P in the set from left to right, find the line passing through that point and a maximum number of points to the right of P. (We don't need to look at points to the left of P because they would have been caught in an earlier iteration).
To find the maximum number of P-collinear points to the right of P, for each such point Q calculate the slope of the line segment PQ. Tally up the different slopes in a hash map. The slope which maps to the maximum number of hits is what you're looking for.
Technical issue: you probably don't want to use floating point arithmetic to calculate the slopes. On the other hand, if you use rational numbers, you potentially have to calculate the greatest common divisor in order to compare fractions by comparing numerator and denominator, which multiplies running time by a factor of log n. Instead, you should check equality of rational numbers a/b and c/d by testing whether ad == bc.
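A short sketch of the slope tallying (this one uses Python's Fraction, which normalizes with a gcd, so it carries the log factor mentioned above rather than avoiding it):
from fractions import Fraction

def max_points_on_a_line(points):
    best = 1 if points else 0
    for i, (px, py) in enumerate(points):
        slopes = {}
        for qx, qy in points[i + 1:]:
            # exact slope of the segment PQ; None stands for a vertical line
            slope = Fraction(qy - py, qx - px) if qx != px else None
            slopes[slope] = slopes.get(slope, 0) + 1
            best = max(best, slopes[slope] + 1)   # +1 for P itself
    return best

def min_changes_to_ap(arr):
    if len(arr) < 3:
        return 0
    points = [(i, v) for i, v in enumerate(arr)]   # x = index, y = value
    return len(arr) - max_points_on_a_line(points)

print(min_changes_to_ap([1, 2, 10, 3, 5]))   # 2 (e.g. change 10 -> 3 and 3 -> 4)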
The SO posting referenced above gives a reduction from 3SUM, i.e., this problem is 3SUM-hard which shows that if this problem could be solved substantially faster than O(n^2), then 3SUM could also be solved substantially faster than O(n^2). This is where the condition that the integers are in (-inf,inf) comes in. If it is known that the integers are from a bounded set, the reduction from 3SUM is not definitive.
An interesting further question is whether the idea in the Wikipedia for solving 3SUM in O(n + N log N) time when the integers are in the bounded set (-N,N) can be used to solve the minimum cost to convert an array to an AP problem in time faster than O(n^2).
Given the array a = [a_1, a_2, ..., a_n] of unsorted integers, let diffs = [a_2-a_1, a_3-a_2, ..., a_n-a_(n-1)].
Find the maximum occurring value in diffs and adjust any values in a as necessary so that all neighboring values differ by this amount.
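A direct sketch of this heuristic, assuming the progression is anchored at the first element:
from collections import Counter

def heuristic_cost(a):
    if len(a) < 2:
        return 0
    diffs = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    d = Counter(diffs).most_common(1)[0][0]          # most frequent neighbouring difference
    # count how many elements disagree with the AP anchored at the first element
    return sum(1 for i, x in enumerate(a) if x != a[0] + i * d)

print(heuristic_cost([1, 3, 5, 100]))   # 1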
Interestingly, even I had the same question in my campus recruitment test today. While doing the test itself, I realised that this logic of altering elements based on the most frequent difference between two subsequent elements in the array fails in some cases.
E.g. 4, 5, 8, 9. According to the logic of a2-a1, a3-a2 proposed above, the answer should be 1, which is not the case.
As you suggested DP, I feel it can be along the lines of considering 2 values for each element in the array: the cost when it is modified as well as when it is not modified, and returning the minimum of the 2. Finally, terminate when you reach the end of the array.
You are given a log of wood of length 'n'. There are 'm' markings on the log. The log must be cut at each of the markings. The cost of a cut is equal to the length of the piece being cut. Given such a log, determine the least cost of cutting.
My partial solution is using recursion:
I am able to get the cost when I go through the markings array in sequence, i.e. from the 0th cut to the last cut in the array. However, I am stuck on how to write code for the cases where the cuts are not made in sequence, i.e. in some other order, so that the code can account for those cases and take the minimum over all of them.
One solution is to generate all permutations of the markings array, call the woodcut function for each permutation, and take the minimum, but that seems to be a naive approach (sketched below).
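A brute-force sketch of that permutation idea might look like this (it simulates the pieces with a sorted list of boundaries; names are illustrative):
from itertools import permutations
import bisect

def naive_min_cut_cost(length, marks):
    best = float('inf')
    for order in permutations(marks):
        boundaries = [0, length]          # current piece boundaries, kept sorted
        cost = 0
        for cut in order:
            i = bisect.bisect_left(boundaries, cut)
            cost += boundaries[i] - boundaries[i - 1]   # length of the piece being cut
            boundaries.insert(i, cut)
        best = min(best, cost)
    return best

print(naive_min_cut_cost(10, [2, 4, 7]))   # 20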
Any suggestions?
marking = [2, 4] (cut points)
int woodcut(length, cut_point, index){
    if (cut_point > length)
        return INFINITY
    first_half = cut_point;
    second_half = length - cut_point
    if (markings[index++] == exist) {
        if (next_cut_point > first)
            cost = length + woodcut(second_half, next_cut_point - first)
        else
            cost = length + woodcut(first_half, next_cut_point)
    } else if (index >= sizeof(markings))
        return cost;
}
http://www.careercup.com/question?id=5188262471663616
After looking up the answers and with some help from some generous folks, I was able to code up the solution below:
#include <stdio.h>

int min(int a, int b)
{
    return a > b ? b : a;
}

/* cuts[] holds the two end points and the markings in sorted order;
   lo and hi are indices into cuts[] delimiting the current piece */
int min_cut(int lo, int hi, int *cuts)
{
    int i;
    int min_cost = 1 << 30;

    /* there are no cuts strictly inside this piece */
    if (hi - lo < 2)
        return 0;

    /* there is only one cut between the end points */
    if (hi - lo == 2)
        return cuts[hi] - cuts[lo];

    /* cut at all the interior positions and take the minimum of all */
    for (i = lo + 1; i < hi; i++) {
        int cost = cuts[hi] - cuts[lo] +
                   min_cut(lo, i, cuts) +
                   min_cut(i, hi, cuts);
        min_cost = min(cost, min_cost);
    }
    return min_cost;
}

int main()
{
    int cuts[] = {0, 2, 4, 7, 10};
    int size = sizeof(cuts)/sizeof(cuts[0]);
    printf("%d", min_cut(0, size - 1, cuts));
    return 0;
}
Approach A:
First write a naive recursive function that calculates the cheapest cost of cutting into pieces from the ith mark to the jth mark. Do that by taking the minimum over all possible first cuts of the cost of that first cut plus the minimum cost of cutting up the two side pieces.
Memoize this function so it is efficient.
Approach B:
Calculate a table of values for the cheapest cost of cutting into pieces from the ith mark to the jth mark. Do it with an outer loop over how many marks apart i and j are, then an inner loop over i, and then an innermost loop over the possible places to make the first cut.
Both methods work, and both are O(m*m*m). I would usually go with approach A.
Dynamic programming. Complexity O(m^3). Solution in Python. The input is an ordered list of marking positions, with the last item being the length of the log:
def log_cut(m):
    def _log_cut(a, b):
        if mat[a][b]==None:
            s=0
            min_v=None
            for i in range(a+1, b):
                v=_log_cut(a, i)+_log_cut(i, b)
                if min_v==None or v<min_v:
                    min_v=v
            if min_v!=None:
                s=min_v+m[b-1]
                if a>0:
                    s-=m[a-1]
            mat[a][b]=s
        return mat[a][b]
    mat=[[None for i in range(len(m)+1)] for j in range(len(m)+1)]
    s=_log_cut(0, len(m))
    return s
This scenario is analogous to divide-and-conquer sorting. Take quicksort, for example:
There is a partition step that requires a linear pass over an array to divide it into two subarrays. Similarly, the cost of cutting a log is equal to its length.
There is then a recursive step in which each subarray is recursively sorted. Similarly, you must recursively continue to cut each of the two pieces into which a log is cut, until you have cut at all marks.
Quicksort is, of course, O(n log n) in the best case, which occurs when each partition step (except base cases) divides the array into two nearly-equally-sized subarrays. Thus, all you need to do is to find the mark closest to the middle, "cut" the log there, and recurse.
Can this problem be done using only one dp array?
It is the zigzag problem from topcoder (http://community.topcoder.com/stat?c=problem_statement&pm=1259&rd=4493)
A sequence of numbers is called a zig-zag sequence if the differences between successive numbers strictly alternate between positive and negative. The first difference (if one exists) may be either positive or negative. A sequence with fewer than two elements is trivially a zig-zag sequence.
For example, 1,7,4,9,2,5 is a zig-zag sequence because the differences (6,-3,5,-7,3) are alternately positive and negative. In contrast, 1,4,7,2,5 and 1,7,4,5,5 are not zig-zag sequences, the first because its first two differences are positive and the second because its last difference is zero.
Given a sequence of integers, sequence, return the length of the longest subsequence of sequence that is a zig-zag sequence. A subsequence is obtained by deleting some number of elements (possibly zero) from the original sequence, leaving the remaining elements in their original order.
For reference: the DP with two arrays uses an array A[1..n] where A[i] is the maximum length of a zig-zag sequence ending with a zig on element i, and an array B[1..n] where B[i] is the maximum length of a zig-zag sequence ending with a zag on element i. For i from 1 to n, this DP uses the previous entries of the A array to compute the B[i], and the previous entries of the B array to compute A[i]. At the cost of an extra loop, it would be possible to recreate the B entries on demand and thus use only the A array. I'm not sure if this solves your problem, though.
(Also, since the input arrays are so short, there are any number of encoding tricks not worth mentioning.)
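A short sketch of that two-array DP (O(n^2); here the arrays are called up and down rather than A and B):
# up[i]   = length of the longest zig-zag subsequence ending at i with a final rise
# down[i] = length of the longest zig-zag subsequence ending at i with a final fall
def longest_zigzag(seq):
    n = len(seq)
    if n == 0:
        return 0
    up = [1] * n
    down = [1] * n
    for i in range(1, n):
        for j in range(i):
            if seq[j] < seq[i]:
                up[i] = max(up[i], down[j] + 1)
            elif seq[j] > seq[i]:
                down[i] = max(down[i], up[j] + 1)
    return max(max(up), max(down))

print(longest_zigzag([1, 7, 4, 9, 2, 5]))   # 6
print(longest_zigzag([1, 4, 7, 2, 5]))      # 4
print(longest_zigzag([1, 7, 4, 5, 5]))      # 4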
Here's an attempt; I'm returning the indices between which you have a zigzag. For your 2nd input (1,4,7,2,5), it returns indices 1 and 4, since there is a zigzag from 4,7,2,5.
You can figure out if the whole array is zigzag based on the result.
public class LongestZigZag
{
    private readonly int[] _input;

    public LongestZigZag(int[] input)
    {
        _input = input;
    }

    public Tuple<int,int> Sequence()
    {
        var indices = new Tuple<int, int>(int.MinValue, int.MinValue);
        if (_input.Length <= 2) return indices;

        for (int i = 2; i < _input.Length; i++)
        {
            var firstDiff = _input[i - 1] - _input[i - 2];
            var secondDiff = _input[i] - _input[i - 1];
            if ((firstDiff > 0 && secondDiff < 0) || (firstDiff < 0 && secondDiff > 0))
            {
                var index1 = indices.Item1;
                if (index1 == int.MinValue)
                {
                    index1 = i - 2;
                }
                indices = new Tuple<int, int>(index1, i);
            }
            else
            {
                indices = new Tuple<int, int>(int.MinValue, int.MinValue);
            }
        }
        return indices;
    }
}
The dynamic programming approach takes O(n^2) time. I have designed code with linear time complexity, O(n). In one pass through the array, it gives the length of the longest possible zig-zag subsequence. I have tested it on many test cases provided by different sites for this problem and have got positive results.
Here is my C implementation of code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i;
    int n;
    int count = 0;

    scanf(" %d", &n);
    int *a = (int*)malloc(n * sizeof(int));
    for (i = 0; i < n; i++)
    {
        scanf(" %d", &a[i]);   /* e.g. 1,7,5,10,13,15,10,5,16,8 */
    }
    i = 0;
    if (a[0] < a[1])
    {
        count++;
        while (i < n-1 && a[i] <= a[i+1])
            i++;
        if (i == n-1 && a[i-1] < a[i])
        {
            count++;
            i++;
        }
    }
    while (i < n-1)
    {
        count++;
        while (i < n-1 && a[i] >= a[i+1])
        {
            i++;
        }
        if (i == n-1 && a[i-1] > a[i])
        {
            count++;
            break;
        }
        if (i < n-1)
            count++;
        while (i < n-1 && a[i] <= a[i+1])
        {
            i++;
        }
        if (i == n-1 && a[i-1] < a[i])
        {
            count++;
            break;
        }
    }
    printf("%d", count);
    free(a);
    return 0;
}
Every (to my knowledge on the topic, so don't take it for granted) solution which you work out with dynamic programming, comes down to representing a "solution space" (meaning every possible solution that is correct, not necessarily optimal) with a DAG (Directed Acyclic Graph).
For example, if you are looking for a longest rising subseqence, then the solution space can be represented as the following DAG:
Nodes are labeled with the numbers of the sequence
Edge e(u, v) between two nodes indicates that valueOf(u) < valueOf(v) (where valueOf(x) is the value associated with node x)
In dynamic programming, finding an optimal solution to the problem is the same thing as traversing this graph in the right way. The information provided by that graph is in some sense represented by that DP array.
In this case we have two ordering operations. If we were to represent both of them on one such graph, that graph would not be acyclic; we require at least two graphs (one representing the < relation, and one for >).
If the topological ordering requires two DAGs, the solution will require two DP arrays, or some clever way of indicating which edge in your DAG corresponds to which ordering operation (which in my opinion needlessly complicates the problem).
Hence no, you can't do it with just one DP array. You will require at least two, at least if you want a simple solution that is approached purely by using dynamic programming.
The recursive call for this problem should look something like this (the directions of the relations might be wrong, I haven't checked it):
S - given sequence (array of integers)
P(i), Q(i) - length of the longest zigzag subsequence on elements S[0 -> i] inclusive (the longest sequence that is correct, where S[i] is the last element)
P(i) = 1 if i == 0
     = max(Q(j) + 1 for every 0 <= j < i with S[i] < S[j]) otherwise
Q(i) = 0 if i == 0   # 0 because we are pedantic about "is zig the first relation, or is it zag?"; if we aren't, this can be a 1
     = max(P(j) + 1 for every 0 <= j < i with S[i] > S[j]) otherwise
This is O(n^2) with the right memoization (two DP arrays). These calls return the length of the solution; the actual subsequence can be found by storing a "parent pointer" whenever a max value is found, and then traversing backwards along these pointers.
I have written an algorithm that finds the minimum number of cliques needed to cover a graph. I have tested my backtracking algorithm, but I couldn't work out its worst-case time complexity; I have tried many times.
I know that this problem is NP-hard, but I think it should be possible to give a worst-case time complexity based on the code. What is the worst-case time complexity for this code? Any ideas? How would you formalize the recurrence?
I have tried to write understandable code. If you have any questions, write a comment.
I would be very glad for tips, references, and answers.
Thanks for the tips, guys :).
EDIT
As M C commented, I have basically tried to solve the Clique cover problem.
Pseudocode:
function countCliques(graph, vertice, cliques, numberOfCliques, minimumSolution)
    for i = 1 .. numberOfCliques + 1 loop
        if i > minimumSolution then
            return;
        end if
        if (fitToClique(cliques(i), vertice, graph)) then
            addVerticeToClique(cliques(i), vertice);
            if (vertice == 0) then  // last vertice
                minimumSolution = numberOfCliques
                printResult(result);
            else
                if (i == numberOfCliques + 1) then  // if we are using a new clique, the +1 is always a new clique
                    countCliques(graph, vertice - 1, cliques, numberOfCliques + 1, minimumSolution)
                else
                    countCliques(graph, vertice - 1, cliques, numberOfCliques, minimumSolution)
                end if
            end if
            deleteVerticeFromClique(cliques(i), vertice);
        end if
    end loop
end function

bool fitToClique(clique, vertice, graph)
    for (i = 1 .. cliqueSize) loop
        verticeFromClique = clique(i)
        if (not connected(verticeFromClique, vertice)) then
            return false
        end if
    end loop
    return true
end function
Code
#include <cstdio>
#include <cstdlib>

bool fitToSubset(int *subSet, int currentVertice, int **graph);
void print(int **result, int n);

int c = 0;   // counts how many complete assignments the search reaches

int countCliques(int** graph, int currentVertice, int** result, int numberOfSubset, int& minimum) {
    // if solution
    if (currentVertice == -1) {
        // if a better solution
        if (minimum > numberOfSubset) {
            minimum = numberOfSubset;
            printf("New minimum result:\n");
            print(result, numberOfSubset);
        }
        c++;
    } else {
        // if not a solution, try to insert into a clique; if it does not fit then create a new clique (+1 in the loop)
        for (int i = 0; i < numberOfSubset + 1; i++) {
            if (i > minimum) {
                break;
            }
            // if it fits
            if (fitToSubset(result[i], currentVertice, graph)) {
                // insert
                result[i][0]++;
                result[i][result[i][0]] = currentVertice;
                // try to insert the next vertice
                countCliques(graph, currentVertice - 1, result, (i == numberOfSubset) ? (i + 1) : numberOfSubset, minimum);
                // delete vertice from the clique
                result[i][0]--;
            }
        }
    }
    return c;
}

bool fitToSubset(int *subSet, int currentVertice, int **graph) {
    int subsetLength = subSet[0];
    for (int i = 1; i < subsetLength + 1; i++) {
        if (graph[subSet[i]][currentVertice] != 1) {
            return false;
        }
    }
    return true;
}

void print(int **result, int n) {
    for (int i = 0; i < n; i++) {
        int m = result[i][0];
        printf("[");
        for (int j = 1; j < m; j++) {
            printf("%d, ", result[i][j] + 1);
        }
        printf("%d]\n", result[i][m] + 1);
    }
}

int** readFile(const char* file, int& v, int& e) {
    int from, to;
    int **graph;
    FILE *graphFile;
    fopen_s(&graphFile, file, "r");
    fscanf_s(graphFile, "%d %d", &v, &e);
    graph = (int**)malloc(v * sizeof(int*));
    for (int i = 0; i < v; i++) {
        graph[i] = (int*)calloc(v, sizeof(int));
    }
    while (fscanf_s(graphFile, "%d %d", &from, &to) == 2) {
        graph[from - 1][to - 1] = 1;
        graph[to - 1][from - 1] = 1;
    }
    fclose(graphFile);
    return graph;
}
The time complexity of your algorithm is very closely linked to listing compositions of an integer, of which there are O(2^N).
The compositions alone are not enough though, as there is also a combinatorial aspect, although there are rules as well. Specifically, a clique must contain the highest-numbered unused vertex.
An example is the composition 2-2-1 (N = 5). The first clique must contain vertex 5, reducing the number of unused vertices to 4. There is then a choice of 1 out of 4 elements, leaving 3 unused vertices. One element of the second clique is then forced, so 2 unused vertices remain, and a choice of 1 out of 2 elements decides the final vertex of the second clique. This leaves only a single vertex for the last clique. For this composition there are 8 possible ways it could be made, given by (1*C(4,1)*1*C(2,1)*1). The 8 possible ways are as follows:
(5,4),(3,2),(1)
(5,4),(3,1),(2)
(5,3),(4,2),(1)
(5,3),(4,1),(2)
(5,2),(4,3),(1)
(5,2),(4,1),(3)
(5,1),(4,3),(2)
(5,1),(4,2),(3)
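As a sanity check of that count, a small script can enumerate the assignments for this composition (assuming vertices 1..5 and the rule that each clique takes the highest-numbered unused vertex):
from itertools import combinations

def assignments(vertices, composition):
    # Yield all ways to split `vertices` into cliques of the given sizes,
    # where each clique must take the highest-numbered unused vertex.
    if not composition:
        yield []
        return
    rest = sorted(vertices)
    highest = rest.pop()                      # forced member of this clique
    for others in combinations(rest, composition[0] - 1):
        remaining = [v for v in rest if v not in others]
        for tail in assignments(remaining, composition[1:]):
            yield [(highest, *sorted(others, reverse=True))] + tail

ways = list(assignments(range(1, 6), [2, 2, 1]))
print(len(ways))   # 8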
The example above shows the format required for the worst case, which is when the composition contains as many 2s as possible. I'm thinking this is still O(N!) even though it's actually (N-1)(N-3)(N-5)...(1) or (N-1)(N-3)(N-5)...(2). However, this worst case is impossible, as it would, as shown, require a complete graph, which would be caught right away and would limit the answer to a single clique, of which there is only one solution.
Given the variations of the compositions, the number of possible compositions is probably a fair starting point for the upper bound, at O(2^N). That there are O(3^(N/3)) maximal cliques is another bit of useful information, as the algorithm could theoretically find all of them. Although that isn't good enough either, as some maximal cliques are found multiple times while others are not found at all.
A tighter upper bound is difficult for two main reasons. First, the algorithm progressively limits the maximum number of cliques (which I suppose you could call the size of the composition), which puts an upper limit on the computation time spent per clique. Second, missing edges cause a large number of possible variations to be ignored, which almost ensures that the vast majority of the O(N!) variations are skipped. Combined with the above paragraph, this makes putting an upper bound on the runtime difficult. If this isn't enough for an answer, you might want to take the question to the math area of Stack Exchange, as a better answer will require a fair bit of mathematical analysis.