Generating Permutations in Lexicographic Order vs Sorting? - algorithm

I'm a little bit confused. How is the problem of generating permutations in Lexicographic Order any different from the problem of sorting? Can someone please explain it to me with an example? Thanks

These are two different things. There are N! permutations, but there is only one sorted order (the sorted permutation is the smallest lexicographically).
Here is an example of a sorted permutation:
brown fox quick
Here is a list of permutations in lexicographic order:
brown fox quick
brown quick fox
fox brown quick
fox quick brown
quick brown fox
quick fox brown
Here is a program in C++ to generate permutations in lexicographic order:
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> s;
    s.push_back("quick");
    s.push_back("brown");
    s.push_back("fox");
    sort(s.begin(), s.end()); // start from the smallest (sorted) permutation
    do {
        for (size_t i = 0; i != s.size(); i++) {
            cout << s[i] << " ";
        }
        cout << endl;
    } while (next_permutation(s.begin(), s.end()));
    return 0;
}

Permutations are not addressed by the problem of sorting.
One way the two could relate is this: if you generate permutations in some order other than lexicographic, you can then sort them to get them into lexicographic order. That, however, would require factorial space. Generation usually emits one permutation at a time, so it never has to hold all of them in memory.

There's a fairly easy way to generate the nth permutation in lexicographic order. The set of choices you make in selecting the permutation elements are: pick 1 of N, then 1 of N-1, then 1 of N-2, ... then 1 of 2, and finally there's just one left. Those choices, as index values into a running "what's left" list, can be looked at as a variable-base number.
You can develop the digits from right to left as d[1] = n%2, d[2] = (n/2)%3, d[3] = (n/6)%4, ... d[k] = (n/k!) % (k+1). The result has d[N-1]==0 for the first (N-1)! permutations, d[N-1]==1 for the next (N-1)!, and so on. You can see that these index values will be in lex. order. Then choose the symbols out of your sorted set (any random-access collection will do, as long as syms[0], syms[1], ... are in the order you want).
Here's some code I whipped up for working on Project Euler problems. It just generates the index values, and allows for choosing permutations of k symbols out of n. The header file defaults k to -1, and the argument check code converts this to n and generates full length permutations. There's also a change of notation here: "index" is the number of the permutation ("n" above) and "n" is the set size ("N" above).
vector<int> pe_permutation::getperm(long long index, int n, int k)
{
    if (n < 0) throw invalid_argument("permutation order (n)");
    if (k < 0 || k > n)
    {
        if (k == -1)
            k = n;
        else throw invalid_argument("permutation size (k)");
    }
    vector<int> sset(n, 0); // generate initial selection set {0..n-1}
    for (int i = 1; i < n; ++i)
        sset[i] = i;
    // Initialize result to sset index values. These are "without replacement"
    // index values into a vector that decreases in size as each result value
    // is chosen.
    vector<int> result(k, 0);
    long long r = index;
    for (int m = n - k + 1; m <= n; ++m)
    {
        result[n - m] = (int)(r % m);
        r = (r / m);
    }
    // Choose values from selection set:
    for (int i = 0; i < k; ++i)
    {
        int j = result[i];
        result[i] = sset[j];
        sset.erase(sset.begin() + j);
    }
    return result;
} // getperm(long long, int, int)
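For example (using a hypothetical pe_permutation instance named pe; the instance name is mine, not part of the answer), index 3 over a 3-symbol set develops the digits d = [1, 1, 0] and selects the fourth permutation in lexicographic order:

vector<int> p = pe.getperm(3, 3, -1); // k == -1 defaults to k = n = 3
// p == {1, 2, 0}: indices 0..3 give {0,1,2}, {0,2,1}, {1,0,2}, {1,2,0}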

import java.util.*;

public class Un {
    public static void main(String args[]) {
        int[] x = {1, 2, 3, 4};
        int b = 0;
        int k = 3;
        while (b != (1 * 2 * 3 * 4)) {
            int count = 0;
            while (count != 6) {
                for (int i = 2; i > 0; i--) {
                    int temp = x[i];
                    x[i] = x[3];
                    x[3] = temp;
                    count++;
                    System.out.println(x[0] + "" + x[1] + "" + x[2] + "" + x[3]);
                }
            }
            b += count;
            int temp = x[0];
            x[0] = x[k];
            x[k] = temp;
            k--;
        }
    }
}

Related

Algorithm that will maintain top n items in the past k days?

I would like to implement a data structure maintaining a set S for a leaderboard that can answer the following queries efficiently, while also being memory-efficient:
add(x, t) Add a new item with score x to set S with an associated time t.
query(u) List the top n items (sorted by score) in the set S which have an associated time t such that t + k >= u. Each subsequent query will have a u no smaller than previous queries.
In plain English, high scores can be added to this leaderboard individually, and I'd like an algorithm that can efficiently query the top n items on the leaderboard within the past k days (where k and n are fixed constants).
n can be assumed to be much less than the total number of items, and scores may be assumed to be random.
A naïve algorithm would be to store all elements as they are added into a balanced binary search tree sorted by score, and remove elements from the tree when they are more than k days old. Detecting elements that are more than k days old can be done with another balanced binary search tree sorted by time. This algorithm would yield a good time complexity of O(log(h)) where h is the total number of scores added in the past k days. However, the space complexity is O(h), and it is easy to see that most of the data saved will never be reported in a query even if no new scores are added for the next k days.
If n is 1, a simple double-ended queue is all that is necessary. Before adding a new item to the front of the queue, remove items from the front that have a smaller score than the new item, because they will never be reported in a query. Before querying, remove items from the back of the queue that are too old, then return the item that is left at the back of the queue. All operations would be amortized constant time complexity, and I wouldn't be storing items that would never be reported.
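For concreteness, here is a minimal sketch of that n = 1 case (my illustration; the names and the window constant are assumptions, not part of the problem statement):

#include <deque>
#include <utility>

std::deque<std::pair<int, int>> q; // (score, time); front = newest
const int k = 10000;               // assumed window size

void add1(int x, int t) {
    // Items at the front with a smaller score can never be reported.
    while (!q.empty() && q.front().first <= x)
        q.pop_front();
    q.push_front({x, t});
}

int query1(int u) { // assumes at least one item is still in the window
    // Drop items older than the window; the back then holds the maximum.
    while (q.back().second + k < u)
        q.pop_back();
    return q.back().first;
}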
When n is more than 1, I can't seem to be able to formulate an algorithm which has a good time complexity and only stores items that could possibly be reported. An algorithm with time complexity O(log(h)) would be great, but n is small enough so that O(log(h) + n) is acceptable too.
Any ideas? Thanks!
This solution is based on the double-ended queue solution and I assume t is ascending.
The idea is that a record can be removed if there are n records with both larger t and larger x than it, which is implemented by Record.count in the sample code.
As each record can be moved from S to temp at most n times, the amortized time complexity of add is O(n).
The space complexity is hard to bound analytically. However, it looks fine in simulation: S.size() is about 400 when h = 10000 and n = 50.
#include <iostream>
#include <vector>
#include <deque>
#include <cstdlib>
using namespace std;

const int k = 10000, n = 50;

class Record {
public:
    Record(int _x, int _t): x(_x), t(_t), count(n) {}
    int x, t, count;
};

deque<Record> S;

void add(int x, int t)
{
    Record record(x, t);
    vector<Record> temp;
    while (!S.empty() && record.x >= S.back().x) {
        // A record may be displaced at most n times before it is dropped.
        if (--S.back().count > 0) temp.push_back(S.back());
        S.pop_back();
    }
    S.push_back(record);
    while (!temp.empty()) {
        S.push_back(temp.back());
        temp.pop_back();
    }
}

vector<int> query(int u)
{
    while (S.front().t + k < u)
        S.pop_front();
    vector<int> xs;
    for (size_t i = 0; i < S.size() && i < (size_t)n; ++i)
        xs.push_back(S[i].x);
    return xs;
}

int main()
{
    for (int t = 1; t <= 1000000; ++t) {
        add(rand(), t);
        vector<int> xs = query(t);
        if (t % k == 0) {
            cout << "t = " << t << endl;
            cout << "S.size() = " << S.size() << endl;
            for (auto x: xs) cout << x << " ";
            cout << endl;
        }
    }
    return 0;
}

Efficiently count occurrences of each element from given ranges

So I have some ranges like these:
2 4
1 9
4 5
4 7
For this the result should be
1 -> 1
2 -> 2
3 -> 2
4 -> 4
5 -> 3
6 -> 2
7 -> 2
8 -> 1
9 -> 1
The naive approach would be to loop through all the ranges for every element, but that is very inefficient; the worst case takes O(n * n).
What would be an efficient approach, probably in O(n) or O(log(n))?
Here's the solution, in O(n):
The rationale is to add a range [a, b] as a +1 at index a and a -1 just after index b. Then, after adding all the ranges, compute the accumulated sums for that array and display them.
If you need to perform queries while adding the values, a better choice would be to use a Binary Indexed Tree, but your question doesn't seem to require this, so I left it out.
#include <iostream>
#include <algorithm>
#define MAX 1000
using namespace std;

int T[MAX];

int main() {
    int a, b;
    int min_index = 0x1f1f1f1f, max_index = 0;
    while (cin >> a >> b) {
        T[a] += 1;     // +1 where a range opens
        T[b + 1] -= 1; // -1 just past where it closes
        min_index = min(min_index, a);
        max_index = max(max_index, b);
    }
    for (int i = min_index; i <= max_index; i++) {
        T[i] += T[i - 1]; // accumulate: T[i] is now the coverage count at i
        cout << i << " -> " << T[i] << endl;
    }
}
UPDATE: Based on the "provocations" (in a good sense) by גלעד ברקן, you can also do this in O(n log n):
#include <iostream>
#include <map>
#define ull unsigned long long
#define miit map<ull, int>::iterator
using namespace std;

map<ull, int> T;

int main() {
    ull a, b;
    while (cin >> a >> b) {
        T[a] += 1;
        T[b + 1] -= 1;
    }
    ull last = 0;
    int count = 0;
    for (miit it = T.begin(); it != T.end(); it++) {
        if (count > 0)
            for (ull i = last; i < it->first; i++)
                cout << i << " " << count << endl;
        count += it->second;
        last = it->first;
    }
}
The advantage of this solution is being able to support ranges with much larger values (as long as the output isn't so large).
The solution is pretty simple:
Generate two lists with the starting and ending indices of all the ranges, and sort them.
Keep a counter for the number of ranges that cover the current index. Start at the first index that is in any range and iterate over all numbers up to the last index that is in any range. If an index is in the list of starting indices, add 1 to the counter; if it is in the list of ending indices, subtract 1 from the counter.
Implementation:
#include <vector>
#include <set>
using namespace std;

vector<int> count(int** ranges, int rangecount, int rangemin, int rangemax)
{
    vector<int> res;
    multiset<int> open, close; // multisets: several ranges may share an endpoint
    for (int** r = ranges; r < ranges + rangecount; r++)
    {
        open.insert((*r)[0]);
        close.insert((*r)[1]);
    }
    int rc = 0;
    for (int i = rangemin; i <= rangemax; i++)
    {
        rc += open.count(i);  // ranges opening at i
        res.push_back(rc);
        rc -= close.count(i); // ranges closing at i (inclusive ends)
    }
    return res;
}
Paul's answer still counts from "the first item that is at any range and iterate[s] over all numbers to the last element that is in any range." But what if we could aggregate overlapping counts? For example, if we have three (or say a very large number of) overlapping ranges [2,6], [1,6], [2,8], the section (2,6) could depend only on the number of ranges, if we were to label the overlaps with their counts: [(1), 3(2,6), (7,8)].
Using binary search (once for the start and a second time for the end of each interval), we could split the intervals and aggregate the counts in O(n * log m * l) time, where n is the number of given ranges, m is the number of resulting groups in the total range, and l is the number of disjoint updates required for a particular overlap (the number of groups already within that range). Notice that at any time we simply have a sorted list grouped as intervals with labeled counts.
2 4
1 9
4 5
4 7
=>
(2,4)
(1),2(2,4),(5,9)
(1),2(2,3),3(4),2(5),(6,9)
(1),2(2,3),4(4),3(5),2(6,7),(8,9)
So you want the output to be an array, where the value of each element is the number of input ranges that include it?
Yeah, the obvious solution would be to increment every element in the range by 1, for each range.
I think you can do better if you sort the input ranges by start (primary) and end (secondary). So for 32-bit start and end values, start:end can be a 64-bit sort key. Actually, just sorting by start is fine; we need to sort the ends differently anyway.
Then you can see how many ranges you enter for an element, and (with a pqueue of range-ends) see how many you already left.
# pseudo-code with possible bugs.
# TODO: peek or put-back the element from ranges / ends
# that made the condition false.
pqueue ends;    // priority queue
int depth = 0;  // how many ranges contain this element
for i in output.len {
    while (r = ranges.next && r.start <= i) {
        ends.push(r.end);
        depth++;
    }
    while (ends.pop < i) {
        depth--;
    }
    output[i] = depth;
}
assert ends.empty();
Actually, we can just sort the starts and ends separately into two separate priority queues. There's no need to build the pqueue on the fly. (Sorting an array of integers is more efficient than sorting an array of structs by one struct member, because you don't have to copy around as much data.)
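As a rough illustration of that last idea (my sketch, not the answerer's code), sorting the starts and ends into two plain arrays and sweeping the output index looks like this, using the sample ranges and the 1..9 domain from the question:

#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;

int main() {
    vector<pair<int,int>> ranges = {{2,4},{1,9},{4,5},{4,7}}; // sample input
    vector<int> starts, ends;
    for (auto& r : ranges) {
        starts.push_back(r.first);
        ends.push_back(r.second);
    }
    sort(starts.begin(), starts.end());
    sort(ends.begin(), ends.end());

    size_t si = 0, ei = 0;
    int depth = 0; // how many ranges contain the current element
    for (int i = 1; i <= 9; ++i) { // output domain assumed to be 1..9
        while (si < starts.size() && starts[si] <= i) { ++depth; ++si; }
        while (ei < ends.size() && ends[ei] < i) { --depth; ++ei; }
        printf("%d -> %d\n", i, depth);
    }
}

This prints exactly the 1 -> 1 through 9 -> 1 table from the question.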

My code for merge sort in C++ using dynamic arrays returning garbage values

Please tell me why this code is giving garbage values.
It compiles fine; I implemented it based on the Cormen (CLRS) algorithm for merge sorting.
Basically I read the given numbers into a dynamic array. Two void functions are used: one merges the two sub-arrays via merge sort, and the other recursively splits the array into sub-arrays.
#include <iostream>
using namespace std;

void merge(int *A, int p, int q, int r) // a function to merge two sub arrays
{
    int n1 = q - p + 1;
    int n2 = r - q;
    int L[n1];
    int R[n2];
    for (int i = 0; i < n1; i++)
    {
        L[i] = A[p + i];
    }
    int m = 1;
    for (int j = 0; j < n2; j++)
    {
        R[j] = A[q + m];
        m = m + 1;
    }
    int i = 0;
    int j = 0;
    for (int k = 0; k < r; k++)
    {
        if (L[i] <= R[j])
        {
            A[k] = L[i];
            i = i + 1;
        }
        else
        {
            A[k] = R[j];
            j = j + 1;
        }
    }
}

void mergesort(int *A, int p, int r) // dividing the sequence into sub arrays
{
    if (p < r)
    {
        int q;
        q = (p + r) / 2;
        mergesort(A, p, q);
        mergesort(A, (q + 1), r);
        merge(A, p, q, r);
    }
}

int main()
{
    int n;
    cout << "Enter the number of numbers to be sorted by merge sort" << endl;
    cin >> n;
    int* a = NULL;
    a = new int[n];
    int temp;
    cout << "Enter the numbers" << endl;
    for (int i = 0; i < n; i++)
    {
        cin >> temp;
        *(a + i) = temp; // inputting the given numbers into a dynamic array
    }
    cout << "The given numbers are:" << endl;
    for (int j = 0; j < n; j++)
        cout << *(a + j) << " ";
    mergesort(a, 0, n - 1);
    cout << "The merged sorted numbers are:" << endl;
    for (int s = 0; s < n; s++)
        cout << *(a + s) << " ";
    delete [] a;
    system("pause");
    return 0;
}
You are getting your intervals wrong pretty much everywhere in your code. For example:
Based on your usage in main, mergesort is supposed to sort the sublist of indices [0,n-1].
With this meaning, your recursion in mergesort says in order to sort the indices [p,r-1], you should first sort [p,q-1] then sort [q+1,r-1]: you completely ignore index q.
Similarly, merge is confused: once you fix the typo when copying into L (A[i] should be A[p+i]), it takes [p,q] as one list, and [q,r] as the other list: note you copy entry q twice, and you also copy r when you probably shouldn't be.
To fix your code, you need to straighten out exactly what intervals everything is supposed to be working on. This isn't a hard problem; you just have to bring yourself to write down explicitly what each of your functions and loops is supposed to be doing.
The typical convention these days is half-open intervals: you should generally think of taking indices [p,q) from a list. ([p,q) is the same as [p,q-1]) Here are several examples of why this is preferred:
The number of entries of [p,r) is simply r-p
A for loop iterating through [p,r) is the usual for(i=p; i<r; ++i) (not <=)
Splitting the interval [p,r) into parts gives you intervals [p,q) and [q,r) -- there is no worry about remembering to add 1 in places.
e.g. merge would normally be designed so that the first list comes from indices [p,q) and the second list from indices [q,r).
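For illustration, here is a minimal sketch of merge and mergesort written with half-open intervals (my code, not a line-by-line fix of the original); note how the bounds checks and the midpoint split fall out naturally:

#include <vector>

void merge(int* A, int p, int q, int r) // merges sorted halves [p,q) and [q,r)
{
    std::vector<int> L(A + p, A + q), R(A + q, A + r);
    size_t i = 0, j = 0;
    for (int k = p; k < r; ++k) {
        // Guard both indices: once one half is exhausted, copy from the other.
        if (j == R.size() || (i < L.size() && L[i] <= R[j]))
            A[k] = L[i++];
        else
            A[k] = R[j++];
    }
}

void mergesort(int* A, int p, int r) // sorts [p,r)
{
    if (r - p > 1) {
        int q = p + (r - p) / 2;
        mergesort(A, p, q);
        mergesort(A, q, r);
        merge(A, p, q, r);
    }
}

With this convention the top-level call becomes mergesort(a, 0, n) rather than mergesort(a, 0, n-1).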

Minimizing sum of absolute values of differences

An SPOJ question:
Given two arrays, A and B, of positive numbers between 1 and 1,000,000, I have to pair each integer a in A with an integer b in B such that the sum of the absolute values of the differences is minimized. A and B can contain a maximum of 5000 integers each.
For example:
Let A=[10, 15, 13] and B=[14,13, 12], then the best pairing is (10, 12), (15, 14) and (13, 13) because |10-12|+|15-14|+|13-13|=3, which is the least we can achieve. Thus, the minimum sum achieved is 3.
I believe it is a dynamic programming question.
Edit:
The arrays may be of different sizes but can contain a maximum of 5000 elements each.
My code:
#include <cmath>
#include <vector>
#include <iostream>
#include <cstring>
#include <algorithm>
#include <cstdio>
using namespace std;
static int DP[5002][5002], N, M, tmp;
vector<int> B, C;
int main()
{
scanf("%d %d", &N, &M); memset(DP, -1, sizeof DP);
B.push_back(0); C.push_back(0); DP[0][0]=0;
for(int i=1; i<=N; ++i){scanf("%d", &tmp); B.push_back(tmp);} \\inputting numbers.
for(int i=1; i<=M; ++i){scanf("%d", &tmp); C.push_back(tmp);}
sort(B.begin(), B.end()); sort(C.begin(), C.end()); \\Sorting the two arrays.
if(C.size()<=B.size()){ \\Deciding whether two swap the order of arrays.
for(int i=1; i<=N; ++i){
for(int j=1; j<=M; ++j){
if(j>i)break;
if(j==1)DP[i][j]=abs(C[j]-B[i]);
else{
tmp=DP[i-1][j-1]+abs(C[j]-B[i]);
DP[i][j]=(DP[i-1][j]!=-1)? min(tmp, DP[i-1][j]): tmp;
}
}
}
printf("%d\n", DP[N][M]); \\Outputting the final result.
}
else{
for(int i=1; i<=M; ++i){
for(int j=1; j<=N; ++j){
if(j>i) break;
if(j==1)DP[i][j]=abs(C[i]-B[j]);
else{
tmp=DP[i-1][j-1]+abs(C[i]-B[j]);
DP[i][j]=(DP[i-1][j]!=-1)? min(tmp, DP[i-1][j]): tmp;
}
}
}
printf("%d\n", DP[M][N]);
}
return 0;
}
Niels's comment points out that, if the arrays are of the same size, then you should just sort them and pair corresponding values. We can build on that to construct the general case:
I'll assume the length of the first array arr1 is smaller than or equal to the length of the second arr2. If it isn't, just swap them. First, sort both arrays, and let dp[A][B] be the smallest difference when you consider only the subarrays arr1[A...] and arr2[B...] (that is, arr1 from A forward and arr2 from B to the end). You have two choices:
Pair A and B. In this case you'd get a total difference of |arr1[A]-arr2[B]| + dp[A+1][B+1].
Don't use B. Note that in this case you'll never consider B again (because if you pair A and B to different elements, then you could swap both pairs and the sum would go down). So you can simply ignore B and your answer will be dp[A][B+1].
Base cases should be fairly obvious:
dp[length of arr1][length of arr2] = 0
dp[A][length of arr2] = infinity (it's impossible to pair the remaining elements of arr1).
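A minimal bottom-up sketch of this recurrence (my code; the function name is mine):

#include <algorithm>
#include <cstdlib>
#include <climits>
#include <vector>

long long minPairingCost(std::vector<int> arr1, std::vector<int> arr2)
{
    if (arr1.size() > arr2.size()) std::swap(arr1, arr2);
    std::sort(arr1.begin(), arr1.end());
    std::sort(arr2.begin(), arr2.end());
    size_t n = arr1.size(), m = arr2.size();
    const long long INF = LLONG_MAX / 2; // "infinity" that is safe to add to
    // dp[A][B]: minimal cost of pairing all of arr1[A...] using arr2[B...]
    std::vector<std::vector<long long>> dp(n + 1, std::vector<long long>(m + 1, INF));
    for (size_t B = 0; B <= m; ++B) dp[n][B] = 0; // nothing left to pair
    for (size_t A = n; A-- > 0; )
        for (size_t B = m; B-- > 0; )
            dp[A][B] = std::min(
                std::abs(arr1[A] - arr2[B]) + dp[A + 1][B + 1], // pair A with B
                dp[A][B + 1]); // skip B, never use it again
    return dp[0][0];
}

For the example in the question, minPairingCost({10, 15, 13}, {14, 13, 12}) returns 3.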

find minimum step to make a number from a pair of number

Let's assume that we have a pair of numbers (a, b). We can get a new pair (a + b, b) or (a, a + b) from the given pair in a single step.
Let the initial pair of numbers be (1,1). Our task is to find number k, that is, the least number of steps needed to transform (1,1) into the pair where at least one number equals n.
I solved it by finding all the possible pairs and then returning the minimum number of steps in which the given number is formed, but it takes quite a long time to compute. I guess this must somehow be related to finding the gcd. Can someone please help, or provide me a link for the concept?
Here is a program that solves the problem, but it is not clear to me...
#include <iostream>
#include <algorithm>
using namespace std;
#define INF 1000000000

int n, r = INF;

// Works backwards with the Euclidean algorithm: to reach (a, b) with a > b,
// the last a/b steps must each have added b, so recurse on (b, a mod b).
int f(int a, int b) {
    if (b <= 0) return INF;             // unreachable from (1, 1)
    if (a > 1 && b == 1) return a - 1;  // (1, 1) -> (a, 1) takes a - 1 steps
    return f(b, a - a / b * b) + a / b; // a - a/b*b == a % b
}

int main() {
    cin >> n;
    for (int i = 1; i <= n / 2; i++) {
        r = min(r, f(n, i)); // try every possible smaller partner for n
    }
    cout << (n == 1 ? 0 : r) << endl;
}
My approach to such problems (one I got from projecteuler.net) is to calculate the first few terms of the sequence and then search OEIS for a sequence with the same terms. This can result in solutions that are orders of magnitude faster. In your case the sequence is probably http://oeis.org/A178031 but unfortunately it has no easy-to-use formula.
As the constraint for n is relatively small, you can do a DP on the minimum number of steps required to get to the pair (a,b) from (1,1). You take a two-dimensional array that stores the answer for a given pair and then do a recursion with memoization:
#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;

const int INF = 1000000000;
int mem[5001][5001];

int solve(int a, int b) {
    if (a > b) {
        swap(a, b); // keep a <= b
    }
    if (a == 0) {
        return INF; // gcd(a, b) > 1: this pair is unreachable from (1, 1)
    }
    if (a == 1) {
        return mem[a][b] = b - 1; // (1, 1) -> (1, b) takes b - 1 steps
    }
    if (mem[a][b] != -1) { // already calculated
        return mem[a][b];
    }
    // Undo the last b/a additions of a in one go, then recurse.
    return mem[a][b] = solve(a, b % a) + b / a;
}

int main() {
    memset(mem, -1, sizeof(mem));
    int n;
    cin >> n;
    int best = -1;
    for (int i = 1; i <= n; ++i) {
        int temp = solve(n, i);
        if (best == -1 || temp < best) {
            best = temp;
        }
    }
    cout << best << endl;
}
In fact in this case there is not much difference between dp and BFS, but this is the general approach to such problems. Hope this helps.
EDIT: return a big enough value in the dp if a is zero
You can use the breadth-first search algorithm to do this. At each step you generate all possible NEXT pairs that you haven't seen before. If the set of next pairs contains the result, you're done; if not, repeat. The number of times you repeat this is the minimum number of transformations.
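A rough sketch of that BFS (my code; it caps components at n to bound the search, and its memory use grows quickly, so it is illustrative rather than practical for large n):

#include <iostream>
#include <queue>
#include <set>
using namespace std;

int main() {
    int n;
    cin >> n;
    if (n == 1) { cout << 0 << endl; return 0; }
    queue<pair<int,int>> q;
    set<pair<int,int>> seen;
    q.push({1, 1});
    seen.insert({1, 1});
    int steps = 0;
    while (!q.empty()) {
        ++steps; // explore the next BFS level
        for (int sz = q.size(); sz > 0; --sz) {
            pair<int,int> cur = q.front(); q.pop();
            int a = cur.first, b = cur.second;
            for (auto next : { make_pair(a + b, b), make_pair(a, a + b) }) {
                if (next.first == n || next.second == n) {
                    cout << steps << endl; // first hit = minimum number of steps
                    return 0;
                }
                if (next.first < n && next.second < n && seen.insert(next).second)
                    q.push(next);
            }
        }
    }
}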
First of all, the maximum number you can get after k-3 steps is the kth Fibonacci number. Let t be the golden ratio.
Now, for n, start with the pair (n, upper(n/t)).
If x > y:
    NumSteps(x, y) = NumSteps(x - y, y) + 1
Else:
    NumSteps(x, y) = NumSteps(x, y - x) + 1
Iteratively calculate NumSteps(n, upper(n/t)).
PS: Using upper(n/t) might not always provide the optimal solution. You can do some local search around this value for the optimal result. To ensure optimality you can try ALL the values from 0 to n-1, in which case the worst-case complexity is O(n^2). But if the optimal value results from a value close to upper(n/t), the solution is O(n log n).
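A sketch of that heuristic (my code; per the PS above, upper(n/t) may be suboptimal and should only be a starting point for a local search; numSteps returns -1 when a pair is unreachable, i.e. its gcd exceeds 1):

#include <cmath>
#include <iostream>
#include <utility>

// Count reverse steps from (x, y) down to (1, 1), using division to
// batch the repeated subtractions.
long long numSteps(long long x, long long y) {
    long long steps = 0;
    while (true) {
        if (x < y) std::swap(x, y);
        if (y == 1) return steps + (x - 1); // (1, 1) -> (x, 1) takes x - 1 steps
        if (y <= 0) return -1;              // gcd > 1: unreachable from (1, 1)
        steps += x / y;
        x %= y;
    }
}

int main() {
    const double t = (1 + std::sqrt(5.0)) / 2; // golden ratio
    long long n;
    std::cin >> n;
    long long y = (long long)std::ceil(n / t); // upper(n/t)
    std::cout << numSteps(n, y) << std::endl;  // starting point; search near y for the optimum
}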
