I work in the logistics department of a company, and recently we have been trying to reduce the number of different packaging options that we use.
I have all the necessary product data like length, width, height, volume and also sales data.
So I was wondering whether it is possible to use an algorithm to cluster the products by volume, and perhaps also take into account which sizes sell the most, to determine which box sizes would be ideal.
(Taking into account how often a product sells is secondary, so that is not absolutely necessary.)
What I want is to give the algorithm the number of different box sizes I want, and have it determine where to put the limits so that there is a box for every product that we have. The goal of the optimization is to minimize the wasted volume while not using more than the given number of different boxes.
Also important to note: the orientation of the products and the number of products per box are fixed, so there is no need to determine how to pack the products or how many should ideally go into one box.
What kind of algorithms could be used for a problem like this, and what are my options for programming them? I was thinking of using Matlab, but I am open to other options. I want to program it myself, not simply use an existing program like SPSS.
Thanks in advance, and forgive me if my English is not the best; I'm not a native speaker.
The following C++ program will find optimal solutions for small instances. For 10 input box sizes, each having dimensions randomly chosen in the range 1..100, and for any number 1..10 of box sizes to choose, it computes the answer in a couple of seconds on my computer. For 15 input box sizes, it takes around 10s. For 20 input box sizes, I could compute up to 4 chosen box sizes in about 3 minutes, with memory becoming an issue (it used around 3GB). I had to increase the linker's default stack size to avoid stack overflows.
#include <iostream>
#include <algorithm>
#include <vector>
#include <array>
#include <map>
#include <set>
#include <functional>
#include <iterator> // std::empty, std::size (C++17)
#include <tuple>    // std::make_tuple
#include <climits>
using namespace std;
ostream& operator<<(ostream& os, array<int, 3> a) {
return os << '(' << a[0] << ", " << a[1] << ", " << a[2] << ')';
}
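// std::array's size parameter is a size_t, so the template parameters must be size_t for deduction to work portably.
template <size_t N>
long long vol(array<int, N> b) {
return static_cast<long long>(b[0]) * b[1] * b[2];
}
template <size_t N, size_t M>
bool fits(array<int, N> a, array<int, M> b) {
return a[0] <= b[0] && a[1] <= b[1] && a[2] <= b[2];
}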
// Compares first by volume, then lexicographically.
struct CompareByVolumeDesc {
bool operator()(array<int, 3> a, array<int, 3> b) const {
return vol(a) > vol(b) || vol(a) == vol(b) && a < b;
}
};
vector<array<int, 3>> candSizes;
struct State {
vector<array<int, 4>> req;
int n;
int k;
// Needed for map<>
bool operator<(State const& other) const {
return make_tuple(n, k, req) < make_tuple(other.n, other.k, other.req);
}
} dummy = { {}, -1, -1 };
map<State, pair<int, State>> memo;
// Compute the minimum volume required for the given list of box sizes if we use exactly k of the first n candidate box sizes.
pair<long long, State> solve(State const& s) {
if (empty(s.req)) return { 0, dummy };
if (s.k == 0 || s.k > s.n) return { LLONG_MAX / 4, dummy };
auto previousAnswer = memo.find(s);
if (previousAnswer != end(memo)) return (*previousAnswer).second;
// Try using the nth candidate box size.
int nFitting = 0;
vector<array<int, 4>> notFitting;
for (auto r : s.req) {
if (fits(r, candSizes[s.n - 1])) {
nFitting += r[3];
} else {
notFitting.push_back(r);
}
}
pair<long long, State> solution;
solution.second = { s.req, s.n - 1, s.k };
solution.first = solve(solution.second).first;
if (nFitting > 0) {
State useNth = { notFitting, s.n - 1, s.k - 1 };
long long useNthVol = nFitting * vol(candSizes[s.n - 1]) + solve(useNth).first;
if (useNthVol < solution.first) solution = { useNthVol, useNth };
}
memo[s] = solution;
return solution;
}
void printOptimalSolution(State s) {
while (!empty(s.req)) {
State next = solve(s).second;
if (next.k < s.k) cout << candSizes[s.n - 1] << endl;
s = next;
}
}
int main(int argc, char** argv) {
int n, k;
cin >> n >> k;
vector<array<int, 4>> requestedBoxSizes;
set<int> lengths, widths, heights;
for (int i = 0; i < n; ++i) {
array<int, 4> d; // d[3] is actually the number of requests for this box size
cin >> d[0] >> d[1] >> d[2] >> d[3];
sort(begin(d), begin(d) + 3, std::greater<int>());
requestedBoxSizes.push_back(d);
lengths.insert(d[0]);
widths.insert(d[1]);
heights.insert(d[2]);
}
// Generate all candidate box sizes
for (int l : lengths) {
for (int w : widths) {
for (int h : heights) {
array<int, 3> cand = { l, w, h };
sort(begin(cand), end(cand), std::greater<int>());
candSizes.push_back(cand);
}
}
}
sort(begin(candSizes), end(candSizes), CompareByVolumeDesc());
candSizes.erase(unique(begin(candSizes), end(candSizes)), end(candSizes));
cout << "Number of candidate box sizes: " << size(candSizes) << endl;
State startState = { requestedBoxSizes, static_cast<int>(size(candSizes)), k };
long long minVolume = solve(startState).first;
cout << "Minimum achievable volume using " << k << " box sizes: " << minVolume << endl;
cout << "Optimal set of " << k << " box sizes:" << endl;
printOptimalSolution(startState);
return 0;
}
Example input:
15 5
100 61 35 27
17 89 96 47
31 69 30 55
37 23 39 9
94 11 48 19
38 17 29 36
63 79 80 36
59 52 37 51
86 63 54 7
32 30 11 26
50 88 51 5
74 70 33 14
67 46 4 79
83 94 89 58
65 42 37 69
Example output:
Number of candidate box sizes: 2310
Minimum achievable volume using 5 box sizes: 124069460
Optimal set of 5 box sizes:
(94, 48, 11)
(69, 52, 37)
(100, 89, 35)
(88, 79, 63)
(94, 89, 83)
I'll explain the algorithm behind this if there's interest. It's better than considering all possible combinations of k candidate box sizes, but not terribly efficient.
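As a sanity check on the output above, the objective can be evaluated directly for a given set of chosen box sizes: every requested size goes into the smallest-volume chosen box it fits in, and the volumes are summed, weighted by the request counts. The sketch below is not part of the program above; the names `Request` and `totalShippedVolume` are made up for illustration, and it assumes (as the program does) that dimensions are sorted before comparing, i.e. that boxes may be rotated.

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <vector>

using Box = std::array<int, 3>;

// Hypothetical helper type: one requested size and how often it is requested.
struct Request {
    Box dims;
    int count;
};

long long boxVolume(const Box& b) {
    return static_cast<long long>(b[0]) * b[1] * b[2];
}

// Total volume shipped when every request uses the smallest-volume chosen box
// it fits in. Returns -1 if some request fits in none of the chosen boxes.
long long totalShippedVolume(std::vector<Request> requests, std::vector<Box> boxes) {
    // Sort dimensions in descending order, as the program above does.
    for (Box& b : boxes) std::sort(b.begin(), b.end(), std::greater<int>());
    long long total = 0;
    for (Request r : requests) {
        std::sort(r.dims.begin(), r.dims.end(), std::greater<int>());
        long long best = -1;
        for (const Box& b : boxes) {
            const bool fits = r.dims[0] <= b[0] && r.dims[1] <= b[1] && r.dims[2] <= b[2];
            if (fits && (best < 0 || boxVolume(b) < best)) best = boxVolume(b);
        }
        if (best < 0) return -1;  // no chosen box can hold this request
        total += best * r.count;
    }
    return total;
}
```

Feeding it the 15 requests from the example input and the five box sizes from the example output reproduces the reported total of 124069460.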
I am using the <stdlib.h> rand() function to generate 100 random integers in the range [0 ... 9]. I generate them with a roughly uniform distribution as follows:
int random_numbers[100];
for(register int i = 0; i < 100; i++){
random_numbers[i] = rand() % 10;
}
This is working fine. But now I want to get 100 numbers where around 50% of them are 5. How do I do that?
Extended Problem
I want to get 100 numbers. What if I want 50% of those numbers to be between 0 and 2? I mean that 50 percent of the numbers should be drawn only from 0, 1 and 2. How do I do that?
I am looking for generalised steps that can be applied beyond the limits of 10 or 100.
Hmmm, how about choosing a random number between 0 and 17, and if the number is greater than 9, change it to 5?
For 0 - 17, you would get a distribution like
0,1,2,3,4,5,6,7,8,9,5,5,5,5,5,5,5,5
Code:
int random_numbers[100];
for(register int i = 0; i < 100; i++){
random_numbers[i] = rand() % 18;
if (random_numbers[i] > 9) {
random_numbers[i] = 5;
}
}
You basically add a set of numbers beyond your desired range that, when translated to 5 give you equal numbers of 5 and non-5.
In order to get around 50% of these numbers to be in [0, 2] range you can split the full range of rand() into two equal halves and then use the same %-based technique to map the first half to [0, 2] range and the second half to [3, 9] range.
int random_numbers[100];
for(int i = 0; i < 100; i++)
{
int r = rand();
random_numbers[i] = r <= RAND_MAX / 2 ? r % 3 : r % 7 + 3;
}
To get around 50% of these numbers to be 5, a similar technique will work. Just map the second half to the [0, 9] range with 5 excluded:
int random_numbers[100];
for(int i = 0; i < 100; i++)
{
int r = rand();
if (r <= RAND_MAX / 2)
r = 5;
else if ((r %= 9) >= 5)
++r;
random_numbers[i] = r;
}
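The same two-step idea (first pick the group with probability 1/2, then pick a value inside the group) can also be written with the C++11 `<random>` facilities instead of `rand()`. This is only a sketch under that substitution, not the code from the answer above; the function name `drawHalfLow` and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <random>

// Returns a value in [0, maxValue]. With probability 1/2 the value is drawn
// uniformly from [0, lowMax], otherwise uniformly from [lowMax + 1, maxValue].
int drawHalfLow(std::mt19937& gen, int lowMax, int maxValue) {
    assert(0 <= lowMax && lowMax < maxValue);
    std::bernoulli_distribution pickLow(0.5);  // fair coin: which group?
    if (pickLow(gen)) {
        return std::uniform_int_distribution<int>(0, lowMax)(gen);
    }
    return std::uniform_int_distribution<int>(lowMax + 1, maxValue)(gen);
}
```

Calling `drawHalfLow(gen, 2, 9)` 100 times gives roughly (not exactly) 50 values in [0, 2], which matches the "around 50%" requirement of the question.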
I think it is easy to solve the particular problem of 50% using the techniques mentioned by other answers. Let us try to answer the question for a general case -
Let us say you want a distribution where you want the numbers {A1, A2, .. An} with the percentages {P1, P2, .. Pn} and sum of Pi is 100% (and all the percentages are integers, if not it can be adjusted).
We will create an array of 100 size and fill it with the numbers A1-An.
int distribution[100];
Now we add each number to the array as many times as its percentage.
int position = 0;
for (int i = 0; i < n; i++) {
for( int j = 0; j < P[i]; j++) {
// Add a check here to make sure the sum hasn't crossed 100
distribution[position] = A[i];
position ++;
}
}
Now that this initialization is done once, you can draw a random number as -
int number = distribution[rand() % 100];
In case your percentages are not integers but say you want precision of 0.1%, you can create an array of 1000 instead of 100.
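For reference, here is the lookup-table approach as a compilable C++ sketch. The helper names `buildLookupTable` and `drawFromTable` are made up for illustration, and the draw uses `std::uniform_int_distribution` rather than `rand() % 100`, which is a substitution on my part; the percentages are assumed to be integers that sum to 100.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Builds a 100-entry table in which values[i] appears percents[i] times.
std::vector<int> buildLookupTable(const std::vector<int>& values,
                                  const std::vector<int>& percents) {
    assert(values.size() == percents.size());
    std::vector<int> table;
    table.reserve(100);
    for (std::size_t i = 0; i < values.size(); ++i) {
        table.insert(table.end(), static_cast<std::size_t>(percents[i]), values[i]);
    }
    assert(table.size() == 100);  // the percentages must sum to 100
    return table;
}

// Picks one table entry uniformly at random.
int drawFromTable(const std::vector<int>& table, std::mt19937& gen) {
    std::uniform_int_distribution<std::size_t> pick(0, table.size() - 1);
    return table[pick(gen)];
}
```

The table is built once; every draw afterwards is a single index lookup.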
In both cases, the goal is 50% selected from one set and 50% from another. The code can call rand() once and use one bit for choosing the group and the remaining bits for selecting the value.
If the range of numbers needed is much smaller than RAND_MAX, a first attempt could use:
int rand_special_50percent(int n, int special) {
int r = rand();
int r_div_2 = r/2;
if (r%2) {
return special;
}
int y = r_div_2%(n-1); // 9 numbers left
if (y >= special) y++;
return y;
}
int rand_low_50percent(int n, int low_special) {
int r = rand();
int r_div_2 = r/2;
if (r%2) {
return r_div_2%(low_special+1);
}
// n - low_special - 1 values remain above low_special, e.g. 3..9 for n = 10, low_special = 2
return r_div_2%(n - low_special - 1) + low_special + 1;
}
Sample
int r5 = rand_special_50percent(10, 5);
int preferred_low_value_max = 2;
int r012 = rand_low_50percent(10, preferred_low_value_max);
Advanced:
With n above RAND_MAX/2, additional calls to rand() are needed.
When using rand()%n, unless (RAND_MAX+1u)%n == 0 (n is a divisor of RAND_MAX+1), a bias is introduced. The above code does not compensate for that.
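One common way to compensate for that bias is rejection sampling: discard raw values that fall into the incomplete last block of size n and draw again. The sketch below is an illustration of that idea, not part of the code above; `uniformBelow` is a made-up name, and the random source is passed in as a callable (for `rand()` that would be `srcMax = RAND_MAX` and a lambda that calls `rand()`).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Returns a value in [0, n) without modulo bias, given a source `next` that
// yields uniformly distributed integers in [0, srcMax].
std::uint32_t uniformBelow(std::uint32_t n, std::uint32_t srcMax,
                           const std::function<std::uint32_t()>& next) {
    const std::uint64_t range = static_cast<std::uint64_t>(srcMax) + 1;
    assert(n > 0 && n <= range);
    // Largest multiple of n that does not exceed the number of source values.
    const std::uint64_t limit = range - range % n;
    for (;;) {
        const std::uint32_t r = next();
        if (r < limit) return r % n;  // accepted: every residue is equally likely
        // Otherwise r lies in the incomplete last block; reject and retry.
    }
}
```

Because fewer than half of the source values can ever be rejected, the expected number of calls to the source is below two per result.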
C++11 solution (not optimal but easy)
std::piecewise_constant_distribution can generate random real numbers (float or double) for given intervals and weights for the each interval.
It is not optimal because it generates a double and then converts it to int. Also, getting exactly 50 of the 100 samples from [0,3) is not guaranteed; you only get around 50 on average.
For your case : 2 intervals - [0,3), [3,100) and their weights [1,1]
Equal weights, so ~50% of the numbers from [0,3) and ~50% from [3,100)
#include <iostream>
#include <vector>
#include <map>
#include <random>
int main()
{
std::random_device rd;
std::mt19937 gen(rd());
std::vector<double> intervals{0, 3, 3, 100};
std::vector<double> weights{ 1, 0, 1};
std::piecewise_constant_distribution<> d(intervals.begin(), intervals.end(), weights.begin());
std::map<int, int> hist;
for(int n=0; n<100; ++n) {
++hist[(int)d(gen)];
}
for(auto p : hist) {
std::cout << p.first << " : generated " << p.second << " times"<< '\n';
}
}
Output:
0 : generated 22 times
1 : generated 19 times
2 : generated 16 times
4 : generated 1 times
5 : generated 2 times
8 : generated 1 times
12 : generated 1 times
17 : generated 1 times
19 : generated 1 times
22 : generated 2 times
23 : generated 1 times
25 : generated 1 times
29 : generated 1 times
30 : generated 2 times
31 : generated 1 times
36 : generated 1 times
38 : generated 1 times
44 : generated 1 times
45 : generated 1 times
48 : generated 1 times
49 : generated 1 times
51 : generated 1 times
52 : generated 1 times
53 : generated 1 times
57 : generated 2 times
58 : generated 3 times
62 : generated 1 times
65 : generated 2 times
68 : generated 1 times
71 : generated 1 times
76 : generated 2 times
77 : generated 1 times
85 : generated 1 times
90 : generated 1 times
94 : generated 1 times
95 : generated 1 times
96 : generated 2 times
I was reading about this and wanted to design an algorithm that finds the minimum number of moves needed to solve it.
Constraints I made: an N x N matrix with one empty slot, represented by 0, is filled with the numbers 0 to N*N-1.
We have to rearrange this matrix so that the numbers are in increasing order from left to right, beginning from the top row, with 0 as the last element, i.e. the (N x N)th element.
For example,
Input :
8 4 0
7 2 5
1 3 6
Output:
1 2 3
4 5 6
7 8 0
Now the problem is how to do this in the minimum possible number of steps.
As in the game (link provided), you can move left, right, up or down, shifting the 0 (empty slot) to the corresponding position, until you reach the final matrix.
The output to be printed is the number of steps M, followed by the direction of each move: 1 for swapping with the upper adjacent element, 2 for the lower adjacent element, 3 for the left adjacent element and 4 for the right adjacent element.
Like, for
2 <--- order of N X N matrix
3 1
0 2
Answer should be: 3 4 1 2 where 3 is M and 4 1 2 are steps to tile movement.
So I want to find the minimum number of moves while keeping the complexity of the algorithm as low as possible. Please suggest the most efficient approach to this problem.
Edit:
This is what I coded in C++. Please look at the algorithm rather than pointing out other issues in the code.
#include <bits/stdc++.h>
using namespace std;
int inDex=0,shift[100000],N,initial[500][500],final[500][500];
struct Node
{
Node* parent;
int mat[500][500];
int x, y;
int cost;
int level;
};
Node* newNode(int mat[500][500], int x, int y, int newX,
int newY, int level, Node* parent)
{
Node* node = new Node;
node->parent = parent;
memcpy(node->mat, mat, sizeof node->mat);
swap(node->mat[x][y], node->mat[newX][newY]);
node->cost = INT_MAX;
node->level = level;
node->x = newX;
node->y = newY;
return node;
}
int row[] = { 1, 0, -1, 0 };
int col[] = { 0, -1, 0, 1 };
int calculateCost(int initial[500][500], int final[500][500])
{
int count = 0;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
if (initial[i][j] && initial[i][j] != final[i][j])
count++;
return count;
}
int isSafe(int x, int y)
{
return (x >= 0 && x < N && y >= 0 && y < N);
}
struct comp
{
bool operator()(const Node* lhs, const Node* rhs) const
{
return (lhs->cost + lhs->level) > (rhs->cost + rhs->level);
}
};
void solve(int initial[500][500], int x, int y,
int final[500][500])
{
priority_queue<Node*, std::vector<Node*>, comp> pq;
Node* root = newNode(initial, x, y, x, y, 0, NULL);
Node* prev = newNode(initial,x,y,x,y,0,NULL);
root->cost = calculateCost(initial, final);
pq.push(root);
while (!pq.empty())
{
Node* min = pq.top();
if(min->x > prev->x)
{
shift[inDex] = 4;
inDex++;
}
else if(min->x < prev->x)
{
shift[inDex] = 3;
inDex++;
}
else if(min->y > prev->y)
{
shift[inDex] = 2;
inDex++;
}
else if(min->y < prev->y)
{
shift[inDex] = 1;
inDex++;
}
prev = pq.top();
pq.pop();
if (min->cost == 0)
{
cout << min->level << endl;
return;
}
for (int i = 0; i < 4; i++)
{
if (isSafe(min->x + row[i], min->y + col[i]))
{
Node* child = newNode(min->mat, min->x,
min->y, min->x + row[i],
min->y + col[i],
min->level + 1, min);
child->cost = calculateCost(child->mat, final);
pq.push(child);
}
}
}
}
int main()
{
cin >> N;
int i,j,k=1;
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
{
cin >> initial[j][i];
}
}
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
{
final[j][i] = k;
k++;
}
}
final[N-1][N-1] = 0;
int x = 0, y = 1,a[100][100];
solve(initial, x, y, final);
for(i=0;i<inDex;i++)
{
cout << shift[i] << endl;
}
return 0;
}
In the code above, for each child node I compute a cost (how many numbers are misplaced compared to the final matrix) and expand the node with the minimum cost first.
I want to make this algorithm more efficient and reduce its time complexity. Any suggestions would be appreciated.
While this sounds a lot like a homework problem, I'll lend a bit of help.
For sufficiently small problems, like your 2x2 or 3x3, you can just brute force it. Basically, you try every possible sequence of moves, track how many moves each one took, and then print out the shortest.
To improve on this, maintain a list of states you have already visited; whenever a move leads to a state that is already in the list, stop exploring that branch, since it can't possibly be the shortest.
Example, say I'm in this state (flattening your matrix to a string for ease of display):
5736291084
6753291084
5736291084
Notice that we're back to a state we've seen before. That means this can't possibly be part of the shortest solution, because the shortest solution never returns to a previous state.
You'll want to create a tree doing this, so you'd have something like:
134
529
870
/ \
/ \
/ \
/ \
134 134
529 520
807 879
/ | \ / | \
/ | X / X \
134 134 134 134 134 130
509 529 529 502 529 524
827 087 870 879 870 879
And so on. Notice I marked some with X because they were duplicates, and thus we wouldn't want to pursue them any further since we know they can't be the smallest.
You'd just keep repeating this until you've tried all possible solutions (i.e., all non-stopped leaves reach a solution), then you just see which was the shortest. You could also do it in parallel so you stop once any one has found a solution, saving you time.
This brute force approach won't be effective against large matrices. To solve those, you're looking at some serious software engineering. One approach you could take with it would be to break it into smaller matrices and solve that way, but that may not be the best path.
This is a hard problem at larger sizes: finding a shortest solution for the general N x N sliding puzzle is NP-hard.
Start from solution, determine ranks of permutation
The reverse of above would be how you can pre-generate a list of all possible values.
Start with the solution. That has a rank of permutation of 0 (as in, zero moves):
012
345
678
Then, make all possible moves from there. All of those moves have rank of permutation of 1, as in, one move to solve.
012
0 345
678
/ \
/ \
/ \
102 312
1 345 045
678 678
Repeat that as above. Each new level all has the same rank of permutation. Generate all possible moves (in this case, until all of your branches are killed off as duplicates).
You can then store all of them into an object. Flattening the matrix would make this easy (using JavaScript syntax just for example):
{
'012345678': 0,
'102345678': 1,
'312045678': 1,
'142305678': 2,
// and so on
}
Then, to solve your question "minimum number of moves", just find the entry that is the same as your starting point. The rank of permutation is the answer.
This would be a good solution if you are in a scenario where you can pre-generate the entire solution. It would take time to generate, but lookups would be lightning fast (this is similar to "rainbow tables" for cracking hashes).
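As a rough sketch of that pre-generation step, the table can be filled by a breadth-first walk outwards from the solved state, flattening each board to a string as in the JavaScript-style example. The function name `buildDistanceTable` is made up for illustration, and the code assumes single-character tiles with '0' as the blank, so it only covers boards up to 3x3.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>

// Maps every state reachable from `solved` to its distance (in moves) from it.
// Boards are row-major strings of an n x n grid; '0' marks the empty slot.
std::unordered_map<std::string, int> buildDistanceTable(const std::string& solved, int n) {
    std::unordered_map<std::string, int> dist{{solved, 0}};
    std::queue<std::string> frontier;
    frontier.push(solved);
    const int dr[] = {-1, 1, 0, 0};
    const int dc[] = {0, 0, -1, 1};
    while (!frontier.empty()) {
        const std::string cur = frontier.front();
        frontier.pop();
        const int d = dist.at(cur);
        const int blank = static_cast<int>(cur.find('0'));
        const int r = blank / n, c = blank % n;
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= n || nc < 0 || nc >= n) continue;
            std::string next = cur;
            std::swap(next[blank], next[nr * n + nc]);
            // emplace only succeeds for unseen states, which kills duplicates.
            if (dist.emplace(next, d + 1).second) frontier.push(next);
        }
    }
    return dist;
}
```

For the 3x3 board the table ends up with 181,440 entries (half of 9!), and answering a query is a single hash lookup.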
If you must solve on the fly (without pre-generation), then the first approach, starting from the given position and working move by move until you reach the solution, would be better.
With n = N*N cells there are at most n! arrangements, and only half of them (n!/2) are reachable from the solved state, e.g. 9!/2 = 181,440 for the 3x3 board. Because duplicates are cut from the tree, each reachable state is expanded at most once, so the search does work proportional to the number of states it reaches; that is tiny for 3x3 but grows factorially with the board size.
You can use BFS.
Each state is one vertex, and there is an edge between two vertices if they can transfer to each other.
For example
8 4 0
7 2 5
1 3 6
and
8 0 4
7 2 5
1 3 6
are connected.
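A minimal sketch of that BFS, assuming each state is stored as a row-major string with '0' for the empty slot (so single-character tiles, i.e. boards up to 3x3). The function name `minMoves` is made up for illustration.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>

// Minimum number of moves of the empty slot needed to turn `start` into `goal`
// on an n x n board, or -1 if `goal` cannot be reached from `start`.
int minMoves(const std::string& start, const std::string& goal, int n) {
    std::unordered_map<std::string, int> dist{{start, 0}};
    std::queue<std::string> frontier;
    frontier.push(start);
    const int dr[] = {-1, 1, 0, 0};
    const int dc[] = {0, 0, -1, 1};
    while (!frontier.empty()) {
        const std::string cur = frontier.front();
        frontier.pop();
        const int d = dist.at(cur);
        if (cur == goal) return d;  // BFS reaches every state by a shortest path
        const int blank = static_cast<int>(cur.find('0'));
        const int r = blank / n, c = blank % n;
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= n || nc < 0 || nc >= n) continue;
            std::string next = cur;
            std::swap(next[blank], next[nr * n + nc]);  // one edge of the state graph
            if (dist.emplace(next, d + 1).second) frontier.push(next);
        }
    }
    return -1;
}
```

For the 2x2 example from the question, `minMoves("3102", "1230", 2)` returns 3.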
Usually, you may want to use some numbers to represent your current state. For small grid, you can just follow the sequence of the number. For example,
8 4 0
7 2 5
1 3 6
is just 840725136.
If the grid is large, you may consider using the rank of the permutation of the numbers as your representation of the state. For example,
0 1 2
3 4 5
6 7 8
should be 0, as it is the first in permutation.
And
0 1 2
3 4 5
6 7 8
(which is represented by 0)
and
1 0 2
3 4 5
6 7 8
(which is represented by some other number X)
are connected is the same as 0 and X are connected in the graph.
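The rank itself can be computed by counting, for each position, how many later elements are smaller and weighting that count by the factorial of the number of positions to its right. Here is a simple quadratic-time sketch; `permutationRank` is a made-up name, the permutation is assumed to be a string of distinct characters, and a 64-bit result is only enough for up to 20 cells.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 0-based lexicographic rank of a permutation of distinct characters.
std::uint64_t permutationRank(const std::string& p) {
    const std::size_t n = p.size();
    std::uint64_t rank = 0;
    std::uint64_t fact = 1;  // (n - 1 - i)! for the position i being processed
    for (std::size_t i = n; i-- > 0;) {  // walk from the last position to the first
        std::uint64_t smaller = 0;
        for (std::size_t j = i + 1; j < n; ++j) {
            if (p[j] < p[i]) ++smaller;
        }
        rank += smaller * fact;
        fact *= (n - i);  // becomes (n - i)!, the weight of position i - 1
    }
    return rank;
}
```

With this encoding, "012345678" maps to 0 and "102345678" maps to 40320 (that is, 8!), so a 3x3 state fits in an index below 9! = 362880.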
The complexity of the algorithm is O(n!), where n is the number of cells in the grid, as there are at most n! vertices (permutations).
I'm trying to work on a sub-problem of a larger algorithm, and I am really struggling with it!
The Problem
If I have an array of numbers (say A), how can I efficiently list all the numbers that can be made by multiplying those numbers together (each can be used as many times as you want) and that are less than another number (say x)?
For example, let's say I had A = [7, 11, 13] and x was 1010, the answers would be:
- 7 = 7
- 11 = 11
- 13 = 13
- 7*7 = 49
- 7*11 = 77
- 7*13 = 91
- 11*11 = 121
- 11*13 = 143
- 13*13 = 169
- 7*7*7 = 343
- 7*7*11 = 539
- 7*7*13 = 637
- 7*11*11 = 847
- 7*11*13 = 1001
I tried my best not to miss any (but feel free to edit if I have)!
I can tell this is probably some type of recursion but am really struggling on this one!
Optional
A naive solution will also be nice (that's how much I'm struggling).
Running time is also optional.
UPDATE
The numbers in A are all the prime numbers (except 2, 3 and 5) obtained from the sieve of Eratosthenes.
UPDATE 2
A is also sorted
UPDATE 3
All numbers in A are under the limit
UPDATE 4
The solution does NOT need to use recursion. That was just an idea I had. Java or pseudocode is preferred!
I'd go with using a queue. The algorithm I have in mind would be something like the following (in pseudocode):
multiplyUntil(A, X)
{
queue q = A.toQueue();
result;
while(!q.isEmpty())
{
element = q.pop();
result.add(element); // only if the initial elements are guaranteed to be < X otherwise you should add other checks
for(int i = 0; i < A.length; i++)
{
product = element * A[i];
// A is sorted so if this product is >= X the following will also be >= X
if(product >= X)
{
// get out of the inner cycle
break;
}
q.add(product);
}
}
return result;
}
Let me know if something is unclear.
P.S: Keep in mind that the result is not guaranteed to be sorted. If you want the result to be sorted you could use a heap instead of a queue or sort the result in the end of the computation.
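Note that the pseudocode, as written, can enqueue the same product more than once (for example 7*11 and 11*7). One way around that, sketched below in C++, is to remember for every queued product the index of the largest factor it already contains and to multiply only by factors from that index onwards. This is a variation on the pseudocode, not a transcription of it; `productsBelow` is a made-up name, and the factors are assumed to be sorted in ascending order and at least 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// All products of one or more factors (with repetition) that are < limit,
// each listed once, returned in ascending order.
std::vector<long long> productsBelow(const std::vector<long long>& factors, long long limit) {
    assert(std::is_sorted(factors.begin(), factors.end()));
    std::vector<long long> result;
    // Each entry is (product, index of the largest factor used so far).
    std::queue<std::pair<long long, std::size_t>> q;
    for (std::size_t i = 0; i < factors.size(); ++i) {
        if (factors[i] < limit) q.push({factors[i], i});
    }
    while (!q.empty()) {
        const long long product = q.front().first;
        const std::size_t last = q.front().second;
        q.pop();
        result.push_back(product);
        for (std::size_t i = last; i < factors.size(); ++i) {
            // Overflow-safe test for product * factors[i] >= limit; since the
            // factors are sorted, all later ones would be too large as well.
            if (product > (limit - 1) / factors[i]) break;
            q.push({product * factors[i], i});
        }
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

For A = [7, 11, 13] and x = 1010 this yields exactly the 14 numbers listed in the question.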
Here's a solution in Java, along with comments. It's pretty straightforward to translate it to other languages.
// numbers is original numbers like {7, 11, 13}, not modified
// offset is the offset of the currently processed number (0 = first)
// limit is the maximal allowed product
// current array is the current combination, each element denotes
// the number of times given number is used. E. g. {1, 2, 0} = 7*11*11
private static void getProducts(int[] numbers, int offset, int limit, int[] current) {
if(offset == numbers.length) {
// all numbers processed: output the current combination
int product = 1;
StringBuilder res = new StringBuilder();
for(int i=0; i<offset; i++) {
for(int j = 0; j<current[i]; j++) {
if(res.length() > 0) res.append(" * ");
res.append(numbers[i]);
product *= numbers[i];
}
}
// instead of printing you may copy the result to some collection
if(product != 1)
System.out.println(" - "+res+" = "+product);
return;
}
int n = numbers[offset];
int count = 0;
while(limit >= 1) {
current[offset] = count;
getProducts(numbers, offset+1, limit, current);
count++;
// here the main trick: we reduce limit for the subsequent recursive calls
// note that in Java it's integer division
limit/=n;
}
}
// Main method to launch
public static void getProducts(int[] numbers, int limit) {
getProducts(numbers, 0, limit, new int[numbers.length]);
}
Usage:
public static void main(String[] args) {
getProducts(new int[] {7, 11, 13}, 1010);
}
Output:
- 13 = 13
- 13 * 13 = 169
- 11 = 11
- 11 * 13 = 143
- 11 * 11 = 121
- 7 = 7
- 7 * 13 = 91
- 7 * 11 = 77
- 7 * 11 * 13 = 1001
- 7 * 11 * 11 = 847
- 7 * 7 = 49
- 7 * 7 * 13 = 637
- 7 * 7 * 11 = 539
- 7 * 7 * 7 = 343
The resulting products are sorted in different way, but I guess sorting is not a big problem.
Here is my solution in C++. I use a recursive function. The principle is:
the recursive function is given a limit, a current value (a composite number) and a range of primes [start, end)
it outputs all combinations of powers of the primes in the given range, multiplied by the current composite
At each step, the function takes the first prime p from the range and goes through its powers: it keeps multiplying current by p as long as the product cp is under the limit.
We use the fact that the array is sorted by leaving as soon as cp reaches the limit.
Due to the way we compute the numbers, they won't be sorted. But it is easy to add this as a final step once you have collected the numbers (in which case you would use a back_inserter output iterator instead of an ostream_iterator, and sort the collecting vector).
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
template <class It, class Out>
void f(int limit, int current, It start, It end, Out out) {
// terminal condition
if(start == end) {
if(current != 1)
*(out++) = current;
return;
}
// Output all numbers where current prime is a factor
// starts at p^0 until p^n where p^n > limit
int p = *start;
for(int cp = current; cp < limit; cp *= p) {
f(limit, cp, start+1, end, out);
}
}
int main(int argc, char* argv[]) {
int const N = 1010;
vector<int> primes{7, 11, 13};
f(N, 1, begin(primes), end(primes), ostream_iterator<int>(cout, "\n"));
}
Problem
Provided I have two arrays:
const int N = 1000000;
float A[N];
myStruct *B[N];
The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}). How can I make the array A partly sorted on the GPU, with all non-negative values first (they need not be sorted among themselves) followed by the negative values (e.g. A[N]={3,2,5,0,-1,-2} or A[N]={5,2,3,0,-2,-1})? The array B should be rearranged according to A (A holds the keys, B the values).
Since A and B can be very large, I think the sort algorithm should be implemented on the GPU (specifically in CUDA, because that is the platform I use). I know thrust::sort_by_key can do this, but it does much extra work, since I do not need the arrays A and B to be sorted entirely.
Has anyone come across this kind of problem?
Thrust example
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
thrust::greater<float>() );
Thrust's documentation on Github is not up-to-date. As @JaredHoberock said, thrust::partition is the way to go since it now supports stencils. You may need to get a copy from the Github repository:
git clone git://github.com/thrust/thrust.git
Then run scons doc in the Thrust folder to build the updated documentation, and use these updated Thrust sources when compiling your code (nvcc -I/path/to/thrust ...). With the new stencil partition, you can do:
#include <thrust/partition.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
thrust::partition(thrust::host, // if you want to test on the host
thrust::make_zip_iterator(thrust::make_tuple(keyVec.begin(), valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(keyVec.end(), valVec.end())),
keyVec.begin(),
is_positive());
This returns:
Before:
keyVec = 0 -1 2 -3 4 -5 6 -7 8 -9
valVec = 0 1 2 3 4 5 6 7 8 9
After:
keyVec = 0 2 4 6 8 -5 -3 -7 -1 -9
valVec = 0 2 4 6 8 5 3 7 1 9
Note that the 2 partitions are not necessarily sorted. Also, the order may differ between the original vectors and the partitions. If this is important to you, you can use thrust::stable_partition:
stable_partition differs from partition in that stable_partition is
guaranteed to preserve relative order. That is, if x and y are
elements in [first, last), such that pred(x) == pred(y), and if x
precedes y, then it will still be true after stable_partition that x
precedes y.
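To make the ordering guarantee concrete without a GPU, here is a host-only analogue written with the C++ standard library rather than Thrust: it stably partitions an index permutation by the sign of the key and then applies that permutation to both arrays. The function name `stablePartitionByKey` and the use of `int` values in place of the question's `myStruct*` are assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Moves all pairs with a non-negative key to the front, keeping the relative
// order inside both groups (the stable_partition guarantee quoted above).
void stablePartitionByKey(std::vector<float>& keys, std::vector<int>& values) {
    assert(keys.size() == values.size());
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Partition the indices, so keys and values are permuted identically.
    std::stable_partition(order.begin(), order.end(),
                          [&keys](std::size_t i) { return keys[i] >= 0.0f; });
    std::vector<float> newKeys;
    std::vector<int> newValues;
    newKeys.reserve(keys.size());
    newValues.reserve(values.size());
    for (std::size_t i : order) {
        newKeys.push_back(keys[i]);
        newValues.push_back(values[i]);
    }
    keys.swap(newKeys);
    values.swap(newValues);
}
```

On the keys 0 -1 2 -3 4 -5 6 -7 8 -9 this gives 0 2 4 6 8 -1 -3 -5 -7 -9, i.e. the negative keys keep their original order, unlike in the unstable result shown above.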
If you want a complete example, here it is:
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
void print_vec(const thrust::host_vector<int>& v)
{
for(size_t i = 0; i < v.size(); i++)
std::cout << " " << v[i];
std::cout << "\n";
}
int main ()
{
const int N = 10;
thrust::host_vector<int> keyVec(N);
thrust::host_vector<int> valVec(N);
int sign = 1;
for(int i = 0; i < N; ++i)
{
keyVec[i] = sign * i;
valVec[i] = i;
sign *= -1;
}
// Copy host to device
thrust::device_vector<int> d_keyVec = keyVec;
thrust::device_vector<int> d_valVec = valVec;
std::cout << "Before:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
// Partition key-val on device
thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.begin(), d_valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.end(), d_valVec.end())),
d_keyVec.begin(),
is_positive());
// Copy result back to host
keyVec = d_keyVec;
valVec = d_valVec;
std::cout << "After:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
}
UPDATE
I made a quick comparison with the thrust::sort_by_key version, and the thrust::partition implementation does seem to be faster (which is what we could naturally expect). Here is what I obtain on NVIDIA Visual Profiler, with N = 1024 * 1024, with the sort version on the left, and the partition version on the right. You may want to do the same kind of tests on your own.
How about this?:
Count how many positive numbers to determine the inflexion point
Evenly divide each side of the inflexion point into groups (negative-groups are all same length but different length to positive-groups. these groups are the memory chunks for the results)
Use one kernel call (one thread) per chunk pair
Each kernel swaps any out-of-place elements in the input groups into the desired output groups. You will need to flag any chunks that have more swaps than the maximum so that you can fix them during subsequent iterations.
Repeat until done
Memory traffic is swaps only (from original element position, to sorted position). I don't know if this algorithm sounds like anything already defined...
You should be able to achieve this in thrust simply with a modification of your comparison operator:
struct my_compare
{
__device__ __host__ bool operator()(const float x, const float y) const
{
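// x goes before y only if x is non-negative and y is negative.
// (The comparator must be a strict weak ordering, so it has to return false for x == y.)
return (x >= 0.0f) && (y < 0.0f);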
}
};
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
my_compare() );