What are "source" and "destination" parameters in MPI_Cart_shift?

Here it is written that the output parameters of MPI_Cart_shift are the ranks of the source and destination processes. However, in this tutorial (code below), what is returned as the source process is later used in MPI_Isend to send messages. Can anyone clear this up: what do "source" and "destination" actually mean?
#include "mpi.h"
#include <stdio.h>
#define SIZE 16
#define UP 0
#define DOWN 1
#define LEFT 2
#define RIGHT 3
int main(argc,argv)
int argc;
char *argv[]; {
int numtasks, rank, source, dest, outbuf, i, tag=1,
inbuf[4]={MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,},
nbrs[4], dims[2]={4,4},
periods[2]={0,0}, reorder=0, coords[2];
MPI_Request reqs[8];
MPI_Status stats[8];
MPI_Comm cartcomm;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks == SIZE) {
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &cartcomm);
MPI_Comm_rank(cartcomm, &rank);
MPI_Cart_coords(cartcomm, rank, 2, coords);
MPI_Cart_shift(cartcomm, 0, 1, &nbrs[UP], &nbrs[DOWN]);
MPI_Cart_shift(cartcomm, 1, 1, &nbrs[LEFT], &nbrs[RIGHT]);
printf("rank= %d coords= %d %d neighbors(u,d,l,r)= %d %d %d %d\n",
rank,coords[0],coords[1],nbrs[UP],nbrs[DOWN],nbrs[LEFT],
nbrs[RIGHT]);
outbuf = rank;
for (i=0; i<4; i++) {
dest = nbrs[i];
source = nbrs[i];
MPI_Isend(&outbuf, 1, MPI_INT, dest, tag,
MPI_COMM_WORLD, &reqs[i]);
MPI_Irecv(&inbuf[i], 1, MPI_INT, source, tag,
MPI_COMM_WORLD, &reqs[i+4]);
}
MPI_Waitall(8, reqs, stats);
printf("rank= %d inbuf(u,d,l,r)= %d %d %d %d\n",
rank,inbuf[UP],inbuf[DOWN],inbuf[LEFT],inbuf[RIGHT]); }
else
printf("Must specify %d processors. Terminating.\n",SIZE);
MPI_Finalize();
}

MPI_Cart_shift: Returns the shifted source and destination ranks, given a shift direction and amount
int MPI_Cart_shift(MPI_Comm comm, int direction, int displ, int *source, int *dest)
What you pass in to the function are comm, direction and displ. direction specifies the dimension along which the shift is taken, and displ is the displacement, i.e. the distance (counted in processes) to shift by.
Example
Imagine a 2D cart topology like this (names are not ranks but process-names, only for explanation):
A1 A2 A3 A4 A5
B1 B2 B3 B4 B5
C1 C2 C3 C4 C5
D1 D2 D3 D4 D5
E1 E2 E3 E4 E5
Since MPI code is SPMD code, every process runs the same program, so we can pick one process, w.l.o.g., to show what is happening. Let's pick C3.
The general idea of MPI_Cart_shift is that we get the rank of a specified process in our topology.
First, we have to decide in which direction we want to go. Let's pick 0, the first dimension, which in this picture means moving up and down within a column.
Then we have to specify a distance to the other process, let's say this is 2.
So the call would be like:
MPI_Cart_shift(cartcomm, 0, 2, &source, &dest);
Now, the ranks which are placed into the source and dest variables are those respectively of the processes A3 and E3.
How to interpret the results
I (C3) want to send data to the process in the same column at a distance of 2, so that is the dest rank.
If you do the same from the viewpoint of A3, process A3 gets the rank of C3 as its dest.
And this is what source tells you: the rank of the process that sends data to me if it makes the same MPI_Cart_shift call.
If there is no process at the specified place the variable contains MPI_PROC_NULL.
So the results of the call at each process would look like this (with source|dest for each process, using - for MPI_PROC_NULL):
MPI_Cart_shift(cartcomm, 0, 2, &source, &dest);
A1 A2 A3 A4 A5
-|C1 -|C2 -|C3 -|C4 -|C5
B1 B2 B3 B4 B5
-|D1 -|D2 -|D3 -|D4 -|D5
C1 C2 C3 C4 C5
A1|E1 A2|E2 A3|E3 A4|E4 A5|E5
D1 D2 D3 D4 D5
B1|- B2|- B3|- B4|- B5|-
E1 E2 E3 E4 E5
C1|- C2|- C3|- C4|- C5|-
Additional bit of information
If you create the cart with periods = 1 in some dimension, there is a virtual edge between the first and the last node along that dimension. In this example, periods[0] = 1 would make a connection between A1 and E1, between A2 and E2, and so on. If you then call MPI_Cart_shift, the counting wraps around the edges, so your output would be:
A1 A2 A3 A4 A5
D1|C1 D2|C2 D3|C3 D4|C4 D5|C5
B1 B2 B3 B4 B5
E1|D1 E2|D2 E3|D3 E4|D4 E5|D5
C1 C2 C3 C4 C5
A1|E1 A2|E2 A3|E3 A4|E4 A5|E5
D1 D2 D3 D4 D5
B1|A1 B2|A2 B3|A3 B4|A4 B5|A5
E1 E2 E3 E4 E5
C1|B1 C2|B2 C3|B3 C4|B4 C5|B5
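The two tables above can be reproduced without MPI. The following is only a sketch of the shift rule in plain Python: the function name `cart_shift` and the helper `name` are made up for this illustration, and it returns grid coordinates (or `None` in place of MPI_PROC_NULL) rather than real ranks.

```python
def cart_shift(coords, dims, periods, direction, displ):
    """Return (source, dest) coordinates; None stands in for MPI_PROC_NULL."""
    def neighbour(offset):
        pos = list(coords)
        pos[direction] += offset
        if periods[direction]:
            # Python's % already yields a non-negative result here
            pos[direction] %= dims[direction]
        elif not 0 <= pos[direction] < dims[direction]:
            return None  # fell off a non-periodic edge
        return tuple(pos)
    # source lies "behind" us, dest lies "ahead" of us
    return neighbour(-displ), neighbour(displ)

def name(pos):
    """Map (row, column) to the process names used in the tables, e.g. (2, 2) -> C3."""
    return "-" if pos is None else "ABCDE"[pos[0]] + str(pos[1] + 1)

dims = (5, 5)
for periods in ((0, 0), (1, 0)):
    print("periods =", periods)
    for row in range(dims[0]):
        cells = []
        for col in range(dims[1]):
            src, dst = cart_shift((row, col), dims, periods, 0, 2)
            cells.append(f"{name(src)}|{name(dst)}")
        print(" ".join(cells))
```

Calling it with a negative displacement simply swaps the two results, which matches the behaviour of MPI_Cart_shift for negative displ.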

MPI_Cart_shift is a convenience function. Its primary use is for data shifts, i.e. operations in which each rank sends data in a certain direction (i.e. to destination) and receives data from the opposite direction (i.e. from source) (forward operation). When source is used as destination and destination as source, data flows in the opposite direction (backward operation). An example of such an operation is halo swapping, which usually requires two shifts along each dimension: one forward and one backward.
MPI_Cart_shift is a convenience function since its action is equivalent to the following set of MPI calls:
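// dims[] and periods[] are the arrays that were passed to MPI_Cart_create
// (they can also be retrieved with MPI_Cart_get)
// 1. Determine the rank of the current process
int rank;
MPI_Comm_rank(cartcomm, &rank);
// 2. Transform the rank into topology coordinates
int coords[ndims];
MPI_Cart_coords(cartcomm, rank, ndims, coords);
// 3. Save the current coordinate along the given direction
int saved_coord = coords[direction];
// 4. Compute the "+"-shifted position
coords[direction] = saved_coord + displ;
// Adjust for periodic boundary if necessary; adding dims[direction]
// before the second % keeps the result non-negative in C
if (periods[direction])
    coords[direction] = (coords[direction] % dims[direction] + dims[direction]) % dims[direction];
// 5. Convert to rank; a position outside a non-periodic grid has no process
if (coords[direction] < 0 || coords[direction] >= dims[direction])
    destination = MPI_PROC_NULL;
else
    MPI_Cart_rank(cartcomm, coords, &destination);
// 6. Compute the "-"-shifted position
coords[direction] = saved_coord - displ;
// Adjust for periodic boundary in the same way
if (periods[direction])
    coords[direction] = (coords[direction] % dims[direction] + dims[direction]) % dims[direction];
// 7. Convert to rank
if (coords[direction] < 0 || coords[direction] >= dims[direction])
    source = MPI_PROC_NULL;
else
    MPI_Cart_rank(cartcomm, coords, &source);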
One could also compute the rank<->coordinate transforms using arithmetic without calls to MPI_Cart_rank or MPI_Cart_coords but it would be very inflexible as the formulas change when the dimensionality of the topology changes.
One very important point: the ranks computed by MPI_Cart_shift (or by the equivalent code above) are relative to the cartcomm communicator. They match the ranks in the original communicator (the one used in MPI_Cart_create) only if reorder = 0. When reordering is allowed, the ranks could differ, and therefore one should not use those ranks within the context of the original communicator. The following code of yours is valid but strongly dependent on the fact that reorder = 0 in the call to MPI_Cart_create:
dest = nbrs[i];
source = nbrs[i];
MPI_Isend(&outbuf, 1, MPI_INT, dest, tag,
MPI_COMM_WORLD, &reqs[i]);
MPI_Irecv(&inbuf[i], 1, MPI_INT, source, tag,
MPI_COMM_WORLD, &reqs[i+4]);
Here nbrs are computed within cartcomm and then used within MPI_COMM_WORLD. The correct code should use cartcomm in both communication calls:
MPI_Isend(&outbuf, 1, MPI_INT, dest, tag,
cartcomm, &reqs[i]);
MPI_Irecv(&inbuf[i], 1, MPI_INT, source, tag,
cartcomm, &reqs[i+4]);
Some algorithms require that data travels the other way round, i.e. forward and backward are swapped. For such algorithms the displacement displ specified could be negative. In general, a call to MPI_Cart_shift with negative displacement is equivalent to a call with positive displacement but source and destination swapped.

Related

All possible combinations of N objects in K buckets

Suppose I have 3 boxes labeled A, B, C and I have 2 balls, B1 and B2. I want to get all possible combinations of these balls in the boxes. Please note, it is important to know which ball is in each box, meaning B1 and B2 are not the same.
A       B       C
B1, B2  -       -
B1      B2      -
B1      -       B2
B2      B1      -
B2      -       B1
-       B1, B2  -
-       B1      B2
-       B2      B1
-       -       B1, B2
Edit
If there is a known algorithm for this problem, please tell me its name.
Let N be the number of buckets (3 in the example) and M the number of balls (2). Now look at the numbers in the range [0..N**M), which is [0..9) in the example, and represent them in radix N. For the example in the question we get ternary numbers.
Now we can easily interpret these numbers: the last (least significant) digit shows the 1st ball's bucket, the digit before it the 2nd ball's bucket.
|--- Second Ball position [0..2]
||-- First Ball position [0..2]
||
0 = 00 - both balls are in the bucket #0 (`A`)
1 = 01 - first ball is in the bucket #1 ('B'), second is in the bucket #0 (`A`)
2 = 02 - first ball is in the bucket #2 ('C'), second is in the bucket #0 (`A`)
3 = 10 - first ball is in the bucket #0 ('A'), second is in the bucket #1 (`B`)
4 = 11 - both balls are in the bucket #1 (`B`)
5 = 12 ...
6 = 20
7 = 21 ...
8 = 22 - both balls are in the bucket #2 (`C`)
The general algorithm is:
For each number in the range [0 .. N**M), i.e. 0 to N**M - 1:
the ith ball (i = 0..M-1) will be in bucket # (number / N**i) % N (here / stands for integer division, % for remainder)
If you just want the total count, the answer is simply N ** M; in the example above 3 ** 2 == 9
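As a quick check of the formula above, here is a minimal Python sketch of the same radix-N enumeration. The name `ball_locations` is only illustrative and is not taken from the answer's C# code.

```python
def ball_locations(bucket_count, ball_count):
    """Yield one list per placement; entry i is the bucket index of ball i."""
    for number in range(bucket_count ** ball_count):
        # digit i of `number` in radix bucket_count is the bucket of ball i
        yield [(number // bucket_count ** i) % bucket_count
               for i in range(ball_count)]

for placement in ball_locations(3, 2):
    print(", ".join(f"B{i + 1} in {'ABC'[bucket]}"
                    for i, bucket in enumerate(placement)))
```

For 3 buckets and 2 balls this yields 9 placements, in the same order as the numbered table above.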
C# Code The algorithm itself is easy to implement:
static IEnumerable<int[]> BallsLocations(int boxCount, int ballCount) {
BigInteger count = BigInteger.Pow(boxCount, ballCount);
for (BigInteger i = 0; i < count; ++i) {
int[] balls = new int[ballCount];
int index = 0;
for (BigInteger value = i; value > 0; value /= boxCount)
balls[index++] = (int)(value % boxCount);
yield return balls;
}
}
It is the presentation of the answer that can get tangled:
static IEnumerable<string> BallsSolutions(int boxCount, int ballCount) {
foreach (int[] balls in BallsLocations(boxCount, ballCount)) {
List<int>[] boxes = Enumerable
.Range(0, boxCount)
.Select(_ => new List<int>())
.ToArray();
for (int j = 0; j < balls.Length; ++j)
boxes[balls[j]].Add(j + 1);
yield return string.Join(Environment.NewLine, boxes
.Select((item, index) => $"Box {index + 1} : {string.Join(", ", item.Select(b => $"B{b}"))}"));
}
}
Demo:
int balls = 3;
int boxes = 2;
string report = string.Join(
Environment.NewLine + "------------------" + Environment.NewLine,
BallsSolutions(boxes, balls));
Console.Write(report);
Outcome:
Box 1 : B1, B2, B3
Box 2 :
------------------
Box 1 : B2, B3
Box 2 : B1
------------------
Box 1 : B1, B3
Box 2 : B2
------------------
Box 1 : B3
Box 2 : B1, B2
------------------
Box 1 : B1, B2
Box 2 : B3
------------------
Box 1 : B2
Box 2 : B1, B3
------------------
Box 1 : B1
Box 2 : B2, B3
------------------
Box 1 :
Box 2 : B1, B2, B3
There's a very simple recursive implementation that at each level adds the current ball to each box. The recursion ends when all balls have been processed.
Here's some Java code to illustrate. We use a Stack to represent each box so we can simply pop the last-added ball after each level of recursion.
void boxBalls(List<Stack<String>> boxes, String[] balls, int i)
{
if(i == balls.length)
{
System.out.println(boxes);
return;
}
for(Stack<String> box : boxes)
{
box.push(balls[i]);
boxBalls(boxes, balls, i+1);
box.pop();
}
}
Test:
String[] balls = {"B1", "B2"};
List<Stack<String>> boxes = new ArrayList<>();
for(int i=0; i<3; i++) boxes.add(new Stack<>());
boxBalls(boxes, balls, 0);
Output:
[[B1, B2], [], []]
[[B1], [B2], []]
[[B1], [], [B2]]
[[B2], [B1], []]
[[], [B1, B2], []]
[[], [B1], [B2]]
[[B2], [], [B1]]
[[], [B2], [B1]]
[[], [], [B1, B2]]

Scheduling Algorithm with limitations

Thanks to user3125280, D.W. and Evgeny Kluev the question is updated.
I have a list of webpages that I must download regularly, and each webpage has a different download frequency. Based on this frequency we group the webpages into 5 groups:
Items in group 1 are downloaded once per 1 hour
items in group 2 once per 2 hours
items in group 3 once per 4 hours
items in group 4 once per 12 hours
items in group 5 once per 24 hours
This means, we must download all the group 1 webpages in 1 hour, all the group 2 in 2 hours etc.
I am trying to make an algorithm. As input, I have:
a) DATA_ARR = one array with 5 numbers. Each number represents the number of items in this group.
b) TIME_ARR = one array with 5 numbers (1, 2, 4, 12, 24) representing how often the items will be downloaded.
c) X = the total number of webpages to download per hour. This is calculated as the sum of items_in_group/download_frequency over all groups, rounded upwards. If we have 15 items in group 5 and 3 items in group 4, this will be 15/24 + 3/12 = 0.875, which rounds up to 1.
Every hour my program must download at max X sites. I expect the algorithm to output something like:
Hour 1: A1 B0 C4 D5
Hour 2: A2 B1 C2 D2
...
A1 = 2nd item of 1st group
C0 = 1st item of 3rd group
My algorithm must be as efficient as possible. This means:
a) the pattern must be extendable to at least 200+ hours
b) no need to create a repeatable pattern
c) spaces are needed when possible in order to use the absolute minimum bandwidth
d) never ever download an item more often than the update frequency, no exceptions
Example:
group 1: 0 items | once per 1 hour
group 2: 3 items | once per 2 hours
group 3: 4 items | once per 4 hours
group 4: 0 items | once per 12 hours
group 5: 0 items | once per 24 hours
We calculate the number of items we can take per hour: 3/2+4/4 = 2.5. We round this upwards and it's 3.
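The rounding rule can be checked with a few lines of Python. This is only a sketch: `downloads_per_hour` is a made-up helper name, and exact fractions are used so that the ceiling is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil

def downloads_per_hour(items_per_group, hours_per_group):
    """Return (exact hourly rate, rate rounded up) for the given groups."""
    rate = sum(Fraction(n, h) for n, h in zip(items_per_group, hours_per_group))
    return rate, ceil(rate)

TIME_ARR = [1, 2, 4, 12, 24]
print(downloads_per_hour([0, 3, 4, 0, 0], TIME_ARR))   # rate 5/2, X = 3
print(downloads_per_hour([0, 0, 0, 3, 15], TIME_ARR))  # rate 7/8, X = 1
```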
Using pencil and paper, we can find the following solution:
Hour 1: B0 C0 B1
Hour 2: B2 C1 C2
Hour 3: B0 C3 B1
Hour 4: B2
Hour 5: B0 C0 B1
Hour 6: B2 C1 C2
Hour 7: B0 C3 B1
Hour 8: B2
Hour 9: B0 C0 B1
Hour 10: B2 C1 C2
Hour 11: B0 C3 B1
Hour 12: B2
Hour 13: B0 C0 B1
Hour 14: B2 C1 C2
and continue the above.
We take C0, C1 C2, and C3 once every 4 hours. We also take B0, B1 and B2 once every 2 hours.
Question: Please, explain to me, how to design an algorithm able to download the items, while using the absolute minimum number of downloads? Brute force is NOT a solution and the algorithm must be efficient CPU wise because the number of elements can be huge.
You may read the answer posted here: https://cs.stackexchange.com/a/19422/12497 as well as the answer posted below by user3125280.
Your problem is a typical scheduling problem. These kinds of problems are well studied in computer science, so there is a large body of literature to consult.
The code is similar to deficit round robin, but with a few simplifications. First, we feed the queues ourselves by adding to the data_to_process variable. Secondly, the queues just iterate through a list of values.
One difference is that this solution will get the optimal value you want, barring mathematical error.
Rough sketch (C++11, Unix-based, written to the spec above):
#include <iostream>
#include <vector>
#include <numeric>
#include <unistd.h>
//#include <cmath> //for ceil
#define TIME_SCALE ((double)60.0) //1 for realtime speed
//Assuming you are not refreshing ints in the real case
template<typename T>
struct queue
{
const std::vector<T> data; //this will be filled with numbers
int position;
double refresh_rate; //must be refreshed every this many hours
double data_rate; //this many refreshes per hour
double credit; //amount of refreshes owed
queue(std::initializer_list<T> v, int r ) :
data(v), position(0), refresh_rate(r), credit(0) {
data_rate = data.size() / (double) refresh_rate;
}
int getNext() {
return data[position++ % data.size()];
}
};
double time_passed(){
static double total;
//if(total < 20){ //stop early
usleep(60000000 / TIME_SCALE); //sleep for a minute
total += 1.0 / 60.0; //add a minute
std::cout << "Time: " << total << std::endl;
return 1.0; //change to 1.0 / 60.0 for real time speed
//} else return 0;
}
int main()
{
//keep a list of the queues
std::vector<queue<int> > queues{
{{1, 2, 3}, 2},
{{1, 2, 3, 4}, 3}};
double total_data_rate = 0;
for(auto q : queues) total_data_rate += q.data_rate;
double data_to_process = 0; //how many refreshes we have to do
int queue_number = 0; //which queue we are processing
auto current_queue = &queues[0];
while(1) {
data_to_process += time_passed() * total_data_rate;
//data_to_process = ceil(data_to_process) //optional
while(data_to_process >= 1){
//data_to_process >= 0 will make the scheduler more
//eager in the first time period (i.e. everything will be updated correctly
//in the first period and in the following periods)
if(current_queue->credit >= 1){
//don't change here though, since credit determines the weighting only,
//not how many refreshes are made
//refresh(current_queue->getNext());
std::cout << "From queue " << queue_number << " refreshed " <<
current_queue->getNext() << std::endl;
current_queue->credit -= 1;
data_to_process -= 1;
} else {
queue_number = (queue_number + 1) % queues.size();
current_queue = &queues[queue_number];
current_queue->credit += current_queue->data_rate;
}
}
}
return 0;
}
The example should now compile on gcc with --std=c++11 and give you what you want.
and here is test case output: (for non-time scaled earlier code)
Time: 0
From queue 1 refreshed 1
From queue 0 refreshed 1
From queue 1 refreshed 2
Time: 1
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 3
Time: 2
From queue 0 refreshed 1
From queue 1 refreshed 4
From queue 1 refreshed 1
Time: 3
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 2
Time: 4
From queue 0 refreshed 1
From queue 1 refreshed 3
From queue 0 refreshed 2
Time: 5
From queue 0 refreshed 3
From queue 1 refreshed 4
From queue 1 refreshed 1
As an extension, the repeating-pattern problem can be addressed by letting this scheduler complete only the first lcm(update_rate * lcm(...refresh rates...), ceil(update_rate)) steps and then repeating the resulting pattern.
ALSO: this will, indeed, be unsolvable sometimes because of the requirement on hour boundaries. When I use your unsolvable example, and modify time_passed to return 0.1, the schedule is solved with updates every 1.1 hours (just not at the hour boundaries!).
It seems your constraints are all over the place. To quickly summarise my other answer:
It meets the refresh rates only on average
It does the least number of downloads at hour intervals required to fulfil the above
It was based on these (sometimes unfulfillable) constraints:
1. Update at discrete, 1 hour intervals
2. Update the fewest items each time
3. Update each item at fixed intervals
and broke constraint 3.
Since both the hourly interval and least-each-time constraints are not really necessary, I will give a simpler, better answer here, which breaks 2.
#include <iostream>
#include <vector>
#include <numeric>
#include <unistd.h>
#define TIME_SCALE ((double)60.0)
//Assuming you are not refreshing ints in the real case
template<typename T>
struct queue
{
const std::vector<T> data; //this is the data to refresh
int position; //this is the data we are up to
double refresh_rate; //must be refreshed every this many hours
double data_rate; //this many refreshes per hour
double credit; //is owed this many refreshes
const char* name;//a name for each queue
queue(std::initializer_list<T> v, int r, const char* n ) :
data(v), position(0), refresh_rate(r), credit(0), name(n) {
data_rate = data.size() / (double) refresh_rate;
}
void refresh() {
std::cout << "From queue " << name << " refreshed " << data[position++ % data.size()] << "\n";
}
};
double time_passed(){
static double total;
usleep(60000000 / TIME_SCALE); //sleep for a minute
total += 1.0; //add a minute
std::cout << "Time: " << total << std::endl;
return 1.0; //change to 1.0 / 60.0 for real time speed
}
int main()
{
//keep a list of the queues
std::vector<queue<int> > queues{
{{1}, 1, "A"},
{{1}, 2, "B"}};
while(1) {
auto t = time_passed();
for(queue<int>& q : queues) {
q.credit += q.data_rate * t;
while(q.credit >= 1){
q.refresh();
q.credit -= 1.0;
}
}
}
return 0;
}
It has the potential, however, to schedule many refreshes on the same hour. There is a third option as well, which breaks the hour-interval rule and updates only one at a time.
I think this is the easiest and requires the minimal number of updates (like the previous answer) but doesn't break rule 3.
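To see what the per-queue credit loop from the second program produces, it can be simulated without sleeping. The sketch below is a Python re-expression under assumptions, not the answer's code: the function name `simulate` and the item labels are invented, and exact fractions replace the doubles so that a credit of exactly 1 is never missed.

```python
from fractions import Fraction

def simulate(queues, hours):
    """queues: list of (name, items, refresh_period_hours). Returns one list per hour."""
    credit = {name: Fraction(0) for name, _, _ in queues}
    position = {name: 0 for name, _, _ in queues}
    schedule = []
    for _ in range(hours):
        this_hour = []
        for name, items, period in queues:
            # each hour a queue earns len(items) / period refreshes
            credit[name] += Fraction(len(items), period)
            while credit[name] >= 1:
                item = items[position[name] % len(items)]
                position[name] += 1
                credit[name] -= 1
                this_hour.append(f"{name}{item}")
        schedule.append(this_hour)
    return schedule

example = [("B", [0, 1, 2], 2), ("C", [0, 1, 2, 3], 4)]
for hour, items in enumerate(simulate(example, 4), start=1):
    print(f"Hour {hour}:", " ".join(items))
```

With the question's example (3 items every 2 hours, 4 items every 4 hours) every item is refreshed exactly once per period, and no hour in this run has more than 3 downloads.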

Converting data from 8 bits to 12 bits

I am getting a signal that is stored as a buffer of char data (8 bits).
I am also getting the same signal plus 24 dB, and my boss told me that it should be possible to reconstruct from those two buffers a single signal (which will be used as output) stored as 12 bits.
I would like to know the mathematical operation that can do that, and why +24 dB was chosen.
Thanks (I am dumb ><).
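From the problem statement, I guess you have an analog signal which is sampled at two amplitudes. Both signals have a resolution of 8 bits, but one is shifted and truncated. A gain of +24 dB corresponds to an amplitude factor of about 16 (20*log10(16) is roughly 24.08 dB), i.e. a shift by 4 bits, which is exactly the difference between 8 and 12 bits.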
You could get a 12 bit signal by combining the upper 4 bits of the first signal, and concatenating them with the second signal.
sOut = ((sIn1 & 0xF0) << 4) | sIn2
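A small numeric check of this formula, written as a Python sketch. It assumes an idealised, noise-free case in which the amplified channel keeps only its low 8 bits (it wraps rather than saturates); the names `combine`, `coarse` and `fine` are illustrative.

```python
import math

def combine(coarse, fine):
    """coarse: 8-bit sample at unity gain; fine: 8-bit sample of the x16 signal."""
    return ((coarse & 0xF0) << 4) | fine

# +24 dB in amplitude is a factor of about 16, i.e. 4 extra bits
gain_db = 20 * math.log10(16)
print(round(gain_db, 2))

sample = 0xABC           # the 12-bit value we would like to recover
coarse = sample >> 4     # unity-gain channel sees the top 8 bits: 0xAB
fine = sample & 0xFF     # amplified channel keeps the low 8 bits: 0xBC
print(hex(combine(coarse, fine)))
```

In this idealised case the upper nibble of `fine` and the lower nibble of `coarse` agree, which is the overlap the averaging variant below tries to exploit.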
If you want to get a little better accuracy, you could try to calculate an average over the common bits of the two signals. Normally, the lower 4 bits of the first signal should be approximately equal to the upper 4 bits of the second signal. Due to rounding-errors or noise, the values could be slightly different. One of the values could even have overflowed, and moved to the other end of the range.
int Combine(byte sIn1, byte sIn2)
{
int a = sIn1 >> 4; // Upper 4 bits
int b1 = sIn1 & 0x0F; // Common middle 4 bits
int b2 = sIn2 >> 4; // Common middle 4 bits
int c = sIn2 & 0x0F; // Lower 4 bits
int b;
if (b1 >= 12 && b2 < 4)
{
// Assume b2 has overflowed, and wrapped around to a smaller value.
// We need to add 16 to it to compensate the average.
b = (b1 + b2 + 16)/2;
}
else if (b1 < 4 && b2 >= 12)
{
// Assume b2 has underflowed, and wrapped around to a larger value.
// We need to subtract 16 from it to compensate the average.
b = (b1 + b2 - 16)/2;
}
else
{
// Neither or both has overflowed. Just take the average.
b = (b1 + b2)/2;
}
// Construct the combined signal.
return a * 256 + b * 16 + c;
}
When I tested this, it reproduced the signal accurately more often than the first formula.

UITextField int data type xcode

I am trying to get a mathematical equation to recognise the +/- sign of an integer (either -1 or +1) entered in a UITextField (s1, s2), so that if the user enters different signs the two terms are subtracted from each other.
It seems that the sign is not being recognised for some reason and the program just adds d1 and d2.
-(IBAction)calculateD:(id)sender{
float n1, r1, n2, r2, d, d1, d2;
int s1, s2;
s1 = [textfieldS1.text intValue]; //etc for all variables
d1 = s1 * ((n1-1)/r1);
d2 = s2 * ((n2-1)/r2);
if (s1 != s2) { d = d1 - d2;}
else { d = d1 + d2;
}}
Any problems apparent in this code please?
I am not sure what you are trying to do here. The variables are not initialized, and there is no specific reference to an actual UITextField inside the -calculateD: method. That said, here are some hints that I hope will come in handy.
The signs s1 and s2 are actually taken into consideration twice: once to produce d1 and d2, and again to decide (s1 != s2). Because of this, the if statement makes sure you always add two numbers of the same sign, possibly negating what you really want to obtain here. Example:
say that s1=+1, s2=+1, then you get d = ((n1-1)/r1) + ((n2-1)/r2);
say that s1=+1, s2=-1, then you get d = ((n1-1)/r1) + ((n2-1)/r2); the same as before;
Just drop the if, and leave a single: d = d1 + d2.
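The double use of the sign is easy to confirm numerically. The following Python sketch mirrors the arithmetic only; `a` and `b` stand for (n1-1)/r1 and (n2-1)/r2, and the function names are made up for the comparison.

```python
def original(s1, s2, a, b):
    """The question's logic: the signs are applied, then applied again by the if."""
    d1, d2 = s1 * a, s2 * b
    return d1 - d2 if s1 != s2 else d1 + d2

def fixed(s1, s2, a, b):
    """The suggested fix: the signs are applied once, then the terms are added."""
    return s1 * a + s2 * b

for s1, s2 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(s1, s2, original(s1, s2, 2.0, 3.0), fixed(s1, s2, 2.0, 3.0))
```

With a = 2 and b = 3 the original version only ever returns +5 or -5, while the fixed version returns the signed sums.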

Most elegant way to expand card hand suits

I'm storing 4-card hands in a way to treat hands with different suits the same, e.g.:
9h 8h 7c 6c
is the same as
9d 8d 7h 6h
since you can replace one suit with another and have the same thing. It's easy to turn these into a unique representation using wildcards for suits. The previous would become:
9A 8A 7B 6B
My question is - what's the most elegant way to turn the latter back into a list of the former? For example, when the input is 9A 8A 7B 6B, the output should be:
9c 8c 7d 6d
9c 8c 7h 6h
9c 8c 7s 6s
9h 8h 7d 6d
9h 8h 7c 6c
9h 8h 7s 6s
9d 8d 7c 6c
9d 8d 7h 6h
9d 8d 7s 6s
9s 8s 7d 6d
9s 8s 7h 6h
9s 8s 7c 6c
I have some ugly code that does this on a case-by-case basis depending on how many unique suits there are. It won't scale to hands with more cards. Also in a situation like:
7A 7B 8A 8B
it will have duplicates, since in this case A=c and B=d is the same as A=d and B=c.
What's an elegant way to solve this problem efficiently? I'm coding in C, but I can convert higher-level code down to C.
There are only 4 suits so the space of possible substitutions is really small - 4! = 24 cases.
In this case, I don't think it is worth it, to try to come up with something especially clever.
Just parse the string like "7A 7B 8A 8B", count the number of different letters in it, and based on that number, generate substitutions based on a precomputed set of substitutions.
1 letter -> 4 possible substitutions c, d, h, or s
2 letters -> 12 substitutions like in your example.
3 or 4 letters -> 24 substitutions.
Then sort the set of substitutions and remove duplicates. You have to sort the tokens in every string like "7c 8d 9d 9s" and then sort the array of strings to detect duplicates, but that shouldn't be a problem. It's good to have the patterns like "7A 7B 8A 8B" sorted too (the tokens like "7A", "8B" are in ascending order).
EDIT:
An alternative to sorting might be to detect identical sets of ranks associated with two or more suits and take that into account when generating substitutions, but I think it's more complicated. You would have to create a set of ranks for each letter appearing in the pattern string.
For example, for the string "7A 7B 8A 8B", the set {7, 8} is associated with the letter A, and the same set is associated with the letter B. Then you have to look for identical sets associated with different letters. In most cases those sets will have just one element, but they might have two as in the example above. Letters associated with the same set are interchangeable. You can have the following situations:
1 letter no duplicates -> 4 possible substitutions c, d, h, or s
2 letters no duplicates -> 12 substitutions.
2 letters, 2 letters interchangeable (identical sets for both letters) -> 6 substitutions.
3 letters no duplicates -> 24 substitutions.
3 letters, 2 letters interchangeable -> 12 substitutions.
4 letters no duplicates -> 24 substitutions.
4 letters, 2 letters interchangeable -> 12 substitutions.
4 letters, 3 letters interchangeable -> 4 substitutions.
4 letters, 2 pairs of interchangeable letters -> 6 substitutions.
4 letters, 4 letters interchangeable -> 1 substitution.
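The counts in this list can be checked by brute force: generate every assignment of distinct suits to the letters and drop hands that contain the same set of cards. This is a Python sketch of the generate-and-deduplicate idea, not the set-comparison method described above; the name `expand` and the use of a frozenset as the duplicate key are choices made for this illustration.

```python
from itertools import permutations

SUITS = "cdhs"

def expand(hand):
    """Return all distinct concrete hands for a pattern such as '9A 8A 7B 6B'."""
    cards = hand.split()
    letters = sorted({card[1] for card in cards})
    seen = set()
    result = []
    for perm in permutations(SUITS, len(letters)):
        mapping = dict(zip(letters, perm))
        concrete = tuple(card[0] + mapping[card[1]] for card in cards)
        key = frozenset(concrete)  # card order does not matter for duplicates
        if key not in seen:
            seen.add(key)
            result.append(" ".join(concrete))
    return result

print(len(expand("9A 8A 7B 6B")))  # 2 letters, no duplicates
print(len(expand("7A 7B 8A 8B")))  # 2 interchangeable letters
print(expand("7A 7B 8A 8B"))
```

Since there are at most 24 permutations, the brute-force check costs very little, and it scales to hands with more cards without any case analysis.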
I think a generic permutation function that takes an array arr and an integer n and returns all possible permutations of n elements in that array would be useful here.
Find out how many unique suits exist in the hand. Then generate all possible permutations with that many elements from the actual suits [c, d, h, s]. Finally go through each permutation of suits, and assign each unknown letter [A, B, C, D] in the hand to the permuted values.
The following code in Ruby takes a given hand and generates all suit permutations. The heaviest work is being done by the Array.permutation(n) method here which should simplify things a lot for a corresponding C program as well.
# all 4 suits needed for generating permutations
suits = ["c", "d", "h", "s"]
# current hand
hand = "9A 8A 7B 6B"
# find number of unique suits in the hand. In this case it's 2 => [A, B]
unique_suits_in_hand = hand.scan(/.(.)\s?/).uniq.length
# generate all possible permutations of 2 suits, and for each permutation
# do letter assignments in the original hand
# tr is a translation function which maps corresponding letters in both strings.
# it doesn't matter which unknowns are used (A, B, C, D) since they
# will be replaced consistently.
# After suit assignments are done, we split the cards in hand, and sort them.
possible_hands = suits.permutation(unique_suits_in_hand).map do |perm|
hand.tr("ABCD", perm.join ).split(' ').sort
end
# Remove all duplicates
p possible_hands.uniq
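The above code produces the following hands (p prints them as an array of sorted card arrays; they are listed here one per line in the original card order):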
9c 8c 7d 6d
9c 8c 7h 6h
9c 8c 7s 6s
9d 8d 7c 6c
9d 8d 7h 6h
9d 8d 7s 6s
9h 8h 7c 6c
9h 8h 7d 6d
9h 8h 7s 6s
9s 8s 7c 6c
9s 8s 7d 6d
9s 8s 7h 6h
Represent suits as sparse arrays or lists, numbers as indexes, hands as associative arrays
In your example
H [A[07080000] B[07080000] C[00000000] D[00000000] ] (place for four cards)
To get the "real" hands, always apply the 24 permutations A,B,C,D -> c,d,h,s (fixed time), so you don't have to care about how many cards your hand has, with the following "trick": always store the suits in alphabetical order:
H1 [c[xxxxxx] d[xxxxxx] s[xxxxxx] h[xxxxxx]]
Since hands are associative arrays, duplicated permutations do not generate two different output hands.
#include <stdio.h>
#include <stdlib.h>
const int RANK = 0;
const int SUIT = 1;
const int NUM_SUITS = 4;
const char STANDARD_SUITS[] = "dchs";
int usedSuits[] = {0, 0, 0, 0};
const char MOCK_SUITS[] = "ABCD";
const char BAD_SUIT = '*';
char pullSuit (int i) {
if (usedSuits [i] > 0) {
return BAD_SUIT;
}
++usedSuits [i];
return STANDARD_SUITS [i];
}
void unpullSuit (int i) {
--usedSuits [i];
}
int indexOfSuit (char suit, const char suits[]) {
int i;
for (i = 0; i < NUM_SUITS; ++i) {
if (suit == suits [i]) {
return i;
}
}
return -1;
}
int legitimateSuits (const char suits[]) {
return indexOfSuit (BAD_SUIT, suits) == -1;
}
int distinctSuits (const char suits[]) {
int i, j;
for (i = 0; i < NUM_SUITS; ++i) {
for (j = 0; j < NUM_SUITS; ++j) {
if (i != j && suits [i] == suits [j]) {
return 0;
}
}
}
return 1;
}
void printCards (char* mockCards[], int numMockCards, const char realizedSuits[]) {
int i;
for (i = 0; i < numMockCards; ++i) {
char* mockCard = mockCards [i];
char rank = mockCard [RANK];
char mockSuit = mockCard [SUIT];
int idx = indexOfSuit (mockSuit, MOCK_SUITS);
char realizedSuit = realizedSuits [idx];
printf ("%c%c ", rank, realizedSuit);
}
printf ("\n");
}
/*
* Example usage:
* char* mockCards[] = {"9A", "8A", "7B", "6B"};
* expand (mockCards, 4);
*/
void expand (char* mockCards[], int numMockCards) {
int i, j, k, l;
for (i = 0; i < NUM_SUITS; ++i) {
char a = pullSuit (i);
for (j = 0; j < NUM_SUITS; ++j) {
char b = pullSuit (j);
for (k = 0; k < NUM_SUITS; ++k) {
char c = pullSuit (k);
for (l = 0; l < NUM_SUITS; ++l) {
char d = pullSuit (l);
char realizedSuits[] = {a, b, c, d};
int legitimate = legitimateSuits (realizedSuits);
if (legitimate) {
int distinct = distinctSuits (realizedSuits);
if (distinct) {
printCards (mockCards, numMockCards, realizedSuits);
}
}
unpullSuit (l);
}
unpullSuit (k);
}
unpullSuit (j);
}
unpullSuit (i);
}
}
int main () {
char* mockCards[] = {"9A", "8A", "7B", "6B"};
expand (mockCards, 4);
return 0;
}
