Scattered indices in MPI - parallel-processing

I'm trying to divide up an array between processors such that each one takes points from different parts in the array. For example, if
A = {1, 2, 3, 4, 5, 6, 7, 8}
and I'm using 2 processors, I want P1 to handle {1, 3, 5, 7}, and P2 to handle {2, 4, 6, 8}.
When scaling up to very large numbers of points (millions) and processors (128), this is tricky. In previous versions of my function, I simply gave P1 the first chunk of points, P2 the next chunk, and so on (which is easy with MPI_Gatherv).
Is there some way to use MPI_Gatherv to make this work, or a way to use MPI_Send and MPI_Recv to achieve it? The trouble with MPI_Gatherv is that while you can specify the displacement at which each processor's data lands, it still puts all of P1 before P2 before P3, etc.
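To make the index bookkeeping concrete, here is a minimal sketch in plain Python (no MPI calls) of the round-robin split and of undoing the rank-major order that a gather produces. The function names are hypothetical, and the "gather" is simulated by concatenating the per-rank chunks. In real MPI code, a strided derived datatype such as one built with MPI_Type_vector is one common way to express the same layout, but that is not shown here.

```python
def strided_indices(rank, nprocs, n):
    """Indices handled by `rank` under a round-robin split of n points."""
    return list(range(rank, n, nprocs))


def interleave_gathered(gathered, nprocs, n):
    """Undo the rank-major order of a gathered buffer.

    `gathered` holds all of rank 0's points, then rank 1's, and so on.
    """
    result = [None] * n
    pos = 0
    for rank in range(nprocs):
        for idx in range(rank, n, nprocs):
            result[idx] = gathered[pos]
            pos += 1
    return result


A = [1, 2, 3, 4, 5, 6, 7, 8]
nprocs = 2

# What each (simulated) processor would work on.
chunks = [[A[i] for i in strided_indices(r, nprocs, len(A))] for r in range(nprocs)]

# Rank-major buffer, as it would look after a gather on the root.
gathered = [x for chunk in chunks for x in chunk]

# Back in the original order.
restored = interleave_gathered(gathered, nprocs, len(A))
```

The reordering loop also works when the number of points is not a multiple of the number of processors, because both functions walk the same index ranges.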

Related

Concurrent database MVCC timestamp generation method

I need to generate database timestamps for MVCC Snapshot isolation. The typical method utilized:
"Transactional actions are implemented in SI-TM as follows.
TM BEGIN: A logical snapshot for the transaction is generated
by obtaining a unique timestamp using an atomic increment
to the global timestamp counter."
The problem with using this approach in a system with hundreds of cores is that it doesn't scale. There is a hardware limit of 10M atomic increments per second on a contended memory location.
Any ideas?
Here are two simple ideas, and a paper reference:
1) Instead of incrementing the counter by 1, increment it by N, giving clients effectively a range of transaction identifiers [c, c+N). For instance, if N=5, and the initial value of the counter is 1, then clients A, B, and C would get the following:
A: [1, 2, 3, 4, 5]
B: [6, 7, 8, 9, 10]
C: [11, 12, 13, 14, 15]
While this reduces the pressure on the atomic counter, as we can see from this example some clients (like client C) will get a relatively high range of ids while others get lower ranges (client A), and this will lead to higher abort rates in the system.
2) Use ranges of interleaved transaction identifiers. This is like 1, but we've added a step variable, S. Here's a simple example: if N=5 and S=3, and the initial value of the counter is 1, then clients A, B, and C would get the following:
A: [1, 4, 7, 10, 13]
B: [2, 5, 8, 11, 14]
C: [3, 6, 9, 12, 15]
This seems to have solved the problem of 1, but consider client D:
D: [16, 19, 22, 25, 28]
Now we're back to the same problem that solution #1 had. Tricks must be played with this technique to "get it right".
3) An interesting, but more complex, decentralized way of assigning transaction IDs is described here:
Tu, Stephen, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. "Speedy transactions in multicore in-memory databases." In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 18-32. ACM, 2013.
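Ideas 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and a lock stands in for the hardware atomic fetch-and-add that a real system would use.

```python
import threading


class RangeAllocator:
    """Idea 1: hand out N consecutive ids per visit to the shared counter."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()  # stand-in for an atomic fetch-and-add

    def take_block(self, n):
        with self._lock:
            first = self._next
            self._next += n
        return list(range(first, first + n))


def interleaved_blocks(start, n, s):
    """Idea 2: split one super-block of n*s ids among s clients with step s."""
    return [list(range(start + k, start + n * s, s)) for k in range(s)]


alloc = RangeAllocator(start=1)
a_ids = alloc.take_block(5)   # client A
b_ids = alloc.take_block(5)   # client B
c_ids = alloc.take_block(5)   # client C

first_round = interleaved_blocks(1, 5, 3)    # clients A, B, C
second_round = interleaved_blocks(16, 5, 3)  # client D and the next two clients
```

The second round starts at 16, which reproduces the client D example: interleaving evens out ids within one round but not across rounds.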

Combine lists to the least possible amount of 2-dimensional lists

Sorry for the bad description in the title.
Consider a 2-dimensional list such as this:
list = [
[1, 2],
[2, 3],
[3, 4]
]
If I were to extract all possible "vertical" combinations of this list, for a total of 2*2*2=8 combinations, they would be the following sequences:
1, 2, 3
2, 2, 3
1, 3, 3
2, 3, 3
1, 2, 4
2, 2, 4
1, 3, 4
2, 3, 4
Now, let's say I remove some of these sequences. Let's say I only want to keep sequences which have either the number 2 in position #1 OR number 4 in position #3. Then I would be left with these sequences:
2, 2, 3
2, 3, 3
1, 2, 4
2, 2, 4
1, 3, 4
2, 3, 4
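For reference, the enumeration and filtering described so far can be reproduced with a short Python sketch. The variable names are illustrative, and positions are 0-based in the code while the text counts from 1. The sequences come out in a different order than listed above, but they are the same set.

```python
from itertools import product

columns = [
    [1, 2],
    [2, 3],
    [3, 4],
]

# All "vertical" combinations: one value picked from each row.
all_sequences = list(product(*columns))

# Keep sequences with 2 in position #1 OR 4 in position #3.
kept = [s for s in all_sequences if s[0] == 2 or s[2] == 4]
```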
The problem
I would like to re-combine these remaining sequences into the smallest possible number of 2-dimensional lists that together contain all of the sequences, no fewer and no more.
By doing so, the resulting 2-dimensional lists in this particular example would be:
list_1 = [
[2],
[2, 3],
[3, 4]
]
list_2 = [
[1],
[2, 3],
[4]
]
In this particular case, the resulting lists can be thought out. But how would I go about it if there were thousands of sequences yielding hundreds of 2-dimensional lists? I have been trying to come up with a good algorithm for two weeks now, but I am getting nowhere near a satisfying result.
Divide et impera, or divide and conquer. If we have a logical expression, stating that the value at position x should be a or the value at position y should be b, then we have 3 cases:
a is the value at position x and b is the value at position y
a is the value at position x and b is not the value at position y
a is not the value at position x and b is the value at position y
So, first you generate all your scenarios; you now know that you have 3 of them.
Then you effectively separate your cases and handle each of them in a sub-routine as if it were your main task. The philosophy behind divide et impera is to reduce your complex problem into several similar, but less complex problems, until you reach triviality.
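Applied to the example from the question, the three cases can be written out as a rough Python sketch. The helper names are hypothetical, and positions are 0-based. Each case is itself a 2-dimensional list, and the three cases are disjoint by construction. The merge step at the end is a simple greedy heuristic added for illustration; it is not claimed to find the minimum in general.

```python
from itertools import product

columns = [[1, 2], [2, 3], [3, 4]]
x, a = 0, 2   # "value a at position x"
y, b = 2, 4   # "value b at position y"


def restrict(cols, pos, keep):
    """Copy `cols`, keeping only the values at `pos` that satisfy `keep`."""
    out = [list(c) for c in cols]
    out[pos] = [v for v in out[pos] if keep(v)]
    return out


def try_merge(p, q):
    """Merge two lists that differ in exactly one row, else return None."""
    diff = [i for i in range(len(p)) if sorted(p[i]) != sorted(q[i])]
    if len(diff) != 1:
        return None
    merged = [list(c) for c in p]
    merged[diff[0]] = sorted(set(p[diff[0]]) | set(q[diff[0]]))
    return merged


only_a_cols = restrict(columns, x, lambda v: v == a)
case_both = restrict(only_a_cols, y, lambda v: v == b)
case_only_a = restrict(only_a_cols, y, lambda v: v != b)
case_only_b = restrict(restrict(columns, x, lambda v: v != a), y, lambda v: v == b)

covered = set()
for case in (case_both, case_only_a, case_only_b):
    covered |= set(product(*case))

merged = try_merge(case_both, case_only_a)
```

Here the first two cases differ only in the last row, so merging them yields the question's list_1, and the third case is list_2.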

How do I add a random offset to values in a Pseq?

Given a Pseq similar to the following:
Pseq([1, 2, 3, 4, 5, 6, 7, 8], inf)
How would I randomise the values slightly each time? That is, not just randomly alter the 8 values once at initialisation time, but have a random offset added each time a value is sent to the stream?
Here's a neat way:
(Pseq([1, 2, 3, 4, 5, 6, 7, 8], inf) + Pgauss(0, 0.1))
First you need to know that Pgauss is just a pattern that generates gaussian random numbers. You can use any other kind of pattern such as Pwhite.
Then you need to know the really pleasant bit: performing basic math operations on Patterns (as above) composes the patterns (by wrapping them in Pbinop).
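For readers who don't know SuperCollider, the behaviour of the composed pattern can be approximated with a Python generator. This is only an analogy, not SuperCollider code, and the function name and parameters are made up for the sketch: every time a value is pulled, the next element of the cycling sequence is combined with a fresh Gaussian sample.

```python
import itertools
import random


def jittered_cycle(values, sigma=0.1, rng=random):
    """Cycle through `values` forever, adding fresh Gaussian noise each time."""
    for v in itertools.cycle(values):
        yield v + rng.gauss(0.0, sigma)


stream = jittered_cycle([1, 2, 3, 4, 5, 6, 7, 8])
first_ten = list(itertools.islice(stream, 10))  # wraps around after 8 values
```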

Importing data into Mathematica in the form of a matrix

I have a file which, when I import into Mathematica, looks like this:
{{1,1,n1},{1,2,n2},{1,3,n3},{2,1,n4},{2,2,n5},{2,3,n6}} where n1...n6 are some numbers that I want to import as a matrix that looks like {{n1, n2, n3}, {n4, n5, n6}}.
The first number in each block specifies the row and the second the column, but they are not a part of the matrix. Only the third number in each block is part of the matrix. How can I do that?
If
data = {{1, 1, n1}, {1, 2, n2}, {1, 3, n3}, {2, 1, n4}, {2, 2, n5}, {2, 3, n6}};
you can simply do
mat = Partition[data[[All, 3]], 3, 3]
There are a couple of interpretations of this question that I can think of.
If your data is in a regular format and you wish to read it in a memory efficient manner I recommend looking closely at ReadList and related functionality as I already directed you toward and the Partition function that the other answer illustrates.
I shall instead focus on the idea that the data is not in an entirely regular form, in that the given row and column indexes are necessary to describe the positions of the data in the array. For that the most natural method is to use SparseArray, as it accepts data in the form of position and value Rule pairs:
data = {{1, 1, n1}, {1, 2, n2}, {1, 3, n3}, {2, 1, n4}, {2, 2, n5}, {2, 3, n6}};
array = SparseArray[{#, #2} -> #3 & @@@ data];
array // MatrixForm
The function Normal can be used to convert the SparseArray into a regular list-of-lists array as needed:
Normal @ array
{{n1, n2, n3}, {n4, n5, n6}}
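The idea of placing each value by its explicit row and column index is not specific to Mathematica. As a language-neutral illustration, here is a small Python sketch; the function name and the `fill` parameter are assumptions of this example, with `fill=0` chosen to mimic the way SparseArray treats unspecified positions as zero.

```python
def matrix_from_triples(triples, fill=0):
    """Build a list-of-lists matrix from (row, column, value) triples.

    Rows and columns are 1-based, as in the imported data.
    """
    n_rows = max(r for r, _, _ in triples)
    n_cols = max(c for _, c, _ in triples)
    mat = [[fill] * n_cols for _ in range(n_rows)]
    for r, c, v in triples:
        mat[r - 1][c - 1] = v
    return mat


data = [(1, 1, "n1"), (1, 2, "n2"), (1, 3, "n3"),
        (2, 1, "n4"), (2, 2, "n5"), (2, 3, "n6")]
mat = matrix_from_triples(data)
```

Because each entry is placed by its indices, the triples may arrive in any order, and missing positions simply keep the fill value.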
Also there is a StackExchange site dedicated to Mathematica that I encourage you to explore.

understanding lotto program logic by skiena

I am reading an article from the following location. Here is a text snippet from the document.
Link
The problem of finding a minimum set of tickets that will guarantee a
win is not a trivial one. Given that P out of R outcomes will be from
the fortune-teller set, it is not difficult to see that there are NCP
= N!/(P!(N-P)!) possible P-subsets from the fortune-teller set that can occur in the winning ticket. If we were to pick all P-subsets from
the fortune-teller set W times and fill in the remaining R-P slots
arbitrarily, the set of tickets obtained will have at least W
occurrences of each P-subset and guarantee us W wins. However, such a
set need not be a minimal one and in most cases is not.
We know from the fortune-teller’s promise that one of the P-subsets
will occur in the winning ticket. It is possible for two P-subsets to
differ by less than J numbers. When such a situation arises, the
subsets are said to overlap or cover one another with respect to the
shared J numbers and only one of the P-subsets must be in a purchased
ticket. This phenomenon is best illustrated using an example. Suppose
we are playing the PICK-4 Lotto and require one 2/4 win. Hence R=4,
J=2 and W=1. Furthermore let’s assume that the fortune-teller predicts
3 numbers from a set of 5 numbers ( i.e. P=3 and N=5 ). If all
P-subsets were taken from the fortune-teller set and arbitrarily
filled to complete the tickets, we would have a set of ten tickets
that guarantees one 2/4 win ( See Figure 1 ). However, it is also
possible to exclude some tickets from this set because of several
two-number overlaps. For instance the subset {3, 4, 5} is different
than {1, 3, 5} by only one number and it will be wasteful to use both
of these in purchased tickets. We might think that not including {3,
4, 5} will permit the possibility of losing, but that is not the case
since if {3, 4, 5} occurs we will have ‘3’ and ‘5’ in {1, 3, 5} that we
bought to claim the prize! Similarly there can be many more redundant
P-subsets. An optimal solution is shown in Figure 2. Our lottery
problem is that of finding the smallest set of P-subsets from the
fortune-teller set that guarantees the specified number of wins by
keeping the number of overlaps to a minimum. This set of P-subsets
defines the winning set regardless of what numbers are used to
complete the R slots on the ticket.
My questions are the following:
The author mentions: "If all P-subsets were taken from the fortune-teller set and arbitrarily filled to complete the tickets, we would have a set of ten tickets". As the table is missing from the article, can anyone tell me what the 10 tickets are?
In the above example, if 1 and 3 occur and we didn't select {1, 3, 5}, how can we win?
Can anyone come up with Figure 2, which is missing from the article?
Thanks!
Here is an inefficient list of 10 tickets:
{1, 2, 3, 6}
{1, 2, 4, 6}
{1, 2, 5, 6}
{1, 3, 4, 6}
{1, 3, 5, 6}
{1, 4, 5, 6}
{2, 3, 4, 6}
{2, 3, 5, 6}
{2, 4, 5, 6}
{3, 4, 5, 6}
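The list above can be generated mechanically. A minimal Python sketch, assuming the fortune-teller set is {1, ..., 5}, P=3, and 6 is used as the arbitrary fourth number (as in the list above):

```python
from itertools import combinations
from math import comb

fortune_set = [1, 2, 3, 4, 5]
P = 3
filler = 6  # arbitrary number completing each ticket to R=4 slots

# One ticket per P-subset of the fortune-teller set.
tickets = [subset + (filler,) for subset in combinations(fortune_set, P)]

# The count agrees with N! / (P! (N-P)!).
expected_count = comb(len(fortune_set), P)
```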
Mu. To win we need to match 2 out of 4. So it isn't the case that 1 and 3 occur; it is the case that a specific set of 3 occurs and we only need to match 2 of them.
I think this is optimal.
{1, 2, 3, 4}
But I'm not entirely positive that I can pick 4. If I am only allowed to pick 3 per ticket then an optimal set would be:
{1, 2, 3}
{1, 4, 5}
The two tickets are:
{ 1, 3, 5, X }
{ 2, 4, 5, X }
where X is an arbitrarily chosen number which does not affect the solution.
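A brute-force check can confirm that a candidate set of tickets guarantees a win. The sketch below uses hypothetical function names and assumes N=5, P=3, J=2 as in the quoted example. It tries every possible 3-subset of the fortune-teller set and looks for a ticket sharing at least 2 numbers with it. The filler X is left out, since in the worst case it matches nothing.

```python
from itertools import combinations


def uncovered(tickets, fortune_set, p, j):
    """Outcomes (p-subsets) that share fewer than j numbers with every ticket."""
    return [
        outcome
        for outcome in combinations(fortune_set, p)
        if not any(len(set(outcome) & set(t)) >= j for t in tickets)
    ]


def guarantees_win(tickets, fortune_set, p, j):
    """True if every possible outcome is matched in j numbers by some ticket."""
    return not uncovered(tickets, fortune_set, p, j)


fortune_set = range(1, 6)
two_tickets = [{1, 3, 5}, {2, 4, 5}]
ok = guarantees_win(two_tickets, fortune_set, 3, 2)

# A single 3-number ticket is not enough, so two tickets is minimal here.
single = guarantees_win([{1, 3, 5}], fortune_set, 3, 2)
```

Calling `uncovered` instead of `guarantees_win` lists the outcomes a candidate set would miss, which is handy when trying other ticket sets by hand.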
