I'm a beginner with R, so I'm having trouble thinking of things the "R way"...
I have this function:
upOneRow <- function(table, column) {
  for (i in 1:(nrow(table) - 1)) {
    table[i, column] = table[i + 1, column]
  }
  return(table)
}
It seems simple enough, and shouldn't take that long to run, but on a dataframe with ~300k rows, the time it takes to run is unreasonable. What is the right way to approach this?
Instead of the loop you could try something like this:
n <- nrow(table)
table[(1:(n-1)), column] <- table[(2:n), column];
Vectorizing is the key.
Simple answer: Columns in a data.frame are also vectors which can be indexed with [,]
my.table <- data.frame(x = 1:5, y = 5:1)
> my.table
x y
1 1 5
2 2 4
3 3 3
4 4 2
5 5 1
my.table$y <-c(my.table[-1,"y"],NA) #move up one spot and pad with NA
> my.table
x y
1 1 4
2 2 3
3 3 2
4 4 1
5 5 NA
Note that your function repeats the last data point at the end. If this is really what you want, pad with tail(x,1) instead of NA.
my.table$y <-c(my.table[-1,"y"],tail(my.table$y,1)) #pad with tail(x,1)
> my.table
x y
1 1 4
2 2 3
3 3 2
4 4 1
5 5 1
If I understand you right, you're trying to "move up" one column of a data frame, with the first element going to the bottom. Then it might be achieved as:
col <- table[, column]
table[, column] <- col[c(2:nrow(table), 1)]
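For comparison, the three behaviours discussed in these answers (pad with a missing value, repeat the last value, wrap the first value around) can be sketched with plain Python lists. This is only an illustration: the function names are hypothetical, and None stands in for R's NA.

```python
def shift_up(values, pad=None):
    """Drop the first element and append `pad` (a stand-in for R's NA)."""
    return values[1:] + [pad]

def shift_up_repeat_last(values):
    """Drop the first element and repeat the last one, like the original loop."""
    return values[1:] + values[-1:]

def rotate_up(values):
    """Move every element up one place; the first element wraps to the end."""
    return values[1:] + values[:1]

y = [5, 4, 3, 2, 1]
print(shift_up(y))              # [4, 3, 2, 1, None]
print(shift_up_repeat_last(y))  # [4, 3, 2, 1, 1]
print(rotate_up(y))             # [4, 3, 2, 1, 5]
```

Each variant is a single slice-and-concatenate, which is the same whole-vector idea as the R one-liners.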
I want a wrap-around index like 1232123..., where the frame size is 3. How can I implement it? Does it have a name?
for i in 1..100 {
let idx = loop_index(i);
print!("{} ", idx);
}
Expected output for frame 3:
1 2 3 2 1 2 3 2 1...
Expected output for frame 4:
1 2 3 4 3 2 1 2 3 4 3 2 1...
For a size of 3, notice that the sequence 1232 has length 4, and then it repeats. In general, for size n, the period is 2*(n-1). If we take the modulo i % (2*(n-1)), the task becomes simpler: turn the sequence 0123..(2*(n-1)-1) into 123..(n-1)n(n-1)..32. This can be done using abs and basic arithmetic:
n = 3
r = 2 * (n - 1)
for i in range(20):
    print(n - abs(n - (i % r) - 1))
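A minimal sketch of the same modulo-and-abs idea wrapped in a reusable function. The name loop_index is borrowed from the question, but the extra parameter n and the choice to start counting at i = 0 are assumptions made for this example; n is assumed to be at least 2.

```python
def loop_index(i, n):
    """Return term i (starting at 0) of 1, 2, ..., n, n-1, ..., 2, 1, 2, ..."""
    period = 2 * (n - 1)  # length of one up-and-down cycle (requires n >= 2)
    return n - abs(n - 1 - i % period)

print([loop_index(i, 3) for i in range(9)])   # [1, 2, 3, 2, 1, 2, 3, 2, 1]
print([loop_index(i, 4) for i in range(13)])  # [1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1]
```

Because the value depends only on i, any term can be computed directly without keeping a direction flag between calls.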
When you reach the top number, you start decreasing; when you reach the bottom number, you start increasing.
Don't change direction until you reach the top or bottom number.
def count_up_and_down(bottom, top, length):
    assert(bottom < top)
    direction = +1
    x = bottom
    for _ in range(length):
        yield x
        direction = -1 if x == top else +1 if x == bottom else direction
        x = x + direction

for i in count_up_and_down(1, 4, 10):
    print(i, end=' ')
# 1 2 3 4 3 2 1 2 3 4
Alternatively, combining two ranges with itertools:
from itertools import chain, cycle, islice
def count_up_and_down(bottom, top, length):
    return islice(cycle(chain(range(bottom, top), range(top, bottom, -1))), length)

for i in count_up_and_down(1, 4, 10):
    print(i, end=' ')
# 1 2 3 4 3 2 1 2 3 4
I have a matrix (m) of scores for 4 students on 3 different exams.
4 3 1
3 2 5
8 4 6
1 5 2
I want to know, for each student, how their exams rank from best to worst. Desired output:
1 2 3
2 3 1
1 3 2
3 1 2
Now, I'm new to the language (and coding in general), so I read GeeksforGeeks' page on sorting in Julia and tried
mapslices(sortperm, -m; dims = 2)
However, this gives something subtly different: a matrix in which each row holds the indices that would sort that row.
1 2 3
3 1 2
1 3 2
2 3 1
Perhaps it was obvious, but I now realize this is not actually what I want, but I cannot find a built-in function/fast way to complete this operation. Any ideas? Preferably something which doesn't iterate through items in the matrix/row, as in reality my matrix is very, very large. Thanks!
Such functionality is provided by StatsBase.jl. Here is an example:
julia> using StatsBase
julia> m = [4 3 1
3 2 5
8 4 6
1 5 2]
4×3 Array{Int64,2}:
4 3 1
3 2 5
8 4 6
1 5 2
julia> mapslices(x -> ordinalrank(x, rev=true), m, dims = 2)
4×3 Array{Int64,2}:
1 2 3
2 3 1
1 3 2
3 1 2
You might want to use a different ranking function, depending on how you want to break ties; see here for details.
Figured out something which works!
Run m_index_rank = mapslices(sortperm, -m; dims = 2) on the matrix and get a ranking for each row through index. Then, realizing this is, in each row, an inverse permutation away from the desired output, run mapslices(invperm, m_index_rank; dims = 2) for the desired result.
In one line, this is mapslices(r -> invperm(sortperm(r, rev=true)), m; dims=2) over the desired matrix m. dims = 2 is to carry out the operation row-wise.
I'm marking this resolved for now, but please let me know if there are cleaner/faster ways to do this.
Edit: Replaced my syntactically clunky mapslices(invperm, mapslices(sortperm, -m; dims = 2); dims = 2) with a more natural one, thanks to @phipsgabler.
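The relationship between the sorting permutation and the ranking can be sketched in plain Python. This is an illustrative translation rather than the Julia code itself: rank_descending is a hypothetical name, and ties are assumed to go to the earlier entry because Python's sort is stable.

```python
def rank_descending(row):
    """Rank of each entry in `row`, 1 = largest; ties go to the earlier entry."""
    # Indices that would sort the row from largest to smallest (the sortperm step).
    order = sorted(range(len(row)), key=lambda i: -row[i])
    # Invert that permutation (the invperm step): order[pos] = i  =>  ranks[i] = pos.
    ranks = [0] * len(row)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

m = [[4, 3, 1], [3, 2, 5], [8, 4, 6], [1, 5, 2]]
print([rank_descending(r) for r in m])
# [[1, 2, 3], [2, 3, 1], [1, 3, 2], [3, 1, 2]]
```

The sort answers "which exam is in place p?", while the inverse answers "which place did exam i get?", which is what the question asks for.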
A matrix of size nxn needs to be constructed with the desired properties.
n is even. (given as input to the algorithm)
Matrix should contain integers from 0 to n-1
Main diagonal should contain only zeroes and matrix should be symmetric.
All numbers in each row should be different.
For a given n, any one of the possible outputs is acceptable.
input
2
output
0 1
1 0
input
4
output
0 1 3 2
1 0 2 3
3 2 0 1
2 3 1 0
Now the only idea that comes to my mind is to brute-force build combinations recursively and prune.
How can this be done iteratively, and ideally efficiently?
IMO, you can build the matrix from a repeating block pattern.
For example, the 8x8 result is:
0 1 2 3 4 5 6 7
1 0 3 2 5 4 7 6
2 3 0 1 6 7 4 5
3 2 1 0 7 6 5 4
4 5 6 7 0 1 2 3
5 4 7 6 1 0 3 2
6 7 4 5 2 3 0 1
7 6 5 4 3 2 1 0
This is actually a matrix made of two 4x4 matrices arranged in the pattern below:
m0 => 0 1 2 3    m1 => 4 5 6 7    pattern => m0 m1
      1 0 3 2          5 4 7 6               m1 m0
      2 3 0 1          6 7 4 5
      3 2 1 0          7 6 5 4
Each 4x4 matrix is in turn made of two 2x2 matrices, offset by a power of 2:
m0 => 0 1    m1 => 2 3    pattern => m0 m1
      1 0          3 2               m1 m0
Put another way, you start with a 2x2 matrix of 0 and 1, then expand it to a 4x4 matrix by replacing each cell with a new 2x2 matrix:
0 => 0+2*0 1+2*0    1 => 0+2*1 1+2*1
     1+2*0 0+2*0         1+2*1 0+2*1
result => 0 1 2 3
          1 0 3 2
          2 3 0 1
          3 2 1 0
Now expand it again:
0,1 => as above    2 => 0+2*2 1+2*2    3 => 0+2*3 1+2*3
                        1+2*2 0+2*2         1+2*3 0+2*3
The value of each cell can be calculated with this C# sample code:
// i: row, j: column, n: matrix dimension
var v = 0;
var m = 2;
do
{
    var p = m/2;
    // append a 1 bit when i and j fall in different halves of the current block
    v = v*2 + ((i%(n/p) < n/m) == (j%(n/p) < n/m) ? 0 : 1);
    m *= 2;
} while (m <= n);
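The block-expansion description can be sketched in Python as follows. The helper names expand and build are hypothetical, and the sketch assumes n is a power of two, since the square doubles in size at every step.

```python
def expand(square):
    """Replace each cell v of a k-by-k square by the block [[2v, 2v+1], [2v+1, 2v]]."""
    k = len(square)
    out = [[0] * (2 * k) for _ in range(2 * k)]
    for i in range(k):
        for j in range(k):
            v = square[i][j]
            out[2 * i][2 * j] = out[2 * i + 1][2 * j + 1] = 2 * v
            out[2 * i][2 * j + 1] = out[2 * i + 1][2 * j] = 2 * v + 1
    return out

def build(n):
    """Build the n-by-n square by repeated expansion (n assumed to be a power of two)."""
    square = [[0]]
    while len(square) < n:
        square = expand(square)
    return square

for row in build(4):
    print(*row)
# 0 1 2 3
# 1 0 3 2
# 2 3 0 1
# 3 2 1 0
```

Every cell produced this way equals i XOR j, which is why this construction and the bit-by-bit cell formula agree.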
We know each row must contain each number. Likewise, each column contains each number.
Let us take the CS convention of indices starting from 0.
First, consider how to place the 1's in the matrix. Choose a random number k0 from 1 to n-1 and place a 1 in row 0 at position (0,k0); by symmetry this also places a 1 at (k0,0). In row 1, if k0 = 1 then a 1 has already been placed. Otherwise, there are n-2 free positions; choose one and place the 1 at position (1,k1). Continue in this way until all the 1's are placed. In the final row there is exactly one free position.
Next, repeat with the 2's, which have to fit in the remaining places.
The problem is that we might not be able to actually complete the square: some constraints may make it impossible to fill in the last digits. Deciding whether a partially filled Latin square can be completed is NP-complete (Wikipedia). This basically means it is compute-intensive and there is no known shortcut algorithm. So I think the best you can do with this approach is to generate squares and test whether they work or not.
If you only want one particular square for each n then there might be simpler ways of generating them.
The link Ted Hopp gave in his comment, "Latin Squares. Simple Construction", does provide a method for generating a square starting with the addition of integers mod n.
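As one possible "simpler way" for a single square per even n, here is a sketch built on addition modulo n-1. It is not taken from the linked page or from any answer here; it is a swapped-in construction of the kind used for round-robin schedules, and the function names are hypothetical. The idea: fill the first n-1 rows and columns with (i + j) mod (n-1) + 1, then move the value that would land on the diagonal to the last row and column, leaving zeroes on the diagonal.

```python
def symmetric_latin_square(n):
    """Symmetric n-by-n Latin square on 0..n-1 with a zero diagonal; n must be even."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    m = n - 1  # odd modulus, so i -> 2*i mod m is a permutation
    sq = [[0] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            if i != j:
                sq[i][j] = (i + j) % m + 1
        # The value (i + i) % m + 1 would sit on the diagonal;
        # it goes to the last row and column instead.
        sq[i][m] = sq[m][i] = (2 * i) % m + 1
    return sq

def is_valid(sq):
    """Check zero diagonal, symmetry, and that every row holds 0..n-1 exactly once."""
    n = len(sq)
    return (all(sq[i][i] == 0 for i in range(n))
            and all(sq[i][j] == sq[j][i] for i in range(n) for j in range(n))
            and all(sorted(row) == list(range(n)) for row in sq))

for row in symmetric_latin_square(4):
    print(*row)
# 0 2 3 1
# 2 0 1 3
# 3 1 0 2
# 1 3 2 0
print(is_valid(symmetric_latin_square(6)))  # True
```

Unlike an XOR table, this does not need n to be a power of two; it only relies on n-1 being odd.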
I might be wrong, but if you are just looking to print a symmetric table, there is a special case of Latin squares that is isomorphic to the symmetric difference operation table over powerset({0,1,..,n}) mapped to the ring {0,1,2,..,2^n-1}.
One can produce such a table using XOR(i,j), where i and j are the row and column indexes of the n*n table. Note that the entries stay within 0..n-1 only when n is a power of two.
For example:
def latin_powerset(n):
    for i in range(n):
        for j in range(n):
            yield (i, j, i^j)
Printing the tuples produced by the special-case generator of symmetric Latin squares defined above:
def print_latin_square(sq, n=None):
    cells = [c for c in sq]
    if n is None:
        # find the length of the square side
        n = 1; n2 = len(cells)
        while n2 != n*n:
            n += 1
    rows = list()
    for i in range(n):
        rows.append(" ".join("{0}".format(cells[i*n + j][2]) for j in range(n)))
    print("\n".join(rows))
square = latin_powerset(8)
print_latin_square(square)  # the function prints the square itself and returns None
outputs:
0 1 2 3 4 5 6 7
1 0 3 2 5 4 7 6
2 3 0 1 6 7 4 5
3 2 1 0 7 6 5 4
4 5 6 7 0 1 2 3
5 4 7 6 1 0 3 2
6 7 4 5 2 3 0 1
7 6 5 4 3 2 1 0
See also
These cover more general cases of Latin squares than the highly symmetric case handled by the trivial code above:
https://www.cut-the-knot.org/arithmetic/latin2.shtml (also pointed out in the comments above for symmetric Latin square construction)
https://doc.sagemath.org/html/en/reference/combinat/sage/combinat/matrices/latin.html
There are two vectors:
a = 1:5;
b = 1:2;
To find all combinations of these two vectors, I am using the following piece of code:
[A,B] = meshgrid(a,b);
C = cat(2,A',B');
D = reshape(C,[],2);
The result includes all the combinations:
D =
1 1
2 1
3 1
4 1
5 1
1 2
2 2
3 2
4 2
5 2
Now the questions:
1- I want to decrease the number of operations to improve performance for larger vectors. Is there a single function in MATLAB that does this?
2- When there are more than 2 vectors, the meshgrid function cannot be used and has to be replaced with for loops. What is a better solution?
For more than 2 dimensions, use ndgrid:
>> a = 1:2; b = 1:3; c = 1:2;
>> [A,B,C] = ndgrid(a,b,c);
>> D = [A(:) B(:) C(:)]
D =
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
Note that ndgrid expects (rows,cols,...) rather than (x,y).
This can be generalized to N dimensions (see here and here):
params = {a,b,c};
vecs = cell(numel(params),1);
[vecs{:}] = ndgrid(params{:});
D = reshape(cat(numel(vecs)+1,vecs{:}),[],numel(vecs));
Also, as described in Robert P.'s answer and here too, kron can also be useful for replicating values (indexes) in this way.
If you have the neural network toolbox, also have a look at combvec, as demonstrated here.
One way would be to combine repmat and the Kronecker tensor product like this:
[repmat(a,size(b)); kron(b,ones(size(a)))]'
ans =
1 1
2 1
3 1
4 1
5 1
1 2
2 2
3 2
4 2
5 2
This can be scaled to more dimensions this way:
a = 1:3;
b = 1:3;
c = 1:3;
x = [repmat(a,1,numel(b)*numel(c)); ...
     repmat(kron(b,ones(1,numel(a))),1,numel(c)); ...
     kron(c,ones(1,numel(a)*numel(b)))]'
There is a logic to it. First: simply repeat the first vector. Second: use the tensor product with the dimension of the first vector and repeat it. Third: use the tensor product with the dimension of (first x second) and repeat it (in this case there is no fourth vector, so no repeat).
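For readers comparing with other languages, the same "all combinations, first vector varying fastest" ordering can be sketched in Python with itertools.product. The function name all_combinations is hypothetical, and this illustrates the ordering only, not MATLAB performance.

```python
from itertools import product

def all_combinations(*vectors):
    """All combinations of one element per vector, first vector varying fastest."""
    # product() varies its LAST argument fastest, so feed the vectors in
    # reverse order and flip each tuple back to get the ndgrid-style ordering.
    return [combo[::-1] for combo in product(*reversed(vectors))]

for row in all_combinations(range(1, 6), range(1, 3)):
    print(*row)
# 1 1
# 2 1
# 3 1
# 4 1
# 5 1
# 1 2
# 2 2
# 3 2
# 4 2
# 5 2
```

It accepts any number of input vectors, which mirrors the N-dimensional generalisation discussed in this thread.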
I know that modulus gives the remainder and that this code will give the survivor of the Josephus Problem. I have noticed a pattern that when n mod k = 0, the starting count point begins at the very beginning of the circle and that when n mod k = 1, the person immediately before the beginning of the circle survived that execution round through the circle.
I just don't understand how this recursion uses modulus to find the last man standing and what josephus(n-1,k) is actually referring to. Is it referring to the last person to get executed or the last survivor of a specific round through the circle?
def josephus(n, k):
    if n == 1:
        return 1
    else:
        return ((josephus(n-1, k) + k - 1) % n) + 1
This answer is both a summary of the Josephus Problem and an answer to your questions of:
What is josephus(n-1,k) referring to?
What is the modulus operator being used for?
When calling josephus(n-1,k) that means that you've executed every kth person up to a total of n-1 times. (Changed to match George Tomlinson's comment)
The recursion keeps going until there is 1 person standing, and as the calls return back up to the top, the function returns the position you would have to be in to survive. The modulus operator is used to stay within the circle (just as GuyGreer explained in the comments). Here is a picture to help explain:
1 2
6 3
5 4
Let n = 6 and k = 2 (execute every 2nd person in the circle). After the first step the 2nd person has been executed, and the circle becomes:
1 X
6 3
5 4
Continuing through the recursion until only the last person remains results in the following sequence:
1 2      1 X      1 X      1 X      1 X      X X
6 3  ->  6 3  ->  6 3  ->  X 3  ->  X X  ->  X X
5 4      5 4      5 X      5 X      5 X      5 X
When we check the values returned from josephus at n we get the following values:
n = 1 return 1
n = 2 return (1 + 2 - 1) % 2 + 1 = 1
n = 3 return (1 + 2 - 1) % 3 + 1 = 3
n = 4 return (3 + 2 - 1) % 4 + 1 = 1
n = 5 return (1 + 2 - 1) % 5 + 1 = 3
n = 6 return (3 + 2 - 1) % 6 + 1 = 5
Which shows that josephus(n-1,k) refers to the position of the last survivor. (1)
If we removed the modulus operator, the function would return the 11th position, but there are only 6 positions here, so the modulus operator keeps the counting within the bounds of the circle. (2)
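One way to convince yourself that the recurrence really returns the survivor is to compare it with a direct simulation of the circle. The sketch below repeats the function from the question and adds a hypothetical brute-force helper, simulate, written only for this check.

```python
def josephus(n, k):
    """Recurrence from the question: 1-based position of the survivor."""
    if n == 1:
        return 1
    return (josephus(n - 1, k) + k - 1) % n + 1

def simulate(n, k):
    """Brute-force check: remove every k-th person until one is left."""
    circle = list(range(1, n + 1))
    idx = 0
    while len(circle) > 1:
        idx = (idx + k - 1) % len(circle)  # count k people, wrapping around
        circle.pop(idx)                    # that person is executed
    return circle[0]

print(josephus(6, 2), simulate(6, 2))  # 5 5
```

The modulus in simulate plays the same role as in the recurrence: it wraps the count around the end of the circle.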
Your first question has been answered above in the comments.
To answer your second question, it's referring to the position of the last survivor.
Consider j(4,2).
Using the algorithm gives
j(4,2)=((j(3,2)+1)%4)+1
j(3,2)=((j(2,2)+1)%3)+1
j(2,2)=((j(1,2)+1)%2)+1
j(1,2)=1
and so
j(2,2)=((1+1)%2)+1=1
j(3,2)=((1+1)%3)+1=3
j(4,2)=((3+1)%4)+1=1
Now the table of j(2,2) is
1 2
1 x
so j(2,2) is indeed 1.
For j(3,2) we have
1 2 3
1 x 3
x x 3
so j(3,2) is 3 as required.
Finally, j(4,2) is
1 2 3 4
1 x 3 4
1 x 3 x
1 x x x
which tells us that j(4,2)=1 as required.
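What the recurrence does at each level can be read as a renumbering: after the first execution, the remaining n-1 people form a smaller circle whose position 1 is the person just after the one executed. The sketch below shows that mapping for n = 4 and k = 2; the helper name relabel is hypothetical, and k is assumed to be at most n so that the first person executed is person k.

```python
def relabel(p, n, k):
    """Map position p in the (n-1)-person circle back to the n-person circle."""
    return (p + k - 1) % n + 1

n, k = 4, 2
print({p: relabel(p, n, k) for p in range(1, n)})  # {1: 3, 2: 4, 3: 1}
```

Since j(3,2) = 3 in the smaller circle, the survivor of the 4-person circle is relabel(3, 4, 2) = 1, matching the table above.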