Matrices in Julia - Sorting and Sort Permutations - sorting

How can matrices be sorted in Julia?
Let's define a matrix:
a = [1 1 1;
1 3 1;
1 2 2;
1 2 1]
I'm looking for several things here:
Sort by multiple Columns
So, you can sort the matrix by all columns from left to right with
sortslices(a, dims = 1)
4×3 Array{Int64,2}:
1 1 1
1 2 1
1 2 2
1 3 1
Sort by particular Column Order
But what if I want to sort by the 3rd, then the 2nd, and then the 1st column? Expected output:
4×3 Array{Int64,2}:
1 1 1
1 2 1
1 3 1
1 2 2
Sort permutation
Let's assume I had a Vector
b = ["a", "d", "c", "b"]
and I would like to sort its elements by the sort permutation of the matrix rows. As seen above, sortslices() lets me sort the matrix so that I get the rows in the order [1,4,3,2]. How can I get this permutation vector, so that I can sort b to
4-element Array{String,1}:
"a"
"b"
"c"
"d"
I know there are other similar questions, but they either seem to address other issues or seem to be outdated (e.g. Julia: Sort Matrix by column 2 then 3).

It isn't ideal to ask multiple questions within a single post. It makes searching for questions much harder.
Anyway, here we go:
Sort by multiple Columns
julia> a = [1 1 1;
1 3 1;
1 2 2;
1 2 1]
4×3 Matrix{Int64}:
1 1 1
1 3 1
1 2 2
1 2 1
julia> sortslices(a, dims = 1, by=x->(x[1], x[2], x[3]))
4×3 Matrix{Int64}:
1 1 1
1 2 1
1 2 2
1 3 1
Sort by particular Column Order
julia> a = [1 1 1;
1 3 1;
1 2 2;
1 2 1]
4×3 Matrix{Int64}:
1 1 1
1 3 1
1 2 2
1 2 1
julia> sortslices(a, dims = 1, by=x->(x[3], x[2], x[1]))
4×3 Matrix{Int64}:
1 1 1
1 2 1
1 3 1
1 2 2
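For readers who want to check the ordering logic outside Julia, the same idea (a key tuple built from the columns in the desired priority order) can be sketched in plain Python. This is only an illustration: `sort_rows` and its 0-based `column_order` argument are hypothetical names, not part of any Julia API.

```python
# Rows of the example matrix `a` from the question, as plain Python lists.
a = [
    [1, 1, 1],
    [1, 3, 1],
    [1, 2, 2],
    [1, 2, 1],
]


def sort_rows(rows, column_order):
    """Return the rows sorted lexicographically by the given 0-based columns."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in column_order))


# Left to right, like by=x->(x[1], x[2], x[3]) in the Julia code.
left_to_right = sort_rows(a, (0, 1, 2))

# 3rd, then 2nd, then 1st column, like by=x->(x[3], x[2], x[1]).
third_second_first = sort_rows(a, (2, 1, 0))
```

The tuple key plays the same role as the `by` function above: rows are compared on the first listed column, and ties fall through to the next one.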
Sort permutation
julia> b = ["a", "d", "c", "b"]
4-element Vector{String}:
"a"
"d"
"c"
"b"
julia> sortperm(b)
4-element Vector{Int64}:
1
4
3
2
julia> b[sortperm(b)]
4-element Vector{String}:
"a"
"b"
"c"
"d"

Building on @mcabbot's comment, I think the by clause is unnecessary. This suffices: p = sortperm(collect(eachrow(a))).
Whether this is the most performant solution, I don't know.
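To make the link between the matrix and the vector explicit, here is a small Python sketch of the same idea: compute the permutation that sorts the rows, then index b with it. The helper name `row_sortperm` is hypothetical, and the indices are 0-based, so Julia's [1,4,3,2] shows up as [0,3,2,1].

```python
a = [
    [1, 1, 1],
    [1, 3, 1],
    [1, 2, 2],
    [1, 2, 1],
]
b = ["a", "d", "c", "b"]


def row_sortperm(rows, column_order=None):
    """Return the 0-based indices that would sort the rows.

    With column_order=None the rows are compared left to right;
    otherwise only the listed 0-based columns are used, in that priority.
    """
    if column_order is None:
        key = lambda i: rows[i]
    else:
        key = lambda i: tuple(rows[i][c] for c in column_order)
    return sorted(range(len(rows)), key=key)


perm = row_sortperm(a)            # permutation that sorts the rows of a
b_sorted = [b[i] for i in perm]   # apply the same permutation to b
```

Because the permutation comes from the matrix and not from b itself, it reorders b correctly even when b is not sortable on its own.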

Related

Maximum k such that A[0]<A[k], A[1]<A[k+1], ..., A[k-1]<A[2*k-1], after sorting each k-sized window

I need an efficient algorithm for this problem (time complexity better than O(n^2)), please help me:
We say a[i..j] < b[i..j] if, after sorting both arrays, a[i]<b[i], a[i+1]<b[i+1], ..., a[j]<b[j].
Given an array A[1..n] (n <= 10^5, a[i] <= 1000), find the maximum k such that A[1..k] < A[k+1..2k].
For example, n=10: 2 2 1 4 3 2 5 4 2 3
the answer is 4
It is easy to see that k <= n/2. So we can use brute force (k from n/2 down to 1), but not binary search.
And I don't know what to do with a[i] <= 1000. Maybe use a map?
Use a Fenwick tree with range updates. Each index in the tree represents the count of how many numbers in window A are smaller than it. For the windows to be valid, each element in B (the window on the right) must have a partner in A (the window on the left). When we shift a number x into A, we add 1 to the range, [x+1, 1000] in the tree. For the element shifted from B to A, add 1 in its tree index. For each new element in B, add -1 to its index in the tree. If an index drops below zero, the window is invalid.
For the example, we have:
2 2 1 4 3 2 5 4 2 3
2 2
|
Tree:
add 1 to [3, 1000]
add -1 to 2
idx 1 2 3 4 5
val 0 -1 1 1 1 (invalid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4
|
Tree:
add 1 to [3, 1000]
add 1 to 2 (remove 2 from B)
add -1 to 1
add -1 to 4
idx 1 2 3 4 5
val -1 0 2 1 2 (invalid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2
|
Tree:
add 1 to [2, 1000]
add 1 to 1 (remove 1 from B)
add -1 to 3
add -1 to 2
idx 1 2 3 4 5
val 0 0 2 2 3 (valid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2 5 4
|
Tree:
add 1 to [5, 1000]
add 1 to 4 (remove 4 from B)
add -1 to 5
add -1 to 4
idx 1 2 3 4 5
val 0 0 2 2 3 (valid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2 5 4 2 3
|
Tree:
add 1 to [4, 1000]
add 1 to 3 (remove 3 from B)
add -1 to 2
add -1 to 3
idx 1 2 3 4 5
val 0 -1 2 3 4 (invalid)
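The window bookkeeping above can be sketched in Python as follows. Two things in this sketch are my own assumptions and differ from the trace: it uses a plain list with O(1000) range updates instead of a Fenwick tree, and each element x of the right window is subtracted from the whole range [x, max] so that every slot holds (left elements < v) minus (right elements <= v). With a point update only, a case such as left = [1, 5], right = [2, 3] would pass the check even though 5 < 3 fails after sorting. The names `max_k` and `slack` are hypothetical.

```python
def max_k(values, max_value=1000):
    """Largest k with sorted(values[:k]) < sorted(values[k:2k]) elementwise.

    Assumes every value lies in 0..max_value. Runs in O(n * max_value);
    a tree with range add and range minimum would reduce the per-step cost.
    """
    # slack[v] = (# left-window elements < v) - (# right-window elements <= v)
    slack = [0] * (max_value + 1)

    def add_from(lo, delta):
        # Add delta to slack[lo..max_value].
        for v in range(lo, max_value + 1):
            slack[v] += delta

    best = 0
    for k in range(1, len(values) // 2 + 1):
        moved = values[k - 1]             # element that now belongs to the left window
        add_from(values[2 * k - 2], -1)   # the two elements entering the right window
        add_from(values[2 * k - 1], -1)
        add_from(moved, +1)               # moved leaves the right window
        add_from(moved + 1, +1)           # moved joins the left window
        if min(slack) >= 0:               # every right element can be paired
            best = k
    return best


answer = max_k([2, 2, 1, 4, 3, 2, 5, 4, 2, 3])
```

For k = 1 the first element is added to and removed from the right window in the same step, which keeps the loop free of special cases.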

Example of compress column format for rank-deficient matrices

This is the first time I have dealt with the compressed column storage (CCS) format for storing matrices. After googling a bit, if I understand correctly, for a matrix with n nonzero elements the CCS representation is as follows:
- we define a vector A_v of dimensions n x 1 storing the n non-zero elements of the matrix
- we define a second vector A_ir of dimensions n x 1 storing the row indices of the non-zero elements of the matrix
- we finally define a third vector A_jc whose elements are the indices of the elements of A_v at which each new column begins, plus a final value which is by convention equal to n+1 and marks the end of the matrix (pointing theoretically to a virtual extra column).
So for instance,
if
M = [1 0 4 0 0;
0 3 5 2 0;
2 0 0 4 6;
0 0 7 0 8]
we get
A_v = [1 2 3 4 5 7 2 4 6 8];
A_ir = [1 3 2 1 2 4 2 3 3 4];
A_jc = [1 3 4 7 9 11];
my questions are
I) is what I wrote correct, or did I misunderstand anything?
II) what if I want to represent a matrix with some columns which are all zeroes, e.g.,
M2 = [0 1 0 0 4 0 0;
0 0 3 0 5 2 0;
0 2 0 0 0 4 6;
0 0 0 0 7 0 8]
wouldn't the representation of M2 in CCS be identical to the one of M?
Thanks for the help!
I) is what I wrote correct, or did I misunderstand anything?
You are perfectly correct. However, take care: if you use a C or C++ library, offsets and indices should start at 0. I guess you read some Fortran documentation, where indices start at 1. To be clear, here is the C version, which simply shifts the indices of your correct Fortran-style answer by one:
A_v = unmodified
A_ir = [0 2 1 0 1 3 1 2 2 3] (in short [1 3 2 1 2 4 2 3 3 4] - 1)
A_jc = [0 2 3 6 8 10] (in short [1 3 4 7 9 11] - 1)
II) what if I want to represent a matrix with some columns which are
zeroes, e.g., M2 = [0 1 0 0 4 0 0;
0 0 3 0 5 2 0;
0 2 0 0 0 4 6;
0 0 0 0 7 0 8]
wouldn't the representation of M2 in CCS be identical to the one of M?
If you have an empty column, it still gets its own entry in the offset table A_jc. As this column contains no elements, its entry simply has the same value as the entry that follows it. M2 has two empty columns (the 1st and the 4th), so with indices starting at 0 you have:
A_v = unmodified
A_ir = unmodified
A_jc = [0 0 2 3 3 6 8 10] (to be compared to [0 2 3 6 8 10])
Hence the two representations are different.
If you are just starting to learn about sparse matrices, there is an excellent free book here: http://www-users.cs.umn.edu/~saad/IterMethBook_2ndEd.pdf
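As a quick way to check such tables by hand, a dense-to-CCS conversion with 0-based indices can be sketched in a few lines of Python. The function name `to_ccs` and the tuple it returns are illustrative assumptions, not the interface of any particular sparse library.

```python
def to_ccs(dense):
    """Convert a dense matrix (list of rows) to 0-based CCS.

    Returns (values, row_idx, col_ptr), corresponding to A_v, A_ir and A_jc.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_idx.append(i)
        # One pointer per column, even when the column is empty.
        col_ptr.append(len(values))
    return values, row_idx, col_ptr


M = [
    [1, 0, 4, 0, 0],
    [0, 3, 5, 2, 0],
    [2, 0, 0, 4, 6],
    [0, 0, 7, 0, 8],
]
M2 = [
    [0, 1, 0, 0, 4, 0, 0],
    [0, 0, 3, 0, 5, 2, 0],
    [0, 2, 0, 0, 0, 4, 6],
    [0, 0, 0, 0, 7, 0, 8],
]

ccs_m = to_ccs(M)
ccs_m2 = to_ccs(M2)
```

The two results share the same values and row indices; only the column pointers differ, with a repeated value wherever a column is empty.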

Julia: sorting a matrix by 2 columns in different orders

I need to sort a four column matrix in Julia by the third column in ascending order then by the fourth column in descending order.
The easiest way to do chained lexicographic sorting on columns in an arbitrary order is to pass a transformation function via by: sortrows(A, by=x->(x[3],x[4])). But that is just lexicographic, with both columns ascending. For fancier behavior, you can pass a custom comparison function to sortrows via lt:
julia> A = rand(1:3,6,4)
6x4 Array{Int64,2}:
3 1 1 2
1 1 3 1
1 1 2 1
2 1 3 3
1 3 3 1
2 3 2 3
julia> sortrows(A, lt=(x,y)->isless(x[3],y[3]) || (isequal(x[3],y[3]) && isless(y[4],x[4])))
6x4 Array{Int64,2}:
3 1 1 2
2 3 2 3
1 1 2 1
2 1 3 3
1 1 3 1
1 3 3 1
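The mixed ascending/descending ordering can be cross-checked with a short Python sketch on the same sample rows. It relies on negating the fourth column inside the sort key, which only works for numeric data; the custom lt comparator above is the more general tool. The variable names are illustrative.

```python
# The sample matrix from the answer, one list per row.
A = [
    [3, 1, 1, 2],
    [1, 1, 3, 1],
    [1, 1, 2, 1],
    [2, 1, 3, 3],
    [1, 3, 3, 1],
    [2, 3, 2, 3],
]

# 3rd column ascending, then 4th column descending (0-based columns 2 and 3).
# sorted() is stable, so rows that tie on both keys keep their original order.
sorted_rows = sorted(A, key=lambda row: (row[2], -row[3]))
```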

Tiling or repeating n-dimensional arrays in Julia

I am looking for a general function to tile or repeat matrices along an arbitrary number of dimensions an arbitrary number of times. Python and Matlab have these features in NumPy's tile and Matlab's repmat functions. Julia's repmat function only seems to support up to 2-dimensional arrays.
The function should look like repmatnd(a, (n1,n2,...,nk)). a is an array of arbitrary dimension. And the second argument is a tuple specifying the number of times the array is repeated for each dimension k.
Any idea how to tile a Julia array on greater than 2 dimensions? In Python I would use np.tile and in matlab repmat, but the repmat function in Julia only supports 2 dimensions.
For instance,
x = [1 2 3]
repmatnd(x, (3, 1, 3))
Would result in:
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
And for
x = [1 2 3; 1 2 3; 1 2 3]
repmatnd(x, (1, 1, 3))
would result in the same thing as before. I imagine the Julia developers will implement something like this in the standard library, but until then, it would be nice to have a fix.
Use repeat:
julia> X = [1 2 3]
1x3 Array{Int64,2}:
1 2 3
julia> repeat(X, outer = [3, 1, 3])
3x3x3 Array{Int64,3}:
[:, :, 1] =
1 2 3
1 2 3
1 2 3
[:, :, 2] =
1 2 3
1 2 3
1 2 3
[:, :, 3] =
1 2 3
1 2 3
1 2 3

Matlab: Sum of 2nd column in matrix for equal values in 1st column

I would like to sum all the values in my 2nd column that share the same value in the 1st column.
So my matrix might look like this:
column 1: [1 1 1 2 2 3 3 3 3 4 5 5]
column 2: [3 5 8 2 6 4 0 6 1 0 2 6]
Now, for the value 1 in the 1st column, I would like the sum of 3, 5 and 8 from the 2nd column, and likewise for 2, 3 and so on from the 1st column.
Like this, for example:
[1 2 3 4 5],
[16 8 11 0 8]
I'm thankful for any suggestions!
Sum all values when values are equal :
Just to init :
a = [1 1 1 2 2 3 3 3 3 4 5 5 ; 3 5 8 2 6 4 0 6 1 0 2 6];
a = a.';
Let's go :
n=0
for i=1:size(a,1)
if a(i,1) == a(i,2)
n = n + a(i,1)
end
end
n
For the second question :
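groups = max(a(:,1));
mat = zeros(groups, 2);        % preallocate one row per key value 1..max
for j = 1:groups
    n = 0;
    for i = 1:size(a,1)
        if j == a(i,1)
            n = n + a(i,2);    % accumulate column 2 for rows whose key is j
        end
    end
    mat(j,1) = j;
    mat(j,2) = n;
end
mat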
Result :
mat =
1 16
2 8
3 11
4 0
5 8
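The same grouping logic can be written as a single pass with a dictionary. The Python sketch below is only meant to check the expected totals; `sum_by_key` is a hypothetical helper name, and it reports only keys that actually occur in the first column.

```python
keys = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5]
vals = [3, 5, 8, 2, 6, 4, 0, 6, 1, 0, 2, 6]


def sum_by_key(keys, values):
    """Sum the values that share a key; return (key, total) pairs sorted by key."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    return sorted(totals.items())


result = sum_by_key(keys, vals)
```

Unlike the nested loops, this visits each row once, so the cost does not grow with the number of distinct keys.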
