So I have a size N in Julia and I need an NxN sparse matrix with N ones in it, in random places. What would be the best way to go about this?
At first I thought about randomly generating indices and then setting those entries to 1 in a sparse matrix, but I recently found the sprand functions. However, I don't understand how to use them correctly or apply them to my problem. I tried using sprand with my limited understanding and it keeps generating error messages. Help is, of course, always greatly appreciated :)
Inspired by @DanGetz's comment above, the following solution is a one-line function using randperm. I deleted the original answer, as it was not very helpful.
sparseN(N) = sparse(randperm(N), randperm(N), ones(N), N, N)
This is also incredibly fast:
@time sparseN(10_000);
0.000558 seconds (30 allocations: 782.563 KiB)
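Note, not from the original answer: on Julia 1.x, randperm and sparse live in standard libraries, so the one-liner needs these imports first:
using Random, SparseArrays  # randperm comes from Random, sparse from SparseArrays
sparseN(N) = sparse(randperm(N), randperm(N), ones(N), N, N)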
A sparse matrix with N rows and M columns has at most N*M entries, which can be indexed by the integer set K = [0, N*M). For any k in K you can retrieve the element indices (i, j) through Euclidean division: k = i + j*N (column-major layout).
To randomly sample n elements of K (without repetition), you can use Knuth's "Algorithm S (Selection sampling technique)", Section 3.4.2 of The Art of Computer Programming, Vol. 2: Seminumerical Algorithms.
In Julia:
function random_select(n::Int64, K::Int64)
    @assert 0 <= n <= K
    sample = Vector{Int64}(undef, n)
    t = Int64(0)
    m = Int64(0)
    while m < n
        if (K - t) * rand() >= n - m
            t += 1
        else
            m += 1
            sample[m] = t
            t += 1
        end
    end
    sample
end
The next part simply retrieves the I,J indices to create the sparse matrix from its coordinate form:
function create_sparseMatrix(n::Int64, N::Int64, M::Int64)
    @assert (0 <= N) && (0 <= M)
    @assert 0 <= n <= N * M
    nonZero = random_select(n, N * M)
    # column major: k = i + j*N
    I = map(k -> mod(k, N), nonZero)
    J = map(k -> div(k, N), nonZero)
    sparse(I .+ 1, J .+ 1, ones(n), N, M)
end
Usage example: a 4x5 sparse matrix with 3 nonzeros (= 1.0) at random positions:
julia> create_sparseMatrix(3,4,5)
4×5 SparseMatrixCSC{Float64,Int64} with 3 stored entries:
[4, 1] = 1.0
[3, 2] = 1.0
[3, 3] = 1.0
Border case tests:
julia> create_sparseMatrix(0,4,5)
4×5 SparseMatrixCSC{Float64,Int64} with 0 stored entries
julia> create_sparseMatrix(4*5,4,5)
4×5 SparseMatrixCSC{Float64,Int64} with 20 stored entries:
[1, 1] = 1.0
[2, 1] = 1.0
[3, 1] = 1.0
[4, 1] = 1.0
⋮
[4, 4] = 1.0
[1, 5] = 1.0
[2, 5] = 1.0
[3, 5] = 1.0
[4, 5] = 1.0
Insisting on a one-line-ish solution:
using StatsBase

sparseones(N, M, K) = sparse(
    (x -> (first.(x) .+ 1, last.(x) .+ 1))(divrem.(sample(0:N*M-1, K, replace=false), M))...,
    ones(K), N, M
)
Giving:
julia> sparseones(3,4,5)
3×4 SparseMatrixCSC{Float64,Int64} with 5 stored entries:
[1, 1] = 1.0
[2, 1] = 1.0
[3, 3] = 1.0
[2, 4] = 1.0
[3, 4] = 1.0
This method is essentially the same as the earlier answer, with the advantage of reusing the existing sample function and being much shorter. It is even faster on larger matrices.
I have 2 tensors. The first tensor is 1D (e.g. a tensor of 3 values). The second tensor is 2D, with its first column holding indices into the first tensor in a one-to-many relationship (e.g. a tensor with a shape of (6, 2)):
# e.g. simple example of dot product
import torch
a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]]) # 1st column is the index to tensor a, 2nd column is the value
output = [(2*2)+(2*3)+(2*1),(4*4),(3*3)+(3*1)]
output = [12, 16, 12]
Currently what I have is to find the size of each index group in b (e.g. [3, 1, 2]), then use torch.split to group them into a list of tensors, and run a for loop over the groups. That is fine for a small tensor, but when the tensors hold millions of elements, with tens of thousands of arbitrarily-sized groups, it becomes very slow.
Any better solutions?
You can use numpy.bincount or torch.bincount to sum the elements of b by key:
import numpy as np
a = np.array([2,4,3])
b = np.array([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
print( np.bincount(b[:,0], b[:,1]) )
# [6. 4. 4.]
print( a * np.bincount(b[:,0], b[:,1]) )
# [12. 16. 12.]
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
torch.bincount(b[:,0], b[:,1])
# tensor([6., 4., 4.], dtype=torch.float64)
a * torch.bincount(b[:,0], b[:,1])
# tensor([12., 16., 12.], dtype=torch.float64)
References:
numpy.bincount official documentation;
torch.bincount official documentation;
How can I reduce a numpy array based on a key rather than an axis?
Another alternative in PyTorch, if gradients are needed:
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
output = torch.zeros(a.shape[0], dtype=torch.long).index_add_(0, b[:, 0], b[:, 1]) * a
Alternatively, torch.Tensor.scatter_add_ also works.
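For reference, a minimal sketch of that scatter_add_ variant (same data as above; this is an illustration, not part of the original answer):
import torch

a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]])

# scatter_add_(dim, index, src): out[index[i]] += src[i] along dim 0
output = torch.zeros(a.shape[0], dtype=torch.long).scatter_add_(0, b[:, 0], b[:, 1]) * a
# tensor([12, 16, 12])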
I'm fairly new to numpy arrays, so any help will be much appreciated.
I want to get a single slice of an n x m array along the second axis, with the result being an n x 1 array, e.g.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
Then I want:
some_function(a, 0) = array([[1], [4]]) # to get slice of a, along index 0
I've tried a[:, 0] which gives array([1, 4]).
And:
np.transpose(a[:, 0])
also gives:
array([1, 4])
Which confuses me.
I'm sure this is really simple but can't find the correct some_function!
So I've solved it with np.reshape:
some_function(a,0) = np.reshape(a[:,0],(2,1))
But this doesn't seem too elegant. Anyone got a neater solution?
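For what it's worth, indexing with a list or a slice keeps the second axis, which avoids the explicit reshape. A couple of equivalent options, assuming the same a as above (these are suggestions, not from the original post):
a[:, [0]]          # array([[1], [4]]): fancy indexing with a list keeps the axis
a[:, 0:1]          # same result via a slice
a[:, 0][:, None]   # same result by re-inserting the axis with None / np.newaxis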
I have an array of size N, and I need to generate all permutation variants of size K from this array. Variants [1 2 3] and [3 1 2] are different. Standard solutions which I found were
1) Just permutations, where I obtain all reordering of the same size as array.
2) Just combinations, where I obtain all combinations of size K from array of size N, but for these algorithms [1 2 6] and [6 1 2] are the same, while I need them to be different.
Could you help me to find an effective solution?
I need to implement it in MATLAB, but I hope I will be able to translate your solutions from other languages.
Basically, in any language which can produce all unordered subsets of size K from 1:N and all permutations of 1:K, getting all the ordered subsets is as simple as iterating over the subsets and permuting each of them.
In Julia language:
using Combinatorics, IterTools, Base.Iterators  # permutations from Combinatorics, subsets from IterTools, flatten from Base.Iterators
N = 4
K = 2
collect(flatten(permutations(subset) for subset in subsets(1:N,K)))
Gives:
12-element Array{Array{Int64,1},1}:
[1, 2]
[2, 1]
[1, 3]
[3, 1]
[1, 4]
[4, 1]
[2, 3]
[3, 2]
[2, 4]
[4, 2]
[3, 4]
[4, 3]
Combine the two solutions you found. Here's the python code:
allPermutations = list()
combinations = getCombinations(arr, K)
for comb in combinations:
    allPermutations.extend(getPermutations(comb))
1. arr is the input array.
2. getCombinations is a function which returns a list of all the combinations in arr of size K.
3. getPermutations returns all permutations of the array given as input.
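A minimal self-contained sketch of the same idea using the standard library, assuming getCombinations and getPermutations map onto itertools.combinations and itertools.permutations (arr and K here are just illustrative values matching the Julia example above):
from itertools import combinations, permutations

arr = [1, 2, 3, 4]
K = 2

# permute every unordered K-combination to get all ordered K-subsets
allPermutations = [list(p) for comb in combinations(arr, K) for p in permutations(comb)]
print(allPermutations)
# [[1, 2], [2, 1], [1, 3], [3, 1], [1, 4], [4, 1], [2, 3], [3, 2], [2, 4], [4, 2], [3, 4], [4, 3]]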
I want to get a numpy array of sub-arrays from a base array using some type of indexing arrays (the style/format of the indexing arrays is open for suggestions). I can easily do this with a for loop, but I am wondering whether there is a clever way to use numpy broadcasting.
Constraints: Sub-arrays are guaranteed to be the same size.
import numpy as np

up_idx = np.array([[0, 0],
                   [0, 2],
                   [1, 1]])
lw_idx = np.array([[2, 2],
                   [2, 4],
                   [3, 3]])
base = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

samples = []
for index in range(up_idx.shape[0]):
    up_row = up_idx[index, 0]
    up_col = up_idx[index, 1]
    lw_row = lw_idx[index, 0]
    lw_col = lw_idx[index, 1]
    samples.append(base[up_row:lw_row, up_col:lw_col])
samples = np.array(samples)
print(samples)
> [[[ 1 2]
[ 5 6]]
[[ 3 4]
[ 7 8]]
[[ 6 7]
[10 11]]]
I've tried:
vector_s = base[up_idx[:, 0]:lw_idx[:, 1], up_idx[:, 1]:lw_idx[:, 1]]
But that was just nonsensical it seems.
I don't think there is a fast way to do this in general via numpy broadcasting operations – for one thing, the way you set up the problem there is no guarantee that the resulting sub-arrays will be the same shape, and thus able to fit into a single output array.
The most succinct and efficient way to solve this is probably via a list comprehension; e.g.
result = np.array([base[i1:i2, j1:j2] for (i1, j1), (i2, j2) in zip(up_idx, lw_idx)])
Unless your base array is very large, this shouldn't be much of a bottleneck.
If you have different problem constraints (i.e. same size slice in every case) it may be possible to come up with a faster vectorized solution based on fancy indexing. For example, if every slice is of size two (as in your example above) then you can use fancy indexing like this to obtain the same result:
i, j = up_idx.T[:, :, None] + np.arange(2)
result = base[i[:, :, None], j[:, None]]
The key to understanding this fancy indexing is to realize that the result follows the broadcasted shape of the index arrays.
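To make that concrete, here are the shapes involved (using the 3-row up_idx / lw_idx from the question; this is an annotation, not part of the original answer):
i, j = up_idx.T[:, :, None] + np.arange(2)
print(i.shape, j.shape)                        # (3, 2) (3, 2): start rows/cols plus offsets 0..1
print(i[:, :, None].shape, j[:, None].shape)   # (3, 2, 1) (3, 1, 2)
print(base[i[:, :, None], j[:, None]].shape)   # (3, 2, 2): one 2x2 window per index pair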
Introduction
While trying to do some categorization of nodes in a graph (which will be rendered differently), I find myself confronted with the following problem:
The Problem
Given a superset of elements S = {0, 1, ... M} and a number n of non-disjoint subsets T_i thereof, with 0 <= i < n, what is the best algorithm to find the partition of S, called P?
P splits S into disjoint parts P_j, with 0 <= j < M, such that all elements x within a given P_j have the same list of "parents" among the "original" sets T_i.
Example
S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
So all P_js would be:
P_1 = [1, 4] # all elements x have the same list of "parents": T_1, T_3
P_2 = [2] # all elements x have the same list of "parents": T_2
P_3 = [3] # all elements x have the same list of "parents": T_2, T_3
P_4 = [5, 6, 8, 9] # all elements x have the same list of "parents": S (so they are not in any of the T_i)
Questions
What are good functions/classes in the Python packages to compute all P_j and the list of their "parents", ideally restricted to numpy and scipy? Perhaps there's already a function which does just that.
What is the best algorithm to find those partitions P_j and, for each one, the list of "parents"? Let's denote T_0 = S.
I think the brute-force approach would be to generate all 2-combinations of T sets and split each pair into at most 3 disjoint sets, which would be added back to the pool of T sets; then repeat the process until all resulting Ts are disjoint, at which point we have arrived at our answer, the set of P sets. A slightly problematic part would be caching all the "parents" along the way.
I suspect a dynamic programming approach could be used to optimize the algorithm.
Note: I would have loved to write the math parts in latex (via MathJax), but unfortunately this is not activated :-(
The following should be linear time (in the number of the elements in the Ts).
from collections import defaultdict

S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
Ts = [S, T_1, T_2, T_3]

parents = defaultdict(int)
for i, T in enumerate(Ts):
    for elem in T:
        parents[elem] += 2 ** i

children = defaultdict(list)
for elem, p in parents.items():
    children[p].append(elem)

print(list(children.values()))
Result:
[[5, 6, 8, 9], [1, 4], [2], [3]]
The way I'd do this is to construct an M × n boolean array In where In(i, j) = (S_i ∈ T_j). You can construct it in O(Σ_j |T_j|) time, provided you can map an element of S onto its integer index in O(1), by scanning all of the sets T_j and marking the corresponding bit in In.
You can then read off the "signature" of each element i directly from In by concatenating row i into an n-bit binary number. The signature is precisely the equivalence relationship of the partition you are seeking.
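A minimal numpy sketch of this incidence-matrix approach, using the example sets above (it assumes the elements of S can be mapped to row indices with a dict; not part of the original answer):
import numpy as np

S = [1, 2, 3, 4, 5, 6, 8, 9]
Ts = [[1, 4], [2, 3], [1, 3, 4]]

index = {x: i for i, x in enumerate(S)}        # element -> row index, O(1) lookup
In = np.zeros((len(S), len(Ts)), dtype=bool)   # In[i, j] = (S[i] in Ts[j])
for j, T in enumerate(Ts):
    for x in T:
        In[index[x], j] = True

# read each row as an n-bit number: this is the element's "signature"
signatures = In.astype(int) @ (1 << np.arange(len(Ts)))

blocks = {}
for x, sig in zip(S, signatures):
    blocks.setdefault(sig, []).append(x)
print(list(blocks.values()))                   # [[1, 4], [2], [3], [5, 6, 8, 9]]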
By the way, I'm in total agreement with you about Math markup. Perhaps it's time to mount a new campaign.