How to create a uniformly random matrix in Julia?

I want to get a matrix with uniformly random values sampled from [-1,2]:
x = rand([-1,2], (3,3))
3x3 Array{Int64,2}:
 -1  -1  -1
  2  -1  -1
 -1  -1  -1
But this samples only from the two values -1 and 2, whereas I'm looking for continuous values, for instance -0.9, 0.75, -0.09, 1.80.
How can I do that?

Note: I am assuming here that you're looking for uniform random variables.
You can also use the Distributions package:
## Pkg.add("Distributions") # If you don't already have it installed.
using Distributions
rand(Uniform(-1,2), 3,3)
I do quite like isebarn's solution though, as it gets you thinking about the actual properties of the underlying probability distributions.

For a random number in the range [a,b]:
rand() * (b - a) + a
and it works for a matrix as well (note the broadcasting dot on the scalar shift for Julia 1.0+):
rand(3, 3) * (2 - (-1)) .- 1
3x3 Array{Float64,2}:
 1.85611   0.456955   -0.0219579
 1.91196  -0.0352324   0.0296134
 1.63924  -0.567682    0.45602
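If you use this pattern often, it's easy to wrap in a tiny helper (a minimal sketch; the name scaled_rand is my own, not from Base):
scaled_rand(a, b, dims...) = rand(dims...) .* (b - a) .+ a

x = scaled_rand(-1, 2, 3, 3)  # 3x3 matrix with entries in [-1, 2)
@assert all(-1 .<= x .< 2)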

If values on a fixed grid are acceptable, you can also draw from a FloatRange{Float64} with the desired step (note this yields only multiples of the step, not a truly continuous distribution):
julia> rand(-1.0:0.01:2.0, 3, 3)
3x3 Array{Float64,2}:
 0.79  1.73   0.95
 0.73  1.4   -0.46
 1.42  1.68  -0.55

Related

Pivoted QR in Julia?

In Julia, the function qr(A) will perform a QR decomposition on a given matrix A. However, is there any function/way in Julia to do a "pivoted" QR decomposition on a given matrix?
Just pass in the magic option:
julia> A = 10*log.(1 .- rand(4,3));
julia> qr(A, ColumnNorm())
QRPivoted{Float64, Matrix{Float64}}
Q factor:
4×4 LinearAlgebra.QRPackedQ{Float64, Matrix{Float64}}:
 -0.0101543  -0.218633    0.804736   -0.551812
 -0.118832   -0.376628    0.446673    0.802816
 -0.118236   -0.88692    -0.390704   -0.216204
 -0.985797    0.154029   -0.0152722  -0.0651594
R factor:
3×3 Matrix{Float64}:
 26.4193   4.80784   8.92215
  0.0     18.7537   15.1792
  0.0      0.0      -9.77702
permutation:
3-element Vector{Int64}:
 2
 3
 1
Note that the docs are a little off.
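As a quick sanity check of what the pivoted factorization gives you, the identity A[:, p] ≈ Q*R should hold (a small sketch; Q, R and p are the fields of the QRPivoted object returned above):
using LinearAlgebra

A = 10 * log.(1 .- rand(4, 3))
F = qr(A, ColumnNorm())        # F isa QRPivoted
@assert F.Q * F.R ≈ A[:, F.p]  # the column permutation p reconciles the factors with A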

Define a custom sortperm function

In Julia, let's suppose I have the following matrix:
julia> rank = [[1.0,2.0,NaN] [5.0,3.0,1.0]]
3×2 Array{Float64,2}:
   1.0  5.0
   2.0  3.0
 NaN    1.0
Using mapslices and sortperm to get a ranking on each column gives:
r = mapslices(sortperm, rank; dims=1)
3×2 Array{Int64,2}:
 1  3
 2  2
 3  1
The problem is that NaN values are ranked as the "worst" elements instead of being kept as NaN in the final matrix. What I finally want is:
3×2 Array{Float64,2}:
   1.0  3.0
   2.0  2.0
 NaN    1.0
My current workaround is to compare each element of r with those of rank. But I'm quite sure Julia has a classier way of doing it :p.
Current workaround (not ideal, because it requires an extra pass after mapslices and allocates another array new_r):
nrow, ncol = size(r)
new_r = [Float64(ifelse(isnan(rank[i,j]), NaN, r[i,j])) for i in 1:nrow, j in 1:ncol]
NaN is not "special" in Julia. It is just a floating point value. If you want NaN to be treated as a missing value, you should first convert it to missing and then use the ordinalrank function from StatsBase.jl:
julia> rank = [[1.0,2.0,NaN] [5.0,3.0,1.0]]
3×2 Array{Float64,2}:
   1.0  5.0
   2.0  3.0
 NaN    1.0
julia> using StatsBase
julia> mapslices(rank; dims=1) do x
           ordinalrank(replace(x, NaN=>missing))
       end
3×2 Array{Union{Missing, Int64},2}:
 1         3
 2         2
  missing  1
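If you would rather keep NaN than switch to missing, the element-wise workaround from the question can be compressed by exploiting NaN propagation (a sketch of mine, not from the original answer):
new_r = r .+ 0 .* rank  # 0*NaN is NaN, so NaN positions survive; the Int ranks become Float64
This produces the same Float64 matrix as the comprehension, without the explicit loop.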

How to insert a column of ones and zeros into a matrix using Octave?

Suppose I have a matrix with a set of integers. I want to use the check rand > 0.5 to prepend a random vector of 1s and 0s to my matrix. How could I do this?
Here is an example with just a 6x1 matrix, but you should get the point. (Note this appends the random column; to prepend it instead, reverse the order: a = [rand(size(a))>0.5, a].)
octave:1> a = [7;8;2;3;6;7];
octave:2> a = [a, rand(size(a))>0.5]
a =
   7   0
   8   1
   2   1
   3   0
   6   1
   7   0
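For comparison, the analogous one-liner in Julia, this document's main language (a side note, not part of the Octave answer):
a = [7; 8; 2; 3; 6; 7]
a = hcat(a, rand(length(a)) .> 0.5)  # appends a random 0/1 column; Bool promotes to Int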

draw random number following a custom distribution [duplicate]

This question already has answers here:
Weighted random numbers in MATLAB
(4 answers)
Closed 8 years ago.
I need to draw random numbers following a distribution I chose.
Example: draw 7 numbers from 1 to 7 with those probabilities:
1: 0.3
2: 0.2
3: 0.15
4: 0.15
5: 0.1
6: 0.05
7: 0.05
Since my actual application potentially needs to draw 1000 numbers, I need this to be as efficient as possible (ideally linear).
I know there is a function in MATLAB that draws random numbers from a normal distribution; is there any way to adapt it?
I think you can also use randsample from the Statistics Toolbox, as referenced here.
%%// Replace 7 with 1000 for original problem
OUT = randsample(1:7, 7, true, [0.3 0.2 0.15 0.15 0.1 0.05 0.05])
numbers = 1:7;
probs = [.3 .2 .15 .15 .1 .05 .05];
N = 1000; %// how many random numbers you want
cumProbs = cumsum(probs(:)); %// will be used as thresholds
r = rand(1,N); %// random numbers between 0 and 1
output = sum(bsxfun(@ge, r, cumProbs)) + 1; %// how many thresholds are exceeded
You can use gendist from matlab file exchange: http://www.mathworks.com/matlabcentral/fileexchange/34101-random-numbers-from-a-discrete-distribution/content/gendist.m
This generates 1000 random numbers:
gendist([.3,.2,.15,.15,.1,.05,.05],1000,1)
If you do not have randsample, you can use histc like it does internally, just without all the fluff:
N = 100;
nums = 1:7;
p = [.3 .2 .15 .15 .1 .05 .05];
cdf = [0 cumsum(p(:).'/sum(p))]; cdf(end) = 1; % p is the pdf
[~, isamps] = histc(rand(N,1),cdf);
out = nums(isamps);
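For readers coming from the Julia questions above: the same weighted draw is a single call with StatsBase (a side note, not part of the original MATLAB answers):
using StatsBase
out = sample(1:7, Weights([0.3, 0.2, 0.15, 0.15, 0.1, 0.05, 0.05]), 1000)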

How to use both binary and continuous features in the k-Nearest-Neighbor algorithm?

My feature vector has both continuous (or widely ranging) and binary components. If I simply use Euclidean distance, the continuous components will have a much greater impact:
With symmetric vs. asymmetric represented as 0 and 1, and some less important ratio ranging from 0 to 100, changing from symmetric to asymmetric has a tiny distance impact compared to changing the ratio by 25.
I can add more weight to the symmetry (by making it 0 or 100 for example), but is there a better way to do this?
You could try using the normalized Euclidean distance, described, for example, at the end of the first section here.
It simply scales every feature (continuous or discrete) by its standard deviation. This is more robust than, say, scaling by the range (max-min) as suggested by another poster.
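A minimal sketch of that scaling, written in Julia to match the earlier questions (the toy matrix X, with one row per sample and one column per feature, is my own choice):
using Statistics, LinearAlgebra

scale_features(X) = X ./ std(X; dims=1)  # divide each column (feature) by its standard deviation
X = [0.0 10.0; 1.0 90.0; 0.0 55.0]       # a binary feature next to a 0-100 ratio
Xs = scale_features(X)
d(i, j) = norm(Xs[i, :] - Xs[j, :])      # ordinary Euclidean distance on the scaled rows
After scaling, a flip of the binary feature and a comparable move in the ratio contribute on the same order to d.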
If I correctly understand your question, normalizing (aka 'rescaling') each dimension or column in the data set is the conventional technique for dealing with over-weighted dimensions, e.g.,
ev_scaled = (ev_raw - ev_min) / (ev_max - ev_min)
In R, for instance, you can write this function:
ev_scaled = function(x) {
    (x - min(x)) / (max(x) - min(x))
}
which works like this:
# generate some data:
# v1, v2 are two variables in the same dataset
# that have very different 'scale':
> v1 = seq(100, 550, 50)
> v1
[1] 100 150 200 250 300 350 400 450 500 550
> v2 = sort(sample(seq(.1, 20, .1), 10))
> v2
[1] 0.2 3.5 5.1 5.6 8.0 8.3 9.9 11.3 15.5 19.4
> mean(v1)
[1] 325
> mean(v2)
[1] 8.68
# now normalize v1 & v2 using the function above:
> v1_scaled = ev_scaled(v1)
> v1_scaled
[1] 0.000 0.111 0.222 0.333 0.444 0.556 0.667 0.778 0.889 1.000
> v2_scaled = ev_scaled(v2)
> v2_scaled
[1] 0.000 0.172 0.255 0.281 0.406 0.422 0.505 0.578 0.797 1.000
> mean(v1_scaled)
[1] 0.5
> mean(v2_scaled)
[1] 0.442
> range(v1_scaled)
[1] 0 1
> range(v2_scaled)
[1] 0 1
You can also try Mahalanobis distance instead of Euclidean.
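A minimal sketch of that, too (the plug-in inverse sample covariance is my choice; with an identity covariance this reduces to plain Euclidean distance):
using Statistics, LinearAlgebra

X = randn(100, 3)                             # 100 observations, 3 features
Sinv = inv(cov(X))                            # inverse sample covariance
mahal(a, b) = sqrt((a - b)' * Sinv * (a - b)) # Mahalanobis distance between two rows
mahal(X[1, :], X[2, :])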
