Constrained Random Number Generation - random

I need to generate 500 numbers, 250 1s and 250 0s, randomly located. Below is what I do now. But it does not feel right while the output is correct.
trialNo=500
RandomSample#Flatten[Table[#, {trialNo/2}] & /# {0, 1}]

I'd actually do something slightly different. Since you're looking for a random permutation of Flatten[{ConstantArray[0,250], ConstantArray[1,250]}], I'd generate the permutation and use Part to get the list you're looking for. As follows,
perm = RandomSample[Range[trialNo]];
Flatten[{ConstantArray[0, trialNo/2], ConstantArray[1, trialNo/2]}][[ perm ]]
This isn't operationally different than what you're doing, but I think it captures mathematically what your trying to accomplish better.

Here is another way to do this.
Round[Ordering[1~RandomReal~#] / N##]& # 500
Now with more magic for the guys in Chat.
Mod[RandomSample#Range##, 2] & # 500

Related

solve equation using a list as coefficient

I am trying to solve this equation :
solve[-Log2[0.001]/1000 == k*Log2[k/q] + (1 - k)*Log2[(1 - k)/(1 - q)], q]
k is a value from a list
v1 = {7,8,9}
So the desired results should be
q={somevaule1, somevaule2, somevalue2} corresponding to different choice of k in v1
I searched online but no luck. Thanks for your help!
This will do it
v1 = {7, 8, 9};
FindRoot[-Log2[10^-3]/1000==#*Log2[#/q]+(1-#)*Log2[(1-#)/(1-q)],{q,5}]&/#v1
It complains about not being able to get the accuracy that it wants, but you may be able to ignore that. Giving it a WorkingPrecision or an AccuracyGoal option can perhaps overcome that. I changed the 0.001 to 10^-3 because that was the only number in your post that had a decimal point and I hoped by making that an exact fraction it might get rid of the warnings about accuracy, but that wasn't enough.
What that does is turn the whole FindRoot into a function, using # with & as the variable, and then uses Map (which has a shorthand of /#) to use that function on each item in your v1 list and returns you the list of the results. You could write exactly the same thing with
Map[FindRoot[....]&, v1]
if that is more understandable for you.

Making a custom Distribution of Random Numbers

I am trying to make sense of the different distribution objects in c++11 and I am finding it overwhelming. I hope some of you can and will help.
This is why I am looking into all this:
I need a random number generator that I can adjust every time it is used so that it is more likely to produce the same number again. The second requirement I need to fill is that I need the random numbers generated to only be these numbers:
{1, 2, 4, 8, 16, ..., 128}
Third and last requirement is that on certain occasions I need to skip one or more numbers from the above set.
My problem is that I don't understand the descriptions of various distribution objects. I, thus, cannot determine what tools I need to use to meet my above needs.
Can somebody tell me what tools I need and how I need to use them? The more clear, concise and detailed the response the better.
Your range can be generated with a random number j in the range [0, 7], then you compute:
1 << j
to get your number. std::uniform_int_distribution<> would be handy for generating the value in [0, 7].
Additionally you could use a std::bernoulli_distribution (which returns a random bool) to decide if the next number is going to be the same as the last one, or if you should generate a new number. The std::bernoulli_distribution defaults to a 50/50 chance of true/false, but you can customize that distribution in the bernoulli_distribution constructor to anything you like (e.g. 80/20 or whatever).
If this isn't clear enough, just jump in with some code. Try coding it up, and if it isn't working, post what you have, and I'm sure somebody will help.
Oh, forgot about your 3rd requirement: For that just put your [0, 7] generation in a loop, and if you come up with a number you're supposed to skip, then iterate the loop, else break out of it.
For skipping numbers I completely agree with Howard that manual checking is probably the way to go, but there might be a better way altering the probability of a given number being generated.
Another way to do this would be to use a discrete_distribution object, which allows you to specify the probability of generating any given value, so for your example it would be something like
std::default_random_engine entropy;
std::array<double, 128> probs;
probs.fill(1.0);
std::discrete_distribution<int> choose(probs.begin(), probs.end());
then when you're in your loop, in addition to deciding whether or not to skip, you can increment one of those values by some amount to increase the odds of it coming up again, making sure to reinitialize the discrete distribution, like this:
int x;
double myValue = 0.2;//or whatever increment you want
for (something; something else; something else else)
{
x = choose(entropy);
if (skip(x))
continue;//alternately you could set probs.at(x) = 0
//only if you never want to generate it again
probs.at(x) += myValue;
choose = std::discrete_distribution<int>(probs.begin(), probs.end());
output(x);
}
where skip and output are your functions to decide if x should be skipped and do whatever you want with the generated value respectively

Is Put - Get cycle in Mathematica always deterministic?

In Mathematica as in other systems of computer math the numbers are internally stored in binary form. However when exporting them with such functions as Put and PutAppend they are converted into approximate decimals. When you import them back with such functions as Get they are restored from this approximate decimal representation to binary form.
The question is whether the recovered number is always identical to the original binary number and, if not always, in which cases it is not and how large can be the difference? I am particularly interested in the Put - Get cycle (on the same computer system).
The following two simple experiments show that probably the Put - Get cycle in Mathematica always restores original numbers exactly even for arbitrary precision numbers:
In[1]:= list=RandomReal[{-10^6,10^6},10000];
Put[list,"test.txt"];
list2=Get["test.txt"];
Order[list,list2]===0
Order[Total#Abs[list-list2],0.]===0
Out[4]= True
Out[5]= True
In[6]:= list=SetPrecision[RandomReal[{-10^6,10^6},10000],50];
Put[list,"test.txt"];
list2=Get["test.txt"];
Order[list,list2]===0
Total#Abs[list-list2]//InputForm
Out[9]= True
Out[10]//InputForm=
0``39.999515496936205
But maybe I am missing something?
UPDATE
With more correct test code I have found that in reality these tests show only that restored numbers have identical binary RealDigits but their Precisions may differ even in Equal sense. Here are more correct tests:
test := (Put[list, "test.txt"];
list2 = Get["test.txt"];
{Order[list, list2] === 0,
Order[Total#Abs[list - list2], 0.] === 0,
Total[Order ### RealDigits[Transpose[{list, list2}], 2]],
Total[Order ### Map[Precision, Transpose[{list, list2}], {-1}]],
Total[1 - Boole[Equal ### Map[Precision, Transpose[{list, list2}], {-1}]]]})
In[8]:= list=RandomReal[NormalDistribution[],10000]^1001;
test
Out[9]= {False,True,0,1,3}
In[6]:= list=RandomReal[NormalDistribution[],10000,WorkingPrecision->50]^1001;
test
Out[7]= {False,False,0,-2174,1}
I'm afraid I can't give a definitive answer. If you look into the text file you see it's stored as something like the InputForm of the values, including the precision indication for non-machine precision numbers.
Assuming that Get uses the same conversion routines as ImportString and ExportString your test can be sped up a tiny bit.
Monitor[
Do[
i = RandomReal[{$MinMachineNumber, 10 $MinMachineNumber}, 100000];
If[i =!=
ToExpression[ImportString[ExportString[i, "Text"], "List"]],
Print[i]], {n, 100}
],
n]
I have tested this for several hundreds of millions of numbers in various ranges between $MinMachineNumber and $MaxMachineNumber and I always get back the original numbers. It's no proof, of course, but it seems unlikely that you're going to see numbers for which this is not true if there are any (and in that case the difference would be so tiny as to be negligible).
One important thing to know is that Put[] / Get[] doesn't keep packed arrays packed. You should check out DumpSave[]. It's much faster as it's a binary format and keeps arrays packed.

How to prepend a column to a matrix?

Ok, imagine I have this Matrix: {{1,2},{2,3}}, and I'd rather have {{4,1,2},{5,2,3}}. That is, I prepended a column to the matrix. Is there an easy way to do it?
My best proposal is this:
PrependColumn[vector_List, matrix_List] :=
Outer[Prepend[#1, #2] &, matrix, vector, 1]
But it obfuscates the code and constantly requires loading more and more code. Isn't this built in somehow?
Since ArrayFlatten was introduced in Mathematica 6 the least obfuscated solution must be
matrix = {{1, 2}, {2, 3}}
vector = {{4}, {5}}
ArrayFlatten#{{vector, matrix}}
A nice trick is that replacing any matrix block with 0 gives you a zero block of the right size.
I believe the most common way is to transpose, prepend, and transpose again:
PrependColumn[vector_List, matrix_List] :=
Transpose[Prepend[Transpose[matrix], vector]]
I think the least obscure is the following way of doing this is:
PrependColumn[vector_List, matrix_List] := MapThread[Prepend, {matrix, vector}];
In general, MapThread is the function that you'll use most often for tasks like this one (I use it all the time when adding labels to arrays before formating them nicely with Grid), and it can make things a lot clearer and more concise to use Prepend instead of the equivalent Prepend[#1, #2]&.
THE... ABSOLUTELY.. BY FAR... FASTEST method to append or prepend a column from my tests of various methods on array RandomReal[100,{10^8,5}] (kids, don't try this at home... if your machine isn't built for speed and memory, operations on an array this size are guaranteed to hang your computer)
...is this: Append[tmp\[Transpose], Range#Length#tmp]\[Transpose].
Replace Append with Prepend at will.
The next fastest thing is this: Table[tmp[[n]]~Join~{n}, {n, Length#tmp}] - almost twice as slow.

What is a "good" R value when comparing 2 signals using cross correlation?

I apologize for being a bit verbose in advance: if you want to skip all the background mumbo jumbo you can see my question down below.
This is pretty much a follow up to a question I previously posted on how to compare two 1D (time dependent) signals. One of the answers I got was to use the cross-correlation function (xcorr in MATLAB), which I did.
Background information
Perhaps a little background information will be useful: I'm trying to implement an Independent Component Analysis algorithm. One of my informal tests is to (1) create the test case by (a) generate 2 random vectors (1x1000), (b) combine the vectors into a 2x1000 matrix (called "S"), and multiply this by a 2x2 mixing matrix (called "A"), to give me a new matrix (let's call it "T").
In summary: T = A * S
(2) I then run the ICA algorithm to generate the inverse of the mixing matrix (called "W"), (3) multiply "T" by "W" to (hopefully) give me a reconstruction of the original signal matrix (called "X")
In summary: X = W * T
(4) I now want to compare "S" and "X". Although "S" and "X" are 2x1000, I simply compare S(1,:) to X(1,:) and S(2,:) to X(2,:), each which is 1x1000, making them 1D signals. (I have another step which makes sure that these vectors are the proper vectors to compare to each other and I also normalize the signals).
So my current quandary is how to 'grade' how close S(1,:) matches to X(1,:), and likewise with S(2,:) to X(2,:).
So far I have used something like: r1 = max(abs(xcorr(S(1,:), X(1,:)))
My question
Assuming that using the cross correlation function is a valid way to go about comparing the similarity of two signals, what would be considered a good R value to grade the similarity of the signals? Wikipedia states that this is a very subjective area, and so I defer to the better judgment of those who might have experience in this field.
As you might realize, I'm not coming from a EE/DSP/statistical background at all (I'm a medical student) so I'm going through a sort of "baptism through fire" right now, and I appreciate all the help I can get. Thanks!
(edit: as far as directly answering your question about R values, see below)
One way to approach this would be to use cross-correlation. Bear in mind that you have to normalize amplitudes and correct for delays: if you have signal S1, and signal S2 is identical in shape, but half the amplitude and delayed by 3 samples, they're still perfectly correlated.
For example:
>> t = 0:0.001:1;
>> y = #(t) sin(10*t).*exp(-10*t).*(t > 0);
>> S1 = y(t);
>> S2 = 0.4*y(t-0.1);
>> plot(t,S1,t,S2);
These should have a perfect correlation coefficient. A way to compute this is to use maximum cross-correlation:
>> f = #(S1,S2) max(xcorr(S1,S2));
f =
#(S1,S2) max(xcorr(S1,S2))
>> disp(f(S1,S1)); disp(f(S2,S2)); disp(f(S1,S2));
12.5000
2.0000
5.0000
The maximum value of xcorr() takes care of the time-delay between signals. As far as correcting for amplitude goes, you can normalize the signals so that their self-cross-correlation is 1.0, or you can fold that equivalent step into the following:
ρ2 = f(S1,S2)2 / (f(S1,S1)*f(S2,S2);
In this case ρ2 = 5 * 5 / (12.5 * 2) = 1.0
You can solve for ρ itself, i.e. ρ = f(S1,S2)/sqrt(f(S1,S1)*f(S2,S2)), just bear in mind that both 1.0 and -1.0 are perfectly correlated (-1.0 has opposite sign)
Try it on your signals!
with respect to what threshold to use for acceptance/rejection, that really depends on what kind of signals you have. 0.9 and above is fairly good but can be misleading. I would consider looking at the residual signal you get after you subtract out the correlated version. You could do this by looking at the time index of the maximum value of xcorr():
>> t = 0:0.001:1;
>> y = #(a,t) sin(a*t).*exp(-a*t).*(t > 0);
>> S1=y(10,t);
>> S2=0.4*y(9,t-0.1);
>> f(S1,S2)/sqrt(f(S1,S1)*f(S2,S2))
ans =
0.9959
This looks pretty darn good for a correlation. But let's try fitting S2 with a scaled/shifted multiple of S1:
>> [A,i]=max(xcorr(S1,S2)); tshift = i-length(S1);
>> S2fit = zeros(size(S2)); S2fit(1-tshift:end) = A/f(S1,S1)*S1(1:end+tshift);
>> plot(t,[S2; S2fit]); % fit S2 using S1 as a basis
>> plot(t,[S2-S2fit]); % residual
Residual has some energy in it; to get a feel for how much, you can use this:
>> S2res=S2-S2fit;
>> dot(S2res,S2res)/dot(S2,S2)
ans =
0.0081
>> sqrt(dot(S2res,S2res)/dot(S2,S2))
ans =
0.0900
This says that the residual has about 0.81% of the energy (9% of the root-mean-square amplitude) of the original signal S2. (the dot product of a 1D signal with itself will always be equal to the maximum value of cross-correlation of that signal with itself.)
I don't think there's a silver bullet for answering how similar two signals are with each other, but hopefully I've given you some ideas that might be applicable to your circumstances.
A good starting point is to get a sense of what a perfect match will look like by calculating the auto-correlations for each signal (i.e. do the "cross-correlation" of each signal with itself).
THIS IS A COMPLETE GUESS - but I'm guessing max(abs(xcorr(S(1,:),X(1,:)))) > 0.8 implies success. Just out of curiosity, what kind of values do you get for max(abs(xcorr(S(1,:),X(2,:))))?
Another approach to validate your algorithm might be to compare A and W. If W is calculated correctly, it should be A^-1, so can you calculate a measure like |A*W - I|? Maybe you have to normalize by the trace of A*W.
Getting back to your original question, I come from a DSP background, so I get to deal with fairly noise-free signals. I understand that's not a luxury you get in biology :) so my 0.8 guess might be very optimistic. Perhaps looking at some literature in your field, even if they aren't using cross-correlation exactly, might be useful.
Usually in such cases people talk about "false acceptance rate" and "false rejection rate".
The first one describes how many times algorithm says "similar" for non-similar signals, the second one is the opposite.
Selecting a threshold thus becomes a trade-off between these criteria. To make FAR=0, threshold should be 1, to make FRR=0 threshold should be -1.
So probably, you will need to decide which trade-off between FAR and FRR is acceptable in your situation and this will give the right value for threshold.
Mathematically this can be expressed in different ways. Just a couple of examples:
1. fix some of rates at acceptable value and minimize other one
2. minimize max(FRR,FAR)
3. minimize aFRR+bFAR
Since they should be equal, the correlation coefficient should be high, between .99 and 1. I would take the max and abs functions out of your calculation, too.
EDIT:
I spoke too soon. I confused cross-correlation with correlation coefficient, which is completely different. My answer might not be worth much.
I would agree that the result would be subjective. Something that would involve the sum of the squares of the differences, element by element, would have some value. Two identical arrays would give a value of 0 in that form. You would have to decide what value then becomes "bad". Make up 2 different vectors that "aren't too bad" and find their cross-correlation coefficient to be used as a guide.
(parenthetically: if you were doing a correlation coefficient where 1 or -1 would be great and 0 would be awful, I've been told by bio-statisticians that a real-life value of 0.7 is extremely good. I understand that this is not exactly what you are doing but the comment on correlation coefficient came up earlier.)

Resources