I am having trouble figuring out how to input data with frequencies in Mathematica.
For example, I have 0 with frequency 10,000, 1 with frequency 9000, 2 with frequency 3000, and 4 with frequency 1000.
Can anyone help me input this data into Mathematica? I then need to find the first, second, and third moments of those numbers.
Thanks a lot!
Something like:
data = Join[ConstantArray[0, 10000], ConstantArray[1, 9000], ConstantArray[2, 3000], ConstantArray[4, 1000]];
Mean[data]
Variance[data]
Kurtosis[data]
Alternatively, using EmpiricalDistribution:
Table[
 fun[
  EmpiricalDistribution[
   {10000, 9000, 3000, 1000} -> {0, 1, 2, 4}
  ]
 ],
 {fun, {Mean, Variance, Kurtosis}}
]
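For readers coming from Python, here is a minimal NumPy sketch of the same weighted-moment idea (my own variable names; note that np.average computes population moments, so the variance will differ slightly from Mathematica's sample-based Variance, and Mathematica's Moment[data, r] gives raw moments directly if you need exactly the first three):

import numpy as np

values = np.array([0, 1, 2, 4])
weights = np.array([10000, 9000, 3000, 1000])

mean = np.average(values, weights=weights)                  # first moment
second = np.average((values - mean) ** 2, weights=weights)  # second central moment
third = np.average((values - mean) ** 3, weights=weights)   # third central moment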
I need some help with my Mathematica homework.
I want to compute the mean of some random numbers from 0 to 10, each time for a larger number of samples (from 10 to 20). Then I would like to plot the result, ideally as a distribution of all the means, or, if that's not possible, as a ListPlot. I have to show that as the number of samples rises, the mean becomes more and more accurate. Here is what I already have:
For[i = 10, i < 20, i++, Print[Mean[RandomInteger[10, i]]]]
I'm grateful for any help!
For loops do not return results, so the values have to be collected explicitly; Print won't help.
output = {};
For[i = 10, i < 20000, i++, AppendTo[output, Mean[RandomInteger[10, i]]]]
ListPlot[output, AxesLabel -> {"Samples", "Mean"}]
It is better to use Table instead of For; Table does return results.
output = Table[Mean[RandomInteger[10, i]], {i, 10, 20000}];
ListPlot[output, AxesLabel -> {"Samples", "Mean"}]
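For comparison, here is a minimal NumPy/matplotlib sketch of the same experiment; the sample-size range and labels mirror the Table version above (randint's upper bound is exclusive, hence 11):

import numpy as np
import matplotlib.pyplot as plt

# mean of n random integers from 0..10, for growing n
sizes = range(10, 20001)
means = [np.random.randint(0, 11, size=n).mean() for n in sizes]

plt.plot(sizes, means)
plt.xlabel("Samples")
plt.ylabel("Mean")
plt.show()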
I am working on image classification for the CIFAR data set. I obtained predicted labels as output, mapped to the range 0-1 for 10 different classes. Is there any way to find the class a predicted label belongs to?
# sample output obtained
array([3.3655483e-04, 9.4402254e-01, 1.1646092e-03, 2.8560971e-04,
1.4086446e-04, 7.1564602e-05, 2.4985364e-03, 6.5030693e-04,
3.4783698e-05, 5.0794542e-02], dtype=float32)
One way is to find the max, set that index to 1, and set the rest to 0.
# for the above case it should look like this
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
Can anybody tell me how to do this? Or, if you have any better methods, please suggest them. Thanks!
It is as simple as:
>>> data = np.array([3.3655483e-04, 9.4402254e-01, 1.1646092e-03, 2.8560971e-04,
... 1.4086446e-04, 7.1564602e-05, 2.4985364e-03, 6.5030693e-04,
... 3.4783698e-05, 5.0794542e-02], dtype=np.float32)
>>>
>>> (data == data.max()).view(np.int8)
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
Explanation: data.max() finds the largest value. We compare it with each individual element to get a vector of truth values, which we then cast to integers, taking advantage of the fact that True maps to 1 and False maps to 0.
Please note that this will return multiple ones if the maximum is not unique.
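If you only need the class label, data.argmax() alone already gives the index of the winning class. And if you want to guarantee a single 1 even when the maximum is tied, an argmax-based variant (a sketch using only standard NumPy calls, which picks the first maximal index) also works:

>>> one_hot = np.zeros_like(data, dtype=np.int8)
>>> one_hot[data.argmax()] = 1
>>> one_hot
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)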
I participate in CodeFights and have the task of finding the minimal number of moves required to obtain a strictly increasing sequence from the input. The input is an array of integers, and according to the rules I can increase exactly one element of the array by one per move.
inputArray: [1, 1, 1]
Expected output: 3
inputArray: [-1000, 0, -2, 0]
Expected output: 5
inputArray: [2, 1, 10, 1]
Expected output: 12
inputArray: [2, 3, 3, 5, 5, 5, 4, 12, 12, 10, 15]
Expected output: 13
There are also conditions for input and output:
[time limit] 4000ms (py3)
[input] array.integer inputArray
3 ≤ inputArray.length ≤ 10^5,
-10^5 ≤ inputArray[i] ≤ 10^5
[output] integer
I came up with the following solution:
def arrayChange(inputArray):
    k = 0
    for i in range(len(inputArray) - 1):
        if (inputArray[i] < inputArray[i + 1]) == False:
            while inputArray[i + 1] <= inputArray[i]:
                inputArray[i + 1] = inputArray[i + 1] + 1
                k += 1
    return k
However, apparently for some tests that I cannot see, my algorithm's performance exceeds the time limit:
6/8
Execution time limit exceeded on test 7: Program exceeded the execution time limit. Make sure that it completes execution in a few seconds for any possible input.
Sample tests: 4/4
Hidden tests: 2/4
How to improve my algorithm for increasing performance speed?
Right now you increase by 1 at a time, with this code snippet:
inputArray[i+1] = inputArray[i+1] + 1
Instead of incrementing by 1 each time, why not add the whole needed amount at once? For example, if you have the list [1, 3, 0], it makes sense to add 4 to the last element. Doing this in one step is much quicker than adding 1 four times.
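A minimal sketch of that idea, with my own variable names (this is the O(n) version the hint points toward, not the poster's exact code):

def array_change(input_array):
    moves = 0
    for i in range(1, len(input_array)):
        if input_array[i] <= input_array[i - 1]:
            # jump straight to one more than the previous element
            needed = input_array[i - 1] + 1 - input_array[i]
            input_array[i] += needed
            moves += needed
    return moves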
@fileyfood500 gave me a very useful hint, and here is my solution, which works:
deltX = 0
for i in range(len(a) - 1):
    if (a[i] < a[i + 1]) == False:
        deltX1 = abs(a[i + 1] - a[i]) + 1
        a[i + 1] = a[i + 1] + deltX1
        deltX += deltX1
print(deltX)
Now I do not need the while loop at all, because I increase each item that needs it by the necessary amount in a single step.
def arrayChange(inputArray):
    count = 0
    for i in range(1, len(inputArray)):
        while inputArray[i] <= inputArray[i - 1]:
            c = inputArray[i]
            inputArray[i] = inputArray[i - 1] + 1
            count += inputArray[i] - c
    return count
This way you set each element directly, which takes less time; then just subtract the earlier value from the new one to get the number of increments.
Let's say I have an array like this:
[
  {
    "player_id" => 1,
    "number_of_matches" => 2,
    "goals" => 5
  },
  {
    "player_id" => 2,
    "number_of_matches" => 4,
    "goals" => 10
  }
]
I want to have the average goals per match among all the players, not the average for each individual player, but the total average.
I have in mind doing it with .each, storing each of the individual averages, and at the end adding them all up and dividing by the number of players. However, I am looking for a Ruby one-liner way of doing this.
As requested, a one-liner:
avg = xs.map { |x| x["goals"].to_f / x["number_of_matches"] }.reduce(:+) / xs.size
A more readable snippet:
goals, matches = xs.map { |x| [x["goals"], x["number_of_matches"]] }.transpose
avg = goals.reduce(:+).to_f / matches.reduce(:+) if goals
A slight modification to tokland's answer.
items.map{|e| e.values_at("goals", "number_of_matches")}.transpose.map{|e| e.inject(:+)}.instance_eval{|goals, matches| goals.to_f/matches}
a = [{player_id:1 , match_num:2, goals: 5}, {player_id:2 , match_num:4, goals: 10}]
a.reduce(0){|avg, p| avg += p[:goals].to_f/p[:match_num]}/a.size
Edit: renamed keys and block args to reduce char count. For those who care.
First, your keys need to use => if you're going to use strings as keys.
reduce will iterate over the array and sum the individual averages for each player, and finally we divide that result by the total number of players. The 0 in the parentheses is the starting value for reduce.
To make the string shorter, let's rename "number_of_matches" to "matches":
a = [
  {"player_id" => 1, "matches" => 2, "goals" => 5},
  {"player_id" => 2, "matches" => 4, "goals" => 10}
]
a.reduce([0,0]){|sum,h|[sum.first+h["goals"],sum.last+h["matches"]]}.reduce{|sum,m|sum.to_f/m}
#=> 2.5
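Note that this is a genuinely different quantity from tokland's one-liner: that one averages the per-player averages, while this one divides total goals by total matches. A quick Python sketch of both, using hypothetical dicts mirroring the data above (they coincide at 2.5 here, but differ in general):

players = [{"matches": 2, "goals": 5}, {"matches": 4, "goals": 10}]

# mean of the per-player averages
per_player = sum(p["goals"] / p["matches"] for p in players) / len(players)

# pooled average: total goals over total matches
pooled = sum(p["goals"] for p in players) / sum(p["matches"] for p in players)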
I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form
T={{x1,y1,z1},{x2,y2,z2},....}
and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the Z values that fall into each bin.
Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a list of values occupying that bin. At the end I map Total onto the list of bins, summing their contents (I do this because I sometimes want to do other things, like maximize).
I have tried using Gather and similar functions, but the above method was dramatically faster, though perhaps I am using Gather poorly. Anyway, it still takes a few minutes to do the sorting with my method, and I feel like Mathematica can do better. Does anyone have a nice, efficient algorithm handy?
Here is a method based on Szabolcs's post that is about an order of magnitude faster.
data = RandomReal[5, {500000, 3}];
(*500k values*)
zvalues = data[[All, 3]];
epsilon = 1*^-10;(*prevent 101 index*)
(*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
   SparseArray[
    gb[[All, 1, 1]] ->
     Total[gb[[All, All, 2]], {2}]]]; // AbsoluteTiming
Gives about {2.012217, Null}
AbsoluteTiming[
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
res3 = SparseArray[indexes -> zvalues];
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
]
Gives about {0.195228, Null}
res3 == res2
True
"TreatRepeatedEntries" -> 1 adds duplicate positions up.
I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use Round, Floor, or Ceiling (with a second argument) in place of Nearest, the code below will be much faster. On my system, it tests faster than the GatherBy solution also posted.
Assuming I understand your requirements, I propose:
data = RandomReal[100, {75, 3}];
bins = {0, 20, 40, 60, 80, 100};
Reap[
 Sow[{#3, #2}, bins ~Nearest~ #] & @@@ data,
 bins,
 Reap[Sow[#, bins ~Nearest~ #2] & @@@ #2, bins, Tr@#2 &][[2]] &
][[2]] ~Flatten~ 1 ~Total~ {3} // MatrixForm
Refactored:
f[bins_] := Reap[Sow[{##2}, bins ~Nearest~ #]& @@@ #, bins, #2][[2]] &
bin2D[data_, X_, Y_] := f[X][data, f[Y][#2, #2~Total~2 &] &] ~Flatten~ 1 ~Total~ {3}
Use:
bin2D[data, xbins, ybins]
Here's my approach:
data = RandomReal[5, {500000, 3}]; (* 500k values *)
zvalues = data[[All, 3]];
epsilon = 1*^-10; (* prevent 101 index *)
(* rescale and round (x,y) coordinates to index pairs in the 1..100 range *)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
(* approach 1: create bin-matrix first, then fill up elements by adding zvalues *)
res1 = Module[
   {result = ConstantArray[0, {100, 100}]},
   Do[
    AddTo[result[[##]], zvalues[[i]]] & @@ indexes[[i]],
    {i, Length[indexes]}
   ];
   result
  ]; // Timing
(* approach 2: gather zvalues by indexes, add them up, convert them to a matrix *)
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
   SparseArray[gb[[All, 1, 1]] -> (Total /@ gb[[All, All, 2]])]
  ]; // Timing
res1 == res2
These two approaches (res1 & res2) can handle 100k and 200k elements per second, respectively, on this machine. Is this sufficiently fast, or do you need to run this whole program in a loop?
Here's my approach using the function SelectEquivalents defined in "What is in your Mathematica tool bag?", which is perfect for a problem like this one.
data = RandomReal[100, {75, 3}];
bins = Range[0, 100, 20];
binMiddles = (Most@bins + Rest@bins)/2;
nearest = Nearest[binMiddles];
SelectEquivalents[
data
,
TagElement -> ({First@nearest[#[[1]]], First@nearest[#[[2]]]} &)
,
TransformElement -> (#[[3]] &)
,
TransformResults -> (Total[#2] &)
,
TagPattern -> Flatten[Outer[List, binMiddles, binMiddles], 1]
,
FinalFunction -> (Partition[Flatten[# /. {} -> 0], Length[binMiddles]] &)
]
If you want to group according to more than two dimensions, you could use the following function in FinalFunction to reshape the flat result list to the desired dimensions (I don't remember where I found it):
InverseFlatten[l_, dimensions_] := Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
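Finally, for readers outside Mathematica: the whole task (a 100x100 grid whose cells hold the sum of the z values falling in them) is a one-call job for NumPy's np.histogram2d with a weights argument. A sketch with synthetic data:

import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.uniform(0, 100, size=(3, 100_000))

# hist[i, j] holds the sum of z values whose (x, y) falls in bin (i, j)
hist, xedges, yedges = np.histogram2d(x, y, bins=100, weights=z)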