Compute Relative Frequency in Mathematica - wolfram-mathematica

With :
dalist = {{379, 219, 228, 401}, {387, 239, 230, 393},
{403, 238, 217, 429}, {377, 233, 225, 432}}
BarChart@dalist
I would like to compute / Plot the relative frequency instead of absolute count for each Bin for each condition.
Where :
{379, 219, 228, 401}
are the 4 bins count for one condition. So :
{379, 219, 228, 401}[[1]]/Total@{379, 219, 228, 401}
is the result I want to see of the first condition / first Bin, instead of the count itself.

belisarius beat me to it.
You might also want to explore BarChart[dalist, ChartLayout -> "Percentile"]

Isn't it
BarChart[dalist/Total /@ dalist]
?

All you have to do is this:
In[13]:= #/Total[#] & /@ dalist
Out[13]= {{379/1227, 73/409, 76/409, 401/1227}, {387/1249, 239/1249,
230/1249, 393/1249}, {31/99, 238/1287, 217/1287, 1/3}, {377/1267,
233/1267, 225/1267, 432/1267}}
and chart it instead


Search algorithm with best Time Complexity [duplicate]

This question already has answers here:
How do I search for a number in a 2d array sorted left to right and top to bottom?
(21 answers)
Closed 4 years ago.
Given the following data:
[4]
[5, 8]
[9, 12, 20]
[10, 15, 23, 28]
[14, 19, 31, 36, 48]
[15, 22, 34, 41, 53, 60]
[19, 26, 42, 49, 65, 72, 88]
[20, 29, 45, 54, 70, 79, 95, 104]
[24, 33, 53, 62, 82, 91, 111, 120, 140]
[25, 36, 56, 67, 87, 98, 118, 129, 149, 160]
[29, 40, 64, 75, 99, 110, 134, 145, 169, 180, 204]
[30, 43, 67, 80, 104, 117, 141, 154, 178, 191, 215, 228]
[34, 47, 75, 88, 116, 129, 157, 170, 198, 211, 239, 252, 280]
[35, 50, 78, 93, 121, 136, 164, 179, 207, 222, 250, 265, 293, 308]
[Etc.]
What could be the best searching algorithm with the most optimal Time Complexity for finding a given number?
The rows are sorted
The columns are sorted
A number may occur more than once
Extra info:
Suppose we are looking for the number 26:
Due to order, this means we can eliminate the first 3 rows and the remaining columns to the right.
Due to order, this also means we can ignore every row after row=11.
Which results in this:
[10, 15, 23]
[14, 19, 31]
[15, 22, 34]
[19, 26, 42]
[20, 29, 45]
[24, 33, 53]
[25, 36, 56]
[29, 40, 64]
My current algorithm has a time complexity of O(x log(y)), where x is the number of columns and y is the number of elements binary-searched in each column.
I'm looking for something faster because I'm dealing with a huge amount of data.
Currently I'm running a binary search on every column, but could I binary-search the rows as well, maybe achieving O(log(x) log(y))?
It can be done in O(x)
Let's call the element we are trying to find n
Start with the bottom left element.
For each element we search through (let's call it e):
if e == n: we found it
if e < n: move to the right
Justification:
Every element in e's column at or above e is less than or equal to e, and therefore less than n; the columns further left were already eliminated earlier. None of those elements can == n, so they can all be eliminated.
if e > n: move up
Justification:
All elements below e are greater than or equal to e, so they are greater than n and can be eliminated. What about the values less than e to the left of e? Can't those be == n? No. For e to have moved right and have values to its left, those values must already have been eliminated by the "e < n: move to the right" step.
Repeat until n found or index out of bounds in which case such an element does not exist.
Time complexity:
The worst case is when the element isn't in the array and we end with an index out of bounds. Every step moves either one column to the right or one row up, so the total number of steps is at most the number of rows plus the number of columns, which is O(x).
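A minimal Python sketch of the walk described above. It assumes the data is held as a list of lists in which each row is at least as long as the row above it, as in the question; the names `staircase_search` and `rows` are chosen here for illustration only.

```python
def staircase_search(rows, target):
    """Return (row, col) of one occurrence of target, or None if absent.

    Assumes every row and every column is sorted in ascending order and
    that rows never get shorter going down (the triangular layout above).
    """
    r = len(rows) - 1   # start at the bottom-left element
    c = 0
    # Stop when we walk off the top, or off the right end of the current row.
    while r >= 0 and c < len(rows[r]):
        e = rows[r][c]
        if e == target:
            return (r, c)
        if e < target:
            c += 1      # everything at or above e in this column is too small
        else:
            r -= 1      # everything to the right of e in this row is too large
    return None


rows = [
    [4],
    [5, 8],
    [9, 12, 20],
    [10, 15, 23, 28],
    [14, 19, 31, 36, 48],
    [15, 22, 34, 41, 53, 60],
    [19, 26, 42, 49, 65, 72, 88],
    [20, 29, 45, 54, 70, 79, 95, 104],
]
print(staircase_search(rows, 26))   # (6, 1)
print(staircase_search(rows, 27))   # None
```

Because a number may occur more than once, this returns whichever occurrence the walk reaches first, not every occurrence.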
You can find the bottom left of your trimmed array with a binary search of the first column, and the top right with a binary search of the last column of each row.
From there, the problem degenerates to How do I search for a number in a 2d array sorted left to right and top to bottom? which is well-studied in the linked question. The best algorithm is dependent on the shape of the result.

How to assert list of values in ascending order?

Here is a list of values in an array:
[463, 246, 216, 194, 154, 152, 147, 140, 129, 128, 123, 118, 118, 102, 102, 101, 97, 96, 93, 85]
How can I ensure/assert through RSpec that the array list is in ascending order?
The simplest way is probably:
expect(array.sort).to eq(array)
"Ascending" means "the next element is not smaller than the current". You can encode that into a predicate easily:
expect(array.each_cons(2).all? {|a, b| a <= b }).to be_truthy
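Note that Array#sort is not guaranteed to be stable, so something like
expect(array.sort).to eq(array)
is not a reliable check in general: elements that compare equal under <=> but are not == may come back in a different order. For an array of plain integers, as here, both checks give the same answer.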

prolog - not a function warning

I have this Prolog program where I want to match players with players with the same level (newbie, intermediate, or expert) and server:
player(player29, 408, 183, europe).
player(player30, 462, 97, north-america).
player(player31, 25, 22, asia).
player(player32, 481, 248, asia).
player(player33, 111, 37, asia).
player(player34, 424, 359, north-america).
player(player35, 381, 358, asia).
player(player36, 231, 159, africa).
player(player37, 31, 20, africa).
player(player38, 22, 21, africa).
player(player39, 144, 35, oceania).
player(player40, 30, 25, asia).
player(player41, 221, 112, south-america).
player(player42, 344, 292, africa).
player(player43, 183, 148, asia).
player(player44, 62, 40, africa).
player(player45, 281, 23, north-america).
player(player46, 308, 173, south-america).
player(player47, 127, 125, asia).
player(player48, 441, 393, south-america).
player(player49, 213, 48, oceania).
player(player50, 343, 145, africa).
winrate(X):-player(X, T, W, _); (W/T) * 100.
newbie(X):-winrate(X) < 40.
intermediate(X):-winrate(X) >=40; winrate(X) < 80.
expert(X):-winrate(X) > 80.
However, I get a warning saying "Arithmetic: 'winrate(_G1082)' is not a function" when I compile it. Can someone explain to me what that means?
If I could nominate a misconception in Prolog for greatest all-time beginner stumbling block, it would be this: there is no such thing as a return value in Prolog. So the problem comes down to this definition:
winrate(X) :- player(X, T, W, _); (W/T) * 100.
If you type just that into Prolog, you'll get a mysterious looking warning message:
Warning: user://1:9:
Singleton variable in branch: T
Singleton variable in branch: W
I suspect you think that clause says "Look up T and W for player X and return W/T * 100." What Prolog actually thinks you said there is "Look up T and W for player X, or something over something times one hundred <awkward pause>" which is not particularly meaningful. When you ask winrate(player47), Prolog will stand in the corner with its hands in its pockets and say "um, true?"
The correction is this:
winrate(X, Rate) :- player(X, T, W, _), Rate is (W/T) * 100.
You have the exact same problem a little further down in newbie/1 et al.: winrate(X) < 40 does not have intrinsic meaning in Prolog, because there is no "return value" in Prolog. The corrected expression is winrate(X, WinRate), WinRate < 40.
Note that there is nothing special about the last argument in Prolog. It's common for predicates with one-way semantics to use the last arguments for results, but it isn't really a law and Prolog is not enforcing anything there.
Pay close attention to singleton variable warnings. Nothing meaningful happens in Prolog without variables appearing more than once, so if you get this warning and you immediately replace the named variables with _, does your clause still seem to have enough information to do its work? If not, you have almost certainly missed something or are confused about Prolog's semantics. Believe me, I learned this one from experience.

Mathematical representation of large numbers?

I am attempting to write a function which takes a large number as input (upwards of 800 digits long) and returns a simple formula of no complex math as a string.
By simple math, I mean just numbers with +,-,*,/,^ and () as needed.
'4^25+2^32' = giveMeMath(1125904201809920); // example
Any language would do. I can refactor it, just looking for some help with the logic.
Bonus. The shorter the output the better. Processing time is important. Also, mathematical accuracy is a must.
Update:
to clarify, all input values will be positive integers (no decimals)
I think the entire problem can be recast to a run-length encoding problem on the binary representation of the long integer.
For example, take the following number:
17976931348623159077293051907890247336179769789423065727343008115773
26758055009631327084773224075360211201138798713933576587897688144166
22492847430639474110969959963482268385702277221395399966640087262359
69162804527670696057843280792693630866652907025992282065272811175389
6392184596904358265409895975218053120L
This looks fairly horrendous. In binary, though:
>>> bin(_)
'0b11111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111100000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
0000000'
Which is about 500 ones, followed by 500 zeroes. This suggests an expression like:
2**1024 - 2**512
Which is how I obtained the large number in the first place.
If there are no significantly long runs in the binary representation of the integer, this won't work well at all. 101010101010101010.... is the worst case.
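As a rough sketch of the run-length idea (not a tuned or shortest-output solution), each run of 1-bits covering bit positions lo up to hi-1 equals 2**hi - 2**lo, so one term per run is enough. The function name `runs_expression` is made up for this illustration.

```python
def runs_expression(n):
    """Write a positive integer as one term per run of 1-bits in its binary form."""
    terms = []
    bit = 0
    while n:
        if n & 1:
            lo = bit
            while n & 1:            # consume the whole run of ones
                n >>= 1
                bit += 1
            terms.append((bit, lo))  # run covers bit positions lo .. bit-1
        else:
            n >>= 1
            bit += 1
    parts = []
    for hi, lo in reversed(terms):   # highest run first
        if hi - lo == 1:
            parts.append(f"2**{lo}")                 # single bit
        else:
            parts.append(f"(2**{hi} - 2**{lo})")     # run of several bits
    return " + ".join(parts)


print(runs_expression(2**1024 - 2**512))   # (2**1024 - 2**512)
print(runs_expression(100))                # (2**7 - 2**5) + 2**2
```

The output is a valid Python expression, so `eval` of the result can be used to check that it reproduces the input exactly.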
Here is my attempt in Python:
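def give_me_math(n):
    if n % 2 == 1:
        n = n - 1  # we need to make every odd number even, and add back one later
        odd = 1
    else:
        odd = 0
    exps = []
    while n > 0:
        c = 0
        num = 0
        # Find the largest power of two that still fits in n.
        # Floor division (//) keeps this exact for huge integers;
        # a float division would overflow on 800-digit inputs.
        while num <= n // 2:
            c += 1
            num = 2**c
        exps.append(c)
        n = n - num
    return (exps, odd)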
Results:
>>> give_me_math(100)
([6, 5, 2], 0) #2**6 + 2**5 + 2**2 + 0 = 100
>>> give_me_math(99)
([6, 5, 1], 1) #2**6 + 2**5 + 2**1 + 1 = 99
>>> give_me_math(103)
([6, 5, 2, 1], 1) #2**6 + 2**5 + 2**2 + 2**1 + 1 = 103
I believe the results are accurate, but I am not sure about your other criteria.
Edit:
Result: Calculates in about a second.
>>> give_me_math(10**100 + 3435)
([332, 329, 326, 323, 320, 319, 317, 315, 314, 312, 309, 306, 304, 303, 300, 298, 295, 294, 289, 288, 286, 285, 284, 283, 282, 279, 278, 277, 275, 273, 272, 267, 265, 264, 261, 258, 257, 256, 255, 250, 247, 246, 242, 239, 238, 235, 234, 233, 227, 225, 224, 223, 222, 221, 220, 217, 216, 215, 211, 209, 207, 206, 203, 202, 201, 198, 191, 187, 186, 185, 181, 176, 172, 171, 169, 166, 165, 164, 163, 162, 159, 157, 155, 153, 151, 149, 148, 145, 142, 137, 136, 131, 127, 125, 123, 117, 115, 114, 113, 111, 107, 106, 105, 104, 100, 11, 10, 8, 6, 5, 3, 1], 1)
800 digit works fast too:
>>> give_me_math(10**800 + 3452)
But the output is too long to post here, which is the OP's concern of course.
The number of loop iterations here is roughly O(log(n)^2) (one outer pass per set bit, each scanning up to log2(n) exponents), so it is pretty efficient.
In Java, you should take a look at the BigDecimal class in the java.math package.
I'd suggest you to have a look at
The GMP library (The GNU Multiple Precision Arithmetic Library) for performing the arithmetics
Take a look at integer factorization. The link redirects to Wikipedia which should give probably a good overview. However to be a bit more scientific:
Integer factorization (PDF) by Daniel Bernstein of the University of Illinois
Integer Factorization Algorithms (PDF) by Connelly Barnes of the Department of Physics, Oregon State University

What are some algorithms for finding a closed form function given an integer sequence?

I'm looking for a programmatic way to take an integer sequence and spit out a closed form function. Something like:
Given: 1,3,6,10,15
Return: n(n+1)/2
Samples could be useful; the language is unimportant.
This touches an extremely deep, sophisticated and active area of mathematics. The solution is damn near trivial in some cases (linear recurrences) and damn near impossible in others (think 2, 3, 5, 7, 11, 13, ....) You could start by looking at generating functions for example and looking at Herb Wilf's incredible book (cf. page 1 (2e)) on the subject but that will only get you so far.
But I think your best bet is to give up, query Sloane's comprehensive Encyclopedia of Integer Sequences when you need to know the answer, and instead spend your time reading the opinions of one of the most eccentric personalities in this deep subject.
Anyone who tells you this problem is solvable is selling you snake oil (cf. page 118 of the Wilf book (2e).)
There is no one function in general.
For the sequence you specified, The On-Line Encyclopedia of Integer Sequences finds 133 matches in its database of interesting integer sequences. I've copied the first 5 here.
A000217 Triangular numbers: a(n) = C(n+1,2) = n(n+1)/2 = 0+1+2+...+n.
0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210, 231, 253, 276, 300, 325, 351, 378, 406, 435, 465, 496, 528, 561, 595, 630, 666, 703, 741, 780, 820, 861, 903, 946, 990, 1035, 1081, 1128, 1176, 1225, 1275, 1326, 1378, 1431
A130484 Sum {0<=k<=n, k mod 6} (Partial sums of A010875).
0, 1, 3, 6, 10, 15, 15, 16, 18, 21, 25, 30, 30, 31, 33, 36, 40, 45, 45, 46, 48, 51, 55, 60, 60, 61, 63, 66, 70, 75, 75, 76, 78, 81, 85, 90, 90, 91, 93, 96, 100, 105, 105, 106, 108, 111, 115, 120, 120, 121, 123, 126, 130, 135, 135, 136, 138, 141, 145, 150, 150, 151, 153
A130485 Sum {0<=k<=n, k mod 7} (Partial sums of A010876).
0, 1, 3, 6, 10, 15, 21, 21, 22, 24, 27, 31, 36, 42, 42, 43, 45, 48, 52, 57, 63, 63, 64, 66, 69, 73, 78, 84, 84, 85, 87, 90, 94, 99, 105, 105, 106, 108, 111, 115, 120, 126, 126, 127, 129, 132, 136, 141, 147, 147, 148, 150, 153, 157, 162, 168, 168, 169, 171, 174, 178, 183
A104619 Write the natural numbers in base 16 in a triangle with k digits in the k-th row, as shown below. Sequence gives the leading diagonal.
1, 3, 6, 10, 15, 2, 1, 1, 14, 3, 2, 2, 5, 12, 4, 4, 4, 13, 6, 7, 11, 6, 9, 9, 10, 7, 12, 13, 1, 0, 1, 10, 5, 1, 12, 8, 1, 1, 14, 1, 9, 7, 1, 4, 3, 1, 2, 2, 1, 3, 4, 2, 7, 9, 2, 14, 1, 2, 8, 12, 2, 5, 10, 3, 5, 11, 3, 8, 15, 3, 14, 6, 3, 7, 0, 4, 3, 13, 4, 2, 13, 4, 4, 0, 5, 9, 6, 5, 1, 15, 5, 12, 11, 6
A037123 a(n) = a(n-1) + Sum of digits of n.
0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 46, 48, 51, 55, 60, 66, 73, 81, 90, 100, 102, 105, 109, 114, 120, 127, 135, 144, 154, 165, 168, 172, 177, 183, 190, 198, 207, 217, 228, 240, 244, 249, 255, 262, 270, 279, 289, 300, 312, 325, 330, 336, 343, 351, 360, 370, 381
If you restrict yourself to polynomial functions, this is easy to code up, and only mildly tedious to solve by hand.
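Let $f(n) = a_0 + a_1 n + a_2 n^2 + \dots + a_d n^d$, for some unknown $a_0, \dots, a_d$.
Now solve the equations
$f(1) = s_1,\ f(2) = s_2,\ \dots,\ f(d+1) = s_{d+1}$ (where $s_k$ is the $k$-th term of the sequence),
which is simply a system of linear equations.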
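To make the polynomial approach concrete, here is a small Python sketch that sets up and solves that linear system exactly with rational arithmetic. It assumes the terms are indexed from n = 1 and fits a polynomial of degree at most len(seq) - 1; the function name `fit_polynomial` is just an illustrative choice.

```python
from fractions import Fraction


def fit_polynomial(seq):
    """Return [a_0, ..., a_d] such that sum(a_i * n**i) == seq[n-1] for n = 1..len(seq)."""
    m = len(seq)
    # Augmented matrix: the row for n is [1, n, n^2, ..., n^(m-1) | seq[n-1]].
    rows = [[Fraction(n) ** i for i in range(m)] + [Fraction(s)]
            for n, s in enumerate(seq, start=1)]
    # Gauss-Jordan elimination; the nodes 1..m are distinct, so a pivot always exists.
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [x * inv for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]


coeffs = fit_polynomial([1, 3, 6, 10, 15])
print(coeffs)   # coefficients of 0 + n/2 + n^2/2, i.e. n(n+1)/2
```

A fit like this always succeeds for any finite list, so it only says something useful when the higher-order coefficients come out as zero, as they do here.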
If your data is guaranteed to be expressible as a polynomial, I think you would be able to use R (or any suite that offers regression fitting of data). If your correlation is exactly 1, then the line is a perfect fit to describe the series.
There's a lot of statistics that goes into regression analysis, and I am not familiar enough with even the basics of calculation to give you much detail.
But, this link to regression analysis in R might be of assistance
The Axiom computer algebra system includes a package for this purpose. You can read its documentation here.
Here's the output for your example sequence in FriCAS (a fork of Axiom):
(3) -> guess([1, 3, 6, 10, 15])
                     2
                    n  + 3n + 2
   (3)  [[function= -----------,order= 0]]
                         2
Type: List(Record(function: Expression(Integer),order: NonNegativeInteger))
I think your problem is ill-posed. Given any finite number of integers in a sequence with no generating function, the next element can be anything.
You need to assume something about the sequence. Is it geometric? Arithmetic?
If your sequence comes from a polynomial then divided differences will find that polynomial expressed in terms of the Newton basis or binomial basis. See this.
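For integer-indexed sequences the divided differences reduce to plain forward differences, which gives a very short sketch in Python. It assumes the first term corresponds to n = 0 and needs Python 3.8+ for `math.comb`; the helper names are invented for this example.

```python
from math import comb


def newton_coefficients(seq):
    """Leading entry of each successive difference row.

    The result c satisfies seq[n] == sum(c[k] * C(n, k)) for the given terms.
    """
    coeffs = []
    row = list(seq)
    while row:
        coeffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]   # next difference row
    return coeffs


def evaluate(coeffs, n):
    """Evaluate the binomial-basis polynomial at n."""
    return sum(c * comb(n, k) for k, c in enumerate(coeffs))


c = newton_coefficients([1, 3, 6, 10, 15])
print(c)                # [1, 2, 1, 0, 0]
print(evaluate(c, 5))   # 21, the next triangular number
```

Here 1 + 2*C(n,1) + C(n,2) simplifies to (n^2 + 3n + 2)/2 with n starting at 0.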
There is no general answer; a simple method can be implemented by using Padé approximants. In two words: assume your sequence is the sequence of coefficients of the Taylor expansion of an unknown function, then apply an algorithm (similar to the continued-fraction algorithm) in order to "simplify" this Taylor expansion (more precisely: find a rational function very close to the initial, truncated function). The Maxima program can do it: look at "pade" on the page: http://maxima.sourceforge.net/docs/manual/maxima_28.html
Another answer mentions the "guess" package in the FriCAS fork of Axiom (see the previous answer by jmbr). If I am not wrong, this package is itself inspired by the Rate program by Christian Krattenthaler; you can find it here: http://www.mat.univie.ac.at/~kratt/rate/rate.html Maybe looking at its source could tell you about other methods.
