R: One-Way ANOVA and Scheffé Test (Independent Measures vs. Repeated Measures)

I know how to code a one way ANOVA with 3 groups (a, b, c):
a=c(45,42,36,39,51,44); b=c(50,42,41,35,55,49);
c=c(55,45,43,40,59,56); vec1=c(a,b,c); F0=gl(3,6,18);
df=data.frame(f=F0,d=vec1); model=aov(d~f,data=df);
And I know how to perform, given that, a Scheffé-post-hoc test:
library(DescTools);
ScheffeTest(model)
BUT WHAT if I have a repeated-measures-ANOVA with 3 groups (a, b, c):
df=data.frame(pa=rep(1:6,3),f=F0,d=vec1);
modelRm=aov(d~factor(f)+Error(factor(pa)),data=df)
My question is:
HOW WOULD THE repeated-measures-Scheffé-post-hoc test BE expressed? When I type
ScheffeTest(modelRm)
it raises:
Error in model.frame.default(formula = x ~ g, drop.unused.levels = TRUE) :
invalid type (list) for variable 'x'


Algorithm to precisely compare two exponentiations for very large integers (order of 1 billion)

We want to compare a^b to c^d, and tell if the first is smaller, greater, or equal (where ^ denotes exponentiation).
Obviously, for very large numbers, we cannot explicitly compute these values.
The most common approach in this situation is to apply log on both sides and compare b * log(a) to d * log(c). The issue here is that logs are floating-point operations, and as such we cannot trust our answer with 100% confidence (there might be some values which are incredibly close, and because of floating-point error we get a wrong answer).
Is there an algorithm for solving this problem? I've been scouring the internet for this, but I can only find solutions that work for particular cases (e.g. when one exponent is a multiple of the other), or that use floating point in some way (logarithms, division, etc.).
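The floating-point approach described in the question can be sketched as follows. This is only an illustration, not a fix: the function name float_compare and the tolerance rel_tol are assumptions, and the function returns None whenever the two logarithms are too close for floats to decide, which is exactly the case that needs an exact method.

```python
import math

def float_compare(a, b, c, d, rel_tol=1e-9):
    """Compare a**b with c**d via b*log(a) and d*log(c) in floating point.

    Returns 1 or -1 when the gap is clearly larger than the tolerance,
    and None when the values are too close for floats to be trusted.
    rel_tol is an assumed, deliberately generous error budget.
    """
    lhs = b * math.log(a)
    rhs = d * math.log(c)
    if math.isclose(lhs, rhs, rel_tol=rel_tol, abs_tol=rel_tol):
        return None  # undecided: fall back to an exact algorithm
    return -1 if lhs < rhs else 1

print(float_compare(2, 10, 3, 5))    # 1024 vs 243: clearly greater
print(float_compare(2, 100, 4, 50))  # equal powers: floats cannot decide
```

A None result does not mean the powers are equal, only that this shortcut cannot tell.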
This is sort of two questions in one:
Are they equal?
If not, which one is greater?
As Peter O. observes, it's easiest to build in a language that provides an arbitrary-precision fraction type. I'll use Python 3.
Let's assume without loss of generality that a ≤ c (swap if necessary) and b is relatively prime to d (divide both by the greatest common divisor).
To get at the core of the question, I'm going to assume that a, c > 0 and b, d ≥ 0. Removing this assumption is tedious but not difficult.
Equality test
There are some easy cases where a = 1 or b = 0 or c = 1 or d = 0.
Separately, necessary conditions for a^b = c^d are
i. b ≥ d, since otherwise b < d, which together with a ≤ c implies a^b < c^d;
ii. a is a divisor of c, since we know from (i) that a^b = c^d is a divisor of c^b = c^(b−d) c^d.
When these conditions hold, we can divide through by a^d to reduce the problem to testing whether a^(b−d) = (c/a)^d.
In Python 3:
def equal_powers(a, b, c, d):
    while True:
        lhs_is_one = a == 1 or b == 0
        rhs_is_one = c == 1 or d == 0
        if lhs_is_one or rhs_is_one:
            return lhs_is_one and rhs_is_one
        if a > c:
            a, b, c, d = c, d, a, b
        if b < d:
            return False
        q, r = divmod(c, a)
        if r != 0:
            return False
        b -= d
        c = q

def test_equal_powers():
    for a in range(1, 25):
        for b in range(25):
            for c in range(1, 25):
                for d in range(25):
                    assert equal_powers(a, b, c, d) == (a ** b == c ** d)

test_equal_powers()
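To see the reduction at work, here is a tracing variant of the same loop. The name equal_powers_trace is hypothetical; the logic is copied from the function above, with each intermediate (a, b, c, d) state recorded after a division step.

```python
def equal_powers_trace(a, b, c, d):
    """Same reduction as equal_powers, but also returns the visited states."""
    steps = [(a, b, c, d)]
    while True:
        lhs_is_one = a == 1 or b == 0
        rhs_is_one = c == 1 or d == 0
        if lhs_is_one or rhs_is_one:
            return lhs_is_one and rhs_is_one, steps
        if a > c:
            a, b, c, d = c, d, a, b  # keep the smaller base on the left
        if b < d:
            return False, steps      # condition (i) fails
        q, r = divmod(c, a)
        if r != 0:
            return False, steps      # condition (ii) fails
        b -= d                       # divide both sides by a**d
        c = q
        steps.append((a, b, c, d))

result, steps = equal_powers_trace(4, 3, 8, 2)  # 4**3 == 8**2 == 64
print(result)
for state in steps:
    print(state)
```

For 4^3 versus 8^2 the states shrink to a comparison of 2^0 with 1^1, at which point both sides equal one.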
Inequality test
Once we've established that the two quantities are not equal, it's time to figure out which one is greater. (Without the equality test, the code here could run forever.)
If you're doing this for real, you should consult an actual reference on computing elementary functions. I'm just going to try to do the simplest thing that works.
Time for a calculus refresher. We have the Taylor series
−log x = (1−x) + (1−x)^2/2 + (1−x)^3/3 + (1−x)^4/4 + ...
To get a lower bound, truncate the series. To get an upper bound, we can truncate but replace the final term (1−x)^n/n with (1−x)^n/n (1/x), since
(1−x)^n/n (1/x)
= (1−x)^n/n (1 + (1−x) + (1−x)^2 + ...)
= (1−x)^n/n + (1−x)^(n+1)/n + (1−x)^(n+2)/n + ...
> (1−x)^n/n + (1−x)^(n+1)/(n+1) + (1−x)^(n+2)/(n+2) + ...
To get a good convergence rate, we're going to want 0.5 ≤ x < 1, which we can achieve by dividing x by a power of two.
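The truncation bounds above can be checked numerically with exact rationals. This is a small sketch; the helper name neg_log_bounds is an assumption, and math.log is used only as a sanity check, not as part of the exact computation.

```python
from fractions import Fraction
import math

def neg_log_bounds(x, n):
    """Rational (lower, upper) bounds on -log(x) for 0 < x < 1.

    lower: the first n terms of the series.
    upper: the same, but with the n-th term multiplied by 1/x.
    """
    x = Fraction(x)
    terms = [(1 - x) ** k / k for k in range(1, n + 1)]
    lower = sum(terms)
    upper = sum(terms[:-1]) + terms[-1] / x
    return lower, upper

lower, upper = neg_log_bounds(Fraction(1, 2), 5)
print(lower, upper)                     # exact fractions
print(float(lower), math.log(2), float(upper))
```

With x = 1/2, -log x is log 2, so the float value of math.log(2) should fall between the two bounds.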
In Python, we'll represent a real number as an infinite generator of shrinking intervals that contain the true value. Once the intervals for b log a and d log c are disjoint, we can determine how they compare.
import fractions

def minus(x, y):
    while True:
        x_lo, x_hi = next(x)
        y_lo, y_hi = next(y)
        yield x_lo - y_hi, x_hi - y_lo

def times(b, x):
    for lo, hi in x:
        yield b * lo, b * hi

def restricted_log(a):
    series = 0
    n = 0
    numerator = 1
    while True:
        n += 1
        numerator *= 1 - a
        series += fractions.Fraction(numerator, n)
        yield -(series + fractions.Fraction(numerator * (1 - a), (n + 1) * a)), -series

def log(a):
    n = 0
    while a >= 1:
        a = fractions.Fraction(a, 2)
        n += 1
    return minus(restricted_log(a), times(n, restricted_log(fractions.Fraction(1, 2))))

def less_powers(a, b, c, d):
    lhs = times(b, log(a))
    rhs = times(d, log(c))
    while True:
        lhs_lo, lhs_hi = next(lhs)
        rhs_lo, rhs_hi = next(rhs)
        if lhs_hi < rhs_lo:
            return True
        if rhs_hi < lhs_lo:
            return False

def test_less_powers():
    for a in range(1, 10):
        for b in range(10):
            for c in range(1, 10):
                for d in range(10):
                    if a ** b != c ** d:
                        assert less_powers(a, b, c, d) == (a ** b < c ** d)

test_less_powers()

Cleaner way to represent languages accepted by DFAs?

I am given 2 DFAs. * denotes final states and -> denotes the initial state, defined over the alphabet {a, b}.
1) ->A with a goes to A. -> A with b goes to *B. *B with a goes to *B. *B with b goes to ->A.
The regular expression for this is clearly:
E = a* b(a* + (a* ba* ba*)*)
And the language that it accepts is L1 = {w over {a,b} | w is a b preceded by any number of a's and followed by any number of a's, or w is a b preceded by any number of a's and followed by any number of blocks, each containing two b's with any number of a's before, between, and after them}.
2) ->*A with b goes to ->*A. ->*A with a goes to *B. *B with b goes to ->*A. *B with a goes to C. C with a goes to C. C with b goes to C.
Note: A is both final and initial state. B is final state.
Now the regular expression that I get for this is:
E = b* ((ab) * + a(b b* a)*)
Finally the language that this DFA accepts is:
L2 = {w over {a, b} | w is n b's followed by either k ab's or an a followed by m strings of the form b b^r a, where n, k, m, r >= 0}
Now the question is, is there a cleaner way to represent the languages L1 and L2 because it does seem ugly. Thanks in advance.
E = a* b(a* + (a* ba* ba*)*)
= a*ba* + a*b(a* ba* ba*)*
= a*ba* + a*b(a*ba*ba*)*a*
= a*b(a*ba*ba*)*a*
= a*b(a*ba*b)*a*
This is the language of all strings of a and b containing an odd number of bs. This might be most compactly denoted symbolically as {w in {a,b}* | #b(w) = 1 (mod 2)}.
For the second one: the only way to get to state B is to see an a in A, and the only way to get to C from outside C is to see an a in B. C is a dead state and the only way to get to it is to see aa starting in A. That is: if you ever see two as in a row, the string is not in the language; the language is the set of all strings over a and b not containing the substring aa. This might be most compactly denoted symbolically as {(a+b)*aa(a+b)*}^c where ^c means "complement".
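Both characterisations can be sanity-checked by simulating the two DFAs on every string up to some length. The sketch below encodes the transition tables from the question; the dictionary layout and the names DFA1, DFA2, accepts and check are assumptions made for illustration, and an exhaustive check up to a fixed length is evidence, not a proof.

```python
from itertools import product

# Transition tables as given in the question.
DFA1 = {
    "start": "A",
    "accept": {"B"},
    "delta": {("A", "a"): "A", ("A", "b"): "B",
              ("B", "a"): "B", ("B", "b"): "A"},
}
DFA2 = {
    "start": "A",
    "accept": {"A", "B"},
    "delta": {("A", "b"): "A", ("A", "a"): "B",
              ("B", "b"): "A", ("B", "a"): "C",
              ("C", "a"): "C", ("C", "b"): "C"},
}

def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for ch in word:
        state = dfa["delta"][(state, ch)]
    return state in dfa["accept"]

def check(max_len=8):
    """Compare each DFA with its claimed description on all short strings."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            assert accepts(DFA1, w) == (w.count("b") % 2 == 1)
            assert accepts(DFA2, w) == ("aa" not in w)
    return True

print(check())
```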

How to find the sum of all values for a key using Apache Spark having ((key1,value),(key2,value)) pattern

I have the following dataset:
A B C
(a,c,30)
(a,b,20)
(b,c,10)
(c,d,1)
Now I need to process the above data so that:
each key in column A contributes 2 times its C value,
and each key in column B contributes 3 times its C value,
with the contributions summed per key. So the expected output here is:
a 100 =30*2+20*2
b 80 =20*3+10*2
c 122 =30*3+10*3+1*2
d 3 =1*3
I managed to write the following:
val x = sc.parallelize(List(
  ("a","b",20),
  ("b","c",10),
  ("a","c",30),
  ("c","d",1)
))
val myVal = x.map({
  case (a,b,c) => ((a-> 2 * c), (b -> 3 * c))
})
myVal.foreach(println)
output-
((a,60),(c,90))
((c,2),(d,3))
((a,40),(b,60))
((b,20),(c,30))
After that, I am not able to break it down further.
How can I get the expected result using Spark with Scala?
The point is to make it flat first - associate one value with one key. Then it'd be possible to use reduceByKey operation to sum it up.
I'm not a Scala developer, but something like this would probably work.
// x is the original RDD of (A, B, C) triples from the question
x
  .flatMap({ case (a, b, c) => List(a -> 2 * c, b -> 3 * c) })
  .reduceByKey((a, b) => a + b)
  .foreach(println(_))
List here is an additional object that has to be created each time and it might be better to avoid it. So, something like this might work - look through the data twice, but cache it before.
// x is the original RDD of (A, B, C) triples from the question
x.cache()
  .map({ case (a, b, c) => a -> 2 * c })
  .union(x.map({ case (a, b, c) => b -> 3 * c }))
  .reduceByKey((a, b) => a + b)
  .foreach(println(_))
x.unpersist()
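The flatten-then-sum idea can be checked without a cluster. The following is a plain-Python sketch of the same logic, not Spark code: flat_pairs and reduce_by_key are hypothetical stand-ins for what flatMap and reduceByKey do to the data.

```python
from collections import defaultdict

# The sample rows (A, B, C) from the question.
rows = [("a", "c", 30), ("a", "b", 20), ("b", "c", 10), ("c", "d", 1)]

def flat_pairs(rows):
    """Emit one (key, value) pair per column: 2*C for A, 3*C for B."""
    for a, b, c in rows:
        yield a, 2 * c
        yield b, 3 * c

def reduce_by_key(pairs):
    """Sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

totals = reduce_by_key(flat_pairs(rows))
print(sorted(totals.items()))  # [('a', 100), ('b', 80), ('c', 122), ('d', 3)]
```

The totals match the expected output listed in the question.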

Maximal sets intersection

Given 5 finite sets a,b,c,d,e. Each set is assigned the arbitrary number:
a = 100, b = 34, c = 15, d = 89, e = 57
The complement of each set is assigned the same number, negated; e.g. (a') gets -100.
We need to choose, for each set, either the set or its complement, so that the intersection of all five choices is non-empty and the sum of the assigned numbers is maximal.
I only see one brute force solution to this problem, but it will be very inefficient and it's not elegant. In this case we just generate all combinations and resolve them to see if they are not empty, combinations look like this:
{a∩b'∩c'∩d'∩e'}, {a'∩b∩c'∩d∩e'}, {a'∩b'∩c∩d'∩e'}, {a'∩b'∩c'∩d∩e'}, {a'∩b'∩c'∩d'∩e} {a∩b∩c'∩d'∩e'}, {a∩b'∩c∩d'∩e'}, {a∩b'∩c'∩d∩e}, {a∩b'∩c'∩d'∩e}, {a'∩b∩c∩d'∩e'} {a'∩b∩c'∩d∩e'} {a'∩b∩c'∩d'∩e} ...
and then just pick the max number.
Looking forward to see if someone can think of something better :)
Define score(x, X) to be the value of set X if x is in X, otherwise its negation.
Then, letting * represent an element that's not in any of the 5 sets, the highest score possible is:
max_{x in union(A, B, C, D, E, {*})} sum_{X in A, B, C, D, E} score(x, X)
This follows from the observation that any particular x is either in a set or its complement. You don't actually have to compute the union here. In Python you might write:
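def max_config(A, B, C, D, E):
    # score(x, X) is the function defined above; None plays the role of *.
    best = float("-inf")  # max(None, ...) raises TypeError in Python 3
    for S in A, B, C, D, E, {None}:
        for x in S:
            best = max(best, sum(score(x, X) for X in (A, B, C, D, E)))
    return best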
Assuming a set membership test is O(1), this has complexity O(N), where N is the total size of the given sets.
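A self-contained version might look like the sketch below. The example sets are made up for illustration, the sets are passed as (set, value) pairs rather than looked up globally, and, as in the answer, it assumes that some element outside all five sets exists. It builds the union only to avoid scoring an element twice.

```python
def score(x, members, value):
    """Value of the set if x belongs to it, otherwise the negated value."""
    return value if x in members else -value

def max_config(weighted_sets):
    """weighted_sets: list of (set, value) pairs.

    Returns (best_score, witness), where witness is an element realising the
    best score, or None if the best choice is the element outside every set.
    """
    outside = object()  # stands in for the element * outside all sets
    candidates = set().union(*(s for s, _ in weighted_sets))
    best_score, best_x = None, None
    for x in list(candidates) + [outside]:
        total = sum(score(x, s, v) for s, v in weighted_sets)
        if best_score is None or total > best_score:
            best_score, best_x = total, x
    return best_score, (None if best_x is outside else best_x)

# Hypothetical contents for a, b, c, d, e with the values from the question.
example = [({1, 2}, 100), ({2, 3}, 34), ({3}, 15), ({1, 4}, 89), ({5}, 57)]
print(max_config(example))  # (83, 1): element 1 lies in a and d only
```

Here element 1 gives 100 - 34 - 15 + 89 - 57 = 83, so the best intersection is a∩b'∩c'∩d∩e'.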

How is this expression evaluated

Can anyone say how ruby evaluates this:
a = 1
b = 2
a, b = b, a + b
a will be 2 and b will be 3, not 4 as you might expect
It seems that instead of working from left to right it does both sides in parallel somehow?
It is evaluated as:
a = 1
b = 2
a, b = b, (a + b)
a # => 2
b # => 3
This is called parallel assignment. All right-hand-side expressions are evaluated first, from left to right. After that, the assignments are performed from left to right.
The calculation goes as follows:
a, b = b, a + b
a, b = 2, (1 + 2)
a, b = 2, 3 # the actual assignment happens here.
This is called parallel assignment and, as the name suggests, it works as if all the assignments were done in parallel. For example, you can write:
a = 1
b = 2
a, b = b, a
a #=> 2
b #=> 1
a = 1
b = 2
a, b = b, a + b
a
#=> 2
b
#=> 3
Here the first rvalue is assigned to the first lvalue and the result of the second right-hand expression is assigned to the second lvalue. These assignments are parallel in nature, not sequential.
a, b = b, a is a swap operation using parallel assignment. This makes me think Ruby might be using temporary variables to perform parallel assignments. Corrections are welcome.
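The temporary-variable picture can be made concrete. The sketch below uses Python as a stand-in, since its tuple assignment also evaluates the whole right-hand side before binding any name; it says nothing about how Ruby implements this internally, and the function names are made up for illustration.

```python
def parallel_assign_demo():
    a, b = 1, 2
    a, b = b, a + b      # right-hand side is evaluated first: (2, 3)
    return a, b

def explicit_temporaries_demo():
    a, b = 1, 2
    tmp_first = b        # evaluate every right-hand expression...
    tmp_second = a + b
    a = tmp_first        # ...then assign, left to right
    b = tmp_second
    return a, b

def sequential_demo():
    a, b = 1, 2
    a = b                # a is already 2 here
    b = a + b            # so this is 2 + 2
    return a, b

print(parallel_assign_demo())       # (2, 3)
print(explicit_temporaries_demo())  # (2, 3)
print(sequential_demo())            # (2, 4)
```

The sequential version is the one that produces 4, which is the result the question expected.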
