What does set() mean in the PySpark RDD code below?

What does set([x[1]]) mean in the code below, and more generally, what does set do? Thanks.
result_rdd = joined_df. \
    map(lambda x: ((x[1], str(x[3])), (float(x[8]), int(x[0])))). \
    combineByKey(
        lambda x: (x[0], set([x[1]])),
        lambda x, y: (x[0] + y[0], x[1] | set([y[1]])),
        lambda x, y: (x[0] + y[0], x[1] | y[1])). \
    map(lambda x: (x[0][0], x[0][1], x[1][0], len(x[1][1])))

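A set is a data structure that holds no duplicate elements.
So set([y[1]]) puts the single value y[1] into a list and converts that list into a one-element set. The | operator then takes the union of two sets, so a value that occurs more than once for the same key is stored only once, and len(x[1][1]) in the final map counts the distinct values per key.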
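The effect of the set can be seen without a Spark cluster. The following is a minimal sketch in plain Python: `combine_by_key` is a hypothetical single-partition stand-in for `combineByKey` (so the third, partition-merging function is not needed), and the records are made up for illustration.

```python
def combine_by_key(pairs, create_combiner, merge_value):
    """Hypothetical single-partition stand-in for RDD.combineByKey."""
    combined = {}
    for key, value in pairs:
        if key in combined:
            combined[key] = merge_value(combined[key], value)
        else:
            combined[key] = create_combiner(value)
    return combined

# Made-up (key, (amount, order_id)) records; order 1 appears twice for key "a".
records = [("a", (10.0, 1)), ("a", (5.0, 1)), ("a", (2.5, 2)), ("b", (1.0, 3))]

totals = combine_by_key(
    records,
    lambda v: (v[0], set([v[1]])),                         # first value seen for a key
    lambda acc, v: (acc[0] + v[0], acc[1] | set([v[1]])),  # fold in another value
)

# Sum of amounts and number of distinct order ids per key.
summary = {key: (total, len(ids)) for key, (total, ids) in totals.items()}
print(summary)  # {'a': (17.5, 2), 'b': (1.0, 1)}
```

Key "a" has three records but only two distinct order ids, which is why the set, rather than a list, is used as the accumulator.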


pow(X,Y,Z) <=> Z = X^Y with add

Would it be possible to write "pow" using an "add" predicate (or just X is Y + Z)?
I wrote this:
pow(0,1,1).
pow(_,0,1).
pow(X,Y,Z) :- Y1 is Y - 1, pow(X,Y1,Z1), Z is Z1 * X.
But I also want to write it using only "+" (just for practise), e.g. 3^2 = 3 * 3 = 3 + 3 + 3.
You can write the multiplication (mul/3) in terms of addition, like this:
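%% 0^1 = 0 follows from the recursive clause, so no separate fact is needed.
pow(_,0,1).
pow(X,Y,Z) :-
    Y > 0,       %% Y = 1 must reduce to the base case pow(_,0,1)
    Y1 is Y - 1,
    pow(X,Y1,Z1),
    mul(Z1,X,Z). %% originally: Z is Z1 * X.
mul(0,_,0).
mul(I,A,R) :-
    I > 0,
    I1 is I-1,
    mul(I1,A,R1),
    R is R1 + A.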
Usually a basic exercise is to write addition, multiplication, and power predicates with the Peano number representation. In that case addition is written with the successor functor.
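For readers who want to check the recursion outside Prolog, the same idea can be sketched in Python. The function names `mul` and `power` are chosen here to mirror the predicates above, and both assume non-negative integer counts.

```python
def mul(i, a):
    """Return i * a using only addition; i must be a non-negative integer."""
    result = 0
    for _ in range(i):
        result += a
    return result


def power(x, y):
    """Return x ** y using only mul (hence only addition); y must be >= 0."""
    result = 1
    for _ in range(y):
        # Mirrors mul(Z1, X, Z): add X to itself Z1 times.
        result = mul(result, x)
    return result


print(power(3, 2))   # 9
print(power(2, 10))  # 1024
print(power(0, 1))   # 0
```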

Invariant induction over horn-clauses with Z3py

I am currently using Z3py to deduce some invariants which are encoded as a conjunction of Horn clauses, whilst also providing a template for the invariant. I'm starting with a simple example, shown in the code snippet below.
x = 0;
while(x < 5){
    x += 1
}
assert(x == 5)
This translates into the horn clauses
x = 0 => Inv(x)
x < 5 /\ Inv(x) => Inv(x +1)
Not( x < 5) /\ Inv(x) => x = 5
The invariant here is x <= 5.
I have provided a template for the invariant of the form a*x + b <= c
so that all the solver has to do is guess a set of values for a,b and c that can reduce to x <= 5.
However, when I encode it I keep getting unsat. If I try to assert Not(x == 5) I get a = 2, b = 1/8 and c = 2, which makes little sense to me as a counterexample.
I provide my code below and would be grateful for any help on correcting my encoding.
x = Real('x')
x_2 = Real('x_2')
a = Real('a')
b = Real('b')
c = Real('c')
s = Solver()
s.add(ForAll([x],And(
    Implies(x == 0 , a*x + b <= c),
    Implies(And(x_2 == x + 1, x < 5, a*x + b <= c), a*x_2 + b <= c),
    Implies(And(a*x + b <= c, Not(x < 5)), x==5)
)))
if (s.check() == sat):
    print(s.model())
Edit: it gets stranger for me. If I remove the x_2 definition, replace x_2 with (x + 1) in the second Horn clause, and delete the x_2 == x + 1 constraint, I get unsat whether I write Not(x == 5) or x == 5 in the final Horn clause.
There were two things preventing your original encoding from working:
1) It's not possible to satisfy x_2 == x + 1 for all x for a single value of x_2. Thus, if you're going to write x_2 == x + 1, both x and x_2 need to be universally quantified.
2) Somewhat surprisingly, this problem is satisfiable in the integers but not in the reals. You can see the problem with the clause x < 5 /\ Inv(x) => Inv(x + 1). If x is an integer, then this is satisfied by x <= 5. However, if x is allowed to be any real value, then you could have x == 4.5, which satisfies both x < 5 and x <= 5, but not x + 1 <= 5, so Inv(x) = (x <= 5) does not satisfy this problem in the reals.
Also, you might find it helpful to define Inv(x); it cleans up the code quite a bit. Here is the encoding of your problem with those changes:
from z3 import *
# Changing these from 'Int' to 'Real' changes the problem from sat to unsat.
x = Int('x')
x_2 = Int('x_2')
a = Int('a')
b = Int('b')
c = Int('c')
def Inv(x):
    return a*x + b <= c
s = Solver()
# I think this is the simplest encoding for your problem.
clause1 = Implies(x == 0 , Inv(x))
clause2 = Implies(And(x < 5, Inv(x)), Inv(x + 1))
clause3 = Implies(And(Inv(x), Not(x < 5)), x == 5)
s.add(ForAll([x], And(clause1, clause2, clause3)))
# Alternatively, if clause2 is specified with x_2, then x_2 needs to be
# universally quantified. Note the ForAll([x, x_2]...
#clause2 = Implies(And(x_2 == x + 1, x < 5, Inv(x)), Inv(x_2))
#s.add(ForAll([x, x_2], And(clause1, clause2, clause3)))
# Print result all the time, to avoid confusing unknown with unsat.
result = s.check()
print(result)
if (result == sat):
    print(s.model())
One more thing: it's a bit strange to me to write a*x + b <= c as a template, because this is the same as a*x <= d for some integer d.
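The integers-versus-reals point can be checked by hand without a solver. The sketch below does not use Z3 at all: it simply evaluates the three Horn clauses for the candidate invariant Inv(x) = (x <= 5). The helper names `inv`, `implies` and `clauses_hold` are illustrative, and the integer check only samples a finite range, so it is a sanity check rather than a proof.

```python
from fractions import Fraction


def inv(x):
    """Candidate invariant: x <= 5."""
    return x <= 5


def implies(p, q):
    return (not p) or q


def clauses_hold(x):
    """Evaluate the three Horn clauses at one concrete value of x."""
    c1 = implies(x == 0, inv(x))
    c2 = implies(x < 5 and inv(x), inv(x + 1))
    c3 = implies(inv(x) and not (x < 5), x == 5)
    return c1 and c2 and c3


# Every sampled integer satisfies all three clauses.
int_ok = all(clauses_hold(x) for x in range(-50, 51))

# x = 4.5 satisfies x < 5 and Inv(x), but Inv(x + 1) is false.
real_counterexample = Fraction(9, 2)

print(int_ok, clauses_hold(real_counterexample))  # True False
```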

Drawing concentric tiling circles with even diameter

I need to draw circles using pixels with these constraints:
the total number of pixels across the diameter is an even number,
there are no empty pixels between two circles of radius R and R+1 (R is an integer).
The midpoint algorithm can’t be used, but I found out that Eric Andres wrote the exact thing I want. The algorithm can be found in this article under the name of “half integer centered circle”. For those who don’t have access to it, I put the interesting part at the end of the question.
I am having difficulty implementing the algorithm. I copied the algorithm into Processing using the Python syntax (for ease of visualisation):
def half_integer_centered_circle(xc, yc, R):
    x = 1
    y = R
    d = R
    while y >= x:
        point(xc + x, yc + y)
        point(xc + x, yc - y + 1)
        point(xc - x + 1, yc + y)
        point(xc - x + 1, yc - y + 1)
        point(xc + y, yc + x)
        point(xc + y, yc - x + 1)
        point(xc - y + 1, yc + x)
        point(xc - y + 1, yc - x + 1)
        if d > x:
            d = d - x
            x = x + 1
        elif d < R + 1 - y:
            d = d + y - 1
            y = y - 1
        else:
            d = d + y - x - 1
            x = x + 1
            y = y - 1
The point() function just plots a pixel at the given coordinates. Please also note that in the article x is initialised as S, which is strange because S appears nowhere else (it is not explained at all); however, it is said that the circle begins at (x, y) = (1, R), so I wrote x = 1.
Here is the result I get for radii between 1 pixel and 20 pixels:
As you can see, there are holes between circles and the circle with R = 3 is different from the given example (see below). Also, the circles are not really round compared to what you get with the midpoint algorithm.
How can I get the correct result?
Original Eric Andres’ algorithm:
I don't understand the way in which the algorithm has been presented in that paper. As I read it, the else if clause associated with case (b) doesn't have a preceding if. I get the same results as you when transcribing it as written.
Looking at the text, rather than the pseudocode, the article seems to be suggesting an algorithm of the following form:
x = 1
y = R
while x is less than or equal to y:
    draw(x, y)
    # ...
    if the pixel to the right has radius between R - 1/2 and R + 1/2:
        move one pixel to the right
    else if the pixel below has radius between R - 1/2 and R + 1/2:
        move one pixel down
    else:
        move one pixel diagonally down and right
Which seems plausible. In python:
#!/usr/bin/python3
import numpy as np
import matplotlib.pyplot as pp
fg = pp.figure()
ax = fg.add_subplot(111)
def point(x, y, c):
    xx = [x - 1/2, x + 1/2, x + 1/2, x - 1/2, x - 1/2 ]
    yy = [y - 1/2, y - 1/2, y + 1/2, y + 1/2, y - 1/2 ]
    ax.plot(xx, yy, 'k-')
    ax.fill_between(xx, yy, color=c, linewidth=0)
def half_integer_centered_circle(R, c):
    x = 1
    y = R
    while y >= x:
        point(x, y, c)
        point(x, - y + 1, c)
        point(- x + 1, y, c)
        point(- x + 1, - y + 1, c)
        point(y, x, c)
        point(y, - x + 1, c)
        point(- y + 1, x, c)
        point(- y + 1, - x + 1, c)
        def test(x, y):
            rSqr = x**2 + y**2
            return (R - 1/2)**2 < rSqr and rSqr < (R + 1/2)**2
        if test(x + 1, y):
            x += 1
        elif test(x, y - 1):
            y -= 1
        else:
            x += 1
            y -= 1
for i in range(1, 5):
    half_integer_centered_circle(2*i - 1, 'r')
    half_integer_centered_circle(2*i, 'b')
pp.axis('equal')
pp.show()
This seems to work as intended. Note that I removed the circle centre for simplicity. It should be easy enough to add in again.
Edit: I realised I could match the radius 3 image if I tweaked the logic a bit.
I have been looking into this matter and observed three issues in the original paper:
The arithmetic circle copied here (Figure 10.a in the paper) is not consistent with the formal definition of the "half integer centered circle". In one case the distance to the center must be between R-1/2 and R+1/2 and in the other between integer values. The consequence is that this specific algorithm, if properly implemented, can never generate the circle of Figure 10.a.
There is a mistake in one of the inequalities of the algorithm pseudo code: the test for case (b) should be d <= (R + 1 - y) instead of d < (R + 1 - y).
All those pixels that satisfy x==y have only 4-fold symmetry (not 8-fold) and are generated twice by the algorithm. Although producing duplicated pixels may not be a problem for a drawing routine, it is not acceptable for the application that I am interested in. However this can be easily fixed by adding a simple check of the x==y condition and skipping the four duplicated pixels.
The Python code of the original question includes the inequality error mentioned above and an additional mistake due to missing parentheses in one of the expressions, which should read d = d + (y - x - 1).
The following implementation fixes all this and is compatible with python2 and python3 (no integer division issues in the point() function):
import numpy as np
import matplotlib.pyplot as pp
fg = pp.figure()
ax = fg.add_subplot(111)
def point(x, y, c):
    xx = [x - 0.5, x + 0.5, x + 0.5, x - 0.5, x - 0.5 ]
    yy = [y - 0.5, y - 0.5, y + 0.5, y + 0.5, y - 0.5 ]
    ax.plot(xx, yy, 'k-')
    ax.fill_between(xx, yy, color=c, linewidth=0)
def half_integer_centered_circle(R, c):
    x = 1
    y = R
    d = R
    while y >= x:
        point(x, y, c)
        point(x, - y + 1, c)
        point(- x + 1, y, c)
        point(- x + 1, - y + 1, c)
        if y != x:
            point(y, x, c)
            point(y, - x + 1, c)
            point(- y + 1, x, c)
            point(- y + 1, - x + 1, c)
        if d > x:
            d = d - x
            x = x + 1
        elif d <= R + 1 - y:
            d = d + y - 1
            y = y - 1
        else:
            d = d + (y - x - 1)
            x = x + 1
            y = y - 1
for i in range(1, 5):
    half_integer_centered_circle(2*i - 1, 'r')
    half_integer_centered_circle(2*i, 'b')
pp.axis('equal')
pp.show()
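To check the two requirements from the question (even diameter, no holes between consecutive radii) without plotting, the same loop can collect pixels into a list instead of drawing them. This is only a sketch: `half_integer_circle_pixels` is a name introduced here, the circle is centred as in the code above, and only the small radii 1 to 3 are inspected.

```python
def half_integer_circle_pixels(R):
    """Return the pixels of the radius-R circle as a list of (x, y) tuples."""
    pixels = []
    x, y, d = 1, R, R
    while y >= x:
        pixels += [(x, y), (x, -y + 1), (-x + 1, y), (-x + 1, -y + 1)]
        if y != x:
            # The mirrored octant would duplicate pixels when x == y.
            pixels += [(y, x), (y, -x + 1), (-y + 1, x), (-y + 1, -x + 1)]
        if d > x:
            d -= x
            x += 1
        elif d <= R + 1 - y:
            d += y - 1
            y -= 1
        else:
            d += y - x - 1
            x += 1
            y -= 1
    return pixels


rings = {R: half_integer_circle_pixels(R) for R in (1, 2, 3)}
for R, ring in rings.items():
    xs = [px for px, _ in ring]
    width = max(xs) - min(xs) + 1
    # Radius, pixel count, number of distinct pixels, diameter in pixels.
    print(R, len(ring), len(set(ring)), width)

covered = set().union(*rings.values())
print(len(covered))  # 32: a 6x6 block minus its four corners
```

For these radii the diameter is 2R pixels, no pixel is produced twice, and the three rings together fill the disc without gaps.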

how to calculate a quadratic equation that best fits a set of given data

I have a vector X of 20 real numbers and a vector Y of 20 real numbers.
I want to model them as
y = ax^2+bx + c
How to find the value of 'a' , 'b' and 'c'
and best fit quadratic equation.
Given Values
X = (x1,x2,...,x20)
Y = (y1,y2,...,y20)
i need a formula or procedure to find following values
a = ???
b = ???
c = ???
Thanks in advance.
Everything @Bartoss said is right, +1. I figured I would just add a practical implementation here, without QR decomposition. You want to find the values of a, b, c such that the distance between the measured and fitted data is minimal. As a measure you can pick
sum((a*x^2 + b*x + c - y)^2)
where the sum is over the elements of vectors x,y.
Then, a minimum implies that the derivative of the quantity with respect to each of a,b,c is zero:
d(sum((a*x^2 + b*x + c - y)^2))/da = 0
d(sum((a*x^2 + b*x + c - y)^2))/db = 0
d(sum((a*x^2 + b*x + c - y)^2))/dc = 0
These equations are
2*sum((a*x^2 + b*x + c - y)*x^2) = 0
2*sum((a*x^2 + b*x + c - y)*x) = 0
2*sum(a*x^2 + b*x + c - y) = 0
Dividing by 2 and rearranging, the above can be rewritten as
a*sum(x^4) +b*sum(x^3) + c*sum(x^2) =sum(y*x^2)
a*sum(x^3) +b*sum(x^2) + c*sum(x) =sum(y*x)
a*sum(x^2) +b*sum(x) + c*N =sum(y)
where N=20 in your case. A simple Python script showing how to do this follows.
from numpy import random, array
from scipy.linalg import solve
import matplotlib.pylab as plt
a, b, c = 6., 3., 4.
N = 20
x = random.rand(N)
y = a * x ** 2 + b * x + c
y += random.rand(N) #add a bit of noise to make things more realistic
x4 = (x ** 4).sum()
x3 = (x ** 3).sum()
x2 = (x ** 2).sum()
M = array([[x4, x3, x2], [x3, x2, x.sum()], [x2, x.sum(), N]])
K = array([(y * x ** 2).sum(), (y * x).sum(), y.sum()])
A, B, C = solve(M, K)
print('exact values ', a, b, c)
print('calculated values', A, B, C)
fig, ax = plt.subplots()
ax.plot(x, y, 'b.', label='data')
ax.plot(x, A * x ** 2 + B * x + C, 'r.', label='estimate')
ax.legend()
plt.show()
A quicker way to implement the solution is to use a nonlinear least squares algorithm. This is faster to write, but not faster to run. Using the one provided by scipy:
from scipy.optimize import leastsq
def f(arg):
    a, b, c = arg
    return a*x**2 + b*x + c - y
(A, B, C), _ = leastsq(f, [1, 1, 1])  # you must provide a first guess to start with in this case.
That is a linear least squares problem. I think the easiest method which gives accurate results is QR decomposition using Householder reflections. It is not something that can be explained in a Stack Overflow answer, but I hope you will find all that is needed with these links.
If you have never heard about these before and don't know how they connect with your problem:
A = [[x1^2, x1, 1]; [x2^2, x2, 1]; ...]
Y = [y1; y2; ...]
Now you want to find v = [a; b; c] such that A*v is as close as possible to Y, which is exactly what the least squares problem is all about.
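One way to solve that least squares problem for a small, well-conditioned data set is through the normal equations A^T A v = A^T Y, which is the same 3x3 system of sums written out above. The sketch below solves it with exact rational arithmetic and no third-party libraries. `fit_quadratic` is a name introduced here, the sample data are made up, and this is not the QR approach recommended for accuracy on difficult data.

```python
from fractions import Fraction


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x ** 2 * y for x, y in zip(xs, ys))
    # Augmented 3x4 matrix of the normal equations.
    m = [[s4, s3, s2, t2],
         [s3, s2, s1, t1],
         [s2, s1, Fraction(len(xs)), t0]]
    # Gauss-Jordan elimination; raises StopIteration if the system is singular
    # (for example, fewer than three distinct x values).
    for col in range(3):
        pivot = next(r for r in range(col, 3) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(3):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return tuple(row[3] for row in m)


# Made-up data lying exactly on y = 2x^2 - 3x + 1.
xs = [0, 1, 2, 3, 4]
ys = [2 * x * x - 3 * x + 1 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)  # 2 -3 1
```

With noisy data the returned coefficients are the exact least-squares solution as fractions; convert them with float() if decimals are preferred.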

Ruby - newlines and operators

Consider the following code:
x = 4
y = 5
z = (y + x)
puts z
As you'd expect, the output is 9. If you introduce a newline:
x = 4
y = 5
z = y
+ x
puts z
Then it outputs 5. This makes sense, because it's interpreted as two separate statements (z = y and +x).
However, I don't understand how it works when you have a newline within parentheses:
x = 4
y = 5
z = (y
+ x)
puts z
The output is 4. Why?
(Disclaimer: I'm not a Ruby programmer at all. This is just a wild guess.)
With parens, you get z being assigned the value of
y
+x
Parentheses can hold several expressions separated by newlines, and the whole group evaluates to the value of the last one executed, here +x, which is 4.
End the line with \ in order to continue the expression on the next line. This gives the proper output:
x = 4
y = 5
z = (y \
+ x)
puts z
outputs 9
I don't know why the result is unexpected without escaping the newline. I just learned never to do that.
You won't need the escape character \ if the line ends with the operator:
a = 4
b = 5
z = a +
b
puts z
# => 9
