Julia sparse matrix - matrix

I have a vector y_vec. How can I convert it to a matrix of the form Y_matrix?
y_vec = [0; 1; 1; 2; 3; 4]
Y_matrix = [1 0 0 0 0
0 1 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1]
So far, I've tried using a for loop.
Y_mat = full(spzeros(length(y_vec), length(unique(y_vec))))
for (i,j) in enumerate(1:length(y_vec))
Y_mat[i, y_vec[j]+1] = 1
end
But there seems to be a problem when the values in y_vec are not contiguous, say y_vec = [0; 1; 1; 2; 3; 4; 8]: the for loop fails. How can I get around this issue?
Is there a way to solve the above problem using a sparse matrix in Julia?

You can use the sparse matrix constructor sparse(I,J,V):
y_vec = [0; 1; 1; 2; 3; 4; 8]
I = collect(1:length(y_vec))
J = y_vec+1
V = ones(length(y_vec))
S = sparse(I,J,V)
full(S)
julia> full(S)
7x9 Array{Float64,2}:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
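For readers who want to see the triplet idea outside Julia, here is a minimal pure-Python sketch (not the answer's code). It builds the same (row, column) pairs twice: once with column = value, as in the answer, and once with columns numbered by the rank of each distinct value, which gives the compact length(unique(y_vec)) width the question's loop was aiming for. The helper names one_hot_triplets, one_hot_compact and to_dense are made up for illustration.

```python
def one_hot_triplets(y_vec):
    """One entry per element; the column is the value itself (0-based)."""
    rows = list(range(len(y_vec)))
    cols = list(y_vec)  # Python is 0-based, so no +1 shift as in Julia
    shape = (len(y_vec), max(y_vec) + 1)
    return rows, cols, shape


def one_hot_compact(y_vec):
    """One entry per element; the column is the rank of the value among the distinct values."""
    labels = sorted(set(y_vec))
    col_of = {label: k for k, label in enumerate(labels)}
    rows = list(range(len(y_vec)))
    cols = [col_of[v] for v in y_vec]
    return rows, cols, (len(y_vec), len(labels))


def to_dense(rows, cols, shape):
    """Expand the coordinate pairs into a dense list of lists of 0/1."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for r, c in zip(rows, cols):
        dense[r][c] = 1
    return dense


y_vec = [0, 1, 1, 2, 3, 4, 8]
for builder in (one_hot_triplets, one_hot_compact):
    rows, cols, shape = builder(y_vec)
    print(builder.__name__, shape)
    for line in to_dense(rows, cols, shape):
        print(line)
```

The first variant leaves empty columns for the missing values 5, 6 and 7; the second has no gaps, at the cost of losing the direct value-to-column correspondence.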


Python3 : unpack requires a bytes object of length 117

I am trying to run a script for the astronomical algorithm VSOP2013,
but when I run it, it raises an error at line 178.
How can I solve this?
What's wrong with the unpack function?
FYI, the original script is Python 2; I am using Python 3.
a = self.fmt.unpack(terms.encode())
error: unpack requires a bytes object of length 117
This is my full python3 script, adapted from the original python2 version from http://domenicomustara.blogspot.co.id
# -*- coding: utf-8 -*-
import gmpy2 as gmp
import struct
import ctypes
gmp.get_context().precision=200
def cal2jul(year, month, day, hour=0, minute =0, second=0):
month2 = month
year2 = year
if month2 <= 2:
year2 -= 1
month2 += 12
if (year*10000 + month*100 + day) > 15821015:
a = int(year2/100)
b = 2 - a + int(a/4)
else:
a = 0
b = 0
if year < 0:
c = int((365.25 * year2)-0.75)
else:
c = int(365.25 * year2)
d = int(30.6001 *(month2 + 1))
return b + c + d + day + hour / 24.0 + minute / 1440.0 + second / 86400.0 + 1720994.5
class VSOP2013():
def __init__(self, t, planet, precision=1e-7):
# calculate millennia from J2000
self.JD = t
self.t = gmp.div((t - cal2jul(2000,1,1,12)), 365250.0)
# predefine powers of self.t
self.power = []; self.power.append(gmp.mpfr(1.0)); self.power.append(self.t)
for i in range(2,21):
t = self.power[-1]
self.power.append(gmp.mul(self.t,t))
# choose planet file in a dict
self.planet = planet
self.planets = {'Mercury':'VSOP2013p1.dat',
'Venus' :'VSOP2013p2.dat',
'EMB' :'VSOP2013p3.dat',
'Mars' :'VSOP2013p4.dat',
'Jupiter':'VSOP2013p5.dat',
'Saturn' :'VSOP2013p6.dat',
'Uranus' :'VSOP2013p7.dat',
'Neptune':'VSOP2013p8.dat',
'Pluto' :'VSOP2013p9.dat'}
# VSOP2013 routines precision
self.precision = precision
# lambda coefficients
# l(1,13) : linear part of the mean longitudes of the planets (radian).
# l(14): argument derived from TOP2013 and used for Pluto (radian).
# l(15,17) : linear part of Delaunay lunar arguments D, F, l (radian).
self.l = (
(gmp.mpfr(4.402608631669), gmp.mpfr(26087.90314068555)),
(gmp.mpfr(3.176134461576), gmp.mpfr(10213.28554743445)),
(gmp.mpfr(1.753470369433), gmp.mpfr(6283.075850353215)),
(gmp.mpfr(6.203500014141), gmp.mpfr(3340.612434145457)),
(gmp.mpfr(4.091360003050), gmp.mpfr(1731.170452721855)),
(gmp.mpfr(1.713740719173), gmp.mpfr(1704.450855027201)),
(gmp.mpfr(5.598641292287), gmp.mpfr(1428.948917844273)),
(gmp.mpfr(2.805136360408), gmp.mpfr(1364.756513629990)),
(gmp.mpfr(2.326989734620), gmp.mpfr(1361.923207632842)),
(gmp.mpfr(0.599546107035), gmp.mpfr(529.6909615623250)),
(gmp.mpfr(0.874018510107), gmp.mpfr(213.2990861084880)),
(gmp.mpfr(5.481225395663), gmp.mpfr(74.78165903077800)),
(gmp.mpfr(5.311897933164), gmp.mpfr(38.13297222612500)),
(gmp.mpfr(0.000000000000), gmp.mpfr(0.3595362285049309)),
(gmp.mpfr(5.198466400630), gmp.mpfr(77713.7714481804)),
(gmp.mpfr(1.627905136020), gmp.mpfr(84334.6615717837)),
(gmp.mpfr(2.355555638750), gmp.mpfr(83286.9142477147)))
# planetary frequencies in longitude
self.freqpla = {'Mercury' : gmp.mpfr(0.2608790314068555e5),
'Venus' : gmp.mpfr(0.1021328554743445e5),
'EMB' : gmp.mpfr(0.6283075850353215e4),
'Mars' : gmp.mpfr(0.3340612434145457e4),
'Jupiter' : gmp.mpfr(0.5296909615623250e3),
'Saturn' : gmp.mpfr(0.2132990861084880e3),
'Uranus' : gmp.mpfr(0.7478165903077800e2),
'Neptune' : gmp.mpfr(0.3813297222612500e2),
'Pluto' : gmp.mpfr(0.2533566020437000e2)}
# target variables
self.ax = gmp.mpfr(0.0) # semi-major axis
self.ml = gmp.mpfr(0.0) # mean longitude
self.kp = gmp.mpfr(0.0) # e*cos(perihelion longitude)
self.hp = gmp.mpfr(0.0) # e*sin(perihelion longitude)
self.qa = gmp.mpfr(0.0) # sin(inclination/2)*cos(ascending node longitude)
self.pa = gmp.mpfr(0.0) # sin(inclination/2)*sin(ascending node longitude)
self.tg_var = {'A':self.ax, 'L':self.ml, 'K':self.kp,
'H':self.hp, 'Q':self.qa, 'P':self.pa }
# eps = (23.d0+26.d0/60.d0+21.41136d0/3600.d0)*dgrad
self.eps = gmp.mpfr((23.0+26.0/60.0+21.411360/3600.0)*gmp.const_pi()/180.0)
self.phi = gmp.mpfr(-0.051880 * gmp.const_pi() / 180.0 / 3600.0)
self.ceps = gmp.cos(self.eps)
self.seps = gmp.sin(self.eps)
self.cphi = gmp.cos(self.phi)
self.sphi = gmp.sin(self.phi)
# rotation of ecliptic -> equatorial rect coords
self.rot = [[self.cphi, -self.sphi*self.ceps, self.sphi*self.seps],
[self.sphi, self.cphi*self.ceps, -self.cphi*self.seps],
[0.0, self.seps, self.ceps ]]
self.fmt = struct.Struct("""6s 3s 3s 3s 3s x 3s 3s 3s 3s 3s x 4s 4s 4s 4s x
6s x 3s 3s 3s 20s x 3s 20s x 3s x""")
self.gmp_ = {
'Mercury' : gmp.mpfr(4.9125474514508118699e-11),
'Venus' : gmp.mpfr(7.2434524861627027000e-10),
'EMB' : gmp.mpfr(8.9970116036316091182e-10),
'Mars' : gmp.mpfr(9.5495351057792580598e-11),
'Jupiter' : gmp.mpfr(2.8253458420837780000e-07),
'Saturn' : gmp.mpfr(8.4597151856806587398e-08),
'Uranus' : gmp.mpfr(1.2920249167819693900e-08),
'Neptune' : gmp.mpfr(1.5243589007842762800e-08),
'Pluto' : gmp.mpfr(2.1886997654259696800e-12)}
self.gmsol = gmp.mpfr(2.9591220836841438269e-04)
self.rgm = gmp.sqrt(self.gmp_[self.planet]+self.gmsol)
# run calculus routine
self.calc()
def __str__(self):
vsop_out = "{:3.13} {:3.13} {:3.13} {:3.13} {:3.13} {:3.13}\n".format(
self.tg_var['A'],
self.tg_var['L'],
self.tg_var['K'],
self.tg_var['H'],
self.tg_var['Q'],
self.tg_var['P'])
vsop_out += "{:3.13} {:3.13} {:3.13} {:3.13} {:3.13} {:3.13}\n".format(
self.ecl[0],
self.ecl[1],
self.ecl[2],
self.ecl[3],
self.ecl[4],
self.ecl[5])
vsop_out += "{:3.13} {:3.13} {:3.13} {:3.13} {:3.13} {:3.13}\n".format(
self.equat[0],
self.equat[1],
self.equat[2],
self.equat[3],
self.equat[4],
self.equat[5])
return vsop_out
def calc(self):
with open(self.planets[self.planet]) as file_in:
terms = []
b = '*'
while b != '':
b = file_in.readline()
if b != '':
if b[:5] == ' VSOP':
header = b.split()
#print header[3], header[7], header[8], self.t**int(header[3])
no_terms = int(header[4])
for i in range(no_terms):
#6x,4i3,1x,5i3,1x,4i4,1x,i6,1x,3i3,2a24
terms = file_in.readline()
# print('terms',terms)
a = self.fmt.unpack(terms.encode())
S = gmp.mul(gmp.mpfr(a[18]),gmp.exp10(int(a[19])))
C = gmp.mul(gmp.mpfr(a[20]),gmp.exp10(int(a[21])))
if gmp.sqrt(S*S+C*C) < self.precision:
break
aa = 0.0; bb = 0.0;
for j in range(1,18):
aa += gmp.mul(gmp.mpfr(a[j]), self.l[j-1][0])
bb += gmp.mul(gmp.mpfr(a[j]), self.l[j-1][1])
arg = aa + bb * self.t
power = int(header[3])
comp = self.power[power] * (S * gmp.sin(arg) + C * gmp.cos(arg))
if header[7] == 'L' and power == 1 and int(a[0]) == 1:
pass
else:
self.tg_var[header[7]] += comp
self.tg_var['L'] = self.tg_var['L'] + self.t * self.freqpla[self.planet]
self.tg_var['L'] = self.tg_var['L'] % (2 * gmp.const_pi())
if self.tg_var['L'] < 0:
self.tg_var['L'] += 2*gmp.const_pi()
print ("Julian date {}".format(self.JD))
file_in.close()
##print self.tg_var
#### def ELLXYZ(self):
xa = self.tg_var['A']
xl = self.tg_var['L']
xk = self.tg_var['K']
xh = self.tg_var['H']
xq = self.tg_var['Q']
xp = self.tg_var['P']
# Computation
xfi = gmp.sqrt(1.0 -xk * xk - xh * xh)
xki = gmp.sqrt(1.0 -xq * xq - xp * xp)
u = 1.0/(1.0 + xfi)
z = complex(xk, xh)
ex = abs(z)
ex2 = ex * ex
ex3 = ex2 * ex
z1 = z.conjugate()
#
gl = xl % (2*gmp.const_pi())
gm = gl - gmp.atan2(xh, xk)
e = gl + (ex - 0.1250 * ex3) * gmp.sin(gm)
e += 0.50 * ex2 * gmp.sin(2.0 * gm)
e += 0.3750 * ex3 * gmp.sin(3.0 * gm)
#
while True:
z2 = complex(0.0, e)
zteta = gmp.exp(z2)
z3 = z1 * zteta
dl = gl - e + z3.imag
rsa = 1.0 - z3.real
e = e + dl / rsa
if abs(dl) < 1e-15:
break
#
z1 = u * z * z3.imag
z2 = gmp.mpc(z1.imag, -z1.real)
zto = (-z+zteta+z2)/rsa
xcw = zto.real
xsw = zto.imag
xm = xp * xcw - xq * xsw
xr = xa * rsa
#
self.ecl = []; self.equ = {}
self.ecl.append(xr * (xcw -2.0 *xp * xm))
self.ecl.append(xr * (xsw +2.0 *xq * xm))
self.ecl.append(-2.0 * xr * xki * xm)
#
xms = xa *(xh + xsw) / xfi
xmc = xa *(xk + xcw) / xfi
xn = self.rgm / xa ** (1.50)
#
self.ecl.append( xn *((2.0 * xp * xp - 1.0) * xms + 2.0 * xp * xq * xmc))
self.ecl.append( xn *((1.0 -2.0 * xq * xq) * xmc -2.0 * xp * xq * xms))
self.ecl.append( 2.0 * xn * xki * (xp * xms + xq * xmc))
# Equatorial rectangular coordinates and velocity
#
#
# --- Computation ------------------------------------------------------
#
self.equat = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for i in range(3):
for j in range(3):
self.equat[i] = self.equat[i] + self.rot[i][j] * self.ecl[j]
self.equat[i+3] = self.equat[i+3] + self.rot[i][j] * self.ecl[j+3]
if __name__ == '__main__':
for planet in ('Mercury', 'Venus', 'EMB', 'Mars', 'Jupiter',
'Saturn', 'Uranus', 'Neptune', 'Pluto'):
print ("PLANETARY EPHEMERIS VSOP2013 "+ planet + "(TDB)\n"+"""
1/ Elliptic Elements: a (au), lambda (radian), k, h, q, p - Dynamical Frame J2000
2/ Ecliptic Heliocentric Coordinates: X,Y,Z (au) X',Y',Z' (au/d) - Dynamical Frame J2000
3/ Equatorial Heliocentric Coordinates: X,Y,Z (au) X',Y',Z' (au/d) - ICRS Frame J2000
""")
init_date = cal2jul(1890,6,26,12)
set_date = init_date
while set_date < init_date + 41000:
v = VSOP2013(set_date, planet)
print (v)
set_date += 4000
In response to the comment from barrycarter,
this is part of the output when I print(terms.encode()):
b' 2 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0.6684459764580090 -07 0.3603178002233933 -06\n'
b' 3 0 2 -4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.2383757728520679 -07 0.9861749707454420 -07\n'
b' 4 0 4 -6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.2193692495097233 -07 -0.8959173003201546 -07\n'
b' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0000000000000000 +00 0.1017891898227051 -03\n'
b' 2 0 3 -5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.4236543085970792 -07 -0.8775084424897674 -08\n'
b' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0000000000000000 +00 0.4702795245810685 -04\n'
b' 2 0 3 -5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.5710471800210820 -09 -0.1800837750117577 -08\n'
b' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0000000000000000 +00 -0.5421827377115325 -06\n'
b' 2 0 3 -5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.7074507338012408 -10 0.1742474656298139 -10\n'
b' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0000000000000000 +00 -0.2508633795522544 -07\n'
b' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0000000000000000 +00 0.4575014479216901 -09\n'
b' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0000000000000000 +00 0.5208591612817609 -11\n'
b' 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0000000000000000 +00 -0.1737141639583644 -12'
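One thing worth checking, judging only from the output above: the format string adds up to 117 bytes, and the last printed record has no trailing \n, so it may be one byte short. struct.Struct.unpack requires exactly fmt.size bytes, so a shorter record raises this error. The sketch below is a hedged illustration of that idea, not a confirmed diagnosis of the actual data files. The helper unpack_record is a made-up name, and the blank test record is only a stand-in for a real line.

```python
import struct

# Same field layout as the format string in the question.
fmt = struct.Struct("6s 3s 3s 3s 3s x 3s 3s 3s 3s 3s x 4s 4s 4s 4s x "
                    "6s x 3s 3s 3s 20s x 3s 20s x 3s x")
print("expected record size:", fmt.size)


def unpack_record(line):
    """Unpack one text record, tolerating a missing or different line ending."""
    raw = line.rstrip("\r\n").encode("ascii")
    # Pad with spaces (or cut) to exactly fmt.size bytes; the final byte is a
    # pad field ("x") in the format, so its value does not matter.
    raw = raw.ljust(fmt.size)[:fmt.size]
    return fmt.unpack(raw)


# Stand-in for a final record that lacks the trailing newline (assumption).
short = " " * (fmt.size - 1)
try:
    fmt.unpack(short.encode("ascii"))
except struct.error as exc:
    print("raw unpack fails:", exc)

fields = unpack_record(short)
print("fields after padding:", len(fields))
```

Printing len(terms.encode()) next to fmt.size for the failing line would confirm or rule out this explanation before changing the script.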

For loop for computing two vectors in R

Suppose I have a genotype dataset, geno:
FID rs1 rs2 rs3
1 1 0 2
2 1 1 1
3 0 1 1
4 0 1 0
5 0 0 2
Another dataset is coed:
rs1 rs2 rs3
0.6 0.2 0.3
If I run the following code:
geno$rs1 <- geno$rs1 * coed$rs1
geno$rs2 <- geno$rs2 * coed$rs2
geno$rs3 <- geno$rs3 * coed$rs3
sum3 <- rowSums(geno[,c(2:4)])
c <- cbind(geno,sum3)
I get the output I want:
FID rs1 rs2 rs3 sum3
1 0.6 0 0.6 1.2
2 0.6 0.2 0.3 1.1
3 0 0.2 0.3 0.5
4 0 0.2 0 0.2
5 0 0 0.6 0.6
But I have thousands of SNPs, so I tried to build the for loop below:
snp <- names(geno)[2:4]
geno.new <- numeric(0)
for (i in snp){
geno.new[i] = geno1[i] * coed[i]
}
The result is not what I expected:
$rs1
[1] 0.6 0.6 0.0 0.0 0.0
$rs2
[1] 0.0 0.2 0.2 0.2 0.0
$rs3
[1] 0.6 0.3 0.3 0.0 0.6
Could anyone help me improve this?
Thanks
I found the solution; see the code below:
## read datasets
geno <- read.table("Genotype.csv",header=T,sep=",")
dim(geno)
coed <- read.table("beta.csv",header=T,sep=",")
## define the snp name
snp <- names(geno)[2:4]
## building for loop
for (i in snp){
geno[i] <- geno[i] * coed[i]
}
## calculate the sums
sum <- rowSums(geno[,c(2:4)])
## combine the results
all <- cbind(geno,sum)
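The per-column loop followed by rowSums amounts to a weighted sum of each row, i.e. a matrix-vector product. As a language-neutral illustration, here is a small Python sketch using the toy data from the question. The dictionary layout and the name weighted_sum are assumptions made for this example only.

```python
# Toy data from the question: FID -> genotype counts for rs1, rs2, rs3.
geno = {
    1: [1, 0, 2],
    2: [1, 1, 1],
    3: [0, 1, 1],
    4: [0, 1, 0],
    5: [0, 0, 2],
}
coed = [0.6, 0.2, 0.3]  # one coefficient per SNP, same column order


def weighted_sum(row, weights):
    """Sum of genotype * coefficient across all SNP columns of one row."""
    return sum(g * w for g, w in zip(row, weights))


scores = {fid: weighted_sum(row, coed) for fid, row in geno.items()}
for fid, score in scores.items():
    print(fid, round(score, 10))
```

This scales to any number of SNP columns without naming them one by one, as long as the coefficient order matches the column order.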

Read numbers from stdin into a Data.Vector.Unboxed.Vector Int64

Given is a text file (for piping) with many numbers divided by a space, like so:
234 456 345 ...
What is the best way to read them all into a Data.Vector.Unboxed.Vector Int64? My current code looks like this:
import Control.Applicative
import Control.Arrow
import Data.Int
import Data.Maybe
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V
main :: IO ()
main = do
v <- readInts <$> B.getContents
print $ V.maximum v
-- split up for profiling
readInts :: B.ByteString -> V.Vector Int64
readInts = a >>> b >>> c >>> d
a = B.split ' '
b = mapMaybe (B.readInt >>> liftA fst)
c = map fromIntegral
d = V.fromList
Here is the profiler output
Thu Sep 18 16:08 2014 Time and Allocation Profiling Report (Final)
FastReadInts +RTS -p -K800M -RTS
total time = 0.51 secs (505 ticks @ 1000 us, 1 processor)
total alloc = 1,295,988,256 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
d Main 74.3 5.2
b Main 9.9 35.6
a Main 6.3 40.0
main Main 4.8 0.0
c Main 3.2 19.3
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 60 0 0.4 0.0 100.0 100.0
main Main 121 0 4.8 0.0 98.2 100.0
readInts Main 123 0 0.0 0.0 93.5 100.0
a Main 131 0 6.1 40.0 6.1 40.0
b Main 129 0 9.9 35.6 9.9 35.6
c Main 127 0 3.2 19.3 3.2 19.3
d Main 125 0 74.3 5.2 74.3 5.2
CAF Main 119 0 0.0 0.0 0.2 0.0
a Main 130 1 0.2 0.0 0.2 0.0
b Main 128 1 0.0 0.0 0.0 0.0
c Main 126 1 0.0 0.0 0.0 0.0
d Main 124 1 0.0 0.0 0.0 0.0
readInts Main 122 1 0.0 0.0 0.0 0.0
main Main 120 1 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 103 0 0.6 0.0 0.6 0.0
CAF GHC.IO.Encoding 96 0 0.2 0.0 0.2 0.0
CAF GHC.IO.Handle.Internals 93 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 83 0 0.2 0.0 0.2 0.0
CAF GHC.IO.Encoding.Iconv 81 0 0.2 0.0 0.2 0.0
The program is compiled and run this way:
ghc -O2 -prof -auto-all -rtsopts FastReadInts.hs
./FastReadInts +RTS -p -K800M < many_numbers.txt
many_numbers.txt is about 14MB large.
How can this bottleneck, i.e. V.fromList, be removed?
It is hard to answer questions like this without some expected level of performance or point of comparison. By simply omitting the profiling, your code runs in 100 ms over an ASCII file of 21 MB of random 64-bit numbers, which seems reasonable to me.
$ time ./so < randoms.txt
9223350746261547498
real 0m0.109s
user 0m0.094s
sys 0m0.013s
And the generation of the test data:
import System.Random
main = do
g <- newStdGen
let rs = take (2^20) $ randomRs (0,2^64) g :: [Integer]
writeFile "randoms.txt" $ unwords (map show rs)
EDIT:
As requested:
import Data.Vector.Unboxed.Mutable as M
...
listToVector :: [Int64] -> V.Vector Int64
listToVector ls = unsafePerformIO $ do
m <- M.unsafeNew (2^20)
zipWithM_ (M.unsafeWrite m) [0..(2^20)-1] ls
V.unsafeFreeze m
Just wanted to note that pre-allocating a mutable vector does not affect performance much. In most cases run time will be dominated by reading the file.
I have benchmarked both versions on 2^23 numbers, and the pre-allocated mutable array seems to be even a bit slower.
benchmarking V.fromList
time 49.51 ms (47.65 ms .. 51.07 ms)
0.998 R² (0.995 R² .. 1.000 R²)
mean 48.24 ms (47.82 ms .. 49.01 ms)
std dev 971.5 μs (329.1 μs .. 1.438 ms)
benchmarking listToVector
time 109.9 ms (106.2 ms .. 119.9 ms)
0.993 R² (0.975 R² .. 1.000 R²)
mean 109.3 ms (107.6 ms .. 113.8 ms)
std dev 4.041 ms (1.149 ms .. 6.129 ms)
And here is the code of the benchmark:
import Control.Applicative
import Control.Monad (zipWithM_)
import System.IO.Unsafe
import Data.Int
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import Criterion.Main
main :: IO ()
main = do
let readInt x = let Just (i,_) = B.readInt x in fromIntegral i
nums <- map readInt . B.words <$> B.readFile "randoms.txt"
defaultMain
[bench "V.fromList" $ whnf (V.maximum . V.fromList) nums
,bench "listToVector" $ whnf (V.maximum . listToVector) nums
]
listToVector :: [Int64] -> V.Vector Int64
listToVector ls = unsafePerformIO $ do
m <- M.unsafeNew (2^23)
zipWithM_ (M.unsafeWrite m) [0..(2^23)-1] ls
V.unsafeFreeze m

Haskell: unnecessary reevaluations of constant expressions

I am going to demonstrate the problem using the following example program
{-# LANGUAGE BangPatterns #-}
data Point = Point !Double !Double
fmod :: Double -> Double -> Double
fmod a b | a < 0 = b - fmod (abs a) b
| otherwise = if a < b then a
else let q = a / b
in b * (q - fromIntegral (floor q :: Int))
standardMap :: Double -> Point -> Point
standardMap k (Point q p) =
Point (fmod (q + p) (2 * pi)) (fmod (p + k * sin(q)) (2 * pi))
iterate' gen !p = p : (iterate' gen $ gen p)
main = putStrLn
. show
. (\(Point a b) -> a + b)
. head . drop 100000000
. iterate' (standardMap k) $ (Point 0.15 0.25)
where k = (cos (pi/3)) - (sin (pi/3))
Here standardMap k is the parametrized function and k=(cos (pi/3))-(sin (pi/3)) is a parameter. If I compile this program with ghc -O3 -fllvm, the execution time on my machine is approximately 42s. However, if I write k in the form 0.5 - (sin (pi/3)), the execution time is 21s, and if I write k = 0.5 - 0.5 * (sqrt 3), it takes only 12s.
The conclusion is that k is reevaluated on each call of standardMap k.
Why is this not optimized?
P.S. compiler ghc 7.6.3 on archlinux
EDIT
For those who are concerned with the weird properties of standardMap here is a simpler and more intuitive example, which exhibits the same problem
{-# LANGUAGE BangPatterns #-}
data Point = Point !Double !Double
rotate :: Double -> Point -> Point
rotate k (Point q p) =
Point ((cos k) * q - (sin k) * p) ((sin k) * q + (cos k) * p)
iterate' gen !p = p : (iterate' gen $ gen p)
main = putStrLn
. show
. (\(Point a b) -> a + b)
. head . drop 100000000
. iterate' (rotate k) $ (Point 0.15 0.25)
where --k = (cos (pi/3)) - (sin (pi/3))
k = 0.5 - 0.5 * (sqrt 3)
EDIT
Before I asked the question I had tried to make k strict, the same way Don suggested, but with ghc -O3 I didn't see a difference. The solution with strictness works if the program is compiled with ghc -O2. I missed that because I didn't try all possible combinations of flags with all possible versions of the program.
So what is the difference between -O3 and -O2 that affects such cases?
Should I prefer -O2 in general?
EDIT
As observed by Mike Hartl and others, if rotate k is changed into rotate $ k or standardMap k into standardMap $ k, the performance is improved, though it is not the best possible (Don's solution). Why?
As always, check the core.
With ghc -O2, k is inlined into the loop body, which is floated out as a top level function:
Main.main7 :: Main.Point -> Main.Point
Main.main7 =
\ (ds_dAa :: Main.Point) ->
case ds_dAa of _ { Main.Point q_alG p_alH ->
case q_alG of _ { GHC.Types.D# x_s1bt ->
case p_alH of _ { GHC.Types.D# y_s1bw ->
case Main.$wfmod (GHC.Prim.+## x_s1bt y_s1bw) 6.283185307179586
of ww_s1bi { __DEFAULT ->
case Main.$wfmod
(GHC.Prim.+##
y_s1bw
(GHC.Prim.*##
(GHC.Prim.-##
(GHC.Prim.cosDouble# 1.0471975511965976)
(GHC.Prim.sinDouble# 1.0471975511965976))
(GHC.Prim.sinDouble# x_s1bt)))
6.283185307179586
of ww1_X1bZ { __DEFAULT ->
Main.Point (GHC.Types.D# ww_s1bi) (GHC.Types.D# ww1_X1bZ)
Indicating that the sin and cos calls aren't evaluated at compile time.
The result is that a bit more math is going to occur:
$ time ./A
3.1430515093368085
real 0m15.590s
If you make it strict, it is at least not recalculated each time:
main = putStrLn
. show
. (\(Point a b) -> a + b)
. head . drop 100000000
. iterate' (standardMap k) $ (Point 0.15 0.25)
where
k :: Double
!k = (cos (pi/3)) - (sin (pi/3))
Resulting in:
ipv_sEq =
GHC.Prim.-##
(GHC.Prim.cosDouble# 1.0471975511965976)
(GHC.Prim.sinDouble# 1.0471975511965976) } in
And a running time of:
$ time ./A
6.283185307179588
real 0m7.859s
Which I think is good enough for now. I'd also add unpack pragmas to the Point type.
If you want to reason about numeric performance under different code arrangements, you must inspect the Core.
Using your revised example: it suffers from the same issue. k is inlined into rotate. GHC thinks it is really cheap, when in this benchmark it is more expensive.
Naively, ghc-7.2.3 -O2
$ time ./A
0.1470480616244365
real 0m22.897s
And k is evaluated each time rotate is called.
Make k strict: one way to force it to be evaluated once and shared.
$ time ./A
0.14704806100839019
real 0m2.360s
Using UNPACK pragmas on the Point constructor:
$ time ./A
0.14704806100839019
real 0m1.860s
I don't think it is repeated evaluation.
First, I switched to "do" notation and used a "let" on the definition of "k" which I figured should help. No - still slow.
Then I added a trace call - just being evaluated once. Even checked that the fast variant was in fact producing a Double.
Then I printed out both variations. There is a small difference in the starting values.
Tweaking the value of the "slow" variant makes it the same speed. I've no idea what your algorithm is for - would it be very sensitive to starting values?
import Debug.Trace (trace)
...
main = do
-- is -0.3660254037844386
let k0 = (0.5 - 0.5 * (sqrt 3))::Double
-- was -0.3660254037844385
let k1 = (cos (pi/3)) - (trace "x" (sin (pi/3))) + 0.0000000000000001;
putStrLn (show k0)
putStrLn (show k1)
putStrLn
. show
. (\(Point a b) -> a + b)
. head . drop 100000000
. iterate' (standardMap k1) $ (Point 0.15 0.25)
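The claim that the two ways of writing k differ in the last digit is easy to check independently. The Python sketch below evaluates both expressions in double precision and prints them with full precision. The names k_sqrt and k_trig are made up for this illustration, and the exact digits depend on the platform's math library, so no specific output is claimed here.

```python
import math

# Two algebraically equal ways of writing the same constant.
k_sqrt = 0.5 - 0.5 * math.sqrt(3)
k_trig = math.cos(math.pi / 3) - math.sin(math.pi / 3)

# repr() shows the shortest decimal string that round-trips the double;
# float.hex() shows the exact bit pattern, which makes a 1-ulp gap visible.
print(repr(k_sqrt), k_sqrt.hex())
print(repr(k_trig), k_trig.hex())

gap = abs(k_sqrt - k_trig)
print("absolute difference:", gap)
```

If the two values differ, the iterated map starts from slightly different parameters, so the trajectories (and the branches taken inside fmod) need not match between the two runs.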
EDIT: this is the version with numeric literals. It's displaying runtimes of 23sec vs 7sec for me. I compiled two separate versions of the code to make sure I wasn't doing something stupid like not recompiling.
main = do
-- -0.3660254037844386
-- -0.3660254037844385
let k2 = -0.3660254037844385
putStrLn
. show
. (\(Point a b) -> a + b)
. head . drop 100000000
. iterate' (standardMap k2) $ (Point 0.15 0.25)
EDIT2: I don't know how to get the opcodes from ghc, but comparing the hexdumps for the two .o files shows they differ by a single byte - presumably the literal. So it can't be the runtime.
EDIT3: Tried turning profiling on, and that's just puzzled me even more. Unless I'm missing something, the only difference is a small discrepancy in the number of calls to fmod (fmod.q to be precise).
The "5" profile is for the constant ending "5", same with "6".
Fri Sep 6 12:37 2013 Time and Allocation Profiling Report (Final)
constant-timings-5 +RTS -p -RTS
total time = 38.34 secs (38343 ticks @ 1000 us, 1 processor)
total alloc = 12,000,105,184 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
standardMap Main 71.0 0.0
iterate' Main 21.2 93.3
fmod Main 6.3 6.7
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 50 0 0.0 0.0 100.0 100.0
main Main 101 0 0.0 0.0 0.0 0.0
CAF:main1 Main 98 0 0.0 0.0 0.0 0.0
main Main 100 1 0.0 0.0 0.0 0.0
CAF:main2 Main 97 0 0.0 0.0 1.0 0.0
main Main 102 0 1.0 0.0 1.0 0.0
main.\ Main 110 1 0.0 0.0 0.0 0.0
CAF:main3 Main 96 0 0.0 0.0 99.0 100.0
main Main 103 0 0.0 0.0 99.0 100.0
iterate' Main 104 100000001 21.2 93.3 99.0 100.0
standardMap Main 105 100000000 71.0 0.0 77.9 6.7
fmod Main 106 200000001 6.3 6.7 6.9 6.7
fmod.q Main 109 49999750 0.6 0.0 0.6 0.0
CAF:main_k Main 95 0 0.0 0.0 0.0 0.0
main Main 107 0 0.0 0.0 0.0 0.0
main.k2 Main 108 1 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 93 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 90 0 0.0 0.0 0.0 0.0
CAF GHC.Float 89 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 66 0 0.0 0.0 0.0 0.0
Fri Sep 6 12:38 2013 Time and Allocation Profiling Report (Final)
constant-timings-6 +RTS -p -RTS
total time = 22.17 secs (22167 ticks @ 1000 us, 1 processor)
total alloc = 11,999,947,752 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
standardMap Main 48.4 0.0
iterate' Main 38.2 93.3
fmod Main 10.9 6.7
main Main 1.4 0.0
fmod.q Main 1.0 0.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 50 0 0.0 0.0 100.0 100.0
main Main 101 0 0.0 0.0 0.0 0.0
CAF:main1 Main 98 0 0.0 0.0 0.0 0.0
main Main 100 1 0.0 0.0 0.0 0.0
CAF:main2 Main 97 0 0.0 0.0 1.4 0.0
main Main 102 0 1.4 0.0 1.4 0.0
main.\ Main 110 1 0.0 0.0 0.0 0.0
CAF:main3 Main 96 0 0.0 0.0 98.6 100.0
main Main 103 0 0.0 0.0 98.6 100.0
iterate' Main 104 100000001 38.2 93.3 98.6 100.0
standardMap Main 105 100000000 48.4 0.0 60.4 6.7
fmod Main 106 200000001 10.9 6.7 12.0 6.7
fmod.q Main 109 49989901 1.0 0.0 1.0 0.0
CAF:main_k Main 95 0 0.0 0.0 0.0 0.0
main Main 107 0 0.0 0.0 0.0 0.0
main.k2 Main 108 1 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 93 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 90 0 0.0 0.0 0.0 0.0
CAF GHC.Float 89 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 66 0 0.0 0.0 0.0 0.0
EDIT4: Link below is to the two opcode dumps (thanks to @Tom Ellis). Although I can't read them, they seem to have the same "shape". Presumably the long random-char strings are internal identifiers. I've just recompiled both with -O2 -fforce-recomp and the time differences are real.
https://gist.github.com/anonymous/6462797

Is performance of partial or curried functions well defined in Haskell?

In the following code:
ismaxl :: (Ord a) => [a] -> a -> Bool
ismaxl l x = x == maxel
where maxel = maximum l
main = do
let mylist = [1, 2, 3, 5]
let ismax = ismaxl mylist
--Is each call O(1)? Does each call remember maxel?
let c1 = ismax 1
let c2 = ismax 2
let c3 = ismax 3
let c5 = ismax 5
putStrLn (show [c1, c2, c3, c5])
Does the partial function ismax compute the maxel? Specifically, can someone point to a rule about the complexity of partial functions in Haskell? MUST the compiler only call maximum once in the above example? Put another way, does a partial function keep the references of prior calls for internal where clauses?
I have some CPU-bound code that is not performing acceptably, and I'm looking for possible errors in my reasoning about the complexity.
As a demonstration of what you can learn from profiling your Haskell code, here's the result of some minor modifications to your code. First, I've replaced mylist with [0..10000000] to make sure it takes a while to compute the maximum.
Here's some lines from the profiling output, after running that version:
COST CENTRE MODULE %time %alloc
ismaxl Main 55.8 0.0
main Main 44.2 100.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 1 0 0.0 0.0 100.0 100.0
CAF:main_c5 Main 225 1 0.0 0.0 15.6 0.0
main Main 249 0 0.0 0.0 15.6 0.0
ismaxl Main 250 1 15.6 0.0 15.6 0.0
CAF:main_c3 Main 224 1 0.0 0.0 15.6 0.0
main Main 246 0 0.0 0.0 15.6 0.0
ismaxl Main 247 1 15.6 0.0 15.6 0.0
CAF:main_c2 Main 223 1 0.0 0.0 14.3 0.0
main Main 243 0 0.0 0.0 14.3 0.0
ismaxl Main 244 1 14.3 0.0 14.3 0.0
CAF:main_c1 Main 222 1 0.0 0.0 10.4 0.0
main Main 239 0 0.0 0.0 10.4 0.0
ismaxl Main 240 1 10.4 0.0 10.4 0.0
CAF:main8 Main 221 1 0.0 0.0 44.2 100.0
main Main 241 0 44.2 100.0 44.2 100.0
It's pretty obviously recomputing the maximum here.
Now, replacing ismaxl with this:
ismaxl :: (Ord a) => [a] -> a -> Bool
ismaxl l = let maxel = maximum l in (== maxel)
...and profiling again:
COST CENTRE MODULE %time %alloc
main Main 60.5 100.0
ismaxl Main 39.5 0.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 1 0 0.0 0.0 100.0 100.0
CAF:main_c5 Main 227 1 0.0 0.0 0.0 0.0
main Main 252 0 0.0 0.0 0.0 0.0
ismaxl Main 253 1 0.0 0.0 0.0 0.0
CAF:main_c3 Main 226 1 0.0 0.0 0.0 0.0
main Main 249 0 0.0 0.0 0.0 0.0
ismaxl Main 250 1 0.0 0.0 0.0 0.0
CAF:main_c2 Main 225 1 0.0 0.0 0.0 0.0
main Main 246 0 0.0 0.0 0.0 0.0
ismaxl Main 247 1 0.0 0.0 0.0 0.0
CAF:main_c1 Main 224 1 0.0 0.0 0.0 0.0
CAF:main_ismax Main 223 1 0.0 0.0 39.5 0.0
main Main 242 0 0.0 0.0 39.5 0.0
ismaxl Main 243 2 39.5 0.0 39.5 0.0
CAF:main8 Main 222 1 0.0 0.0 60.5 100.0
main Main 244 0 60.5 100.0 60.5 100.0
...this time it's spending most of its time in one single call to ismaxl, the others being too fast to even notice, so it must be computing the maximum only once here.
Here's a modified version of your code that will allow you to see whether or not maxel is reused:
import Debug.Trace
ismaxl :: (Ord a) => [a] -> a -> Bool
ismaxl l x = x == maxel
where maxel = trace "Hello" $ maximum l
main = do
let mylist = [1, 2, 3, 5]
let ismax = ismaxl mylist
--Is each call O(1)? Does each call remember maxel?
let c1 = ismax 1
let c2 = ismax 2
let c3 = ismax 3
let c5 = ismax 5
putStrLn (show [c1, c2, c3, c5])
You'll see that maxel is not 'remembered' between applications.
In general, you shouldn't expect Haskell to start doing reductions until all of the arguments have been supplied to a function.
On the other hand, if you have aggressive optimisation turned on then it's hard to predict what a particular compiler would actually do. But you probably ought not to rely on any part of the compiler that's hard to predict when you can easily rewrite the code to make what you want explicit.
Building off other good answers, GHC hasn't been eager to perform this sort of optimization in my experience. If I can't easily make something point-free, I've often resorted to writing with a mix of bound vars on the LHS and a lambda:
ismaxl :: (Ord a) => [a] -> a -> Bool
ismaxl l = \x -> x == maxel
where maxel = maximum l
I don't particularly like this style, but it does ensure that maxel is shared between calls to a partially applied ismaxl.
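The same distinction can be shown in a strict language, which may make the sharing easier to see. In the Python sketch below, counted_max, ismaxl_recompute and ismaxl_shared are hypothetical names for this illustration: the first variant takes both arguments at once and is partially applied with functools.partial, while the second binds maxel when the list is supplied and returns a closure. Note one difference from Haskell: Python evaluates maxel eagerly at that point, whereas the lambda version above would compute it lazily on first use.

```python
from functools import partial

calls = {"max": 0}


def counted_max(xs):
    """max() with a call counter, so recomputation becomes observable."""
    calls["max"] += 1
    return max(xs)


def ismaxl_recompute(xs, x):
    # Both arguments bound together: the maximum is recomputed on every call.
    return x == counted_max(xs)


def ismaxl_shared(xs):
    # The maximum is computed once here and captured by the returned closure.
    maxel = counted_max(xs)
    return lambda x: x == maxel


mylist = [1, 2, 3, 5]

calls["max"] = 0
ismax = partial(ismaxl_recompute, mylist)
results_recompute = [ismax(x) for x in (1, 2, 3, 5)]
recompute_calls = calls["max"]

calls["max"] = 0
ismax = ismaxl_shared(mylist)
results_shared = [ismax(x) for x in (1, 2, 3, 5)]
shared_calls = calls["max"]

print(results_recompute, recompute_calls)
print(results_shared, shared_calls)
```

Both variants return the same answers; only the number of times the maximum is computed differs.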
I haven't been able to find any such requirement in the Haskell Report, and in fact GHC doesn't seem to perform this optimization by default.
I changed your main function to
main = do
let mylist = [1..99999]
let ismax = ismaxl mylist
let c1 = ismax 1
let c2 = ismax 2
let c3 = ismax 3
let c5 = ismax 5
putStrLn (show [c1, c2, c3, c5])
Simple profiling shows (on my old Pentium 4):
$ ghc a.hs
$ time ./a.out
[False,False,False,False]
real 0m0.313s
user 0m0.220s
sys 0m0.044s
But when I change the definition of c2, c3 and c5 to let c2 = 2 == 99999 etc. (leaving c1 as it is), I get
$ ghc a.hs
$ time ./a.out
[False,False,False,False]
real 0m0.113s
user 0m0.060s
sys 0m0.028s
