I have heard a lot about the amazing performance of programs written in Haskell and wanted to put it to the test. So I wrote a 'library' for matrix operations just to compare its performance with the same code written in pure C.
First I benchmarked 500000 matrix multiplications, and noticed that the test was... never-ending (i.e. it ran out of memory after 10 minutes or so)! After studying Haskell a bit more I managed to get rid of the laziness, and the best result I have managed to get is about 20 times slower than the C equivalent.
So, the question: could you review the code below and tell me whether its performance can be improved any further? A 20x slowdown is still disappointing.
import Prelude hiding (foldr, foldl, product)
import Data.Monoid
import Data.Foldable
import Text.Printf
import System.CPUTime
import System.Environment

data Vector a = Vec3 a a a
              | Vec4 a a a a
              deriving Show

instance Foldable Vector where
    foldMap f (Vec3 a b c)   = f a `mappend` f b `mappend` f c
    foldMap f (Vec4 a b c d) = f a `mappend` f b `mappend` f c `mappend` f d

data Matr a = Matr !a !a !a !a
                   !a !a !a !a
                   !a !a !a !a
                   !a !a !a !a

instance Show a => Show (Matr a) where
    show m = foldr f [] $ matrRows m
        where f a b = show a ++ "\n" ++ b

matrCols (Matr a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3)
    = [Vec4 a0 a1 a2 a3, Vec4 b0 b1 b2 b3, Vec4 c0 c1 c2 c3, Vec4 d0 d1 d2 d3]

matrRows (Matr a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3)
    = [Vec4 a0 b0 c0 d0, Vec4 a1 b1 c1 d1, Vec4 a2 b2 c2 d2, Vec4 a3 b3 c3 d3]

matrFromList [a0, b0, c0, d0, a1, b1, c1, d1, a2, b2, c2, d2, a3, b3, c3, d3]
    = Matr a0 b0 c0 d0
           a1 b1 c1 d1
           a2 b2 c2 d2
           a3 b3 c3 d3

matrId :: Matr Double
matrId = Matr 1 0 0 0
              0 1 0 0
              0 0 1 0
              0 0 0 1

normalise (Vec4 x y z w) = Vec4 (x/w) (y/w) (z/w) 1

mult a b = matrFromList [f r c | r <- matrRows a, c <- matrCols b] where
    f a b = foldr (+) 0 $ zipWith (*) (toList a) (toList b)
First, I doubt that you'll ever get stellar performance with this implementation. There are too many conversions between different representations; you'd be better off basing your code on something like the vector package. Also, you don't provide all of your testing code, so there are probably other issues that we can't see here. The pipeline from production to consumption has a big impact on Haskell performance, and you haven't provided either end.
Now, two specific problems:
1) Your vector is defined as either a 3 or 4 element vector. This means that for every vector there's an extra check to see how many elements are in use. In C, I imagine your implementation is probably closer to
struct vec {
    double *vec;
    int length;
};
You should do something similar in Haskell; this is how vector and bytestring are implemented, for example.
Even if you don't change the Vector definition, make its fields strict. You should also either add UNPACK pragmas (to Vector and Matr) or compile with -funbox-strict-fields. Note that UNPACK only takes effect on strict fields of a concrete type, so it pays off most once the element type is specialised (e.g. to Double).
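For instance, a strict, unpacked vector specialised to Double could look like this (a sketch, not a drop-in replacement for the polymorphic Vector above; compile with -O2):

-- The UNPACK pragmas ask GHC to store the four Doubles inline in the
-- constructor instead of as pointers to boxed values.
data Vec4D = Vec4D {-# UNPACK #-} !Double
                   {-# UNPACK #-} !Double
                   {-# UNPACK #-} !Double
                   {-# UNPACK #-} !Double

-- A dot product over the unpacked fields; no intermediate lists are built.
dot :: Vec4D -> Vec4D -> Double
dot (Vec4D a b c d) (Vec4D a' b' c' d') = a*a' + b*b' + c*c' + d*d'

With this representation the arithmetic happens on unboxed doubles, which is what the C version gets for free.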
2) Change mult to
mult a b = matrFromList [f r c | r <- matrRows a, c <- matrCols b] where
    f a b = Data.List.foldl' (+) 0 $ zipWith (*) (toList a) (toList b)
The extra strictness of foldl' will give much better performance in this case than foldr.
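To see why, compare the two folds on a long list (an illustrative snippet, separate from the matrix code):

import Data.List (foldl')

-- foldr (+) 0 must build the entire chain (1 + (2 + (3 + ...))) before a
-- single addition can fire, so memory use grows with the list length;
-- foldl' forces its accumulator at every step and runs in constant space.
sumR, sumL :: Double
sumR = foldr  (+) 0 [1..10000000]   -- builds ten million nested thunks
sumL = foldl' (+) 0 [1..10000000]   -- strict accumulator, constant space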
This change alone might make a big difference, but without seeing the rest of your code it's difficult to say.
Answering my own question, just to share the new results I got yesterday:
I upgraded GHC to the most recent version, and performance is indeed not that bad now (only ~7 times slower than C).
I also tried implementing the matrix in a stupid and simple way (see the listing below) and got genuinely acceptable performance: only about 2 times slower than the C equivalent.
data Matr a = Matr ( a, a, a, a
                   , a, a, a, a
                   , a, a, a, a
                   , a, a, a, a)

mult (Matr (!a0, !b0, !c0, !d0,
            !a1, !b1, !c1, !d1,
            !a2, !b2, !c2, !d2,
            !a3, !b3, !c3, !d3))
     (Matr (!a0', !b0', !c0', !d0',
            !a1', !b1', !c1', !d1',
            !a2', !b2', !c2', !d2',
            !a3', !b3', !c3', !d3'))
    = Matr ( a0'', b0'', c0'', d0''
           , a1'', b1'', c1'', d1''
           , a2'', b2'', c2'', d2''
           , a3'', b3'', c3'', d3'')
    where a0'' = a0 * a0' + b0 * a1' + c0 * a2' + d0 * a3'
          b0'' = a0 * b0' + b0 * b1' + c0 * b2' + d0 * b3'
          c0'' = a0 * c0' + b0 * c1' + c0 * c2' + d0 * c3'
          d0'' = a0 * d0' + b0 * d1' + c0 * d2' + d0 * d3'
          a1'' = a1 * a0' + b1 * a1' + c1 * a2' + d1 * a3'
          b1'' = a1 * b0' + b1 * b1' + c1 * b2' + d1 * b3'
          c1'' = a1 * c0' + b1 * c1' + c1 * c2' + d1 * c3'
          d1'' = a1 * d0' + b1 * d1' + c1 * d2' + d1 * d3'
          a2'' = a2 * a0' + b2 * a1' + c2 * a2' + d2 * a3'
          b2'' = a2 * b0' + b2 * b1' + c2 * b2' + d2 * b3'
          c2'' = a2 * c0' + b2 * c1' + c2 * c2' + d2 * c3'
          d2'' = a2 * d0' + b2 * d1' + c2 * d2' + d2 * d3'
          a3'' = a3 * a0' + b3 * a1' + c3 * a2' + d3 * a3'
          b3'' = a3 * b0' + b3 * b1' + c3 * b2' + d3 * b3'
          c3'' = a3 * c0' + b3 * c1' + c3 * c2' + d3 * c3'
          d3'' = a3 * d0' + b3 * d1' + c3 * d2' + d3 * d3'
I am learning Z3 and perhaps my question does not apply, so please be patient.
Suppose I have the following:
c1, c2 = z3.BitVec('c1', 32), z3.BitVec('c2', 32)
c1 = c1 + c1
c2 = c2 + c2
c2 = c2 + c1
c1 = c1 + c2
e1 = z3.simplify(c1)
e2 = z3.simplify(c2)
When I print their sexpr():
print "e1=", e1.sexpr()
print "e2=", e2.sexpr()
Output:
e1= (bvadd (bvmul #x00000004 c1) (bvmul #x00000002 c2))
e2= (bvadd (bvmul #x00000002 c2) (bvmul #x00000002 c1))
My question is: how can I evaluate the numerical values of 'e1' and 'e2' for user-supplied values of c1 and c2?
For example, e1(c1=1, c2=1) == 6, e2(c1=1, c2=1) == 4
Thanks!
I figured it out. I had to introduce two separate variables to hold the expressions, and then two result variables whose values I can query from the model:
e1, e2, c1, c2, r1, r2 = (z3.BitVec('e1', 32), z3.BitVec('e2', 32),
                          z3.BitVec('c1', 32), z3.BitVec('c2', 32),
                          z3.BitVec('r1', 32), z3.BitVec('r2', 32))
e1 = c1
e2 = c2
e1 = e1 + e1
e2 = e2 + e2
e2 = e2 + e1
e1 = e1 + e2
e1 = z3.simplify(e1)
e2 = z3.simplify(e2)
print "e1=", e1
print "e2=", e2
s = z3.Solver()
s.add(c1 == 5, c2 == 1, e1 == r1, e2 == r2)
if s.check() == z3.sat:
    m = s.model()
    print 'r1=', m[r1].as_long()
    print 'r2=', m[r2].as_long()
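For completeness, the solver is not strictly needed just to evaluate an expression at concrete inputs: z3.substitute plus z3.simplify does it directly. A minimal sketch, reusing c1, c2 and the simplified e1, e2 from the code above:

# Evaluate e1 and e2 at c1 = 1, c2 = 1 by substitution; no solver needed.
one = z3.BitVecVal(1, 32)
v1 = z3.simplify(z3.substitute(e1, (c1, one), (c2, one)))
v2 = z3.simplify(z3.substitute(e2, (c1, one), (c2, one)))
print 'e1(1,1) =', v1.as_long()   # 6
print 'e2(1,1) =', v2.as_long()   # 4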
How do I calculate NORMSINV of a number (between 0 and 1) and the square root of a number in Ruby?
I am using the following functions in Excel and need to implement them in Ruby.
=NORMSINV(A1)
=SQRT(A1)
As c650 pointed out in his comment, the square root of x is just
x**0.5
(Math.sqrt(x) does the same.)
Martin Vidner gave you a link to a decent algorithm for approximating the inverse of the normal CDF. For your convenience, I ported it to Ruby:
# algorithm ported from http://www.source-code.biz/snippets/vbasic/9.htm
A1 = -39.6968302866538
A2 = 220.946098424521
A3 = -275.928510446969
A4 = 138.357751867269
A5 = -30.6647980661472
A6 = 2.50662827745924
B1 = -54.4760987982241
B2 = 161.585836858041
B3 = -155.698979859887
B4 = 66.8013118877197
B5 = -13.2806815528857
C1 = -7.78489400243029E-03
C2 = -0.322396458041136
C3 = -2.40075827716184
C4 = -2.54973253934373
C5 = 4.37466414146497
C6 = 2.93816398269878
D1 = 7.78469570904146E-03
D2 = 0.32246712907004
D3 = 2.445134137143
D4 = 3.75440866190742
P_low = 0.02425
P_high = 1 - P_low
def phi(p)
  raise ArgumentError if p < 0 || p > 1
  if p < P_low
    q = (-2 * Math::log(p))**0.5
    (((((C1 * q + C2) * q + C3) * q + C4) * q + C5) * q + C6) /
      ((((D1 * q + D2) * q + D3) * q + D4) * q + 1)
  elsif p <= P_high
    q = p - 0.5
    r = q * q
    (((((A1 * r + A2) * r + A3) * r + A4) * r + A5) * r + A6) * q /
      (((((B1 * r + B2) * r + B3) * r + B4) * r + B5) * r + 1)
  else
    q = (-2 * Math::log(1 - p))**0.5
    (((((C1 * q + C2) * q + C3) * q + C4) * q + C5) * q + C6) /
      ((((D1 * q + D2) * q + D3) * q + D4) * q + 1)
  end
end
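A quick sanity check against Excel (the approximation's relative error is on the order of 1e-9):

phi(0.5)     # => 0.0 (same as =NORMSINV(0.5))
phi(0.975)   # => ~1.95996 (Excel's =NORMSINV(0.975) gives 1.959964)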
I have a question on curve fitting/optimisation. I have three coupled ODEs that describe a biochemical reaction, with a disappearing substrate and two products being formed. I've found examples that helped me write the code to solve the ODEs (below). Now I want to optimise the unknown rate constants (k, k3 and k4) to fit the experimental data, P, which is a signal from product y[1]. What would be the easiest way of doing this?
Thanks.
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
# Experimental data
P = [29.976,193.96,362.64,454.78,498.42,517.14,515.76,496.38,472.14,432.81,386.95,
352.93,318.93,279.47,260.19,230.92,202.67,180.3,159.09,137.31,120.47,104.51,99.371,
89.606,75.431,67.137,58.561,55.721]
# Three coupled ODEs
def conc(y, t):
    a1 = k * y[0]
    a2 = k2 * y[0]
    a3 = k3 * y[1]
    a4 = k4 * y[1]
    a5 = k5 * y[2]
    f1 = -a1 - a2
    f2 = a1 - a3 - a4
    f3 = a4 - a5
    f = np.array([f1, f2, f3])
    return f
# Initial conditions for y[0], y[1] and y[2]
y0 = np.array([50000, 0.0, 0.0])
# Times at which the solution is to be computed.
t = np.linspace(0.5, 54.5, 28)
# Experimentally determined parameters.
k2 = 0.071
k5 = 0.029
# Parameters which would have to be fitted
k = 0.002
k3 = 0.1
k4 = 0.018
# Solve the equation
y = odeint(conc, y0, t)
# Plot data and the solution.
plt.plot(t, P, "bo")
#plt.plot(t, y[:,0]) # Substrate
plt.plot(t, y[:,1]) # Product 1
plt.plot(t, y[:,2]) # Product 2
plt.xlabel('t')
plt.ylabel('y')
plt.show()
Edit: I made some changes to the code to show how to fit the experimental data of all of the ODEs at once.
Like this:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
from scipy.odr import Model, Data, ODR
# Experimental data
P = [29.976,193.96,362.64,454.78,498.42,517.14,515.76,496.38,472.14,432.81,386.95,
352.93,318.93,279.47,260.19,230.92,202.67,180.3,159.09,137.31,120.47,104.51,99.371,
89.606,75.431,67.137,58.561,55.721]
# Times at which the solution is to be computed.
t = np.linspace(0.5, 54.5, 28)
def coupledODE(beta, x):
    k, k3, k4 = beta

    # Three coupled ODEs
    def conc(y, t):
        a1 = k * y[0]
        a2 = k2 * y[0]
        a3 = k3 * y[1]
        a4 = k4 * y[1]
        a5 = k5 * y[2]
        f1 = -a1 - a2
        f2 = a1 - a3 - a4
        f3 = a4 - a5
        f = np.array([f1, f2, f3])
        return f

    # Initial conditions for y[0], y[1] and y[2]
    y0 = np.array([50000, 0.0, 0.0])

    # Experimentally determined parameters.
    k2 = 0.071
    k5 = 0.029

    # Parameters to be fitted; now passed in via beta instead of fixed here.
    #k = 0.002
    #k3 = 0.1
    #k4 = 0.018

    # Solve the equations
    y = odeint(conc, y0, x)
    return y[:,1]
    # in case you are only fitting to experimental findings of ODE #1
    # return y.ravel()
    # in case you have experimental findings of all three ODEs
data = Data(t, P)
# with P being experimental findings of ODE #1
# data = Data(np.repeat(t, 3), P.ravel())
# with P being a (3,N) array of experimental findings of all ODEs
model = Model(coupledODE)
guess = [0.1,0.1,0.1]
odr = ODR(data, model, guess)
odr.set_job(2)   # fit_type=2 selects ordinary least-squares fitting
out = odr.run()
print out.beta
print out.sd_beta
f = plt.figure()
p = f.add_subplot(111)
p.plot(t, P, 'ro')
p.plot(t, coupledODE(out.beta, t))
plt.show()
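If you don't need ODR's extra machinery (it also handles errors in x), scipy.optimize.curve_fit can do the same least-squares fit with less ceremony. A minimal sketch, reusing coupledODE, t and P from above:

from scipy.optimize import curve_fit

# curve_fit expects f(x, *params) rather than f(params, x), so wrap it.
def model_y1(x, k, k3, k4):
    return coupledODE([k, k3, k4], x)

popt, pcov = curve_fit(model_y1, t, P, p0=[0.1, 0.1, 0.1])
print 'fitted k, k3, k4 =', popt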
In case you were using peak-o-mat (http://lorentz.sf.net) which is an interactive curve fitting program based on scipy, you could add your ODE model and save it to userfunc.py (see the customisation section in the docs):
import numpy as np
from scipy.integrate import odeint
from peak_o_mat import peaksupport as ps
def coupODE(x, k, k3, k4):

    # Three coupled ODEs
    def conc(y, t):
        a1 = k * y[0]
        a2 = k2 * y[0]
        a3 = k3 * y[1]
        a4 = k4 * y[1]
        a5 = k5 * y[2]
        f1 = -a1 - a2
        f2 = a1 - a3 - a4
        f3 = a4 - a5
        f = np.array([f1, f2, f3])
        return f

    # Initial conditions for y[0], y[1] and y[2]
    y0 = np.array([50000, 0.0, 0.0])

    # Times at which the solution is to be computed.
    #t = np.linspace(0.5, 54.5, 28)

    # Experimentally determined parameters.
    k2 = 0.071
    k5 = 0.029

    # Parameters which would have to be fitted
    #k = 0.002
    #k3 = 0.1
    #k4 = 0.018

    # Solve the equation
    y = odeint(conc, y0, x)
    print y
    return y[:,1]
ps.add('ODE',
       func='coupODE(x,k,k3,k4)',
       info='three coupled ODEs',
       ptype='MISC')
You would need to prepare your data as a text file with two columns for time and experimental data. Import the data into peak-o-mat, enter 'ODE' as fit model, choose appropriate initial parameters for k,k3,k4 and hit 'Fit'.
You are my last hope. In my university there is no one able to answer my question.
I have a rather complex function, depending on the 6 parameters a0, a1, a2, b0, b1, b2, that should minimise the deviations in pressure, liquid volume and vapour volume calculated by a fairly new equation of state.
NMinimize is very slow, and I have not been able to draw any conclusions about this equation because the running time is so high.
In the code there are some explanations and some notes on the problems in my code.
On my knees I pray you to help me. I'm sorry, but after 4 months building these equations I still cannot test them, and the frustration is growing day after day!
Clear["Global`*"];
data = {{100., 34.376, 0.036554, 23.782}, {105., 56.377, 0.037143,
15.116}, {110., 88.13, 0.037768, 10.038}, {115., 132.21, 0.038431,
6.9171}, {120., 191.43, 0.039138, 4.9183}, {125., 268.76,
0.039896, 3.5915}, {130., 367.32, 0.040714, 2.6825}, {135.,
490.35, 0.0416, 2.0424}, {140., 641.18, 0.042569, 1.5803}, {145.,
823.22, 0.043636, 1.2393}, {150., 1040., 0.044825,
0.98256}, {155., 1295., 0.046165, 0.78568}, {160., 1592.1,
0.047702, 0.63206}, {165., 1935.1, 0.0495, 0.51014}, {170.,
2328.3, 0.051667, 0.41163}, {175., 2776.5, 0.054394,
0.33038}, {180., 3285.2, 0.058078, 0.26139}, {185., 3861.7,
0.063825, 0.19945}, {190., 4518.6, 0.079902, 0.12816}};
tvector = data[[All, 1]];   (* K *)
pvector = data[[All, 2]];   (* kPa *)
vlvector = data[[All, 3]];  (* L/mol *)
vvvector = data[[All, 4]];  (* L/mol *)
r = 8.314472;
tc = 190.56;
avvicinamento = Length[tvector] - 3;
trexp = Take[tvector, avvicinamento]/tc;
vlexp = Take[vlvector, avvicinamento];
vvexp = Take[vvvector, avvicinamento];
zeri = Table[i*0., {i, avvicinamento}];
pexp = Take[pvector, avvicinamento];
(*Function for calculation of Fugacity of CSD Equation*)
fug[v_, p_, t_, a_, b_] :=
  Module[{y, z, vbv, vb, f1, f2, f3, f4, f},
    y = b/(4 v);
    z = (p v)/(r t);
    vbv = Log[(v + b)/v];
    vb = v + b;
    f1 = (4*y - 3*y^2)/(1 - y)^2;
    f2 = (4*y - 2*y^2)/(1 - y)^3;
    f3 = (2*vbv)/(r t*b)*a;
    f4 = (vbv/b - 1/vb)/(r t)*a;
    f = f1 + f2 - f3 + f4 - Log[z];
    Exp[f]]
(*g Minimize the equality of fugacity*)
g[p_?NumericQ, t_?NumericQ, a0_?NumericQ, a1_?NumericQ, a2_?NumericQ,
b0_?NumericQ, b1_?NumericQ, b2_?NumericQ] := Module[{},
a = a0*Exp[a1*t + a2*t^2];
b = b0 + b1*t + b2*t^2;
csd = a/(r*t*(b + v)) - (-(b^3/(64*v^3)) + b^2/(16*v^2) +
b/(4*v) + 1)/(1 - b/(4*v))^3 + (p*v)/(r*t);
vol = NSolve[csd == 0 && v > 0, v, Reals];
sol = v /. vol;
(*If[Length[sol]==1,Interrupt[];Print["Sol==1"]];*)
vliquid = Min[sol];
vvapor = Max[sol];
fl = fug[vliquid, p, t, a, b];
fv = fug[vvapor, p, t, a, b];
(*Print[{t,p,vol,Abs[fl-fv]}];*)
Abs[fl - fv]];
(* This function minimises pcalc - pexp and vcalc - vexp *)
hope[a0_?NumericQ, a1_?NumericQ, a2_?NumericQ, b0_?NumericQ,
  b1_?NumericQ, b2_?NumericQ] := Module[{},
    pp[a0, a1, a2, b0, b1, b2] :=
      Table[FindRoot[{g[p, tvector[[i]], a0, a1, a2, b0, b1, b2]},
        {p, pvector[[i]]}], {i, avvicinamento}];
    pressioni1 = pp[a0, a1, a2, b0, b1, b2];
    pcalc = p /. pressioni1;
    differenza = ((pcalc - pexp)/pexp)^2;
    If[MemberQ[differenza, 0.],
      differenza = zeri + RandomReal[{100000, 500000}];
      (* First problem: FindRoot sometimes returns a "solution" equal to the
         starting point. I don't want solutions of that kind, so I push them
         away by adding RandomReal[{100000, 500000}]. Is that right? *)
      deltap = Total[differenza],
      differenzanonzero = Select[differenza, # > 0 &];
      csd1[a_, b_, p_, t_] :=
        a/(r*t*(b + v)) - (-(b^3/(64*v^3)) + b^2/(16*v^2) + b/(4*v) +
          1)/(1 - b/(4*v))^3 + (p*v)/(r*t); (* CSD function *)
      volumi = Table[NSolve[csd1[a, b, pcalc[[i]], tvector[[i]]], v, Reals],
        {i, avvicinamento}];
      soluzioni = v /. volumi;
      vvcalc = Table[Max[soluzioni[[i]]], {i, avvicinamento}];
      vlcalc = Table[Min[soluzioni[[i]]], {i, avvicinamento}];
      deltavl = Total[((vlexp - vlcalc)/vlcalc)^2];
      deltavv = Total[((vvexp - vvcalc)/vvcalc)^2];
      deltap = Total[differenza];
      Print[a0, " ", b0, " ", delta];
      delta = 0.1*deltavl + 0.1*deltavv + deltap]];
NMinimize[{hope[a0, a1, a2, b0, b1, b2],
500 < a0 < 700 && -0.01 < a1 < -1.0*10^-5 && -10^-5 < a2 < -10^-7 &&
0.0010 < b0 < 0.1 && -0.0010 < b1 < -1.0*10^-5 &&
10^-9 < b2 < 10^-7}, {a0, a1, a2, b0, b1, b2}]
Thanks in advance!
Mariano Pierantozzi
PhD student in Chemical Engineering
I have a data.frame named "d" with ~1,300,000 rows and 4 columns, and another data.frame named "gc" with ~12,000 rows and 2 columns (but see the smaller example below).
d <- data.frame( gene=rep(c("a","b","c"),4), val=rnorm(12), ind=c( rep(rep("i1",3),2), rep(rep("i2",3),2) ), exp=c( rep("e1",3), rep("e2",3), rep("e1",3), rep("e2",3) ) )
gc <- data.frame( gene=c("a","b","c"), chr=c("c1","c2","c3") )
Here is what "d" looks like:
gene val ind exp
1 a 1.38711902 i1 e1
2 b -0.25578496 i1 e1
3 c 0.49331256 i1 e1
4 a -1.38015272 i1 e2
5 b 1.46779219 i1 e2
6 c -0.84946320 i1 e2
7 a 0.01188061 i2 e1
8 b -0.13225808 i2 e1
9 c 0.16508404 i2 e1
10 a 0.70949804 i2 e2
11 b -0.64950167 i2 e2
12 c 0.12472479 i2 e2
And here is "gc":
gene chr
1 a c1
2 b c2
3 c c3
I want to add a 5th column to "d" by incorporating the data from "gc" that matches the 1st column of "d". For the moment I am using sapply.
d$chr <- sapply( 1:nrow(d), function(x) gc[ gc$gene==d[x,1], ]$chr )
But on the real data it takes a very long time (I started the command under system.time() more than 30 minutes ago and it still hasn't finished).
Do you have any idea of how I could rewrite this in a clever way? Or should I consider using plyr, maybe with the "parallel" option (I have four cores on my computer)? In such a case, what would be the best syntax?
Thanks in advance.
I think you can just use the factor as index:
gc[ d[,1], 2]
[1] c1 c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 c3
Levels: c1 c2 c3
does the same as:
sapply( 1:nrow(d), function(x) gc[ gc$gene==d[x,1], ]$chr )
[1] c1 c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 c3
Levels: c1 c2 c3
But it is much faster:
> system.time(replicate(1000,sapply( 1:nrow(d), function(x) gc[ gc$gene==d[x,1], ]$chr )))
user system elapsed
5.03 0.00 5.02
>
> system.time(replicate(1000,gc[ d[,1], 2]))
user system elapsed
0.12 0.00 0.13
Edit:
To expand a bit on my comment: for this to work, the gc data frame needs one row for each level of gene, in the order of those levels:
d <- data.frame( gene=rep(c("a","b","c"),4), val=rnorm(12), ind=c( rep(rep("i1",3),2), rep(rep("i2",3),2) ), exp=c( rep("e1",3), rep("e2",3), rep("e1",3), rep("e2",3) ) )
gc <- data.frame( gene=c("c","a","b"), chr=c("c1","c2","c3") )
gc[ d[,1], 2]
[1] c1 c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 c3
Levels: c1 c2 c3
sapply( 1:nrow(d), function(x) gc[ gc$gene==d[x,1], ]$chr )
[1] c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 c3 c1
Levels: c1 c2 c3
But it is not hard to fix that:
levels(gc$gene) <- levels(d$gene)  # often redundant, as the levels usually already match
gc <- gc[order(gc$gene),]
gc[ d[,1], 2]
[1] c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 c3 c1
Levels: c1 c2 c3
sapply( 1:nrow(d), function(x) gc[ gc$gene==d[x,1], ]$chr )
[1] c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 c3 c1
Levels: c1 c2 c3
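If you'd rather not depend on the level order at all, a lookup via match does the same job for any row order of gc (a small sketch, not part of the timing comparison above):

# For each gene in d, find the row of gc holding that gene, then take its chr.
# match() compares the underlying character values, so the factor level
# order does not matter.
d$chr <- gc$chr[ match(d$gene, gc$gene) ]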
An alternative solution that does not beat Sasha's approach timing-wise, but is more generalizable and readable, is to simply merge the two data frames:
d <- merge(d, gc)
I have a slower system, so here are my timings:
> system.time(replicate(1000,sapply( 1:nrow(d), function(x) gc[ gc$gene==d[x,1], ]$chr )))
user system elapsed
11.22 0.12 11.86
> system.time(replicate(1000,gc[ d[,1], 2]))
user system elapsed
0.34 0.00 0.35
> system.time(replicate(1000, merge(d, gc, by="gene")))
user system elapsed
3.35 0.02 3.40
The benefit is that you could have multiple keys, fine control over non-matching items, etc.
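For example, to keep the rows of d whose gene has no match in gc (they would otherwise be dropped), pass all.x=TRUE (a small illustrative variant):

# Left join: keep all rows of d; unmatched genes get chr = NA.
merged <- merge(d, gc, by = "gene", all.x = TRUE)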