Statsmodels: How to add numbers in the formula

How can I add numbers in the formula? ex:
sm.ols(formula = 'weight ~ 2.30 + mpg*2 + horsepower', data=mpg).fit()

Related

Multiply polynomials using DFT algorithm [duplicate]

I am new to FFTs, so I am slightly confused about some concepts. So far, the FFT examples I've seen for polynomial multiplication involve polynomials with consecutive exponents (i.e. A(x) = 1 + 3x + 5x^2 +... and B(x) = 4 + 6x + 9x^2 + ... and C(x) = A(x)*B(x)). However, is it possible to use FFT on two polynomials that do not have matching exponents? For example, is it possible to use FFT to multiply:
A(x) = 1 + 3x^2 + 9x^8
and
B(x) = 5x + 6 x^3 + 10x^8
in O(nlogn) time?
If not, are there any cases where the runtime will be O(nlogn)? For example, if the number of terms in the product is O(n) instead of O(n^2)?
Even if the runtime is more than O(nlogn), how can we use FFT to minimize the runtime?
Yes, it is possible to use DFFT on polynomials with non-consecutive exponents.
The missing exponents simply have coefficient 0, which is also a number. Just rewrite your polynomials:
A(x) = 1 + 3x^2 + 9x^8
B(x) = 5x + 6x^3 + 10x^8
to something like this:
A(x) = 1x^0 + 0x^1 + 3x^2 + 0x^3 + 0x^4+ 0x^5+ 0x^6+ 0x^7 + 9x^8
B(x) = 0x^0 + 5x^1 + 0x^2 + 6x^3 + 0x^4+ 0x^5+ 0x^6+ 0x^7 + 10x^8
so your vectors for DFFT are:
A = (1,0,3,0,0,0,0,0, 9)
B = (0,5,0,6,0,0,0,0,10)
Pad with zeros so the vector has the correct result size (max A exponent + 1 + max B exponent + 1), and also round up to the closest power of 2 for DFFT usage. So the original sizes are 9, 9 -> 9+9 -> 18 -> round up -> 32:
A = (1,0,3,0,0,0,0,0, 9,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0)
B = (0,5,0,6,0,0,0,0,10,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0)
// | original | correct result | nearest power of 2 |
and do the DFFT stuff you want ... I assume you want to do something like this:
A' = DFFT(A)
B' = DFFT(B)
C(i)' = A'(i) * B'(i) // i=0..n-1
C= IDFFT(C')
which is O(n*log(n)). Do not forget that if you use DFFT (not DFT), n = 32 and not 18, because n must be a power of 2 for the fast algorithm. Also, if you want performance improvements, look at the DFFT weight matrices for DFFT(A) and DFFT(B): they are the same, so there is no need to compute them twice.
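The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names fft and poly_multiply are made up for this example, the transform is a plain recursive radix-2 FFT (it recomputes the weights on every call instead of sharing them), and the coefficients are assumed to be integers so the result can be rounded.

```python
import cmath

def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of 2."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # weight (twiddle factor)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_multiply(a, b):
    """Multiply two integer coefficient vectors (index = exponent)."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:          # round up to a power of 2
        size *= 2
    fa = fft(a + [0] * (size - len(a)))
    fb = fft(b + [0] * (size - len(b)))
    fc = [x * y for x, y in zip(fa, fb)]          # pointwise product
    c = fft(fc, invert=True)
    # the inverse transform needs the 1/size scaling; round away float noise
    return [round((v / size).real) for v in c][:result_len]

A = [1, 0, 3, 0, 0, 0, 0, 0, 9]    # 1 + 3x^2 + 9x^8
B = [0, 5, 0, 6, 0, 0, 0, 0, 10]   # 5x + 6x^3 + 10x^8
print(poly_multiply(A, B))
# [0, 5, 0, 21, 0, 18, 0, 0, 10, 45, 30, 54, 0, 0, 0, 0, 90]
```

For these inputs the padded size works out to 32, matching the vectors shown above.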

Fixed-Effects Regression with Panel Data - Dummy variable is excluded in Output

I'm rather new to statistics and R. I'm currently working on a paper and I'm really stuck with a coding problem right now. I suspect that the problem lies within my dummy variable.
I did a fixed-effects regression with my panel data which worked out fine. All my variables (the Y and the Xs) were numeric. I decided to add another variable that is a dummy variable with two levels (yes/no).
I set the variable as a factor variable, but whenever I try to run the regression it does not show up in the output. As soon as I remove one specific numeric variable, it shows up in the output.
I obviously don't want to exclude that one numeric variable to include my dummy variable - there must be another way or something I did wrong...
regdata <- read_excel("LinRegData.xlsx")
regdata$eu <- as.factor(regdata$eu)
attach(regdata)
pdata <- pdata.frame(regdata, index=c("country","year"))
pdata$eu <-as.factor((pdata$eu))
plmwithin <- plm(subaus ~ employ + pref + + iEaeM + eu, data = pdata, model="within", family = poisson, effect = 'twoways', index = c('country', 'year'))
The output always excludes the eu-variable:
Call:
plm(formula = subaus ~ employ + pref + eu + iEaeM, data = pdata,
effect = "twoways", model = "within", index = c("country",
"year"), family = poisson)
Unbalanced Panel: n = 32, T = 8-11, N = 288
Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max.
-0.4828507 -0.0867719 -0.0021724  0.0857117  0.7668712
Coefficients:
          Estimate Std. Error t-value Pr(>|t|)
employ  0.03496136 0.00590717  5.9185  1.1e-08 ***
pref   -0.02081850 0.00908030 -2.2927  0.02272 *
iEaeM  -0.00010028 0.00070435 -0.1424  0.88690
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 9.4599
Residual Sum of Squares: 8.183
R-Squared: 0.13498
Adj. R-Squared: -0.021654
F-statistic: 12.639 on 3 and 243 DF, p-value: 1.0498e-07
>
If I exclude the iEaeM variable it shows:
Call:
plm(formula = subaus ~ employ + pref + eu, data = pdata, effect = "twoways",
model = "within", index = c("country", "year"), family = poisson)
Unbalanced Panel: n = 32, T = 8-23, N = 659
Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max.
-1.04463 -0.18859 -0.01558  0.14696  1.46631
Coefficients:
          Estimate Std. Error t-value Pr(>|t|)
employ  0.01346798 0.00525102  2.5648  0.01056 *
pref   -0.00014268 0.00927955 -0.0154  0.98774
eu1    -0.27793656 0.06324072 -4.3949 1.31e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 64.478
Residual Sum of Squares: 61.197
R-Squared: 0.050889
Adj. R-Squared: -0.037401
F-statistic: 10.7592 on 3 and 602 DF, p-value: 6.7744e-07
>

Find the sum of Fibonacci Series

I am given a set A, and I have to find the sum of the Fibonacci numbers of the sums of all non-empty subsets of A.
Fibonacci(X) is the Xth element of the Fibonacci series.
For example, for A = {1,2,3}:
Fibonacci(1) + Fibonacci(2) + Fibonacci(3) + Fibonacci(1+2) + Fibonacci(2+3) + Fibonacci(1+3) + Fibonacci(1+2+3)
1 + 1 + 2 + 2 + 5 + 3 + 8 = 22
Is there any way I can find the sum without generating the subsets?
I can easily find the sum of all subset sums,
i.e. Sum of all subsets = (1+2+3)*(pow(2, length of set - 1))
There surely is.
First, let's recall that the nth Fibonacci number equals
φ(n) = [φ^n - (-φ)^(-n)]/√5
where φ = (√5 + 1)/2 (Golden Ratio) and (-φ)^(-1) = (1-√5)/2. But to make this shorter, let me denote φ as A and (-φ)^(-1) as B.
Next, let's notice that a sum of Fibonacci numbers is a sum of powers of A and B:
[φ(n) + φ(m)]*√5 = A^n + A^m - B^n - B^m
Now, all we need to calculate (in the {1,2,3} example) is
A^1 + A^2 + A^3 + A^{1+2} + A^{1+3} + A^{2+3} + A^{1+2+3}.
But hey, there's a simpler expression for this:
(A^1 + 1)(A^2 + 1)(A^3 + 1) - 1
Now, it is time to get the whole result.
Let our set be {n1, n2, ..., nk}. Then our sum will be equal to
Sum = 1/√5 * [(A^n1 + 1)(A^n2 + 1)...(A^nk + 1) - (B^n1 + 1)(B^n2 + 1)...(B^nk + 1)]
I think, mathematically, this is the "simplest" form of the answer, as there's no relation between the n_i. However, there could be some room for computational optimization of this expression. In fact, I'm not sure at all whether this (using real numbers) will work faster than the "straightforward" summing, but the question was about avoiding subset generation, so here's the answer.
I tested the answer from YakovL using Python 2.7. It works very well and is plenty quick. I cannot imagine that summing the sequence values would be quicker. Here's the implementation.
_phi = (5.**0.5 + 1.)/2.
A = lambda n: _phi**n
B = lambda n: (-_phi)**(-n)
prod = lambda it: reduce(lambda x, y: x*y, it)
subset_sum = lambda s: (prod(A(n)+1 for n in s) - prod(B(n)+1 for n in s))/5**0.5
And here are some test results:
print subset_sum({1, 2, 3})
# 22.0
# [Finished in 0.1s]
print subset_sum({1, 2, 4, 8, 16, 32, 64, 128, 256, 512})
# 7.29199318438e+213
# [Finished in 0.1s]
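For large sets the floating-point version above loses precision. As a possible alternative, the same product can be evaluated exactly with integers by working with numbers of the form a + b*φ and using φ^2 = φ + 1, together with the identity φ^n = F(n-1) + F(n)*φ. The B-product is the conjugate of the A-product, so their difference divided by √5 is just the φ-coefficient of the A-product. The sketch below is written for Python 3; the names fib_pair and subset_fib_sum are made up for this illustration, and the set is assumed to contain non-negative integers, with F(1) = F(2) = 1 as in the question.

```python
def fib_pair(n):
    """Return (F(n-1), F(n)), so that phi**n == F(n-1) + F(n)*phi."""
    a, b = 1, 0              # F(-1) = 1, F(0) = 0
    for _ in range(n):
        a, b = b, a + b
    return a, b

def subset_fib_sum(numbers):
    """Sum of Fibonacci(sum(S)) over all non-empty subsets S, exactly."""
    x, y = 1, 0              # running product, stored as x + y*phi
    for n in numbers:
        c, d = fib_pair(n)
        c += 1               # the factor (phi**n + 1)
        # (x + y*phi)(c + d*phi) with phi**2 = phi + 1
        x, y = x * c + y * d, x * d + y * c + y * d
    return y                 # the "- 1" of the empty subset has no phi part

print(subset_fib_sum({1, 2, 3}))   # 22
```

Because only integers are involved, the result for the second test set above comes out as an exact integer instead of a rounded float.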

Finding Smallest Number of Elements to make a Sum

I have a simple algorithmic question:
If I have certain elements with integer values like:
1 1 1 1 1 1 1 1 1 1 1 1 10 12 2
and I have to make the sum 12, the minimum number of elements needed would be 1: I would just use 12.
Thus, my question is how would you:
find the minimum number of elements to make some sum, and if you can't, output -1.
Please suggest an algorithm I can look into so I can solve this efficiently. I've already tried brute force, but it is much too slow for my needs.
The problem is NP-complete: it is a variant of the subset sum and knapsack problems. There is a pseudo-polynomial time algorithm that solves it using dynamic programming. The following solution uses the knapsack analogy:
1. Knapsack capacity = Sum
2. Items have same weight and value
3. Maximize profit
4. if max_profit == Sum then there is a solution
5. else Sum cannot be made from the items given.
6. Evaluate the minimum items needed using matrix alongside the DP.
7. Can also reconstruct all solutions and get the minimum one.
Time Complexity : - O(Sum*Items)
Java Implementation :-
public class SubSetSum {
    static int[][] costs;
    static int[][] minItems;
    public static void calSets(int target,int[] arr) {
        costs = new int[arr.length][target+1];
        minItems = new int[arr.length][target+1];
        for(int j=0;j<=target;j++) {
            if(arr[0]<=j) {
                costs[0][j] = arr[0];
                minItems[0][j] = 1;
            }
        }
        for(int i=1;i<arr.length;i++) {
            for(int j=0;j<=target;j++) {
                costs[i][j] = costs[i-1][j];
                minItems[i][j] = minItems[i-1][j];
                if(arr[i]<=j) {
                    costs[i][j] = Math.max(costs[i][j],costs[i-1][j-arr[i]]+arr[i]);
                    if(costs[i-1][j]==costs[i-1][j-arr[i]]+arr[i]) {
                        minItems[i][j] = Math.min(minItems[i][j],minItems[i-1][j-arr[i]]+1);
                    }
                    else if(costs[i-1][j]<costs[i-1][j-arr[i]]+arr[i]) {
                        minItems[i][j] = minItems[i-1][j-arr[i]]+1;
                    }
                }
            }
        }
        // System.out.println(costs[arr.length-1][target]);
        if(costs[arr.length-1][target]==target) {
            System.out.println("Minimum items need : "+minItems[arr.length-1][target]);
        }
        else System.out.println("No such Set found");
    }
    public static void main(String[] args) {
        int[] arr = {1,1,1,1, 1 ,1 ,1, 1 ,1, 1 ,1 ,1, 10 ,12, 2};
        calSets(12, arr);
    }
}
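The same O(Sum*Items) idea can also be sketched more compactly with a single one-dimensional table that stores, for every reachable sum, the fewest items that produce it. This is a simplified variant and not a translation of the Java code above; the function name min_items is made up for this example, and the items are assumed to be positive integers.

```python
def min_items(items, target):
    """Fewest items (each used at most once) summing to target, or -1."""
    INF = float("inf")
    # best[s] = fewest items seen so far that sum to exactly s
    best = [0] + [INF] * target
    for value in items:
        # go downwards so that each item is used at most once
        for s in range(target, value - 1, -1):
            if best[s - value] + 1 < best[s]:
                best[s] = best[s - value] + 1
    return best[target] if best[target] != INF else -1

print(min_items([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 12, 2], 12))   # 1
```

Iterating the sums from high to low is what keeps an item from being counted twice; iterating upwards would instead solve the variant where items can be reused.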
Here is a recursive approach that should be rather fast:
1) If your input vector has length 1, return 1 if the value equals the target, or -1 if it doesn't. Similarly, if your target is less than every item in your input vector, return -1.
2) Otherwise, loop over the (unique) values in your input vector (in descending order, for performance):
2a) Remove the value from your vector, and subtract it from your target.
2b) Recursively call this function on the new vector and the new target.
Note: you can pass a max.step parameter down the recursion, so that if you have already found a solution of length K, you stop the recursive calls at that depth and go no deeper. Remember to decrease your max.step value in each recursive call.
3) Collect all the values from the recursive calls, take the minimum (ignoring -1), add 1 to it and return it; or, if all values in the loop are -1, return -1.
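One way these steps might look in Python is sketched below. It is an interpretation, not the answerer's code: the name min_count_recursive and the max_depth parameter (standing in for max.step) are made up here, the items are assumed to be positive integers, and two small additions that the steps do not mention are marked in the comments. The search is still exponential in the worst case.

```python
def min_count_recursive(items, target, max_depth=None):
    """Fewest items summing to target, or -1 if impossible."""
    if target == 0:
        return 0                      # addition: nothing left to cover
    if max_depth is None:
        max_depth = len(items)
    if not items or max_depth == 0:
        return -1
    if target < min(items) or target > sum(items):
        return -1                     # the sum check is extra pruning
    best = -1
    for value in sorted(set(items), reverse=True):   # step 2
        if value > target:
            continue
        # only look for sub-solutions that would beat the best so far
        limit = max_depth - 1 if best == -1 else min(max_depth - 1, best - 2)
        if limit < 0:
            break
        rest = list(items)
        rest.remove(value)                            # step 2a
        sub = min_count_recursive(rest, target - value, limit)   # step 2b
        if sub != -1 and (best == -1 or sub + 1 < best):
            best = sub + 1                            # step 3
    return best

print(min_count_recursive([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 12, 2], 12))   # 1
```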
Disclaimer: This is an advertisement for nice but relatively simple mathematics which leads to very clever and fast counting formulas and algorithms. I'm aware that you can find a much simpler and more efficient solution using usual programming. I just like the fact that, using a Computer Algebra System properly, you can do it in a one-liner. Let's get 19 with this list:
sage: l = [1,1,1,2,5,2,1,3,12,1,3]; goal = 19
sage: prod((1+t*x^i) for i in l).expand().collect(x).coefficient(x,goal).low_degree(t)
3
What about 25:
sage: goal=25
sage: prod((1+t*x^i) for i in l).expand().collect(x).coefficient(x,goal).low_degree(t)
5
36 is not feasible:
sage: goal=36
sage: prod((1+t*x^i) for i in l).expand().collect(x).coefficient(x,goal).low_degree(t)
0
Here are some details: Just expand the product
(1+t*x^l[0]) (1+t*x^l[1]) ... (1+t*x^l[n])
Where your list is l. Then to find the minimum number of elements required to get the sum S, collect the coefficients of x^S and return the minimum degree of a term in t.
Here is how it could be done in sage:
sage: var("x t")
(x, t)
sage: l = [1,1,1,2,5,2,1,3,12,1,3]
sage: s = prod((1+t*x^i) for i in l)
sage: s = expand(s).collect(x)
Now
sage: print(s)
t^11*x^32 + 5*t^10*x^31 + 2*(t^10 + 5*t^9)*x^30 + 2*(t^10 + 5*t^9 + 5*t^8)*x^29 + (11*t^9 + 20*t^8 + 5*t^7)*x^28 + (t^10 + 4*t^9 + 25*t^8 + 20*t^7 + t^6)*x^27 + 2*(3*t^9 + 10*t^8 + 15*t^7 + 5*t^6)*x^26 + (2*t^9 + 17*t^8 + 40*t^7 + 20*t^6 + 2*t^5)*x^25 + (2*t^9 + 12*t^8 + 30*t^7 + 40*t^6 + 7*t^5)*x^24 + (11*t^8 + 30*t^7 + 35*t^6 + 20*t^5 + t^4)*x^23 + 2*(2*t^8 + 13*t^7 + 20*t^6 + 13*t^5 + 2*t^4)*x^22 + (t^8 + 20*t^7 + 35*t^6 + 30*t^5 + 11*t^4)*x^21 + (t^10 + 7*t^7 + 40*t^6 + 30*t^5 + 12*t^4 + 2*t^3)*x^20 + (5*t^9 + 2*t^7 + 20*t^6 + 40*t^5 + 17*t^4 + 2*t^3)*x^19 + 2*(t^9 + 5*t^8 + 5*t^6 + 15*t^5 + 10*t^4 + 3*t^3)*x^18 + (2*t^9 + 10*t^8 + 10*t^7 + t^6 + 20*t^5 + 25*t^4 + 4*t^3 + t^2)*x^17 + (11*t^8 + 20*t^7 + 5*t^6 + 5*t^5 + 20*t^4 + 11*t^3)*x^16 + (t^9 + 4*t^8 + 25*t^7 + 20*t^6 + t^5 + 10*t^4 + 10*t^3 + 2*t^2)*x^15 + 2*(3*t^8 + 10*t^7 + 15*t^6 + 5*t^5 + 5*t^3 + t^2)*x^14 + (2*t^8 + 17*t^7 + 40*t^6 + 20*t^5 + 2*t^4 + 5*t^2)*x^13 + (2*t^8 + 12*t^7 + 30*t^6 + 40*t^5 + 7*t^4 + t)*x^12 + (11*t^7 + 30*t^6 + 35*t^5 + 20*t^4 + t^3)*x^11 + 2*(2*t^7 + 13*t^6 + 20*t^5 + 13*t^4 + 2*t^3)*x^10 + (t^7 + 20*t^6 + 35*t^5 + 30*t^4 + 11*t^3)*x^9 + (7*t^6 + 40*t^5 + 30*t^4 + 12*t^3 + 2*t^2)*x^8 + (2*t^6 + 20*t^5 + 40*t^4 + 17*t^3 + 2*t^2)*x^7 + 2*(5*t^5 + 15*t^4 + 10*t^3 + 3*t^2)*x^6 + (t^5 + 20*t^4 + 25*t^3 + 4*t^2 + t)*x^5 + (5*t^4 + 20*t^3 + 11*t^2)*x^4 + 2*(5*t^3 + 5*t^2 + t)*x^3 + 2*(5*t^2 + t)*x^2 + 5*t*x + 1
OK, this is a huge expression. The nice feature here is that if I take the coefficient of, say, x^17, I get:
sage: s.coefficient(x, 17)
2*t^9 + 10*t^8 + 10*t^7 + t^6 + 20*t^5 + 25*t^4 + 4*t^3 + t^2
which says the following: the term 10*t^7 tells me that there are 10 different ways to obtain the sum 17 using 7 numbers. Another example: there are 25 ways to get 17 using 4 numbers (25*t^4).
Also, since this expression ends with t^2, I learn that I only need two numbers to get 17. Unfortunately this doesn't tell me which numbers.
If you want to understand the trick, look at the Wikipedia article on generating functions and this page.
Note 1: this is not the most efficient approach, since I compute much more than what you need. The huge expression actually describes, and in some sense computes, all possible choices (that is, 2^(length of the list) of them). But it's a one-liner:
sage: prod((1+t*x^i) for i in l).expand().collect(x).coefficient(x,17).low_degree(t)
2
And still relatively efficient:
sage: %timeit prod((1+t*x^i) for i in l).expand().collect(x).coefficient(x,17).low_degree(t)
10 loops, best of 3: 42.6 ms per loop
Note 2: After thinking carefully about it I also realized the following: Generating series is just a compact encoding of what you would have written if you tried to implement a dynamic programming solution.
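To make Note 2 concrete, here is a rough Python sketch of the same generating function kept as a table instead of a symbolic expression: table[s][k] plays the role of the coefficient of t^k * x^s. The names count_table and min_count are made up for this illustration. Unlike low_degree in the Sage session above, which printed 0 for the infeasible goal 36, this sketch returns -1 when the goal cannot be reached.

```python
def count_table(items):
    """table[s][k] = number of ways to reach sum s using exactly k items."""
    table = {0: {0: 1}}                      # the polynomial "1"
    for value in items:
        # multiply by (1 + t*x^value): keep the old terms, add shifted copies
        updated = {s: dict(counts) for s, counts in table.items()}
        for s, counts in table.items():
            shifted = updated.setdefault(s + value, {})
            for k, ways in counts.items():
                shifted[k + 1] = shifted.get(k + 1, 0) + ways
        table = updated
    return table

def min_count(items, goal):
    """Lowest power of t in the coefficient of x^goal, or -1 if absent."""
    counts = count_table(items).get(goal)
    return min(counts) if counts else -1

l = [1, 1, 1, 2, 5, 2, 1, 3, 12, 1, 3]
print(min_count(l, 17))   # 2
```

Each pass over an item is exactly one update step of a dynamic programming table indexed by sum and count, which is the correspondence Note 2 describes.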
I don't think this solution is optimal (a greedy choice can use more elements than necessary, or miss a valid combination entirely), but it's very easy to understand and use. You sort the elements in decreasing order, then you take each element in turn and try to fit it into your number. If you have the sequence [5,6,2,7] and you need to make the number 15, you reorder the sequence to [7,6,5,2] and take 7; then you still need 8, so you take 6; then you need 2 more, so you check 5, but it's too big and you skip it, and you check the last number, 2, which is perfect and completes your number. So you'd print out 3. This is the worst case of the algorithm, which is O(n). But in your example with 12, it'll be O(1), because you'll pick 12 on the first check of the ordered sequence. (The running time applies only to choosing the items, not to sorting.)
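resolve_sum(ordered_items[], number) {
    count = 0;
    aux = number;
    // walk the items from largest to smallest; stop once the sum is reached
    for (i = 0; i < length(ordered_items) && aux > 0; i = i + 1) {
        if (ordered_items[i] <= aux) {   // the item fits, so take it
            count = count + 1;
            aux = aux - ordered_items[i];
        }                                // otherwise skip it
    }
    if (aux == 0) return count;
    else return -1;
}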
I haven't included an algorithm for sorting; you can choose one that you know best or try to learn a new efficient one. This is just sample code that you can adapt to C/C++, Java or whatever you need. I hope it isn't way too much brute force.

Calculate cash flows given a target IRR

I apologize if the answer for this is somewhere already, I've been searching for a couple of hours now and I can't find what I'm looking for.
I'm building a simple financial calculator to calculate the cash flows given the target IRR. For example:
I have an asset worth $18,000,000 (which depreciates at $1,000,000/year)
I have a target IRR of 10% after 5 years
This means that the initial investment is $18,000,000, and in year 5, I will sell this asset for $13,000,000
To reach my target IRR of 10%, the annual cash flows have to be $2,618,875. Right now, I calculate this by hand in an Excel sheet through guess-and-check.
There are other variables and functionality, but they're not important for what I'm trying to do here. I've found plenty of libraries and functions that can calculate the IRR for a given number of cash flows, but nothing comes up when I try to get the cash flow for a given IRR.
At this point, I think the only solution is to basically run a loop to plug in the values, check to see if the IRR is higher or lower than the target IRR, and keep calculating the IRR until I get the cash flow that I want.
Is this the best way to approach this particular problem? Or is there a better way to tackle it that I'm missing? Help greatly appreciated!
Also, as an FYI, I'm building this in Ruby on Rails.
EDIT:
IRR Function:
NPV = -(I) + CF[1]/(1 + R)^1 + CF[2]/(1 + R)^2 + ... + CF[n]/(1 + R)^n
NPV = the Net Present Value (this value needs to be as close to 0 as possible)
I = Initial investment (in this example, $18,000,000)
CF = Cash Flow (this is the value I'm trying to calculate - it would end up being $2,618,875 if I calculated it by hand. In my financial calculator, all of the cash flows would be the same since I'm solving for them.)
R = Target rate of return (10%)
n = the year (so this example would end at 5)
I'm trying to calculate the Cash Flows to within a .005% margin of error, since the numbers we're working with are in the hundreds of millions.
Let
v0 = initial value
vn = value after n periods
n = number of periods
r = annual rate of return
y = required annual net income
The one period discount factor is:
j = 1/(1+r)
The present value of the investment is:
pv = - v0 + j*y + j^2*y + j^3*y +..+ j^n*y + j^n*vn
= - v0 + y*(j + j^2 + j^3 +..+ j^n) + j^n*vn
= - v0 + y*sn + j^n*vn
where
sn = j + j^2 + j^3 + j^4 +..+ j^n
We can calculate sn as follows:
sn = j + j^2 + j^3 + j^4 +..+ j^n
j*sn = j^2 + j^3 + j^4 +..+ j^n + j^(n+1)
sn - j*sn = j*(1 - j^n)
sn = j*(1 - j^n)/(1-j)
= (1 - j^n)/[(1+r)*(r/(1+r))]
= (1 - j^n)/r
Set pv = 0 and solve for y:
y*sn = v0 - vn * j^n
y = (v0 - vn * j^n)/sn
= r * (v0 - vn * j^n)/(1 - j^n)
Our Ruby method:
def ann_ret(v0, vn, n, r)
j = 1/(1+r)
(r * (v0 - vn * j**n)/(1 - j**n)).round(2)
end
With annual compounding:
ann_ret(18000000, 13000000, 5, 0.1) # => 2618987.4
With semi-annual compounding:
2 * ann_ret(18000000, 13000000, 10, 0.05) # => 2595045.75
With daily compounding:
365 * ann_ret(18000000, 13000000, 5*365, 0.10/365) # => 2570881.20
These values differ slightly from the required annual return you calculate. You should be able to explain the difference by comparing present value formulae.
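As a sanity check on the closed form, the sketch below recomputes the annual figure in Python and plugs it back into the NPV formula from the question; if the derivation holds, the NPV should come out at essentially zero. The function names annual_cash_flow and npv are made up for this illustration, and the result is left unrounded.

```python
def annual_cash_flow(v0, vn, n, r):
    """Level annual cash flow y that makes the NPV zero at rate r."""
    j = 1.0 / (1.0 + r)                       # one-period discount factor
    return r * (v0 - vn * j**n) / (1.0 - j**n)

def npv(v0, vn, n, r, y):
    """NPV of: pay v0 now, receive y each year, sell for vn in year n."""
    flows = sum(y / (1.0 + r)**k for k in range(1, n + 1))
    return -v0 + flows + vn / (1.0 + r)**n

y = annual_cash_flow(18000000, 13000000, 5, 0.1)
print(round(y, 2))                                  # 2618987.4
print(abs(npv(18000000, 13000000, 5, 0.1, y)) < 1e-3)   # True
```

No loop or guess-and-check is needed for the level cash flow case; an iterative solver would only become necessary if the cash flows were allowed to differ from year to year.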
There's a module called Newton in Ruby. It uses the Newton-Raphson method.
I've been using this module to implement the IRR function in this library:
https://github.com/Noverde/exonio
If you need the IRR, you can use it like this:
Exonio.irr([-100, 39, 59, 55, 20]) # ==> 0.28095
