So basically I defined a temperature function:
temps[y_] := Map[makeOrdinal[#, y] &,
  WeatherData["KGRB", "MeanTemperature", {{y, 1, 1}, {y, 12, 31}, "Day"}]]
that will give the average temperatures for KGRB (Green Bay) for any given year as long as it is in this format.
temps2010=temps[2010]
My problem is that I do not know how to write the code that builds a table of temperatures for each year from 1983 to 2012, and then map the parameter-finding function
params=FindFit[temps2009, -a*Cos[2*Pi*(t - b)/365] + c, {a, b, c}, t]
across that table to construct a table of parameter values for the 30 years in question (so the year will not be fixed at 2009 as shown in the code).
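Outside Mathematica, the same "map a fit over each year" pattern can be sketched in Python with synthetic data standing in for the WeatherData call (everything here is hypothetical; since -a*Cos[2 Pi (t - b)/365] + c expands into cosine and sine components that are linear in the unknowns, ordinary least squares can stand in for FindFit):

```python
import numpy as np

def fit_year(temps):
    """Fit T(t) = -a*cos(2*pi*(t - b)/365) + c by linear least squares.

    Expanding the cosine: T = A*cos(w*t) + B*sin(w*t) + c, with
    A = -a*cos(w*b) and B = -a*sin(w*b), so the model is linear in
    (A, B, c), and a, b are recovered afterwards.
    """
    t = np.arange(len(temps), dtype=float)
    w = 2 * np.pi / 365
    X = np.column_stack([np.cos(w * t), np.sin(w * t), np.ones_like(t)])
    A, B, c = np.linalg.lstsq(X, np.asarray(temps, dtype=float), rcond=None)[0]
    a = np.hypot(A, B)
    b = (np.arctan2(-B, -A) / w) % 365
    return a, b, c

# Synthetic stand-in for the weather data (true a ~ 20, b ~ 10, c ~ 45),
# mapped over the 30 years 1983-2012:
rng = np.random.default_rng(0)
t = np.arange(365)
params = {y: fit_year(-20 * np.cos(2 * np.pi * (t - 10) / 365) + 45
                      + rng.normal(0, 1, 365))
          for y in range(1983, 2013)}
```

The dictionary comprehension at the end is the Python analogue of mapping the fit over a `Table` of years.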
#dataset, sigmoidal curve
ratio278a267 ={{5.445, 0.0501}, {6.177, 0.035}, {7., 0.0589}, {7.368,
0.0953}, {7.73, 0.1419}, {8.169, 1.0697}, {9.141, 1.0869}, {10.3,
1.0981}}
#nonlinearmodelfitting for dataset
fit = FindFit[ratio278a267, (1/(a + (E^(-b*(x - c))))), {a, b, c}, x]
fit2 = NonlinearModelFit[
ratio278a267, (1/(a + (E^(-b*(x - c))))), {a, b, c}, x]["RSquared"]
#fit and fit2 output, respectively:
{a -> 0.639719, b -> -250.201, c -> -1008.92}
0.
The code above is what I used for a nonlinear fit in Mathematica, but the output does not give reasonable values. When I plot the model over my dataset in a graphing calculator, values of a, b, and c between 0 and 10 give a reasonable fit.
One way to get FindFit to converge on a good solution is to give it good starting values, particularly when your model could give wildly bad fits for some parameter values.
x=.;a=.;b=.;c=.;
ratio278a267 ={{5.445, 0.0501}, {6.177, 0.035}, {7., 0.0589}, {7.368,0.0953},
{7.73, 0.1419}, {8.169, 1.0697}, {9.141, 1.0869}, {10.3,1.0981}};
fit = FindFit[ratio278a267, (1/(a+(E^(-b*(x-c))))), {{a,0.92}, {b,8.7}, {c,7.9}}, x]
Show[ListPlot[ratio278a267],Plot[(1/(a+(E^(-b*(x-c)))))/.fit,{x,5.445,10.3}]]
In this example I found those starting values by doing ten thousand Monte Carlo trials looking for the smallest sum of squared error between the model and the data points and then let FindFit converge on the best values that it could find.
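For illustration, that Monte Carlo search for starting values can be sketched in Python as a plain random search minimizing the sum of squared errors (the sampling ranges here are assumptions):

```python
import math
import random

# The dataset from the question.
data = [(5.445, 0.0501), (6.177, 0.035), (7.0, 0.0589), (7.368, 0.0953),
        (7.73, 0.1419), (8.169, 1.0697), (9.141, 1.0869), (10.3, 1.0981)]

def sse(params):
    """Sum of squared errors of the sigmoid model 1/(a + e^(-b(x-c)))."""
    a, b, c = params
    return sum((y - 1.0 / (a + math.exp(-b * (x - c)))) ** 2 for x, y in data)

random.seed(42)
# Draw random candidate (a, b, c) triples and keep the one with lowest SSE.
candidates = (tuple(random.uniform(0.1, 10.0) for _ in range(3))
              for _ in range(10000))
start = min(candidates, key=sse)
```

The winning triple is then handed to the optimizer (FindFit's `{{a, a0}, {b, b0}, {c, c0}}` syntax) as its starting point.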
I am having a hard time with the question below. I am not sure if I got it correct, but either way, I need some help further understanding it. If anyone has time to explain, please do.
Design L1 and L2 distance functions to assess the similarity of bank customers. Each customer is characterized by the following attributes:
− Age (customer’s age, a real number with maximum age 90 years and minimum age 15 years)
− Cr (“credit rating”), an ordinal attribute with values ‘very good’, ‘good’, ‘medium’, ‘poor’, and ‘very poor’
− Av_bal (average account balance, a real number with mean 7000 and standard deviation 4000)
Using the L1 distance function, compute the distance between the following 2 customers: c1 = (55, good, 7000) and c2 = (25, poor, 1000). [15 points]
Using the L2 distance function, compute the distance between the above-mentioned 2 customers.
Answer with L1
d(c1,c2) = (c1.cr - c2.cr)/4 + (c1.avg_bal - c2.avg_bal/4000) * (c1.age - mean.age/std.age) - (c2.age - mean.age/std.age)
The question as is, leaves some room for interpretation. Mainly because similarity is not specified exactly. I will try to explain what the standard approach would be.
Usually, before you start, you want to normalize values such that they are roughly in the same range. Otherwise, your similarity will be dominated by the feature with the largest variance.
If you have no information about the distribution but just the range of the values, you want to normalize them to [0,1]. For your example this means
norm_age = (age-15)/(90-15)
For nominal values you want to find a mapping to ordinal values if you want to use Lp-norms. Note: this is not always possible (e.g., colors cannot intuitively be mapped to ordinal values). In your case you can transform the credit rating like this
cr = {0 if ‘very good’, 1 if ‘good’, 2 if ‘medium’, 3 if ‘poor’, 4 if ‘very poor’}
afterwards you can do the same normalization as for age
norm_cr = cr/4
Lastly, for normally distributed values you usually perform standardization by subtracting the mean and dividing by the standard deviation.
norm_av_bal = (av_bal-7000)/4000
Now that you have normalized your values, you can go ahead and define the distance functions:
L1(c1, c2) = |c1.norm_age - c2.norm_age| + |c1.norm_cr - c2.norm_cr |
+ |c1.norm_av_bal - c2.norm_av_bal|
and
L2(c1, c2) = sqrt((c1.norm_age - c2.norm_age)^2 + (c1.norm_cr - c2.norm_cr)^2
    + (c1.norm_av_bal - c2.norm_av_bal)^2)
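As a worked check of the two formulas, here is a short Python sketch following the normalizations above; for these two customers it gives L1 = 0.4 + 0.5 + 1.5 = 2.4 and L2 = sqrt(0.16 + 0.25 + 2.25) ≈ 1.631.

```python
from math import sqrt

def normalize(age, cr, av_bal):
    """Apply the three normalizations described in the answer."""
    cr_map = {'very good': 0, 'good': 1, 'medium': 2, 'poor': 3, 'very poor': 4}
    return ((age - 15) / (90 - 15),       # min-max scaling to [0, 1]
            cr_map[cr] / 4,               # ordinal mapping, then [0, 1]
            (av_bal - 7000) / 4000)       # z-score standardization

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def l2(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

c1 = normalize(55, 'good', 7000)
c2 = normalize(25, 'poor', 1000)
# l1(c1, c2) -> 2.4, l2(c1, c2) -> sqrt(2.66)
```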
So I am trying to build one-factor models with stocks and indices in R. I have 30 stocks and 16 indices in total. They are all time series from "2013-1-1" to "2014-12-31"; at least all my stocks are. All of my indices are missing some entries here and there. For example, all of my stocks' data have a length of 522, but one index has a length of 250, another 300, another 400, etc. They all start at "2013-1-1" and end at "2014-12-31". Because my index data has holes in it, I can't check correlations or build linear models with them; I can't do anything, basically. So I need to fill these holes. I am thinking about filling them with their mean, but I don't know how to do it. I am open to other ideas, of course. Can you help me? It is an important term project for me, so there is a lot on the line...
Edited based upon your comments (and to fix a mistake I made):
This is basic data management and I'm surprised that you're being required to work with time-series data without knowing how to merge() and how to create dataframes.
Create some fake date and value data with holes in the dates:
dFA <- data.frame(seq.Date(as.Date("2014-01-01"), as.Date("2014-02-28"), 3))
names(dFA) <- "date"
dFA$vals <- rnorm(nrow(dFA), 25, 5)
Create a dataframe of dates from the min value in dFA to the max value in dFA
dFB <- as.data.frame(seq.Date(as.Date(min(dFA$date, na.rm = T), format = "%Y-%m-%d"),
as.Date(max(dFA$date, na.rm = T), format = "%Y-%m-%d"),
1))
names(dFB) <- "date"
Merge the two dataframes together
tmp <- merge(dFB, dFA, by = "date", all = T)
Change NA values in tmp$vals to whatever you want
tmp$vals[is.na(tmp$vals)] <- mean(dFA$vals)
head(tmp)
date vals
1 2014-01-01 18.48131
2 2014-01-02 24.16256
3 2014-01-03 24.16256
4 2014-01-04 28.78855
5 2014-01-05 24.16256
6 2014-01-06 24.16256
Original comment below
The easiest way to fill in the holes is with merge().
Create a new data frame with one vector as a sequence of dates that span the range of your original dataframe and the other vector with whatever you're going to fill the holes (zeroes, means, whatever). Then just merge() the two together:
merge(dFB, dFA, by = [the column with the date values], all = TRUE)
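The same idea, building a complete date index and filling the gaps with the series mean, can be sketched in plain Python as well (the dates and values here are made up):

```python
from datetime import date, timedelta

# A sparse series with holes in the dates (hypothetical values).
vals = {date(2014, 1, 1): 18.5, date(2014, 1, 4): 28.8, date(2014, 1, 7): 22.1}

start, end = min(vals), max(vals)
mean_val = sum(vals.values()) / len(vals)

# Walk the full date range; keep observed values, fill holes with the mean.
filled = {}
d = start
while d <= end:
    filled[d] = vals.get(d, mean_val)
    d += timedelta(days=1)
```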
I have a dataset with nodes that are companies linked by transactions.
A company has these properties : name, country, type, creation_date
The relationships "SELLS_TO" have these properties : item, date, amount
All dates are in the following format YYYYMMDD.
I'm trying to find a series of transactions that :
- include 2 companies from 2 distinct countries
- where between the first node in the series and the last one, there is a company that has been created less than 90 days ago
- where the total time between the first transaction and the last transaction is < 15 days
I think I can handle the conditions 1) and 2) but I'm stuck on 3).
MATCH (a:Company)-[r:SELLS_TO]->(b:Company)-[v:SELLS_TO*]->(c:Company)
WHERE NOT(a.country = c.country) AND (b.creation_date + 90 < 20140801)
Basically I don't know how to get the date of the last transaction in the series. Anyone knows how to do that?
jvilledieu,
In answer to your most immediate question, you can access the collections of nodes and relationships in the matched path and get the information you need. The query would look something like this.
MATCH p=(a:Company)-[rs:SELLS_TO*]->(c:Company)
WHERE a.country <> c.country
WITH p, a, c, rs, nodes(p) AS ns
WITH p, a, c, rs, filter(n IN ns WHERE n.creation_date - 20140801 < 90) AS bs
WITH p, a, c, rs, head(bs) AS b
WHERE NOT b IS NULL
WITH p, a, b, c, head(rs) AS r1, last(rs) AS rn
WITH p, a, b, c, r1, rn, rn.date - r1.date AS d
WHERE d < 15
RETURN a, b, c, d, r1, rn
This query finds a chain with at least one :SELLS_TO relationship between :Company nodes and assigns the matched path to 'p'. The match is then limited to cases where the first and last company have different countries. At this point the WITH clauses develop the other elements that you need. The collection of nodes in the path is obtained and named 'ns'. From this, a collection of nodes where the creation date is less than 90 days from the target date is found and named 'bs'. The first node of the 'bs' collection is then found and named 'b', and the match is limited to cases where a 'b' node was found. The first and last relationships are then found and named 'r1' and 'rn'. After this, the difference in their dates is calculated and named 'd'. The match is then limited to cases where d is less than 15.
So that gives you an idea of how to do this. There is another problem though. At least, in the way you have described the problem, you will find that the date math will fail. Dates that are represented as numbers, such as 20140801, are not linear, and thus cannot be used for interval math. As an example, 15 days from 20140820 is 20140904. If you subtract these two date 'numbers', you get 84. One example of how to do this is to represent your dates as days since an epoch date.
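To make the pitfall concrete, here is a small Python sketch converting YYYYMMDD integers to days since an epoch before subtracting:

```python
from datetime import date

def yyyymmdd_to_epoch_days(n):
    """Convert a date stored as a YYYYMMDD integer to days since 1970-01-01."""
    d = date(n // 10000, (n // 100) % 100, n % 100)
    return (d - date(1970, 1, 1)).days

# Naive subtraction of the date 'numbers' gives 84, which is not an interval:
naive = 20140904 - 20140820
# Converting to epoch days first gives the true 15-day interval:
real = yyyymmdd_to_epoch_days(20140904) - yyyymmdd_to_epoch_days(20140820)
```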
Grace and peace,
Jim
I would like to apply a function to all the tuples in an ETS table:
The table is a set, each key appears only once.
My table contains only tuples of the same type:
{Key, X, Y, VX, VY}
All values are ::integer()
What I want to do is take a certain value Elapsed and update all my tuples with a function apply_vector/2:
apply_vector({K, X, Y, 0, 0}, _Elapsed) ->
{K, X, Y, 0, 0};
apply_vector({K, X, Y, VX, VY}, Elapsed) ->
NewX = X + (VX * Elapsed),
NewY = Y + (VY * Elapsed),
{K, NewX, NewY, VX, VY}.
Possible solutions
If I use ets:foldl, my inserts could be traversed during the fold and
result in an infinite (or very long) loop.
I could prepare the new tuples with an ets:foldl, and then insert the
whole list.
I could insert into a new table, then replace the old table with the new
one, but I don't want to limit access to the table with calls to a
gen_server; the table must be accessible at any time.
I cannot use ets:update_element because I need to read the VX and VY
values to update X and Y.
I know there are some iterator utilities, but none seems to allow
passing a fun.
I need to make this update every 1-5 seconds. So, which solution is the
most efficient with 10 tuples? With 100 tuples? With more?
Thank you!
I keep a list of boats; the key is the boat ID, X and Y are geographic
coordinates, and VX and VY represent a movement vector: a displacement
for one second. Elapsed is a ratio, the number of seconds since the last
update. The table makes it possible to know the position of each boat at
any time. I would probably use ets:foldl for this.
Seems to me like you only need iterative access to this data, so you could just use a list of records here.
Another approach (which I would go for) is to put a boat supervisor and create a gen_server for every boat. That way every boat will have its own state and you won't actually need to traverse any list.
Also checkout qlc http://www.erlang.org/doc/man/qlc.html
qlc lets you use list comprehensions on ETS or Mnesia tables. It will probably have the same performance as foldl, though.
There's no great way to do what you are doing.
Recently, I ran into a similar issue (updating ~5000 rows once a second), which led me to take a totally different approach.
So, do you NEED to store the interpreted value? Or can you compute the value upon lookup?
How about storing:
{K,X,Y,VX,VY,LastUpdateTime}
And then doing something like (uncompilable example):
%% In a 'boats' module. Record syntax would be smarter here, but it's an example...
%% Note: ets:foldl takes (Fun, Acc, Table), in that order.
get_all() ->
    ets:foldl(fun(Row = {Id, _, _, _, _, _}, Acc) -> [{Id, calc(Row)} | Acc] end,
              [], ?TABLE).

get_boat(Id) ->
    [Row] = ets:lookup(?TABLE, Id),
    calc(Row).

calc({_K, X, Y, VX, VY, LastUpdate}) ->
    Elapsed = erlang:system_time(second) - LastUpdate,
    {X + VX * Elapsed, Y + VY * Elapsed}.
This allows you not to block on a gen_server, but also not to have to update the table every 5 seconds. You'd basically only update the table every time a boat reports in.
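The compute-on-read idea can be sketched in Python for clarity (names are hypothetical; the `now` parameter is injectable so the interpolation is easy to test):

```python
import time

boats = {}  # boat_id -> (x, y, vx, vy, last_update_seconds)

def report(boat_id, x, y, vx, vy, now=None):
    """Store the boat's state; no periodic bulk update is needed."""
    boats[boat_id] = (x, y, vx, vy, time.time() if now is None else now)

def position(boat_id, now=None):
    """Interpolate the position at read time from the last report."""
    x, y, vx, vy, last = boats[boat_id]
    elapsed = (time.time() if now is None else now) - last
    return (x + vx * elapsed, y + vy * elapsed)
```

The table (here a dict) is only written when a boat reports in; readers pay a tiny interpolation cost instead of the whole system paying for a table-wide rewrite every few seconds.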