Sorry if it's a dumb question, but I am having trouble figuring out how to use pmodels in the drc package. I've searched everywhere online and all I can find is the definition, which is: "a data frame with a many columns as there are parameters in the non-linear function. Or a list containing a formula for each parameter in the nonlinear function." There are examples online, but I have no what it represents. For example, for the commands:
sel.m2 <- drm(dead/total~conc, type, weights=total, data=selenium, fct=LL.2(),
type="binomial", pmodels=list(~1, ~factor(type)-1))
met.as.m1<-drm(gain ~ dose, product, data = methionine, fct = AR.3(),
pmodels = list(~1, ~factor(product), ~factor(product)))
plot(met.as.m1, log = "", ylim = c(1450, 1800))
auxins.m1 <- boxcox(drm(y ~ dose, h, pmodels = data.frame(h, h, 1, h), fct = LL.4(), data = auxins), method = "anova")
I see pmodels as a list and data frame, but what does the "-1"vs "~1" mean or what does it mean to list a factor, what's the significance of the order within the parenthesis?
I agree that it's not well explained for new people. Unfortunately, I can only answer you in part.
A late response but for anyone else:
Two resources are available for reference with drc:
a) The writers published about drc. See main text and supplementary (S3 in this example) DOI:10.1371/journal.pone.0146021
b) See the drc.pdf and ctrl+f for pmodel to inspect the various uses.
data.frame vs. list depends on the grouping level I believe.
After playing around with my data (subsets), I found that
pmodels() = parameter/pooled models
aka how you set those parameters to equal (i.e., global/shared or not).
With your last example using the auxins df
library(drc)
auxins.m1 <- boxcox(drm(y ~ dose, h, pmodels = data.frame(h, h, 1, h),
fct = LL.4(), data = auxins), method = "anova")
## changed names to familiar terms by a non-statistician
auxins.m1 <- boxcox(drm(y ~ dose, h, pmodels = data.frame(h, h, 1, h),
fct = LL.4(names=c("hill.slope","bot","top","ed50"), data = auxins), method = "anova")
Shows that the top is set to 1.
The order is the same as the LL.4(names...)
So if you set
pmodels = data.frame(h, 1, 1, h) ## ("hill.slope","bot","top","ed50")
as they do in the drc.pdf on pg.10, you'll see that it's to set a common/shared bottom and top.
Check out pg.9 of their supplementary article, it shows that for LL.2, the two-parameter logistic fit has pre-set top = 1 and bottom = 0. The output of
selenium.LL.2.2 <- drm(dead/total~conc, type, weights = total,
data = selenium, fct = LL.2(), type="binomial",
pmodels = list(~factor(type)-1, ~1)) ## ("hill-slope", "ed50")
Shows that ed50 is assumed constant.
Alternatively from pg.91 of the drc.pdf:
## Fitting the model with freely varying ED50 values
mecter.free <- drm(rgr ~ dose, pct, data = mecter,
fct = LL.4(), pmodels = list(~1, ~1, ~1, ~factor(pct) - 1))
Unfortunately, it's really not clear what the object-1 means vs. just the object.
A better approach might be to use the base drm() without the special case of LL.#()
Check
getMeanFunctions()
to see all available functions
if you're trying to fix a value at a certain value you can
fct = LL.4(fixed = c(NA,0,1,NA))
## effectively becomes the standard LL.2()
## or
fct = LL.4(fixed = c(1,0,NA,NA))
## common hill slope = 1; assumes baseline correction hence = 0
Related in part; see a lot of drm functions laid out:
https://stackoverflow.com/a/39257095
Related
I am adapting some k-fold cross validation code written for glmer/merMod models to a glmmTMB model framework. All seems well until I try and use the output from the model(s) fit with training data to predict and exponentiate values into a matrix (to then break into quantiles/number of bins to assess predictive performance). I can get get this line to work using glmer models, but it seems when I run the same model using glmmTMB I get Error in model.matrix: requires numeric/complex matrix/vector arguments There are many other posts out there discussing this error code and I have tried converting the data frame into matrix form and changing the class of the covariates with no luck. Separately running the parts before and after the %*% works but when combined I get the error. For context, this code is intended to be run with use/availability data so the example variables may not make sense, but the problem gets shown well enough. Any suggestions as to what is going on?
library(lme4)
library(glmmTMB)
# Example with mtcars dataset
data(mtcars)
# Model both with glmmTMB and lme4
m1 <- glmmTMB(am ~ mpg + wt + (1|carb), family = poisson, data=mtcars)
m2 <- glmer(am ~ mpg + wt + (1|carb), family = poisson, data=mtcars)
#--- K-fold code (hashed out sections are original glmer version of code where different)---
# define variables
k <- 5
mod <- m1 #m2
dt <- model.frame(mod) #data used
reg.list <- list() # initialize object to store all models used for cross validation
# finds the name of the response variable in the model dataframe
resp <- as.character(attr(terms(mod), "variables"))[attr(terms(mod), "response") + 1]
# define column called sets and populates it with character "train"
dt$sets <- "train"
# randomly selects a proportion of the "used"/am records (i.e. am = 1) for testing data
dt$sets[sample(which(dt[, resp] == 1), sum(dt[, resp] == 1)/k)] <- "test"
# updates the original model using only the subset of "trained" data
reg <- glmmTMB(formula(mod), data = subset(dt, sets == "train"), family=poisson,
control = glmmTMBControl(optimizer = optim, optArgs=list(method="BFGS")))
#reg <- glmer(formula(mod), data = subset(dt, sets == "train"), family=poisson,
# control = glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))
reg.list[[i]] <- reg # store models
# uses new model created with training data (i.e. reg) to predict and exponentiate values
predall <- exp(as.numeric(model.matrix(terms(reg), dt) %*% glmmTMB::fixef(reg)))
#predall <- exp(as.numeric(model.matrix(terms(reg), dt) %*% lme4::fixef(reg)))
Without looking at the code too carefully: glmmTMB::fixef(reg) returns a list (with elements cond (conditional model parameters), zi (zero-inflation parameters), disp (dispersion parameters) rather than a vector.
If you replace this bit with glmmTMB::fixef(reg)[["cond"]] it will probably work.
I don't understand the input f expected by play3d and movie3d in the rgl package.
library(rgl)
nobs<-10
x<-runif(nobs)
y<-runif(nobs)
z<-runif(nobs)
n<-rep(1:nobs)
df<-as.data.frame(cbind(x,y,z,n))
listofobs<-split(df,n)
plot3d(df[,1],df[,2],df[,3], type = "n", radius = .2 )
myplotfunction<-function(x) {
rgl.spheres(x=x$x,y=x$y,z=x$z, type="s", r=0.025)
}
When executing the 2 lines below, the animation does play but both lines (play3d() and movie3d()) trigger the error displayed below:
play3d(f=lapply(listofobs,myplotfunction), fps=1 )
movie3d(f=lapply(listofobs,myplotfunction), fps=1 , duration=20)
I am hoping someone can correct my code and help me understand the f input to play3d and movie3d.
Question 1: Why is the play3d line above correct enough that the animation does display correctly?
Question 2: Why is the play3d line above incorrect enough that it triggers the error?
Question 3: What is wrong with the movie3d line that it does not produce a video output?
As the docs say, f is "A function returning a list that may be passed to par3d". It's not a list, which is what your usage passes.
To answer the questions:
R evaluates the lapply call which does the animation, then play3d looks at the result and dies because it's not a function.
f needs to be a function, as described in the help page.
It dies when it looks at f, because it's not a function.
This looks like it will do what you want:
library(rgl)
nobs<-10
x<-runif(nobs)
y<-runif(nobs)
z<-runif(nobs)
df<-data.frame(x,y,z)
plot3d(df, type = "n" )
id <- NA
myplotfunction<-function(time) {
index <- round(time)
# For a 3x faster display, use index <- round(3*time)
# To cycle through the points several times, use
# index <- round(3*time) %% nobs + 1
if (!is.na(id))
pop3d(id = id) # Delete previous item
id <<- spheres3d(df[index,], r=0.025)
list()
}
play3d(myplotfunction, startTime = 1, duration = nobs - 1)
movie3d(myplotfunction, startTime = 1, duration = nobs - 1, fps = 1)
This will leave a GIF in file.path(tempdir(), "movie.gif").
Some other notes:
don't call rgl.spheres. It will cause you immense pain later. Use spheres3d, or never call any *3d function, and never upgrade rgl: you're living in the past using the rgl.* functions. The *3d functions and the rgl.* functions don't play nicely together.
to construct a dataframe, just use the data.frame() function, don't convert
a matrix.
you don't need all those contortions to extract points from the dataframe.
Most rgl functions can handle a dataframe with x, y, and z columns.
You might notice the plot3d frame move a little: spheres are bigger than points, so it will adjust to accommodate them. You could use xlim, ylim and zlim to set the original frame a little bigger if you don't like this.
I'm fitting classification models for binary issues using MLR package in R. For each model, I perform a cross-validation with embedded feature selection using "selectFeatures" function. In output, I retrieve mean AUCs over test sets and predictions. To do so, after having get some advices (Get predictions on test sets in MLR), I use "makeFeatSelWrapper" function in combination with "resample" function. The goal seems to be achieved but results are strange. With a logistic regression as classifier, I get an AUC of 0.5 which means no variable selected. This result is unexpected as I get an AUC of 0.9824432 with this classifier using the method mentioned in the linked question. With a neural network as classifier, I get an error message
Error in sum(x) : invalid 'type' (list) of argument
What is wrong?
Here is the sample code:
# 1. Find a synthetic dataset for supervised learning (two classes)
###################################################################
install.packages("mlbench")
library(mlbench)
data(BreastCancer)
# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable
p<-mlbench.waveform(1000)
# convert list into dataframe
dataset<-as.data.frame(p)
# drop thrid class to get 2 classes
dataset2 = subset(dataset, classes != 3)
# 2. Perform cross validation with embedded feature selection using logistic regression
#######################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.logreg", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of hold-out sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
# 3. Perform cross validation with embedded feature selection using neural network
##################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.nnet", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
If you run your logistic regression part of the code a couple of times, you should also get the Error in sum(x) : invalid 'type' (list) of argument error. However, I find it strange that fixing a particular seed (e.g., set.seed(1)) before resampling does not ensure that the error does or does not appear.
The error occurs in internal mlr code for printing the output of feature selection to the console. A very simple workaround is to simply avoid printing such output with show.info = FALSE in makeFeatSelWrapper (see code below). While this removes the error, it is possible that what caused it may have other consequences, although I it is possible the error only affects the printing code.
When running your code, I only get AUC above 0.90. Please find below a your code for logistic regression, slightly re-organized and with the workaround. I have added a droplevels() to the dataset2 to remove the missing level 3 from the factor, though this is not related with the workaround.
library(mlbench)
library(mlr)
data(BreastCancer)
p<-mlbench.waveform(1000)
dataset<-as.data.frame(p)
dataset2 = subset(dataset, classes != 3)
dataset2 <- droplevels(dataset2 )
mCT <- makeClassifTask(data =dataset2, target = "classes")
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
mL <- makeLearner("classif.logreg", predict.type = "prob")
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl, show.info = FALSE)
# uncomment this for the error to appear again. Might need to run the code a couple of times to see the error
# lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
Edit: I've reported an issue and created a pull request with a fix.
I have 3 measurements for a machine. Each measurement is trigged every time its value changes by a certain delta.
I have these 3 data sets, represented as Matlab objects: T1, T2 and O. Each of them has a obj.t containing the timestamp values and obj.y containing the measurement values.
I will measure T1 and T2 for a long time, but O only for a short period. The task is to reconstruct O_future from T1 and T2, using the existing values for O for training and validation.
Note that T1.t, T2.t and O.t are not equal, not even their frequency (I might call it 'variable sample rate', but not sure if this name applies).
Is it possible to solve this problem using Matlab or other software? Do I need to resample all data to a common time vector?
Concerning the common time. Below some basic code which does this. (I guess you might know how to do it but just in case). However, the second option might bring you further...
% creating test signals
t1 = 1:2:100;
t2 = 1:3:200;
to = [5 6 100 140];
s1 = round (unifrnd(0,1,size(t1)));
s2 = round (unifrnd(0,1,size(t2)));
o = ones(size(to));
maxt = max([t1 t2 to]);
mint = min([t1 t2 to]);
% determining minimum frequency
frequ = min([t1(2:length(t1)) - t1(1:length(t1)-1) t2(2:length(t2)) - t2(1:length(t2)-1) to(2:length(to)) - to(1:length(to)-1)] );
% create a time vector with highest resolution
tinterp = linspace(mint,maxt,(maxt-mint)/frequ+1);
s1_interp = zeros(size(tinterp));
s2_interp = zeros(size(tinterp));
o_interp = zeros(size(tinterp));
for i = 1: length(t1)
s1_interp(ceil(t1(i))==floor(tinterp)) =s1(i);
end
for i = 1: length(t2)
s2_interp(ceil(t2(i))==floor(tinterp)) =s2(i);
end
for i = 1: length(to)
o_interp(ceil(to(i))==floor(tinterp)) = o(i);
end
figure,
subplot 311
hold on, plot(t1,s1,'ro'), plot(tinterp,s1_interp,'k-')
legend('observation','interpolation')
title ('signal 1')
subplot 312
hold on, plot(t2,s2,'ro'), plot(tinterp,s2_interp,'k-')
legend('observation','interpolation')
title ('signal 2')
subplot 313
hold on, plot(to,o,'ro'), plot(tinterp,o_interp,'k-')
legend('observation','interpolation')
title ('O')
Its not ideal as for large vectors this might become ineffective as soon as you have small sampling frequencies in one of the signals which will determine the lowest resolution.
Another option would be to define a coarser time vector and look at the number of events that happend in a certain period which might have some predictive power as well (not sure about your setup).
The structure would be something like
coarse_t = 1:5:100;
s1_coarse = zeros(size(coarse_t));
s2_coarse = zeros(size(coarse_t));
o_coarse = zeros(size(coarse_t));
for i = 2:length(coarse_t)
s1_coarse(i) = sum(nonzeros(s1(t1<coarse_t(i) & t1>coarse_t(i-1))));
s2_coarse(i) = sum(nonzeros(s2(t2<coarse_t(i) & t2>coarse_t(i-1))));
o_coarse(i) = sum(nonzeros(o(to<coarse_t(i) & to>coarse_t(i-1))));
end
An API's response JSON string contain the following:
"user_index": "7.88413374111.4681.4.7U84.76Q4657.469U.461654.8114.15Q......"
It seems to be used to render a chart on a web page. What's the data structure?
Edit: added more information
The source URL is: index.baidu.com. Unfortunately, you have to register a (free) account and log in to use the site.
What I want is to get the data shown in the image below
When you hover over the chart, you will get a score of that day. This chart describes trends of Taylor Swift.
The XHR response data I made a gist here
This chart seems to be using SVG.
I'll start with a link to jsfiddle that decodes Taylor Swift data that you had in the sample request:
http://jsfiddle.net/pFV7p/2/ (start with the end of the output to compare to last dates on the chart).
Here is how the chart looked to me:
So I've spent some time investigating how it works (I find decoding stuff really exciting challenge :) ) and I have come to some conclusions which may help you to decode the data.
1) Pay attention to the file fun.js?140221 from
http://index.baidu.com/static/js/funs.js?140221
2) After de-obfuscation, find these two functions in the source:
function b(p, m, o) {
var l = c(m || "tyufCBJKQas", o || k || a);
p = p.split("");
for (var n = p.length; n--; ) {
p[n] = l[p[n]]
}
return p.join("")
}
function c(m, o) {
m = m || "k";
o = ((o || "") + m).split("");
var p = m.length;
var q = 0;
var l = {};
for (var n = 0; n < o.length; n++) {
if (!(o[n] in l)) {
l[o[n]] = q;
if (++q >= p) {
l[o[n]] = ",";
break
}
}
}
return l
}
They do the decoding stuff. Function c builds the "alphabet" and function b does the actual decoding. One of the keys for decoding is "tyufCBJKQas" and the other two are in variables k and a. Basically to decode the userIndexes you'd need to know what key should be in the variable k (because if k is set, a is not used, and as far as I tested - k is always set).
3) Pay attention to url used when requesting the data.
The request is sent in similarly to this one (which might be expired by now):
http://index.baidu.com/Interface/Search/getAllIndex/?res=gxIWXSIBDCczDANlWGpCCUUQYXYgADdtLlgmdxc2IzEjVA9OPwEAPGUWGhEKJC8AIDwtDzMeKF5QAEsvLg4PaEYlKyRmGQY7AlcmLg4%2BSTsTAjUwYz0TQSQcLR04JhExfFRIKUs0LENiVAQvIiAhECAmZSFKKzAKZ10pAVkVGnJXCQcmPnhuQVgHLXwnGAtmBTczdDF%2Bf0EqDAcMOCQgdB5EGiQMHXwzB2FVFXkxNTwkNHgwAFx2AhItDFUEW1UEckcgZn08AR5mfxUBPw%3D%3D&res2=12EXSTREXSTR4.962418.408N0QI0bMRC6i2ZfuiaiBEsFk3dWZSOOR7rUjv107q2CvEc6L5cZ
In this case parameter res2 equals to "12EXSTREXSTR4.962418.408N0QI0bMRC6i2ZfuiaiBEsFk3dWZSOOR7rUjv107q2CvEc6L5cZ" That is important because before mentioned variable a tends to be equal to res2 parameter.
Too bad we need variable k to decode, which often is very different from a, but sometimes it is quite similar, as in this case. For this request it was "12EXSTR4.962418.408N0QI0bMRC6i2ZfuiaiBEsFk3dWZSOOR7rUjv107q2CvEc6L5cZ". Which means that perhaps variable k can also be calculated from the request itself.
4) To create working solution that can decode the data every time, you just need to find where variable k is assigned and what is the algorithm for it's value. When you do that - use similar code to what I have in the fiddle with the right value for k and userIndex.
NOTE: Considering the fact that baidu has implemented some custom data encoding/decoding algorithm I suspect that they are not very happy by the fact that someone would use their engine directly, so perhaps you should pay attention to whether you break their user agreement, usage rules, etc and proceed accordingly.
Most probably they are using a javascript chart library (e.g. flot) and updating their data using an ajax.
Data structure should hold labels and other information. Control that page and see which charting library they use. You should be able to see network calls using your browser's developer tools.