Related
Image one file with 250 datasets with varying length (2000 +-500) lines and 11 columns. Here a comprehensive small example:
file.sum:
0.00000e+00 9.51287e-09
1.15418e-04 8.51287e-09
4.16445e-04 7.51287e-09
8.53721e-04 6.51287e-09
1.42697e-03 5.51287e-09
1.70302e-03 4.51287e-09
2.27189e-03 3.51287e-09
2.54732e-03 1.51287e-09
3.11304e-03 0.51287e-09
0.00000e+00 13.28378e-09
1.15418e-04 12.28378e-09
3.19663e-04 11.28378e-09
5.78178e-04 10.28378e-09
8.67479e-04 09.28378e-09
1.20883e-03 08.28378e-09
1.58817e-03 07.28378e-09
1.75840e-03 06.28378e-09
2.21069e-03 05.28378e-09
I wanted to display every 10 datasets and normalize it to the first element. The first value to normalize is 9.51287e-09 and the second would be 13.28378e-09. Of course with this massive dataset, I can not do it manually or even split the file.
So far I got every ten'th dataset but with the normalization, I do have my problems.
#!/usr/bin/gnuplot
reset
set xrange [0:0.1]
plot for [val=1:250:10] 'file.sum' i val u 1:11 w l
Working of this example:
plot.gp:
#!/usr/bin/gnuplot
reset
set xrange [0:0.01]
plot for [val=1:2:1] 'file.sum' i val u 1:2 w l
Some hints I found in:
Gnuplot: data normalization
I guess you can write a awk script to handle this, but there may be a more gnuplot friendlier way. Any suggestions are appreciated.
Assuming you have one file with data sections each separated by two or more empty lines you can use the script below.
In gnuplot console check help pseudocolumns. column(-2) tells you in which block you are and column(0) tells you wich line of this block you are (counting starts from 0).
Define a function Normalized(n) which does the following: if you are in the first line of a subblock put the value of column(n) into the variable y0. All values of this block will now be divided by y0. Also check help ternary.
In case you want a legend for the blocks you can plot a dummy plot, actually plotting NaN (i.e. nothing) but place an entry for the key.
Code:
### normalize each block by its first value
reset session
set colorsequence classic
$Data <<EOD
0.00000e+00 9.51287e-09
1.15418e-04 8.51287e-09
4.16445e-04 7.51287e-09
8.53721e-04 6.51287e-09
1.42697e-03 5.51287e-09
1.70302e-03 4.51287e-09
2.27189e-03 3.51287e-09
2.54732e-03 1.51287e-09
3.11304e-03 0.51287e-09
0.00000e+00 13.28378e-09
1.15418e-04 12.28378e-09
3.19663e-04 11.28378e-09
5.78178e-04 10.28378e-09
8.67479e-04 09.28378e-09
1.20883e-03 08.28378e-09
1.58817e-03 07.28378e-09
1.75840e-03 06.28378e-09
2.21069e-03 05.28378e-09
EOD
Normalized(n) = column(n)/(column(0)==0 ? y0=column(n) : y0)
plot $Data u 1:(Normalized(2)):(myBlocks=column(-2)+1) w lp pt 7 lc var notitle, \
for [i=0:myBlocks-1] '' u 1:(NaN) w lp pt 7 lc i+1 ti sprintf("Block %d",i)
### end of code
Result:
I have read a lot about the pain of replicate the easy robust option from STATA to R to use robust standard errors. I replicated following approaches: StackExchange and Economic Theory Blog. They work but the problem I face is, if I want to print my results using the stargazer function (this prints the .tex code for Latex files).
Here is the illustration to my problem:
reg1 <-lm(rev~id + source + listed + country , data=data2_rev)
stargazer(reg1)
This prints the R output as .tex code (non-robust SE) If i want to use robust SE, i can do it with the sandwich package as follow:
vcov <- vcovHC(reg1, "HC1")
if I now use stargazer(vcov) only the output of the vcovHC function is printed and not the regression output itself.
With the package lmtest() it is possible to print at least the estimator, but not the observations, R2, adj. R2, Residual, Residual St.Error and the F-Statistics.
lmtest::coeftest(reg1, vcov. = sandwich::vcovHC(reg1, type = 'HC1'))
This gives the following output:
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.54923 6.85521 -0.3719 0.710611
id 0.39634 0.12376 3.2026 0.001722 **
source 1.48164 4.20183 0.3526 0.724960
country -4.00398 4.00256 -1.0004 0.319041
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How can I add or get an output with the following parameters as well?
Residual standard error: 17.43 on 127 degrees of freedom
Multiple R-squared: 0.09676, Adjusted R-squared: 0.07543
F-statistic: 4.535 on 3 and 127 DF, p-value: 0.00469
Did anybody face the same problem and can help me out?
How can I use robust standard errors in the lm function and apply the stargazer function?
You already calculated robust standard errors, and there's an easy way to include it in the stargazeroutput:
library("sandwich")
library("plm")
library("stargazer")
data("Produc", package = "plm")
# Regression
model <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc,
index = c("state","year"),
method="pooling")
# Adjust standard errors
cov1 <- vcovHC(model, type = "HC1")
robust_se <- sqrt(diag(cov1))
# Stargazer output (with and without RSE)
stargazer(model, model, type = "text",
se = list(NULL, robust_se))
Solution found here: https://www.jakeruss.com/cheatsheets/stargazer/#robust-standard-errors-replicating-statas-robust-option
Update I'm not so much into F-Tests. People are discussing those issues, e.g. https://stats.stackexchange.com/questions/93787/f-test-formula-under-robust-standard-error
When you follow http://www3.grips.ac.jp/~yamanota/Lecture_Note_9_Heteroskedasticity
"A heteroskedasticity-robust t statistic can be obtained by dividing an OSL estimator by its robust standard error (for zero null hypotheses). The usual F-statistic, however, is invalid. Instead, we need to use the heteroskedasticity-robust Wald statistic."
and use a Wald statistic here?
This is a fairly simple solution using coeftest:
reg1 <-lm(rev~id + source + listed + country , data=data2_rev)
cl_robust <- coeftest(reg1, vcov = vcovCL, type = "HC1", cluster = ~
country)
se_robust <- cl_robust[, 2]
stargazer(reg1, reg1, cl_robust, se = list(NULL, se_robust, NULL))
Note that I only included cl_robust in the output as a verification that the results are identical.
Basically, I have solved the heat equation for (x,y,t) and I want to show the variation of the temperature function with time.The program was written in Fortran 90 and the solution data was stored in a file diffeqn3D_file.txt.
This is the program:
Program diffeqn3D
Implicit none
Integer:: b,c,d,l,i,j,k,x,y,t
Real:: a,r,s,h,t1,k1,u,v,tt,p
Real,Dimension(0:500,0:500,0:500):: f1 !f=f(x,t)
!t1=time step and h=position step along x and
!k=position step along y and a=conductivity
open(7, file='diffeqn3D_file.txt', status='unknown')
a=0.024
t1=0.1
h=0.1
k1=0.1
r=(h**2)/(k1**2)
s=(h**2)/(a*t1)
l=10
tt=80.5
b=100
c=100
d=100
!The temperature is TT at x=0 and 0 at x=l.
!The rod is heated along the line x=0.
!Initial conditions to be changed as per problem..
Do x=0,b
Do y=0,c
Do t=0,d
If(x==0) Then
f1(x,y,t)=tt
Else If((x.ne.0).and.t==0) Then
f1(x,y,t)=0
End If
End Do
End Do
End Do
print *,f1(9,7,5)
print *,r
print *,a,h,t1,h**2,a*t1,(h**2)/(a*t1)
print *,f1(0,1,1)
print *,f1(3,1,1)
!num_soln_of_eqnwrite(7,*)
Do t=1,d
Do y=1,c-1
Do x=1,b-1
p=f1(x-1,y,t-1)+f1(x+1,y,t-1)+r*f1(x,y-1,t-1)+r*f1(x,y+1,t-1)-(2+2*r-s)*f1(x,y,t-1)
f1(x,y,t)=p/s
!f1(x,t)=0.5*(f1(x-1,t-1)+f1(x+1,t-1))
!print *,f1(x,t),b
End Do
End Do
End Do
Do i=0,d
Do k=0,b
Do j=0,c
u=k*h
v=j*k1
write(7,*) u,v,f1(k,j,i)
End Do
End Do
write(7,*) " "
write(7,*) " "
End Do
close(7)
End Program diffeqn3D
And after compilation and run, I enter the following code in gnuplot but it does not run, rather it hangs up or creates a gif picture, not animation.
set terminal gif animate delay 1
set output 'diffeqn3D.gif'
stats 'diffeqn3D_file.txt' nooutput
do for [i=1:int(STATS_blocks)] {
splot 'diffeqn3D_file.txt'
}
Sometimes it also puts up a warning message, citing no z-values for autoscale range.
What is wrong with my code and how should I proceed?
First, try to add some print commands for "debug" information:
set terminal gif animate delay 1
set output 'diffeqn3D.gif'
stats 'diffeqn3D_file.txt' nooutput
print int(STATS_blocks)
do for [i=1:int(STATS_blocks)] {
print i
splot 'diffeqn3D_file.txt'
}
Second, what happens?
The splot command does not have an index specifier, try to use:
splot 'diffeqn3D_file.txt' index i
Without the index i gnuplots always plots the whole file which has two consequences:
The data file is quite large. Plotting takes quite a long time and it seems that gnuplot hangs.
Gnuplot plots always the same data, there are no changes which show up in an animation.
Now gnuplot runs much faster and we will fix the autoscale error. Again, there are two points:
The index specifies a data set within the data file. The stats command counts those sets which "are separated by pairs of blank records" (from gnuplot documentation). Your data file ends with a pair of blank records - this starts a new data set in gnuplot. But this data set is empty which finally leads to the error. There are only STATS_blocks-1 data sets.
The index is zero based. The loop should start with 0 and end at STATS_blocks-2.
So we arrive at this plot command:
do for [i=0:int(STATS_blocks)-2] {
print i
splot 'diffeqn3D_file.txt' index i
}
I came cross a function of graphing cumulative return of a strategy and the peaks of the return in a great example of combining shiny and quantstrat, thanks to Simon Otziger. The source code is here. The code works fine most of time, but for some data it won't graph the peaks properly.
The code is simplified but the key logic is not changed. I ran the code with three set of data (cumPNL1, cumPNL2, cumPNL3) copied from three example strategies, in which the first data will cause the code to fail to graph peaks properly.
I ran the following codes with cumPNL1, cumPNL2, cumPNL3 separately. with both cumPNL2 and cumPNL3 the code can produce cumulative return line and peak points successfully. however, with cumPNL1 the code can only produce line, but peaks are not at the right positions.
I noticed that both peakIndex based on cumPNL2 and cumPNL3 have their first value being TRUE, so when I change the code by adding a line peakIndex[1] <- TRUE, cumPNL1 will work fine with the modified code.
Though now it works with modified code, I have no idea why it is behaving like this. Could anyone have a look? Thanks
cumPNL1 <- c(-193,-345,-406,-472,-562,-543,-450,-460,-544,-659,-581,-342,-384,276,-858,-257.99)
cumPNL2 <- c(35.64,4.95,-2.97,-6.93,11.88,-19.8,-26.73,-39.6,-49.5,-50.49,-51.48,-48.51,-50.49,-55.44,143.55,770.22,745.47,691.02,847.44,1141.47,1007.82,1392.93,1855.26,1863.18,2536.38,2778.93,2811.6,2859.12,2417.58)
cumPNL3 <- c(35.64,4.95,-2.97,-6.93,11.88,-19.8,-26.73,-39.6,-49.5,-50.49,-51.48,-48.51,-50.49,-55.44,143.55,770.22,745.47,691.02,847.44,1141.47,1007.82,1392.93,1855.26,1863.18,2536.38,2778.93,2811.6,2859.12,2417.58)
peakIndex <- c(cumPNL3[1] > 0, diff(cummax(cumPNL3)) > 0)
# peakIndex[1] <- TRUE
dev.new()
plot(cumPNL3, type='n', xlab="index of trades", ylab="returns in cash", main="cumulative returns and peaks")
grid()
lines(cumPNL3)
points(cbind(1 : length(cumPNL3), cumPNL3)[peakIndex, ],
pch=19, col='green', cex=0.6)
legend(
x='bottomright', inset=0.1,
legend=c('Net Profit','Peaks'),
lty=c(1, NA), pch=c(NA, 19),
col=c('black','green')
)
cumPNL1 has a single peak and R reduces the dimension from a numerical matrix to a numerical vector of length 2. The points function plots the two numerical vector values on the y-axis using the x-axis index 1 and 2:
peakIndex1 <- c(cumPNL1[1] > 0, diff(cummax(cumPNL1)) > 0)
peakIndex3 <- c(cumPNL3[1] > 0, diff(cummax(cumPNL3)) > 0)
str(cbind(1 : length(cumPNL1), cumPNL1)[peakIndex1,])
str(cbind(1 : length(cumPNL3), cumPNL3)[peakIndex3,])
Output:
> str(cbind(1 : length(cumPNL1), cumPNL1)[peakIndex1,])
num [1:12, 1:2] 1 15 16 19 20 22 23 24 25 26 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "" "cumPNL1"
> str(cbind(1 : length(cumPNL3), cumPNL3)[peakIndex3,])
Named num [1:2] 14 276
- attr(*, "names")= chr [1:2] "" "cumPNL3"
Usually setting plot = FALSE preserves the object, e.g., str(cbind(1 : length(cumPNL3), cumPNL3)[peakIndex3, drop = FALSE]), which somehow does not work in this case. However, changing the points line to the following fixes the problem:
points(seq_along(cumPNL3)[peakIndex], cumPNL3[peakIndex], pch = 19,
col = 'green', cex = 0.6)
Thanks for reporting the issue. I will push the fix to GitHub tomorrow.
I am using the dprint package with knitr , mainly so that I can highlight rows from a table, which I have got working, but the output image leaves a fairly large space for a footnote, and it is taking up unnecessary space.
Is there away to get rid of it?
Also since I am fairly new to dprint, if anybody has better ideas/suggestions as to how to highlight tables and make them look pretty without any footnotes... or ways to tidy up my code that would be great!
An example of the Rmd file code is below...
```{r fig.height=10, fig.width=10, dev='jpeg'}
library("dprint")
k <- data.frame(matrix(1:100, 10,10))
CBs <- style(frmt.bdy=frmt(fontfamily="HersheySans"), frmt.tbl=frmt(bty="o", lwd=1),
frmt.col=frmt(fontfamily="HersheySans", bg="khaki", fontface="bold", lwd=2, bty="_"),
frmt.grp=frmt(fontfamily="HersheySans",bg="khaki", fontface="bold"),
frmt.main=frmt(fontfamily="HersheySans", fontface="bold", fontsize=12),
frmt.ftn=frmt(fontfamily="HersheySans"),
justify="right", tbl.buf=0)
x <- dprint(~., data=k,footnote=NA, pg.dim=c(10,10), margins=c(0.2,0.2,0.2,0.2),
style=CBs, row.hl=row.hl(which(k[,1]==5), col='red'),
fit.width=TRUE, fit.height=TRUE,
showmargins=TRUE, newpage=TRUE, main="TABLE TITLE")
```
Thanks in advance!
I haven't used dprint before, but I see a couple of different things that might be causing problems:
The start of your code chunk has defined the image width and height, which dprint seems to be trying to use.
You are setting both fit.height and fit.width. I think only one of those is used (in other words, the resulting image isn't stretched to fit both height and width, but only the one that seems to make most sense, in this case, width).
After tinkering around for a minute, here's what I did that minimizes the footnote. However, I don't know if there is a more efficient way to do this.
```{r dev='jpeg'}
library("dprint")
k <- data.frame(matrix(1:100, 10,10))
CBs <- style(frmt.bdy=frmt(fontfamily="HersheySans"),
frmt.tbl=frmt(bty="o", lwd=1),
frmt.col=frmt(fontfamily="HersheySans", bg="khaki",
fontface="bold", lwd=2, bty="_"),
frmt.grp=frmt(fontfamily="HersheySans",bg="khaki",
fontface="bold"),
frmt.main=frmt(fontfamily="HersheySans", fontface="bold",
fontsize=12),
frmt.ftn=frmt(fontfamily="HersheySans"),
justify="right", tbl.buf=0)
x <- dprint(~., data=k, style=CBs, pg.dim = c(7, 4.5),
showmargins=TRUE, newpage=TRUE,
main="TABLE TITLE", fit.width=TRUE)
```
Update
Playing around to determine the sizes of the images is a total drag. But, if you run the code in R and look at the structure of x, you'll find the following:
str(x)
# List of 3
# $ cord1 : num [1:2] 0.2 6.8
# $ cord2 : Named num [1:2] 3.42 4.78
# ..- attr(*, "names")= chr [1:2] "" ""
# $ pagenum: num 2
Or, simply:
x$cord2
# 3.420247 4.782485
These are the dimensions of your resulting image, and this information can probably easily be plugged into a function to make your plots better.
Good luck!
So here's my solution...with some examples...
I've just copied and pasted my Rmd file to demonstrate how to use it.
you should be able to just copy and paste it into a blank Rmd file and then knit to HTML to see the results...
Ideally what I would have liked would have been to make it all one nice neat function rather than splitting it up into two (i.e. setup.table & print.table) but since chunk options can't be changed mid chunk as suggested by Yihui, it had to be split up into two functions...
`dprint` + `knitr` Examples to create table images
===========
```{r}
library(dprint)
# creating the sytle object to be used
CBs <- style(frmt.bdy=frmt(fontfamily="HersheySans"),
frmt.tbl=frmt(bty="o", lwd=1),
frmt.col=frmt(fontfamily="HersheySans", bg="khaki",
fontface="bold", lwd=2, bty="_"),
frmt.grp=frmt(fontfamily="HersheySans",bg="khaki",
fontface="bold"),
frmt.main=frmt(fontfamily="HersheySans", fontface="bold",
fontsize=12),
frmt.ftn=frmt(fontfamily="HersheySans"),
justify="right", tbl.buf=0)
# creating a setup function to setup printing a table (will probably put this function into my .Rprofile file)
setup.table <- function(df,width=10, style.obj='CBs'){
require(dprint)
table.style <- get(style.obj)
a <- tbl.struct(~., df)
b <- char.dim(a, style=table.style)
p <- pagelayout(dtype = "rgraphics", pg.dim = NULL, margins = NULL)
f <- size.simp(a[[1]], char.dim.obj=b, loc.y=0, pagelayout=p)
# now to work out the natural table width to height ratio (w.2.h.r) GIVEN the style
w.2.h.r <- as.numeric(f$tbl.width/(f$tbl.height +b$linespace.col+ b$linespace.main))
height <- width/w.2.h.r
table.width <- width
table.height <- height
# Setting chunk options to have right fig dimensions for the next chunk
opts_chunk$set('fig.width'=as.numeric(width+0.1))
opts_chunk$set('fig.height'=as.numeric(height+0.1))
# assigning relevant variables to be used when printing
assign("table.width",table.width, envir=.GlobalEnv)
assign("table.height",table.height, envir=.GlobalEnv)
assign("table.style", table.style, envir=.GlobalEnv)
}
# function to print the table (will probably put this function into my .Rprofile file as well)
print.table <- function(df, row.2.hl='2012-04-30', colour='lightblue',...) {
x <-dprint(~., data=df, style=table.style, pg.dim=c(table.width,table.height), ..., newpage=TRUE,fit.width=TRUE, row.hl=row.hl(which(df[,1]==row.2.hl), col=colour))
}
```
```{r}
# Giving it a go!
# Setting up two differnt size tables
small.df <- data.frame(matrix(1:100, 10,10))
big.df <- data.frame(matrix(1:800,40,20))
```
```{r}
# Using the created setup.table function
setup.table(df=small.df, width=10, style.obj='CBs')
```
```{r}
# Using the print.table function
print.table(small.df,4,'lightblue',main='table title string') # highlighting row 4
```
```{r}
setup.table(big.df,13,'CBs') # now setting up a large table
```
```{r}
print.table(big.df,38,'orange', main='the big table!') # highlighting row 38 in orange
```
```{r}
d <- style() # the default style this time will be used
setup.table(big.df,15,'d')
```
```{r}
print.table(big.df, 23, 'indianred1') # this time higlihting row 23
```