Remove "Statistics" Header in Stargazer Summary Stat - stargazer

I'm trying to use stargazer to output my summary stat, but I was wondering if there was any way to remove the "Statistic" label at the top left of the output. Thanks!
stargazer(data,
summary.stat = c("mean", "median", "sd", "min", "max"),
type="text")
==================================================================
Statistic Mean Median St. Dev. Min Max
------------------------------------------------------------------

This is not an elegant solution, but it has worked for me whenever I needed to modify stargazer output.
Store the stargazer output (text, LaTeX or HTML) in a variable, then manipulate it with stringr functions.
library(stargazer)
library(stringr)
table_1 <- stargazer(mtcars, summary.stat = c("mean", "median", "sd", "min", "max"),
type="text")
table_1 %>% str_replace("Statistic", " ")
gives:
[2] "==============================================="
[3] " Mean Median St. Dev. Min Max "
[4] "-----------------------------------------------"
[5] "mpg 20.091 19.2 6.027 10 34 "
As I mentioned, it does work for html and latex output, too.
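One detail worth illustrating: replacing "Statistic" with a single space shifts the header row against the data rows. A minimal base R sketch, assuming the table is already held as a character vector (as `table_1` is above), pads the replacement to the same width instead. The `tbl` lines below are made-up stand-ins shaped like stargazer text output, not real stargazer output.

```r
# Illustrative lines shaped like stargazer's text output (hypothetical data)
tbl <- c(
  "=========================================",
  "Statistic  Mean  Median St. Dev. Min Max ",
  "-----------------------------------------",
  "mpg       20.091 19.2    6.027   10  34  "
)

# Replace the label with the same number of spaces so the columns stay aligned
blank <- strrep(" ", nchar("Statistic"))
cleaned <- sub("Statistic", blank, tbl, fixed = TRUE)

cat(cleaned, sep = "\n")
```

Because the replacement has the same width as the label, every line keeps its original length.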

Related

how to sort a list of doubles according to their names in R

I am trying to write a function to calculate R1 lexical richness measure. The formula is as follows:
R1 = 1 - ( F(h) - h^2 / (2N) )
where N is the number of tokens, h is the Hirsch point, and F(h) is the cumulative relative frequency up to that point. My actual data is in the same format as the data below:
txt <- list(
a = c("The truck driver whose runaway vehicle rolled into the path of an express train and caused one of Taiwan’s worst ever rail disasters has made a tearful public apology.", "The United States is committed to advancing prosperity, security, and freedom for both Israelis and Palestinians in tangible ways in the immediate term, which is important in its own right, but also as a means to advance towards a negotiated two-state solution.","The 49-year-old is part of a team who inspects the east coast rail line for landslides and other risks.", "We believe that this UN agency for so-called refugees should not exist in its current format.","His statement comes amid an ongoing investigation into the crash, with authorities saying the train driver likely had as little as 10 seconds to react to the obstruction.", " The US president accused Palestinians of lacking “appreciation or respect.", "To create my data I had to chunk each text in an increasing manner.", "Therefore, the input is a list of chunked texts within another list.","We plan to restart US economic, development, and humanitarian assistance for the Palestinian people,” the secretary of state, Antony Blinken, said in a statement.", "The cuts were decried as catastrophic for Palestinians’ ability to provide basic healthcare, schooling, and sanitation, including by prominent Israeli establishment figures.","After Donald Trump’s row with the Palestinian leadership, President Joe Biden has sought to restart Washington’s flailing efforts to push for a two-state resolution for the Israel-Palestinian crisis, and restoring the aid is part of that.")
)
library(quanteda)
DFMs <- lapply(txt, dfm)
txt_freq <- function(x) textstat_frequency(x, groups = docnames(x), ties_method = "first")
Fs <- lapply(DFMs, txt_freq)
get_h_point <- function(DATA) {
fn_interp <- approxfun(DATA$rank, DATA$frequency)
fn_root <- function(x) fn_interp(x) - x
uniroot(fn_root, range(DATA$rank))$root
}
s_p <- function(x){split(x,x$group)}
tstat_by <- lapply(Fs, s_p)
h_values <-lapply(tstat_by, vapply, get_h_point, double(1))
str(tstat_by)
str(h_values)
F <- list()
R <- list()
temp <- list()
for( Ls in names(tstat_by) ){
for (item in names(h_values[[Ls]]) ){
temp[[Ls]][[item]] <- subset(tstat_by[[Ls]][[item]], rank <= h_values[[Ls]][[item]])
F[[Ls]][[item]] <- sum(temp[[Ls]][[item]]$frequency) / sum(tstat_by[[Ls]][[item]]$frequency)
R[[Ls]][[item]] <- 1 - ( F[[Ls]][[item]] -
h_values[[Ls]][[item]] ^ 2 /
2 * sum(tstat_by[Ls][[item]]$frequency) )
}}
I have the values I need stored in a list, but in the wrong order. Here is what the for loop produces:
names(R[["a"]])
[1] "text1" "text10" "text11" "text2" "text3" "text4" "text5" "text6" "text7"
[10] "text8" "text9"
but I need it to be in this natural order:
names(R[["a"]])
[1] "text1" "text2" "text3" "text4" "text5" "text6" "text7" "text8" "text9"
[10] "text10" "text11"
So the question is: how do I get the values sorted by their names, with the numeric parts of the names in order?
Order them by the integer values in the element names, after stripping the "text" part.
> R$a <- R$a[order(as.integer(gsub("text", "", names(R$a))))]
> R$a
$text1
[1] 0.8666667
$text2
[1] 0.8510638
$text3
[1] 0.9
$text4
[1] 0.9411765
$text5
[1] 0.8333333
$text6
[1] 0.9166667
$text7
[1] 0.8666667
$text8
[1] 0.8571429
$text9
[1] 0.7741935
$text10
[1] 0.8888889
$text11
[1] 0.8717949
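The same idea can be wrapped in a small helper so it applies to every sub-list at once. This is only a sketch: `natural_order` and `R_demo` are hypothetical names, and it assumes each name contains exactly one run of digits.

```r
# Hypothetical helper: order names by the number embedded in them
natural_order <- function(nms) {
  order(as.integer(gsub("\\D", "", nms)))  # strip every non-digit character
}

# Toy list with names in the "wrong" alphabetical order
R_demo <- list(text1 = 0.1, text10 = 0.2, text11 = 0.3, text2 = 0.4, text9 = 0.5)

R_sorted <- R_demo[natural_order(names(R_demo))]
names(R_sorted)
# [1] "text1"  "text2"  "text9"  "text10" "text11"

# For a nested list such as R, apply it to each element:
# R <- lapply(R, function(x) x[natural_order(names(x))])
```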

Robust Standard Errors in lm() using stargazer()

I have read a lot about the pain of replicating Stata's easy robust option in R to get robust standard errors. I replicated the following approaches: StackExchange and Economic Theory Blog. They work, but the problem I face arises when I want to print my results using the stargazer function (which prints the .tex code for LaTeX files).
Here is the illustration to my problem:
reg1 <-lm(rev~id + source + listed + country , data=data2_rev)
stargazer(reg1)
This prints the R output as .tex code (non-robust SE). If I want to use robust SE, I can do it with the sandwich package as follows:
vcov <- vcovHC(reg1, "HC1")
If I now use stargazer(vcov), only the output of the vcovHC function is printed and not the regression output itself.
With the lmtest package it is possible to print at least the estimates, but not the number of observations, R2, adjusted R2, residual standard error and the F-statistic.
lmtest::coeftest(reg1, vcov. = sandwich::vcovHC(reg1, type = 'HC1'))
This gives the following output:
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.54923 6.85521 -0.3719 0.710611
id 0.39634 0.12376 3.2026 0.001722 **
source 1.48164 4.20183 0.3526 0.724960
country -4.00398 4.00256 -1.0004 0.319041
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How can I add or get an output with the following parameters as well?
Residual standard error: 17.43 on 127 degrees of freedom
Multiple R-squared: 0.09676, Adjusted R-squared: 0.07543
F-statistic: 4.535 on 3 and 127 DF, p-value: 0.00469
Did anybody face the same problem and can help me out?
How can I use robust standard errors in the lm function and apply the stargazer function?
You already calculated robust standard errors, and there's an easy way to include them in the stargazer output:
library("sandwich")
library("plm")
library("stargazer")
data("Produc", package = "plm")
# Regression
model <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc,
index = c("state","year"),
model = "pooling")
# Adjust standard errors
cov1 <- vcovHC(model, type = "HC1")
robust_se <- sqrt(diag(cov1))
# Stargazer output (with and without RSE)
stargazer(model, model, type = "text",
se = list(NULL, robust_se))
Solution found here: https://www.jakeruss.com/cheatsheets/stargazer/#robust-standard-errors-replicating-statas-robust-option
Update: I'm not so much into F-tests. People are discussing those issues, e.g. https://stats.stackexchange.com/questions/93787/f-test-formula-under-robust-standard-error
Following http://www3.grips.ac.jp/~yamanota/Lecture_Note_9_Heteroskedasticity:
"A heteroskedasticity-robust t statistic can be obtained by dividing an OLS estimator by its robust standard error (for zero null hypotheses). The usual F-statistic, however, is invalid. Instead, we need to use the heteroskedasticity-robust Wald statistic."
So perhaps use a Wald statistic here?
This is a fairly simple solution using coeftest:
library(lmtest)
library(sandwich)
library(stargazer)
reg1 <-lm(rev~id + source + listed + country , data=data2_rev)
cl_robust <- coeftest(reg1, vcov = vcovCL, type = "HC1", cluster = ~
country)
se_robust <- cl_robust[, 2]
stargazer(reg1, reg1, cl_robust, se = list(NULL, se_robust, NULL))
Note that I only included cl_robust in the output as a verification that the results are identical.
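To make the HC1 adjustment and the robust Wald statistic from the quoted lecture note more concrete, here is a minimal base R sketch that computes both by hand. It uses the built-in mtcars data as a stand-in because data2_rev is not available, and the object names (`fit`, `vcov_hc1`, `wald`) are illustrative only. It is not a replacement for sandwich::vcovHC, just a look at what the HC1 estimator does.

```r
# Stand-in regression on built-in data (the original data2_rev is not available)
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # n x k design matrix
e <- residuals(fit)      # OLS residuals
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^-1
meat  <- crossprod(X * e)      # X' diag(e^2) X; each row of X is scaled by its residual

# HC1 = HC0 scaled by the degrees-of-freedom correction n / (n - k)
vcov_hc1  <- (n / (n - k)) * bread %*% meat %*% bread
robust_se <- sqrt(diag(vcov_hc1))

# Heteroskedasticity-robust Wald test that all slope coefficients are zero
b <- coef(fit)[-1]
V <- vcov_hc1[-1, -1]
wald    <- drop(t(b) %*% solve(V) %*% b)
p_value <- pchisq(wald, df = length(b), lower.tail = FALSE)

robust_se
c(wald = wald, p_value = p_value)
```

A vector like `robust_se` is what gets passed to stargazer through the `se = list(...)` argument in the answers above.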

volemont/insights:chart.EquityCurve.R: a bug in graphing peaks of cumulative return?

I came across a function for graphing the cumulative return of a strategy and the peaks of that return in a great example of combining shiny and quantstrat, thanks to Simon Otziger. The source code is here. The code works fine most of the time, but for some data it won't graph the peaks properly.
The code is simplified, but the key logic is unchanged. I ran the code with three sets of data (cumPNL1, cumPNL2, cumPNL3) copied from three example strategies; the first one causes the code to fail to graph the peaks properly.
I ran the following code with cumPNL1, cumPNL2 and cumPNL3 separately. With both cumPNL2 and cumPNL3 the code produces the cumulative return line and the peak points successfully. With cumPNL1, however, the code only produces the line, and the peaks are not at the right positions.
I noticed that both peakIndex based on cumPNL2 and cumPNL3 have their first value being TRUE, so when I change the code by adding a line peakIndex[1] <- TRUE, cumPNL1 will work fine with the modified code.
Though it now works with the modified code, I have no idea why it behaves like this. Could anyone have a look? Thanks
cumPNL1 <- c(-193,-345,-406,-472,-562,-543,-450,-460,-544,-659,-581,-342,-384,276,-858,-257.99)
cumPNL2 <- c(35.64,4.95,-2.97,-6.93,11.88,-19.8,-26.73,-39.6,-49.5,-50.49,-51.48,-48.51,-50.49,-55.44,143.55,770.22,745.47,691.02,847.44,1141.47,1007.82,1392.93,1855.26,1863.18,2536.38,2778.93,2811.6,2859.12,2417.58)
cumPNL3 <- c(35.64,4.95,-2.97,-6.93,11.88,-19.8,-26.73,-39.6,-49.5,-50.49,-51.48,-48.51,-50.49,-55.44,143.55,770.22,745.47,691.02,847.44,1141.47,1007.82,1392.93,1855.26,1863.18,2536.38,2778.93,2811.6,2859.12,2417.58)
peakIndex <- c(cumPNL3[1] > 0, diff(cummax(cumPNL3)) > 0)
# peakIndex[1] <- TRUE
dev.new()
plot(cumPNL3, type='n', xlab="index of trades", ylab="returns in cash", main="cumulative returns and peaks")
grid()
lines(cumPNL3)
points(cbind(1 : length(cumPNL3), cumPNL3)[peakIndex, ],
pch=19, col='green', cex=0.6)
legend(
x='bottomright', inset=0.1,
legend=c('Net Profit','Peaks'),
lty=c(1, NA), pch=c(NA, 19),
col=c('black','green')
)
cumPNL1 has a single peak and R reduces the dimension from a numerical matrix to a numerical vector of length 2. The points function plots the two numerical vector values on the y-axis using the x-axis index 1 and 2:
peakIndex1 <- c(cumPNL1[1] > 0, diff(cummax(cumPNL1)) > 0)
peakIndex3 <- c(cumPNL3[1] > 0, diff(cummax(cumPNL3)) > 0)
str(cbind(1 : length(cumPNL1), cumPNL1)[peakIndex1,])
str(cbind(1 : length(cumPNL3), cumPNL3)[peakIndex3,])
Output:
> str(cbind(1 : length(cumPNL1), cumPNL1)[peakIndex1,])
Named num [1:2] 14 276
- attr(*, "names")= chr [1:2] "" "cumPNL1"
> str(cbind(1 : length(cumPNL3), cumPNL3)[peakIndex3,])
num [1:12, 1:2] 1 15 16 19 20 22 23 24 25 26 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "" "cumPNL3"
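Usually setting drop = FALSE preserves the matrix. Note that the column index has to be left empty for this to work, i.e. cbind(1 : length(cumPNL1), cumPNL1)[peakIndex1, , drop = FALSE]; with the comma missing, as in [peakIndex1, drop = FALSE], R treats the expression as vector indexing and the result is still a plain vector. Alternatively, changing the points line to the following fixes the problem: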
points(seq_along(cumPNL3)[peakIndex], cumPNL3[peakIndex], pch = 19,
col = 'green', cex = 0.6)
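The dimension-dropping behaviour can be reproduced without any plotting. The following is a small sketch using a made-up four-element series (`cumPNL` here is a toy vector, not one of the series from the question) that has exactly one peak:

```r
# Toy series with a single peak at position 3
cumPNL <- c(-193, -345, 276, -858)
peakIndex <- c(cumPNL[1] > 0, diff(cummax(cumPNL)) > 0)

m <- cbind(seq_along(cumPNL), cumPNL)

dropped <- m[peakIndex, ]                # one row selected: collapses to a vector
kept    <- m[peakIndex, , drop = FALSE]  # empty column index + drop = FALSE: stays 1 x 2

str(dropped)
str(kept)
```

When `points()` receives `dropped`, it sees a plain vector of length 2 and plots both numbers as y values at x = 1 and x = 2. `kept` is still a two-column matrix, so the first column is used as x and the second as y.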
Thanks for reporting the issue. I will push the fix to GitHub tomorrow.

r: dprint: size of image of table alteration

I am using the dprint package with knitr, mainly so that I can highlight rows in a table. I have got that working, but the output image leaves a fairly large space for a footnote, which takes up unnecessary space.
Is there a way to get rid of it?
Also since I am fairly new to dprint, if anybody has better ideas/suggestions as to how to highlight tables and make them look pretty without any footnotes... or ways to tidy up my code that would be great!
An example of the Rmd file code is below...
```{r fig.height=10, fig.width=10, dev='jpeg'}
library("dprint")
k <- data.frame(matrix(1:100, 10,10))
CBs <- style(frmt.bdy=frmt(fontfamily="HersheySans"), frmt.tbl=frmt(bty="o", lwd=1),
frmt.col=frmt(fontfamily="HersheySans", bg="khaki", fontface="bold", lwd=2, bty="_"),
frmt.grp=frmt(fontfamily="HersheySans",bg="khaki", fontface="bold"),
frmt.main=frmt(fontfamily="HersheySans", fontface="bold", fontsize=12),
frmt.ftn=frmt(fontfamily="HersheySans"),
justify="right", tbl.buf=0)
x <- dprint(~., data=k,footnote=NA, pg.dim=c(10,10), margins=c(0.2,0.2,0.2,0.2),
style=CBs, row.hl=row.hl(which(k[,1]==5), col='red'),
fit.width=TRUE, fit.height=TRUE,
showmargins=TRUE, newpage=TRUE, main="TABLE TITLE")
```
Thanks in advance!
I haven't used dprint before, but I see a couple of different things that might be causing problems:
The start of your code chunk has defined the image width and height, which dprint seems to be trying to use.
You are setting both fit.height and fit.width. I think only one of those is used (in other words, the resulting image isn't stretched to fit both height and width, but only the one that seems to make most sense, in this case, width).
After tinkering around for a minute, here's what I did that minimizes the footnote. However, I don't know if there is a more efficient way to do this.
```{r dev='jpeg'}
library("dprint")
k <- data.frame(matrix(1:100, 10,10))
CBs <- style(frmt.bdy=frmt(fontfamily="HersheySans"),
frmt.tbl=frmt(bty="o", lwd=1),
frmt.col=frmt(fontfamily="HersheySans", bg="khaki",
fontface="bold", lwd=2, bty="_"),
frmt.grp=frmt(fontfamily="HersheySans",bg="khaki",
fontface="bold"),
frmt.main=frmt(fontfamily="HersheySans", fontface="bold",
fontsize=12),
frmt.ftn=frmt(fontfamily="HersheySans"),
justify="right", tbl.buf=0)
x <- dprint(~., data=k, style=CBs, pg.dim = c(7, 4.5),
showmargins=TRUE, newpage=TRUE,
main="TABLE TITLE", fit.width=TRUE)
```
Update
Playing around to determine the sizes of the images is a total drag. But, if you run the code in R and look at the structure of x, you'll find the following:
str(x)
# List of 3
# $ cord1 : num [1:2] 0.2 6.8
# $ cord2 : Named num [1:2] 3.42 4.78
# ..- attr(*, "names")= chr [1:2] "" ""
# $ pagenum: num 2
Or, simply:
x$cord2
# 3.420247 4.782485
These are the dimensions of your resulting image, and this information can probably easily be plugged into a function to make your plots better.
Good luck!
So here's my solution...with some examples...
I've just copied and pasted my Rmd file to demonstrate how to use it.
You should be able to just copy and paste it into a blank Rmd file and then knit to HTML to see the results...
Ideally what I would have liked would have been to make it all one nice neat function rather than splitting it up into two (i.e. setup.table & print.table) but since chunk options can't be changed mid chunk as suggested by Yihui, it had to be split up into two functions...
`dprint` + `knitr` Examples to create table images
===========
```{r}
library(dprint)
# creating the style object to be used
CBs <- style(frmt.bdy=frmt(fontfamily="HersheySans"),
frmt.tbl=frmt(bty="o", lwd=1),
frmt.col=frmt(fontfamily="HersheySans", bg="khaki",
fontface="bold", lwd=2, bty="_"),
frmt.grp=frmt(fontfamily="HersheySans",bg="khaki",
fontface="bold"),
frmt.main=frmt(fontfamily="HersheySans", fontface="bold",
fontsize=12),
frmt.ftn=frmt(fontfamily="HersheySans"),
justify="right", tbl.buf=0)
# creating a setup function to setup printing a table (will probably put this function into my .Rprofile file)
setup.table <- function(df,width=10, style.obj='CBs'){
require(dprint)
table.style <- get(style.obj)
a <- tbl.struct(~., df)
b <- char.dim(a, style=table.style)
p <- pagelayout(dtype = "rgraphics", pg.dim = NULL, margins = NULL)
f <- size.simp(a[[1]], char.dim.obj=b, loc.y=0, pagelayout=p)
# now to work out the natural table width to height ratio (w.2.h.r) GIVEN the style
w.2.h.r <- as.numeric(f$tbl.width/(f$tbl.height +b$linespace.col+ b$linespace.main))
height <- width/w.2.h.r
table.width <- width
table.height <- height
# Setting chunk options to have right fig dimensions for the next chunk
knitr::opts_chunk$set('fig.width'=as.numeric(width+0.1))
knitr::opts_chunk$set('fig.height'=as.numeric(height+0.1))
# assigning relevant variables to be used when printing
assign("table.width",table.width, envir=.GlobalEnv)
assign("table.height",table.height, envir=.GlobalEnv)
assign("table.style", table.style, envir=.GlobalEnv)
}
# function to print the table (will probably put this function into my .Rprofile file as well)
print.table <- function(df, row.2.hl='2012-04-30', colour='lightblue',...) {
x <-dprint(~., data=df, style=table.style, pg.dim=c(table.width,table.height), ..., newpage=TRUE,fit.width=TRUE, row.hl=row.hl(which(df[,1]==row.2.hl), col=colour))
}
```
```{r}
# Giving it a go!
# Setting up two different size tables
small.df <- data.frame(matrix(1:100, 10,10))
big.df <- data.frame(matrix(1:800,40,20))
```
```{r}
# Using the created setup.table function
setup.table(df=small.df, width=10, style.obj='CBs')
```
```{r}
# Using the print.table function
print.table(small.df,4,'lightblue',main='table title string') # highlighting row 4
```
```{r}
setup.table(big.df,13,'CBs') # now setting up a large table
```
```{r}
print.table(big.df,38,'orange', main='the big table!') # highlighting row 38 in orange
```
```{r}
d <- style() # the default style this time will be used
setup.table(big.df,15,'d')
```
```{r}
print.table(big.df, 23, 'indianred1') # this time highlighting row 23
```

how to calculate correlation with a sliding window?

I have a zoo object called aux with yearly data from 1961 to 2009:
x$nao x[, 2]
1961 -0.03 63.3
1962 0.20 155.9
1963 -2.98 211.0
I want to calculate the correlation between the two columns using a 20 years sliding window. I am trying to use rollapply, but I don't seem to be able to make it work. I tried several different ways of doing it but always without success...
> rollapply(aux,20, cor(aux[,1],aux[,2],method="pearson"))
Error in match.fun(FUN) : 'cor(aux[, 1], aux[, 2], method = "pearson")' is not a function, character or symbol
> rollapply(aux,20, cor,method="pearson")
Error in FUN(coredata(data)[posns], ...) : supply both 'x' and 'y' or a matrix-like 'x'
> rollapply(aux,20, cor)
Error in FUN(coredata(data)[posns], ...) : supply both 'x' and 'y' or a matrix-like 'x'
Can anybody tell me how to make rollapply work?
Thanks for helping!
Try this.
library(quantmod)
library(TTR)
#Set the seed so results can be duplicated
set.seed(123)
#Build a zoo object with typical price data
var1 <- zoo(cumprod(1+rnorm(50, 0.01, 0.05)), seq(1961, 2010, 1))
var2 <- zoo(cumprod(1+rnorm(50, 0.015, 0.1)), seq(1961, 2010, 1))
dat <- merge(var1=var1, var2=var2)
plot(dat)
grid()
#Calculate the percent returns for the two prices
del1 <- Delt(dat$var1)
del2 <- Delt(dat$var2)
dat <- merge(dat, del1=del1, del2=del2)
dimnames(dat)[[2]][3] <- "del1"
dimnames(dat)[[2]][4] <- "del2"
head(dat)
plot(dat)
#Calculate the correlation between the two returns using a 5 year sliding window
delcor <- runCor(dat$del1, dat$del2, n=5, use="all.obs", sample=TRUE, cumulative=FALSE)
dat <- merge(dat, delcor)
plot(dat$delcor, type="l", main="Sliding Window Correlation of Two Return Series", xlab="", col="red")
grid()
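For the 20-year window asked about in the question, the rolling correlation can also be sketched in base R without extra packages. The data below is simulated, and `roll_cor` is a hypothetical helper name. With zoo, the likely equivalent is `rollapply(aux, 20, function(w) cor(w[, 1], w[, 2]), by.column = FALSE, align = "right")`. The original calls failed because the first passed a value rather than a function, and the others applied `cor` to one column at a time.

```r
set.seed(123)

# Simulated stand-in for the yearly two-column series in the question
years <- 1961:2009
aux <- data.frame(nao = rnorm(length(years)), rain = rnorm(length(years)))

# Hypothetical helper: Pearson correlation over a right-aligned sliding window
roll_cor <- function(x, y, width) {
  ends <- width:length(x)               # index of the last year in each window
  vapply(ends, function(i) {
    idx <- (i - width + 1):i            # the `width` observations ending at i
    cor(x[idx], y[idx], method = "pearson")
  }, numeric(1))
}

width <- 20
rc <- roll_cor(aux$nao, aux$rain, width)
names(rc) <- years[width:length(years)]  # label each value by its window's end year

head(rc)
```

With 49 years and a 20-year window this yields 30 correlations, the first one covering 1961 to 1980.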
