stargazer: line break in F Statistic / df

When creating a table with stargazer, I would like to add a line break before the degrees of freedom in the F Statistic row (i.e., before the opening parenthesis). Could someone help me with the correct call? I couldn't find it in the package documentation. (Apologies for not providing reproducible code; I don't know how to simulate a regression with fake data. I hope someone can still help me!)

As far as I know, there is no built-in option to show the F statistics and the degrees of freedom on separate lines, so you have to post-process the output of stargazer(). The user-defined function show_F_in_two_lines() in this answer produces a table with the F statistics on one row and the df values on the row directly below them.
library(stargazer)
library(stringr)
show_F_in_two_lines <- function(stargazer) {
  # If you remove `capture.output()`, not only the modified LaTeX code
  # but also the original code would show up
  stargazer <- stargazer |>
    capture.output()
  # Index of the row in which the F statistics are displayed
  position_F <- str_which(stargazer, "F Statistic")
  # Keep only the F statistics by deleting the "(df = ...; ...)" groups
  Fs <- stargazer[position_F] |>
    str_replace_all("\\(.*?\\)", "")
  # Extract only the df values and put them on a new row
  # (empty first cell, then one cell per model)
  dfs <- stargazer[position_F] |>
    str_extract_all("\\(.*?\\)") |>
    unlist() |>
    (\(dfs) paste0(" & ", dfs, collapse = ""))() |>
    paste0(" \\\\")
  # Rows that come after the F-statistic row
  after_Fs <- stargazer[-seq_len(position_F)]
  c(
    stargazer[seq_len(position_F - 1)],
    Fs,
    dfs,
    after_Fs
  ) |>
    cat(sep = "\n")
}
# lm.out.1 ... lm.out.5 are the fitted lm() models from the question
stargazer(
  header = FALSE,
  lm.out.1,
  lm.out.2,
  lm.out.3,
  lm.out.4,
  lm.out.5
) |>
  show_F_in_two_lines()
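The row surgery above is ordinary regex work, so the idea can be checked outside R. Below is a minimal Python sketch of the same two steps: delete the parenthesised df groups to get the F row, then collect them into a new row with an empty first cell. The sample row is made up in the shape stargazer's LaTeX output is assumed to have, and split_f_row is a hypothetical helper, not part of stargazer.

```python
import re
from typing import List


def split_f_row(row: str) -> List[str]:
    """Split one LaTeX row so the '(df = ...)' groups move to a second row."""
    # Same non-greedy pattern as the R code: remove every (...) group
    f_row = re.sub(r"\(.*?\)", "", row)
    # Collect the (...) groups in order, one per model column
    dfs = re.findall(r"\(.*?\)", row)
    # Empty first cell, then one df cell per model, then the LaTeX row end
    df_row = "".join(" & " + d for d in dfs) + " \\\\"
    return [f_row, df_row]


row = r"F Statistic & 21.9$^{***}$ (df = 1; 30) & 9.8$^{***}$ (df = 2; 29) \\"
f_row, df_row = split_f_row(row)
print(f_row)
print(df_row)
```

As in the R version, the pattern assumes no other parentheses appear in that row.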


Performing a calculation on several data frames with a for loop

I have a group of data frames, and I want to write a for loop that performs a calculation on all of them without my having to enter each data frame's name manually.
Example:
df1
df2
df3
#first I try to create a list of the dataframe names to iterate through
dflist <- list(c(df1, df2, df3))
Then I attempt to iterate through it including the calculation. Simplified version here:
for (i in 1:length(dflist)) {
  x <- dflist[i]$columnone[1] %>%
  y <- dflist[i]$columntwo[1] %>%
  z <- mean(dflist[i]$columnthree) %>%
  paste0("result_",i) <- x-y/z
}
I keep being told that z cannot be found.
What am I doing wrong?
(The paste0 line at the end is meant to store the result for each data frame as its own new variable, but it is not the focus of the question.)
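The usual shape for this task is to keep the data frames themselves in one list and to collect the results in another container, instead of creating result_1, result_2, ... as separate variables. A language-neutral sketch in Python, with each "data frame" modelled as a dict of column lists; columnone, columntwo and columnthree are the hypothetical column names from the question, and the numbers are made up:

```python
from statistics import mean

# Each "data frame" is modelled as a dict: column name -> list of values
df1 = {"columnone": [10, 1], "columntwo": [4, 2], "columnthree": [1, 3]}
df2 = {"columnone": [7, 0], "columntwo": [6, 5], "columnthree": [2, 4]}
df3 = {"columnone": [3, 9], "columntwo": [8, 1], "columnthree": [4, 4]}

# Keep the data frames themselves (not their merged columns) in one list
dflist = [df1, df2, df3]

# Store each result in a dict keyed by name instead of creating variables
results = {}
for i, df in enumerate(dflist, start=1):
    x = df["columnone"][0]        # first value of columnone
    y = df["columntwo"][0]        # first value of columntwo
    z = mean(df["columnthree"])   # mean of columnthree
    # Same expression as in the question: x - (y / z)
    results[f"result_{i}"] = x - y / z

print(results)
```

In R terms this corresponds to building the list with list(df1, df2, df3) and taking each element with dflist[[i]].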

Input f into play3d() and movie3d() in the rgl package in R

I don't understand the input f expected by play3d and movie3d in the rgl package.
library(rgl)
nobs<-10
x<-runif(nobs)
y<-runif(nobs)
z<-runif(nobs)
n<-rep(1:nobs)
df<-as.data.frame(cbind(x,y,z,n))
listofobs<-split(df,n)
plot3d(df[,1],df[,2],df[,3], type = "n", radius = .2 )
myplotfunction<-function(x) {
  rgl.spheres(x=x$x,y=x$y,z=x$z, type="s", r=0.025)
}
When I execute the two lines below, the animation does play, but both play3d() and movie3d() then trigger an error:
play3d(f=lapply(listofobs,myplotfunction), fps=1 )
movie3d(f=lapply(listofobs,myplotfunction), fps=1 , duration=20)
I am hoping someone can correct my code and help me understand the f input to play3d and movie3d.
Question 1: Why is the play3d line above correct enough that the animation does display correctly?
Question 2: Why is the play3d line above incorrect enough that it triggers the error?
Question 3: What is wrong with the movie3d line that it does not produce a video output?
As the docs say, f is "A function returning a list that may be passed to par3d". Your code passes a list (the value returned by lapply()), not a function.
To answer the questions:
1. R evaluates the lapply() call first, and that call draws the spheres one by one, which is the animation you see. play3d() then looks at the result and dies because it's not a function.
2. f needs to be a function, as described in the help page.
3. movie3d() dies for the same reason: when it looks at f, it finds something that is not a function.
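To see why f must be a function rather than a precomputed result, here is a toy Python model of the calling convention the help page describes: the player calls f(time) once per frame and uses what it returns. The play function, its arguments and its frame-time arithmetic are hypothetical simplifications, not rgl's implementation.

```python
from typing import Callable, Dict, List


def play(f: Callable[[float], Dict[str, object]], start_time: float,
         duration: float, fps: float) -> List[Dict[str, object]]:
    """Toy player: call f once per frame time and collect what it returns."""
    if not callable(f):
        # An already-evaluated result (e.g. a list) cannot be called per frame
        raise TypeError("f must be a function of time")
    n_frames = int(duration * fps) + 1
    returned = []
    for k in range(n_frames):
        t = start_time + k / fps
        returned.append(f(t))  # the callback decides what to draw at time t
    return returned


drawn = []


def show_point(time: float) -> Dict[str, object]:
    # Mirrors the callback in the answer: pick the point for this time step
    index = round(time)
    drawn.append(index)
    return {}  # nothing for the player to change, like list() in R


play(show_point, start_time=1, duration=9, fps=1)
print(drawn)
```

The drawing happens inside the callback, one step per call, which is exactly what an eagerly evaluated lapply() cannot provide.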
This looks like it will do what you want:
library(rgl)
nobs<-10
x<-runif(nobs)
y<-runif(nobs)
z<-runif(nobs)
df<-data.frame(x,y,z)
plot3d(df, type = "n" )
id <- NA
myplotfunction<-function(time) {
  index <- round(time)
  # For a 3x faster display, use index <- round(3*time)
  # To cycle through the points several times, use
  # index <- round(3*time) %% nobs + 1
  if (!is.na(id))
    pop3d(id = id) # Delete previous item
  id <<- spheres3d(df[index,], r=0.025)
  list()
}
play3d(myplotfunction, startTime = 1, duration = nobs - 1)
movie3d(myplotfunction, startTime = 1, duration = nobs - 1, fps = 1)
This will leave a GIF in file.path(tempdir(), "movie.gif").
Some other notes:
Don't call rgl.spheres(). It will cause you immense pain later. Use spheres3d() instead; otherwise, never call any *3d function and never upgrade rgl, since you're living in the past with the rgl.* functions. The *3d functions and the rgl.* functions don't play nicely together.
To construct a data frame, just use the data.frame() function; don't convert a matrix.
You don't need all those contortions to extract points from the data frame. Most rgl functions can handle a data frame with x, y, and z columns.
You might notice the plot3d frame move a little: spheres are bigger than points, so the frame adjusts to accommodate them. If you don't like this, you could use xlim, ylim and zlim to make the original frame a little bigger.

For Loop in Shiny Server: How to Not Overwrite Values with Each ActionButton Press?

I am trying to create an app in which part of the UI displays a word cloud generated from words/strings entered by the user. To do this, I pass the input to a for loop that is supposed to store every input in an initially empty vector with every press of the action button. However, I am running into two problems: no word cloud is displayed (and no error is shown), and the for loop overwrites the vector each time the button is pressed, so it only ever contains one word instead of gradually accumulating more. I suspect nothing is displayed because there is only one word, and it seems that wordcloud2 needs at least two words to draw anything. How can I get the for loop to work as intended in Shiny?
library(shiny)
library(stringr)
library(stringi)
library(wordcloud2)
ui <- fluidPage(
  titlePanel("Strings Sim"),
  sidebarLayout(
    sidebarPanel(
      textInput("string.input", "Create a string:", placeholder = "string <-"),
      actionButton("go1", "GO!")
    ),
    mainPanel(
      textOutput("dummy"),
      wordcloud2Output("the.cloud")
    )
  )
)
server <- function(input, output, session) {
  observeEvent(input$go1, {
    high.strung <- as.vector(input$string.input)
    empty.words <- NULL
    for (i in high.strung) {
      empty.words <- c(empty.words, i)
    }
    word.vector <- matrix(empty.words, nrow = length(empty.words), ncol = 1)
    num.vector <- matrix(sample(1000), nrow = length(empty.words), ncol = 1)
    prelim <- cbind(word.vector, num.vector)
    prelim.data <- as.data.frame(prelim)
    prelim.data$V2 <- as.numeric(as.character(prelim.data$V2))
    output$the.cloud <- renderWordcloud2(
      wordcloud2(prelim.data)
    )
    print(empty.words)
  })
}
shinyApp(ui=ui,server=server)
The operation works as intended when I run it without Shiny: I use a fixed string in place of the input, run the for loop a few times to build the data frame used by wordcloud2, and get the word cloud I am after.
Functional code without Shiny:
empty.words <- NULL
#Rerun below here to populate vector with more words and regenerate wordcloud
high.strung <- as.vector("gumbo")
for (i in high.strung) {
  empty.words <- c(empty.words, i)
}
word.vector <-matrix(empty.words, nrow = length(empty.words),ncol=1)
num.vector <- matrix(sample(1000), nrow=length(empty.words),ncol=1)
prelim <- cbind(word.vector, num.vector)
prelim.data <- as.data.frame(prelim)
prelim.data$V2 <- as.numeric(as.character(prelim.data$V2))
str(prelim.data)
wordcloud2(prelim.data)
Any help is much appreciated!
Edit: With each press of the button, the entered word(s) should be added to the data frame that builds the cloud, gradually making it larger. The random numbers that determine the word sizes don't have to stay the same with each press, but every entered word should be preserved in a vector.
Your app is missing reactivity, so it is worth reading up on that concept. With the version below, you can enter strings, and as soon as at least two words are in the data frame, the word cloud is rendered. If you don't want multi-word strings to be split, just take out the str_split() call.
library(shiny)
library(stringr)
library(stringi)
library(wordcloud2)
ui <- fluidPage(
  titlePanel("Strings Sim"),
  sidebarLayout(
    sidebarPanel(
      textInput("string.input", "Create a string:", placeholder = "string <-"),
      actionButton("go1", "GO!")
    ),
    mainPanel(
      textOutput("dummy"),
      wordcloud2Output("the.cloud")
    )
  )
)
server <- function(input, output, session) {
  # Reactive state that survives between button presses
  rv <- reactiveValues(high.strung = NULL)
  observeEvent(input$go1, {
    # Append the new word(s) instead of starting from an empty vector
    rv$high.strung <- c(rv$high.strung, str_split(c(input$string.input), pattern = " ") %>% unlist)
  })
  # Rebuilt automatically whenever rv$high.strung changes
  prelim.data <- reactive({
    data.frame(
      word.vector = rv$high.strung,
      num.vector = sample(1000, length(rv$high.strung), replace = TRUE)
    )
  })
  output$the.cloud <- renderWordcloud2(
    if (length(rv$high.strung) > 0)
      wordcloud2(prelim.data())
  )
}
shinyApp(ui=ui,server=server)
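What fixes the overwriting is that the word vector lives outside the event handler, so each press appends to it instead of starting again from NULL. Here is a framework-free Python sketch of that idea. The WordStore class and its method names are hypothetical stand-ins for reactiveValues(), observeEvent() and the prelim.data reactive, not Shiny APIs. Unlike str_split(), it also drops empty pieces produced by repeated spaces.

```python
import random
from typing import List, Tuple


class WordStore:
    """Hypothetical stand-in for reactiveValues(): state that outlives each event."""

    def __init__(self) -> None:
        self.words: List[str] = []

    def on_go(self, text: str) -> None:
        # Like observeEvent(input$go1, ...): split on spaces and append
        self.words.extend(w for w in text.split(" ") if w)

    def cloud_data(self, seed: int = 0) -> List[Tuple[str, int]]:
        # Like the prelim.data reactive: one random size per stored word
        rng = random.Random(seed)
        return [(w, rng.randint(1, 1000)) for w in self.words]


store = WordStore()
store.on_go("gumbo")
store.on_go("jambalaya etouffee")
print(store.words)
```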

efficiently reading a large file into a Map

I'm trying to write code to perform the following simple task in Haskell: looking up the etymologies of words using this dictionary, stored as a large tsv file (http://www1.icsi.berkeley.edu/~demelo/etymwn/). I thought I'd parse (with attoparsec) the tsv file into a Map, which I could then use to look up etymologies efficiently, as required (and do some other stuff with).
This was my code:
{-# LANGUAGE OverloadedStrings #-}
import Control.Arrow
import qualified Data.Map as M
import Control.Applicative
import qualified Data.Text as DT
import qualified Data.Text.Lazy.IO as DTLIO
import qualified Data.Text.Lazy as DTL
import qualified Data.Attoparsec.Text.Lazy as ATL
import Data.Monoid
text = do
  x <- DTLIO.readFile "../../../../etymwn.tsv"
  return $ DTL.take 10000 x
--parsers
wordpair = do
  x <- ATL.takeTill (== ':')
  ATL.char ':' *> (ATL.many' $ ATL.char ' ')
  y <- ATL.takeTill (\x -> x `elem` ['\t','\n'])
  ATL.char '\n' <|> ATL.char '\t'
  return (x,y)
--line of file
line = do
  a <- (ATL.count 3 wordpair)
  case (rel (a !! 2)) of
    True -> return . (\[a,b,c] -> [(a,c)]) $ a
    False -> return . (\[a,b,c] -> [(c,a)]) $ a
  where rel x = if x == ("rel","etymological_origin_of") then False else True
tsv = do
  x <- ATL.many1 line
  return $ fmap M.fromList x
main = (putStrLn . show . ATL.parse tsv) =<< text
It works for small amounts of input, but quickly becomes too inefficient. I'm not quite clear on where the problem is, and I soon realized that even trivial tasks, like viewing the last character of the file, were taking too long, e.g. with:
foo = fmap DTL.last $ DTLIO.readFile "../../../../etymwn.tsv"
So my questions are: what are the main things that I'm doing wrong, in terms of approach and execution? Any tips for more Haskelly/better code?
Thanks,
Reuben
Note that the file you want to load has 6 million lines and the text you are interested in storing comprises approx. 120 MB.
Lower Bounds
To establish some lower bounds, I first created another .tsv file containing the preprocessed contents of the etymwn.tsv file. I then timed how long it took this Perl program to read that file:
my %H;
while (<>) {
    chomp;
    my ($a,$b) = split("\t", $_, 2);
    $H{$a} = $b;
}
This took approx. 17 secs., so I would expect any Haskell program to take about that amount of time.
If this start-up time is unacceptable, consider the following options:
1. Work in ghci and use the "live reloading" technique to save the map using the Foreign.Store package, so that it persists through ghci code reloads. That way you only have to load the map data once as you iterate on your code.
2. Use a persistent key-value store (such as sqlite, gdbm, BerkeleyDB).
3. Access the data through a client-server store.
4. Reduce the number of key-value pairs you store (do you need all 6 million?).
Option 1 is discussed in this blog post by Chris Done: "Reload Running Code in GHCI".
Options 2 and 3 will require you to work in the IO monad.
Parsing
First of all, check the type of your tsv function:
  tsv :: Data.Attoparsec.Internal.Types.Parser
           DT.Text [M.Map (DT.Text, DT.Text) (DT.Text, DT.Text)]
You are returning a list of maps instead of just one map. This doesn't look right.
Secondly, as @chi suggested, I doubt that parsing with attoparsec is lazy. In particular, it has to verify that the entire parse succeeds, so I can't see how it can avoid creating all of the parsed lines before returning.
To truly parse the input lazily, take the following approach:
toPair :: DTL.Text -> (Key, Value)
toPair input = ...  -- parse a single line into a key/value pair
main = do
  all_lines <- fmap DTL.lines DTLIO.getContents
  let m = M.fromList $ map toPair all_lines
  print $ M.lookup "foobar" m
You can still use attoparsec to implement toPair, but you'll be using it on a line-by-line basis instead of on the entire input.
ByteString vs. Text
In my experience, working with ByteStrings is much faster than working with Text. This version of toPair for ByteStrings is about 4 times faster than the corresponding version for Text:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.Attoparsec.ByteString.Lazy as AL
-- attoparsec's takeWhile yields strict ByteStrings, hence the result type
toPair :: L.ByteString -> (B.ByteString, B.ByteString)
toPair bs =
  case AL.maybeResult (AL.parse parseLine bs) of
    Nothing -> error "bad line"
    Just (a,b) -> (a,b)
  where parseLine = do
          A.skipWhile (/= ' ')           -- skip the language prefix, e.g. "eng:"
          A.skipWhile (== ' ')
          a <- A.takeWhile (/= '\t')     -- first word
          A.skipWhile (== '\t')
          rel <- A.takeWhile (/= '\t')   -- relation name
          A.skipWhile (== '\t')
          A.skipWhile (/= ' ')
          A.skipWhile (== ' ')
          c <- A.takeWhile (const True)  -- second word: rest of the line
          if rel == "rel:etymological_origin_of"
            then return (c,a)
            else return (a,c)
Or, just use plain ByteString functions:
fields :: L.ByteString -> [L.ByteString]
fields = L.splitWith (== '\t')
-- Drop the language prefix (e.g. "eng: ") from a field
snipSpace :: L.ByteString -> L.ByteString
snipSpace = L.dropWhile (== ' ') . L.dropWhile (/= ' ')
toPair'' :: L.ByteString -> (L.ByteString, L.ByteString)
toPair'' bs =
  case fields bs of
    (x:y:z:_) ->
      let a = snipSpace x
          c = snipSpace z
      in if y == "rel:etymological_origin_of"
           then (c,a)
           else (a,c)
    _ -> error "bad line"
Most of the time spent loading the map goes into parsing the lines. With ByteStrings it takes about 14 secs. to load all 6 million lines, vs. 50 secs. with Text.
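The line-by-line strategy does not depend on Haskell. Here is a small Python sketch of the same per-line logic as toPair'' above: split on tabs, drop the language prefix, and swap the pair for rel:etymological_origin_of. It builds a dict from an iterator of lines, so only the map is retained, not a list of parsed lines. The sample lines are made up in the format the answer's parsers assume, and the helper names are illustrative.

```python
from typing import Dict, Iterable, Tuple

ORIGIN_OF = "rel:etymological_origin_of"


def snip_space(field: str) -> str:
    # Drop everything up to the first space (the language prefix), then the spaces
    _, _, rest = field.partition(" ")
    return rest.lstrip(" ")


def to_pair(line: str) -> Tuple[str, str]:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("bad line")
    a, rel, c = snip_space(fields[0]), fields[1], snip_space(fields[2])
    # Same orientation rule as the answer: swap for etymological_origin_of
    return (c, a) if rel == ORIGIN_OF else (a, c)


def build_map(lines: Iterable[str]) -> Dict[str, str]:
    # Consumes lines one at a time; a later duplicate key overwrites an
    # earlier one, as with Data.Map.fromList
    return dict(to_pair(line) for line in lines)


sample = [
    "eng: house\trel:etymology\tenm: hous\n",
    "lat: verbum\trel:etymological_origin_of\tfra: verbe\n",
]
m = build_map(sample)
print(m)
```

For the real file, passing an open file object (for example `build_map(open(path, encoding="utf-8"))`) streams it line by line in the same way.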
To add to this answer, I'd like to note that attoparsec actually has very good support for "pull-based" incremental parsing. You can use this directly with the convenient parseWith function. For even finer control, you can feed the parser by hand with parse and feed. If you don't want to worry about any of this, you should be able to use something like pipes-attoparsec, but personally I find pipes a bit hard to understand.
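To illustrate what feeding a parser by hand means in general, here is a small Python analogue. It is not attoparsec's API. The parser object is fed arbitrary chunks of input and returns complete lines as soon as they are available, holding any partial line until more input arrives. The LineFeeder class and its methods are hypothetical.

```python
from typing import List


class LineFeeder:
    """Hypothetical incremental parser: feed chunks, get back finished lines."""

    def __init__(self) -> None:
        self._pending = ""  # partial line carried over between chunks

    def feed(self, chunk: str) -> List[str]:
        data = self._pending + chunk
        # Everything before the last newline is complete; the tail waits
        *complete, self._pending = data.split("\n")
        return complete

    def finish(self) -> List[str]:
        # End of input: whatever is left is the final, unterminated line
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


feeder = LineFeeder()
out = []
for chunk in ["eng: ho", "use\trel:ety", "mology\tenm: hous\nlat: ve", "rbum"]:
    out.extend(feeder.feed(chunk))
out.extend(feeder.finish())
print(out)
```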

Erlang upper case and lower case sort

My question is about comparing upper- and lower-case letters: how can I do that in my sort function? Any ideas?
Example input file: "I am Happy!"
Output file:
Happy!
I
am
That's what my program produces, but I would like to have:
am
I
Happy
My code:
-module(wp).
-compile([export_all]).
sortFile(File1,File2) ->
    % Read the whole input file
    {ok, File_Read} = file:read_file(File1),
    % Split it into a list of words
    Liste = string:tokens(binary_to_list(File_Read), "\n "),
    % Insertion-sort the list
    Sort_List = isort(Liste),
    ISort = string:join(Sort_List,"\n"),
    % Write the result to the output file
    {ok, File_Write} = file:open(File2, [write]),
    file:write(File_Write, ISort),
    file:close(File_Write).
isort([]) -> [];
isort([X|XS]) -> insert(X, isort(XS)).
insert(Elem, []) -> [Elem];
insert(Elem, [X|XS]) when Elem =< X -> [Elem,X|XS];
insert(Elem, [X|XS]) -> [X|insert(Elem,XS)].
How about something like this:
qsort1([]) -> [];
qsort1([H|T]) ->
    qsort1([X || X <- T, string:to_lower(X) < string:to_lower(H)])
    ++ [H]
    ++ qsort1([X || X <- T, string:to_lower(X) >= string:to_lower(H)]).
7> qsort1(["I", "am","Happy"]).
["am","Happy","I"]
I believe that "happy" sorts before "i":
8> "happy" < "i".
true
which is why my sorted order is a little different from the one in your original post.
Since sorting takes at least on the order of N*log2(N) comparisons, there is no need to perform a case transformation in every comparison; N case transformations, one per element, are enough. (Almost all Perl developers know this trick.)
{ok, Bin} = file:read_file(?INPUT_FILE),
Toks = string:tokens(binary_to_list(Bin),"\n "),
Result = [[X,$\n] || {_,X} <- lists:sort([{string:to_lower(X), X} || X<-Toks])],
file:write_file(?OUTPUT_FILE, Result).
By the way, lists:sort/1 is a merge sort with guaranteed N*log2(N) behaviour and is quite efficient, in contrast to the concise but less efficient quicksort implementation. What's worse, quicksort has an N^2 worst case.
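The "only N case transformations" point is the decorate-sort-undecorate idiom. A small Python sketch makes the difference measurable: a key function runs once per element, while a comparator lower-cases both arguments on every comparison. The counters and helper names are illustrative only.

```python
from functools import cmp_to_key

calls = {"key": 0, "cmp": 0}


def lower_key(word: str) -> str:
    # Decorate: exactly one case conversion per element
    calls["key"] += 1
    return word.lower()


def lower_cmp(a: str, b: str) -> int:
    # Comparator: two case conversions on every comparison
    calls["cmp"] += 2
    la, lb = a.lower(), b.lower()
    return (la > lb) - (la < lb)


words = "the quick Brown fox Jumps over the Lazy dog I am Happy".split()

by_key = sorted(words, key=lower_key)
by_cmp = sorted(words, key=cmp_to_key(lower_cmp))

print(by_key)
print(calls)
```

Both sorts are stable and compare the same lower-cased keys, so they return the same order; only the number of conversions differs.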
Depending on whether you are on Windows or Unix/Linux, the lines in the files will end with different characters. Let's go with Windows, where it's normally \r\n. Assuming the input files are not too big, we can read them into a binary all at once. The data must then be split into lines, and each line split into words (on spaces). If the input file is very big and cannot fit in memory, you will have to read it line by line, in which case you might need an in-memory buffer to hold all the words ready for sorting; this would require an ETS table or Memcached (an option I won't illustrate here). Let's write the code:
-module(sick_sort).
-compile(export_all).
-define(INPUT_FILE,"C:/SICK_SORT/input.txt").
-define(OUTPUT_FILE_PATH,"C:/SICK_SORT/").
-define(OUTPUT_FILENAME,"output.txt").
start() ->
    case file:read_file(?INPUT_FILE) of
        {ok,Binary} ->
            %% input file read: split into lines first
            AllLines = string:tokens(binary_to_list(Binary),"\r\n"),
            %% then split every line into words and concatenate the results
            Words = lists:append([string:tokens(Line," ") || Line <- AllLines]),
            SortedText = lists:flatten([XX ++ "\r\n" || XX <- lists:sort(Words)]),
            EndFile = filename:join(?OUTPUT_FILE_PATH,?OUTPUT_FILENAME),
            file:write_file(EndFile,SortedText),
            ok;
        Error -> {error,Error}
    end.
That should work. Change the macros in the source file to suit your settings and then, just run sick_sort:start().
You have to compare the lower-case versions in your sort function:
(nitrogen@127.0.0.1)25> F= fun(X,Y) -> string:to_lower(X) < string:to_lower(Y) end.
#Fun<erl_eval.12.111823515>
(nitrogen@127.0.0.1)26> lists:sort(F,["I","am","Happy"]).
["am","Happy","I"]
(nitrogen@127.0.0.1)27>
EDIT:
In your code, the operators > and < are what decide the sort order (if you want duplicate strings to be kept, one of them should include =; otherwise you effectively get a usort). If you want to use a different comparison, you can define it in a normal or anonymous function and then use it in the quicksort:
%% true when X sorts before Y, ignoring case
mycompare(X,Y) ->
    string:to_lower(X) < string:to_lower(Y).
quicksort([]) -> [];
quicksort([X|XS]) ->
    quicksort([Y || Y <- XS, mycompare(Y,X)])
    ++ [X]
    ++ quicksort([Y || Y <- XS, not mycompare(Y,X)]).
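Passing the comparison into a hand-written sort works the same way in other languages. Here is a Python sketch of a list-comprehension quicksort that takes a hypothetical `before(a, b)` predicate, mirroring the structure of the Erlang quicksort. It is illustrative only; in practice a library sort with a key, like lists:sort, is the better choice.

```python
from typing import Callable, List


def quicksort(items: List[str], before: Callable[[str, str], bool]) -> List[str]:
    """Quicksort where before(a, b) is True when a should come before b."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    left = [y for y in rest if before(y, pivot)]       # strictly before the pivot
    right = [y for y in rest if not before(y, pivot)]  # everything else, incl. ties
    return quicksort(left, before) + [pivot] + quicksort(right, before)


def case_insensitive_before(a: str, b: str) -> bool:
    # Compare lower-cased copies, like string:to_lower/1 in the Erlang code
    return a.lower() < b.lower()


print(quicksort(["I", "am", "Happy"], case_insensitive_before))
```

Because ties go to the right-hand partition, strings that differ only in case (such as "b" and "B") are both kept rather than collapsed.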
