copy stargazer regression output to excel: values in () become - - stargazer

I choose type ="html" and then copy and paste the output to excel.
All work perfectly except the () in surround a value transfers the value into a negative one.
For example, (2.451) that presents the standard errors becomes -2.451
Is there a way to fix this? Thanks!

Excel interprets the parentheses around a number as an indication it should be negative. This is inconvenient for our standard errors and p-value reporting from stargazer, because (0.04) is not meant to be -0.04.
To work around this, you could:
Select all cells in your spreadsheet
Change the data type of all of the cells to "Text", so that they are not interpreted as numbers. (See screenshot below)
Then you can paste parenthesised numbers in, and they are treated as text (see below).
If you still need to work with some numbers as numbers, you can set those cells to "Number".

Related

How to separate numbers from text using SPLIT/LEFT/RIGHT function in Google Sheets

I have a google sheets documents with data in this format:
Some data 10:5 Somemore Data
I am trying to separate the text from the numbers in separate columns based on the colon sign so that the output looks like this:
Some data | 10 | 5 | Somemore Data
I tried the SPLIT and RIGHT/LEFT functions but I can't get it to work.
This is what I have so far
=LEFT(C2,FIND(":",C2)-3)
This separates the text on the LEFT but using it on the right side doesn't work. My formula also doesn't separate the numbers. Looking for a formula that can achieve the above desired result.
My spreadsheet - https://docs.google.com/spreadsheets/d/1EmL4kzCGxRbwvNJntwMokqgt8yjjAqnZuUidTbZe6Z8/edit?usp=sharing
Thanks.
There is already a solution in your shared sheet with SPLIT and REGEXREPLACE.
Here is one a bit simpler with REGEXEXTRACT:
=ARRAYFORMULA(IF(A2:A="", "", REGEXEXTRACT(A2:A,"^(.+?)[ ]+(\d+)[ ]*:[ ]*(\d+)[ ]+(.+)$")))
Every group will be a cell in a row to the right.
Regex description and demo: link.
Edit: stripped spaces. You have a nasty chars in your strings - nonbreaking space bar which is indistinguishable from the regular space. Could not understand why a simpler regex (^(.+?)\s+(\d+)\s*:\s*(\d+)\s+(.+)$) did not work. All because of this nbsp (char 160). Thus [ ] (nbsp and a regular space) instead of just \s.

How can I refactor an existing source code file to normalize all use of tab?

Sometimes I find myself editing a C source file which sees both use of tab as four spaces, and regular tab.
Is there any tool that attempts to parse the file and "normalize" this, i.e. convert all occurrences of four spaces to regular tab, or all occurrences of tab to four spaces, to keep it consistent?
I assume something like this can be done even with just a simple vim one-liner?
There's :retab and :retab! which can help, but there are caveats.
It's easier if you're using spaces for indentation, then just set 'expandtab' and execute :retab, then all your tabs will be converted to spaces at the appropriate tab stops (which default to 8.) That's easy and there are no traps in this method!
If you want to use 4 space indentation, then keep 'expandtab' enabled and set 'softtabstop' to 4. (Avoid modifying the 'tabstop' option, it should always stay at 8.)
If you want to do the inverse and convert to tabs instead, you could set 'noexpandtab' and then use :retab! (which will also look at sequences of spaces and try to convert them back to tabs.) The main problem with this approach is that it won't just consider indentation for conversion, but also sequences of spaces in the middle of lines, which can cause the operation to affect strings inside your code, which would be highly undesirable.
Perhaps a better approach for replacing spaces with tabs for indentation is to use the following substitute command:
:%s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)#
Yeah it's a mouthful... It's matching whitespace at the beginning of the lines, then using the indent() function to find the total indentation (that function calculates indentation taking tab stops in consideration), then dividing that by the 'tabstop' to decide how many tabs and how many spaces a specific line needs.
If this command works for you, you might want to consider adding a mapping or :command for it, to keep it handy. For example:
command! -range=% Retab <line1>,<line2>s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)
This also allows you to "Retab" a range of the file, including one you select with a visual selection.
Finally, one last alternative to :retab is that to ask Vim to "reformat" your code completely, using the = command, which will use the current 'indentexpr' or other indentation configurations such as 'cindent' to completely reindent the block. That typically respects your 'noexpandtab' and 'smarttabstop' options, so it use tabs and spaces for indentation consistently. The downside of this approach is that it will completely reformat your code, including changing indentation in places. The upside is that it typically has a semantic understanding of the language and will be able to take that in consideration when reindenting the code block.

Recoding variable in SPSS with syntax

I have a dataset with 1500+ codes for medical treatments (Verrichtingcode). They are coded such as 337409A, 339830E or 336690. This is a string variable. I want to use the syntax to change those into, let's say: laparoscopic abdominal surgery. I have the translation for all these codes standing by.
If I use the syntax:
VALUE LABELS
Verrichtingcode
339985B 'Sedation'.
EXECUTE.
What comes out is: 339985 = "B 'Sedation'"
Which doesn't work.
I then tried to Recode >
RECODE Verrichtingcode
(339985B= AA339985BBB).
EXECUTE.
RECODE Verrichtingcode
(339985B= AA339985BBB).
EXECUTE.
This works fine, until you get to a code with an E at the end.
RECODE Verrichtingcode
(336070D= AA336070DBB)
(333698E= AA333698EBB).
EXECUTE.
RECODE Verrichtingcode
(336070D= AA336070DBB)
(333698E= AA333698EBB).
EXECUTE.
What I get is:
>Warning # 203 in column 2. Text: 333698E
>An 'E', beginning the exponent portion of a number, was not followed by any
>digits.
>The symbol will be treated as an invalid special character.
>Error # 4654 in column 2. Text: 339993
>The RECODE command attempts to test a string variable for having a numeric
>value. Note that LOWEST, HIGHEST, and SYSMIS are considered to be numeric
>values.
>Execution of this command stops.
EXECUTE.
I could off course do it all by hand in the Variable view, but with 1500+ procedures it's gonna take some time ;)
If any of you would be so kind to help me, It would be really appreciated. If you need more info, I would be happy to deliver.
With 1500 codes, you really don't want to do this with IF or RECODE. Assigning value labels is the usual way of dealing with this. If you actually need the variable values to be the identifying strings, a table lookup would be vastly better. MATCH FILES with the TABLE subcommand can handle this. You would create a dataset with the keys and labels, sort it and the regular data, and then use MATCH with TABLE.
Since this is a string variable you need to use quotation marks around the values in both commands you used:
VALUE LABELS Verrichtingcode
'339985B' 'Sedation'.
or
RECODE Verrichtingcode ('339985B'= 'AA339985BBB').
EXECUTE.

GS1-128 barcode with ZPL does not put the AI in ()

i was expecting this command
^FO15,240^BY3,2:1^BCN,100,Y,N,Y,^FD>:>842011118888^FS
to generate a
(420) 11118888
interpretation line, instead it generates
~n42011118888
anyone have idea how to generate the expected output?
TIA!
Joey
If the firmware is up to date, D mode can be used.
^BCo,h,f,g,e,m
^XA
^FO15,240
^BY3,2:1
^BCN,100,Y,N,Y,D
^FD(420)11118888^FS
^XZ
D = UCC/EAN Mode (x.11.x and newer firmware)
This allows dealing with UCC/EAN with and without chained
application identifiers. The code starts in the appropriate subset
followed by FNC1 to indicate a UCC/EAN 128 bar code. The printer
automatically strips out parentheses and spaces for encoding, but
prints them in the human-readable section. The printer automatically
determines if a check digit is required, calculate it, and print it.
Automatically sizes the human readable.
The ^BC command's "interpretation line" feature does not support auto-insertion of the parentheses. (I think it's safe to assume this is partly because it has no way of determining what your data identifier is by just looking at the data provided - it could be 420, could be 4, could be any other portion of the data starting from the first character.)
My recommendation is that you create a separate text field which handles the logic for the parentheses, and place it just above or below the barcode itself. This is the way I've always approached these in the past - I prefer this method because I have direct control over the font, font size, and formatting of the interpretation line.

Putting spaces back into a string of text with unreliable space information

I need to parse some text from pdfs but the pdf formatting results in extremely unreliable spacing. The result is that I have to ignore the spaces and have a continuous stream of non-space characters.
Any suggestions on how to parse the string and put spaces back into the string by guessing?
I'm using ruby. Or should I say I'musingruby?
Edit: I've pulled the text out using pdf-reader. Some of the pdf files are nicely formatted and some are not. An example of text mixed with positioning:
.7aspe-5.5cts-715.1o0.6f-708.5f-0.4aces-721.4that-716.3are-720.0i-1.8mportant-716.3in-713.9soc-5.5i-1.8alcommunica6.6tion6.3.-711.6Althoug6.3h-708.1m-1.9od6.3els-709.3o6.4f-702.8f5.4ace-707.9proc6.6essing-708.2haveproposed-611.2ways-615.5to-614.7deal-613.2with-613.0these-613.9diff10.4erent-613.7tasks,-611.9it-617.1remainsunclear-448.0how-450.7these-443.2mechanisms-451.7might-446.7be-447.7implemented-447.2in-450.3visualOne-418.9model-418.8of-417.3human-416.4face-421.9processing-417.5proposes-422.7that-419.8informa-tion-584.5is-578.0processed-586.1in-583.1specialised-584.7modules-577.0(Breen-584.4et-582.9al.,-582.32002;Bruce-382.1and-384.0Y92.0oung,-380.21986;-379.2Haxby-379.9et-380.5al.,-
and if I print just string data (I added returns at the end of each line to keep it from
messing up the layout here:
'Distinctrepresentationsforfacialidentityandchangeableaspectsoffacesinthehumantemporal
lobeTimothyJ.Andrews*andMichaelP.EwbankDepartmentofPsychology,WolfsonResearchInstitute,
UniversityofDurham,UKReceived23December2003;revised26March2004;accepted27July2004Availab
leonline14October2004Theneuralsystemunderlyingfaceperceptionmustrepresenttheunchanging
featuresofafacethatspecifyidentity,aswellasthechangeableaspectsofafacethatfacilitates
ocialcommunication.However,thewayinformationaboutfacesisrepresentedinthebrainremainsc
ontroversial.Inthisstudy,weusedfMRadaptation(thereductioninfMRIactivitythatfollowsthe
repeatedpresentationofidenticalimages)toaskhowdifferentface-andobject-selectiveregionsofvisualcortexcontributetospecificaspectsoffaceperception'
The data is spit out by callbacks so if I print each string as it is returned it looks like this:
'The
-571.3
neural
-573.7
system
-577.4
underly
13.9
ing
-577.2
face
-573.0
perc
13.7
eption
-574.9
must
-572.1
repr
20.8
esent
-577.0
the
unchangin
14.4
g
-538.5
featur
16.5
es
-529.5
of
-536.6
a
-531.4
face
'
On examination it looks like the true spaces are large negative numbers < -300 and the false spaces are much smaller positive numbers. Thanks guys. Just getting to the point where i am asking the question clearly helped me answer it!
Hmmmm... I'd have to say that guessing is never a good idea. Looking at the problem root cause and solving that is the answer, anything else is a kludge.
If the spacing is unreliable from the PDF, how is it unreliable? The PDF viewer needs to be able to reliably space the text so the data is there somewhere, you just need to find it.
EDIT following comment:
The idea of parsing the file using a dictionary (your only other option really, apart from randomly inserting spaces and hoping for the best) and inserting spaces at identified word boundaries (a real problem when dealing with punctuation, plurals that don't alter the base word i.e. plural, etc) would, I believe, be a much greater programming challenge than correctly parsing the PDF in the first place. After all, PDF is clearly defined whereas English is somewhat wooly.
Why not look down the route of existing solutions like ps2ascii in linux, call the function from your Ruby and pick up the result.
PDF doesn't only store spaces as space characters, but also uses layout commands for spacing (so it doesn't print a space, but moves the "pen" to the right). Perhaps you should have a look at the PDF reference (the big PDF on the bottom of the site), Chapter 9 "Text" should be what you're looking for.
EDIT: After reading your comment to Lazarus' answer, this doesn't seem to be what you're looking for. I think you should try to get a word list from somewhere and try to split your text using it. A good strategy would be to do that using recursion, because for example:
"meandyou"
The first word could be "me" or "mean", but if you try "mean", "dyou" doesn't make sense, so it will be "me", same for the next word that could be "a" or "an" or "and", only "and" makes sense.
If it were me I'd go back to the source PDFs and try a different method of extracting the text, such as iText (for Java) or maybe some kind of PDF-to-HTML to text conversion software method.

Resources