I have a dataset with 1500+ codes for medical treatments (Verrichtingcode). The codes look like 337409A, 339830E or 336690, and they are stored in a string variable. I want to use syntax to turn each code into a description such as: laparoscopic abdominal surgery. I have the translation for all these codes standing by.
If I use the syntax:
VALUE LABELS
Verrichtingcode
339985B 'Sedation'.
EXECUTE.
What comes out is: 339985 = "B 'Sedation'"
Which doesn't work.
I then tried to Recode >
RECODE Verrichtingcode
(339985B= AA339985BBB).
EXECUTE.
This works fine, until you get to a code with an E at the end.
RECODE Verrichtingcode
(336070D= AA336070DBB)
(333698E= AA333698EBB).
EXECUTE.
What I get is:
>Warning # 203 in column 2. Text: 333698E
>An 'E', beginning the exponent portion of a number, was not followed by any
>digits.
>The symbol will be treated as an invalid special character.
>Error # 4654 in column 2. Text: 339993
>The RECODE command attempts to test a string variable for having a numeric
>value. Note that LOWEST, HIGHEST, and SYSMIS are considered to be numeric
>values.
>Execution of this command stops.
EXECUTE.
I could of course do it all by hand in the Variable view, but with 1500+ procedures it's gonna take some time ;)
If any of you would be so kind as to help me, it would be really appreciated. If you need more info, I would be happy to provide it.
With 1500 codes, you really don't want to do this with IF or RECODE. Assigning value labels is the usual way of dealing with this. If you actually need the variable values to be the identifying strings, a table lookup would be vastly better. MATCH FILES with the TABLE subcommand can handle this. You would create a dataset with the keys and labels, sort it and the regular data, and then use MATCH with TABLE.
Since this is a string variable you need to use quotation marks around the values in both commands you used:
VALUE LABELS Verrichtingcode
'339985B' 'Sedation'.
or
RECODE Verrichtingcode ('339985B'= 'AA339985BBB').
EXECUTE.
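With 1500+ codes you probably don't want to type all those quotation marks by hand either. Here is a minimal sketch in Ruby (any scripting language would do) that generates the quoted VALUE LABELS syntax from a lookup table. The hash contents and the method names spss_quote and value_labels_syntax are made up for illustration; only the 'Sedation' pair comes from the question, and in practice you would load the pairs from wherever your translations are stored.

```ruby
# Hypothetical lookup table: code => label. Only the first pair comes from the
# question; the other labels are placeholders for illustration.
LABELS = {
  "339985B" => "Sedation",
  "333698E" => "Placeholder label A",
  "336690"  => "Placeholder label with an apostrophe's"
}

# SPSS string literals escape an embedded apostrophe by doubling it.
def spss_quote(text)
  "'" + text.gsub("'", "''") + "'"
end

# Build a single VALUE LABELS command in which every code and label is quoted.
def value_labels_syntax(variable, labels)
  pairs = labels.map { |code, label| "  #{spss_quote(code)} #{spss_quote(label)}" }
  "VALUE LABELS #{variable}\n" + pairs.join("\n") + "."
end

puts value_labels_syntax("Verrichtingcode", LABELS)
```

The printed text can be pasted into a syntax window and run. Because every code is quoted, values ending in E are no longer read as the start of an exponent.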
Let's say you have a table called Employee and one of the employee names begins or includes the [$] or [#] sign within the String, like Hel$lo or like #heLLo. How can you call the value?
Is it possible to select and trim the value in a single command?
Kind regards
If you want to select the names, but with special characters $ and # removed, you can use the TRANSLATE function. Add more characters to the list if you need to.
select translate(name, 'A$#', 'A') from employee;
The function will "translate" the character 'A' to itself, '$' and '#' to nothing (simply removing them from the string), and it will leave all other characters - other than A, $ and # - unchanged. It may seem odd that you need the 'A' in this whole business; and you really don't need 'A' specifically, but you do need some character that you want to keep. The reason for that is Oracle's idiotic handling of null: an empty string is treated as NULL, and TRANSLATE returns NULL if any of its arguments is NULL, so the third argument cannot be empty.
You may need to remove characters but you don't know in advance what they will be. That can be done too, but you need to be careful not to remove legitimate characters, like the dot (A. C. Green), dash (John Connor-Smith), apostrophe (Betty O'Rourke) etc. You can then do it either with regular expressions (easy to write, but not the most efficient) or with TRANSLATE as above (it looks uglier, but it will run faster). Something like this:
select regexp_replace(name, '[^[:alpha:].''-]') from employee;
This will replace any character that is not "alpha" (letters) or one of the characters specifically enumerated (dot, apostrophe, dash) with nothing, effectively removing them. Note that dash has a special meaning in character classes, so it must be the last one in the enumeration.
If you need to make the changes in the table itself, you can use an update statement, using TRANSLATE or REGEXP_REPLACE as shown above.
I choose type ="html" and then copy and paste the output to excel.
Everything works perfectly, except that parentheses surrounding a value turn it into a negative number.
For example, (2.451), which is a standard error, becomes -2.451.
Is there a way to fix this? Thanks!
Excel interprets the parentheses around a number as an indication it should be negative. This is inconvenient for our standard errors and p-value reporting from stargazer, because (0.04) is not meant to be -0.04.
To work around this, you could:
Select all cells in your spreadsheet
Change the data type of all of the cells to "Text", so that they are not interpreted as numbers.
Then you can paste parenthesised numbers in, and they are treated as text.
If you still need to work with some numbers as numbers, you can set those cells to "Number".
I am looping over a set of scalars which contain quarterly dates in SIF (Stata internal form). I would like to convert them to HRF (human-readable form) and keep them stored in scalars.
However, I found that format %tq only accepts variables. Hence, the only workaround seems to be to i) convert the scalar to a variable, ii) apply format %tq, and iii) convert the variable back to a scalar.
Is there a more elegant and faster way to do this? (I am using Stata MP 15.1.)
You can have string scalars, so you can do this. I can't see why it would be useful, but that could be failure of imagination; you could enlighten us on why you want this.
. scalar foo = yq(2018, 4)
. scalar foo = string(scalar(foo), "%tq")
. scalar list
foo = 2018q4
What is quite different for scalars is that there is no sense whatsoever in which a display format is attached to or associated with a scalar. You can hold a numeric date or a string date in a scalar, but those are the only choices. You can't have a numeric value with a format on the side that Stata will use for display when suitable. You found that out when you attempted to format a scalar.
Goodness knows whether this is faster (than what?) or more elegant (who decides?). The major difference is that a variable manifestly can contain many dates and a changed format made just once with format can apply to them consistently, whereas changing how you show a bunch of scalars requires a loop every time you do it so far as I can see. Further, it follows from above that you might need to keep two sets of scalars, one numeric for calculation and one string for display.
I've used date constants and typically found that either I use them directly (subtracting 2000 as a base doesn't require putting it into anything) or I use local macros to hold them. But I can't see anything wrong with using scalars, except possibly indirection.
I need to validate that something is an Excel cell range in Ruby, e.g. "A4:A6". By looking at it, the requirement I am looking for is:
<Alphabetical, Capitalised><Integer>:<Alphabetical, Capitalised><Integer>
I am not sure how to form a RegExp for this.
I would appreciate a small explanation for a solution, as opposed to purely a solution.
A bonus would be to check that the range is restricted to within a row or column. I think this would be out of scope of Regular Expressions though.
I have tried /[A-Z]+[0-9]+:[A-Z]+[0-9]+/. This works, but because it is not anchored it allows extras to be added on to the beginning or end.
Anchoring it is more on the right track, but the column part still accepts any number of letters:
"HELLOAA3:A7".match(/\A[A-Z]+[0-9]+:[A-Z]+[0-9]+\z/) also returns a match.
How would I limit the number range to 10000?
How would I limit the number of characters to 3?
This is my solution:
(?:(?:\'?(?:\[(?<wbook>.+)\])?(?<sheet>.+?)\'?!)?(?<colabs>\$)?(?<col>[a-zA-Z]+)(?<rowabs>\$)?(?<row>\d+)(?::(?<col2abs>\$)?(?<col2>[a-zA-Z]+)(?<row2abs>\$)?(?<row2>\d+))?|(?<name>[A-Za-z]+[A-Za-z\d]*))
It includes named ranges, but the R1C1 notation is not supported.
The pattern is written in the Perl-compatible regex dialect (i.e. it can also be used with C#). I'm not familiar with Ruby, so I can't tell you what differs, but you may want to look at the question "What is the difference between Regex syntax in Ruby vs Perl?"
This will do both: match an Excel range and require that both cells are in the same row or column.
^([A-Z]+)(\d+):(\1\d+|[A-Z]+\2)$
A4:A6 // ok
A5:B10 // not ok
B5:Z5 // ok
AZ100:B100hello // not ok
The magic here is the back-reference group:
([A-Z]+)(\d+) -- column is in capture group 1, row in group 2
(\1\d+|[A-Z]+\2) -- the first column followed by any row number; or
-- any column letters followed by the first row number
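For Ruby specifically, here is a minimal sketch of the same back-reference idea, with the limits from the follow-up questions added as a separate check. Note that in Ruby ^ and $ match at line breaks, so the sketch anchors with \A and \z instead. The constant and method names (SAME_LINE_RANGE, CELL_RANGE, same_row_or_column?, within_limits?) are invented for this example, and the limits of 3 column letters and row 10000 are assumptions taken from the question, not Excel's actual limits.

```ruby
# Back-reference pattern: the second cell must repeat either the first column
# (group 1) or the first row (group 2). \A and \z anchor the whole string.
SAME_LINE_RANGE = /\A([A-Z]+)(\d+):(?:\1\d+|[A-Z]+\2)\z/

# Plain range pattern used only to pull the four parts out for limit checks.
CELL_RANGE = /\A([A-Z]+)(\d+):([A-Z]+)(\d+)\z/

# Assumed limits from the follow-up questions.
MAX_COLUMN_LETTERS = 3
MAX_ROW = 10_000

# True when the range stays within a single row or a single column.
def same_row_or_column?(text)
  SAME_LINE_RANGE.match?(text)
end

# True when both cells have at most MAX_COLUMN_LETTERS column letters and a
# row number between 1 and MAX_ROW. Checking the number in Ruby is simpler
# than encoding a numeric range in the regex itself.
def within_limits?(text)
  m = CELL_RANGE.match(text)
  return false unless m

  columns = [m[1], m[3]]
  rows = [m[2].to_i, m[4].to_i]
  columns.all? { |c| c.length <= MAX_COLUMN_LETTERS } &&
    rows.all? { |r| r.between?(1, MAX_ROW) }
end
```

With these definitions, same_row_or_column?("A4:A6") and same_row_or_column?("B5:Z5") are true, while "A5:B10" and "AZ100:B100hello" are rejected. within_limits?("HELLOAA3:A7") is false because of the seven-letter column, and within_limits?("A1:A10001") is false because of the row number.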
Say I have a field on a datawindow that is the value of a database column (Insert > Column). It has conditions under which it needs to be protected (Properties > General > Protect).
I want the field background to be grey when it's protected. At the moment, the only way I can work out how to do this is to copy the protect conditional, no matter how complex, substituting colour values for the 1 (protect) and 0 (not protect).
Is there some sort of syntax I can use in the Expression field for the column's background colour that references the protect value of the column? I tried
if (column.protect=1, Grey, White)
but it returns an error saying it expects a TRUE/FALSE condition.
Is what I'm after impossible, or is it just a matter of getting the right syntax?
Cheers.
Wow. You like complex, layered questions.
The first problem is accessing the value, which isn't done as directly as you described. As a matter of fact, you use a Describe() to get the value. The only problem with that is that it comes back as a string in the following format, with quotes around (note that we're using standard PowerScript string notation where ~t is a tab)
"<DefaultValue>~t<Expression>"
You want the expression, so you'll have to parse it out, dropping the quotes as well.
Once you've got the expression, you'll need to evaluate it for the given row. That can be done with another Describe () call, particularly:
Describe ("Evaluate('<expression>', <rownum>)")
The row number that an expression is being evaluated on can be had with the GetRow() function.
This may sound like it needs PowerScript and some interim value storage, but as long as you're willing to make redundant function calls to get a given value more than once, you can do this in an expression, something like (for an example column b):
if (Describe ("Evaluate (~"" + Mid (Describe ("b.protect"),
Pos (Describe ("b.protect"), "~t")+1,
Len (Describe ("b.protect")) - Pos (Describe ("b.protect"), "~t") - 1)
+ "~", " + String (GetRow()) + ")")='1',
rgb(128, 128, 128),
rgb(255,255,255))
This looks complex, but if you put the Mid() expression in a compute field so you can see the result, you'll see that it simply parses out the Protect expression and puts it into the Describe (Evaluate()) syntax described above.
I have put one cheat into my code for simplicity. I used the knowledge that I only had single quotes in my Protect expression, and chose to put the Evaluate() expression string in double quotes. If I was trying to do this generically for any column, and couldn't assume an absence of double quotes in my Protect expression, I'd have to use a global function to replace any double quotes in the Protect expression with escaped quotes (~"), which I believe in your code would look like a triple tilde and a quote. Then again, if I had to make a global function call (note that global function calls in expressions can have a significant performance impact if there are a lot of rows), I'd just pass it the Describe ("column.protect") and GetRow() and build the entire expression in PowerScript, which would be easier to understand and maintain.
Good luck,
Terry.