Avoiding Spaces in Keywords - whitespace

I am designing a language, and I am stuck on what to call "else if". My language uses indentation for blocks, so I need a keyword for "else if".
Python uses "elif" (meh...) and Ruby uses "elsif" (yuck!). Personally, I hate to use abbreviations, so I don't want to use either of these. Instead, I am thinking of just using "else if", where an arbitrary number of spaces can appear between "else" and "if".
I've noticed that this doesn't occur very often in programming languages. C# has "yield return" as a keyword, but that's the only example I can think of.
Is there an implementation concern behind this? I've created my lex file and it accepts the keyword with no issues. I am worried there is something I haven't thought of.

As long as you don't allow inline comments or newlines between the two words, there's nothing wrong with multi-word keywords. The only catch is that else if might confuse your language's users, tempting them to write else while or else for. You'll have a hard time explaining to them that your else if is a keyword and not two statements following each other.
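A minimal sketch of the lexer side of this, in Python rather than lex. The token names and the tiny token table are hypothetical and only illustrate the idea: "else if" separated by spaces or tabs (but not newlines or comments) becomes a single token, while "else while" still lexes as two tokens.

```python
import re

# Hypothetical token table. ELSE_IF is listed before ELSE and IF so that
# the two-word keyword is tried first at each position.
TOKEN_SPEC = [
    ("ELSE_IF", r"else[ \t]+if\b"),  # spaces/tabs only, no newline or comment
    ("ELSE", r"else\b"),
    ("IF", r"if\b"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("SKIP", r"[ \t]+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(line):
    """Split one source line into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With this table, tokenize("else    if ready") yields a single ELSE_IF token followed by a NAME, whereas tokenize("else while") yields ELSE followed by NAME, which is exactly the asymmetry users may find surprising.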

Out of curiosity, why bother having an "else if" keyword? You can just have an "else" keyword and the next thing is an expression... an "if" expression.
<expression> := IF | somethingelse
IF <expression> THEN <expression> ELSE <expression>
The idea being that the if in "else if" is just the start of the expression after the "else" keyword.
A sample of what I mean, per the comments:
if (x == 1)
    return 5
else return 4

if (x == 1)
    return 5
else if (x == 2)
    return 6
else
    return 7
The concept above is that the return 4 is no different from the if and its "arguments". As long as you allow the next command to start on the same line (even if it's not considered good form), you can treat the if in else if as just the start of the next expression.
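To make the idea concrete, here is a small recursive-descent sketch in Python. The token list format and the function name parse_statement are assumptions made for illustration, and conditions are treated as single opaque tokens. There is no "else if" rule anywhere: else is simply followed by another statement, which may itself be an if.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if tokens[pos] == "if":
        condition = tokens[pos + 1]  # the condition is one opaque token here
        then_branch, pos = parse_statement(tokens, pos + 2)
        else_branch = None
        if pos < len(tokens) and tokens[pos] == "else":
            # Whatever follows "else" is just a statement; it may be an "if".
            else_branch, pos = parse_statement(tokens, pos + 1)
        return ("if", condition, then_branch, else_branch), pos
    if tokens[pos] == "return":
        return ("return", tokens[pos + 1]), pos + 2
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


tokens = ["if", "x==1", "return", "5",
          "else", "if", "x==2", "return", "6",
          "else", "return", "7"]
tree, next_pos = parse_statement(tokens)
```

The resulting tree nests the second if inside the else branch of the first, which is the same structure an explicit "else if" keyword would describe.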

Related

Is 'X = someFunction() + 2' a statement or an expression?

I read that an expression is anything that gives some value, like 2 + X, while a statement is any instruction to the computer to execute something, like print("hi").
What about the following line of code?
X = someFunction() + 2
someFunction() returns some numerical value (I think a lot of languages wouldn't compile this code if it didn't), and thus someFunction() + 2 is 'something that yields some value' - aka an expression.
But, someFunction() is code to be executed, thus a statement.
My question:
There are often lines of code that equal some value, but are also an instruction to be executed. What are these lines of code considered?
In certain computer languages called "functional languages", everything--including the code that prints "hi"--is an expression. At the other extreme, you can write code in machine language (so you, not a compiler, are deciding exactly what sequence of bytes should compose the executable program), and at that level practically everything (even adding 2 to something) is an "instruction to the computer to execute something".
I've used a lot of different computer languages, and as far as I can recall, in each case there was documentation somewhere defining what makes a statement in that particular language (if indeed the language even has a concept of "statement"). The definition is based on syntax, not so much on what the code does.
For example, in C or C++, if you write
{ x + 2; }
then technically the "x + 2;" is a statement. It is a useless statement that doesn't do anything, but syntactically, it is a statement nevertheless. In fact, one way to write a statement in C is to just append a semicolon to an expression (http://msdn.microsoft.com/en-us/library/1t054cy7.aspx). You don't even need the expression; a semicolon by itself can be a statement (http://msdn.microsoft.com/en-us/library/h7zyw61x.aspx).
By the way, in C++, the '+' in an expression such as (x + 2) may actually be a function call. So if you say anything that calls a function is a statement, then (x + 2) would be, or at least could be, a statement in C++. But I don't know any authority who defines it that way.
It varies by language, but ultimately: a statement is anything you can't embed inside another (simple, i.e. not a block) statement.
In C, your example is an expression, because you can do this:
while (X = someFunction() + 2) {
    // ...
}
But in Python, the same thing is a syntax error, because = can only be a statement:
# nope!
while X = someFunction() + 2:
    pass
In most languages, any expression can also be used as a statement by itself, though this may or may not be useful.
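The Python half of this can be checked without running any of the code involved. The sketch below uses the built-in compile() purely as a syntax check; the helper name parses is hypothetical, and someFunction never has to exist because nothing is executed.

```python
def parses(source):
    """Return True if Python accepts the source syntactically."""
    try:
        compile(source, "<example>", "exec")  # compiles only, does not run
        return True
    except SyntaxError:
        return False


assignment_in_condition = "while X = someFunction() + 2:\n    pass"
plain_assignment = "X = someFunction() + 2"
bare_expression = "someFunction() + 2"

results = {
    "assignment inside while": parses(assignment_in_condition),
    "assignment on its own line": parses(plain_assignment),
    "expression on its own line": parses(bare_expression),
}
```

Only the first entry is rejected: the assignment is fine as a statement, and the bare expression is accepted as a statement too, but = cannot appear where an expression is required.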
Calling a statement an "instruction to execute something" is a poor way to think about it, though. All code is an instruction to execute something.
A statement is more like a single complete thought. It's really just part of the syntax; depending on the language/compiler/runtime, a statement or expression may become very many machine instructions, or several statements might be reduced to just one instruction.
TL;DR: It is a statement when it is parsed as a statement, and an expression when it is parsed as an expression. The rules for which depend upon the particular language in question.
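For one concrete language, the parser's own verdict can be inspected. This sketch uses Python's standard ast module; the helper name top_level_shape is made up for the example. It reports the node type of the first top-level statement and the node type of the expression it carries.

```python
import ast


def top_level_shape(source):
    """Return (statement node type, type of the expression it wraps)."""
    node = ast.parse(source).body[0]
    return type(node).__name__, type(node.value).__name__


assignment_shape = top_level_shape("X = someFunction() + 2")
expression_shape = top_level_shape("someFunction() + 2")
```

In Python the whole line is an Assign statement whose value is a BinOp expression, while the right-hand side written alone becomes an Expr statement wrapping the same kind of BinOp. Other languages draw the line differently, which is the point of the answer above.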
Expressions and statements should not be confused with "what actually happens" underneath, but merely as describing the syntax constructs of a language's grammar.
Because grammar and parsing rules [generally] apply to the program as a whole, the fact that part of an expression could be pulled out and used as a statement, where the language allows that, does not make it a statement, least of all while it appears in an expression context.
As for the particular example given, it depends on programming language and where the construct appears. Some languages support assignments as expressions, while others do not.
For instance, consider this JavaScript (see Appendix A of ES5 for the grammar rules).
{ x = y = f() + 2 }
In this case, the block is a statement (BlockStatement) and x = .. is also considered a "statement" (although it is really an Expression via Statement -> ExpressionStatement) while y = .. is an expression. Likewise, f() is an expression (technically, f is also an expression in JavaScript) and 2 is an expression and f() + 2 is an expression.
However, the following is invalid Pascal because Pascal's syntax does not support := (assignment) in an expression and an assignment is always a statement.
X := Y := F() + 2
Some languages also forbid general expressions as statements, which further throws off the notion that, in y = EXPR, it is correct to consider EXPR a valid statement. The following is invalid C#, but is dubiously valid in JavaScript and many other languages.
{ f() + 2; }
I would say that the "someFunction() + 2" is an expression being evaluated. Then I would say that "x = someFunction() + 2" is a statement, because most languages would generally evaluate the function's return plus two, and then assign that value to x.

Ruby case/when vs if/elsif

The case/when statements remind me of try/catch statements in Python, which are fairly expensive operations. Is this similar with the Ruby case/when statements? What advantages do they have, other than perhaps being more concise, to if/elsif Ruby statements? When would I use one over the other?
The case expression is not at all like a try/catch block. The Ruby equivalents to try and catch are begin and rescue.
In general, the case expression is used when you want to test one value for several conditions. For example:
case x
when String
  "You passed a string but X is supposed to be a number. What were you thinking?"
when 0
  "X is zero"
when 1..5
  "X is between 1 and 5"
else
  "X isn't a number we're interested in"
end
The case expression is similar to the switch statement that exists in many other languages (e.g. C, Java, JavaScript), though Python doesn't include any such thing. The main difference with case is that it is an expression rather than a statement (so it yields a value) and it uses the === operator for comparison, which allows us to express interesting things like "Is this value a String? Is it 0? Is it in the range 1..5?"
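For readers coming from Python, the role of === can be approximated with a small emulation. This is only a sketch of the dispatch idea, not how Ruby implements it: case_eq and describe are hypothetical names, and the range branch here covers integers only, whereas Ruby's 1..5 also matches values such as 2.5.

```python
def case_eq(pattern, value):
    """Rough stand-in for Ruby's `pattern === value`."""
    if isinstance(pattern, type):
        return isinstance(value, pattern)  # like String === x
    if isinstance(pattern, range):
        return value in pattern  # like (1..5) === x, integers only
    return pattern == value  # like 0 === x


def describe(x):
    """Mirror the case/when example: the first matching branch wins."""
    branches = [
        (str, "You passed a string but X is supposed to be a number."),
        (0, "X is zero"),
        (range(1, 6), "X is between 1 and 5"),
    ]
    for pattern, result in branches:
        if case_eq(pattern, x):
            return result
    return "X isn't a number we're interested in"
```

Each when clause corresponds to one (pattern, result) pair, and the trailing return plays the part of else.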
Ruby's begin/rescue/end is more similar to Python's try/catch (assuming Python's try/catch is similar to Javascript, Java, etc.). In both of the above the code runs, catches errors and continues.
case/when is like C's switch and, ignoring the === operator that bjhaid mentions, operates very much like if/elsif/end. Which you use is up to you, but there are some advantages to using case when the number of conditionals gets long. No one likes if/elsif/elsif/elsif/elsif/elsif/end :-)
Ruby has some other magical things involving that === operator that can make case nice, but I'll leave that to the documentation which explains it better than I can.

string parsing optimization : ruby

I am working on a parser that is currently way too slow for my needs (about 40x slower than I would like) and would like advice on ways to speed it up. I have tried, and am currently using, a custom regex parser as well as a custom parser built on the StringScanner class. I've heard a lot of positive comments about Treetop, and have considered combining the regexes into one huge regex that would cover all matches, but I would like feedback from people with experience before I rewrite my parser yet again.
The basic rules of the strings that I am parsing are:
3 segments (BoL operators, message, EoL operators)
~6 BoL operators
BoL operators can be in any order
2 EoL operators
EoL operators can be in any order
Quantity of any specific operator can be 0, 1, or >1 but only 1 is used rest are removed and discarded
Operators in the 'message' section of the string are not captured / removed
Whitespace is allowed before & after operators but not required
Some BoL operators can have whitespace in the setting
My current Regex parser works by running the string through a loop that checks for BoL or EoL operators 1 at a time and cutting them out, ending the loop when there are no more operators of the given type as so...
loop {
  if input =~ /^\s+/ then input.gsub!(/^\s+/, '') end
  if input =~ /regex for operator_a/
    # sets operator_a
    input.gsub!(/regex for operator_a/, '')
  elsif input =~ /regex for operator_b/
    # sets operator_b
    input.gsub!(/regex for operator_b/, '')
  elsif input =~ /regex for operator_c/
    # sets operator_c
    # etc .. etc .. etc..
  else
    break
  end
}
My question: what would be the best way to optimize this code? Treetop, another library/gem that I have not found yet, combining the loops into one huge regex, or something else?
Please restrict all answers and input to the Ruby language. I know that it is not the 'best' tool for this job, but it is the language that I use.
More specific grammar / examples, if that helps.
This is for parsing communication commands sent to a game by users; so far the only commands are say and whisper. The beginning-of-line operators accepted are ::{target}, :{adverb}, ={verb}, and #{direction of}. The end-of-line operators are {emoticon} (e.g. :D :( :)), which sets the adverb if not already set, and end-of-line punctuation, which sets the verb if not already set.
The character ' is an alias for say, and sayto is an alias for say::
examples :
':happy::my sword=as# my helm Bol command operators work.
{:action=>:say, :adverb=>"happily", :verb=>"ask", :direction=>"my helm", :message=>"Bol command operators work."}
say yep say works
{:action=>:say, :message=>" yep say works"}
sayto my sword yep sayto works as do EoL operators!:)
{:action=>:say, :target=>"my sword", :adverb=>"happily", :verb=>"say", :message=>"yep sayto works as do EoL operators!"}
whisper::my friend : happy Bol command operators work with whisper.
{:action=>:whisper, :target=>"my friend", :adverb=>"happily", :message=>"Bol command operators work with whisper."}
whisp:happy::tinkerbell and they work in a different order.
{:action=>:whisper, :adverb=>"happily", :target=>"tinkerbell", :message=>"and they work in a different order."}
':bash=exclaim::hammer BoL operators work in this order too.
{:action=>:say, :adverb=>"bashfully", :verb=>"exclaim", :target=>"hammer", :message=>"BoL operators work in this order too."}
sayto bells =say :sad #wontwork Bol > Eol and directed !work with directional? :)
{:action=>:say, :verb=>"say", :adverb=>"sadly", :direction=>"wontwork", :message=>"Bol > Eol and directed !work with directional?"}
'all EoL removed closest to end used and reinserted. !!??!?....... :) ? :(
{:action=>:say, :adverb=>"sadly", :verb=>"ask", :message=>"all EoL removed closest to end used and reinserted?"}
Maybe this syntax is useful in your case:
emoti_convert = { ":)" => "happily", ":(" => "sadly" }
re_emoti = Regexp.union(emoti_convert.keys)
str = "It does not work :(. Oh, it does :)!"
p str.gsub(re_emoti, emoti_convert)
#=> "It does not work sadly. Oh, it does happily!"
But if you are trying to define a grammar, this is not the way to go (agreeing with @Dave Newton's comments).
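The "one combined regex" idea from the question can be sketched as follows. The sketch is in Python rather than Ruby and makes loud simplifying assumptions that the real grammar does not: every operator value is a single word (so "my sword" would not parse), only the beginning-of-line operators are handled, and when an operator repeats, the first occurrence is the one kept. The names BOL_OPERATOR and parse_bol are hypothetical. The point is the shape of the loop: one precompiled alternation, matched at an advancing offset, with no repeated gsub over the whole string.

```python
import re

# One alternation covering all four beginning-of-line operators.
# "::" is tried before ":" so a target is not mistaken for an adverb.
# Assumption: each operator value is a single word.
BOL_OPERATOR = re.compile(
    r"\s*(?:::(?P<target>\w+)|:(?P<adverb>\w+)|=(?P<verb>\w+)|#(?P<direction>\w+))"
)


def parse_bol(text):
    """Strip leading operators in one left-to-right pass."""
    settings = {}
    pos = 0
    while True:
        match = BOL_OPERATOR.match(text, pos)
        if match is None:
            break
        name = match.lastgroup  # which alternative matched
        settings.setdefault(name, match.group(name))  # keep first, drop repeats
        pos = match.end()  # advance instead of rewriting the string
    settings["message"] = text[pos:].lstrip()
    return settings
```

For example, parse_bol(":happy::sword=ask Hello there.") collects the adverb, target, and verb and leaves "Hello there." as the message. Whether this is actually faster than the original loop would need to be measured on real input.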

style opinion re. empty If block

I'm trying to curb some of the bad habits of a self-proclaimed "senior programmer." He insists on writing If blocks like this:
if (expression) {}
else {
    statements
}
Or as he usually writes it in classic ASP VBScript:
If expression Then
Else
    statements
End If
The expression could be something as easily negated as:
if (x == 0) {}
else {
    statements
}
Other than clarity of coding style, what other reasons can I provide for my opinion that the following is preferred?
if (x != 0) {
    statements
}
Or even the more general case (again in VBScript):
If Not expression Then
    statements
End If
Reasons that come to my mind for supporting your opinion (which I agree with BTW) are:
Easier to read (which implies easier to understand)
Easier to maintain (because of point #1)
Consistent with 'established' coding styles in most major programming languages
I have NEVER come across the coding-style/form that your co-worker insists on using.
I've tried it both ways. McConnell in Code Complete says one should always include both the then and the else to demonstrate that one has thought about both conditions, even if the operation is nothing (NOP). It looks like your friend is doing this.
I've found this practice to add no value in the field because unit testing handles this or it is unnecessary. YMMV, of course.
If you really want to burn his bacon, calculate how much time he's spending writing the empty statements, multiply by 1.5 (for testing) and then multiply that number by his hourly rate. Send him a bill for the amount.
As an aside, I'd move the close curly bracket to the else line:
if (expression) {
} else {
    statements
}
The reason being that it is tempting to (or easy to accidentally) add some statement outside the block.
For this reason, I abhor single-line (bare) statements, of the form
if (expression)
    statement
Because it can get fugly (and buggy) really fast:
if (expression)
    statement1
    statement2
statement2 will always run, even though it might look like it should be subject to expression. Getting in the habit of always using brackets will kill this stumbling point dead.

Is this idiom pythonic? (someBool and "True Result" or "False Result")

I just came across this idiom in some open-source Python, and I choked on my drink.
Rather than:
if isUp:
    return "Up"
else:
    return "Down"
or even:
return "Up" if isUp else "Down"
the code read:
return isUp and "Up" or "Down"
I can see this is the same result, but is this a typical idiom in Python? If so, is it some performance hack that runs fast? Or is it just a once-off that needs a code review?
The "a and b or c" idiom was the canonical way to express the ternary arithmetic if in Python, before PEP 308 was written and implemented. This idiom fails if the "b" answer is itself false; to support the general case, you could write
return (a and [b] or [c])[0]
An alternative way of spelling it was
return (b,c)[not a]
which, with the introduction of the bool type, could be rewritten as
return (c,b)[bool(a)]
(in case it isn't clear: the conversion to bool, and the not operator, is necessary if a is not known to be bool already)
Today, the conditional expression syntax should be used if the thing must be an expression; otherwise I recommend using the if statement.
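The failure case is easy to reproduce. The three helper functions below are hypothetical wrappers written for this illustration; note that, being functions, they evaluate b and c eagerly, unlike the inline forms, but the returned values are the same.

```python
def and_or(a, b, c):
    """The old idiom: breaks whenever b is falsy."""
    return a and b or c


def and_or_wrapped(a, b, c):
    """The list-wrapping workaround: [b] is truthy even if b is not."""
    return (a and [b] or [c])[0]


def conditional(a, b, c):
    """The PEP 308 conditional expression."""
    return b if a else c


# b is the empty string, which is falsy, and a is true.
broken = and_or(True, "", "Down")
wrapped = and_or_wrapped(True, "", "Down")
modern = conditional(True, "", "Down")
```

Here broken is "Down" even though the condition is true, while wrapped and modern both give the intended empty string.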
You should read Using the and-or trick (section 4.6.1) of Dive Into Python by Mark Pilgrim. It turns out that the and-or trick has major pitfalls you should be aware of.
That code is a bit fugly and too clever for my tastes, but I suppose there's not anything wrong with it per se. I think this is really just a case of "make it all fit in one line" syndrome.
I personally would have opted for the first form though.
Yikes. Not readable at all. For me pythonic means easy to read.
return isUp and "Up" or "Down"
Sounds like something you would do in Perl.
No, it is not.
I had a somewhat similar question the other day.
The construct
val if cond else alt
was not very welcome (at least by the SO community), and the preferred one was:
if cond:
    val
else:
    alt
You can draw your own conclusion. :)

Resources