I'm wondering why calls to operator methods don't require a dot? Or rather, why can't normal methods be called without a dot?
Example
class Foo
def +(object)
puts "this will work"
end
def plus(object)
puts "this won't"
end
end
f = Foo.new
f + "anything" # "this will work"
f plus "anything" # NoMethodError: undefined method `plus' for main:Object
The answer to this question, as to pretty much every language design question is: "Just because". Language design is a series of mostly subjective trade-offs. And for most of those subjective trade-offs, the only correct answer to the question why something is the way it is, is simply "because Matz said so".
There are certainly other choices:
Lisp doesn't have operators at all. +, -, ::, >, = and so on are simply normal legal function names (variable names, actually), just like foo or bar?
(plus 1 2)
(+ 1 2)
Smalltalk almost doesn't have operators. The only special casing Smalltalk has is that methods which consist only of operator characters do not have to end with a colon. In particular, since there are no operators, all method calls have the same precedence and are evaluated strictly left-to-right: 2 + 3 * 4 is 20, not 14.
1 plus: 2
1 + 2
Scala almost doesn't have operators. Just like Lisp and Smalltalk, *, -, #::: and so on are simply legal method names. (Actually, they are also legal class, trait, type and field names.) Any method can be called either with or without a dot. If you use the form without the dot and the method takes only a single argument, then you can leave off the brackets as well. Scala does have precedence, though, although it is not user-definable; it is simply determined by the first character of the name. As an added twist, operator method names that end with a colon are inverted or right-associative, i.e. a :: b is equivalent to b.::(a) and not a.::(b).
1.plus(2)
1 plus(2)
1 plus 2
1.+(2)
1 +(2)
1 + 2
In Haskell, any function whose name consists of operator symbols is considered an operator. Any function can be treated as an operator by enclosing it in backticks and any operator can be treated as a function by enclosing it in brackets. In addition, the programmer can freely define associativity, fixity and precedence for user-defined operators.
plus 1 2
1 `plus` 2
(+) 1 2
1 + 2
There is no particular reason why Ruby couldn't support user-defined operators in a style similar to Scala. There is a reason why Ruby can't support arbitrary methods in operator position, simply because
foo plus bar
is already legal, and thus this would be a backwards-incompatible change.
Another thing to consider is that Ruby wasn't actually fully designed in advance. It was designed through its implementation. Which means that in a lot of places, the implementation is leaking through. For example, there is absolutely no logical reason why
puts(!true)
is legal but
puts(not true)
isn't. The only reason why this is so, is because Matz used an LALR(1) parser to parse a non-LALR(1) language. If he had designed the language first, he would have never picked an LALR(1) parser in the first place, and the expression would be legal.
The Refinement feature currently being discussed on ruby-core is another example. The way it is currently specified, will make it impossible to optimize method calls and inline methods, even if the program in question doesn't actually use Refinements at all. With just a simple tweak, it can be just as expressive and powerful, and ensure that the pessimization cost is only incurred for scopes that actually use Refinements. Apparently, the sole reason why it was specified this way, is that a) it was easier to prototype this way, and b) YARV doesn't have an optimizer, so nobody even bothered to think about the implications (well, nobody except Charles Oliver Nutter).
So, for basically any question you have about Ruby's design, the answer will almost always be either "because Matz said so" or "because in 1993 it was easier to implement that way".
The implementation doesn't have the additional complexity that would be needed to allow generic definition of new operators.
Instead, Ruby has a Yacc parser that uses a statically defined grammar. You get the built-in operators and that's it. Symbols occur in a fixed set of sentences in the grammar. As you have noted, the operators can be overloaded, which is more than most languages offer.
Certainly it's not because Matz was lazy.
Ruby actually has a fiendishly complex grammar that is roughly at the limit of what can be accomplished in Yacc. To get more complex would require using a less portable compiler generator or it would have required writing the parser by hand in C, and doing that would have limited future implementation portability in its own way as well as not providing the world with the Yacc input. That would be a problem because Ruby's Yacc source code is the only Ruby grammar documentation and is therefore "the standard".
Because Ruby has "syntax sugar" that allows for a variety of convenient syntax for preset situations. For example:
class Foo
def bar=( o ); end
end
# This is actually calling the bar= method with a parameter, not assigning a value
Foo.new.bar = 42
Here's a list of the operator expressions that may be implemented as methods in Ruby.
Because Ruby's syntax was designed to look roughly like popular OO languages, and those use the dot operator to call methods. The language it borrowed its object model from, Smalltalk, didn't use dots for messages, and in fact had a fairly "weird" syntax that many people found off-putting. Ruby has been called "Smalltalk with an Algol syntax," where Algol is the language that gave us the conventions you're talking about here. (Of course, there are actually more differences than just the Algol syntax.)
Missing braces was some "advantage" for ruby 1.8, but with ruby 1.9 you can't even write method_0 method_1 some param it will be rejected, so the language goes rather to the strict version instead of freeforms.
You would think that this would be an easy question, but I can't find the answer anywhere. >_<
Will Ruby throw syntax errors if my code is indented incorrectly? For example, would code like this work?
if str.blank?
str = "Hello World"
no_input = true
end
Obviously, this is bad style and I should indent correctly regardless. I want to know whether I can rule it out as the cause of a bug during debugging sessions.
Yes, it would work. Ruby only looks for the line breaks.
But since code readability is also very important, I'd say you should take care of whitespace if only for that sake.
Indentation is (Usually) a Stylistic Choice
In Ruby, indentation per se is not relevant, although the location of linebreaks and other whitespace may cause ambiguity for the parser or cause it to consider certain things as separate expressions when you didn't mean for them to be. Here-documents and multi-line strings are also areas where indentation will matter.
In all cases, the real question is "what does the parser see?" In your example, it should be functionally equivalent to the properly-indented code. However, if you really want to know what's going on under the hood, take a look at Ruby's Ripper module to see how your code is actually being parsed.
Ruby is not whitespace sensitive. Your code, although not pretty, will work.
However: http://www.ruby-forum.com/topic/56039
irb(main):001:0> a = ( 4 + 5 )
=> 9
irb(main):002:0> a = ( 4
irb(main):003:1> + 5 )
=> 5
irb(main):004:0> a = ( 4 +
irb(main):005:1* 5 )
=> 9
The question is: Can I define my own custom operator in Ruby, except for the ones found in
"Operator Expressions"?
For example: 1 %! 2
Yes, custom operators can be created, although there are some caveats. Ruby itself doesn't directly support it, but the superators gem does a clever trick where it chains operators together. This allows you to create your own operators, with a few limitations:
$ gem install superators19
Then:
require 'superators19'
class Array
superator "%~" do |operand|
"#{self} percent-tilde #{operand}"
end
end
puts [1] %~ [2]
# Outputs: [1] percent-tilde [2]
Due to the aforementioned limitations, I couldn't do your 1 %! 2 example. The Documentation has full details, but Fixnums can't be given a superator, and ! can't be in a superator.
No. You can only define operators already specified in ruby, +,-,!,/,%, etc. (you saw the list)
You can see for yourself this won't work
def HI
def %!
puts "wow"
end
end
This is largely due to the fact that the syntax parser would have to be extended to accept any code using your new operator.
As Darshan mentions this example alone may not be enough to realize the underlying problem. Instead let us take a closer look at how the parser could possibly handle some example code using this operator.
3 %! 0
While with my spacing it may seem obvious that this should be 3.%!(0) without spacing it becomes harder to see.
3%! can also be seen as 3.%(0.!) The parser has no idea which to chose. Currently, there is no way easy way to tell it. Instead, we could possibly hope to override the meaning of 3.%(0.!) but this isn't exactly defining a new operator, as we are still only limited to ruby's parsable symbols
You probably can't do this within Ruby, but only by modifying Ruby itself. I think modifying parse.y would be your best bet. parse.y famtour
I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that list rules, common usage etc for using tools like Treetop. My parser grammer is on GitHub, in case someone wishes to help me improve it.
class {
initialize = lambda (name) {
receiver.name = name
}
greet = lambda {
IO.puts("Hello, #{receiver.name}!")
}
}.new(:World).greet()
I asked treetop to compile your language into an .rb file. That gave me something to dig into:
$ tt -o /tmp/rip.rb /tmp/rip.treetop
Then I used this little stub to recreate the loop:
require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')
This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:
def _nt_root
p [__LINE__, '_nt_root'] #DEBUG
start_index = index
Now we can see the scope of the loop:
[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...
Further debugging from there reveals that an integer is allowed to be an empty string:
rule integer
digit*
end
This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:
/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
from /tmp/rip.rb:825:in `_nt_literal_object'
from /tmp/rip.rb:787:in `_nt_object'
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
... 3283 levels...
Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
Also, this is not your immediate problem, but the definition of digit is odd: It can either one digit, or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.
What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.
What I wish it covered in a little more depth is the types of languages that one can write and give advice on Do's and Do Not Do's when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
When have you run into syntax that might be dated, never used or just plain obfuscated that you couldn't understand for the life of you.
For example, I never knew that comma is an actual operator in C. So when I saw the code
if(Foo(), Bar())
I just about blew a gasket trying to figure out what was going on there.
I'm curious what little never-dusted corners might exist in other languages.
C++'s syntax for a default constructor on a local variable. At first I wrote the following.
Student student(); // error
Student student("foo"); // compiles
This lead me to about an hour of reading through a cryptic C++ error message. Eventually a non-C++ newbie dropped by, laughed and pointed out my mistake.
Student student;
This is always jarring:
std::vector <std::vector <int> >
^
mandatory space.
When using the System.DirectoryServices name space to bind to an ADAM (Active Directory Application Mode; now called AD LDS, I think), I lost an entire day trying to debug this simple code:
DirectoryEntry rootDSE = new DirectoryEntry(
"ldap://192.168.10.78:50000/RootDSE",
login,
password,
AuthenticationTypes.None);
When I ran the code, I kept getting a COMException with error 0x80005000, which helpfully mapped to "Unknown error."
I could use the login and password and bind to the port via ADSI Edit. But this simple line of code didn't work. Bizarre firewall permission? Something screwed in configuration? Some COM object not registered correctly? Why on earth wasn't it working?
The answer? It's LDAP://, not ldap://.
And this is why we drink.
C++
class Foo
{
// Lots of stuff here.
} bar;
The declaration of bar is VERY difficult to see. More commonly found in C, but especially annoying in C++.
Perl's syntax caused me a bad day a while ago:
%table = {
foo => 1,
bar => 2
};
Without proper warnings (which are unavailable on the platform I was using), this creates a one-element hash with a key as the given hash reference and value undef. Note the subtle use of {}, which creates a new hash reference, and not (), which is an array used to populate the %table hash.
I was shocked Python's quasi-ternary operator wasn't a syntax error the first time I saw it:
X if Y else Z
This is stupid and common, but this syntax:
if ( x = y ) {
// do something
}
Has caught me about three times in the past year in a couple of different languages. I really like the R language's convention of using <- for assignment, like this:
x <- y
If the x = y syntax were made to mean x == y, and x <- y to mean assignment, my brain would make a smoother transition to and from math and programming.
C/C++'s bitvector syntax. The worst part about this is trying to google for it simply based on the syntax.
struct C {
unsigned int v1 : 12;
unsigned int v2 : 1;
};
C#'s ?? operator threw me for a loop the first time I saw it. Essentially it will return the LHS if it's non-null and the RHS if the LHS is null.
object bar = null;
object foo = bar ?? new Student(); // gets new Student()
Powershell's function calling semantics
function foo() {
params ($count, $name);
...
}
foo (5, "name")
For the non powershellers out there. This will work but not how you expect it to. It actually creates an array and passes it as the first argument. The second argument has no explicit value. The correct version is
foo 5 "name"
The first time I saw a function pointer in C++ I was confused. Worse, because the syntax has no key words, it was really hard to look up. What exactly does one type into a search engine for this?
int (*Foo)(float, char, char);
I ended up having to ask the local C++ guru what it was.
VB's (yeah yeah, I have to use it) "And" keyword - as in:
If Object IsNot Nothing And Object.Property Then
See that Object.Property reference, after I've made sure the object isn't NULL? Well, VB's "And" keyword * does * not * block * further * evaluation and so the code will fail.
VB does have, however, another keyword - AndAlso:
If Object IsNot Nothing AndAlso Object.Property Then
That will work as you'd expect and not explode when run.
I was once very confused by some C++ code that declared a reference to a local variable, but never used it. Something like
MyLock &foo;
(Cut me some slack on the syntax, I haven't done C++ in nearly 8 years)
Taking that seemingly unused variable out made the program start dying in obscure ways seemingly unrelated to this "unused" variable. So I did some digging, and found out that the default ctor for that class grabbed a thread lock, and the dtor released it. This variable was guarding the code against simultaneous updates without seemingly doing anything.
Javascript: This syntax ...
for(i in someArray)
... is for looping through arrays, or so I thought. Everything worked fine until another team member dropped in MooTools, and then all my loops were broken because the for(i in ...) syntax also goes over extra methods that have been added to the array object.
Had to translate some scientific code from old FORTRAN to C. A few things that ruined my day(s):
Punch-card indentation. The first 6 characters of every line were reserved for control characters, goto labels, comments, etc:
^^^^^^[code starts here]
c [commented line]
Goto-style numbering for loops (coupled with 6 space indentation):
do 20, i=0,10
do 10, j=0,10
do_stuff(i,j)
10 continue
20 continue
Now imagine there are multiple nested loops (i.e., do 20 to do 30) which have no differentiating indentation to know what context you are in. Oh, and the terminating statements are hundreds of lines away.
Format statement, again using goto labels. The code wrote to files (helpfully referred to by numbers 1,2,etc). To write the values of a,b,c to file we had:
write (1,51) a,b,c
So this writes a,b,c to file 1 using a format statement at the line marked with label 51:
51 format (f10.3,f10.3,f10.3)
These format lines were hundreds of lines away from where they were called. This was complicated by the author's decision to print newlines using:
write (1,51) [nothing here]
I am reliably informed by a lecturer in the group that I got off easy.
C's comma operator doesn't seem very obscure to me: I see it all the time, and if I hadn't, I could just look up "comma" in the index of K&R.
Now, trigraphs are another matter...
void main() { printf("wat??!\n"); } // doesn't print "wat??!"
Wikipedia has some great examples, from the genuinely confusing:
// Will the next line be executed????????????????/
a++;
to the bizarrely valid:
/??/
* A comment *??/
/
And don't even get me started on digraphs. I would be surprised if there's somebody here who can fully explain C's digraphs from memory. Quick, what digraphs does C have, and how do they differ from trigraphs in parsing?
Syntax like this in C++ with /clr enabled. Trying to create a Managed Dictionary object in C++.
gcroot<Dictionary<System::String^, MyObj^>^> m_myObjs;
An oldie:
In PL/1 there are no reserved words, so you can define variables, methods, etc. with the same name as the language keywords.
This can be a valid line of code:
IF ELSE THEN IF ELSE THEN
(Where ELSE is a boolean, and IF and THEN are functions, obviously.)
Iif(condition, expression, expression) is a function call, not an operator.
Both sides of the conditional are ALWAYS evaluated.
It always ruines my day if I have to read/write some kind of Polish notation as used in a lot of HP calculators...
PHP's ternary operator associates left to right. This caused me much anguish one day when I was learning PHP. For the previous 10 years I had been programming in C/C++ in which the ternary operator associates right to left.
I am still a little curious as to why the designers of PHP chose to do that when, in many other respects, the syntax of PHP matches that C/C++ fairly closely.
EDIT: nowadays I only work with PHP under duress.
Not really obscure, but whenever I code too much in one language, and go back to another, I start messing up the syntax of the latter. I always chuckle at myself when I realize that "#if" in C is not a comment (but rather something far more deadly), and that lines in Python do not need to end in a semicolon.
While performing maintentnace on a bit of C++ code I once spotted that someone had done something like this:
for (i=0; i<10; i++)
{
MyNumber += 1;
}
Yes, they had a loop to add 1 to a number 10 times.
Why did it ruin my day? The perpetrator had long since left, and I was having to bug fix their module. I thought that if they were doing something like this, goodness knows what else I was going to encounter!
AT&T assembler syntax >:(
This counter-intuitive, obscure syntax has ruined many of my days, for example, the simple Intel syntax assembly instruction:
mov dword es:[ebp-5], 1 /* Cool, put the value 1 into the
* location of ebp minus five.
* this is so obvious and readable, and hard to mistake
* for anything else */
translates into this in AT&T syntax
movl $1, %es:-4(%ebp) /* huh? what's "l"? 4 bytes? 8 bytes? arch specific??
* wait, why are we moving 1 into -4 times ebp?
* or is this moving -4 * ebp into memory at address 0x01?
* oh wait, YES, I magically know that this is
* really setting 4 bytes at ebp-5 to 1!
More...
mov dword [foo + eax*4], 123 /* Intel */
mov $123, foo(, %eax, 4) /* AT&T, looks like a function call...
* there's no way in hell I'd know what this does
* without reading a full manual on this syntax */
And one of my favorites.
It's as if they took the opcode encoding scheme and tried to incorporate it into the programming syntax (read: scale/index/base), but also tried to add a layer of abstraction on the data types, and merge that abstraction into the opcode names to cause even more confusion. I don't see how anyone can program seriously with this.
In a scripting language (Concordance Programming Language) for stand alone database software (Concordance) used for litigation document review, arrays were 0 indexed while (some) string functions were 1 indexed. I haven't touched it since.
This. I had my run in with it more then once.
GNU extensions are often fun:
my_label:
unsigned char *ptr = (unsigned char *)&&my_label;
*ptr = 5; // Will it segfault? Finding out is half the fun...
The syntax for member pointers also causes me grief, more because I don't use it often enough than because there's anything really tricky about it:
template<typename T, int T::* P>
function(T& t)
{
t.*P = 5;
}
But, really, who needs to discuss the obscure syntax in C++? With operator overloading, you can invent your own!