I know that the Fixnum class inherits from the Integer class. But what is the actual difference between them? Are there use cases where you would use Fixnum directly rather than Integer, or vice versa?
UPDATE: As of Ruby 2.4, the Fixnum and Bignum classes are gone; there is only Integer. The exact same optimizations still exist, but they are treated as "proper" compiler optimizations, i.e. they happen behind the scenes, invisible to the programmer.
This is somewhat confusing. Integer is the real class that you should think about. Fixnum is basically a performance optimization that should never have been made visible to the programmer in the first place. (Compare this with flonums in YARV, which are implemented entirely as an optimization inside the VM, and never exposed to the programmer.)
Basically, Fixnums are fast and Bignums are slow(er), and the implementation automatically switches back and forth between them. You never ask for one of those directly, you will just get one or the other, depending on whether your integer fits into the restricted size of a Fixnum or not.
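A quick illustration on 64-bit MRI before 2.4 (the exact boundary is platform-dependent; these numbers assume MRI's one-bit pointer tagging):

# Ruby 2.3 or earlier, 64-bit MRI: Fixnum covers -2**62 .. 2**62 - 1,
# and results are renormalized between the two classes automatically.
small = 2**62 - 1
big   = 2**62

small.class       # => Fixnum
big.class         # => Bignum
(small + 1).class # => Bignum, promoted automatically
(big - 1).class   # => Fixnum, demoted automatically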
You never "use" Integer. It is an abstract class whose job is to endow its children (Fixnum and Bignum) with methods. Under effectively no circumstances will you ever ask for an object's class and be told that it is an Integer.
I had a look through the output of 0.methods in irb and couldn't work out what path the Ruby interpreter takes when it is handed 0.15 as opposed to 0.to_s.
I've tried reading up on how ruby determines the difference between a floating point number being defined and a method being called on an integer but I haven't come to any conclusions.
The best guess I have is that because Ruby doesn't allow a digit to lead a method name, it simply checks whether the character following the . is a digit or a letter.
I don't like guessing though, assumptions can lead to misunderstandings. Can someone clear this up for me?
How well can you read Yacc files? (Rhetorical question)
https://github.com/ruby/ruby/blob/trunk/parse.y#L7380 I believe this is where the Ruby parser handles floating point tokenisation.
Disclaimer: parse.y hurts my head.
As methods in Ruby cannot begin with a digit, it's pretty easy to determine that 6.foo is a method call and 6.12 is a Float.
You can distinguish the two with pretty simple regular grammar rules, which is all a lexer needs to tokenize the source code.
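A toy sketch of that rule (this is not MRI's actual lexer, just an illustration of how little lookahead the decision needs; the classify method is made up):

# After "<digits>.", a digit means a float literal; a letter or
# underscore means a method call on the integer literal.
def classify(source)
  case source
  when /\A\d+\.\d/        then :float_literal
  when /\A\d+\.[A-Za-z_]/ then :method_call
  else :something_else
  end
end

classify("0.15")   # => :float_literal
classify("0.to_s") # => :method_call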
I don't know for sure, but I think it is safe to assume that the two are distinguished by method names being unable to start with a number.
I don't see that it's an especially interesting or useful thing to know, and I think your curiosity is best directed elsewhere.
The changelog for version 0.8 of vector lists the following change with a warning:
    Functor, Monad, Applicative, Alternative, Foldable and Traversable
    instances for boxed vectors (WARNING: they tend to be slow and are
    only provided for completeness).
Could someone explain why this is the case? Is it just the normal cost of typeclass specialization, or something more interesting?
Update: Looking at some particular instances, one sees for example:
instance Foldable.Foldable Vector where
  {-# INLINE foldr #-}
  foldr = foldr
and similarly for the other folds. Does this mean that folding is slow for Vectors in general? If not, what makes a non-specialized fold slow enough to warrant a warning?
I submitted the original set of these instances to Roman a year and a half ago and have maintained vector-instances since then. (I had to remove these instances from vector-instances once they migrated into vector, and now maintain it solely for the really exotic stuff). His concern was that if folks used these instances polymorphically then the RULES that make Vectors fuse away can't fire unless the polymorphic function gets inlined and monomorphized.
They exist because not every bit of code on the planet is Vector-specific and even then it is nice to sometimes use the common names.
Slow here is relative. The worst case is they perform like anybody else's folds, binds, etc. but Roman takes every single boxed value as a personal insult. :)
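To sketch what that fusion point means in practice (an assumed example, not from the vector documentation): the polymorphic sum is compiled once against the Foldable dictionary, so the fusion RULES never see a concrete Vector, while the monomorphic version can fuse into a single loop with no intermediate vector allocated:

import qualified Data.Vector as V
import Data.Foldable (Foldable, foldl')

-- Polymorphic: unless GHC inlines and monomorphizes this at a call
-- site, no vector-specific RULES can fire.
sumPoly :: Foldable f => f Int -> Int
sumPoly = foldl' (+) 0

-- Monomorphic: V.enumFromTo and V.foldl' fuse; no vector is built.
sumMono :: Int -> Int
sumMono n = V.foldl' (+) 0 (V.enumFromTo 1 n)

main :: IO ()
main = print (sumMono 1000000, sumPoly (V.enumFromTo 1 (1000000 :: Int)))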
I've just had a quick look at the source code and the implementations don't look excessively slow. I'd argue the authors added this warning because when you're writing a program in the Vector monad, you're working from such a high-level point of view that it's easy to forget that every >>= is, in fact, a concatMap, which tends to be inherently slow.
Another thing: Vector is particularly fast for unboxed types, so users may be attracted to the monad notation (for convenience) when they should actually be using an unboxed vector (for speed).
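To make that concrete (a sketch; the pair-generating functions are my own): Data.Vector.Unboxed has no Functor or Monad instances at all, so monadic style quietly commits you to boxed vectors:

import qualified Data.Vector as V         -- boxed: has a Monad instance
import qualified Data.Vector.Unboxed as U -- unboxed: does not

-- Every <- desugars to >>=, i.e. to a concatMap over boxed elements.
pairsBoxed :: Int -> V.Vector (Int, Int)
pairsBoxed n = do
  x <- V.enumFromTo 1 n
  y <- V.enumFromTo 1 x
  return (x, y)

-- The unboxed version must spell the concatMap out by hand.
pairsUnboxed :: Int -> U.Vector (Int, Int)
pairsUnboxed n =
  U.concatMap (\x -> U.map (\y -> (x, y)) (U.enumFromTo 1 x))
              (U.enumFromTo 1 n)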
The two languages where I have used symbols are Ruby and Erlang and I've always found them to be extremely useful.
Haskell does have algebraic datatypes, but I still think symbols would be mighty convenient. An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integer or a string "primary key".
The syntactic sugar for atoms can be minor - :something or <something> is an atom. All atoms are instances of a Type called Atom which derives Show and Eq. You can then use it for more descriptive error codes, for example
type ErrorCode = Atom
type Message = String
data Error = Error ErrorCode Message
loginError = Error :redirect "Please login first"
In this case :redirect is more efficient than using a string ("redirect") and easier to understand than an integer (404).
The benefit may seem minor, but I say it is worth adding atoms as a language feature (or at least a GHC extension).
So why have symbols not been added to the language? Or am I thinking about this the wrong way?
I agree with camccann's answer that it's probably missing mainly because it would have to be baked quite deeply into the implementation, and it offers too little use to justify that level of complication. In Erlang (and Prolog and Lisp) symbols (or atoms) usually serve as special markers and serve mostly the same notion as a constructor. In Lisp, the dynamic environment includes the compiler, so it's partly also a (useful) compiler concept leaking into the runtime.
The problem is the following: symbol interning is impure (it modifies the symbol table). Because we never modify an existing object, it is still referentially transparent, but if implemented naïvely it can lead to space leaks in the runtime. In fact, as currently implemented in Erlang, you can actually crash the VM by interning too many symbols/atoms (the current limit is 2^20, I think), because they can never be garbage collected. It's also difficult to implement in a concurrent setting without a global lock around the symbol table.
Both problems can be (and have been) solved, however. For example, see Erlang EEP 20. I use this technique in the simple-atom package. It uses unsafePerformIO under the hood, but only in (hopefully) rare cases. It could still use some help from the GC to perform an optimisation similar to indirection shortening. It also uses quite a few IORefs internally which isn't too great for performance and memory usage.
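For a feel of what such a library has to do, here is a naïve interning sketch (assumed code, not simple-atom's actual implementation; note it has exactly the space leak described above, since the global table only ever grows):

import Data.IORef
import qualified Data.Map.Strict as M
import System.IO.Unsafe (unsafePerformIO)

data Atom = Atom !Int String

instance Eq Atom where
  Atom i _ == Atom j _ = i == j   -- constant-time comparison

instance Show Atom where
  show (Atom _ s) = ':' : s

-- The impure global symbol table, hidden behind unsafePerformIO.
{-# NOINLINE atomTable #-}
atomTable :: IORef (M.Map String Int)
atomTable = unsafePerformIO (newIORef M.empty)

-- Interning: look the string up, or assign it the next free id.
atom :: String -> Atom
atom s = unsafePerformIO $
  atomicModifyIORef' atomTable $ \tbl ->
    case M.lookup s tbl of
      Just i  -> (tbl, Atom i s)
      Nothing -> let i = M.size tbl
                 in (M.insert s i tbl, Atom i s)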
In summary, it can be done but implementing it properly is non-trivial. Compiler writers always weigh the power of a feature against its implementation and maintenance efforts, and it seems like first-class symbols lose out on this one.
I think the simplest answer is that, of the things Lisp-style symbols (which is where both Ruby and Erlang got the idea, I believe) are used for, in Haskell most are either:
Already done in some other fashion--e.g. a data type with a bunch of nullary constructors, which also behave as "convenient names for integers" (see the sketch after this list).
Awkward to fit in--things that exist at the level of language syntax instead of being regular data usually have more type information associated with them, but symbols would have to either be distinct types from each other (nearly useless without some sort of lightweight ad-hoc sum type) or all the same type (in which case they're barely different from just using strings).
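A sketch of the first point (the type and names are my own, not from any library): nullary constructors give you symbol-like names plus a closed set the compiler can check exhaustively:

data Status = Redirect | NotFound | Forbidden
  deriving (Show, Eq, Enum)

describe :: Status -> String
describe Redirect  = "Please login first"
describe NotFound  = "No such page"
describe Forbidden = "Access denied"
-- Adding a constructor to Status makes GHC (with warnings on) flag the
-- now-incomplete pattern match here; strings or symbols could not do that.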
Also, keep in mind that Haskell itself is actually a very, very small language. Very little is "baked in", and of the things that are most are just syntactic sugar for other primitives. This is a bit less true if you include a bunch of GHC extensions, but GHC with -XAndTheKitchenSinkToo is not the same language as Haskell proper.
Also, Haskell is very amenable to pseudo-syntax and metaprogramming, so there's a lot you can do even without having it built in. Particularly if you get into TH and scary type metaprogramming and whatever else.
So what it mostly comes down to is that most of the practical utility of symbols is already available from other features, and the stuff that isn't available would be more difficult to add than it's worth.
Atoms aren't provided by the language, but can be implemented reasonably as a library:
http://hackage.haskell.org/package/simple-atom
There are a few other libs on hackage, but this one looks the most recent and well-maintained.
Haskell uses type constructors* instead of symbols so that the set of symbols a function can take is closed, and can be reasoned about by the type system. You could add symbols to the language, but it would put you in the same place that using strings would - you'd have to check all possible symbols against the few with known meanings at runtime, add error handling all over the place, etc. It'd be a big workaround for all the compile-time checking.
The main difference between strings and symbols is interning - symbols are atomic and can be compared in constant time. Both, though, are types with an essentially infinite number of distinct values, which goes against the grain of Haskell's practice of specifying arguments and results with finite types.
* I'm more familiar with OCaml than Haskell, so "type constructor" may not be the right term. I mean things like None or Just 3.
An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integer or a string "primary key".
Use Enum instead.
data FileType = GZipped | BZipped | Plain
  deriving Enum

descr :: FileType -> String
descr ft = ["compressed with gzip",
            "compressed with bzip2",
            "uncompressed"] !! fromEnum ft
I'm looking for examples of why it's not a good idea to extend base classes in ruby. I need to show some people why it's a weapon to be wielded carefully.
Any horror stories you can share?
There was a pretty famous example of monkey-patching going horribly wrong about 2.5 years ago in Rubinius.
The interesting thing about this case is that both the offending code and the victim were highly visible and highly unusual. Usually, the offender is some piece of code written by some PHP script kiddy who got drunk on his 1337 metaprogramming h4X0r skillz. And the failure mode is a simple ArgumentError exception, because the original method and the monkeypatch have different arity.
However, in this case, the offender was a library in the stdlib (mathn) and the failure mode was the Rubinius VM completely blowing up.
So, what happened? Well, mathn monkeypatches the Fixnum class and changes how Fixnum arithmetic works. In particular, it changes both the results and the types of several core methods. E.g.:
r = 4/3 # => 1
r.class # => Fixnum
require 'mathn'
r = 4/3 # => (4/3)
r.class # => Rational
The problem is of course that in Rubinius, the entire Ruby compiler, the entire Ruby kernel, large parts of the Ruby core library, some parts of the Rubinius VM and other parts of the Rubinius infrastructure, are all written in Ruby. And of course, all of those use Fixnum arithmetic all over the place.
The Hash class is written in Ruby and it uses Fixnum arithmetic to compute the size of the hash buckets, compute the hash function and so on. Array is written in Ruby and needs to compute element sizes and array lengths. The FFI library is written in Ruby and needs to compute memory addresses(!) and structure sizes. Many parts of Rubinius assume that they can do some Fixnum arithmetic and then pass the result to some C function as a pointer or int.
And since Ruby doesn't support any kind of selector namespacing or class boxing or similar (although something like that is planned for Ruby 2.0), as soon as some random user code requires the mathn library, all of those pieces just spectacularly explode, because all of a sudden, the result of a Fixnum operation is no longer a Fixnum (which is basically identical to a machine int and can be passed around as such), but a Rational (which is a full-fledged Ruby object).
Basically, what would happen, is that some code would require 'mathn' (or you would type that into IRb), and immediately the VM would just die.
The solution, in this case, was the safe math plugin for the compiler: when the compiler detects that it is compiling the kernel or other core parts of Rubinius, it automatically rewrites calls to Fixnum methods into calls to private immutable copies of those methods. [Note: I think in current versions of Rubinius, the problem is solved in a different way.]
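As an aside on the namespacing point above: what eventually shipped in Ruby 2.0 was refinements, which scope a monkeypatch to code that explicitly opts in; a minimal sketch on a modern Ruby:

# The patched division is only visible in scopes that say `using`,
# so code like the Rubinius kernel would never see it.
module RationalDivision
  refine Integer do
    def /(other)
      Rational(self, other)
    end
  end
end

p 4 / 3   # => 1, global behaviour untouched

using RationalDivision
p 4 / 3   # => (4/3), only after opting in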
The Trifecta of FAIL; or, how to patch Rails 2.0 for Ruby 1.8.7 has an example of Rails (which is a large, well-scrutinized project) causing problems because they monkeypatched String to add the method chars.
One obvious pitfall would be name collisions - if two or more packages choose the same name for a method that behaves differently.
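A tiny hypothetical example of such a collision (both gems and the method are made up):

# gem A thinks String#squish collapses runs of whitespace...
class String
  def squish
    gsub(/\s+/, " ").strip
  end
end

# ...gem B, loaded later, thinks it removes whitespace entirely.
class String
  def squish
    gsub(/\s+/, "")
  end
end

"a  b".squish # => "ab" -- whichever definition was loaded last silently wins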
Why does Ruby expose symbols for explicit use? Isn't that the sort of optimisation that's usually handled by the interpreter/compiler?
Part of the issue is that Ruby strings are mutable. Since every string Ruby allocates must be independent (it can't cache short/common ones), it's convenient to have a Symbol type to let the programmer have what are essentially immutable, memory-efficient strings.
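A quick illustration (on MRI, without frozen string literals enabled):

# Every string literal allocates a fresh, mutable object;
# a symbol literal always denotes the one interned object.
"foo".object_id == "foo".object_id # => false, two heap objects
:foo.object_id  == :foo.object_id  # => true, one interned symbol

s = "foo"
s << "bar"   # strings can be mutated in place; symbols cannot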
Also, they share many characteristics with enums, but with less pain for the programmer.
Ruby symbols are used in lieu of string constants in other similar languages. Besides the performance benefit, they can be used to semantically distinguish between string data and a more abstract symbol. Being syntactically different, they can clearly be distinguished in code.
Have a look at the Ruby symbols post.