Can I change the encoding of a frozen String without copying it?

Can I change the encoding of a frozen String without copying it? - ruby

Can a String and its duplicate share the same underlying memory? Is there copy-on-write in Ruby?
I have a large, frozen String and I want to change its encoding. But I don't want to copy the whole String just to do that. For context, this is to pass values to a Google Protocol Buffer which has the bytes type and only accepts Encoding::ASCII_8BIT.
big_string.freeze
MyProtobuf::SomeMessage.new(
# I would prefer not to have to copy the whole string just to
# change the encoding.
value: big_string.dup.force_encoding(Encoding::ASCII_8BIT)
)

It seems to work just fine for me: (using MRI/YARV 1.9, 2.x, 3.x)
require 'objspace'
big_string = Random.bytes(1_000_000).force_encoding(Encoding::UTF_8)
big_string.encoding #=> #<Encoding:UTF-8>
big_string.bytesize #=> 1000000
ObjectSpace.memsize_of(big_string) #=> 1000041
dup_string = big_string.dup.force_encoding(Encoding::ASCII_8BIT)
dup_string.encoding #=> #<Encoding:ASCII-8BIT>
dup_string.bytesize #=> 1000000
ObjectSpace.memsize_of(dup_string) #=> 40
Those 40 bytes are the size to hold an object (RVALUE) in Ruby.
Note that instead of dup / force_encoding(Encoding::ASCII_8BIT) there's also b which returns a copy in binary encoding right away.
For more in-depth information, here's a blog post from 2012 (Ruby 1.9) about copy-on-write / shared strings in Ruby:
Seeing double: how Ruby shares string values
From the author's book Ruby Under a Microscope: (p. 265)
Internally, both JRuby and MRI use an optimization called copy-on-write for strings and other data. This trick allows two identical string values to share the same data buffer, which saves both memory and time because Ruby avoids making separate copies of the same string data unnecessarily.

Can a String and its duplicate share the same underlying memory? Is there copy-on-write in Ruby?
There is nothing in the Ruby Language Specification that prevents that. There is also nothing in the Ruby Language Specification that enforces that.
In general, the Ruby Language Specification tries to stay silent on all things related to memory management, space complexity, step complexity, or time complexity. This is not exclusive to the Ruby Language Specification, most Language Specifications try to leave the implementors as much leeway as possible. In other words, Language Specifications tend to specify Syntax and Semantics and leave the Pragmatics up to the implementor. (C++ is somewhat of an exception in that it specifies space and time complexity for the algorithms in the standard library.) Even C, which is typically thought of as a language which gives you full control over everything, doesn't actually specify things like memory layouts precisely – for example, due to the definition of the term width in the standard, a uint16_t is actually allowed to occupy more than 16 bits!
Every implementor is free to implement strings however they want, as long as they comply with the semantics defined in the Ruby Language Specification.
If I remember correctly, both Rubinius and TruffleRuby did, at one point, experiment with a String implementation based on Ropes. Chris Seaton, TruffleRuby's lead developer, wrote a paper about that implementation. However, I don't know if they are still using it. (I know TruffleRuby switched to Truffle Strings recently, and I am not sure what their underlying representation is … or whether they are even guaranteeing a specific underlying representation.)
There is problem with the answer "you have to look at the specification", though: unfortunately, unlike many other programming languages, the Ruby Language Specification does not exist as a single document in a single place. Ruby does not have a single formal specification that defines what certain language constructs mean.
There are several resources, the sum of which can be considered kind of a specification for the Ruby programming language.
Some of these resources are:
The ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification – Note that the ISO Ruby Specification was written around 2009–2010 with the specific goal that all existing Ruby implementations at the time would easily be compliant. Since YARV and MacRuby only implement Ruby 1.9+ and MRI only implements Ruby 1.8 and lower and JRuby, XRuby, Ruby.NET, and IronRuby (at the time) only implemented a subset of Ruby 1.8, this means that the ISO Ruby Specification only contains features that are common to both Ruby 1.8 and Ruby 1.9. Also, the ISO Ruby Specification was specifically intended to be minimal and only contain the features that are absolutely required for writing Ruby programs. Because of that, it does for example only specify Strings very broadly (since they have changed significantly between Ruby 1.8 and Ruby 1.9). It obviously also does not specify features which were added after the ISO Ruby Specification was written, such as Ractors or Pattern Matching.
The Ruby Spec Suite aka ruby/spec – Note that the ruby/spec is unfortunately far from complete. However, I quite like it because it is written in Ruby instead of "ISO-standardese", which is much easier to read for a Rubyist, and it doubles as an executable conformance test suite.
The Ruby Programming Language by David Flanagan and Yukihiro 'matz' Matsumoto – This book was written by David Flanagan together with Ruby's creator matz to serve as a Language Reference for Ruby.
Programming Ruby by Dave Thomas, Andy Hunt, and Chad Fowler – This book was the first English book about Ruby and served as the standard introduction and description of Ruby for a long time. This book also first documented the Ruby core library and standard library, and the authors donated that documentation back to the community.
The Ruby Issue Tracking System, specifically, the Feature sub-tracker – However, please note that unfortunately, the community is really, really bad at distinguishing between Tickets about the Ruby Programming Language and Tickets about the YARV Ruby Implementation: they both get intermingled in the tracker.
The Meeting Logs of the Ruby Developer Meetings. (Same problem: Ruby and YARV get intermingled.)
New features are often discussed on the mailing lists, in particular the ruby-core (English) and ruby-dev (Japanese) mailing lists. (Same problem again.)
The Ruby documentation – Again, be aware that this documentation is generated from the source code of YARV and does not distinguish between features of Ruby and features of YARV.
In the past, there were a couple of attempts of formalizing changes to the Ruby Specification, such as the Ruby Change Request (RCR) and Ruby Enhancement Proposal (REP) processes, both of which were unsuccessful.
If all else fails, you need to check the source code of the popular Ruby implementations to see what they actually do. Please note the plural: you have to look at multiple, ideally all, implementations to figure out what the consensus is. Only looking at one implementation cannot possibly tell you whether what you are looking at is an implementation quirk of this particular implementation or is a universally agreed-upon behavior of the Ruby Language.

Related

What does Ruby's execution stack look like?

I found one webpage that describes how Ruby's execution stack looks like. It says that Ruby has seven stacks:
Is this article true?

This article focuses on the way ruby works in versions from 1.7 to 1.8. With introduction of YARV things have changed a lot. To better understand how Ruby works internally I'd recommend Ruby Under a Microscope. There are chapters on how Ruby execution stack works

No, this does not describe how Ruby works. This describes how MRI works. MRI is only one of many implementations of Ruby. The Ruby Programming Language does not specify any particular implementation strategy for memory management. It is perfectly valid to implement Ruby without any stack at all.
There are many implementations of Ruby. The most widely-used one currently is YARV, but there's also MRuby, JRuby, MagLev, Ruby+OMR, TruffleRuby, Rubinius (those last three are the most interesting IMO). MRI isn't even maintained any more. In the past, there were also IronRuby, IronRuby (yes, actually, there were two different implementations with that name), Ruby.NET, tinyrb, XRuby, SmallRuby, BlueRuby, Cardinal, and many others.
AFAIK, none of those works in the way that is described here, only MRI does.

Is Ruby a scripting language or an interpreted language?

I just noticed that in the wikipedia page of Ruby, this language is defined as interpreted language.
I understood that probably there's something missing in my background. I have always known the difference between an interpreted language that doesn't need a compiler and a compiled language (who require to be compiled before the execution of programs), but what characterize a scripting language ?
Is Ruby definable as a scripting language ?
Thank you and forgive me for the black out

Things aren't just black and white. At the very least, they're also big and small, loud and quiet, blue and orange, grey and gray, long and short, right and wrong, etc.
Interpreted/compiled is just one way to categorize languages, and it's completely independent from (among countless other things) whether you call the same language a "scripting language" or not. To top it off, it's also a broken classification:
Interpreted/compiled depends on the language implementation, not on the language (this is not just theory, there are indeed quite a few languages for which both interpreters and compilers exist)
There are language implementations (lots of them, including most Ruby implementations) that are compilers, but "only" compile to bytecode and interpret that bytecode.
There are also implementations that switch between interpreting and compiling to native code (JIT compilers).
You see, reality is a complex beast ;) Ruby is, as mentioned above, frequently compiled. The output of that compilation is then interpreted, at least in some cases - there are also implementations that JIT-compile (Rubinius, and IIRC JRuby compiles to Java bytecode after a while). The reference implementation has been a compiler for a long time, and IIRC still is. So is Ruby interpreted or compiled? Neither term is meaningful unless you define it ;)
But back to the question: "Scripting language" isn't a property of the language either, it depends on how the language is used - namely, whether the language is used for scripting tasks. If you're looking for a definition, the Wikipedia page on "Scripting language" may help (just don't let them confuse you with the notes on implementation details such as that scripts are usually interpreted). There are indeed a few programs that use Ruby for scripting tasks, and there are doubtless numerous free-standing Ruby programs that would likely qualify as scripts (web scraping, system administration, etc).
So yes, I guess one can call Ruby a scripting language. Of course that doesn't mean a ruby on rails web app is just a script.

Yes.
Detailed response:
A scripting language is typically used to control applications that are often not written in this language. For example, shell scripts etc. can call arbitrary console applications.
Ruby is a general purpose dynamic language that is frequently used for scripting.
You can make arbitrary system calls using backtick notation like below.
`<system command>`
There are also many excellent Ruby gems such as Watir and RAutomation for automating web and native GUIs.
For definition of scripting language, see here.

The term 'scripting language' is very broad, and it can include both interpreted and compiled languages. Ruby in particular, might compiled or interpreted depending on what particular implementation we're using - for instance, JRuby gets compiled to bytecode, whereas CRuby (Ruby's reference implementation) is interpreted

NoSQL DB written in Ruby?

Was curious, but are any NoSQL DBMS written in Ruby?
And if not, would it be unwise to create one in Ruby?

Was curious, but are any NoSQL DBMS written in Ruby?
In 2007, Anthony Eden played around with RDDB, a CouchDB-inspired document-oriented database. He still keeps a copy of the code in his GitHub account.
I vaguely remember that at or around the same time, someone else was also playing around with a database in Ruby. I think it was either inspired by or a reaction to RDDB.
Last but not least, there is the PStore library in the stdlib, which – depending on your definition – may or may not count as a database.
And if not, would it be unwise to create one in Ruby?
The biggest problem I see in Ruby are its concurrency primitives. Threads and locks are so 1960s. If you want to support multiple concurrent users, then you obviously need concurrency, although if you want to build an embedded in-process database, then this is much less of a concern.
Other than that, there are some not-so-stellar implementations of Ruby, but that is not a limitation of Ruby but of those particular implementations, and it applies to pretty much every other programming language as well. Rubinius (especially the current development trunk, which adds Ruby 1.9 compatibility and removes the Global Interpreter Lock) and JRuby would both be fine choices.
As an added bonus, Rubinius comes with a built-in actors library for concurrency and JRuby gives you access to e.g. Clojure's concurrency libraries or the Akka actors library.
Performance isn't really much of a concern, I think. Rubinius's Hash class, which is written in 100% pure Ruby, performs comparably to YARV's Hash class, which is written in 100% hand-optimized C. This shows you that Ruby code, at least when it is carefully written, can be just as fast as C, especially since databases tend to be long-running and thus Rubinius's or JRuby's (and in the latter case specifically also the JVM's) dynamic optimizers (which C compilers typically do not have) can really get to work.

Ruby is just too slow for any type of DBMS
c/c++/erlang are generally the best choice.

You generally shouldn't care in what programming language was a DBMS implemented as long it has all the features and is available for use from your application programming language of choice.
So, the real question here is do you need one written in Ruby or available for use in Ruby.
In first case, I doubt you'll find a DBMS natively written in Ruby (any correction of this statement will be appreciated).
In second case, you should be able to find Ruby bindings/wrappers for any decent DBMS relational or not.

Convert Ruby to low level languages?

I have all kind of scripting with Ruby:
rails (symfony)
ruby (php, bash)
rb-appscript (applescript)
Is it possible to replace low level languages with Ruby too?
I write in Ruby and it converts it to java, c++ or c.
Cause People say that when it comes to more performance critical tasks in Ruby, you could extend it with C. But the word extend means that you write C files that you just call in your Ruby code. I wonder, could I instead use Ruby and convert it to C source code which will be compiled to machine code. Then I could "extend" it with C but in Ruby code.
That is what this post is about. Write everything in Ruby but get the performance of C (or Java).
The second advantage is that you don't have to learn other languages.
Just like HipHop for PHP.
Are there implementations for this?

Such a compiler would be an enormous piece of work. Even if it works, it still has to
include the ruby runtime
include the standard library (which wasn't built for performance but for usability)
allow for metaprogramming
do dynamic dispatch
etc.
All of these inflict tremendous runtime penalties, because a C compiler can neither understand nor optimize such abstractions. Ruby and other dynamic languages are not only slower because they are interpreted (or compiled to bytecode which is then interpreted), but also because they are dynamic.
Example
In C++, a method call can be inlined in most cases, because the compiler knows the exact type of this. If a subtype is passed, the method still can't change unless it is virtual, in which case a still very efficient lookup table is used.
In Ruby, classes and methods can change in any way at any time, thus a (relatively expensive) lookup is required every time.
Languages like Ruby, Python or Perl have many features that simply are expensive, and most if not all relevant programs rely heavily on these features (of course, they are extremely useful!), so they cannot be removed or inlined.
Simply put: Dynamic languages are very hard to optimize, simply doing what an interpreter would do and compiling that to machine code doesn't cut it. It's possible to get incredible speed out of dynamic languages, as V8 proves, but you have to throw huge piles of money and offices full of clever programmers at it.

There is https://github.com/seattlerb/ruby_to_c Ruby To C compiler. It actually only takes in a subset of Ruby though. I believe the main missing part is the Meta Programming features

In a recent interview (November 16th, 2012) Yukihiro "Matz" Matsumoto (creator of Ruby) talked about compiling Ruby to C
(...) In University of Tokyo a research student is working on an academic research project that compiles Ruby code to C code before compiling the binary code. The process involves techniques such as type inference, and in optimal scenarios the speed could reach up to 90% of typical hand-written C code. So far there is only a paper published, no open source code yet, but I’m hoping next year everything will be revealed... (from interview)
Just one student is not much, but it might be an interesting project. Probably a long way to go to full support of Ruby.

"Low level" is highly subjective. Many people draw the line differently, so for the sake of this argument, I'm just going to assume you mean compiling Ruby down to an intermediate form which can then be turned into machine code for your particular platform. I.e., compiling ruby to C or LLVM IR, or something of that nature.
The short answer is yes this is possible.
The longer answer goes something like this:
Several languages (Objective-C most notably) exist as a thin layer over other languages. ObjC syntax is really just a loose wrapper around the objc_*() libobjc runtime calls, for all practical purposes.
Knowing this, then what does the compiler do? Well, it basically works as any C compiler would, but also takes the objc-specific stuff, and generates the appropriate C function calls to interact with the objc runtime.
A ruby compiler could be implemented in similar terms.
It should also be noted however, that just by converting one language to a lower level form does not mean that language is instantly going to perform better, though it does not mean it will perform worse either. You really have to ask yourself why you're wanting to do it, and if that is a good reason.

There is also JRuby, if you still consider Java a low level language. Actually, the language itself has little to do here: it is possible to compile to JVM bytecode, which is independent of the language.

Performance doesn't come solely from "low level" compiled languages. Cross-compiling your Ruby program to convoluted, automatically generated C code isn't going to help either. This will likely just confuse things, include long compile times, etc. And there are much better ways.
But you first say "low level languages" and then mention Java. Java is not a low-level language. It's just one step below Ruby in terms of high- or low-level languages. But if you look at how Java works, the JVM, bytecode and just-in-time compilation, you can see how high level languages can be fast(er). Ruby is currently doing something similar. MRI 1.8 was an interpreted language, and had some performance problems. 1.9 is much faster, it's using a bytecode interpreter. I'm not sure if it'll ever happen on MRI, but Ruby is just one step away from JIT on MRI.
I'm not sure about the technologies behind jRuby and IronRuby, but they may already be doing this. However, both have their own advantages and disadvantages. I tend to stick with MRI, it's fast enough and it works just fine.

It is probably feasible to design a compiler that converts Ruby source code to C++. Ruby programs can be compiled to Python using the unholy compiler, so they could be compiled from Python to C++ using the Nuitka compiler.
The unholy compiler was developed more than a decade ago, but it might still work with current versions of Python.
Ruby2Cextension is another compiler that translates a subset of Ruby to C++, though it hasn't been updated since 2008.

If Java people go to Scala, C# go to F#, where do Ruby people go for functional nirvana? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I know a lot of Java people have started looking at Scala since it runs on the JVM, and a lot of people in the Microsoft world are looking at F#, but what does Ruby have as a natural functional successor?
In a pure FP sense Ruby doesn't lack anything, instead it has too much some may say. A functional language forces the programmer to not use global variables and other idioms so much (although it is possible to use globals in functional languages)

There's two very different definitions of what "functional programming" means. You can kind-of do the one in Ruby, but you cannot do the other.
Those two definitions are:
programming with first-class functions and
programming with mathematical functions
You can kind-of program with first-class functions in Ruby. It has support for first-class functions. In fact, it has too much support for them: there is Proc.new, proc, lambda, Method, UnboundMethod, blocks, #to_proc and ->() (and probably some others that I forget).
All of these behave slightly differently, have slightly different syntax, slightly different behavior and slightly different restrictions. For example: the only one of these which is syntactically lightweight enough that you can actually use it densely, is blocks. But blocks have some rather severe restrictions: you can only pass one block to a method, blocks aren't objects (which in an object-oriented language in wich "everything is an object" is a very severe restriction) and at least in Ruby 1.8 there are also some restrictions w.r.t parameters.
Referring to a method is another thing that is fairly awkward. In Python or ECMAScript for example, I can just say baz = foo.bar to refer to the bar method of the foo object. In Ruby, foo.bar is a method call, if I want to refer to the bar method of foo, I have to say baz = foo.method(:bar). And if I now want to call that method, I cannot just say baz(), I have to say baz.call or baz[] or (in Ruby 1.9) baz.().
So, first-class functions in Ruby aren't really first-class. They are much better than second-class, and they are good enough™, but they aren't fully first-class.
But generally, Rubyists do not leave Ruby just for first-class functions. Ruby's support is good enough that any advantages you might gain from better support in another language usually is eaten up by the training effort for the new language or by something else that you are accustomed to that you must now give up. Like, say RubyGems or tight Unix integration or Ruby on Rails or syntax or …
However, the second definition of FP is where Ruby falls flat on its face. If you want to do programming with mathematical functions in Ruby, you are in for a world of pain. You cannot use the absolute majority of Ruby libraries, because most of them are stateful, effectful, encourage mutation or are otherwise impure. You cannot use the standard library for the same reasons. You cannot use the core library. You cannot use any of the core datatypes, because they are all mutable. You could just say "I don't care that they are mutable, I will simply not mutate them and always copy them", but the problem is: someone else still can mutate them. Also, because they are mutable, Ruby cannot optimize the copying and the garbage collector isn't tuned for that kind of workload.
It just doesn't work.
There is also a couple of features that have really nothing to do with functional programming but that most functional languages tend to have, that Ruby is missing. Pattern matching, for example. Laziness also was not that easy to achieve before Enumerators were more aggressively used in Ruby 1.9. And there's still some stuff that works with strict Enumerables or Arrays but not with lazy Enumerators, although there's actually no reason for them to require strictness.
And for this definition of FP, it definitely makes sense to leave Ruby behind.
The two main languages that Rubyists have been flocking to, are Erlang and Clojure. These are both relatively good matches for Ruby, because they are both dynamically typed, have a similar REPL culture as Ruby, and (this is more a Rails thing than a Ruby thing) are also very good on the web. They have still pretty small and welcoming communities, the original language creators are still active in the community, there is a strong focus on doing new, exciting and edgy things, all of which are traits that the Ruby community also has.
The interest in Erlang started, when someone showed the original 1993 introduction video "Erlang: The Movie" at RubyConf 2006. A couple of high-profile Rails projects started using Erlang, for example PowerSet and GitHub. Erlang is also easy to master for Rubyists, because it doesn't take purity quite as far as Haskell or Clean. The inside of an actor is pretty pure, but the act of sending messages itself is of course a side-effect. Another thing that makes Erlang easy to grasp, is that Actors and Objects are actually the same thing, when you follow Alan Kay's definition of object-oriented programming.
Clojure has been a recent addition to the Rubyist's toolbelt. Its popularity is I guess mostly driven by the fact that the Ruby community has finally warmed up to the idea that JVM ≠ Java and embraced JRuby and then they started to look around what other interesting stuff there was on the JVM. And again, Clojure is much more pragmatic than both other functional languages like Haskell and other Lisps like Scheme and much simpler and more modern than CommonLisp, so it is a natural fit for Rubyists.
Another cool thing about Clojure is that because both Clojure and Ruby run on the JVM, you can combine them.
The author of "Programming Clojure" (Stuart Halloway) is a (former?) Rubyist, for example, as is Phil Hagelberg, the author of the Leiningen build tool for Clojure.
However, Rubyists are also looking at both Scala (as one of the more pragmatic statically typed FP languages) and Haskell (as one of the more elegant ones). Then there is projects like Scuby and Hubris which are bridges that let you integrate Ruby with Scala and Haskell, respectively. Twitter's decision to move part of their low-level messaging infrastructure first from MySQL to Ruby, then from Ruby to Scala is also pretty widely known.
F# doesn't seem to play any role at all, possibly due to an irrational fear towards all things Microsoft the Ruby community has. (Which, BTW, seems mostly unfounded, given that the F# team has always made versions available for Mono.)

Java people are using a language on the JVM and want a more functional one compatible with their runtime, so they go to Scala.
C# people are using a language on the CLR and want a more functional one compatible with their runtime, so they go to F#.
Ruby people are using a language that's already pretty functional, and they're using it on a number of underlying runtimes (JRuby, IronRuby, MRI, MacRuby, Rubinius, etc...). I don't think it has a natural functional successor, or even needs one.

Any version of Lisp should be fine.

Ruby it self is a kind of Functional programming language, So I don't see any special dialects for FP using ruby.

In hype level, Haskell.

Assuming Ruby people don't just go to the JVM themselves, I think most would adopt Erlang, being another dynamically typed language.

Ruby isn't as functional as say Lisp, but it is just functional enough that you can do some functional programming in a good fun way. (unlike trying to do functional programming in something like C#)
Also, it actually forces you into functional paradigms in some of its syntax, such as the heavy use of blocks and yield. (which I fell in love with after learning Ruby).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio