Ruby: How to break a potentially unicode string into bytes

Ruby: How to break a potentially unicode string into bytes - ruby

I'm writing a game which is taking user input and rendering it on-screen. The engine I'm using for this is entirely unicode-friendly, so I'd like to keep that if at all possible. The problem is that the rendering loop looks like this:
"string".each_byte do |c|
render_this_letter(c)
end
I don't know a whole lot about i18n, but I know enough to know the above code is only ever going to work for me and people who speak my language. I'd prefer something like:
"unicode string".each_unicode_letter do |u|
render_unicode_letter(u)
end
Does this exist in the core distribution? I'm somewhat averse to adding additional requirements to the install, but if it's the only way to do it, I'll live.
For extra fun, I have no way of knowing if the string is, in fact, a unicode string.
EDIT: The library I'm using can indeed render entire strings, however I'm letting the user edit what comes up on the fly - if they hit 'backspace', essentially, I need to know how many bytes to chop off the end.

Unfortunately ruby 1.8.x has poor unicode support. It's being addressed in 1.9. But in the mean time, libraries like this one (http://snippets.dzone.com/posts/show/4527) are a good solution. Using the linked library, your code would look something like this:
"unicode_string".each_utf8_char do |u|
render_unicode_letter(u)
end

You could try including the ActiveSupport::CoreExtensions::String::Unicode module from the rails codebase.

Related

Where is Ruby code, generated via metaprogramming, stored, and is it viewable/printable?

I've just started learning about metaprogramming in Ruby, and found myself wondering if it was possible to view (in someway) the code that had been generated. I want to, as a coding exercise, write a short method that will generate a Ruby file containing either a few method definitions or, ideally, an entire class or module definition.
I was thinking that perhaps just building up a string representation of the file and then merely writing it out might be a way to accomplish that, but that way doesn't really necessitate the use of metaprogramming, and since my goal is a metaprogramming exercise, I would like to figure out a way to incorporate it into that process or else do it another way.
I guess, if I was to take the string-building approach, I would like to start with something like
klass_string = "class GeneratedClass\n\t<BODY>\nend"
and then somehow save the output of something like this
define_method( :initialize ) do
instance_variable_set("#var", "some_value")
end
in a string that could replace '' in klass_string and then written out to a file. I know I could just put the above code snippet directly into the string, and it would workout fine, but I would like to have the output in a more standard format, as if it'd been written by hand and not generated:
class GeneratedClass
def initialize
#var = 'some_value'
end
end
Could someone point me in the right direction?

I agree with your comment that this question isn't really about metaprogramming so much as dynamic code generation / execution and introspection. Those are interesting topics, but not really metaprogramming. In particular your question about outputting ruby code to strings is about introspection, where as your string injection question is about dynamic code (just to try give you the words to google about what you're interested in).
Since your question is general and really around introspection and dynamic code, I'm going to reference you to some canonical and useful projects that can help you learn more..
ParseTree & Ruby Parser and Sourcify
Ruby Parser is a pure ruby implementation of ParseTree, so I'd recommend starting there to learn how to examine and "stringify" Ruby code. Play around with all of those, and in particular learn how they examine code in Ruby to generate their results. You'll learn a ton about how things work under the hood. Eric Hodel among others is real smart about this stuff.. Be warned though, this is really advanced stuff, but if that's where you want to build expertise, hopefully those references will help!

Javascript, IE, Strings, and Performance problems

So we have this product, and it's really slow in IE.
We've already applied a lot of the practices advised by the IE guys themselves (like this, and this), and try to sacrifice clean code for performance in the critical parts like DOM manipulation.
However, as you can see in this IE profiler screenshot..
Just "String" is the biggest offender. Almost 750ms of exclusive time.
Does this mean IE is spending 750ms just instantiating Strings? I also read this stuff on the Opera dev blog:
A build script can remove whitespace,
comments, replace strings with Array
lookups (to avoid MSIE creating a
string object for every single
instance of a string — even in
conditions)
But no more info regarding this. Anyone can clarify? It seems like IE has to create a full String instance every time you have " " in your code, which could explain this, but I don't know what the array lookup optimization would look like.
BTW- we don't really do much of string concatenation anywhere in the code.
The library we use is MooTools 1.2.4
Any suggestions will be appreciated! Thx
UPDATE- I'm particularly interested in the tip mentioned above about "array lookup optimization". Our library is big (1MB) so it has a lot of strings in it, like any other JS code. But since our library is bigger than most, these Strings are actually causing speed issues.
Also, does anybody know if adding stuff to the String.prototype makes every instance slower?

I'd grab a profiler that will give you a deeper view, you can see exactly what about String is taking so long. For IE specifically there's dynaTrace AJAX Edition (yes, it's free).
I'd fire up your same pages in there, it'll give you a tree breakdown so you can see what's going on...along with a hot spots view of exactly what low-level functions are taking the longest.

Strings are immutable in Javascript. Meaning when you do something like this:
alert("hello" + " world");
three strings are being created:
hello
word
hello world
Finding such instances and fixing those can be helpful. Like Nick said, using a profiler to pin down exactly what specific code with Strings is causing trouble is likely the best way to go.

How to structure a large Ruby application?

I'm considering writing a (large) desktop application in Ruby (actually a game, think something like Angband or Nethack using Gtk+ for the GUI). I'm coming from a C#/.NET background, so I'm a little at a lost for how to structure things.
In C#, I'd create many namespaces, like Application.Core, Application.Gui, etc). Parts of the application that didn't need the Gui wouldn't reference it with a using statement. From what I understand, in Ruby, the require statement basically does a textual insert that avoids duplicated code. What I'm concerned about, through the use of require statements, every file/class will have access everything else, because the ordering of the require statements.
I've read some ruby code that uses modules as namespaces. How does that work and how does it help?
Not sure what I'm getting at here... Does anyone have any good pointers on how to structure a large Ruby application? How about some non-trivial (and non-Rails) apps that use Ruby?

Ruby is no different from any other language when it comes to structuring your code. Do what feels right and it will probably work. I'm not entirely sure what problem you are anticipating. Are you worried about name clashes?
Modules are a good way to get pseudo namespaces, eg.
module Core
class Blah
self.def method
end
end
end
Core::Blah.method

Some of your problem isn't particular to Ruby, it's just about circular dependencies. Does Core depend on Gui or does Gui depend on Core? Or both?
A good way around some of this problem is with a very small "runner" component that depends on the core, the data access components, the gui, and so on, and ties these all together.

Designing a poker parser in Ruby

I'm writing a small program in Ruby to parse a hand history log from a poker site.
The log is split over several lines and looks a bit like this:
Table 123456 NL Hold'em $1/$2
5 Players
Seat 3 is the button
Seat 1: randomGuy112 $152.56
Seat 2: randomGirl99 $200
Seat 3: PokerPro $357.12
Seat 4: FishCake556 $57.19
Seat 6: MooMoo $188.98
Dealt to MooMoo [Ah, Ks]
randomGuy112 folds
randomGirl99 raises to $7
etc.. etc..
I want to summarise this information in an object which then might, for example,
render it differently or save it to database.
When I originally thought of this problem I thought I'd just have one realativly straight forward class with a number of regexes and several if/else statements. I then realised this could turn into quite a large method and potentially be a nightmare to debug/maintain. Keep in mind it needs to loop at each stage of the game (preflop,flop etc) to collect player's actions.
I also want to tackle this with a TDD approach, but the 'one long method' way means that the tests for with checking later input will kind of rely on earlier tests.
I'm quite new to Ruby and havn't yet clicked on the 'Ruby way' to do things. I'm catching myself writing C# code
in a different language.
Can you give me some pointers on how to design the parser so it isn't one huge mess of if/else statements and more testable?

Use Treetop
It does look like you are on the borderline between what ad hoc string matching and RE's are good for, and what requires an actual parser.
There is nothing wrong with handwritten parsers, and as long as you keep your methods short, without a lot of complexity in any given one, it's OK to have as many if statements in total as the parser requires.
I'm not sure 10 lines with incomprehensible regular expressions is any better than 30 lines of nice looking code.
Now, Ruby does have an advanced PEG parser generator. I think in this case I wouldn't worry about whether it was overkill, I would just go ahead and use Treetop.

State Machine, anyone?
At any point in the play of a poker hand there is a clearly-defined set of possible next actions. I'd think you could encapsulate them into a state machine. There are a few around, amongst which (no recommendations, I'm afraid - not enough experience with any) are
Alter Ego (updated July this year)
ruby-state-machine (seems also to be alive)
statemachine (looks a bit stale)

You can checkout this open source poker game hand parser
It looks like they created a hash of regular expressions and then they probably iterate over the regex data structures. It is a more simple machine than a parser and probably a more light weight approach.

I wrote hand history parser for PokerStars log files https://github.com/malikbakt/pokerstars

You may want to look at: StringScanner.

I have two different pointers for you, which will point you to the solution, on how to write code in the ruby way.
Get a ruby book. The ruby book will have a lot of examples on how to write code in the ruby way. From my personal expirience I can recommend you the pixake(is this spelled right?) book: http://www.ruby-doc.org/docs/ProgrammingRuby/html/index.html
Read existing ruby code. You seem to know enough ruby to write code? Then you should certainly be able to read existing code. I assume you already have installed ruby on your system. If so, you will find plenty of sourcecode on your harddrive. If not just use the internet.

I'd recommend the book Refactoring by Martin Fowler (available in both dead-tree and electronic formats, IIRC). He covers object-oriented remedies for exactly the design problems you're asking about, all in a test-driven context. This is one of those books that everyone in the profession should read.

How can I program defensively in Ruby?

Here's a perfect example of the problem: Classifier gem breaks Rails.
** Original question: **
One thing that concerns me as a security professional is that Ruby doesn't have a parallel of Java's package-privacy. That is, this isn't valid Ruby:
public module Foo
public module Bar
# factory method for new Bar implementations
def self.new(...)
SimpleBarImplementation.new(...)
end
def baz
raise NotImplementedError.new('Implementing Classes MUST redefine #baz')
end
end
private class SimpleBarImplementation
include Bar
def baz
...
end
end
end
It'd be really nice to be able to prevent monkey-patching of Foo::BarImpl. That way, people who rely on the library know that nobody has messed with it. Imagine if somebody changed the implementation of MD5 or SHA1 on you! I can call freeze on these classes, but I have to do it on a class-by-class basis, and other scripts might modify them before I finish securing my application if I'm not very careful about load order.
Java provides lots of other tools for defensive programming, many of which are not possible in Ruby. (See Josh Bloch's book for a good list.) Is this really a concern? Should I just stop complaining and use Ruby for lightweight things and not hope for "enterprise-ready" solutions?
(And no, core classes are not frozen by default in Ruby. See below:)
require 'md5'
# => true
MD5.frozen?
# => false

I don't think this is a concern.
Yes, the mythical "somebody" can replace the implementation of MD5 with something insecure. But in order to do that, the mythical somebody must actually be able to get his code into the Ruby process. And if he can do that, then he presumably could also inject his code into a Java process and e.g. rewrite the bytecode for the MD5 operation. Or just intercept the keypresses and not actually bother with fiddling with the cryptography code at all.
One of the typical concerns is: I'm writing this awesome library, which is supposed to be used like so:
require 'awesome'
# Do something awesome.
But what if someone uses it like so:
require 'evil_cracker_lib_from_russian_pr0n_site'
# Overrides crypto functions and sends all data to mafia
require 'awesome'
# Now everything is insecure because awesome lib uses
# cracker lib instead of builtin
And the simple solution is: don't do that! Educate your users that they shouldn't run untrusted code they downloaded from obscure sources in their security critical applications. And if they do, they probably deserve it.
To come back to your Java example: it's true that in Java you can make your crypto code private and final and what not. However, someone can still replace your crypto implementation! In fact, someone actually did: many open-source Java implementations use OpenSSL to implement their cryptographic routines. And, as you probably know, Debian shipped with a broken, insecure version of OpenSSL for years. So, all Java programs running on Debian for the past couple of years actually did run with insecure crypto!

Java provides lots of other tools for defensive programming
Initially I thought you were talking about normal defensive programming,
wherein the idea is to defend the program (or your subset of it, or your single function) from invalid data input.
That's a great thing, and I encourage everyone to go read that article.
However it seems you are actually talking about "defending your code from other programmers."
In my opinion, this is a completely pointless goal, as no matter what you do, a malicious programmer can always run your program under a debugger, or use dll injection or any number of other techniques.
If you are merely seeking to protect your code from incompetent co-workers, this is ridiculous. Educate your co-workers, or get better co-workers.
At any rate, if such things are of great concern to you, ruby is not the programming language for you. Monkeypatching is in there by design, and to disallow it goes against the whole point of the feature.

Check out Immutable by Garry Dolley.
You can prevent redefinition of individual methods.

I guess Ruby has that a feature - valued more over it being a security issue. Ducktyping too.
E.g. I can add my own methods to the Ruby String class rather than extending or wrapping it.

"Educate your co-workers, or get better co-workers" works great for a small software startup, and it works great for the big guns like Google and Amazon. It's ridiculous to think that every lowly developer contracted in for some small medical charts application in a doctor's office in a minor city.
I'm not saying we should build for the lowest common denominator, but we have to be realistic that there are lots of mediocre programmers out there who will pull in any library that gets the job done, paying no attention to security. How could they pay attention to security? Maybe the took an algorithms and data structures class. Maybe they took a compilers class. They almost certainly didn't take an encryption protocols class. They definitely haven't all read Schneier or any of the others out there who practically have to beg and plead with even very good programmers to consider security when building software.
I'm not worried about this:
require 'evil_cracker_lib_from_russian_pr0n_site'
require 'awesome'
I'm worried about awesome requiring foobar and fazbot, and foobar requiring has_gumption, and ... eventually two of these conflict in some obscure way that undoes an important security aspect.
One important security principle is "defense in depth" -- adding these extra layers of security help you from accidentally shooting yourself in the foot. They can't completely prevent it; nothing can. But they help.

If monkey patching is your concen, you can use the Immutable module (or one of similar function).
Immutable

You could take a look at Why the Lucky Stiff's "Sandbox"project, which you can use if you worry about potentially running unsafe code.
http://code.whytheluckystiff.net/sandbox/
An example (online TicTacToe):
http://www.elctech.com/blog/safely-exposing-your-app-to-a-ruby-sandbox

Raganwald has a recent post about this. In the end, he builds the following:
class Module
def anonymous_module(&block)
self.send :include, Module.new(&block)
end
end
class Acronym
anonymous_module do
fu = lambda { 'fu' }
bar = lambda { 'bar' }
define_method :fubar do
fu.call + bar.call
end
end
end
That exposes fubar as a public method on Acronyms, but keeps the internal guts (fu and bar) private and hides helper module from outside view.

If someone monkeypatched an object or a module, then you need to look at 2 cases: He added a new method. If he is the only one adding this meyhod (which is very likely), then no problems arise. If he is not the only one, you need to see if both methods do the same and tell the library developer about this severe problem.
If they change a method, you should start to research why the method was changed. Did they change it due to some edge case behaviour or did they actually fix a bug? especially in the latter case, the monkeypatch is a god thing, because it fixes a bug in many places.
Besides that, you are using a very dynamic language with the assumption that programmers use this freedom in a sane way. The only way to remove this assumption is not to use a dynamic language.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio