Match Unicode text with Ruby 1.8.7

I have a regex that I use to match Unicode strings, and it works well on all versions of Ruby newer than 1.8.7:
/[\p{L}\p{Space}]+/u
How can this be achieved with Ruby 1.8.7?

Unicode properties such as \p{L} were added to Ruby in version 1.9, so in older versions you have to use POSIX character classes such as [:space:] or [:alpha:], written inside a bracket expression (for example [[:alpha:][:space:]]).
See POSIX Bracket Expressions for more details.
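A minimal sketch of the question's regex rewritten with POSIX classes instead of Unicode properties. LETTERS_AND_SPACES and letters_and_spaces are illustrative names, not part of any library. On Ruby 1.9 and later these classes also match non-ASCII letters in UTF-8 strings; how much of Unicode they cover under 1.8.7's older engine is an assumption you should check against your own data.

```ruby
# POSIX bracket expressions: [:alpha:] for letters, [:space:] for whitespace.
# They only work inside a bracket expression, hence the double brackets.
LETTERS_AND_SPACES = /[[:alpha:][:space:]]+/u

# Returns the first run of letters and whitespace in +str+, or nil if none.
def letters_and_spaces(str)
  str[LETTERS_AND_SPACES]
end

letters_and_spaces("hello world 123") # => "hello world "
```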

Related

Ruby: magic comments "frozen_string_literal: true" vs "immutable: string"

In Ruby, one can freeze all string literals in a file with either of two magic comments at the beginning of the file:
# frozen_string_literal: true
and
# -*- immutable: string -*-
I have no idea what the differences are.
Are there any?
The first syntax is the magic comment, introduced in Ruby 2.3, that freezes every string literal in the file; in older versions you have to call String#freeze on each literal instead, like this:
'hello world!'.freeze
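A small sketch of what freezing buys you; GREETING and frozen_mutation_error? are illustrative names. With # frozen_string_literal: true at the top of a file (Ruby 2.3+), every plain literal behaves like GREETING below without the explicit .freeze.

```ruby
# Freeze the literal explicitly, as you must before Ruby 2.3.
GREETING = 'hello world!'.freeze

# Returns true if appending to +str+ fails because the string is frozen.
def frozen_mutation_error?(str)
  str << '!'
  false
rescue RuntimeError
  # FrozenError (Ruby 2.5+) is a subclass of RuntimeError;
  # Ruby 1.9 through 2.4 raise a plain RuntimeError here.
  true
end

frozen_mutation_error?(GREETING)            # => true
frozen_mutation_error?('hello world!'.dup)  # => false (dup is unfrozen)
```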
The second syntax is not implemented in Ruby; it is, however, the way variables are specified for files in the Emacs text editor.
For example, the following comment would tell Emacs that the file is a Ruby file that needs Ruby syntax highlighting, and that the file-local variable immutable is set to the value string.
# -*- mode: ruby; immutable: string -*-
After searching around, it looks like that variable does nothing and is not used by any Ruby syntax highlighting mode.
So you do not need the second syntax.
Digging for anything on the second version, it looks like both syntaxes had the same intention, but the second magic comment syntax does not appear to have been adopted as of Ruby 2.1.0.
See https://github.com/ruby/ruby/pull/487
The first version, # frozen_string_literal: true, was adopted in Ruby 2.3.0.
I tried the latter version in a few versions of Ruby, but it didn't work. My guess is that it should not be used or trusted to work in any version >= 2.3, and probably no version supports it. In fact, I was not able to find any reference to that syntax when searching Ruby's open source code on GitHub:
https://github.com/ruby/ruby/search?q=immutable%3A+string&unscoped_q=immutable%3A+string

Are Ruby regular expressions Perl or POSIX-compatible?

Are Ruby regular expressions PCREs and/or POSIX-compatible basic/extended regexes?
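No. They are Ruby's own Regexp flavor: Ruby 1.9 uses the Oniguruma engine, and Ruby 2.0 and later use its fork Onigmo. The syntax borrows heavily from Perl, but it is neither PCRE nor POSIX basic/extended regular expressions.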
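Two quick illustrations of standard Ruby behavior that differs from PCRE's defaults: ^ always anchors at line starts, and /m makes . match newlines (what PCRE calls /s). The sample strings are made up for the example.

```ruby
# In Ruby, ^ and $ always match at line boundaries (PCRE needs /m for that).
text = "first line\nsecond line"
text =~ /^second/   # => 11, the start of the second line
text =~ /\Asecond/  # => nil, \A anchors to the start of the string only

# Ruby's /m flag lets . match a newline (PCRE's /s flag).
"a\nb" =~ /a.b/     # => nil
"a\nb" =~ /a.b/m    # => 0
```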

Regex error in Ruby 1.8.7 but not 2.0?

In Ruby 1.8.7, the following regex produces the warning "nested repeat operator + and * was replaced with '*'":
^(\w+\.\w+)\|(\w+\.\w+)\n+*$
Why does it work in Ruby 2.0, though?
http://rubular.com/r/nRUSP5LNZA
The nested operator works, but Ruby 1.8.7 warns about it because it is useless. \n+* means:
Zero or more repetitions of
  one or more repetitions of
    \n
which is equivalent to the simpler expression \n*, which means:
Zero or more repetitions of
  \n
There is no reason to use \n+*. Ruby's regex engine was replaced in Ruby 1.9 and again in Ruby 2.0, and if there is any difference, it is simply that the newer engines do not emit this warning the way the older one did.
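A sketch of the question's pattern with \n+* simplified to \n*, which should behave the same without the warning. PAIR_LINE, parse_pair, and the sample lines are hypothetical.

```ruby
# The question's regex with the redundant \n+* replaced by \n*.
PAIR_LINE = /^(\w+\.\w+)\|(\w+\.\w+)\n*$/

# Returns the two captured "name.ext" parts, or nil if the line does not match.
def parse_pair(line)
  m = PAIR_LINE.match(line)
  m && [m[1], m[2]]
end

parse_pair("foo.bar|baz.qux\n") # => ["foo.bar", "baz.qux"]
parse_pair("foo|bar")           # => nil (no dot in either part)
```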

Iconv and Kconv on Ruby (1.9.2)

I know that Iconv is used to convert strings' encoding.
From my understanding, Kconv is for the same purpose (am I wrong?).
My question is: what is the difference between them, and which should I use for encoding conversions?
By the way, I found some information that Iconv will be deprecated as of version 1.9.3.
As https://stackoverflow.com/users/23649/jtbandes says, it looks like Kconv is like Iconv but specialized for Kanji ("the logographic Chinese characters that are used in the modern Japanese writing system along with hiragana" http://en.wikipedia.org/wiki/Kanji). Unless you are working on something specifically Japanese, I'm guessing you don't need Kconv.
If you're using Ruby 1.9, you can use the built-in encoding support most of the time instead of Iconv. I tried for hours to understand what I was doing until I read this:
http://www.joelonsoftware.com/articles/Unicode.html
Then you can start to use stuff like
String#encode # Ruby 1.9
String#encode! # Ruby 1.9
String#force_encoding # Ruby 1.9
with confidence. If you have more complex needs, do read http://blog.grayproductions.net/categories/character_encodings
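A minimal sketch of the built-in Ruby 1.9+ encoding methods listed above, assuming you hold bytes that are really ISO-8859-1 (Latin-1). The sample bytes are made up for the example.

```ruby
# "caf" followed by the single Latin-1 byte 0xE9 (e-acute).
# dup first, because string literals may be frozen.
latin1 = "caf\xE9".dup.force_encoding('ISO-8859-1')

# Transcode to UTF-8: the byte 0xE9 becomes the two bytes C3 A9.
utf8 = latin1.encode('UTF-8')
utf8.bytes.to_a # => [99, 97, 102, 195, 169]

# Characters with no equivalent in the target encoding can be replaced
# instead of raising Encoding::UndefinedConversionError.
utf8.encode('US-ASCII', :undef => :replace, :replace => '?') # => "caf?"
```

Note that this replaces characters rather than transliterating them, which is the gap the Iconv example below fills.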
UPDATE: thanks to JohnZ in the comments.
Iconv is still useful in Ruby 1.9 because it can transliterate characters (something that String#encode et al. can't do). Here's an example of how to extend String with a function that transliterates to UTF-8:
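require 'iconv' # in the stdlib for Ruby 1.8/1.9; on Ruby 2.0+ install the iconv gem

class ::String
  # Return a new String that has been transliterated into UTF-8.
  # Should work in Ruby 1.8 and Ruby 1.9, thanks to http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/
  def as_utf8(from_encoding = 'UTF-8')
    # Append a space so a bad byte sequence at the very end is handled (per the post above), then strip it off.
    ::Iconv.conv('UTF-8//TRANSLIT', from_encoding, self + ' ')[0..-2]
  end
end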
"foo".as_utf8 #=> "foo"
"foo".as_utf8('ISO-8859-1') #=> "foo"
Thanks JohnZ!

Shellwords.shellescape implementation for Ruby 1.8

While the build of 1.8.7 I have seems to have a backported version of Shellwords::shellescape, I know that method is a 1.9 feature and definitely isn't supported in earlier versions of 1.8. Does anyone know where I can find, either in Gem form or just as a snippet, a robust standalone implementation of Bourne-shell command escaping for Ruby?
You might as well just copy what you want from shellwords.rb in the trunk of Ruby's subversion repository (which is GPLv2'd):
def shellescape(str)
  # An empty argument will be skipped, so return empty quotes.
  return "''" if str.empty?

  str = str.dup

  # Process as a single byte sequence because not all shell
  # implementations are multibyte aware.
  str.gsub!(/([^A-Za-z0-9_\-.,:\/#\n])/n, "\\\\\\1")

  # A LF cannot be escaped with a backslash because a backslash + LF
  # combo is regarded as line continuation and simply ignored.
  str.gsub!(/\n/, "'\n'")

  return str
end
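On Ruby 1.9 and later (and 1.8.7 builds that carry the backport), the same escaping ships in the standard library as Shellwords.escape. A reasonable sanity check for any implementation, including a copied one, is to split its output back with Shellwords.split and confirm you get the original argument. The sample argument is made up.

```ruby
require 'shellwords'

# An argument with a quote and spaces, the kind that breaks naive quoting.
arg = "it's a file.txt"
escaped = Shellwords.escape(arg)
puts escaped                        # prints: it\'s\ a\ file.txt

# Round-trip: a Bourne-style split should give back exactly one word.
Shellwords.split(escaped) == [arg]  # => true

# An empty argument becomes '' so the shell still sees one (empty) word.
Shellwords.escape('')               # => "''"
```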
I wound up going with the Escape gem, which has the additional feature of using quotes by default, and only backslash-escaping when necessary.
