Ruby and integer in method name - ruby

I have to name methods based on the index number of some external documentation:
def 51_bic
end
This is wrong, as the syntax highlighting already shows, and the code fails with: trailing `_' in number (SyntaxError).
Using bic_51 works just fine. But why is that? What's the nature of the fact that I can't use integer + underscore + string? My understanding is that everything after a def is just a method name as a string.

Identifiers can have numbers in them, but can't start with a number. This is how it is in most programming languages (that I heard of).
What's the nature of the problem that I can't use integer + underscore + string?
Because if you allow identifiers to start with a number, you must then mandate that they contain a letter after (to differentiate them from numbers). Now, food for thought. Imagine you can start identifiers with numbers. Which of these are method calls, local variables and which are number literals?
0xa0 + 0b10_100 + 3_456
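To make the ambiguity concrete, here is a small sketch of how Ruby reads that expression today, plus what happens if you bypass the parser. The class name DocMethods and the method body are hypothetical, chosen only for illustration; define_method and public_send are standard Ruby.

```ruby
# Underscores act as digit separators inside numeric literals, so every term
# here is parsed as a number, never as an identifier.
sum = 0xa0 + 0b10_100 + 3_456 # 160 + 20 + 3456

# The restriction lives in the parser, not in the method table: a name that
# starts with a digit cannot follow `def`, but it can be registered as a symbol.
# DocMethods is a hypothetical class used only for this illustration.
class DocMethods
  define_method(:"51_bic") { "section 51" }
end

# Such a method cannot be called with normal syntax, only dynamically.
result = DocMethods.new.public_send(:"51_bic")
```

In practice bic_51 is still the better name, since a method that can only be reached through public_send is awkward to use.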

Related

Serialization delimiter in Ruby

I'm writing a serialize method that converts a Tree into a string for storage. I was looking for a delimiter to use in the serialization and wasn't sure what to use.
I can't use , because that might exist as a data value in a node. e.g.
A
/ \
B ,
would serialize to A, B, ,, and break my deserialization method. Can I use non-printable ASCII characters, or should I just guess what character(s) are unlikely to show up as input and use those as my delimiters?
Here's what my serialize method looks like, if you're curious:
def serialize(root)
  if root.nil?
    ""
  else
    root.val + DELIMITER +
      serialize(root.left) + DELIMITER +
      serialize(root.right)
  end
end
There are several common methods I can think of:
Escaping: you define an escape symbol that "escapes" from the "special" interpretation. Think about how \ acts as an escape character in Ruby string literals.
Fixed Fields / Length Encoding: you know in advance where a field begins and ends. (Fixed fields are basically a special-case of length encoding where you can leave out the length because it is always the same.)
Example for escaping:
def serialize(root)
  if root.nil?
    ""
  else
    "#{escape(root.val)},#{serialize(root.left)},#{serialize(root.right)}" # using ,
  end
end
# The block form of gsub keeps backslashes in the replacement from being re-interpreted.
private def escape(str)
  str.gsub('\\') { '\\\\' }.gsub(',') { '\\,' }
end
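For the reading side, a minimal sketch of splitting an escaped line back into fields might look like this. It assumes , as the delimiter and \ as the escape character, as in the example above; split_fields is a hypothetical helper name, and rebuilding the tree from the fields is left out.

```ruby
# Escape the escape character first, then the delimiter.
def escape(str)
  str.gsub('\\') { '\\\\' }.gsub(',') { '\\,' }
end

# Split on unescaped commas and drop the escape characters along the way.
# split_fields is a hypothetical helper, not part of any library.
def split_fields(line)
  fields = [String.new]
  escaped = false
  line.each_char do |ch|
    if escaped
      fields.last << ch      # take the character literally
      escaped = false
    elsif ch == '\\'
      escaped = true         # the next character is data, not syntax
    elsif ch == ','
      fields << String.new   # an unescaped comma starts a new field
    else
      fields.last << ch
    end
  end
  fields
end

values = ["A", "B", ",", 'back\\slash']
line = values.map { |v| escape(v) }.join(",")
fields = split_fields(line)
```

Here fields comes back equal to values, including the value that is itself a comma.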
Example for length encoding:
def serialize(root)
  if root.nil?
    "0,"
  else
    "#{root.val.size},#{root.val}#{serialize(root.left)}#{serialize(root.right)}" # using length encoding
  end
end
Any , you find within size characters belongs to the value. Fixed fields would basically just concatenate the values and assume that they are all the same fixed length.
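A sketch of the matching reader for the length-encoded form could look like the following. Node and deserialize are hypothetical names introduced here, and the sketch assumes every value is a non-empty string, because "0," already stands for a missing node in the serializer above.

```ruby
# Hypothetical node type with the same val/left/right accessors as the question.
Node = Struct.new(:val, :left, :right)

def serialize(root)
  return "0," if root.nil?
  "#{root.val.size},#{root.val}#{serialize(root.left)}#{serialize(root.right)}"
end

# Returns [node, position just past the node's encoding].
def deserialize(str, pos = 0)
  comma = str.index(",", pos)
  size = Integer(str[pos...comma], 10) # the length prefix ends at the first comma
  pos = comma + 1
  return [nil, pos] if size.zero?      # "0," encodes a missing child

  val = str[pos, size]                 # read exactly size characters, commas included
  pos += size
  left, pos = deserialize(str, pos)
  right, pos = deserialize(str, pos)
  [Node.new(val, left, right), pos]
end

tree = Node.new("A", Node.new("B"), Node.new(","))
encoded = serialize(tree)
decoded, _next_pos = deserialize(encoded)
```

Because the reader always knows how many characters to consume, the comma stored in the right child never gets mistaken for a delimiter.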
You might want to look at how existing serialization formats handle it, like OGDL: Ordered Graph Data Language, YAML: YAML Ain't Markup Language, JSON, CSV (Character-Separated Values), XML (eXtensible Markup Language).
If you want to look at binary formats, you can check out Ruby's Marshal format or ASN.1.
Your idea of finding a seldom-used character is good: even if you use escaping, you will need less of it with a less common character. Just imaginee what it would look likee if 'ee' was the eescapee characteer. However, I think using a non-printable character goes too far: unless you specifically want to design a binary format (such as Ruby's Marshal, Python's Pickle, or Java's Serialization), "less debuggability" (i.e. debugging by simply inspecting the output with less) is a nice property to have and one that you should not give up easily.

Explanation of trailing character %

In an ancient PowerBasic file, I found this in the code:
%AppendRec= 1% '^a Write/Append Btrieve record to named file
%PrtBar= 2% '^b Print a Bar Code
My question deals with the numbers after the = sign. I assume the trailing % has a meaning, but I can't figure out what that meaning is.
I know that in QB, % denotes an Integer type but that normally leads the variable as shown at the beginning of the code lines. The trailing % has me confused.
It is used to specify the type of the constant, so you don't have e.g. "1" evaluate to a float and then have to be converted to an int.
This page shows you the defaults that PB uses if you don't explicitly specify the type of constants.
As Charles said, it defines the type as an integer. Prefixing a name with %, like the names on the left-hand side, defines a numeric equate (constant). The linked page is old now, though; their new help is at PowerBasicHelp.
Look for Numeric Equates and Data Types.

Significance of an ampersand in VB6 function name?

I just got a bunch of legacy VB6 (!) code dumped on me and I keep seeing functions declared with an ampersand at the end of the name, for example, Private Declare Function ShellExecute& . . ..
I've been unable to find an answer to the significance of this, nor have I been able to detect any pattern in use or signature of the functions that have been named thusly.
Anyone know if those trailing ampersands mean anything to the compiler, or at least if there's some convention that I'm missing? So far, I'm writing it off as a strange programmer, but I'd like to know for sure if there's any meaning behind it.
It means that the function returns a Long (i.e. 32-bit integer) value.
It is equivalent to
Declare Function ShellExecute(...) As Long
The full list of suffixes is as follows:
Integer %
Long &
Single !
Double #
Currency @
String $
As Philip Sheard has said, it is an identifier type character for a Long. They are still present in .NET; see this MSDN link and this VB6 article.
From the second article:
The rules for forming a valid VB variable name are as follows:
(1) The first character must be a letter A through Z (uppercase or
lowercase letters may be used). Succeeding characters can be letters,
digits, or the underscore (_) character (no spaces or other characters
allowed).
(2) The final character can be a "type-declaration character". Only
some of the variable types can use them, as shown below:
Data Type    Type Declaration Character
String       $
Integer      %
Long         &
Single       !
Double       #
Currency     @
Use of type-declaration
characters in VB is not encouraged; the modern style is to use the
"As" clause in a data declaration statement.

User-defined Literals suffix, with "_digit..."?

A user-defined literal suffix in C++0x should be an identifier that
starts with _ (underscore) (17.6.4.3.5)
should not begin with _ followed by uppercase letter (17.6.4.3.2)
Each name that [...] begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
Is there any reason why such a suffix may not start with _ followed by a digit? For example, _4 or _3musketeers?
Musketeer dartagnan = "d'Artagnan"_3musketeers;
int num = 123123_4; // to be interpreted in base4 system?
string s = "gdDadndJdOhsl2"_64; // base64decoder
The precedent for identifiers of the form _<number> is the function argument placeholder object mechanism in std::placeholders (§20.8.9.1.3), which defines an implementation-defined number of such symbols.
This is a good thing, because it means the user cannot #define any identifier of that form. §17.6.4.3.1/1:
A translation unit that includes a standard library header shall not #define or #undef names declared in any standard library header.
The name of the user-defined literal function is operator "" _123, not simply _123, so there is no direct conflict between your name and the library name, even in the presence of using namespace std::placeholders;.
My 2¢, though, is that you would be better off with an operator "" _baseconv and encoding the base within the literal, "123123_4"_baseconv.
Edit: Looking at Johannes' (deleted) answer, there may be concern that _123 could be used as a macro by the implementation. This is certainly the realm of theory, as the implementation would have little to gain by such preprocessor use. Furthermore, if I'm not mistaken, the reason for hiding these symbols in std::placeholders, not std itself, is that such names are more likely to be used by the user, such as by inclusion of Boost Bind (which does not hide them inside a named namespace).
The tokens are not reserved for use by the implementation globally (17.6.4.3.2), and there is precedent for their use, so they are at least as safe as, say, forward.
"can" vs "may".
can denotes ability where may denotes permission.
Is there a reason why you would not have permission to start a user-defined literal suffix with _ followed by a digit?
Permission implies coding standards or best practices. The examples you provide seem to show that _\d would be fine suffixes if used correctly (to denote numeric base). Unfortunately, your question can't have a well-thought-out answer, as no one has experience with this new language feature yet.
Just to be clear, user-defined literal suffixes can start with _\d.
An underscore followed by a digit is a legal user-defined literal suffix.
The function signature would be:
operator"" _4();
so it couldn't get eaten by a placeholder.
The literal would be a single preprocessor token:
123123_4;
so the _4 would not get clobbered by a placeholder or a preprocessor symbol.
My reading of 17.6.4.3.5 is that suffixes not containing a leading underscore risk collision with the implementation or future library additions. They also collide with existing suffixes: F, L, ULL, etc. One of the rationales for user-defined literals is that a new type (such as decimals, for example) could be defined as a pure library extension, including literals with suffixes d, df, dl.
Then there's the question of style and readability. Personally, I think I would lose sight of the suffix in 1234_3. Maybe, maybe not.
Finally, there was some idea that didn't make it into the standard (but I kind of like) to have _ be a literal separator for numbers like in Ada and Ruby. So you could have 123_456_789 to visually separate thousands for example. Your suffix would break if that ever went through.
I knew I had some papers on this subject:
Digit Separators describes a proposal to use _ as a digit separator in numeric literals.
Ambiguity and Insecurity with User-Defined Literals describes the evolution of ideas about literal suffix naming and namespace reservation, and efforts to deconflict user-defined literals against a future digit separator.
It just doesn't look that good for the _ digit separator.
I had an idea though: how about either a backslash or a backtick for digit separator? It isn't as nice as _ but I don't think there would be any collision as long as the backslash was inside the stream of digits. The backtick has no lexical use currently that I know of.
i = 123\456\789;
j = 0xface\beef;
or
i = 123`456`789;
j = 0xface`beef;
This would leave _123 as a literal suffix.

How to get a Ruby substring of a Unicode string?

I have a field in my Rails model that has max length 255.
I'm importing data into it, and some times the imported data has a length > 255. I'm willing to simply chop it off so that I end up with the largest possible valid string that fits.
I originally tried to do field[0,255] in order to get this, but this will actually chop trailing Unicode right through a character. When I then go to save this into the database, it throws an error telling me I have an invalid character due to the character that's been halved or quartered.
What's the recommended way to chop off Unicode characters to get them to fit in my space, without chopping up individual characters?
Uh. Seems like truncate and friends like to play with chars, but not their little cousins bytes. Here's a quick answer for your problem, but I don't know if there's a more straightforward and elegant answer:
def truncate_bytes(string, size)
  count = 0
  # Keep characters while the running byte total still fits in size.
  string.chars.take_while { |c| (count += c.bytesize) <= size }.join
end
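An alternative sketch, assuming a UTF-8 string and a Ruby new enough to have String#byteslice and String#scrub (2.1 or later), is to cut at the byte limit and then drop whatever partial character is left at the end. The name truncate_to_bytes is hypothetical.

```ruby
# Cut at a byte boundary, then remove the invalid trailing bytes that remain
# if the cut landed in the middle of a multibyte character.
# truncate_to_bytes is a hypothetical helper name.
def truncate_to_bytes(string, max_bytes)
  string.byteslice(0, max_bytes).scrub("")
end

sample = "h\u00e9llo"                    # "é" takes two bytes in UTF-8
short  = truncate_to_bytes(sample, 2)    # the cut would split "é", so it is dropped
longer = truncate_to_bytes(sample, 3)    # "é" fits completely
```

Note that scrub("") removes every invalid byte sequence in the slice, not only a trailing one, so this sketch assumes the input was valid UTF-8 to begin with.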
Give a look at the Chars class of ActiveSupport.
Use the multibyte proxy method (mb_chars) before manipulating the string:
str.mb_chars[0,255]
See http://api.rubyonrails.org/classes/String.html#method-i-mb_chars.
Note that until Rails 2.1 the method was "chars".

Resources