What is the difference between "hello".length and "hello" .length? - ruby

I was surprised when I ran the following examples in the Ruby console. They both produce the same output.
"hello".length
and
"hello" .length
How does the Ruby console remove the space and provide the right output?

You can put spaces wherever you want; the interpreter looks for the end of the line. For example:
Valid
"hello".
length
Invalid
"hello"
.length
The interpreter sees the dot at the end of the line and knows something has to follow it, while in the second case it thinks the line is finished. The same goes for the number of spaces within a line. Does it matter how the interpreter removes the spaces? What matters is that you know the behavior.
If you want, you can even write
"hello" . length
and it will still work.
I know this is not an answer to your question, but does the "how" matter?
EDIT: I was corrected in the comments below. The multi-line examples given above are both valid when run in a script instead of IRB. I had mixed them up with operators, where the following applies even when running a script:
Valid
result = true || false
Valid
result = true ||
false
Invalid
result = true
|| false
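The forms this answer marks as valid in a script can be checked with a small sketch. It assumes that passing a multi-line string to eval is a fair stand-in for running the same lines from a script file, and the variable names are only illustrative. It exercises only the forms listed as valid above.

```ruby
# Each snippet is parsed as a whole, the way a script file would be,
# rather than being typed into IRB one line at a time.
trailing_dot = eval("\"hello\".\nlength")  # dot at the end of the first line
leading_dot  = eval("\"hello\"\n.length")  # dot at the start of the second line
trailing_or  = eval("true ||\nfalse")      # operator at the end of the first line

puts trailing_dot
puts leading_dot
puts trailing_or
```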

This doesn't have as much to do with the console as it has to do with how the language itself is parsed by the compiler.
Most languages are parsed in such a way that items to be parsed are first grouped into TOKENS. Then the compiler is defined to expect a certain SEQUENCE of tokens in order to interpret each programming statement.
Because the compiler is only looking for a TOKEN SEQUENCE, it doesn't matter if there is space in between or not.
In this case the compiler is looking for:
STRING DOT METHOD_NAME
So it won't matter if you write "hello".length, or even "hello" . length. The same sequence of tokens is present in both, and that is all that matters to the compiler.
If you are curious how these token sequences are defined in the Ruby source code, you can look at parse.y starting around line 1042:
https://github.com/ruby/ruby/blob/trunk/parse.y#L1042
This file is written in YACC, a language used to define parsers.
Even without knowing anything about YACC, you should already be able to get some clues on how it works by just looking around the file a bit.
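The token-sequence idea can be observed from Ruby itself with the Ripper library that ships with the standard distribution. This is only a sketch: the helper name token_kinds is made up for illustration, and Ripper's event names (such as :on_period) are its own labels, not the grammar symbols in parse.y or the simplified STRING DOT METHOD_NAME used above.

```ruby
require 'ripper'

# Return the lexer event names for a snippet, ignoring whitespace tokens.
# token_kinds is a hypothetical helper, not part of Ripper.
def token_kinds(source)
  Ripper.lex(source)
        .map { |_position, event, _text, _state| event }
        .reject { |event| event == :on_sp }
end

compact = token_kinds('"hello".length')
spaced  = token_kinds('"hello" . length')

p compact               # the token kinds the lexer produced
puts compact == spaced  # prints true: the spaces do not change the sequence
```

Once the whitespace tokens are dropped, both spellings yield the same list, which is why the parser treats them identically.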

Related

Sphinx issues mysterious error in literal blocks

In Sphinx (the ReStructuredText publishing system), are there any obscure rules that limit what a literal block can contain?
Background: My document contains many literal blocks that follow a double-colon paragraph, like this:
Background:... follow a double-colon paragraph, like this::
$ sudo su
# echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc
This block (with a different preceding paragraph) is one of the ones that issues an error: "WARNING: Inconsistent literal block quoting." The message indicates that the error is in the "echo" line. In the HTML output the literal block contains only the "sudo" line; the "echo" line is treated as ordinary text.
I haven't been able to identify any common property in the lines that report errors, or anything that distinguishes them, as a class, from lines in other literal blocks that don't get errors.
I stripped down the project to isolate the problem, and I identified it that way.
I had a numbered list item that contained a double-colon literal block that was indented only as far as the list item's text, like this:
2. Set up the... directory::
   $ A Linux command
   $ Another Linux command
   $ And ANOTHER Linux command
   $ etc.
When I indented the literal block further, the problem went away.
I was misled by two things:
The message does not point to the first line in the literal block, but to some apparently random line within it. In the case above, it pointed to the fifth line (out of eight) in the block!
In most cases this form of indentation, although incorrect, works just fine.
Isolating the problem is a brute-force method of solving it, but is often effective when deduction fails. I'll keep that in mind in the future.

What does the "%" mean in tcl?

In a situation like this for example:
[% $create_port %]
or [list [% $RTL_LIST %]]
I realized it had to do with the brackets, but what confuses me is that sometimes the brackets contain a % followed by a variable, and sometimes they contain variables without the %.
So I'm not sure what it is used for.
Any help is appreciated.
% is not a metacharacter in the Tcl language core, but it still has a few meanings in Tcl. In particular, it's the modulus operator in expr and a substitution field specifier in format, scan, clock format and clock scan. (It's also the default prompt character, and I have a trivial pass-through % command in my ~/.tclshrc to make cut-n-pasting code easier, but nobody else in the world needs to follow my lead there!)
But the code you have written does not appear to be any of those (because it would be a syntax error in all of the commands I've mentioned). It looks like it is some sort of directive processing scheme (with the special sequences being [% and %], with the brackets) though not one I recognise such as doctools or rivet. Because a program that embeds a Tcl interpreter could do an arbitrary transformation to scripts before executing them, it's extremely difficult to guess what it might really be.

Using phrase_from_file to read a file's lines

I've been trying to parse a file containing lines of integers using phrase_from_file with the grammar rules
line --> I,line,{integer(I)}.
line --> ['\n'].
thusly: phrase_from_file(line,'input.txt').
It fails, and I got lost very quickly trying to trace it.
I've even tried to print I, but it doesn't even get there.
EDIT:
As none of the solutions below really fit my needs (using read/1 assumes you're reading terms, and sometimes writing that DCG might just take too long), I cannibalized this code I googled, the main changes being the addition of:
read_rest(-1,[]):-!.
read_word(C,[],C) :- ( C=32 ;
C=(-1)
) , !.
If you are using phrase_from_file/2 there is a very simple way to test your programs prior to reading actual files. Simply call the very same non-terminal with phrase/2. Thus, a goal
phrase(line,"1\n2").
is the same as calling
phrase_from_file(line,fichier)
when fichier is a file containing the 3 characters above. So you can test and experiment in a very compact manner with phrase/2.
There are further issues that @Jan Burse already mentioned. SWI reads in character codes, so you have to write
newline --> "\n".
for a newline. And then you still have to parse integers yourself. But all of that is much easier to test with phrase/2. The nice thing is that you can then switch to reading files without changing the actual DCG code.
I guess there is a conceptual problem here. Although I don't know the details of phrase_from_file/2, i.e. which Prolog system you are using, I nevertheless assume that it will produce character codes. So for an integer 123 in the file you will get the character codes 0'1, 0'2 and 0'3. This is probably not what you want.
If you would like to process the characters, you would need to use a terminal list instead of a bare variable I to fetch them. And instead of the integer test, you would need a character test, and you can do the test earlier:
line --> [I], {0'0=<I, I=<0'9}, line.
Best Regards
P.S.: Instead of going the DCG way, you could also use term read operations. See also:
read numbers from file in prolog and sorting

ALL CAPS to Normal case

I'm trying to find an elegant solution on how to convert something like this
ALL CAPS TEXT. "WHY ANYONE WOULD USE IT?" THIS IS RIDICULOUS! HELP.
...to regular-case. I could more or less find all sentence-starting characters with:
(?<=^|(\. \"?)|(! ))[A-Z] #this regex sure should be more complex
but (standard) Ruby neither allows lookbehinds nor makes it possible to apply .capitalize to, say, gsub replacements. I wish I could do this:
"mytext".gsub(/my(regex)/, '\1'.capitalize)
but the current working solution would be to
"mytext".split(/\. /).each {|x| p x.capitalize } #but this solution sucks
First of all, notice that what you are trying to do will only be an approximation.
You cannot correctly tell where the sentence boundaries are. You can approximate a boundary as the beginning of the entire string, or the position right after a period, question mark, or exclamation mark followed by spaces. But then you will incorrectly capitalize "economy" in "U.S. economy".
You cannot correctly tell which words should be capitalized. For example, "John" will be "john".
You may want to do some natural language processing to give you a close-to-correct result in many cases, but these methods are only probabilistically correct. You will never get a perfect result.
Understanding these limitations, you might want to do:
mytext.gsub(/.*?(?:[.?!]\s+|\z)/, &:capitalize)
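To see what this approximation does in practice, here is the one-liner applied to the sample text from the question. The variable names text and result are only illustrative.

```ruby
text = 'ALL CAPS TEXT. "WHY ANYONE WOULD USE IT?" THIS IS RIDICULOUS! HELP.'

# Each match is a chunk ending in [.?!] plus whitespace, or running to the
# end of the string; capitalize upcases its first character and downcases the rest.
result = text.gsub(/.*?(?:[.?!]\s+|\z)/, &:capitalize)

puts result
# prints: All caps text. "why anyone would use it?" this is ridiculous! Help.
```

The quoted sentence stays lowercase because its chunk starts with a double quote (which has no uppercase form) and because the question mark is followed by a quote rather than a space, so no boundary is detected there. This is the kind of imperfection described above.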

Ruby -- looking for some sort of "Regexp unescape" method

I have a bunch of strings with special escape codes that I want to store unescaped; e.g., the interpreter shows
"\\014\"\\000\"\\016smoothing\"\\011mean\"\\022color\"\\011zero#\\016"
but I want it to show (when inspected) as
"\014\"\000\"\016smoothing\"\011mean\"\022color\"\011zero#\016"
What's the method to unescape them? I imagine that I could make a regex to remove 1 backslash from every consecutive n backslashes, but I don't have a lot of regex experience and it seems there ought to be a "more elegant" way to do it.
For example, when I puts MyString it displays the output I'd like, but I don't know how I might capture that into a variable.
Thanks!
Edited to add context: I have this class that is being used to marshal / restore some stuff, but when I restore some old strings it spits out a type error which I've determined is because they weren't -- for some inexplicable reason -- stored as base64. They instead appear to have just been escaped, which I don't want, because trying to restore them similarly gives the TypeError
TypeError: incompatible marshal file format (can't be read)
format version 4.8 required; 92.48 given
because Marshal looks at the first characters of the string to determine the format.
require 'base64'
class MarshaledStuff < ActiveRecord::Base
  validates_presence_of :marshaled_obj
  def contents
    obj = self.marshaled_obj
    return Marshal.restore(Base64.decode64(obj))
  end
  def contents=(newcontents)
    self.marshaled_obj = Base64.encode64(Marshal.dump(newcontents))
  end
end
Edit 2: Changed wording -- I was thinking they were "double-escaped" but it was only single-escaped. Whoops!
If your strings give you the correct output when you print them then they are already escaped correctly. The extra backslashes you see are probably because you are displaying them in the interactive interpreter which adds extra backslashes for you when you display variables to make them less ambiguous.
> x
=> "\\"
> puts x
\
=> nil
> x.length
=> 1
Note that even though it looks like x contains two backslashes, the length of the string is one. The extra backslash is added by the interpreter and is not really part of the string.
If you still think there's a problem, please be more specific about how you are displaying the strings that you mentioned in your question.
Edit: In your example the only things that need unescaping are the octal escape codes. You could try this:
x = x.gsub(/\\[0-2][0-7]{2}/){ |c| c[1,3].to_i(8).chr }
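As a rough check of that gsub, the sketch below builds a hypothetical escaped string (original and escaped are made up for illustration and are not the asker's data, and escaping only the bytes below 32 is an assumption about how the strings were stored). It also shows where the "92.48" in the error message comes from: those are the byte values of a backslash and the character "0".

```ruby
# A real Marshal dump; its first two bytes are the format version, 4 and 8.
original = Marshal.dump("smoothing")

# Assumption: control bytes were stored as literal backslash-plus-octal text.
escaped = original.bytes.map { |b| b < 32 ? format('\\%03o', b) : b.chr }.join

p escaped.bytes.first(2)   # the bytes Marshal would read as the version

# The gsub from the answer: turn each \NNN sequence back into a single byte.
unescaped = escaped.gsub(/\\[0-2][0-7]{2}/) { |c| c[1, 3].to_i(8).chr }

restored = Marshal.load(unescaped)
p restored
```

Note that the pattern as written only matches escapes from \000 through \277, which is enough for this sample.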

Resources