Why and when should a comma be used at the end of a block? - syntax

There are many cases in Rust where a block of code can end with or without a comma.
For example:
enum WithoutComma
{
x,
y
}
or
enum WithComma
{
x,
y,
}
There are also other examples with match, etc. It seems that both variants lead to the same result. The only case I know where adding or removing a comma changes behaviour is the 1-element tuple declaration (which isn't a block):
let just_int = (5);
let tuple = (5,);
Why can one use a comma or not at the end of a block? Why is there such dualism in the language and what are the reasons for it?

As you say, the only time a trailing comma is required is the 1-tuple pattern, type and construction let (x,): (Type,) = (1,). Everywhere else, trailing commas are optional, have no effect, but are allowed for a few reasons:
it makes macros easier: no need to be careful to not insert a comma at the very end of a sequence of items.
it makes diffs nicer when extending a list of things, e.g. adding a variant to
enum Foo {
Bar
}
gives
enum Foo {
Bar,
Baz
}
which changes two lines (i.e. tools like git will display the Bar line as modified, as well as the inserted line), even though only the second line contains anything interesting. If Bar had started out with a trailing comma, then inserting "Baz," after it would change only one line.
They're not required (other than the 1-tuple) because that would be fairly strange (IMO), e.g.
fn foo(x: u16,) -> (u8, u8,) {
(bar(x,), baz(x,),)
}
(I guess it would look less strange for enum/struct declarations, but still, it's nice to be able to omit it.)

Related

Ruby if ... any? ... include? syntax

I need to check if any elements of a large (60,000+ elements) array are present in a long string of text. My current code looks like this:
if $TARGET_PARTLIST.any? { |target_pn| pdf_content_string.include? target_pn }
self.last_match_code = target_pn
self.is_a_match = true
end
I get a syntax error undefined local variable or method target_pn.
Could someone let me know the correct syntax to use for this block of code? Also, if anyone knows of a quicker way to do this, I'm all ears!
In this case, all your syntax is correct, you've just got a logic error. While target_pn is defined (as a parameter) inside the block passed to any?, it is not defined in the block of the if statement because the scope of the any?-block ends with the closing curly brace, and target_pn is not available outside its scope. A correct (and more idiomatic) version of your code would look like this:
self.is_a_match = $TARGET_PARTLIST.any? do |target_pn|
included = pdf_content_string.include? target_pn
self.last_match_code = target_pn if included
included
end
Alternately, as jvillian so kindly suggests, one could turn the string into an array of words, then do an intersection and see if the resulting set is nonempty. Like this:
self.is_a_match = !($TARGET_PARTLIST &
pdf_content_string.gsub(/[^A-Za-z ]/,"")
.split).empty?
Unfortunately, this approach loses self.last_match_code. As a note, pointed out by Sergio, if you're dealing with non-English languages, the above regex will have to be changed.
Hope that helps!
You should use Enumerable#find rather than Enumerable#any?.
found = $TARGET_PARTLIST.find { |target_pn| pdf_content_string.include? target_pn }
if found
self.last_match_code = found
self.is_a_match = true
end
Note this does not ensure that the string contains a word that is an element of $TARGET_PARTLIST. For example, if $TARGET_PARTLIST contains the word "able", that string will be found in the string, "Are you comfortable?". If you only want to match words, you could do the following.
found = $TARGET_PARTLIST.find { |target_pn| pdf_content_string[/\b#{target_pn}\b/] }
Note this uses the method String#[].
\b is a word break in the regular expression, meaning that the first (last) character of the match cannot be preceded (followed) by a word character (a letter, digit or underscore).
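A minimal sketch of the difference between the substring check and the word-boundary check. The helper names and the sample part numbers are made up for illustration. It also assumes part numbers may contain regex metacharacters, which is why each one is passed through Regexp.escape before interpolation.

```ruby
# Hypothetical helpers; the part numbers below are invented for the example.
def find_substring(part_list, text)
  # Plain substring search: "able" is found inside "comfortable".
  part_list.find { |pn| text.include?(pn) }
end

def find_whole_word(part_list, text)
  # \b on both sides requires the match to sit on word boundaries;
  # Regexp.escape keeps characters such as "+" literal.
  part_list.find { |pn| text.match?(/\b#{Regexp.escape(pn)}\b/) }
end

parts = ["able", "A+B"]
text  = "Are you comfortable with A+B?"

p find_substring(parts, text)   # => "able"
p find_whole_word(parts, text)  # => "A+B"
```

Without the escape, "A+B" would be read as "one or more A followed by B" and would not match the literal text.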
If speed is important it may be faster to use the following.
found = $TARGET_PARTLIST.find { |target_pn|
pdf_content_string.include?(target_pn) && pdf_content_string[/\b#{target_pn}\b/] }
A probably more performant way would be to move all this into native code by letting Regexp search for it.
# needed only once
TARGET_PARTLIST_RE = Regexp.new("\\b(?:#{$TARGET_PARTLIST.sort.map { |pl| Regexp.escape(pl) }.join('|')})\\b")
# to check
self.last_match_code = pdf_content_string[TARGET_PARTLIST_RE]
self.is_a_match = !self.last_match_code.nil?
A much more performant way would be to build a prefix tree and create the regexp using the prefix tree (this optimises the regexp lookup), but this is a bit more work :)
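The prefix-tree idea can be sketched roughly as follows. This is one possible construction, not the only one: the method names are hypothetical, the trie is a plain nested Hash, and the word list is assumed to be non-empty. Shared prefixes are emitted once, so the regexp engine has fewer alternatives to try at each position.

```ruby
# Build a nested-hash prefix tree; the :end key marks a complete word.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Turn a trie node into regexp source matching any suffix stored below it.
def trie_to_source(node)
  branches = node.reject { |key, _| key == :end }.map do |ch, child|
    Regexp.escape(ch) + trie_to_source(child)
  end
  return "" if branches.empty?

  body = branches.size == 1 ? branches.first : "(?:#{branches.join('|')})"
  # If a word also ends at this node, everything below it is optional.
  node[:end] ? "(?:#{body})?" : body
end

# Wrap the generated source in word boundaries, as in the answer above.
def part_list_regexp(words)
  Regexp.new("\\b#{trie_to_source(build_trie(words))}\\b")
end

re = part_list_regexp(%w[able ab abc x1])
p "part abc here"[re]         # => "abc"
p "Are you comfortable?"[re]  # => nil
```

As with the flat alternation, the regexp only needs to be built once and can then be reused for every document.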

Why is there a comma in this Golang struct creation?

I have a struct:
type nameSorter struct {
names []Name
by func(s1, s2 *Name) bool
}
Which is used in this method. What is going on with that comma? If I remove it there is a syntax error.
func (by By) Sort(names []Name) {
sorter := &nameSorter{
names: names,
by: by, //why does there have to be a comma here?
}
sort.Sort(sorter)
}
Also, the code below works perfectly fine and seems to be more clear.
func (by By) Sort(names []Name) {
sorter := &nameSorter{names, by}
sort.Sort(sorter)
}
For more context this code is part of a series of declarations for sorting of a custom type that looks like this:
By(lastNameSort).Sort(Names)
This is how Go works; Go is strict about things like commas and parentheses.
The good thing about this rule is that adding or deleting a line does not affect the other lines. If the last comma could be omitted, you would have to add it back whenever you wanted to append a field after it.
See this post: https://dave.cheney.net/2014/10/04/that-trailing-comma.
From https://golang.org/doc/effective_go.html#semicolons:
the lexer uses a simple rule to insert semicolons automatically as it scans, so the input text is mostly free of them
In other words, the programmer is unburdened from using semicolons, but Go still uses them under the hood, prior to compilation.
A semicolon is inserted after a token when:
the last token before a newline is an identifier (which includes words like int and float64), a basic literal such as a number or string constant, or one of the tokens break continue fallthrough return ++ -- ) }
Thus, without a comma, the lexer would insert a semicolon and cause a syntax error:
&nameSorter{
names: names,
by: by; // semicolon inserted after identifier, syntax error due to unclosed braces
}

Why do programmers put spaces inside braces?

In my experience, it's common to see spaces put inside braces for one-line definitions, e.g. this function in JavaScript:
function(a, b) { return a * b; }
Is there any technical/historical reason that most programmers seem to do this, particularly given that spaces are not included inside parentheses?
Besides readability, in some languages, such as Verilog, identifiers can be escaped (by a \ at their beginning) so that they use special characters in their names. For example, the following names are legal identifier names in Verilog:
q
\q~ //escaped version which uses ~ in the name
\element[32] //a single variable (not part of an array) whose name is \element[32]
Such identifiers must always be terminated by a space; otherwise the character after them is treated as part of the identifier's name:
{ d, \q~ } // Concatenating d and \q~ in a vector
{ d, \q~} // Concatenating d and \q~} in a vector. Will generate a missing brace error.
Spaces are mostly used for readability. Most of the coding styles will tell you to judiciously use whitespace in your code to make it more readable.
As an example, look at this statement: *a = *b + *c;. Think how it will seem without the whitespace.

ruby extract string between two string

I am having a string as below:
str1='"{\"#Network\":{\"command\":\"Connect\",\"data\":
{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'
I wanted to extract the somename string from the above string. Values of xx:xx:xx:xx:xx:xx, somename and 123456789 can change but the syntax will remain same as above.
I saw similar posts on this site but don't know how to use regex in the above case.
Any ideas on how to extract the above string?
Parse the string to JSON and get the values that way.
require 'json'
str = "{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
json = JSON.parse(str.strip)
name = json["#Network"]["data"]["Name"]
pwd = json["#Network"]["data"]["Pwd"]
Since you don't know regex, let's leave them out for now and try manual parsing which is a bit easier to understand.
Your original input, without the outer apostrophes and name of variable is:
"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
You say that you need to get the 'somename' value and that the 'grammar will not change'. Cool!.
First, look at what delimits that value: it has quotes, then there's a colon to the left and comma to the right. However, looking at other parts, such layout is also used near the command and near the pwd. So, colon-quote-data-quote-comma is not enough. Looking further to the sides, there's a \"Name\". It never occurs anywhere in the input data except this place. This is just great! That means, that we can quickly find the whereabouts of the data just by searching for the \"Name\" text:
inputdata = .....
estposition = inputdata.index('\"Name\"')
raise "well-known marker was not found in the input" unless estposition
now, we know:
where the part starts
and that after the "Name" text there's always a colon, a quote, and then the-interesting-data
and that there's always a quote after the interesting-data
let's find all of them:
colonquote = inputdata.index(':\"', estposition)
datastart = colonquote+3
lastquote = inputdata.index('\"', datastart)
dataend = lastquote-1
The index method returns the start position of the match, so it returns the position of the : and the position of the \. Since we want the text between them, we must add/subtract a few positions to move past the :\" at the beginning or move back from the \" at the end.
Then, fetch the data from between them:
value = inputdata[datastart..dataend]
And that's it.
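Put together, the manual scan above might look like the sketch below. The method name extract_name is made up, and the sample input is assumed to be the single-quoted string from the question, i.e. one that contains literal backslashes.

```ruby
# Hypothetical wrapper around the index-based steps described above.
def extract_name(inputdata)
  # In single quotes, \" stays a literal backslash followed by a quote.
  estposition = inputdata.index('\"Name\"')
  raise ArgumentError, "well-known marker was not found in the input" unless estposition

  colonquote = inputdata.index(':\"', estposition)  # position of the colon
  datastart  = colonquote + 3                       # skip the three characters :\"
  lastquote  = inputdata.index('\"', datastart)     # position of the closing backslash
  inputdata[datastart..lastquote - 1]
end

sample = '"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'
puts extract_name(sample)  # prints somename
```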
Now, step back and look at the input data once again. You say that grammar is always the same. The various bits are obviously separated by colons and commas. Let's try using it directly:
parts = inputdata.split(/[:,]/)
=> ["\"{\\\"#Network\\\"",
"{\\\"command\\\"",
"\\\"Connect\\\"",
"\\\"data\\\"",
"\n{\\\"Id\\\"",
"\\\"xx",
"xx",
"xx",
"xx",
"xx",
"xx\\\"",
"\\\"Name\\\"",
"\\\"somename\\\"",
"\\\"Pwd\\\"",
"\\\"123456789\\\"}}}\\0\""]
Please ignore the regex for now. Just assume it says a colon or comma. Now, in parts you will get all the, well, parts, that were detected by cutting the inputdata to pieces at every colon or comma.
If the layout never changes and is always the same, then your interesting-data will always be the 13th element:
almostvalue = parts[12]
=> "\\\"somename\\\""
Now, just strip the spurious characters. Since the grammar is constant, there's 2 chars to be cut from both sides:
value = almostvalue[2..-3]
Ok, another way. Since regex already showed up, let's try with them. We know:
data is prefixed with \"Name\" then colon and backslash-quote
data consists of some text without quotes inside (well, at least I guess so)
data ends with a backslash-quote
the parts in regex syntax would be, respectively:
\"Name\":\"
[^\"]*
\"
together:
inputdata =~ /\\"Name\\":\\"([^\"]*)\\"/
value = $1
Note that I surrounded the interesting part with (), hence after a successful match that part is available in the $1 special variable.
Yet another way:
If you look at the grammar carefully, it really resembles a set of embedded hashes:
\"
{ \"#Network\" :
{ \"command\" : \"Connect\",
\"data\" :
{ \"Id\" : \"xx:xx:xx:xx:xx:xx\",
\"Name\" : \"somename\",
\"Pwd\" : \"123456789\"
}
}
}
\0\"
If we'd write something similar as Ruby hashes:
{ "#Network" =>
{ "command" => "Connect",
"data" =>
{ "Id" => "xx:xx:xx:xx:xx:xx",
"Name" => "somename",
"Pwd" => "123456789"
}
}
}
What's the difference? The colon was replaced with =>, and the backslashes-before-quotes are gone. Oh, and also the opening/closing quote is gone and that \0 at the end is gone too. Let's play:
tmp = inputdata[1..-4] # remove the opening " and the closing \0" (keep the first {)
tmp.gsub!('\"', '"') # replace every \" with just "
Now, what about colons.. We cannot just replace : with =>, because it would damage the internal colons of the xx:xx:xx:xx:xx:xx part.. But, look: all the other colons have always a quote before them!
tmp.gsub!('":', '"=>') # replace every quote-colon with quote-arrow
Now our tmp is:
{"#Network"=>{"command"=>"Connect","data"=>{"Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789"}}}
formatted a little:
{ "#Network"=>
{ "command"=>"Connect",
"data"=>
{ "Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789" }
}
}
So, it looks just like a Ruby hash. Let's try 'destringizing' it:
packeddata = eval(tmp)
value = packeddata['#Network']['data']['Name']
Done.
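Since eval executes whatever the string contains, a more cautious variant of the same idea is to stop after the unescaping step and hand the result to a JSON parser. A minimal sketch, assuming the input is the single-quoted string from the question (literal backslashes, a leading quote, and a trailing \0 plus quote); the method name is hypothetical.

```ruby
require 'json'

# Hypothetical helper: unwrap and unescape the text, then parse it as JSON.
def name_via_json(inputdata)
  tmp = inputdata[1..-4]     # drop the leading " and the trailing \0"
  tmp = tmp.gsub('\"', '"')  # turn every \" into a plain "
  JSON.parse(tmp)["#Network"]["data"]["Name"]
end

sample = '"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'
puts name_via_json(sample)  # prints somename
```

No colon-to-arrow replacement is needed here, because the unescaped text is already valid JSON.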
Well, this has grown a bit and Jonas was obviously faster, so I'll leave the JSON part to him since he wrote it already ;) The data looked so much like a Ruby hash because it was obviously formatted as JSON, which is a hash-like structure too. Using the proper format-reading tools is usually the best idea, but mind that the JSON library, when asked to read the data, will read all of the data, and only then can you ask it "what was inside at the key xx/yy/zz", just like I showed you with the read-it-as-a-Hash attempt. Sometimes, when your program is very short on time, you cannot afford to read it all. Then scanning with a regex or scanning manually for "known markers" may (not must) be much faster and thus preferable, though still much less convenient. Have fun.

How to write a Ruby switch statement (case...when) with regex and backreferences?

I know that I can write a Ruby case statement to check a match against a regular expression.
However, I'd like to use the match data in my return statement. Something like this semi-pseudocode:
foo = "10/10/2011"
case foo
when /^([0-9][0-9])/
print "the month is #{match[1]}"
else
print "something else"
end
How can I achieve that?
Thanks!
Just a note: I understand that I wouldn't ever use a switch statement for a simple case as above, but that is only one example. In reality, what I am trying to achieve is the matching of many potential regular expressions for a date that can be written in various ways, and then parsing it with Ruby's Date class accordingly.
The references to the latest regex matching groups are always stored in pseudo variables $1 to $9:
case foo
when /^([0-9][0-9])/
print "the month is #{$1}"
else
print "something else"
end
You can also use the $~ pseudo variable (available under the name $LAST_MATCH_INFO once you require 'English') to get at the whole MatchData object. This can be useful when using named captures:
require 'English' # provides $LAST_MATCH_INFO as an alias for $~
case foo
when /^(?<number>[0-9][0-9])/
print "the month is #{$LAST_MATCH_INFO['number']}"
else
print "something else"
end
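For the date use case mentioned in the question, a case statement with named captures could be sketched like this. The two formats and the method name parse_date are assumptions chosen for illustration, and the slash form is read as month/day/year to match the example above.

```ruby
require 'date'

# Hypothetical parser: tries each pattern in turn and builds a Date
# from the named captures of whichever one matched.
def parse_date(text)
  case text
  when %r{\A(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\z},
       /\A(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})\z/
    m = $~  # MatchData of the pattern that just matched
    Date.new(m[:year].to_i, m[:month].to_i, m[:day].to_i)
  else
    nil     # no known format
  end
end

puts parse_date("10/10/2011")  # prints 2011-10-10
puts parse_date("2011-10-10")  # prints 2011-10-10
```

Date.new raises an error for impossible dates such as month 13, so a caller may want to rescue that separately.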
Here's an alternative approach that gets you the same result but doesn't use a switch. If you put your regular expressions in an array, you could do something like this:
res = [ /pat1/, /pat2/, ... ]
m = nil
res.find { |re| m = foo.match(re) }
# Do what you will with `m` now.
Declaring m outside the block keeps it available after find is done with the block. find stops as soon as the block returns a true value, so you get the same short-circuiting behavior that a switch gives you. This approach gives you the full MatchData if you need it (perhaps you want to use named capture groups in your regexes) and nicely separates your regexes from your search logic (which may or may not yield clearer code). You could even load your regexes from a config file or choose which set of them you wanted at run time.
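A small runnable sketch of the same pattern, wrapped in a method. The name first_match and the two patterns are placeholders for illustration only.

```ruby
# Hypothetical helper: returns the MatchData of the first pattern that
# matches, or nil when none of them does.
def first_match(patterns, text)
  m = nil
  patterns.find { |re| m = text.match(re) }  # stops at the first truthy result
  m
end

patterns = [
  /\A(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})\z/,
  %r{\A(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{4})\z}
]

m = first_match(patterns, "10/10/2011")
puts "the month is #{m[:month]}" if m  # prints the month is 10
```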

Resources