Semantics of F# let statement with comma - syntax

I'm learning F#. I started by looking over the F# samples from Microsoft.
I ran across this statement:
let line1,line2 =
use sr = System.IO.File.OpenText #"test.txt"
let line1 = sr.ReadLine()
let line2 = sr.ReadLine()
(line1,line2)
Can anyone explain this statement to me?
What type is being defined here? A function? A tuple?
Why do line1 and line2 have to be redefined within the definition of line1,line2 (let line1 = ... let line2 =)?
What's with the final line, (line1, line2) and how does this tie into the type of the original definition? Is this the function return?
Is "statement" even the right word to use for a definition in F#?
Thanks.

The general form for binding identifier values in F# is
let pattern = expression
In this case, the pattern is "line1, line2", which is a tuple pattern, it will expect to bind to a 2-tuple of values and assign the names "line1" and "line2" to those two values.
The expression is the next 4 lines. Inside that expression there are local variables. They happen to also be named "line1" and "line2", but they could easily have been renamed "x" and "y" or whatever - the scope of those identifiers is local to this indented expression. (The fact that the same names are used as the names in the outer scope has no effect as far as the compiler is concerned.)
The final line if the expression is the 'return value' of the expression. In this case it returns the 2-tuple of values "line1" and "line2" (or "x" and "y" if you rename them for clarity of exposition). Incidentally, since these two values each have type "string", the type of the return expression is "string*string", which is a 2-tuple where each value is a string. This means the original "line1" and "line2" names on the first line will each be inferred to have type "string".
F# is functional, and so in a sense "everything is an expression" and "there are no statements" (only sequences of expressions that are sequentially evaluated), but it is ok IMO to (ab)use the term "statement" to describe the inner "let" lines, unless you're trying to be very precise.

Related

How to use wildcard with a variable?

I need to evaluate the output to see if it starts with a specific sequence.
For example if Cat1 = (A)
I want to verify that the entry begins with the value of Cat1 and can contain any text after it. If so then to output that entry.
I don't exactly know how to use wildcards in conjunction with the variable to allow entries such as
(A) First assignment
(A) Second assignment
to be selected and then to be transferred.
The portion that is in question is the following in my code:
if(assign.title == ){
SpreadsheetApp.openByUrl(url).getSheetByName(shet).appendRow([assign.title, marks.assignedGrade,
assign.maxPoints]);}
}
Your issue can be solved by using Regular Expressions which essentially are special text strings used to describe a search pattern.
Therefore, if you want to search for the entries which begin with (A) and appendRow() like you mentioned above, you should use the following code snippet:
function theFunction() {
var ss = SpreadsheetApp.openByUrl("YOUR_URL").getSheetByName("YOUR_SHEET_NAME");
var regEx = /((A)).*/;
//Getting the assign & marks variables
if (assign.title.match(regEx))
appendRow([assign.title, marks.assignedGrade, assign.maxPoints]);
}
The regular expression here is represented by the var regEx = /((A)).*/; which searches for a string to see if it starts with the (A) string.
Furthermore, I suggest you take a look at these links since they might be of help:
Syntax for Regular Expressions;
Regular Expressions Tester.

Ruby regex: union 2K values in one regex,

I code a process to process bunch of text files and capture its name if any of 2000 literals exists in it (1 or many). So I'm thinking to combine that many values into one regex, do you think it's doable, I did test for 100 and looks like it's OK. Tx all
Code below depics my flow and sample code, just without looping.
# 1. read regex value list as file [alpha,fox, delta] # 2000 values
# 2. read file into s #5000 files
# 3. find if any of #1 values exists in each #2 file. *with regex tweaks to match format dbname.dob.table
s = '1 dbName.dbo.ALPHA 2 DBNAME.bcd.ALPHA 3 dbName..ALPHA 4 ALPHA 5x dbName.alphA 6x alpha.XX 7x ###dbName.###a.alpha --alpha
dbName..FOX dbName.dbo.DELTA clarity.aba..fox '
value1 = '(?<=^|\s)(?:dbName\.[a-z]*\.)?(?:alpha)(?=\s|$)'
value2 = '(?<=^|\s)(?:dbName\.[a-z]*\.)?(?:fox)(?=\s|$)'
##...
value2000 = '(?<=^|\s)(?:dbName\.[a-z]*\.)?(?:delta)(?=\s|$)'
regex = /#{value1}|#{value2}|#{value2000}/i ## can I union 2000 regex's ???
puts 'reg1: ' + regex.to_s
puts 'result: ' + s.scan(regex).to_s
if s.scan(regex) then puts '...Match!!!d' end
Declaring 2000 variables is highly unnecessary; you should define all values in a single array, then somehow loop through them.
Also, the regular expression is highly repetitive - e.g. the use of (?:dbName\.[a-z]*\.) 2000 times. This can be simplified by grouping all of your values within the non-capture group as follows:
values = %w(alpha fox delta)
regex = /(?<=^|\s)(?:dbName\.[a-z]*\.)?(?:#{Regexp.union(values)})(?=\s|$)/
This is the result:
/(?<=^|\s)(?:dbName\.[a-z]*\.)?(?:(?-mix:alpha|fox|delta))(?=\s|$)/
If you extend that values array to contain 2000 strings, the other code does not need to change.
Provided two conditions are met, I would do it as follows, which I think would be far more efficient than using a gigantic regular expression, which, by its nature, requires that a linear search of the "bad words" be performed for each word in the string, until a match is found or it is determined that there are no matches.
We are given a file whose path is contained in a variable fname and an array of bad words:
arr = ["alpha", "fox", "delta", "charlie", "mabel"]
The first condition that I spoke of above is that, by way of example, "ALPHA" and "Alpha" match "alpha", but "aLPha" does not (or some variant of that).
The second condition is that there is a regular expression with a capture group that would capture a bad word if a bad word were present at the given location in a match. For example:
regex = (?<=^|\s)(?:dbName\.[a-z]*\.)?(\p{Alpha}+)(?=\s|$)
Wherever there is a match, the capture group (\p{Alpha}+) would capture a string of one or more alphanumeric characters whose value is assigned to the global variable $1. We will then check to see if the value of $1 is a bad word. (The regular expression might have other capture groups as well, in which case we might be looking for $2 or $3, say, or a named capture group.)
If there were more than one such regular expression to check for, the code below could be executed for each of them until a match is found or it is determined that there are no more matches.
The first step is to convert the array of bad words to a set:
require 'set'
bad_words = arr.flat_map { |w| [w, w.capitalize, w.upcase] }.to_set
#=> #<Set: {"alpha", "Alpha", "ALPHA", "fox", "Fox", "FOX",
# "delta", "Delta", "DELTA", "charlie", "Charlie", "CHARLIE",
# "mabel", "Mabel", "MABEL"}>
This allows very fast word lookups--much faster than stepping through an array. We may then search the file as follows.
rv = IO.foreach(fname).any? do |line|
line.gsub(regex).any? { bad_words.include?($1) }
end
IO::foreach without a block is seen to return an enumerator. We can then chain that to any? to determine if there is a line that contains a match of the regular expression and the value of its capture group is contained in the set bad_words. If such a line is found the search terminates and true is returned; else, false is returned.
It is seen that String#gsub without a block returns an enumerator, which here I've chained to any?. This form of gsub has nothing to do with string replacements; it just generates matches. Those matches are passed to the block, but we are only interested in the contents of the capture group, which are held by $1. Hence the expression bad_words.include?($1).

When do you have to use #{} while referring to variables in Ruby?

Sometimes if I've defined a a variable, for example
xyz="example"
and I'd like to refer back to xyz, I can either type xyz or #{xyz} in statements/loops etc.
My question is when do I use xyz and when do I use #{xyz}? And how do they differ?
#{} allows you to use any Ruby expression (not necessarily a variable) inside an interpolating quote (doubly-quoted strings, regular expressions...). It will evaluate the expression, convert the result to a string using the to_s method, then insert ("interpolate") the result into the string at that spot.
For example, if you have a string "Hello, apollo!" and want to replace the apollo with the contents of a variable, you could say "Hello, #{name}!".
You could even put a whole program inside a string:
"One plus two equals #{
def last_name_by_first_name(last_name)
People.find_by_last_name(last_name).first.first_name
end
find_by_last_name('Jack')
}!"
or
"#{a = 1} + #{b = 2} = #{a + b}"
(But please don't do that, it's a horrid way to write code. Sticking with variable look-ups and simple function calls should be enough for most purposes.)
Anywhere outside of a string (or other interpolating quote), #{} is an error.

What is the best way to delimit a csv files thats contain commas and double quotes?

Lets say I have the following string and I want the below output without requiring csv.
this, "what I need", to, do, "i, want, this", to, work
this
what i need
to
do
i, want, this
to
work
This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
We can solve it with a beautifully-simple regex:
"([^"]+)"|[^, ]+
The left side of the alternation | matches complete "quotes" and captures the contents to Group1. The right side matches characters that are neither commas nor spaces, and we know they are the right ones because they were not matched by the expression on the left.
Option 2: Allowing Multiple Words
In your input, all tokens are single words, but if you also want the regex to work for my cat scratches, "what I need", your dog barks, use this:
"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*
The only difference is the addition of (?:[ ]*[^, ]+)* which optionally adds spaces + characters, zero or more times.
This program shows how to use the regex (see the results at the bottom of the online demo):
subject = 'this, "what I need", to, do, "i, want, this", to, work'
regex = /"([^"]+)"|[^, ]+/
# put Group 1 captures in an array
mymatches = []
subject.scan(regex) {|m|
$1.nil? ? mymatches << $& : mymatches << $1
}
mymatches.each { |x| puts x }
Output
this
what I need
to
do
i, want, this
to
work
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...

How to write a Ruby switch statement (case...when) with regex and backreferences?

I know that I can write a Ruby case statement to check a match against a regular expressions.
However, I'd like to use the match data in my return statement. Something like this semi-pseudocode:
foo = "10/10/2011"
case foo
when /^([0-9][0-9])/
print "the month is #{match[1]}"
else
print "something else"
end
How can I achieve that?
Thanks!
Just a note: I understand that I wouldn't ever use a switch statement for a simple case as above, but that is only one example. In reality, what I am trying to achieve is the matching of many potential regular expressions for a date that can be written in various ways, and then parsing it with Ruby's Date class accordingly.
The references to the latest regex matching groups are always stored in pseudo variables $1 to $9:
case foo
when /^([0-9][0-9])/
print "the month is #{$1}"
else
print "something else"
end
You can also use the $LAST_MATCH_INFO pseudo variable to get at the whole MatchData object. This can be useful when using named captures:
case foo
when /^(?<number>[0-9][0-9])/
print "the month is #{$LAST_MATCH_INFO['number']}"
else
print "something else"
end
Here's an alternative approach that gets you the same result but doesn't use a switch. If you put your regular expressions in an array, you could do something like this:
res = [ /pat1/, /pat2/, ... ]
m = nil
res.find { |re| m = foo.match(re) }
# Do what you will with `m` now.
Declaring m outside the block allows it to still be available after find is done with the block and find will stop as soon as the block returns a true value so you get the same shortcutting behavior that a switch gives you. This gives you the full MatchData if you need it (perhaps you want to use named capture groups in your regexes) and nicely separates your regexes from your search logic (which may or may not yield clearer code), you could even load your regexes from a config file or choose which set of them you wanted at run time.

Resources