I've looked everywhere and cannot find a clear-cut example. I want to match only some variants of an enum, not all of them.
pub enum InfixToken {
    Operator(Operator),
    Operand(isize),
    LeftParen,
    RightParen,
}
So I can perform this in a for loop of tokens:
let x = match token {
    &InfixToken::Operand(c) => InfixToken::Operand(c),
    &InfixToken::LeftParen => InfixToken::LeftParen,
};
if tokens[count - 1] == x {
    return None;
}
How do I check whether the preceding token is one of just two variants of the enum, without comparing it against every variant? x also has to be the same type as the preceding token.
Also, and probably more important, how can I match an operand where the isize value doesn't matter, as long as it is an operand?
You can use _ in patterns to discard a value: InfixToken::Operand(_) => branch. If the whole pattern is _, it will match anything.
To only perform code if specific variants are matched, put that code in the match branch for those variants:
match token {
    &InfixToken::Operand(_) |
    &InfixToken::LeftParen => {
        if tokens[count - 1] == token {
            return None;
        }
    }
    _ => {}
}
The bar (|) is syntax for taking that branch if either pattern is satisfied.
If you only want to match a single variant of an enum, use if let.
I got a task on code wars.
The task is
In this simple Kata your task is to create a function that turns a string into a Mexican Wave. You will be passed a string and you must return that string in an array where an uppercase letter is a person standing up.
Rules are
The input string will always be lower case but may be empty.
If the character in the string is whitespace, then pass over it as if it were an empty seat.
Example
wave("hello") => []string{"Hello", "hEllo", "heLlo", "helLo", "hellO"}
I have found a solution, but I want to understand its logic. It is minimalistic and looks cool, but I don't understand what happens in it. The solution is:
fun wave(str: String) = str.indices.map { str.take(it) + str.drop(it).capitalize() }.filter { it != str }
Could you please explain?
str.indices just returns the valid indices of the string. This means the numbers from 0 to and including str.length - 1 - a total of str.length numbers.
Then, these numbers are mapped (in other words, transformed) into strings. We will now refer to each of these numbers as "it", as that is what it refers to in the map lambda.
Here's how we do the transformation: we first take the first it characters of str, then combine that with the last str.length - it characters of str, but with the first of those characters capitalized. How do we get the last str.length - it characters? We drop the first it characters.
Here's an example for when str is "hello", illustrated in a table:
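| it | str.take(it) | str.drop(it) | str.drop(it).capitalize() | Combined |
|----|--------------|--------------|---------------------------|----------|
| 0  | (empty)      | hello        | Hello                     | Hello    |
| 1  | h            | ello         | Ello                      | hEllo    |
| 2  | he           | llo          | Llo                       | heLlo    |
| 3  | hel          | lo           | Lo                        | helLo    |
| 4  | hell         | o            | O                         | hellO    |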
Lastly, the solution also filters out transformed strings that are the same as str. This is to handle Rule #2. Transformed strings can only be the same as str if the capitalised character is a whitespace (because capitalising a whitespace character doesn't change it).
Side note: capitalize is deprecated. For other ways to capitalise the first character, see Is there a shorter replacement for Kotlin's deprecated String.capitalize() function?
Here's another way you could do it:
fun wave2(str: String) = str.mapIndexed { i, c -> str.replaceRange(i, i + 1, c.uppercase()) }
    .filter { it.any(Char::isUpperCase) }
The filter in the original is far more elegant IMO; this is just an example of how else you might check for a condition. replaceRange makes a copy of a string with some of the characters changed. In this case we're just replacing the character at the current index with its uppercased version. Not as clever as the original, but good to know!
I want to split tokens in an array if they are of the form "a number, a dot ("."), and then non-numbers". If a token is of the form "number, dot, number", I don't want to split it. I thought this would do the trick:
tokens.flat_map {|o| o.scan(/^\d+\.|[a-z]+/i) }
The expression works correctly for this case:
tokens = ["44.WORD"]
tokens.flat_map {|o| o.scan(/^\d+\.|[a-z]+/i) }
# => ["44.", "WORD"]
but the expression seems to cut off the token, as shown below:
tokens = ["72.9"]
tokens.flat_map {|o| o.scan(/^\d+\.|[a-z]+/i) }
# => ["72."]
How do I adjust my regular expression so that if the token is a number, a dot, and a number, I keep it just as it is and split it in two otherwise?
Try this
tokens.flat_map { |token| token =~ /[a-z]/i ? token.split('.') : token }
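This doesn't adjust your regexp, but sometimes it is easier to use Ruby rather than cramming everything into a regexp, and it is often more readable too. Note that split('.') drops the dot itself, so "44.WORD" becomes ["44", "WORD"] rather than ["44.", "WORD"].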
Since you have a well defined notion of where to split, use split instead of scan.
["44.WORD"].flat_map{|s| s.split(/(?<=\d\.)(?=\D)/)}
# => ["44.", "WORD"]
["72.9"].flat_map{|s| s.split(/(?<=\d\.)(?=\D)/)}
# => ["72.9"]
I have some parameters that I have to sort into different lists. The prefix determines which list each one should belong to.
I use prefixes like c, a, n and o, plus an optional leading hyphen (-) that determines whether to put the parameter in the include list or the exclude list.
I use the regex grouped as:
/^(-?)([o|a|c|n])(\w+)/
But here the third group (\w+) is not generic, and it should actually be dependent on the second group's result. I.e, if the prefix is:
'c' or 'a' -> /\w{3}/
'o' -> /\w{2}/
else -> /\w+/
Can I do this with a single regex? Currently I am using an if condition to do so.
Example input:
Valid:
"-cABS", "-aXYZ", "-oWE", "-oqr", "-ncanbeanyting", "nstillanything", "a123", "-conT" (will go to c_exclude_list)
Invalid:
"cmorethan3chars", "c1", "-a1234", "prefizisnotvalid", "somethingelse", "oABC"
Output: for each arg push to the correct list, ignore the invalid.
c_include_list, c_exclude_list, a_include_list, a_exclude_list etc.
You can use this pattern:
/(-?)\b([aocn])((?:(?<=[ac])\w{3}|(?<=o)\w{2}|(?<=n)\w+))\b/
The idea is to use lookbehinds to check the previous character without including it in the capture group.
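As a rough sketch of how that pattern could feed the lists from the question: the method name sort_args and the "<prefix>_<include|exclude>_list" key format are assumptions made for illustration. The pattern itself is used exactly as given, so it is unanchored and relies on \b to reject over-long arguments.

```ruby
# Pattern from the answer: group 1 = optional hyphen, group 2 = prefix,
# group 3 = payload whose allowed length depends on the prefix.
PATTERN = /(-?)\b([aocn])((?:(?<=[ac])\w{3}|(?<=o)\w{2}|(?<=n)\w+))\b/

# Hypothetical helper: returns a hash of list name => collected payloads.
def sort_args(args)
  lists = Hash.new { |hash, key| hash[key] = [] }
  args.each do |arg|
    m = PATTERN.match(arg)
    next unless m # invalid arguments are ignored

    kind = m[1].empty? ? 'include' : 'exclude'
    lists["#{m[2]}_#{kind}_list"] << m[3]
  end
  lists
end

args = %w[-cABS -aXYZ -oWE -oqr -ncanbeanyting nstillanything a123 -conT
          cmorethan3chars c1 -a1234 prefizisnotvalid somethingelse oABC]
sort_args(args).each { |name, values| puts "#{name}: #{values.inspect}" }
```

Here "-conT" ends up in c_exclude_list as "onT", as the question requires, and the six invalid arguments produce no entry.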
Since version 2.0, Ruby has switched from Oniguruma to Onigmo (a fork of Oniguruma), which adds support for conditional regex, among other features.
So you can use the following regex to customize the pattern based on the prefix:
^-(?:([ca])|(o)|(n))?(?(1)\w{3}|(?(2)\w{2}|(?(3)\w+)))$
Demo at rubular
Is a single, mind-bending regex the best way to deal with this problem?
Here's a simpler approach that does not employ a regex at all. I suspect that it would be at least as efficient as a single regex, considering that with the latter you must still assign matching strings to their respective arrays. I think it also reads better and would be easier to maintain. The code below should be easy to modify if I have misunderstood some fine points of the question.
Code
def devide_em_up(str)
  h = { a_exclude: [], a_include: [], c_exclude: [], c_include: [],
        o_exclude: [], o_include: [], other_exclude: [], other_include: [] }
  str.split.each do |s|
    exclude = (s[0] == ?-)
    s = s[1..-1] if exclude
    first = s[0]
    s = s[1..-1] if 'cao'.include?(first)
    len = s.size
    case first
    when 'a'
      (exclude ? h[:a_exclude] : h[:a_include]) << s if len == 3
    when 'c'
      (exclude ? h[:c_exclude] : h[:c_include]) << s if len == 3
    when 'o'
      (exclude ? h[:o_exclude] : h[:o_include]) << s if len == 2
    when 'n' # only the 'n' prefix accepts any length; unknown prefixes are ignored
      (exclude ? h[:other_exclude] : h[:other_include]) << s
    end
  end
  h
end
Example
Let's try it:
str = "-cABS cABT -cDEF -aXYZ -oWE -oQR oQT -ncanbeany nstillany a123 " +
      "-conT cmorethan3chars c1 -a1234 prefizisnotvalid somethingelse oABC"
devide_em_up(str)
#=> {:a_exclude=>["XYZ"], :a_include=>["123"],
#    :c_exclude=>["ABS", "DEF", "onT"], :c_include=>["ABT"],
#    :o_exclude=>["WE", "QR"], :o_include=>["QT"],
#    :other_exclude=>["ncanbeany"], :other_include=>["nstillany"]}
I have a large file in a ruby variable, it follows a common pattern like so:
// ...
// comment
$myuser['bla'] = 'bla';
// comment
$myuser['bla2'] = 'bla2';
// ...
Given a 'key', I am trying to replace the corresponding 'value'.
My attempt below replaces the entire matched string rather than just the value. How do I fix it? Another approach I thought of is to do it in two steps: first find the value within the quotes, then perform a string replace. Which is best?
def keyvalr(content, key, value)
  return content.gsub(/\$bla\[\'#{key}\'\]\s+\=\s+\'(.*)\'/) {|m| value }
end
The .* is greedy and consumes as much as possible (everything until the very last '). Make that . a [^'] then it is impossible for it to go past the first closing '.
/(\$bla\[\'#{key}\'\]\s+\=\s+\')[^']*(\')/
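I also added parentheses to capture everything except the value, which is the part to be replaced. The first set of parentheses corresponds to \1 and the second to \2, so you replace the match with the following string (single-quoted, because in a double-quoted Ruby string "\1" is an octal character escape rather than a backreference):
'\1yournewvaluehere\2'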
I'd use something like:
text = %q{
// ...
// comment
$myuser['bla'] = 'bla';
// comment
$myuser['bla2'] = 'bla2';
// ...
}
from_to = {
  'bla' => 'foo',
  'bla2' => 'bar'
}
puts text.gsub(/\['([^']+)'\] = '([^']+)'/) { |t|
  key, val = t.scan(/'([^']+)'/).flatten
  "['%s'] = '%s'" % [ key, from_to[key] ]
}
Which outputs:
// ...
// comment
$myuser['bla'] = 'foo';
// comment
$myuser['bla2'] = 'bar';
// ...
This is how it works:
If I do:
puts text.gsub(/\['([^']+)'\] = '([^']+)'/) { |t|
  puts t
}
I see:
['bla'] = 'bla'
['bla2'] = 'bla2'
Then I tried:
"['bla'] = 'bla'".scan(/'([^']+)'/).flatten
=> ["bla", "bla"]
That gave me a key, "value" pair, so I could use a hash to look-up the replacement value.
Sticking it inside a gsub block meant whatever matched got replaced by my return value for the block, so I created a string to replace the "hit" and let gsub do its "thang".
I'm not a big believer in using long regex. I've had to maintain too much code that tried to use complex patterns, and got something wrong, and failed to accomplish what was intended 100% of the time. They're very powerful, but maintenance of code is a lot harder/worse than developing it, so I try to keep patterns I write in spoon-size pieces, having mercy on those who follow me in maintaining the code.
I'm attempting to parse blocks of text and need a way to detect the difference between apostrophes in different contexts. Possession and abbreviation in one group, quotations in the other.
e.g.
"I'm the cars' owner" -> ["I'm", "the", "cars'", "owner"]
but
"He said 'hello there' " -> ["He","said"," 'hello there' "]
Detecting whitespace on either side won't help as things like " 'ello " and " cars' " would parse as one end of a quotation, same with matching pairs of apostrophes. I'm getting the feeling that there's no way of doing it other than an outrageously complicated NLP solution and I'm just going to have to ignore any apostrophes not occurring mid-word, which would be unfortunate.
EDIT:
Since writing I have realised this is impossible. Any regex-ish based parser would have to parse:
'ello there my mates' dogs
in 2 different ways, and could only do that with understanding of the rest of the sentence. Guess I'm for the inelegant solution of ignoring the least likely case and hoping it's rare enough to only cause infrequent anomalies.
Hm, I'm afraid this won't be easy. Here's a regex that kinda works, alas only for stuff like "I'm" and "I've":
>> s1 =~ /[\w\s]*((?<!I)'(?:[^']+)')[\w\s]*/
=> nil
>> s2 =~ /[\w\s]*((?<!I)'(?:[^']+)')[\w\s]*/
=> 0
>> $1
=> "'hello there'"
If you play around with it a bit more, you may be able to eliminate some other common contractions, which might still be better than nothing.
Some rules to think about:
Quotes will start with an apostrophe with a whitespace character or nothing before it.
Quotes will end with an apostrophe with punctuation or a whitespace character after it.
Some words may look like the end of quotes, e.g., peoples'.
Quote delimiting apostrophes will never have letters directly before and after them.
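The first two rules can be approximated with a single regex, sketched below. The constant QUOTE and the helper quoted_parts are names invented for this illustration. It is only a heuristic: as the question's edit points out, a phrase like 'ello there my mates' dogs is still misread as a quotation.

```ruby
# Opening apostrophe: preceded by whitespace or nothing.
# Closing apostrophe: followed by whitespace, punctuation, or end of string.
QUOTE = /(?<!\S)'(.+?)'(?=[\s[:punct:]]|\z)/

# Hypothetical helper: returns the text found inside each apparent quotation.
def quoted_parts(text)
  text.scan(QUOTE).flatten
end

p quoted_parts("He said 'hello there' ")      # => ["hello there"]
p quoted_parts("I'm the cars' owner")         # => []
p quoted_parts("'ello there my mates' dogs")  # => ["ello there my mates"] (a false positive)
```

Apostrophes in "I'm" and "cars'" are skipped because a letter sits directly before them, which is the same observation as the last rule in the list.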
Use a very simple two-phase process.
In pass 1 of 2, start with this regular expression to break the text down into alternating segments of word and non-word characters.
/(\w+)|(\W+)/gi
Store the matches in a list like this (I'm using AS3-style pseudo-code, since I don't work with ruby):
class MatchedWord
{
    var text:String;
    var charIndex:int;
    var isWord:Boolean;
    var isContraction:Boolean = false;

    function MatchedWord( text:String, charIndex:int, isWord:Boolean )
    {
        this.text = text; this.charIndex = charIndex; this.isWord = isWord;
    }
}
var match:Object;
var matched_word:MatchedWord;
var matched_words:Vector.<MatchedWord> = new Vector.<MatchedWord>();
var words_regex:RegExp = /(\w+)|(\W+)/gi
words_regex.lastIndex = 0; //this is where to start looking for matches, and is updated to the end of the last match each time exec is called
while ((match = words_regex.exec( original_text )) != null)
    matched_words.push( new MatchedWord( match[0], match.index, match[1] != null ) ); //match[0] is the entire match and match[1] is the first parenthetical group (if it's null, then it's not a word and match[2] would be non-null)
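Since the question is about Ruby, here is a rough Ruby equivalent of pass 1. It is a sketch rather than a translation the answer provides: the Struct fields mirror the MatchedWord class above in Ruby naming style, and the method name split_segments is an assumption.

```ruby
# Ruby counterpart of the MatchedWord class (field names adapted to snake_case).
MatchedWord = Struct.new(:text, :char_index, :is_word, :is_contraction)

# Hypothetical helper: breaks text into alternating word / non-word segments.
def split_segments(text)
  segments = []
  text.scan(/(\w+)|(\W+)/) do
    m = Regexp.last_match # MatchData for the current match
    # m[1] is non-nil only when the word alternative matched.
    segments << MatchedWord.new(m[0], m.begin(0), !m[1].nil?, false)
  end
  segments
end

split_segments("I'm here").each { |segment| p segment.to_a }
```

Joining the text fields back together reproduces the input exactly, which is what makes the character indexes usable later.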
In pass 2 of 2, iterate over the list of matches to find contractions by checking to see if each (trimmed, non-word) match ENDS with an apostrophe. If it does, then check the next adjacent (word) match to see if it matches one of only 8 common contraction endings. Despite all the two-part contractions I could think of, there are only 8 common endings.
d
l
ll
m
re
s
t
ve
Once you've identified such a pair of matches (non-word)="'" and (word)="d", then you just include the preceding adjacent (word) match and concatenate the three matches to get your contraction.
With the process just described in mind, one modification you must make is to expand that list of contraction endings to include contractions that start with an apostrophe, such as "'twas" and "'tis". For those, you simply don't concatenate the preceding adjacent (word) match, and you look at the apostrophe match a little more closely to see whether it included other non-word characters before it (that's why it's important that it ends with an apostrophe). If the trimmed string EQUALS an apostrophe, then merge it with the next match; if it only ENDS with an apostrophe, then strip off the apostrophe and merge it with the following match. Likewise, conditions that include the prior match should first check that the (trimmed non-word) match ending with an apostrophe EQUALS an apostrophe, so that no extra non-word characters are included accidentally.
Another modification you may need to make is expand that list of 8 endings to include endings that are whole words such as "g'day" and "g'night". Again, it's a simple modification involving a conditional check of the preceding (word) match. If it's "g", then you include it.
That process should capture the majority of contractions, and is flexible enough to include new ones you can think of.
The data structure would look like this.
Condition(Ending, PreCondition)
where PreCondition is
"*", "!", or "<exact string>"
The final list of conditions would look like this:
new Condition("d","*") //if apostrophe d is found, include the preceding word string and count as successful contraction match
new Condition("l","*");
new Condition("ll","*");
new Condition("m","*");
new Condition("re","*");
new Condition("s","*");
new Condition("t","*");
new Condition("ve","*");
new Condition("twas","!"); //if apostrophe twas is found, exclude the preceding word string and count as successful contraction match
new Condition("tis","!");
new Condition("day","g"); //if apostrophe day is found and preceding word string is g, then include preceding word string and count as successful contraction match
new Condition("night","g");
If you just process those conditions as I explained, that should cover all of these 86 contractions (and more):
'tis 'twas ain't aren't can't could've couldn't didn't doesn't don't
everybody's g'day g'night hadn't hasn't haven't he'd he'll he's how'd
how'll how's I'd I'll I'm I've isn't it'd it'll it's let's li'l
might've mightn't mustn't needn't nobody's nothing's shan't she'd
she'll she's should've shouldn't that'd that'll that's there's they'd
they'll they're they've wasn't we'd we'll we're we've weren't what'll
what're what'd what's what've when'd when'll when's where'd where'll
where's who's who'll who're who'd who'll who's who've why'd why'll
why's won't would've wouldn't you'd you'll you're you've
On a side note, don't forget about slang contractions that don't use apostrophes such as "gotta" > "got to" and "gonna" > "going to".
Here is the final AS3 code. Overall, you're looking at less than 50 lines of code to parse the text into alternating word and non-word groups, and identify and merge contractions. Simple. You could even add a Boolean "isContraction" variable to the MatchedWord class and set the flag in the code below when a contraction is identified.
//Automatically merge known contractions
var conditions:Array = [
    ["d","*"], //if apostrophe d is found, include the preceding word string and count as successful contraction match
    ["l","*"],
    ["ll","*"],
    ["m","*"],
    ["re","*"],
    ["s","*"],
    ["t","*"],
    ["ve","*"],
    ["twas","!"], //if apostrophe twas is found, exclude the preceding word string and count as successful contraction match
    ["tis","!"],
    ["day","g"], //if apostrophe day is found and preceding word string is g, then include preceding word string and count as successful contraction match
    ["night","g"]
];
for (var i:int = 0; i < matched_words.length - 1; i++) //not a typo, intentionally stopping at next to last index to avoid a condition check in the loop
{
    var m:MatchedWord = matched_words[i];
    var apostrophe_text:String = StringUtils.trim( m.text ); //check if this ends with an apostrophe first, then deal more closely with it
    if (!m.isWord && StringUtils.endsWith( apostrophe_text, "'" ))
    {
        var m_next:MatchedWord = matched_words[i + 1]; //no bounds check necessary, since loop intentionally stopped at next to last index
        var m_prev:MatchedWord = ((i - 1) >= 0) ? matched_words[i - 1] : null; //bounds check necessary for previous match, since we're starting at beginning, since we may or may not need to look at the prior match depending on the precondition
        for each (var condition:Array in conditions)
        {
            if (StringUtils.trim( m_next.text ) == condition[0])
            {
                var pre_condition:String = condition[1];
                switch (pre_condition)
                {
                    case "*": //success after one final check, include prior match, merge current and next match into prior match and delete current and next match
                        if (m_prev != null && apostrophe_text == "'") //EQUAL apostrophe, not just ENDS with apostrophe
                        {
                            m_prev.text += m.text + m_next.text;
                            m_prev.isContraction = true;
                            matched_words.splice( i, 2 );
                        }
                        break;
                    case "!": //success after one final check, do not include prior match, merge current and next match, and delete next match
                        if (apostrophe_text == "'")
                        {
                            m.text += m_next.text;
                            m.isWord = true; //match now includes word text so flip it to a "word" block for logical consistency
                            m.isContraction = true;
                            matched_words.splice( i + 1, 1 );
                        }
                        else
                        { //strip apostrophe off end and merge with next item, nothing needs deleted
                            //preserve spaces and match start indexes by manipulating untrimmed strings
                            var apostrophe_end:int = m.text.lastIndexOf( "'" );
                            var apostrophe_ending:String = m.text.substring( apostrophe_end, m.text.length );
                            m.text = m.text.substring( 0, m.text.length - apostrophe_ending.length); //strip apostrophe and any trailing spaces
                            m_next.text = apostrophe_ending + m_next.text;
                            m_next.charIndex = m.charIndex + apostrophe_end;
                            m_next.isContraction = true;
                        }
                        break;
                    default: //conditional success, check prior match meets condition
                        if (m_prev != null && m_prev.text == pre_condition)
                        {
                            m_prev.text += m.text + m_next.text;
                            m_prev.isContraction = true;
                            matched_words.splice( i, 2 );
                        }
                        break;
                }
            }
        }
    }
}
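For readers working in Ruby, the same two-pass idea can be sketched far more compactly. This is a simplified illustration under stated assumptions, not a port of the AS3 code: it works on plain strings instead of MatchedWord objects, does not trim segments (so an apostrophe must be adjacent to the following word), lowercases the ending before the lookup, and the names CONDITIONS and merge_contractions are invented here.

```ruby
# Ending => precondition, mirroring the condition list above:
#   "*"   merge with the preceding word (I'm, they'd)
#   "!"   no preceding word is merged ('twas, 'tis)
#   other the preceding word must equal this string (g'day)
CONDITIONS = {
  'd' => '*', 'l' => '*', 'll' => '*', 'm' => '*',
  're' => '*', 's' => '*', 't' => '*', 've' => '*',
  'twas' => '!', 'tis' => '!',
  'day' => 'g', 'night' => 'g'
}.freeze

# Hypothetical helper: returns the segments of text with contractions merged.
def merge_contractions(text)
  segments = text.scan(/\w+|\W+/) # pass 1: alternating word / non-word runs
  out = []
  i = 0
  while i < segments.length # pass 2: look for apostrophe + known ending
    seg = segments[i]
    nxt = segments[i + 1]
    # Only a non-word run can end with an apostrophe, so nxt is then a word.
    rule = seg.end_with?("'") && nxt ? CONDITIONS[nxt.downcase] : nil
    bare = seg == "'" # EQUALS an apostrophe, not just ENDS with one
    if rule == '!'
      out << seg[0...-1] unless bare # keep leading non-word characters separate
      out << "'#{nxt}"
      i += 2
    elsif rule && bare && !out.empty? && (rule == '*' || out.last == rule)
      out[-1] = out.last + seg + nxt # fold apostrophe and ending into previous word
      i += 2
    else
      out << seg
      i += 1
    end
  end
  out
end

p merge_contractions("I'm sure they'd say 'twas g'day")
p merge_contractions("He said 'hello there'")
```

In the first call the contractions come back as single segments ("I'm", "they'd", "'twas", "g'day"); in the second, the quotation marks stay separate because "hello" is not a known ending and the closing apostrophe has nothing after it.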