Is it possible to exclude some of the string used to match from Ruby regexp data? - ruby

I have a bunch of strings that look, for example, like this:
<option value="Spain">Spain</option>
And I want to extract the name of the country from inside.
The easiest way I could think of to do this in Ruby was to use a regular expression of this form:
country = line.match(/>(.+)</)
However, this returns >Spain<. So I did this:
line.match(/>(.+)</).to_s.gsub!(/<|>/,"")
Works well enough, but I'd be surprised if there's not a more elegant way to do this. It seems natural to use a regular expression to declare how to find the thing you want without having the enclosing strings that were used to match it end up in the data that gets returned.
Is there a conventional approach to this problem?

The right way to deal with that string is to use an HTML parser, for example:
country = Nokogiri::HTML('<option value="Spain">Spain</option>').at('option').text
And if you have several such strings, paste them together and use search:
html = '<option value="Spain">Spain</option><option value="Canada">Canada</option>'
countries = Nokogiri::HTML(html).search('option').map(&:text)
# ["Spain", "Canada"]
But if you must use a regex, then:
country = '<option value="Spain">Spain</option>'.match('>([^<]+)<')[1]
Keep in mind that match actually returns a MatchData object and MatchData#to_s:
Returns the entire matched string.
But you can access the captured groups using MatchData#[]. And if you don't like counting, you could use a named capture group as well:
country = '<option value="Spain">Spain</option>'.match('>(?<name>[^<]+)<')['name']
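To make the difference between the whole match and a capture group concrete, here is a minimal sketch using only core Ruby. The variable names are illustrative. The last two forms (String#[] with a capture index, and lookaround assertions) are alternatives not mentioned above that also keep the delimiters out of the result.

```ruby
line = '<option value="Spain">Spain</option>'

# MatchData#to_s (same as m[0]) is the entire match, delimiters included.
m = line.match(/>([^<]+)</)
whole   = m.to_s   # includes the surrounding > and <
country = m[1]     # only the first capture group

# String#[] with a regexp and a group index returns the capture directly,
# or nil when nothing matches.
short = line[/>([^<]+)</, 1]

# Lookbehind/lookahead assert the delimiters without consuming them,
# so the match itself is just the text in between.
look = line[/(?<=>)[^<]+(?=<)/]
```

For real HTML the parser approach above remains the more robust choice; these forms are only meant for simple, known-shape strings.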

Related

How to have ruby conditionally check if variables exist in a string?

So I have a string from a rendered template that looks like
"Dear {{user_name}},\r\n\r\nThank you for your purchase. If you have any questions, we are happy to help.\r\n\r\n\r\n{{company_name}}\r\n{{company_phone_number}}\r\n"
All those variables like {{user_name}} are optional and do not need to be included but I want to check that if they are, they have {{ in front of the variable name. I am using liquid to parse and render the template and couldn't get it to catch if the user only uses 1 (or no) opening brackets. I was only able to catch the proper number of closing brackets. So I wrote a method to check that if these variables exist, they have the correct opening brackets. It only works, however, if all those variables are found.
here is my method:
def validate_opening_brackets?(template)
  text = %w(user_name company_name company_phone_number)
  text.all? do |variable|
    next unless template.include? variable
    template.include? "{{#{variable}"
  end
end
It works, but only if all variables are present. If, for example, the template created by the user does not include user_name, then it will return false. I've also done this loop using each, and creating a variable outside of the block that I assign false if the conditions are not met. I would really, however, like to get this to work using the all? method, as I can just return a boolean and it's cleaner.
If the question is about how to rewrite the all? block to make it return true if all present variable names have two brackets before them and false otherwise then you could use something like this:
def validate_opening_brackets?(template)
  variables = %w(user_name company_name company_phone_number)
  variables.all? do |variable|
    !template.include?(variable) || template.include?("{{#{variable}")
  end
end
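The original version fails because a bare next makes the block return nil, which all? treats as falsy, so any absent variable turns the whole result into false. The following sketch repeats the method above and runs it on a few made-up templates; the sample strings and result variable names are assumptions for illustration.

```ruby
def validate_opening_brackets?(template)
  variables = %w(user_name company_name company_phone_number)
  variables.all? do |variable|
    # An absent variable passes; a present one must be preceded by "{{".
    !template.include?(variable) || template.include?("{{#{variable}")
  end
end

all_present  = validate_opening_brackets?("Dear {{user_name}}, {{company_name}} {{company_phone_number}}")
some_missing = validate_opening_brackets?("Dear {{user_name}}, thank you.")
none_present = validate_opening_brackets?("Dear Customer, thank you.")
one_brace    = validate_opening_brackets?("Dear {user_name}}, thank you.")
```

One limitation of this check: a template that contains both {{user_name}} and a malformed {user_name}} still passes, because include? only asks whether a correct occurrence exists somewhere.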
TL;DR
There are multiple ways to do this, but the easiest way I can think of is to wrap a regular expression in the escaped delimiters used by Mustache/Liquid (double curly braces) and use alternation to check for each of your variable names between them. You can then call String#scan and return a Boolean from Enumerable#any? based on the contents of the Array returned by #scan.
This works with your posted example, but there may certainly be other use cases where you need a more complex solution. YMMV.
Example Code
This solution escapes the leading and trailing { and } characters to avoid having them treated as special characters, and then interpolates the variable names with | for alternation. It returns a Boolean depending on whether templated variables are found.
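def template_string_has_interpolations?(str)
  var_names = %w[user_name company_name company_phone_number]
  # Group the alternation so the braces apply to every name; without the
  # group, a bare "company_name" with no braces at all would also match.
  regexp = /\{\{(?:#{var_names.join('|')})\}\}/
  str.scan(regexp).any?
end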
Tested Examples
template_string_has_interpolations? "Dear {{user_name}},\r\n\r\nThank you for your purchase. If you have any questions, we are happy to help.\r\n\r\n\r\n{{company_name}}\r\n{{company_phone_number}}\r\n"
#=> true
template_string_has_interpolations? "Dear Customer,\r\n\r\nThank you for your purchase. If you have any questions, we are happy to help.\r\n\r\n\r\nCompany, Inc.\r\n(555) 555-5555\r\n"
#=> false

Need XPath and XQuery query

I'm working on Xpath/Xquery to return values of multiple child nodes based on a sibling node value in a single query. My XML looks like this
<FilterResults>
<FilterResult>
<ID>535</ID>
<Analysis>
<Name>ZZZZ</Name>
<Identifier>asdfg</Identifier>
<Result>High</Result>
<Score>0</Score>
</Analysis>
<Analysis>
<Name>XXXX</Name>
<Identifier>qwerty</Identifier>
<Result>Medium</Result>
<Score>0</Score>
</Analysis>
</FilterResult>
<FilterResult>
<ID>745</ID>
<Analysis>
<Name>XXXX</Name>
<Identifier>xyz</Identifier>
<Result>Critical</Result>
<Score>0</Score>
</Analysis>
<Analysis>
<Name>YYYY</Name>
<Identifier>qwerty</Identifier>
<Result>Medium</Result>
<Score>0</Score>
</Analysis>
</FilterResult>
</FilterResults>
I need to get the values of Score and Identifier based on the Name value. I'm currently trying the query below, but it is not working as desired:
fn:string-join((
for $Identifier in fn:distinct-values(FilterResults/FilterResult/Analysis[Name="XXXX"])
return fn:string-join((//Identifier,//Score),'-')),',')
The output I'm looking for is this:
qwerty-0,xyz-0
Your question suggests some fundamental misunderstandings about XQuery, generally. It's hard to explain everything in a single answer, but 1) that is not how distinct-values works (it returns string values, not nodes), and 2) the double slash selections in your return statement are returning everything because they are not constrained by anything. The XPath you use inside the distinct-values call is very close, however.
Instead of calling distinct-values, you can assign the Analysis results of that XPath to a variable, iterate over them, and generate concatenated strings. Then use string-join to comma separate the full sequence. Note that in the return statement, the variable $a is used to concat only one pair of values at a time.
string-join(
let $analyses := FilterResults/FilterResult/Analysis[Name="XXXX"]
for $a in $analyses
return $a/concat(Identifier, '-', Score),
',')
=> qwerty-0,xyz-0

simplified regex for modifying a string in ruby

Here is my original string:
"Chassis ID TLV\n\tMAC: 00:xx:xx:xx:xx:xx\nPort ID TLV\n\tIfname: Ethernet1/3\nTime to Live TLV\n\t120"
and I want the string to be formatted as:
"Chassis ID TLV;00:xx:xx:xx:xx:xx\nPort ID TLV;Ethernet1/3\nTime to Live TLV;120"
So I used the following Ruby string functions to do it:
y = x.gsub(/\t[a-zA-Z\d]+:/,"\t")
y = y.gsub(/\t /,"\t")
y = y.gsub("\n\t",";")
So I am looking for a one-liner to do the above. Since I am not used to regex, I tried doing it sequentially, and I mess it up when I try to do all of the steps together.
Replace the following construct
[\n\r]\t(?:\w+: )?
with ;, see a demo on regex101.com.
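In Ruby, that replacement can be written as a single gsub call. A minimal sketch, using the question's input string and its variable names x and y:

```ruby
x = "Chassis ID TLV\n\tMAC: 00:xx:xx:xx:xx:xx\nPort ID TLV\n\tIfname: Ethernet1/3\nTime to Live TLV\n\t120"

# A line break followed by a tab, plus an optional "Label: " prefix,
# collapses to a single semicolon. Line breaks that are not followed by
# a tab are left alone, so the records stay on separate lines.
y = x.gsub(/[\n\r]\t(?:\w+: )?/, ';')
```

The optional group is what lets the same pattern handle both "\n\tMAC: ..." and the label-free "\n\t120".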
I'd tackle it as a few smaller steps:
input = "Chassis ID TLV\n\tMAC: 00:xx:xx:xx:xx:xx\nPort ID TLV\n\tIfname: Ethernet1/3\nTime to Live TLV\n\t120"
input.split(/\n\t?/).map { |s| s.sub(/\A[^:]+\:\s*/, '') }.join(';')
# => "Chassis ID TLV;00:xx:xx:xx:xx:xx;Port ID TLV;Ethernet1/3;Time to Live TLV;120"
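That way you have control over each element instead of being entirely dependent on the regular expression to do it in one shot. Note that joining every piece with ; also replaces the newlines between records, which the requested output keeps.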

Selenium Webdriver + Ruby regex: Can I use regex with find_element?

I am trying to click an element that changes per each order like so
edit_div_123
edit_div_124
edit_div_xxx
xxx = any three numbers
I have tried using regex like so:
@driver.find_element(:css, "#edit_order_#{\d*} > div.submit > button[name=\"commit\"]").click
@driver.find_element(:xpath, "//*[(@id = "edit_order_#{\d*}")]//button").click
Is this possible? Any other ways of doing this?
You cannot use Regexp, like the other answers have indicated.
Instead, you can use a nifty CSS Selector trick:
@driver.find_element(:css, "[id^=\"edit_order_\"] > div.submit > button[name=\"commit\"]").click
Using:
^= matches elements whose attribute value begins with your criterion.
*= matches elements whose attribute value contains your criterion anywhere.
$= matches elements whose attribute value ends with your criterion.
~= matches elements whose attribute value is a space-separated list of words, one of which is exactly your criterion.
Take a look at http://net.tutsplus.com/tutorials/html-css-techniques/the-30-css-selectors-you-must-memorize/ for some more info on other neat CSS tricks you should add to your utility belt!
You have not provided the HTML fragment that you are working on, so my answer is based only on the limited input in your question.
I don't think WebDriver APIs support regex for locating elements. However, you can achieve what you want using just plain XPath as follows:
//*[starts-with(@id, 'edit_div_')]//button
Explanation: the XPath above selects all <button> nodes under any element whose id attribute starts with the string edit_div_.
In short, you can use the starts-with() XPath function to match elements whose id is edit_div_ followed by any number of characters.
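If a prefix match is too loose (for example, the id must end in exactly three digits), one option is to collect the candidate ids with a prefix locator and then filter them in plain Ruby. This is only a sketch: the sample ids are made up, and the Selenium call in the comment is an assumption about how the ids would be gathered, not something taken from the question.

```ruby
# Hypothetical ids; with Selenium they might be collected with something like
#   driver.find_elements(css: '[id^="edit_div_"]').map { |e| e.attribute('id') }
ids = %w[edit_div_123 edit_div_124 edit_div_12a edit_div_header]

# Keep only ids made of the prefix followed by exactly three digits.
order_ids = ids.grep(/\Aedit_div_\d{3}\z/)
```

Each surviving id can then be passed back to find_element(:id, ...) to locate the element to click.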
No, you can not.
But you should do something like this:
function hasClass(element, className) {
var re = new RegExp('(?:^|\\s+)' + className + '(?:\\s+|$)');
return re.test(element.className);
}
This worked for me
@driver.find_element(:xpath, "//a[contains(@href, 'person')]").click

Ruby Regular Expression: Setting $1 variable in a hash

Everything in this code works properly, except the contents of the $1 variable aren't being properly displayed. According to my tests, all the matching is being done properly, I am just having trouble figuring out how to actually output the contents of $1.
codeTags = {
  /\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
  /\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
regexp = Regexp.new(/(#{Regexp.union(codeTags.keys)})/)
message = (message).gsub(/#{regexp}/) do |match|
  codeTags[codeTags.keys.select {|k| match =~ Regexp.new(k)}[0]]
end
return message.html_safe
Thank you!
As soon as you do this:
codeTags = {
  /\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
  /\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
The #{$1} bits in the values are interpolated using whatever happens to be in $1 at the time. The values will most likely be "<strong></strong>" and "<em></em>" and those aren't very useful.
And regexp is already a regular expression object so gsub(/#{regexp}/) should be just gsub(regexp). Similar things apply to the keys of codeTags, they're already regular expression objects so you don't need to Regexp.new(k).
I'd change the whole structure, you're overcomplicating things. Just something simple like this would be fine for only two replacements:
message = message.gsub(/\[b\](.*?)\[\/b\]/) { '<strong>' + $1 + '</strong>' }
message = message.gsub(/\[i\](.*?)\[\/i\]/) { '<em>' + $1 + '</em>' }
If you try to do it all at once you'll have problems with nesting in something like this:
message = 'Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?'
You'd end up having to use a recursive gsub and possibly some lambdas if you wanted to properly handle things like that with a single expression.
There are better things to spend your time on than trying to be clever on something like this.
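If the list of tags grows, the one-gsub-per-tag approach can still be driven from a small table while keeping $1 inside the block, where it is evaluated per match. This is a sketch, not the code from the question: the TAGS constant and the bbcode_to_html method name are made up for illustration, and the /m flag from the question's patterns is kept so tags can span lines.

```ruby
# Hypothetical mapping from BB-Code tag names to HTML element names.
TAGS = { 'b' => 'strong', 'i' => 'em' }.freeze

def bbcode_to_html(message)
  # Apply one gsub per tag, feeding each result into the next pass.
  TAGS.reduce(message) do |text, (bb, html)|
    # The block runs once per match, so $1 holds that match's capture.
    text.gsub(/\[#{bb}\](.*?)\[\/#{bb}\]/m) { "<#{html}>#{$1}</#{html}>" }
  end
end

converted = bbcode_to_html('Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?')
```

Because each tag is handled in its own pass, an [i] pair nested inside a [b] pair converts cleanly. Nesting the same tag inside itself would still need the recursive handling mentioned above.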
Response to comments: If you have more bb-tags and some smilies to worry about and several messages per page then you should HTMLify each message when you create it. You could store only the HTML version or both HTML and BB-Code versions if you want the BB-Code stuff around for some reason. This way you'd only pay for the HTMLification once per message and producing your big lists would be nearly free.
