In XPath, how do you compare text() with \r\n (line break)?

I want to get the node:
//script[starts-with(text(), '\r\nvar name')]
but it seems XPath does not recognize the \r\n escape characters. Any ideas how to match them?
Note: I am using Html Agility Pack.

Use:
//script[starts-with(., '&#xD;&#xA;var name')]
Most often the XML parser normalizes line endings, leaving only a single newline (LF) character. Therefore, if the above expression doesn't select the wanted script elements, try with:
//script[starts-with(., '&#xA;var name')]
Or, this would work in both cases:
//script
  [(starts-with(., '&#xD;&#xA;') or starts-with(., '&#xA;'))
  and
   starts-with(substring-after(., '&#xA;'), 'var name')
  ]
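The combined predicate may be easier to follow when spelled out as ordinary string operations. The Ruby sketch below mirrors its logic for a script body held in a plain string. The method name `starts_with_var_name?` is made up for illustration, and this is only a model of the predicate, not how Html Agility Pack evaluates XPath.

```ruby
# Mirrors the XPath predicate:
#   (starts-with(., CR LF) or starts-with(., LF))
#   and starts-with(substring-after(., LF), 'var name')
# `starts_with_var_name?` is a hypothetical helper name.
def starts_with_var_name?(text)
  leading_break = text.start_with?("\r\n", "\n")
  # partition returns [before, separator, after]; `after` is "" when there is
  # no "\n", which matches what substring-after() returns in that case.
  after_first_lf = text.partition("\n").last
  leading_break && after_first_lf.start_with?('var name')
end
```

Because the second test looks only at what follows the first LF, the same check covers both the CR LF and the LF-only form.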

Related

trailing and leading blank issue in string

I am working on a project where I need to check whether an employee entered *done* in a text field. Employees sometimes enter '* done *', '*done *', or '* done*' instead, so the value may have a leading blank, a trailing blank, or both. I have to check the column for all of these possible entries in a LIKE statement. I tried TRIM and RTRIM, but nothing seems to work.
case when
col like ('*done*')
or col like ('* done*')
or col like ('*done *')
or col like ('* done *')
end as work_status
doesn't seem like a smart way to do it. What is the best way to check this? Any help will be appreciated. Thank you.

Remove spaces:
case when replace(col, ' ') = '*done*' then 'done'
else 'not'
end as work_status

You can look for the done substring with anything preceding and following using the LIKE operator and % wildcards:
CASE WHEN col LIKE '%done%' THEN 'done' END AS work_status
Or you can trim the leading and trailing space characters:
CASE WHEN TRIM(col) = 'done' THEN 'done' END AS work_status
Or you can replace all the leading/trailing white spaces (in case the users have entered new lines, tabs, etc. rather than space characters) using a regular expression:
CASE
WHEN REGEXP_REPLACE(col, '^[[:space:]]+|[[:space:]]+$') = 'done'
THEN 'done'
END AS work_status

Double quotes in csv-table cell

I am struggling to add a cell with double quotes in the csv-table.
.. csv-table::
   :header: f,d

   2,"ts*"
the above one works fine.
But if I try to get the cell as ts"*" instead of ts*, it starts throwing an error :
Error with CSV data in "csv-table" directive: ',' expected after '"'
I tried using escape characters (like \ ) but it didn't work.
I was trying it in an online editor.

I think I found the solution. The :escape: option specifies an escape character, in this case ':
.. csv-table::
   :escape: '
   :header: f,d

   2,"ts'"*'""
It is now showing the cell as ts"*".

Xpath: Find element with class that contains spaces

So I have elements that look like this
<li class="attribute "></li> # note the space
<li class="attribute"></li>
Using the XPath //li[@class="attribute"] will get the second element but not the first. How can I get both elements with the same XPath?

This XPath 1.0 expression,
//li[contains(concat(' ', normalize-space(@class), ' '),
              ' attribute ')]
will select all li elements whose class attribute contains attribute as a whole space-separated token, regardless of leading or trailing spaces.
If you want to match only a class value of attribute with possible leading and trailing spaces (and no other class names), just use normalize-space():
//li[normalize-space(@class) = 'attribute']
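To see why the two expressions behave differently, here is a minimal Ruby sketch of the same string logic. The helper names (`normalize_space`, `has_class_token?`, `class_equals?`) are made up for illustration; they model the XPath 1.0 functions on plain strings and are not part of any library.

```ruby
# Models XPath's normalize-space(): collapse runs of space, tab, CR, and LF
# into one space, then drop a leading or trailing space.
def normalize_space(value)
  value.gsub(/[ \t\r\n]+/, ' ').gsub(/\A | \z/, '')
end

# Token match, modelling
#   contains(concat(' ', normalize-space(@class), ' '), ' attribute ')
def has_class_token?(class_value, token)
  " #{normalize_space(class_value)} ".include?(" #{token} ")
end

# Exact match, modelling normalize-space(@class) = 'attribute'
def class_equals?(class_value, expected)
  normalize_space(class_value) == expected
end
```

Padding both the class value and the token with spaces is what keeps a class such as attributes from matching: the token must be bounded by spaces on both sides.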

How can I select quoted strings that are outside html tags?

I am working on a syntax highlighter in ruby. From this input string (processed per line):
"left"<div class="wer">"test"</div>"right"
var car = ['Toyota', 'Honda']
How can I find "left", and "right" in the first line, 'Toyota', and 'Honda' on the second line?
I have (["'])(\\\1|[^\1]*?)\1 to highlight the quoted strings. I am struggling with the negative look behind part of the regex.
I tried appending another regex (?![^<]*>|[^<>]*<\/), but I can't get it to work with quoted strings. It works with simple alphanumeric only.

You can match one or more tokens by creating groups using parentheses in regex, and using | to create an or condition:
/("left")|("right")|('Toyota')|('Honda')/
Here's an example:
http://rubular.com/r/C8ONnxKYEV
EDIT
Just saw that the title of your question specifies that you want to search outside HTML tags.
Unfortunately this isn't possible using only regular expressions. The reason is that HTML, like any language that relies on nested delimiters such as "", '', (), isn't regular. In other words, regexes have no way of tracking levels of nesting, so you'll need to use a parser along with your regex. If you're doing this strictly in Ruby, consider using a tool like Nokogiri or Mechanize to properly parse and interact with the DOM.

Description
This Ruby script first finds and replaces the HTML elements. Note that this is not perfect and is susceptible to many edge cases. The script then looks for all the single- and double-quoted values.
str = %Q["left" <div class="wer">"test"</div>"right"\n]
str = str + %Q<var car = ['Toyota', 'Honda']>
puts "SourceString: \n" + str + "\n\n"
str.gsub!(/(?:<([a-z]+)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?>).*?<\/\1>/i, '_')
puts "SourceString after replacement: \n" + str + "\n\n"
puts "array of quoted values"
str.scan(/"[^"]*"|'[^']*'/)
Sample Output
SourceString:
"left" <div class="wer">"test"</div>"right"
var car = ['Toyota', 'Honda']
SourceString after replacement:
"left" _"right"
var car = ['Toyota', 'Honda']
=> ["\"left\"", "\"right\"", "'Toyota'", "'Honda'"]
Live Example
https://repl.it/CRGo
HTML Parsing
I do recommend using an HTML parsing engine instead. This one seems pretty decent for Ruby: https://www.ruby-toolbox.com/categories/html_parsing
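As a middle ground between a single regex and a full parser, the nesting can be tracked by hand. The sketch below uses Ruby's standard `strscan` library and counts opening and closing tags so that quoted strings are collected only at depth zero. The method name `quoted_outside_elements` is hypothetical, and the code assumes well-formed markup with no void or self-closing tags, no comments, and no `>` inside attribute values. It is not a substitute for a real HTML parser.

```ruby
require 'strscan'

# Hypothetical helper: returns the quoted strings that sit outside any element.
# Assumes every opening tag has a matching closing tag (no <br>, no comments).
def quoted_outside_elements(line)
  scanner = StringScanner.new(line)
  depth = 0
  found = []
  until scanner.eos?
    if scanner.scan(%r{</[^>]*>})            # closing tag
      depth -= 1 if depth > 0
    elsif scanner.scan(/<[a-z][^>]*>/i)      # opening tag, attributes included
      depth += 1
    elsif (quoted = scanner.scan(/"[^"]*"|'[^']*'/))
      found << quoted if depth.zero?         # keep only top-level strings
    else
      scanner.getch                          # skip any other character
    end
  end
  found
end
```

Called on the two sample lines from the question, this returns `"left"` and `"right"` for the first and `'Toyota'` and `'Honda'` for the second, while `"test"` and the `class="wer"` attribute value are skipped.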

Regular expression help

I am currently doing a bunch of processing on a string using regular expressions with gsub() but I'm chaining them quite heavily which is starting to get messy. Can you help me construct a single regex for the following:
string.gsub(/\.com/,'').gsub(/\./,'').gsub(/&/,'and').gsub(' ','-').gsub("'",'').gsub(",",'').gsub(":",'').gsub("#39;",'').gsub("*",'').gsub("amp;",'')
Basically the above removes the following:
.com
.
,
:
*
switches '&' for 'and'
switches ' ' for '-'
switches ' for ''
Is there an easier way to do this?

You can combine the ones that remove characters:
string.gsub(/\.com|[.,:*]/,'')
The pipe | means "or". The right side of the or is a character class; it means "one of these characters".
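A quick check of that combined pattern, using a made-up sample string:

```ruby
# Sample input is invented for illustration.
title = 'Smith & Sons.com: Tools, Parts*'

# Alternatives are tried left to right, so \.com is listed before the
# character class; otherwise only the "." would be removed and "com" kept.
cleaned = title.gsub(/\.com|[.,:*]/, '')

puts cleaned  # prints "Smith & Sons Tools Parts"
```

The `&` and the spaces are left alone here because they need replacements rather than removal.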

A translation table is more scalable as you add more options:
translations = Hash.new
translations['.com'] = ''
translations['&'] = 'and'
...
translations.each { |from, to| string = string.gsub(from, to) }
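A variation on the translation table, sketched under the assumption that a single pass over the string is acceptable: `String#gsub` also accepts a Hash as its second argument and looks each match up in it. The `slugify` name and the particular rules listed are illustrative only.

```ruby
# Keys are the substrings to replace; values are their replacements.
# Only some of the question's rules are listed here.
translations = {
  '.com' => '',
  '&'    => 'and',
  ' '    => '-',
  '.'    => '',
  ','    => ''
}

# Regexp.union escapes each key and joins them with "|". Alternatives are
# tried in the order given, so '.com' has to come before '.'.
pattern = Regexp.union(translations.keys)

# `slugify` is a hypothetical helper name.
def slugify(string, pattern, translations)
  # With a Hash as the second argument, gsub replaces each match with the
  # value stored under that matched text.
  string.gsub(pattern, translations)
end

puts slugify('Tom & Jerry.com, Inc.', pattern, translations)  # prints "Tom-and-Jerry-Inc"
```

Unlike the loop, this scans the string once, so text produced by one replacement is never rescanned by a later rule.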

Building on Tim's answer:
You can pass a block to String.gsub, so you could combine them all, if you wanted:
string.gsub(/\.com|[.,:*& ']/) do |sub|
  case sub
  when '&'
    'and'
  when ' '
    '-'
  else
    ''
  end
end
Or, building off echoback's answer, you could use a translation hash in the block (you may need to call translations.default = '' to get this working):
string.gsub(/\.com|[.,:*& ']/) {|sub| translations[sub]}
The biggest perk of using a block is only having one call to gsub (not the fastest function ever).
Hope this helps!
