Regular expression to fetch the value from a given string - ruby

I have the following string:
a=<record><FPR_AGENT_CODE>990042833</FPR_AGENT_CODE><FPR_AGENT_LABELCODE>CIF Code :</FPR_AGENT_LABELCODE><FPR_AGENT_LABELNAME>CIF Name :</FPR_AGENT_LABELNAME>
I need to get the value from:
<FPR_AGENT_CODE>990042833</FPR_AGENT_CODE>
to
"FPR_AGENT_CODE 990042833 FPR_AGENT_CODE"
How can I write the regular expression for this? I tried using the one given below, but it's not working.
puts a[/<.*>.*<\/.*>/]

You can use scan with the following regex:
/<([^>]+)>(\d+)<\/\1>/
Sample code:
a="<record><FPR_AGENT_CODE>990042833</FPR_AGENT_CODE><FPR_AGENT_LABELCODE>CIF Code :</FPR_AGENT_LABELCODE><FPR_AGENT_LABELNAME>CIF Name :</FPR_AGENT_LABELNAME><FPR_AGENT_NAME>Mr Kamal Kishore</FPR_AGENT_NAME><FPR_BANK_BRANCH_NAME>STATE BANK OF INDIA KHOUR</FPR_BANK_BRANCH_NAME><FPR_BRANCH_ADDRESS>"
puts a.scan(/<([^>]+)>(\d+)<\/\1>/)
Output:
FPR_AGENT_CODE
990042833
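The regex <([^>]+)>(\d+)<\/\1> matches an opening tag in angle brackets (capturing the tag name into group 1), then a sequence of 1 or more digits (\d+, captured into group 2), and then the closing tag, where \1 is a backreference that requires the same name as the opening tag.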
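To build the exact string asked for in the question, the two captured groups can be interpolated into a new string. This is only a sketch: the variable names m and result are illustrative, and the input is a shortened copy of the sample string.

```ruby
# Shortened copy of the sample input from the question.
a = "<record><FPR_AGENT_CODE>990042833</FPR_AGENT_CODE><FPR_AGENT_LABELCODE>CIF Code :</FPR_AGENT_LABELCODE>"

# match returns a MatchData object, or nil when nothing matches.
m = a.match(/<([^>]+)>(\d+)<\/\1>/)

# m[1] is the tag name, m[2] is the numeric value.
result = m ? "#{m[1]} #{m[2]} #{m[1]}" : nil
puts result
```

The nil check matters because a string without a numeric element would otherwise raise NoMethodError on m[1].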
If you need to get multiple values, you can use:
puts a.scan(/<([^>]+\b)[^<>]*>(.*?)<\/\1>/)
Output:
FPR_AGENT_CODE
990042833
FPR_AGENT_LABELCODE
CIF Code :
FPR_AGENT_LABELNAME
CIF Name :
FPR_AGENT_NAME
Mr Kamal Kishore
FPR_BANK_BRANCH_NAME
STATE BANK OF INDIA KHOUR
For multiline input, either use the m option (which lets . match newlines) or replace (.*?) with ([^<]*).
puts a.scan(/<([^>]+\b)[^<>]*>(.*?)<\/\1>/m)
Or
puts a.scan(/<([^>]+\b)[^<>]*>([^<]*)<\/\1>/)
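Because scan returns an array of [name, value] pairs when the regex has two groups, the result can also be turned into a Hash for lookup by tag name. A minimal sketch, assuming tag names are unique (with duplicates, later values overwrite earlier ones); the name fields is illustrative.

```ruby
# Shortened copy of the sample input from the question.
a = "<record><FPR_AGENT_CODE>990042833</FPR_AGENT_CODE>" \
    "<FPR_AGENT_LABELCODE>CIF Code :</FPR_AGENT_LABELCODE>" \
    "<FPR_AGENT_LABELNAME>CIF Name :</FPR_AGENT_LABELNAME>"

# Each match yields [tag_name, text]; to_h converts the pairs into a Hash.
fields = a.scan(/<([^>]+\b)[^<>]*>([^<]*)<\/\1>/).to_h

puts fields["FPR_AGENT_CODE"]
puts fields["FPR_AGENT_LABELNAME"]
```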

Related

Extracting a substring from a string using `Regexp.new`

I have a string like this:
var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"
I have to extract "Payment Pending" from that string using only the information included in another single string.
The following:
var[/\((.*)\)/, 1]
will extract what I want. I can put the string representation of the regex in the given string and construct the regular expression from it with Regexp.new, but I have no way to pass along the 1 used as the second argument of [].
Without the second argument 1,
regex_string = '\((.*)\)'
var[Regexp.new(regex_string)]
fetches the string "(Payment Pending)" instead of the expected "Payment Pending".
Can someone help me?
Not sure what you are trying to do, but you can get rid of capturing groups using a different regex:
var[/(?<=\().*(?=\))/]
# => "Payment Pending"
or
var[Regexp.new('(?<=\().*(?=\))')]
# => "Payment Pending"
/\((.*)\)/ is just shorthand for Regexp.new('\((.*)\)').
String#[] takes a regex and a capture group as two separate arguments. var[/\((.*)\)/, 1] is var[regexp, 1].
The important thing to realize is 1 is passed to var[], not the regex.
re = Regexp.new('\((.*)\)')
match = var[re, 1]
Note: you might want to require a named capture group rather than a numbered one. It's very easy to accidentally include an extra capture group in a regex.
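One way to act on that note is to agree on a fixed group name, so the single string only has to carry the pattern. The sketch below assumes a group called value; that name is made up for illustration and is not part of the question.

```ruby
var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"

# The pattern arrives as a plain string; (?<value>...) names the capture group.
regex_string = '\((?<value>.*)\)'
re = Regexp.new(regex_string)

# String#[] accepts a group name in place of a group number.
extracted = var[re, "value"]
puts extracted
```

Adding further unnamed groups to the pattern does not change which text is returned, since the lookup is by name.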
Assuming there are no nested parentheses in the string, one way to do that without using a regular expression is as follows.
instance_eval "var[(i=var.index('(')+1)..var.index(')',i)-1]"
#=> "Payment Pending"
See String#index, particularly the reference to the optional second argument, "offset".
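The same index-based idea can be written as an ordinary method, which also handles a missing parenthesis. This is a sketch under the same no-nesting assumption; the method name text_between_parens is hypothetical.

```ruby
# Returns the text between the first "(" and the next ")", or nil if either is missing.
def text_between_parens(str)
  open_at = str.index('(')
  return nil unless open_at

  # The second argument to index is the offset to start searching from.
  close_at = str.index(')', open_at + 1)
  return nil unless close_at

  # Exclusive range: skip "(" at the start and stop before ")".
  str[(open_at + 1)...close_at]
end

var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"
puts text_between_parens(var)
```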

How do I create an XPath query to extract a substring of text?

I am trying to create an XPath so that it returns only the order number instead of the whole line. Please see the attached screenshot.
What you want is the substring-after() function -
fn:substring-after(string1,string2)
Returns the remainder of string1 after string2 occurs in it
Example: substring-after('12/10','/')
Result: '10'
For your situation -
substring-after(string(//p[contains(text(), "Your order # is")]), ": ")
To test this, I modified the DOM on this page to include a "Order Number: ####" string.
You could also just use your normal XPath selector to get the complete text, i.e. "Your order # is: 123456", and then run a regex on the string as described in Get numbers from string with regex.
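If the full text is handled in Ruby after selecting the node, both ideas can be sketched as below. The sample text is the one quoted in the answer; how the text is fetched from the page is left out, and the variable names are illustrative.

```ruby
# Assumed to be the full text already returned by the XPath selector.
text = "Your order # is: 123456"

# Mirrors substring-after(text, ": "): partition splits on the first
# occurrence of the separator and returns [before, separator, after].
after_colon = text.partition(": ").last

# Regex alternative: take the first run of digits.
digits = text[/\d+/]

puts after_colon
puts digits
```

Like substring-after, partition yields an empty string for the last part when the separator is absent, while the regex form returns nil if there are no digits.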

Regex to extract last number portion of varying URL

I'm creating a URL parser and have three kind of URLs from which I would like to extract the number portion from the end of the URL and increment the extracted number by 10 and update the URL. I'm trying to use regex to extract but I'm new to regex and having trouble.
These are three URL structures of which I'd like to increment the last number portion of:
Increment last number 20 by 10:
http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/
Increment last number 50 by 10:
https://forums.questionablecontent.net/index.php/board,1.50.html
Increment last number 30 by 10:
https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/
With the regex \d+(?!.*\d), you get the last chunk of digits in the string: the negative lookahead (?!.*\d) rejects any match that still has a digit somewhere after it. Then use gsub with a block to modify the number and put it back into the result.
See this Ruby demo:
strs = ['http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/', 'https://forums.questionablecontent.net/index.php/board,1.50.html', 'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/']
arr = strs.map {|item| item.gsub(/\d+(?!.*\d)/) {$~[0].to_i+10}}
Note: $~ is a MatchData object, and using the [0] index we can access the whole match value.
Results:
http://forums.scamadviser.com/site-feedback-issues-feature-requests/30/
https://forums.questionablecontent.net/index.php/board,1.60.html
https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.40/
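The same approach can be wrapped in a small helper so the step is configurable. A sketch only: the method name bump_last_number and the step parameter are assumptions, not part of the answer above. It uses sub rather than gsub, since the pattern can match at most once per string.

```ruby
# Adds step to the last run of digits in url and returns a new string.
def bump_last_number(url, step = 10)
  # (?!.*\d) ensures no digit follows, so only the final number matches.
  url.sub(/\d+(?!.*\d)/) { |digits| (digits.to_i + step).to_s }
end

urls = [
  'http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/',
  'https://forums.questionablecontent.net/index.php/board,1.50.html',
  'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/'
]

urls.each { |url| puts bump_last_number(url) }
```

A URL without any digits is returned unchanged, because sub leaves the string alone when there is no match.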
Try this regex:
\d+(?=(\/)|(\.html))
It will extract the last number.
Demo: https://regex101.com/r/zqUQlF/1
Substitute back with this regex:
(.*?)(\d+)((\/)|(\.html))
Demo: https://regex101.com/r/zqUQlF/2
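In Ruby, the substitution step might look like the sketch below. It uses a simplified form of the pattern with the dot before html escaped, so that it matches a literal period, and with the inner groups flattened into one. The method name bump_before_suffix is hypothetical.

```ruby
# Increments the first number that is directly followed by "/" or ".html".
def bump_before_suffix(url)
  url.sub(/(.*?)(\d+)(\/|\.html)/) do
    # $1 = text before the number, $2 = the number, $3 = the suffix
    "#{$1}#{$2.to_i + 10}#{$3}"
  end
end

puts bump_before_suffix('https://forums.questionablecontent.net/index.php/board,1.50.html')
puts bump_before_suffix('https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/')
```

Note that this relies on the number sitting immediately before "/" or ".html"; for the three sample URLs only the last number does.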
This regex matches only the last whole number in each URL by using a lookahead (which "sees" a pattern but doesn't consume any characters):
\d+(?=\D*$)
Like this:
urls = ['http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/', 'https://forums.questionablecontent.net/index.php/board,1.50.html', 'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/']
pattern = /(\d+)(?=[^\d]+$)/
urls.each do |url|
url.gsub!(pattern) {|m| m.to_i + 10}
end
puts urls
You can also test it online here: https://ideone.com/smBJCQ

Regex for validating constant field with numbers

I am new to Ruby. I am trying to write a regex pattern to match my input. My requirement is that the input strictly adheres to the following format:
CHECK ID#<number>
(E.g., my input should be CHECK ID#3213.)
How do I frame the pattern for this?
If you want to extract the ID number, use this:
"CHECK ID#123".scan(/CHECK ID#(\d+)/).last.first.to_i # => 123
Because you need just one result, there is no need to use .scan or .match:
"CHECK ID#123"[/CHECK ID#(\d+)/, 1].to_i
How about this:
match = "CHECK ID#1221".match(/\ACHECK ID#(\d+)\z/)
puts match[1] if match
# prints 1221
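For strict validation, a small helper that returns the number or nil keeps the check in one place. This sketch assumes the whole input must match, so it uses \A and \z rather than ^ and $, which match at line boundaries in Ruby. The method name parse_check_id is made up for illustration.

```ruby
# Returns the ID as an Integer when input is exactly "CHECK ID#<digits>", else nil.
def parse_check_id(input)
  m = /\ACHECK ID#(\d+)\z/.match(input)
  m && m[1].to_i
end

p parse_check_id("CHECK ID#3213")
p parse_check_id("CHECK ID#")
p parse_check_id("CHECK ID#12\nextra line")
```

Only the first call returns a number; the other two inputs do not fit the format and give nil.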

Ruby Regex: How to match pattern that follows another pattern?

I have ID numbers that should come after the text ID: so my file consists of
ID: A1234
ID: A1235
ID: A1236
etc. I want to match /[A-Z]*[0-9]+/ but only if it comes after the characters ID:. How would I add that to the regular expression but not make it return ID: as part of the result? I just want it to match the regex that follows ID:, because at the end of the file I have numbers and it's returning them, but those aren't ID numbers.
/ID:\s*([A-Z]*[0-9]+)/
The parentheses capture what's inside them, and you can then refer to the captured text as group 1 of the match instead of the whole match. If you post some code showing how you're using the regex, I can try to add more detail.
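As a sketch of how that capture group might be used, scan can collect every ID from the file contents. The sample text below is an assumption modeled on the question, with a bare number at the end standing in for the trailing numbers that should be ignored.

```ruby
# Assumed file contents: three ID lines followed by an unrelated number.
text = "ID: A1234\nID: A1235\nID: A1236\n42\n"

# With one capture group, scan returns [["A1234"], ["A1235"], ...];
# flatten turns that into a flat array of strings.
ids = text.scan(/ID:\s*([A-Z]*[0-9]+)/).flatten

puts ids
```

The trailing 42 is skipped because it is not preceded by ID:, and the ID: prefix itself is not part of the captured text.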

Resources