How do I write a regex for Excel cell range? - ruby

I need to validate that something is an Excel cell range in Ruby, e.g. "A4:A6". By looking at it, the format I am looking for is:
<Alphabetical, Capitalised><Integer>:<Alphabetical, Capitalised><Integer>
I am not sure how to form a RegExp for this.
I would appreciate a small explanation for a solution, as opposed to purely a solution.
A bonus would be to check that the range is restricted to within a row or column. I think this would be out of scope of Regular Expressions though.
I have tried /[A-Z]+[0-9]+:[A-Z]+[0-9]+/, which works but allows extra characters on either end.
Anchoring the pattern is more on the right track, but it still accepts extra letters, because [A-Z]+ allows any number of them:
"HELLOAA3:A7".match(/\A[A-Z]+[0-9]+:[A-Z]+[0-9]+\z/) also returns a match.
How would I limit the number range to 10000?
How would I limit the number of characters to 3?

This is my solution:
(?:(?:\'?(?:\[(?<wbook>.+)\])?(?<sheet>.+?)\'?!)?(?<colabs>\$)?(?<col>[a-zA-Z]+)(?<rowabs>\$)?(?<row>\d+)(?::(?<col2abs>\$)?(?<col2>[a-zA-Z]+)(?<row2abs>\$)?(?<row2>\d+))?|(?<name>[A-Za-z]+[A-Za-z\d]*))
It includes named ranges, but the R1C1 notation is not supported.
The pattern is written in the Perl-compatible regex dialect (i.e. it can also be used with C#). I'm not familiar with Ruby, so I can't say how its syntax differs, but you may want to look here: What is the difference between Regex syntax in Ruby vs Perl?
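For readers who want to try the pattern above from Ruby, here is a minimal sketch. The pattern is copied unchanged; the \A and \z anchors, the constant name EXCEL_REF and the sample strings are assumptions added for illustration, not part of the original answer.

```ruby
# Pattern from the answer above, wrapped in \A ... \z (an added assumption)
# so that the whole string has to be a reference or a name.
EXCEL_REF = /\A(?:(?:\'?(?:\[(?<wbook>.+)\])?(?<sheet>.+?)\'?!)?(?<colabs>\$)?(?<col>[a-zA-Z]+)(?<rowabs>\$)?(?<row>\d+)(?::(?<col2abs>\$)?(?<col2>[a-zA-Z]+)(?<row2abs>\$)?(?<row2>\d+))?|(?<name>[A-Za-z]+[A-Za-z\d]*))\z/

# Named groups are read from the MatchData by symbol.
simple = EXCEL_REF.match("A4:A6")
puts simple[:col], simple[:row], simple[:col2], simple[:row2]

# Hypothetical workbook and sheet names, with absolute markers.
full = EXCEL_REF.match("'[Book1.xlsx]Sheet1'!$B$2")
puts full[:wbook], full[:sheet], full[:col], full[:row]

# A named range falls through to the second alternative.
named = EXCEL_REF.match("MyRange")
puts named[:name]
```

Groups that do not take part in a match (for example :name when a cell reference matched) come back as nil.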

This will do both: match an Excel range and check that both cells are in the same row or column.
^([A-Z]+)(\d+):(\1\d+|[A-Z]+\2)$
A4:A6 // ok
A5:B10 // not ok
B5:Z5 // ok
AZ100:B100hello // not ok
The magic here is the back-references:
([A-Z]+)(\d+) -- column is in capture group 1, row in group 2
(\1\d+|[A-Z]+\2) -- the first column followed by any row number; or
-- any column letters followed by the first row
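The back-reference pattern can be sketched in Ruby as follows, together with one possible way to apply the limits asked about in the question (rows up to 10000, at most 3 column letters). The helper name valid_cell_range? and its keyword arguments are hypothetical. The anchors are changed to \A and \z because in Ruby ^ and $ match at line breaks rather than only at the ends of the string.

```ruby
# Same-row-or-same-column check via back-references:
# \1 is the first column, \2 is the first row.
SAME_ROW_OR_COLUMN = /\A([A-Z]+)(\d+):(?:\1\d+|[A-Z]+\2)\z/

# Hypothetical helper: the numeric limit is checked in plain Ruby,
# because "at most 10000" is awkward to express in a regex.
def valid_cell_range?(text, max_row: 10_000, max_col_letters: 3)
  return false unless SAME_ROW_OR_COLUMN.match?(text)

  cols = text.scan(/[A-Z]+/)            # both column parts
  rows = text.scan(/\d+/).map(&:to_i)   # both row numbers
  cols.all? { |c| c.length <= max_col_letters } &&
    rows.all? { |r| r.between?(1, max_row) }
end

%w[A4:A6 A5:B10 B5:Z5 AZ100:B100hello HELLOAA3:HELLOAA7 A1:A10001].each do |s|
  puts "#{s} -> #{valid_cell_range?(s)}"
end
```

Replacing [A-Z]+ with [A-Z]{1,3} inside the pattern would be another way to cap the column length.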

Related

Regex: Grouping with OR

I'm new here, so please don't scold me for misspellings etc.
What I need to do is to rename a bunch of files with a date in different formats at the beginning of their names, like:
05.07.2020-abc.pdf
2020.07.05-pqr.pdf
Instead of writing a different expression for each format, e.g.
^(\d{2})\.(\d{2})\.(\d{4})(.+) => $3-$2-$1$4
Example
02.11.2022-abc.pdf => 2022-11-02-abc.pdf
I'd like to do it in one fell swoop using the OR operator "|" but I have no idea how to formulate the groupings etc. Can one have nested groupings in regex?
Any ideas? Thanks in advance!
@The fourth bird:
No (.+) needed. You're right, I condensed my actual expression and could have taken it out.
The different date 'formats' I mean are dd.mm.yyyy and yyyy.mm.dd respectively, and I need to convert both to yyyy-mm-dd.
So, if the format is dd.mm.yyyy I have to flip the string, so to speak; otherwise I just need to replace the dots with hyphens.
The OS is Android, and for this operation I use Solid Explorer multi search & replace using regex.
I hope I made myself clear this time around ;-)
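The grouping logic described above (one alternative per date format, each with its own capture groups) can be sketched in Ruby. This only illustrates the idea; Solid Explorer's regex engine and replacement syntax may behave differently, and the method name normalise_date_prefix is hypothetical.

```ruby
# One pattern, two alternatives. Groups 1-3 belong to dd.mm.yyyy,
# groups 4-6 belong to yyyy.mm.dd. Only one set is filled per match.
DATE_PREFIX = /\A(?:(\d{2})\.(\d{2})\.(\d{4})|(\d{4})\.(\d{2})\.(\d{2}))/

def normalise_date_prefix(filename)
  filename.sub(DATE_PREFIX) do
    m = Regexp.last_match
    if m[1]
      "#{m[3]}-#{m[2]}-#{m[1]}"   # dd.mm.yyyy: flip the parts
    else
      "#{m[4]}-#{m[5]}-#{m[6]}"   # yyyy.mm.dd: only swap dots for hyphens
    end
  end
end

puts normalise_date_prefix("05.07.2020-abc.pdf")
puts normalise_date_prefix("2020.07.05-pqr.pdf")
```

In tools that only accept a replacement string, a replacement such as $3$4-$2$5-$1$6 might give the same result, but that depends on the engine expanding unmatched groups to empty text, which is not confirmed here for Solid Explorer.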

Is there a way to change the way Google Sheets Query Group sorts? By both capitals and letter case?

I have a simple query function that returns a range of names and sums, grouped by the names.
=QUERY('Mamut inklipp'!C:R;"select F, sum(R) group by F";0)
This sorts by the names, but case sensitive. A-Z all comes before a-z. Therefore "Eve" comes before "adam". To me that is just plain wrong.
Is there a way to change the sorting method?
You should be able to work around that. Pre-processing the data ('before the query') might be an option. Here's a little example.
I hope that works for you?
Note: Depending on your locale, you may have to use commas instead of semi-colons as argument separators (in the formula).

Jmeter : Removing Spaces using RegEx

Jmeter :
I have a JSON response from which I need to fetch the value of "ci".
I am using the following RegEx: ci:\s*(.*?)\" and I get the following result in the RegEx tester:
Match count: 1
Match1[0]=ci: 434547"
Match1=434547
The issue is that Match1[0] contains spaces, because of which the load test fails with:
: Server Error - Could not convert JSON to Object
I need help correcting this RegEx.
Basically, your RegEx is fine. This is how I would look for it too: the first group (Match[1]) gives you 434547, which is the value you are looking for. As I don't know the software you are using, I have no idea why using just that group doesn't work.
Here is an idea to work around that: if the value will always be the only numeric value in the string, you could simplify the RegEx to:
\d+
This will give you a numeric value that is at least 1 digit long. If the string contains other numeric values, but they have different lengths, try one of these:
\d{m,n} --> between m and n digits long
\d{n,} --> at least n digits long
\d{0,n} --> not more than n digits long
This is not as secure / reliable as the original RegEx (since it assumes some certain conditions), but it might work in your case, because you don't have to look for groups but just use the whole matched text. Tell me if it helped!
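The difference between the whole match and the first group, and the digit-only alternative, can be shown with a small Ruby sketch. The sample text is a hypothetical fragment, since the real JSON is not shown in the question.

```ruby
# Hypothetical fragment of the response; the real JSON is not shown.
text = 'ci:   434547"'

m = text.match(/ci:\s*(.*?)"/)
whole_match = m[0]   # everything the pattern consumed, spaces included
first_group = m[1]   # only what the parentheses captured

# Digit-only alternatives: the whole match is already the value.
digits_only  = text[/\d+/]
six_or_more  = text[/\d{6,}/]

puts whole_match.inspect
puts first_group.inspect
puts digits_only.inspect
```

In JMeter's Regular Expression Extractor, the template field (for example $1$) selects which group ends up in the variable, so checking that setting may be worthwhile before changing the expression.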

Ruby (on Rails) Regex: removing thousands comma from numbers

This seems like a simple one, but I am missing something.
I have a number of inputs coming in from a variety of sources and in different formats.
Number inputs
123
123.45
123,45 (note the comma used here to denote decimals)
1,234
1,234.56
12,345.67
12,345,67 (note the comma used here to denote decimals)
Additional info on the inputs
Numbers will always be less than 1 million
EDIT: These are prices, so will either be whole integers or go to the hundredths place
I am trying to write a regex and use gsub to strip out the thousands comma. How do I do this?
I wrote a regex: myregex = /\d+(,)\d{3}/
When I test it in Rubular, it shows that it captures the comma only in the test cases that I want.
But when I run gsub, I get an empty string: inputstr.gsub(myregex,"")
It looks like gsub is capturing everything, not just the comma in (). Where am I going wrong?
result = inputstr.gsub(/,(?=\d{3}\b)/, '')
removes commas only if exactly three digits follow.
(?=...) is a lookahead assertion: it must be possible to match it at the current position, but it does not become part of the text that is actually matched (and subsequently replaced).
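As a quick check, the lookahead version can be run over the sample inputs from the question. This is a sketch; the variable names are illustrative.

```ruby
inputs = %w[123 123.45 123,45 1,234 1,234.56 12,345.67 12,345,67]

# Remove a comma only when exactly three digits and a word boundary follow.
cleaned = inputs.map { |s| s.gsub(/,(?=\d{3}\b)/, '') }

inputs.zip(cleaned).each { |before, after| puts "#{before} -> #{after}" }
```

Commas used as decimal separators (123,45 and the last comma in 12,345,67) are left alone, because only two digits follow them.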
You are confusing "match" with "capture": to "capture" means to save something so you can refer to it later. You want to capture not the comma, but everything else, and then use the captured portions to build your substitution string.
Try
myregex = /(\d+),(\d{3})/
inputstr.gsub(myregex,'\1\2')
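The difference between matching and capturing can be seen by running both versions on one input. This is a small sketch with an illustrative input string.

```ruby
inputstr = "1,234"

# Original attempt: gsub replaces the WHOLE match ("1,234"), not just group 1.
original = inputstr.gsub(/\d+(,)\d{3}/, "")

# Capture the parts to keep and rebuild the string from them.
# Single quotes keep \1 and \2 as back-references in the replacement.
fixed = inputstr.gsub(/(\d+),(\d{3})/, '\1\2')

puts original.inspect
puts fixed.inspect
```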
In your examples, the number of digits after the last separator (either , or .) shows that it is a decimal point, since it is followed by only two digits. In most cases, if the last group of digits does not have exactly 3 digits, you can assume that the separator in front of it is a decimal point. Another clue is a separator that appears several times in a large number, which helps to tell decimal points and thousands separators apart.
However, I can give you a string such as 123,456 or 123.456 without any context, and it is impossible to tell whether it means "123 thousand 456" or "123 point 456".
You need to scan the document for clues as to whether , is used as the thousands separator or as the decimal point, and vice versa for the . character. With that context, you can safely apply the same method to remove the thousands separators.
You may also want to check out this article on Wikipedia on the less common ways to specify separators or decimal points. Knowing about a format and deciding not to support it is better than assuming things will work.
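The "count the digits after the last separator" heuristic could be sketched like this. It relies on the assumption stated in the question that prices are either whole numbers or have exactly two decimals. The method name normalise_price is hypothetical, and an ambiguous input such as 123,456 is treated as a whole number here.

```ruby
# Hypothetical helper: returns the price with no thousands separators
# and with "." as the decimal point.
def normalise_price(str)
  # A final separator followed by exactly two digits is taken as the decimal part.
  m = str.match(/\A(.*)[.,](\d{2})\z/)
  whole = m ? m[1] : str
  cents = m ? m[2] : nil

  whole = whole.delete('.,')   # any remaining separators are thousands separators
  cents ? "#{whole}.#{cents}" : whole
end

%w[123 123.45 123,45 1,234 1,234.56 12,345.67 12,345,67].each do |s|
  puts "#{s} -> #{normalise_price(s)}"
end
```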

Using Regex to grab multiple values from a string and drop them into an array?

Trying to grab the two $ values and the X value from this string in Ruby/watir:
16.67%: $xxx.xx down, includes the Policy Fee, and x installments of $xxx.xx
So far I've got:
16.67%:\s+\$(\d+.\d{2})
which grabs the first xxx.xx fine. What do I need to add to it to grab the other two values and load them all into an array?
You can use the following, but regex may be unnecessary if the surrounding text is always the same:
\$(\d+.\d{2}).*?(\d+) installments.*?\$(\d+.\d{2})
http://www.rubular.com/r/sk5wO3fyZF
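In Ruby, the captures of such a pattern can be turned into an array directly. The sketch below uses made-up amounts in place of the x placeholders and escapes the dots so that they only match a literal decimal point.

```ruby
# Hypothetical sample line; the amounts are made up for illustration.
line = '16.67%: $123.45 down, includes the Policy Fee, and 5 installments of $67.89'

pattern = /\$(\d+\.\d{2}).*?(\d+) installments.*?\$(\d+\.\d{2})/

# MatchData#captures returns the groups as an array of strings.
match  = line.match(pattern)
values = match ? match.captures : []

puts values.inspect
```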
If you know that the text in between will always be the same, you could just use:
16\.67%:\s+\$(\d+\.\d{2}) down, includes the Policy Fee, and (\d+) installments of \$(\d+\.\d{2})
You'd better use scan.
sub(/.*%/, '').scan(/\$?([\d\.]+)/)
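Applied to a concrete string, the scan approach looks like this. It is a sketch with made-up amounts; flatten is added here because scan returns one sub-array per match when the pattern contains a group.

```ruby
# Hypothetical sample line; the amounts are made up for illustration.
line = '16.67%: $123.45 down, includes the Policy Fee, and 5 installments of $67.89'

# Drop everything up to the percent sign, then collect every number.
values = line.sub(/.*%/, '').scan(/\$?([\d\.]+)/).flatten

puts values.inspect
```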
Have you considered just splitting the string on the $ character, then manipulating what you get with a regex or basic string commands?
/\$(\d+.\d{2}).+\$(\d+.\d{2})/ should do it. It won't matter what text is there, only that there are two "$" in the sentence.
