Regex: Grouping with OR - logic

I'm new here, so please don't scold me for misspellings etc.
What I need to do is to rename a bunch of files with a date in different formats at the beginning of their names, like:
05.07.2020-abc.pdf
2020.07.05-pqr.pdf
Instead of writing a different expression for each format, e.g.
^(\d{2})\.(\d{2})\.(\d{4})(.+) => $3-$2-$1$4
Example
02.11.2022-abc.pdf => 2022-11-02-abc.pdf
I'd like to do it in one fell swoop using the OR operator "|" but I have no idea how to formulate the groupings etc. Can one have nested groupings in regex?
Any ideas? Thanks in advance!
@The fourth bird:
No (.+) needed. You're right, I condensed my actual expression and could have taken it out.
The different date 'formats' I mean are dd.mm.yyyy and yyyy.mm.dd respectively, and I need to convert both to yyyy-mm-dd.
So, if the format is dd.mm.yyyy I have to flip the string, so to speak; otherwise I just need to replace the dots with hyphens.
The OS is Android, and for this operation I use Solid Explorer multi search & replace using regex.
I hope I made myself clear this time around ;-)
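As a rough illustration of how the "|" alternation and grouping could fit together, here is a minimal Ruby sketch. It is not Solid Explorer syntax: whether that app's replace field can pick a different replacement per branch is unknown here, and the helper name normalize_date_prefix is hypothetical. Each branch of the alternation gets its own three groups, and the replacement checks which branch matched.

```ruby
# Groups 1-3 belong to the dd.mm.yyyy branch, groups 4-6 to the yyyy.mm.dd branch.
DATE_PREFIX = /\A(?:(\d{2})\.(\d{2})\.(\d{4})|(\d{4})\.(\d{2})\.(\d{2}))/

# Hypothetical helper: rewrites a leading date to yyyy-mm-dd and leaves the rest alone.
def normalize_date_prefix(name)
  name.sub(DATE_PREFIX) do
    m = Regexp.last_match
    if m[1]
      # dd.mm.yyyy matched: flip the order
      "#{m[3]}-#{m[2]}-#{m[1]}"
    else
      # yyyy.mm.dd matched: only swap dots for hyphens
      "#{m[4]}-#{m[5]}-#{m[6]}"
    end
  end
end

puts normalize_date_prefix("05.07.2020-abc.pdf")
puts normalize_date_prefix("2020.07.05-pqr.pdf")
```

In a tool that only accepts a single static replacement string, two separate search-and-replace passes may still be needed.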

Related

Is there a way to change the way Google Sheets Query Group sorts? By both capitals and letter case?

I have a simple query function that returns a range of names and sums, grouped by the names.
=QUERY('Mamut inklipp'!C:R;"select F, sum(R) group by F";0)
This sorts by the names, but case-sensitively: all of A-Z comes before a-z, so "Eve" comes before "adam". To me that is just plain wrong.
Is there a way to change the sorting method?
You should be able to work around that. Pre-processing the data ('before the query') might be an option. Here's a little example.
I hope that works for you?
Note: Depending on your locale, you may have to use commas instead of semi-colons as argument separators (in the formula).

Ruby Regex: How to match (named) groups inside square brackets?

I'm trying to write a regex in Ruby that will parse various date/time formats. The entire regex looks like this:
/^(?<year>\d{4})\-(?<month>\d{2})\-(?<day>\d{2})(T(?<hour>\d{2})(:(?<minute>\d{2})(:(?<second>\d{2}(\.\d{1,3})?))?)?)?(?<offset>[+-]\d{2}:\d{2})?$/
I'm using named groups so that I can fetch the matching parts out of the match object just using the simple names like "year", "month", "day", etc. This regex is working fine, but let's focus on the "offset" at the end of this:
(?<offset>[+-]\d{2}:\d{2})?
The problem is that I'm trying to add the ability to interpret a "Z" on the end of the string to denote UTC time (aka Zulu Time). This "Z" should be mutually exclusive with the offset. Here's some of the ways I've tried it:
(?<offset>[Z([+-]\d{2}:\d{2})])?
(?<offset>[(Z)([+-]\d{2}:\d{2})])?
[(?<zulu>Z)(?<offset>[+-]\d{2}:\d{2})]?
None of these work. In the first two cases, it can interpret date strings ending in "Z", but it can no longer interpret date strings ending with actual offsets like "-07:00". In the third case, the named groups "zulu" and "offset" are just totally missing from the match object.
I think the issue is that I'm trying to use square brackets to denote [(ThisGroup)(OrThisGroup)]? but I don't think the regex engine appreciates having groups inside of square brackets. How do I tell the regex engine to allow and capture "group A or group B or neither, but not both"?
Square brackets are used for "exactly one of any of these characters" -- that's not what you need here. Pattern-level alternation is done via the | operator: (hello|goodbye) world will match either hello world or goodbye world.
(?<offset>Z|[+-]\d{2}:\d{2})?
Specifically to parse a datetime, though, I suggest preferring DateTime.parse (plus to_time, if you need a Time instance). And if that isn't sufficiently flexible, consider the chronic gem.
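To see the alternation in context, here is a small sketch that drops the corrected offset group into the full pattern from the question. The constant name ISO_LIKE and the sample strings are made up for illustration.

```ruby
# The question's pattern, with the offset group changed to "Z or a numeric offset".
ISO_LIKE = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})(T(?<hour>\d{2})(:(?<minute>\d{2})(:(?<second>\d{2}(\.\d{1,3})?))?)?)?(?<offset>Z|[+-]\d{2}:\d{2})?$/

zulu    = ISO_LIKE.match("2020-07-05T12:30:00Z")
numeric = ISO_LIKE.match("2020-07-05T12:30:00-07:00")
bare    = ISO_LIKE.match("2020-07-05")

p zulu[:offset]     # "Z"
p numeric[:offset]  # "-07:00"
p bare[:offset]     # nil, because the whole group is optional
```

Because both alternatives live in one optional group, a string can carry "Z", a numeric offset, or neither, but never both.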

return line of strings between two strings in a ruby variable

I would like to extract a line of strings but am having difficulties using the correct RegEx. Any help would be appreciated.
String to extract: KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005
For some reason Stack Overflow won't let me cut and paste the XML data here since it includes "<>" characters. Basically I am trying to extract the data between "raw_text" ... "/raw_text" from an XML document that will always be formatted like the following: http://www.aviationweather.gov/adds/dataserver_current/httpparam?dataSource=metars&requestType=retrieve&format=xml&hoursBeforeNow=3&mostRecent=true&stationString=PHNL%20KSEA
However, the Station name, in this case "KSEA" will not always be the same. It will change based on user input into a search variable.
Thanks in advance
If I can assume that every string you want starts with KSEA, then the answer would be:
.*(KSEA.*?)KSEA.*
The ? after .* makes it lazy, so it matches as little as possible.
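Since the station name changes with user input, anchoring on the surrounding tags rather than on KSEA may be easier. The sketch below assumes the tags are literally <raw_text> and </raw_text>, as the question describes; the surrounding XML structure and the second entry are placeholders, not real feed data.

```ruby
# Hypothetical stand-in for the downloaded XML; only the raw_text tags matter here.
xml = <<~XML
  <data>
    <METAR>
      <raw_text>KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005</raw_text>
    </METAR>
    <METAR>
      <raw_text>PHNL placeholder report</raw_text>
    </METAR>
  </data>
XML

# Lazy .*? stops at the first closing tag, so each report is captured separately.
raw_texts = xml.scan(%r{<raw_text>(.*?)</raw_text>}m).flatten

puts raw_texts
```

For anything beyond a quick extraction, a real XML parser is likely the more robust choice.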

Ruby (on Rails) Regex: removing thousands comma from numbers

This seems like a simple one, but I am missing something.
I have a number of inputs coming in from a variety of sources and in different formats.
Number inputs
123
123.45
123,45 (note the comma used here to denote decimals)
1,234
1,234.56
12,345.67
12,345,67 (note the comma used here to denote decimals)
Additional info on the inputs
Numbers will always be less than 1 million
EDIT: These are prices, so will either be whole integers or go to the hundredths place
I am trying to write a regex and use gsub to strip out the thousands comma. How do I do this?
I wrote a regex: myregex = /\d+(,)\d{3}/
When I test it in Rubular, it shows that it captures the comma only in the test cases that I want.
But when I run gsub, I get an empty string: inputstr.gsub(myregex,"")
It looks like gsub is capturing everything, not just the comma in (). Where am I going wrong?
result = inputstr.gsub(/,(?=\d{3}\b)/, '')
This removes a comma only if exactly three digits follow it.
(?=...) is a lookahead assertion: the pattern inside it must match at the current position, but it does not become part of the text that is actually matched (and subsequently replaced).
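As a quick check, here is the lookahead applied to the sample inputs listed in the question (the variable names are only for illustration):

```ruby
inputs = %w[123 123.45 123,45 1,234 1,234.56 12,345.67 12,345,67]

# Only the comma itself is matched; the three digits after it are merely checked.
cleaned = inputs.map { |s| s.gsub(/,(?=\d{3}\b)/, "") }

inputs.zip(cleaned).each { |before, after| puts "#{before} -> #{after}" }
```

The decimal commas in 123,45 and 12,345,67 survive because only two digits follow them.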
You are confusing "match" with "capture": to "capture" means to save something so you can refer to it later. You want to capture not the comma, but everything else, and then use the captured portions to build your substitution string.
Try
myregex = /(\d+),(\d{3})/
inputstr.gsub(myregex,'\1\2')
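A small side-by-side sketch of the difference, using one of the question's sample values:

```ruby
inputstr = "1,234.56"

# gsub replaces the whole match ("1,234"), not just the captured comma.
whole_match_removed = inputstr.gsub(/\d+(,)\d{3}/, "")

# Capturing the digits on both sides lets the replacement put them back.
comma_removed = inputstr.gsub(/(\d+),(\d{3})/, '\1\2')

p whole_match_removed  # ".56"
p comma_removed        # "1234.56"
```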
In your example, it is possible to tell from the number of digits after the last separator (either , or .) that it is a decimal point, since it is followed by only 2 digits. In most cases, if the last group of digits does not have 3 digits, you can assume that the separator in front of it is a decimal point. Another clue: a separator that appears more than once in a large number must be a thousands separator, not a decimal point.
However, I can give you a string such as 123,456 or 123.456 without any sort of context. It is impossible to tell whether it means "123 thousand 456" or "123 point 456".
You need to scan the document for clues as to whether , is used as the thousands separator or the decimal point, and vice versa for the . character. Once that context is known, you can safely apply the same method to remove the thousands separators.
You may also want to check out this article on Wikipedia on the less common ways to specify separators or decimal points. Knowing and deciding not to support is better than assuming things will work.
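The digit-count heuristic described above could be sketched roughly as follows. The helper name decimal_separator is hypothetical, and returning nil for a three-digit tail is an assumption that simply means "no decimal separator can be inferred from this string alone".

```ruby
# Hypothetical helper: guesses which character acts as the decimal separator.
def decimal_separator(str)
  m = str.match(/([.,])(\d+)\z/)  # last separator and the digits after it
  return nil unless m             # no separator at all
  return nil if m[2].length == 3  # could be thousands or decimal: undecidable here
  m[1]
end

%w[123.45 12,345,67 1,234 123,456 123].each do |s|
  puts "#{s}: #{decimal_separator(s).inspect}"
end
```

A three-digit tail such as 123,456 stays undecided on purpose; resolving it needs context from the rest of the data.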

Using Regex to grab multiple values from a string and drop them into an array?

Trying to grab the two $ values and the X value from this string in Ruby/watir:
16.67%: $xxx.xx down, includes the Policy Fee, and x installments of $xxx.xx
So far I've got:
16.67%:\s+\$(\d+.\d{2})
which grabs the first xxx.xx fine, what do I need to add to it to grab the last two variables and load this all into an array?
You can use the following, but regex may be unnecessary if the surrounding text is always the same:
\$(\d+\.\d{2}).*?(\d+) installments.*?\$(\d+\.\d{2})
http://www.rubular.com/r/sk5wO3fyZF
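To get the three values into an array, the captures of a single match can be used directly. This is a minimal sketch; the sample amounts and installment count are invented to stand in for the xxx.xx and x placeholders.

```ruby
# Made-up values in place of the placeholders from the question.
line = "16.67%: $123.45 down, includes the Policy Fee, and 6 installments of $67.89"

pattern = /\$(\d+\.\d{2}).*?(\d+) installments.*?\$(\d+\.\d{2})/

# captures returns the three groups as an array; fall back to [] when nothing matches.
values = line.match(pattern)&.captures || []

p values  # ["123.45", "6", "67.89"]
```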
If you know that the text in between will always be the same, you could just use:
16\.67%:\s+\$(\d+\.\d{2}) down, includes the Policy Fee, and (\d+) installments of \$(\d+\.\d{2})
You'd better use scan.
sub(/.*%/, '').scan(/\$?([\d\.]+)/)
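Applied to a concrete string, the sub-then-scan approach looks roughly like this (the sample amounts are invented for illustration):

```ruby
# Made-up values in place of the placeholders from the question.
line = "16.67%: $123.45 down, includes the Policy Fee, and 6 installments of $67.89"

# Drop everything up to the percent sign, then collect every run of digits and dots.
values = line.sub(/.*%/, "").scan(/\$?([\d.]+)/).flatten

p values  # ["123.45", "6", "67.89"]
```

Note that this picks up any digit or dot run in the remaining text, so it relies on the sentence containing no other numbers or stray periods.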
Have you considered just splitting the string on the $ character, then manipulating what you get with a regex or basic string commands?
/\$(\d+\.\d{2}).+\$(\d+\.\d{2})/ should do it for the two dollar amounts. It won't matter what text is in between, only that there are two "$" in the sentence.
