Regex that doesn't accept spaces in between numbers - ruby

I am trying to parse lines like this:
0: abc 0.5
1: a 16.1,3
2: b 0.9,2.3
3: c -19.645
7:
which are in the format:
Number:[space][up to 4 letters from the range ABCD][space][comma separated numbers that could be decimal and/or negative]
with the ruby command below
if line =~ /^(\d*): [abcd]{0,4} ((\-?)(\d*).(\d*))*/ then
  # do x
else
  # do y
end
However, it also matches the strings below, which I don't want it to, since they have " " or ":" in between numbers instead of ",".
4: d 0.8 16.56
5: d 0.9:5.0
How can I modify my regex to make it work for only comma separators?
Edit: The Rubular link if you would like to edit the Regular Expression is as follows: http://rubular.com/r/8Z9Eeu27i5

If I understand correctly, this one should work:
^\d+:\s[a-zA-Z]+\s(-?(\d+\.)?\d+,)*(-?(\d+\.)?\d+)$
EDIT:
If 7:[space][space] is valid too, then use this one:
^\d+:\s[a-zA-Z]*\s(-?(\d+\.)?\d+,)*(-?(\d+\.)?\d+)?$
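As a quick check, the sketch below runs both patterns against the sample lines from the question. The patterns are copied unchanged from the answer; the constant names STRICT and LENIENT and the sample list are only illustrative.

```ruby
# Patterns copied from the answer; the constant names are illustrative only.
STRICT  = /^\d+:\s[a-zA-Z]+\s(-?(\d+\.)?\d+,)*(-?(\d+\.)?\d+)$/
LENIENT = /^\d+:\s[a-zA-Z]*\s(-?(\d+\.)?\d+,)*(-?(\d+\.)?\d+)?$/

samples = [
  "0: abc 0.5", "1: a 16.1,3", "2: b 0.9,2.3", "3: c -19.645",
  "4: d 0.8 16.56", "5: d 0.9:5.0", "7:  "
]

# Print, for every sample, whether each pattern accepts it.
samples.each do |line|
  puts format("%-18s strict=%-5s lenient=%s",
              line.inspect, line.match?(STRICT), line.match?(LENIENT))
end
```

The lines with a space or a colon between the numbers are rejected by both patterns, and the "7:" line followed by two spaces is accepted only by the second one.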

Related

Checking if a text file is formatted in a specific way

I have a text file which contains instructions. I'm reading it using File.readlines(filename). I want to check that the file is formatted as follows:
Has 3 lines
Line 1: two integers (including negatives) separated by a space
Line 2: two integers (including negatives) separated by a space and 1 capitalised letter of the alphabet also separated by a space.
Line 3: capitalised letters of the alphabet without any spaces (or punctuation).
This is what the file should look like:
8 10
1 2 E
MMLMRMMRRMML
So far I have calculated the number of lines using File.readlines(filename).length. How do I check the format of each line? Do I need to loop through the file?
EDIT:
I solved the problem by creating three methods containing regular expressions, then passing each line into its method and using a conditional statement to check whether the output was true.
Suppose IO::read is used to return the following string str.
str = <<~END
8 10
1 2 E
MMLMRMMRRMML
END
#=> "8 10\n1 2 E\nMMLMRMMRRMML\n"
You can then test the string with a single regular expression:
r = /\A(-?\d+) \g<1>\n\g<1> \g<1> [A-Z]\n[A-Z]+\n\z/
str.match?(r)
#=> true
I could have written
r = /\A-?\d+ -?\d+\n-?\d+ -?\d+ [A-Z]\n[A-Z]+\n\z/
but matching an integer (-?\d+) is done four times. It's slightly shorter, and reduces the chance of error, to put the first of the four in capture group 1, and then treat that as a subexpression by calling it with \g<1> (not to be confused with a back-reference, which is written \k<1>). Alternatively, I could have used named capture groups:
r = /\A(?<int>-?\d+) \g<int>\n\g<int> \g<int> (?<cap>[A-Z])\n\g<cap>+\n\z/
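To see the pattern reject a badly formatted file as well as accept a good one, here is a small sketch that wraps the named-group version in a method. The constant and method names are hypothetical, and the strings stand in for what IO::read would return.

```ruby
# Named-group pattern from the answer; the names below are illustrative.
INSTRUCTIONS = /\A(?<int>-?\d+) \g<int>\n\g<int> \g<int> (?<cap>[A-Z])\n\g<cap>+\n\z/

# Returns true only if the whole text has the three expected lines.
def valid_instructions?(text)
  text.match?(INSTRUCTIONS)
end

good = "8 10\n1 2 E\nMMLMRMMRRMML\n"
bad  = "8 10\n1 2 e\nMMLMRMMRRMML\n" # lowercase letter on line 2

puts valid_instructions?(good)
puts valid_instructions?(bad)
```

Because the pattern is anchored with \A and \z, a missing or extra line makes the whole check fail, so no separate line count is needed.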

What's the efficient way of checking the format of file by Ruby?

I have a file like:
Fruit.Store={
#blabla
"customer-id:12345,item:store/apple" = (1,2); #blabla
"customer-id:23456,item:store/banana" = (1,3); #blabla
"customer-id:23456,item:store/watermelon" = (1,4);
#blabla
"customer-id:67890,item:store/watermelon" = (1,6);
#The following two are unique
"customer-id:0000,item:store/" = (100, 100);
#
"" = (0,0)
};
Except for the comments, each line has the same format: customer-id and item:store/ are fixed, and customer-id is a 5-digit number. The last two records are unique. How can I elegantly make sure the file is in the right format? I am thinking about using a flag for the first special line Fruit.Store={, then splitting each following line by "," and "=", and if the split line is not correct, matching it against the last two records. I want to use Ruby for it. Any advice? Thank you.
I am also thinking about using regular expression for the format, and wrote:
^"customer:\d{5},item:store\/\D*"=\(\d*,\d*\);
but I want to combine these two situations (with comment and without comment):
^"customer:\d{5},item:store\/\D*"=\(\d*,\d*\);$
^"customer:\d{5},item:store\/\D*"=\(\d*,\d*\);#.*$
how could I do it? Thanks
Using regular expressions could be a good option since each line has a fixed format; and you almost got it, your regex just needed a few tweaks:
^(?:#.*|"customer-id:\d{5},item:store\/\D*" *= *\(\d*, *\d*\); *(?:#.*)?)$
This is what was added to your current regex:
Option to be a comment line (#.*) or (|) a regular line (everything after |).
Check for possible spaces before and after =, after the comma (,) that separates the digits in parenthesis, and at the end of the line.
Option to include another comment at the end of the line ((?:#.*)?).
So just compare each line against this regex to check for the right format.
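A minimal sketch of that per-line comparison follows. It uses a variant of the pattern anchored to the whole line with \A and \z (an assumption on my part, so that a stray # inside a malformed record does not make the line pass), and the helper name and sample text are hypothetical. Note that the pattern does not cover the Fruit.Store={ header, the closing };, or the two special records, so those would need their own checks.

```ruby
# Variant of the pattern above, anchored to the whole (chomped) line.
RECORD_OR_COMMENT =
  /\A(?:#.*|"customer-id:\d{5},item:store\/\D*" *= *\(\d*, *\d*\); *(?:#.*)?)\z/

# Hypothetical helper: returns the 1-based numbers of lines that fail the check.
def malformed_line_numbers(text)
  text.each_line.with_index(1)
      .reject { |line, _number| line.chomp.match?(RECORD_OR_COMMENT) }
      .map { |_line, number| number }
end

sample = <<~'TEXT'
  #blabla
  "customer-id:12345,item:store/apple" = (1,2); #blabla
  "customer-id:23456,item:store/watermelon" = (1,4);
  "customer-id:0000,item:store/" = (100, 100);
  "" = (0,0)
TEXT

p malformed_line_numbers(sample)
```

On this sample the two special records (lines 4 and 5) are reported, which is where the fallback comparison described in the question would take over.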

Ruby Regular Expression String Matching t =~ /^\d{2}(:\d{2}){2}$/ [duplicate]

I found this from a code challenge:
def time_correct(t)
return unless t =~ /^\d{2}(:\d{2}){2}$/
end
it is used to find out whether e.g. "0;:44:07" is a regular time string ("HH:MM:SS") or not.
I don't understand the regex though. Can someone explain the /^\d{2}(:\d{2}){2}$/ to me please? Thanks!
On /^\d{2}(:\d{2}){2}$/:
/.../ delimits the regular expression.
^ matches the start of a line. In Ruby this is always the case; in flavors such as PCRE it matches only the beginning of the string unless multi-line mode is enabled.
\d matches one digit.
{2} states that the preceding element, \d, must match exactly 2 times.
(...) delimits a capture group. It groups things together like parentheses in math, and also lets you refer to the captured text later using \i, where i is the index of the group. For example, in (a)(b), a is group 1 and b is group 2.
: matches a literal colon, and the \d{2} after it works as explained in steps 3 and 4.
{2} works the same as in step 4, but here the preceding element is the capture group (:\d{2}), which must also repeat 2 times.
$ matches the end of a line. In Ruby this is always the case; in flavors such as PCRE it matches only at the end of the string (or just before a final newline) unless multi-line mode is enabled.
When ^ and $ match at line boundaries (always in Ruby, or with multi-line mode enabled elsewhere), your expression matches lines like:
22:33:44
02:33:44
But not lines like:
22:33:44 d
d 22:33:44
f 02:33:44 f
When ^ and $ match only at the string boundaries (multi-line mode disabled in a flavor such as PCRE), your expression only matches a string that consists of a single valid time:
22:33:44
But it matches nothing in a string with two valid lines:
22:33:44
02:33:44
In Ruby, use \A and \z instead of ^ and $ to get this whole-string behavior.
This is a link for live testing: https://regex101.com/r/cdSdt4/1
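The difference between line anchors and string anchors in Ruby can be checked directly. In the sketch below the constant names are illustrative; the first pattern is the one from the question and the second is the same pattern with \A and \z swapped in.

```ruby
# Pattern from the question (line anchors) and a string-anchored variant.
LINE_ANCHORED   = /^\d{2}(:\d{2}){2}$/
STRING_ANCHORED = /\A\d{2}(:\d{2}){2}\z/

puts "22:33:44".match?(LINE_ANCHORED)              # => true
puts "22:33:44 d".match?(LINE_ANCHORED)            # => false
puts "0;:44:07".match?(LINE_ANCHORED)              # => false

# A valid line hidden inside a longer string still matches with ^ and $ ...
puts "junk\n22:33:44".match?(LINE_ANCHORED)        # => true
# ... but not when the whole string has to be the time.
puts "junk\n22:33:44".match?(STRING_ANCHORED)      # => false
puts "22:33:44\n02:33:44".match?(STRING_ANCHORED)  # => false
```

For validating a single user-supplied value, the string-anchored form is usually the safer choice.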

How do I concatenate lines from a text file into one big string?

I have an input file that looks like this (without such big spaces between lines):
3 4
ATCGA
GACTTACA
AACTGTA
ATC
...and I need to concatenate all lines except for the first "3 4" line. Is there a simple solution? I've tried manipulating getline() somehow, but that has not worked for me.
Edit: The number of lines will not be known in advance, so it will have to be done recursively.
If you want to concatenate two lines into one, you can simply use the + operator,
e.g.:
String a = "WAQAR MUGHAL";
String b = "check";
System.out.println(a + b);
System.out.println("WAQAR MUGHAL" + "CHECK");
Output:
WAQAR MUGHALcheck
WAQAR MUGHALCHECK
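For the task actually described in the question (join every line except the first, with the number of lines unknown in advance), here is a minimal sketch in Ruby, the language used elsewhere on this page rather than the question's own. The method name and sample text are illustrative assumptions; with a real file the lines could come from File.foreach instead of a string.

```ruby
# Hypothetical helper: joins every line after the first into one string.
def concatenate_after_header(text)
  text.each_line      # iterate lazily, however many lines there are
      .drop(1)        # skip the "3 4" header line
      .map(&:chomp)   # strip the trailing newline from each line
      .join
end

input = "3 4\nATCGA\nGACTTACA\nAACTGTA\nATC\n"
puts concatenate_after_header(input)
```

No recursion is needed: iterating over the lines handles any line count.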

Regular expression to match my pattern of words, wild chars

Can you help me with this?
I want a regular expression for my Ruby program to match a word with the pattern below.
The pattern has:
A list of letters (for example, ABCC => 1 A, 1 B, 2 C)
N wild card characters (N can be 0, 1 or 2)
A fixed word (for example "XY").
Rules:
Regarding the List of letters, it should match words with
a. 0 or 1 A
b. 0 or 1 B
c. 0 or 1 or 2 C
Based on the value of N, there can be 0 or 1 or 2 wild chars
Fixed word is always in the order it is given.
The combination of all these can be in any order and should match words like below
ABWXY ( if wild char = 1)
BAXY
CXYCB
But not words with 2 A’s or 2 B’s
I am using the pattern like ^[ABCC]*.XY$
But it also matches words with more than 1 A, more than 1 B or more than 2 C's, and it only matches words which end with XY. I want all words which have XY in any place and the letters and wild chars in any position.
If it HAS to be a regex, the following could be used:
if subject =~
/^ # start of string
(?!(?:[^A]*A){2}) # assert that there are less than two As
(?!(?:[^B]*B){2}) # and less than two Bs
(?!(?:[^C]*C){3}) # and less than three Cs
(?!(?:[ABCXY]*[^ABCXY]){3}) # and less than three non-ABCXY characters
(?=.*XY) # and that XY is contained in the string.
/x
# Successful match
else
# Match attempt failed
end
This assumes that none of the characters A, B, C, X, or Y are allowed as wildcards.
I consider myself to be fairly good with regular expressions and I can't think of a way to do what you're asking. Regular expressions look for patterns and what you seem to want is quite a few different patterns. It might be more appropriate in your case to write a function which splits the string into characters and counts what you have, so you can check your criteria.
Just to give an example of your problem, a regex like /[abc]/ will match every single occurrence of a, b and c regardless of how many times those letters appear in the string. You can try /c{1,2}/ and it will match "c", "cc", and "ccc". It matches the last case because the pattern is not anchored, so it is satisfied by the one or two c's it finds inside "ccc".
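The behavior described above is easy to confirm in irb. This is only an illustration of unanchored versus anchored matching, not a solution to the question:

```ruby
# An unanchored pattern only needs to match somewhere inside the string.
puts "ccc".match?(/c{1,2}/)        # => true
puts "ccc"[/c{1,2}/]               # => cc  (the part that actually matched)

# Anchored, the whole string must consist of one or two c's.
puts "ccc".match?(/\Ac{1,2}\z/)    # => false

# A character class matches each occurrence, however often a letter repeats.
puts "aabbc".scan(/[abc]/).length  # => 5
```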
One thing I have found invaluable when developing and debugging regular expressions is rubular.com. Try some examples and I think you'll see what you're up against.
I don't know if this is really any help but it might help you choose a direction.
You need to break out your pattern properly. In regexp terms, [ABCC] means "any one of A, B or C" where the duplicate C is ignored. It's a set operator, not a grouping operator like () is.
What you seem to be describing is creating a regexp based on parameters. You can do this by passing a string to Regexp.new and using the result.
An example is roughly:
def match_for_options(options)
  # Build the pattern as a mutable string, then compile it.
  pattern = +'^'
  pattern << 'A' * options[:a] if options[:a]
  pattern << 'B' * options[:b] if options[:b]
  pattern << 'C' * options[:c] if options[:c]
  Regexp.new(pattern)
end
You'd use it something like this:
if (match_for_options(:a => 1, :c => 2).match('ACC'))
# ...
end
Since you want to allow these "elements" to appear in any order, you might be better off writing a bit of Ruby code that goes through the string from beginning to end and counts the number of As, Bs, and Cs, finds whether it contains your desired substring. If the number of As, Bs, and Cs, is in your desired limits, and it contains the desired substring, and its length (i.e. the number of characters) is equal to the length of the desired substring, plus # of As, plus # of Bs, plus # of Cs, plus at most N characters more than that, then the string is good, otherwise it is bad. Actually, to be careful, you should first search for your desired substring and then remove it from the original string, then count # of As, Bs, and Cs, because otherwise you may unintentionally count the As, Bs, and Cs that appear in your desired string, if there are any there.
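The counting approach described above could be sketched as follows. The method name, the keyword parameters and their defaults are all hypothetical choices for illustration; the logic follows the steps in the paragraph: find and remove the fixed word first, count the listed letters in what is left, and allow at most N further characters.

```ruby
# Hypothetical helper implementing the counting approach described above.
# limits:    maximum number of occurrences allowed per listed letter
# fixed:     the word that must appear, in order
# wildcards: maximum number of extra (wild card) characters
def pattern_word?(word, limits: { 'A' => 1, 'B' => 1, 'C' => 2 },
                  fixed: 'XY', wildcards: 1)
  # The fixed word must be present; remove its first occurrence before
  # counting so that its own letters are not counted by mistake.
  return false unless word.include?(fixed)
  rest = word.sub(fixed, '')

  listed = 0
  limits.each do |letter, max|
    n = rest.count(letter)
    return false if n > max # too many of a listed letter
    listed += n
  end

  # Whatever is left over must be covered by the wild cards.
  (rest.length - listed) <= wildcards
end

puts pattern_word?('ABWXY')                # one wild card (W)
puts pattern_word?('CXYCB', wildcards: 0)
puts pattern_word?('AABXY')                # two A's
```

Treating a surplus A, B or C as a failure rather than as a wild card is an assumption taken from the question's "not words with 2 A's or 2 B's" rule.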
You can do what you want with a regular expression, but it would be a long ugly regular expression. Why? Because you would need a separate "case" in the regular expression for each of the possible orders of the elements. For example, the regular expression "^ABC..XY$" will match any string beginning with "ABC" and ending with "XY" and having two wild card characters in the middle. But only in that order. If you want a regular expression for all possible orders, you'd need to list all of those orders in the regular expression, e.g. it would begin something like "^(ABC..XY|ACB..XY|BAC..XY|BCA..XY|" and go on from there, with about 5! = 120 different orders for that list of 5 elements, then you'd need more for the cases where there was no A, then more for cases where there was no B, etc. I think a regular expression is the wrong tool for the job here.
