I am working with regular expression matching and I'm trying to figure out how to exclude a specific character pattern. Specifically, I want to exclude the following pattern:
5 - # in words: digit, space, dash, space
I know how to exclude the components individually: [^5 ^-] but I'm looking to exclude the specific pattern. Is this possible?
Update - I'm using Ruby as my programming language.
Here is some sample input and desired output:
Input: 1 - Blue-Stork Stables; 2 - Young, Robert, S.; 3 - Seahorse Stable; 4 - Carney, Elvis; 5 - Guerrero, Juan, Carlos-Martin; 6 - Dubb, Michael; 7 - Summers, Hope; 8 - DTH Stables; 9 - Peebles, Matthew\n
The desired output would be:
Output: Blue-Stork Stables; Young, Robert, S.; Seahorse Stable; Carney, Elvis; Guerrero, Juan, Carlos-Martin; Dubb, Michael; Summers, Hope; DTH Stables; Peebles, Matthew\n
Please take note of the dashes on Blue-Stork Stables and Juan Carlos-Martin.
EDIT: So you mean "remove", not "exclude". No problem:
result = subject.gsub(/\d+ - /, '')
transforms your input into the desired output. I've taken the liberty of allowing more than one digit (after all, if the numbers reach 10 or higher, you probably want to remove those entirely, too).
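As a quick check, the substitution can be run against a shortened version of the sample input. This is only a minimal sketch: the helper name strip_entry_numbers is made up for illustration, and the pattern is the one from the answer.

```ruby
# Hypothetical helper name; the regex is the one given in the answer.
def strip_entry_numbers(subject)
  # Remove every "<digits><space><dash><space>" run. Hyphens inside names
  # (Blue-Stork, Carlos-Martin) are not preceded by "digit + space", so they stay.
  subject.gsub(/\d+ - /, '')
end

input = "1 - Blue-Stork Stables; 2 - Young, Robert, S.; 10 - Guerrero, Juan, Carlos-Martin\n"
puts strip_entry_numbers(input)
# prints: Blue-Stork Stables; Young, Robert, S.; Guerrero, Juan, Carlos-Martin
```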
(Old answer for "historical reasons")
Depending on what you mean by "exclude", it appears that you're looking for negative lookahead assertions:
^(?!.*\d - )
will fail on strings that contain a digit followed by space, dash, space (such as 5 - ) anywhere and succeed on all other strings:
"5 - " // fail
"5 -" // match
"abc5 - xyz" // fail
"foobar5 - " // fail
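Since the question is about Ruby, the four cases above can be reproduced with a short sketch. The constant name NO_DIGIT_DASH is an assumption made for this example; the pattern itself is the one shown above.

```ruby
# Hypothetical constant name; the pattern is the lookahead from the answer.
NO_DIGIT_DASH = /^(?!.*\d - )/

["5 - ", "5 -", "abc5 - xyz", "foobar5 - "].each do |s|
  # match? returns true only when no "digit, space, dash, space" run
  # occurs anywhere in the (single-line) string.
  puts "#{s.inspect}: #{s.match?(NO_DIGIT_DASH) ? 'match' : 'fail'}"
end
```

Note that in Ruby ^ matches at the start of every line, so for multi-line input \A would be the stricter anchor.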
Related
Record layout contains two fields:
Requisition
Test Names
Example record:
R00000001,"4 Calprotectin, 1 Luminex xTAG, 8 H. pylori stool antigen (IgA), 9 Lactoferrin, 3 Anti-gliadin IgA, 10 H. pylori Panel, 6 Fecal Fat, 11 Antibiotic Resistance Panel, 2 C. difficile Tox A/ Tox B, 5 Elastase, 7 Fecal Occult Blood, 12 Shigella"
The current Ruby code snippet that is used in the LIMS (Lab Info Management System) is this:
subj.get_value('Tests').join(', ')
What I need to be able to do in the Ruby code snippet is create a new record off each comma-separated value in the second field.
NOTE:
The number of values in the 'Test Names' field varies from 1 to 20, or more.
There can be hundreds of Requisition records.
Final result would be:
R00000001,"4 Calprotectin"
R00000001,"1 Luminex xTAG"
R00000001,"8 H. pylori stool antigen (IgA)"
R00000001,"9 Lactoferrin"
R00000001,"3 Anti-gliadin IgA"
R00000001,"10 H. pylori Panel"
R00000001,"6 Fecal Fat"
R00000001,"11 Antibiotic Resistance Panel"
R00000001,"2 C. difficile Tox A/ Tox B"
R00000001,"5 Elastase"
R00000001,"7 Fecal Occult Blood"
R00000001,"12 Shigella"
If your data is reliably a string in the format shown in your example, here's a method:
data = subj.get_value('Tests').join(', ') # assuming this gives you the record string
def split_data(data)
  arr = data.gsub('"', '').split(',')
  # arr[0] is the requisition ID; emit one line per remaining test name
  arr[1..-1].map { |l| "#{arr[0]},\"#{l.strip}\"" }
end
puts split_data(data)
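For trying the idea outside the LIMS, here is a self-contained sketch that works on a literal record string instead of subj.get_value. The helper name explode_record is hypothetical, and the sketch assumes that no test name itself contains a comma.

```ruby
# Hypothetical helper: one output line per test, each prefixed with the
# requisition ID. Assumes test names never contain a comma.
def explode_record(record)
  # Split only on the first comma, so the quoted test list stays in one piece.
  requisition, tests = record.split(',', 2)
  tests.delete('"').split(',').map { |t| %(#{requisition},"#{t.strip}") }
end

record = 'R00000001,"4 Calprotectin, 1 Luminex xTAG, 8 H. pylori stool antigen (IgA)"'
puts explode_record(record)
```

Splitting with a limit of 2 keeps the requisition ID separate from the test list, which avoids indexing into the array afterwards.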
I'm trying to write a Regex in Ruby for a shipping query.
If postcodes match MK1 - MK10, MK19, MK43, MK46 or MK77, then allow it.
If postcodes match NN1 - NN7, NN12, NN13, NN29 or NN77, then allow it.
If postcodes match MK11 - MK18 then don't allow it.
My trouble is that in the UK our postcodes are a bit funny where you can put MK1 1TS and MK11TS and they're considered the same. By not allowing MK11, MK11TY could be misread as MK11.
I've written a regex below, and so far it will disallow MK111TS and MK11\s1TS, and allow MK1\s1TS but not MK11TS. Any help would be greatly appreciated, I've only tested this for MK11 so far.
^((?!MK11\d).)*$&^((?!MK11\s\d).)*$|(MK(1 |2 |3 |4 |5 |6 |7 |8 |9 |10 ))|(MK19)|(MK43)|(MK46)|(MK77)|(NN1)|(NN2)|(NN3)|(NN4)|(NN5)|(NN6)|(NN7)|(NN12)|(NN13)|(NN29)|(NN77)
Thanks in advance.
r = /
(?: # begin non-capture group
MK # match characters
(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77) # match one of the choices
| # or
NN # match characters
(?:1|2|3|4|5|6|7|12|13|29|77) # match one of the choices
) # end non-capture group
(?![^\sA-Z]) # negative lookahead: the next char, if any, must be whitespace or a letter A-Z
/ix # case-indifferent and free-spacing regex definition mode
This is conventionally written
r = /(?:MK(?:1|2|...|10|19|...|77)|NN(?:1|2|...|7|12|13|29|77))(?![^\sA-Z])/i
"MK4 abc def MK11MK19ghi NN6 jkl NN13 NN29NN77".scan(r)
# => ["MK4", "NN6", "NN13", "NN29", "NN77"]
"MK11" is not matched because "11" is not in the list. "MK19" is not matched because it is followed by a character that is neither a space nor a capital letter.
Alternatively, one could write
s = (['MK'].product(%w{1 2 3 4 5 6 7 8 9 10 19 43 46 77}).map(&:join) +
['NN'].product(%w{1 2 3 4 5 6 7 12 13 29 77}).map(&:join)).join('|')
# => "MK1|MK2|...|MK10|MK19|MK43|MK46|MK77|NN1|NN2|...|NN7|NN12|NN13|NN29|NN77"
r = /(?:#{s})(?![^\sA-Z])/i
#=> /(?:MK1|MK2|...|MK10|MK19|...|MK77|NN1|NN2|...|NN7|NN12|NN13|NN29|NN77)(?![^\sA-Z])/
If the remainder of the postal code is to be included in the regex, perhaps something like the following could be done.
suffixes = %w|ES AB CD EF|.join('|')
#=> "ES|AB|CD|EF"
Then replace (?![^\sA-Z])/ix with the following.
\s? # optionally match a space
(?:#{suffixes}) # match a valid suffix in a non-capture group
(?!\S) # do not match a non-whitespace char (negative lookahead)
/ix # case-indifferent and free-spacing regex definition mode
Note the negative lookahead is satisfied if the suffix is at the end of the string.
Now I have written the following to match the postcodes format exactly:
#format: Area Code, Localities accepted, whitespace (or not), any digit, any single letter, any single letter
((MK|mk|Mk|mK)(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77)\s\d[A-Za-z][A-Za-z]) #with whitespace
|
((MK|mk|Mk|mK)(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77)\d[A-Za-z][A-Za-z]) #without whitespace
|
((NN|nn|Nn|nN)(?:1|2|3|4|5|6|7|12|13|29|77)\s\d[A-Za-z][A-Za-z]) #with whitespace
|
((NN|nn|Nn|nN)(?:1|2|3|4|5|6|7|12|13|29|77)\d[A-Za-z][A-Za-z]) #without whitespace
This works for my purposes. I got here using Cary's answer, which has been extremely helpful. Thank you; I have upvoted it.
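The four alternatives above can probably be folded into one pattern by making the space optional and using the case-insensitive flag. The following is a sketch under those assumptions, not a full UK postcode validator; the names ALLOWED_POSTCODE and deliverable? are invented for this example.

```ruby
# Hypothetical names. District lists are the ones from the question:
# MK1-MK10, MK19, MK43, MK46, MK77 and NN1-NN7, NN12, NN13, NN29, NN77.
ALLOWED_POSTCODE = /\A(?:MK(?:10|19|43|46|77|[1-9])|NN(?:12|13|29|77|[1-7]))\s?\d[A-Z]{2}\z/i

def deliverable?(postcode)
  # \A and \z force the whole string to be district + optional space +
  # digit + two letters, so "MK11 1TS" cannot sneak through as "MK1".
  postcode.strip.match?(ALLOWED_POSTCODE)
end

["MK1 1TS", "MK11TS", "MK11 1TS", "MK111TS", "nn12 3cd", "NN8 1AA"].each do |code|
  puts "#{code}: #{deliverable?(code)}"
end
```

Anchoring both ends is what resolves the MK1 1TS versus MK11 ambiguity: MK11TS can only be read as district MK1 plus 1TS, while MK111TS and MK11 1TS have no valid reading with an allowed district.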
Here is a small example:
02-09-17 1:01 PM - Some User (Add comments)
Hello,
How are you?
Regards,
02-09-17 3:29 PM - Another User (Add comments)
Hey,
Thanks, all is fine.
Some another text here.
02-09-17 4:30 AM - Just a User (Add comments)
some text
with
multiline
I want to parse and process these three comments. What is the best way to do this?
I tried a regex like this - http://www.rubular.com/r/k1CHJ1STTD - but have problems with the /m flag. Without the multiline flag, the regex can't capture the "body" of a comment.
Also tried to split by regex:
text_above.split(/^(\d{1,2}-\d{1,2}-\d{2} \d{1,2}:\d{1,2} [AP]M - .+ \(Add comments\))/)
=> ["",
"02-09-17 1:01 PM - Some User (Add comments)",
"\n" + "Hello,\n" + "\n" + "How are you?\n" + "\n" + "Regards,\n" + "\n",
"02-09-17 3:29 PM - Another User (Add comments)",
"\n" + "Hey,\n" + "\n" + "Thanks, all is fine.\n" + "\n" + "Some another text here.\n" + "\n",
"02-09-17 4:30 AM - Just a User (Add comments)",
"\n" + "some text\n" + "with\n" + "multiline\n" + "\n",
"02-09-17 5:29 PM - Another User (Add comments)",
"\n" + "Hey,\n" + "\n" + "Thanks, all is fine.\n" + "\n" + "Some another text here.\n" + "\n",
"02-09-17 6:30 AM - Just a User (Add comments)",
"\n" + "some text\n" + "with\n" + "multiline\n"]
But this is not a convenient solution.
Ideally I want to get regex captures with three or two group matches, for example:
1. 02-09-17 1:01 PM
2. Some User (Add comments)
3. Hello,
How are you?
Regards,
for each comment, or, Array of comments:
[['02-09-17 1:01 PM - Some User (Add comments) Hello,
How are you?
Regards,'],[...]]
Any ideas? Thanks.
You can keep it simple using two splits (one for the whole string and one for each block):
text.split(/\n\n(?=\d\d-)/).map { |m| m.split(/ - |\n/, 3) }
You can also use the scan method, but it's a little more involved:
text.scan(/([\d-]+[^-]+) - (.*)\n(.*(?>\n.*)*?(?=\n\n\d\d-|\z))/)
slice_before might be easier to understand than a huge scan, and it has the advantage of keeping the pattern (split removes it)
data = text.each_line.slice_before(/^\d\d\-\d\d\-\d\d/).map do |block|
  time, user = block.shift.strip.split(' - ')
  [time, user, block.join.strip]
end
p data
# [["02-09-17 1:01 PM",
# "Some User (Add comments)",
# "Hello,\n\nHow are you?\n\nRegards,"],
# ["02-09-17 3:29 PM",
# "Another User (Add comments)",
# "Hey,\n\nThanks, all is fine.\n\nSome another text here."],
# ["02-09-17 4:30 AM",
# "Just a User (Add comments)",
# "some text\nwith\nmultiline"]]
You can use this regular expression:
(\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)) - (.*?)\r?\n((?:.|\r?\n)+?)(?=\r?\n\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM) - |$)
(\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)) captures the first group, the date and time. The date must consist of three two-digit numbers separated by dashes, followed by the time with AM/PM.
(.*?)\r?\n((?:.|\r?\n)+?) captures the username up to the first line break (\r?\n) as the second group. After that, anything, including line breaks, is matched and forms the third group, the comment.
On its own, this won't work, because it would treat everything from the beginning of the comment up to the end of the file as one comment. The match therefore has to stop at the next date/time header. Simply repeating the date/time pattern after the comment and matching non-greedily would consume the next header as part of the current match, so it would be missing from the next match (and every second comment would be skipped). To circumvent this, you can use a positive lookahead: (?=\r?\n\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM) - |$). It checks that a date/time header follows, but does not include it in the match. The last comment has no following header, so it must end at the end of the string ($).
You need to use the global flag /g but mustn't use the multi-line flag /m: with /m, $ would match at the end of every line, and each comment would be cut off after its first line.
Here is a live example: https://regex101.com/r/o63GQE/2
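For use from Ruby itself, the pattern probably needs one adjustment: Ruby has no /g flag (String#scan returns all matches), and $ in Ruby matches at the end of every line, so the sketch below swaps in \z for the end of the string. The names STAMP, COMMENT_RE and parse_comments are assumptions made for this example.

```ruby
# Hypothetical names. STAMP is the date/time part of the pattern above,
# kept in a single-quoted string so the backslashes reach the regex intact.
STAMP = '\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)'

# Same structure as the regex above, with \z instead of $ because Ruby's $
# also matches before every newline.
COMMENT_RE = /(#{STAMP}) - (.*?)\r?\n((?:.|\r?\n)+?)(?=\r?\n#{STAMP} - |\z)/

def parse_comments(text)
  # scan yields one [time, user, body] triple per comment.
  text.scan(COMMENT_RE).map { |time, user, body| [time, user, body.strip] }
end

text = "02-09-17 1:01 PM - Some User (Add comments)\nHello,\n\nHow are you?\n\n" \
       "02-09-17 3:29 PM - Another User (Add comments)\nHey,\nThanks.\n"
p parse_comments(text)
```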
I am just new to scripting and I need some help. I have something like a bazillion files that look like this.
Assign F2 Height
3IleN 2.34025e+07
4PheN 2.05028e+07
6LysN 1.43672e+07
7ThrN 1.49120e+07
8LeuN 1.30838e+07
9ThrN 1.44298e+07
I want it to look like this, and to save it in another file with the same name as the original file but with "MOD" written at the beginning:
Number AA Height
3 IleN 6.20756e+07
4 PheN 5.26499e+07
7 ThrN 3.00216e+07
8 LeuN 3.26377e+07
9 ThrN 4.03901e+07
10 GlyN 2.73659e+07
12 ThrN 3.16319e+07
13 IleN 5.94604e+07
If you could please describe and explain the parameters used, that would be of great help.
Thanks!
The following should work for you:
sed 's/^\([0-9]*\)/\1 /' filename
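Reading the command piece by piece: s/pattern/replacement/ is sed's substitute command, ^ anchors the pattern to the start of the line, \( and \) form a capture group, [0-9]* matches a run of digits, and \1 in the replacement re-inserts the captured digits, followed by a space. To also write the result to a file whose name starts with "MOD", a small Ruby script is one option. The sketch below is an assumption-laden illustration, not part of the original answer: both helper names are made up, and it uses \d+ rather than [0-9]* so that the header line, which has no leading digits, is left unchanged.

```ruby
# Hypothetical helper: put a space between the leading residue number and
# the amino-acid label, e.g. "3IleN 2.3e+07" -> "3 IleN 2.3e+07".
def separate_residue_numbers(text)
  text.each_line.map { |line| line.sub(/\A(\d+)/, '\1 ') }.join
end

# Hypothetical helper: write the converted text next to the original file,
# with "MOD" prepended to the file name, and return the new path.
def write_mod_copy(path)
  target = File.join(File.dirname(path), "MOD#{File.basename(path)}")
  File.write(target, separate_residue_numbers(File.read(path)))
  target
end

sample = "Assign F2 Height\n3IleN 2.34025e+07\n4PheN 2.05028e+07\n"
puts separate_residue_numbers(sample)
```

Calling write_mod_copy on each path returned by Dir.glob would cover a whole directory; renaming the header columns would need one more substitution.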
Suppose I have a list in a text file which is as follows -
TaskB_115
TaskB_19
TaskB_105
TaskB_13
TaskB_10
TaskB_0_A_1
TaskB_17
TaskB_114
TaskB_110
TaskB_0_A_5
TaskB_16
TaskB_12
TaskB_113
TaskB_15
TaskB_103
TaskB_2
TaskB_18
TaskB_106
TaskB_11
TaskB_14
TaskB_104
TaskB_112
TaskB_107
TaskB_0_A_4
TaskB_102
TaskB_100
TaskB_109
TaskB_101
TaskB_0_A_2
TaskB_0_A_3
TaskB_116
TaskB_1_A_0
TaskB_111
TaskB_108
If I sort in vim with command %sort, it gives me output as -
TaskB_0_A_1
TaskB_0_A_2
TaskB_0_A_3
TaskB_0_A_4
TaskB_0_A_5
TaskB_10
TaskB_100
TaskB_101
TaskB_102
TaskB_103
TaskB_104
TaskB_105
TaskB_106
TaskB_107
TaskB_108
TaskB_109
TaskB_11
TaskB_110
TaskB_111
TaskB_112
TaskB_113
TaskB_114
TaskB_115
TaskB_116
TaskB_12
TaskB_13
TaskB_14
TaskB_15
TaskB_16
TaskB_17
TaskB_18
TaskB_19
TaskB_1_A_0
TaskB_2
But I would like to have the output as follows -
TaskB_0_A_1
TaskB_0_A_2
TaskB_0_A_3
TaskB_0_A_4
TaskB_0_A_5
TaskB_1_A_0
TaskB_2
TaskB_10
TaskB_11
TaskB_12
TaskB_13
TaskB_14
TaskB_15
TaskB_16
TaskB_17
TaskB_18
TaskB_19
TaskB_100
TaskB_101
TaskB_102
TaskB_103
TaskB_104
TaskB_105
TaskB_106
TaskB_107
TaskB_108
TaskB_109
TaskB_110
TaskB_111
TaskB_112
TaskB_113
TaskB_114
TaskB_115
TaskB_116
Note I just wrote this list to demonstrate the problem. I could generate the list in VIM. But I want to do it for other things as well in VIM.
With [n] sorting is done on the first decimal number
in the line (after or inside a {pattern} match).
One leading '-' is included in the number.
Try this command:
sor n
You don't need the %; sort sorts all lines if no range is given.
EDIT
as commented by OP, if you have:
TaskB_0_A_1
TaskB_0_A_2
TaskB_0_A_4
TaskB_0_A_3
TaskB_0_A_5
TaskB_1_A_0
you could try:
sor n /.*_\ze\d*/
or
sor nr /\d*$/
EDIT2
for newly edited question, this line may give you expected output based on your example data:
sor nr /\d*$/|sor n
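The two passes appear to work because the second sort keeps the order of the first one for lines whose first number is equal (this relies on the sort being stable). The net effect is an ordering by first number, then by trailing number. Outside Vim, the same ordering can be sketched in Ruby; the helper name task_key is hypothetical, and the sketch assumes every line contains at least one number.

```ruby
# Hypothetical helper: sort key made of the first and the last number in the
# name, mirroring "sor nr /\d*$/" followed by "sor n".
def task_key(name)
  numbers = name.scan(/\d+/).map(&:to_i)
  [numbers.first, numbers.last]
end

tasks = %w[TaskB_10 TaskB_0_A_2 TaskB_2 TaskB_1_A_0 TaskB_0_A_1 TaskB_100]
# Arrays compare element by element, so ties on the first number are
# broken by the trailing number.
puts tasks.sort_by { |t| task_key(t) }
```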