I'm trying to match the From line all the way to the end of the Subject line in the following:
....
From: XXXXXX
Date: Tue, 8 Mar 2011 10:52:42 -0800
To: XXXXXXX
Subject: XXXXXXX
....
So far I have:
/From:.*Date:.*To:.*Subject/m
But that doesn't match to the end of the subject line. I tried adding $ but that had no effect.
You can use the /m modifier to enable multiline mode (i.e. to allow . to match newlines), and you can use ? to perform non-greedy matching:
message = <<-MSG
Random Line 1
Random Line 2
From: person#example.com
Date: 01-01-2011
To: friend#example.com
Subject: This is the subject line
Random Line 3
Random Line 4
MSG
message.match(/(From:.*Subject.*?)\n/m)[1]
=> "From: person#example.com\nDate: 01-01-2011\nTo: friend#example.com\nSubject: This is the subject line"
See http://ruby-doc.org/core/Regexp.html and search for "multiline mode" and "greedy by default".
If you are using Ruby, you can try:
Regexp.new("some reg", Regexp::MULTILINE)
If you are not using Ruby, I suggest this workaround:
replace all the "\n" with SOME_SPECIAL_TOKEN
search the regexp, and do other operations...
restore: replace SOME_SPECIAL_TOKEN with "\n"
If you want to match across linebreaks, one possibility is to first replace all newline characters with some other character (or character sequence) that wouldn't otherwise appear in the text. For example, if you have all of the text in one string variable you can do something like aString.split("\n").join("|") to replace all newlines in the string with pipe characters.
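The placeholder approach described above can be sketched in Ruby roughly as follows. The sample text and the choice of "|" as the token are assumptions for illustration; any character that cannot occur in the real text would do.

```ruby
# Hypothetical sample text; "|" is assumed not to occur in it.
text  = "Intro\nFrom: a\nDate: b\nTo: c\nSubject: d\nBody"
token = "|"

flat     = text.split("\n").join(token)    # newlines -> token
match    = flat[/From:.*?Subject:[^|]*/]   # everything is on one line now, so no /m
restored = match.gsub(token, "\n")         # token -> newlines

puts restored
```

Stopping the match with [^|]* plays the role that "up to the end of the line" had before the newlines were swapped out.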
Also, look at Alan Moore's answer to your previous question regarding how to match the newline character in a regular expression.
Try:
/...^Subject:[^\n]*/m
Using the following data:
From: XXXXXX
Date: Tue, 8 Mar 2011 10:52:42 -0800
To: XXXXXXX
Subject: XXXXXXX
The following regex will do the magic:
From:([^\r\n]+)[\r\n]+Date:([^\r\n]+)[\r\n]+To:([^\r\n]+)[\r\n]+Subject:([^\r\n]+)[\r\n]+
But I would recommend that you don't try to do this in one regex. Apply a regex such as "^(\w+):(.+)$" to the text line by line, unless you are sure that the order of the From/Date/To/Subject lines is not going to change ;)
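A minimal Ruby sketch of the line-by-line idea, assuming the headers arrive as one string and that each header fits on a single line. The variable names (raw, headers) are made up for the example, and the pattern is a slight variation on the one above (anchored with \A and \z, and skipping the space after the colon).

```ruby
# Sample input copied from the question; the names raw and headers are hypothetical.
raw = <<~MSG
  From: XXXXXX
  Date: Tue, 8 Mar 2011 10:52:42 -0800
  To: XXXXXXX
  Subject: XXXXXXX
MSG

headers = {}
raw.each_line do |line|
  # chomp drops the trailing newline (and a preceding \r, if any).
  m = line.chomp.match(/\A(\w+):\s*(.+)\z/)
  headers[m[1]] = m[2] if m
end

puts headers["Subject"]
```

Because each line is matched on its own, the order of the headers no longer matters.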
I want an Oracle query that will handle two things:
1) Only ONE space after any word. Remove any extra spaces after any word in the string.
2) No spaces after a dot, comma, colon or semicolon in the string. ZERO spaces after ". or , or ; or :"
example:
'My name is Pramod. I am writing a query, Today is AUG 16TH: 2019; X11.'
Requested output:
'My name is Pramod.I am writing a query,Today is AUG 16TH:2019;X11.'
Note: There are two spaces after "My", and two spaces after "I", ";", ":" and ",".
Something like this should work:
...
regexp_replace(input_string, '( |\.|,|:|;) +', '\1')
...
The stuff in parentheses in the second argument (the regular expression) is an alternation - it matches exactly one of the five characters space, period, comma, colon or semicolon. (Period must be escaped, of course.) You can include any punctuation characters there. The stuff in parentheses is also a subexpression which can be referenced by a backreference in the third argument to the function.
Find and read the documentation for Oracle regexp_replace() if you are unfamiliar with any of these concepts.
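If you want to experiment with the pattern outside the database, the same alternation and backreference can be tried in Ruby. This is only a sketch for checking the regex, not Oracle syntax, and the doubled spaces in the input are my assumption based on the note in the question.

```ruby
# Input with doubled spaces where the question's note says they occur (assumed spacing).
input = "My  name is Pramod.  I  am writing a query,  Today is AUG 16TH:  2019;  X11."

# Group 1 is one of: space, period, comma, colon, semicolon.
# " +" then swallows the spaces that follow it; '\1' puts group 1 back.
cleaned = input.gsub(/( |\.|,|:|;) +/, '\1')

puts cleaned
```

A single space between two words is left alone, because the pattern needs at least one more space after the captured character.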
Using regexp_replace:
FSITJA#db01> select regexp_replace('My name is Pramod. I am writing . a query, Today is AUG 16TH: 2019; X11. abc', '([,;:. ]){1} +', '\1')
2 from dual;
REGEXP_REPLACE('MYNAMEISPRAMOD.IAMWRITING.AQUERY,TODAYISAUG16TH:2019;X
----------------------------------------------------------------------
My name is Pramod.I am writing .a query,Today is AUG 16TH:2019;X11.abc
I'm trying to create a multi-line string cleanly which includes backslashes and variable substitution. Originally, I did not need backslashes and was able to achieve what I needed with this snippet:
description=$(cat <<EOF
Creation time: $(date)
Creator: $test_group_creator
Test group: $test_group
Test execution: $test_execution_id - $test_name
Adapter: $initiator1_mac_addr
http://...
EOF
)
It's not ideal as it looks a little clunky at the end and requires I break the indentation level of the code, but it's pretty clear and represents how the description will appear on a webpage later. (I am aware of the <<-EOF syntax, but we are using 4-space indents in code and that syntax requires tabs.) Here is the variable echoed out:
Creation time: Tue Jun 27 15:22:27 PDT 2017
Creator: langlorx
Test group: all
Test execution: 13 - go
Adapter: eth0
link...
However, recently the text has also been dropped into a Wiki parser, and now line breaks are collapsed to a single space. To force a line break mid-paragraph, the line must end in a double backslash. Knowing that bash treats backslashes specially, I knew I had to double up on the double backslashes, but it still didn't come out right. Using this:
description=$(cat <<EOF
Creation time: $(date)\\\\
Creator: $test_group_creator\\\\
Test group: $test_group\\\\
Test execution: $test_execution_id - $test_name\\\\
Adapter: $initiator1_mac_addr\\\\
http://...
EOF
)
I find that the newlines are completely gone in the resulting variable, and the last \\\\ sequence before the blank line results in a single backslash in the variable, with the blank line completely gone. Without any backslashes, the resulting string comes out exactly as written above, with newlines, blank lines and the variables substituted. Once I include any backslashes at the end of a line, it triggers a different set of rules in bash for encoding the variable.
Creation time: Tue Jun 27 15:20:17 PDT 2017\\Creator: user\\Test group: all\\Test execution: 12 - go\\Adapter: eth0\link...
Is there a better way to do this in bash than trying to abuse cat and command-substitution while still having the code somewhat clean?
Store the data in the format you want it, and then just postprocess as necessary. If you want to ignore leading spaces and append \\ to each line, just do that:
embed() { sed -e 's/^ *//; s/$/\\\\/'; }
description=$(embed << eof
Creation time: $(date)
Creator: foo
Test group: bar
etc
eof
)
printf '%s\n' "$description"
will give you:
Creation time: Tue Jun 27 15:33:31 PDT 2017\\
Creator: foo\\
Test group: bar\\
etc\\
I have some strings like 2015 - THIS Test and 2015 - THAT Test.
I want to have the part THIS Test or THAT Test so I tried this:
"2015 - THIS Test"[/((THIS|THAT)\s\.*)/]
But that only gives me THIS or THAT.
Why does it cut the rest?
How to get the desired substring correctly?
I don't want to rely on just cutting the first 7 characters.
You escaped the dot, so it no longer means "any character except a newline" and instead denotes a literal . symbol. \.* matches zero or more literal dots.
Remove the \:
puts "2015 - THIS Test"[/((THIS|THAT)\s.*)/]
puts "2015 - THAT Test"[/((THIS|THAT)\s.*)/]
Result:
THIS Test
THAT Test
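To see the difference side by side, here is a small sketch that runs both patterns against the string from the question. The variable names are just for illustration.

```ruby
s = "2015 - THIS Test"

escaped   = s[/(THIS|THAT)\s\.*/]  # \.* finds no literal dots, so it matches nothing more
unescaped = s[/(THIS|THAT)\s.*/]   # .* runs to the end of the line

p escaped
p unescaped
```

The escaped version still succeeds because * allows zero repetitions; it just stops right after the whitespace.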
I have a csv file formatted like this:
Wed Mar 07 00:00:00 CET 2012,78.0
Thu Mar 08 00:00:00 CET 2012,46.0
...
I read it using standard input, and tried to match parts of each line using:
ARGF.each_line do |line|
time << line.scan(/^\w{3} (\w{3}) (\d{2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4}),([.\d]+)$/)
end
Which for some reason only returns the last line in the file.
If I copy the CSV file to a string variable, it starts to match each line correctly. If I remove the dollar sign from the regex it matches correctly as well, but I don't understand why $ doesn't work when reading from ARGF. Any ideas?
Is there a reason you have to use ARGF? You can check out the CSV class in the standard library, which gives tools to make the processing easier.
Here's an example in which foreach yields one parsed row at a time to the block, so you should not have to worry about where lines begin or end:
require "csv"
CSV.foreach("path/to/file.csv") do |row|
  # row is an Array of fields, e.g. ["Wed Mar 07 00:00:00 CET 2012", "78.0"]
  date, value = row
  parts = date.scan(/^\w{3} (\w{3}) (\d{2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4})$/).first
  time << (parts + [value]) if parts
end
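As for why $ might fail with ARGF: one possible explanation, which is a guess since I can't see your file, is that the lines end in \r\n. In Ruby, $ matches just before a \n or at the end of the string, so a \r sitting in front of the \n blocks the match. The last line would still match if it has no trailing line break. A small sketch with a made-up CRLF line:

```ruby
pattern = /^\w{3} (\w{3}) (\d{2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4}),([.\d]+)$/

# Hypothetical line with Windows-style line endings.
crlf_line = "Wed Mar 07 00:00:00 CET 2012,78.0\r\n"

raw_result     = crlf_line.scan(pattern)        # the \r before \n blocks $
chomped_result = crlf_line.chomp.scan(pattern)  # chomp removes \r\n as a unit

p raw_result
p chomped_result
```

If that turns out to be the cause, calling line.chomp before scan (or using \r?$ in the pattern) should be enough.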
I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
only gets me the first line, i.e.
<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
I don't think it's my test data, but that's possible. What am I missing?
(ruby 1.9 on Ubuntu 10.10)
Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
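If you'd rather check from Ruby itself, inspect shows control characters as escapes. A rough sketch, using a made-up string with Windows line endings in place of your real data:

```ruby
# Hypothetical sample; substitute the string you actually read.
desc = "<DD>first line<br />\r\nsecond line\r\n<DT>rest"

puts desc.inspect                        # control characters show up as \r and \n

endings = desc.scan(/\r\n|\r|\n/).uniq   # which kinds of line break occur
p endings
```

If endings contains anything other than "\n", the data does not have the plain Linux line breaks your regex expects.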
The other check you can do is to change your brackets and print your capturing groups, so that you can see which part of your regex matches what.
/^<DD>(.*)\n?(.*)\n/
Another idea to get this to work is to change the .*. Don't say "match any character"; say "match anything but \n".
^<DD>([^\n]*\n?[^\n]*)\n
I believe you need the multiline modifier in your code:
/m Multiline mode: dot matches newlines. (In Ruby, ^ and $ match at line starts and endings with or without /m.)
The following:
#!/usr/bin/env ruby
desc= '<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
prints
#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
on my system (Linux, Ruby 1.8.7).
Perhaps your line breaks are really \r\n (Windows style)? What if you try:
desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/