How can I preserve consecutive newline characters in a Ruby here-document? In my program, all of them are collapsed into a single newline. For example:
s=<<END
1
2
3
4
END
evaluates to:
s="1\n2\n3\n4\n"
However, I would like to preserve the consecutive newlines, for example when formatting a BBCode document, a letter, or something similar.
That looks like a bug to me. Have you tried a multiline %q?
s=%q(1
2
3
4
)
I'm trying to make a replacement in one pass with just one regular expression, but I suspect this is not possible at all. I'm using RegexBuddy, but I always get a catastrophic result and the expression cannot be evaluated.
For this text:
3 bla bla! !
4 yep yep! ?
FROM HERE
5 something randdom here!
6 perhaps some HTML there
TO HERE
7 what ever you like over here
8 and that's all folks!
I want to find a regex that replaces the line breaks with something else, let's say $$, but only in the section between "FROM HERE" and "TO HERE". So basically the end result would be this:
3 bla bla! !
4 yep yep! ?
FROM HERE$$
$$
5 something randdom here!$$
$$
6 perhaps some HTML there$$
$$
TO HERE
7 what ever you like over here
8 and that's all folks!
I have this expression
((FROM HERE))((.*)(\n))+(TO HERE)
But I'm stuck trying to replace just the \n group with something else. I have done similar things in the past, so I would say this should be possible in one go.
If this is not possible with a regex, I would simply write a C# console app that first reads that section into a string, replaces each \n with $$, and then puts it back. That shouldn't be too difficult.
If you are using .NET, one option could be
(?s)(?<=^FROM HERE\b.*?)\r?\n\r?\n(?=.*?^TO HERE\b)
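(?s) Inline modifier: the dot also matches a newline
(?<=^FROM HERE\b.*?) Lookbehind: assert FROM HERE at the start of a line somewhere to the left
\r?\n\r?\n Match 2 consecutive newlines
(?=.*?^TO HERE\b) Lookahead: assert TO HERE at the start of a line somewhere to the right
Note that ^ matches at the start of each line only when RegexOptions.Multiline (or an inline (?m)) is enabled; otherwise it matches only at the start of the input.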
In the replacement, use the following (a literal $ has to be written as $$, so $$$$ produces $$):
\n$$$$\n
See a .NET regex demo and a C# demo.
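For comparison, here is a minimal Python sketch of the two-step approach the question mentions as a fallback. It is not the .NET lookbehind technique above: Python's re module does not support variable-length lookbehind, so this version captures the whole section with one pattern and rewrites the newlines in a callback. The function name replace_newlines_in_section and its parameters are made up for illustration, and it appends the marker before every newline inside the section, which is what the desired output in the question appears to show.

```python
import re


def replace_newlines_in_section(text, start="FROM HERE", end="TO HERE", marker="$$"):
    """Append `marker` before every newline between the start and end lines."""
    # Group 1: the start line, group 2: the section body, group 3: the end line.
    # MULTILINE lets ^ match at line starts; DOTALL lets . cross newlines.
    pattern = re.compile(
        rf"(^{re.escape(start)}\b)(.*?)(^{re.escape(end)}\b)",
        re.DOTALL | re.MULTILINE,
    )

    def fix_section(match):
        # Plain string replacement, so "$" needs no escaping here.
        body = match.group(2).replace("\n", marker + "\n")
        return match.group(1) + body + match.group(3)

    return pattern.sub(fix_section, text)


if __name__ == "__main__":
    sample = "4 yep yep! ?\nFROM HERE\n\n5 something\n\nTO HERE\n7 what ever\n"
    print(replace_newlines_in_section(sample))
```

Text outside the two marker lines is left untouched, and input without a complete FROM HERE ... TO HERE section is returned unchanged.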
Let's say I have a list of URLs separated by a white space with their corresponding titles.
http://url1.com/qfwarsas/ gb_title 1 - 1
http://url2.com/arsas/ xe_title 2 - 2
http://url3.com/qfsas ah_title 3 - 3
I'm trying to sort the lines by the titles to look like this:
http://url3.com/qfsas ah_title 3 - 3
http://url1.com/qfwarsas/ gb_title 1 - 1
http://url2.com/arsas/ xe_title 2 - 2
I can do it by running a simple macro to copy out the first letter of each title to the front of the line, then ctrl+v sort the blocks, then remove the first letters of each line. I wonder if there's a way to do it using regex and visual block selection?
The regex to select the first letter of each title is
:s/\v[^ ]* (.)/\1/
but when I try to convert that into a visual block selection, I run into issues.
Any ideas?
If your separator is a single space, you can use
:sort / /
The default behavior of :sort using a search pattern is to sort on whatever follows the match.
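To make the "sort on whatever follows the match" idea concrete, here is a rough Python equivalent, offered only as an illustration of the sort key and not as a replacement for the Vim command. The helper name sort_by_title is hypothetical, and it assumes the first space on each line separates the URL from the title.

```python
def sort_by_title(lines):
    """Sort lines by the text that follows the first space on each line."""
    # str.partition(" ") returns (before, separator, after); the sort key
    # is the "after" part, i.e. the title.
    return sorted(lines, key=lambda line: line.partition(" ")[2])


if __name__ == "__main__":
    urls = [
        "http://url1.com/qfwarsas/ gb_title 1 - 1",
        "http://url2.com/arsas/ xe_title 2 - 2",
        "http://url3.com/qfsas ah_title 3 - 3",
    ]
    for line in sort_by_title(urls):
        print(line)
```

Lines without a space get an empty key in this sketch and therefore sort first, which may differ from how Vim treats lines that do not match the pattern.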
^(?=(.*\d){4,})(?=(.*[A-Z]){3})(?!\s)(?=.*\W{2,})(?=(.*[a-z]){2,}).{12,14}$
The RegExp above is trying to:
match at least 4 digits - (?=(.*\d){4,})
match exactly 3 upper case letters - (?=(.*[A-Z]){3})
don't match spaces - (?!\s)
match at least 2 non-word characters - (?=.*\W{2,})
match at least 2 lower - (?=(.*[a-z]){2,})
string must be between 12 and 14 in length - .{12,14}
But I am having a challenge getting this to avoid matching spaces. It seems like because \W also includes spaces, my preceding negative look-ahead on spaces is being ignored.
For example:
b4A#Ac33*8Pd -- should match
b4A#Ac3 3*8Pd -- should not match
rubular link
Edited to provide further clarification:
Basically, I am trying to avoid having to spell out all the characters in the POSIX [:punct:] class (i.e. !"#$%&'()*+,./:;<=>?#\^_\{|}~-`), which is why I needed to use \W. However, I also want to exclude spaces.
I could use a second pair of eyes and more experienced suggestions here.
Edited again, to correct mix-ups in counts specified in sub-patterns, as pointed out in the accepted answer below.
Instead of the dot ., use \S (any non-whitespace character):
^(?=(.*\d){3,})(?=(.*[A-Z]){2})(?=.*\W{1,})(?=(.*[a-z]){1,})\S{12,14}$
// here ___^^
Also, is this a typo? match at least 4 digits - (?=(.*\d){3,})
It should be either:
match at least 3 digits - (?=(.*\d){3,})
or
match at least 4 digits - (?=(.*\d){4,})
Same for other counts.
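As a quick check of the \S suggestion, the sketch below runs the pattern from this answer against the two sample strings from the question, next to a variant that still uses the dot. It is written in Python rather than Ruby, on the assumption that the constructs involved (lookaheads, \d, \W, \S) behave the same way in both engines for these ASCII inputs. The names STRICT, LOOSE and is_valid are illustrative only.

```python
import re

# Pattern copied from the answer: the final part uses \S instead of the dot.
STRICT = re.compile(
    r"^(?=(.*\d){3,})(?=(.*[A-Z]){2})(?=.*\W{1,})(?=(.*[a-z]){1,})\S{12,14}$"
)

# The same lookaheads, but with the dot, for comparison.
LOOSE = re.compile(
    r"^(?=(.*\d){3,})(?=(.*[A-Z]){2})(?=.*\W{1,})(?=(.*[a-z]){1,}).{12,14}$"
)


def is_valid(candidate):
    """Return True if the candidate satisfies the stricter \\S-based pattern."""
    return STRICT.search(candidate) is not None


if __name__ == "__main__":
    for candidate in ("b4A#Ac33*8Pd", "b4A#Ac3 3*8Pd"):
        print(repr(candidate), is_valid(candidate), LOOSE.search(candidate) is not None)
```

The dot variant accepts the string containing a space because the dot matches any character except a newline, whereas \S rejects whitespace at every position of the 12 to 14 characters.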
I am attempting to separate blocks of Japanese text into individual sentences using a regex. Right now I'm mostly experimenting on Rubular, but here is what I have so far.
regex: /(.*?(。|\?|!))/
sample text
強面のため周囲の人から敬遠されている主人公が、クラスメイトと共通の話題を持とうとVRMMORPG「アナザーワールド」のベータテストに申し込んだ。ところが当選したのは彼一人。しかたなくひとりでゲーム内の仮想世界「イストピア」に「ケイオス」と名乗って乗り込んだが、そこはゲームでありながら五感すべてを体感でき、現実と間違えるほどのリアルな世界だった。サポートAIのテミスの協力を得つつ、クエストをこなしていったが、実はそこは本物の異世界「イストピア」であり、ケイオスのこなしたクエストによって、多くの人が影響を受けて……というお話。その戯言、聞き飽きたわ!あれ、ここにあった筆入れはどこにやったの?
The results I'm getting are correct; however, the regex is also matching the punctuation characters separately.
How can I improve my regular expression so that the punctuation mark isn't separately matched?
Using (.*?[。?!]) seems to do the trick, because the character class replaces the inner capture group. You can check it on Rubular:
Match 1
1. 強面のため周囲の人から敬遠されている主人公が、クラスメイトと共通の話題を持とうとVRMMORPG「アナザーワールド」のベータテストに申し込んだ。
Match 2
1. ところが当選したのは彼一人。
Match 3
1. しかたなくひとりでゲーム内の仮想世界「イストピア」に「ケイオス」と名乗って乗り込んだが、そこはゲームでありながら五感すべてを体感でき、現実と間違えるほどのリアルな世界だった。
Match 4
1. サポートAIのテミスの協力を得つつ、クエストをこなしていったが、実はそこは本物の異世界「イストピア」であり、ケイオスのこなしたクエストによって、多くの人が影響を受けて……というお話。
Match 5
1. その戯言、聞き飽きたわ!
Match 6
1. あれ、ここにあった筆入れはどこにやったの?
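The same character-class idea can be sketched in Python, shown here only as an illustration of the pattern and not as the Ruby code from the question. The terminator set is an assumption: it lists both the ASCII ? and ! that appear in the sample text and their full-width forms, and split_sentences is a hypothetical helper name.

```python
import re

# A lazy run of characters up to and including one sentence terminator.
# There is no capture group, so findall returns whole sentences only.
SENTENCE = re.compile(r".*?[。?!？！]")


def split_sentences(text):
    """Return the sentences in `text`, each ending with its terminator."""
    return SENTENCE.findall(text)


if __name__ == "__main__":
    sample = "ところが当選したのは彼一人。その戯言、聞き飽きたわ!あれ、ここにあった筆入れはどこにやったの?"
    for sentence in split_sentences(sample):
        print(sentence)
```

Any trailing text that does not end with one of the terminators is dropped by this sketch, so it may need a fallback if the input can end mid-sentence.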
What about this?
str.scan /[\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}[[:punct:]]]+/
=> ["強面のため周囲の人から敬遠されている主人公が、クラスメイトと共通の話題を持とうと",
"「アナザ",
"ワ",
"ルド」のベ",
"タテストに申し込んだ。ところが当選したのは彼一人。しかたなくひとりでゲ",
"ム内の仮想世界「イストピア」に「ケイオス」と名乗って乗り込んだが、そこはゲ",
"ムでありながら五感すべてを体感でき、現実と間違えるほどのリアルな世界だった。サポ",
"ト",
"のテミスの協力を得つつ、クエストをこなしていったが、実はそこは本物の異世界「イストピア」であり、ケイオス のこなしたクエストによって、多くの人が影響を受けて……というお話。その戯言、聞き飽きたわ!あれ、ここにあった筆入れはどこにやったの?"]
http://rubular.com/r/8CtYuV8AAl
I need to filter all lines containing words that start with a letter followed by zero or more letters or digits, but no special characters (basically, names that could be used as C++ variable names).
egrep '^[a-zA-Z][a-zA-Z0-9]*'
This works fine for words such as "a" and "ab10", but it also includes words like "b.b". I understand that the * at the end of the expression is the problem. If I replace * with + (one or more), it skips words that contain only one letter, so that doesn't help.
EDIT:
I should be more precise. I want to find lines with any number of possible words as described above. Here is an example:
int = 5;
cout << "hello";
//some comments
In that case it should print all of the lines above, as they all include at least one word that fits the described conditions; a line does not have to begin with a letter.
Your solution will look roughly like this example. In this case, the regex requires that the "word" be preceded by space or start-of-line and then followed by space or end-of-line. You will need to modify the boundary requirements (the parenthesized stuff) as needed.
'(^| )[a-zA-Z][a-zA-Z0-9]*( |$)'
Assuming the line ends after the word:
'^[a-zA-Z][a-zA-Z0-9]+|^[a-zA-Z]$'
You have to add something to it: either allow the rest of the line to be whitespace, or just anchor the pattern to the end of the line (as far as I recall, that is $).
Your problem lies in the ^ and $ anchors, which match the start and end of the line respectively. You want the line to match if it contains a word anywhere, so getting rid of the anchors does what you want:
egrep '[a-zA-Z][a-zA-Z0-9]+'
Note that the + matches words of length 2 and higher; a * in that place would match single characters too.
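For readers who want to experiment outside egrep, here is a small Python sketch of the "line contains at least one identifier-like word" filter. It is stricter than the unanchored egrep pattern above: lookarounds require that the word is not glued to a preceding or following letter or digit, so "9abc" does not count. The names IDENTIFIER and lines_with_identifier are made up for this example.

```python
import re

# A letter followed by zero or more letters or digits, not preceded or
# followed by another letter or digit.
IDENTIFIER = re.compile(r"(?<![A-Za-z0-9])[A-Za-z][A-Za-z0-9]*(?![A-Za-z0-9])")


def lines_with_identifier(lines):
    """Keep only the lines that contain at least one identifier-like word."""
    return [line for line in lines if IDENTIFIER.search(line)]


if __name__ == "__main__":
    source = ["int = 5;", 'cout << "hello";', "//some comments", "123 456", "9abc"]
    for line in lines_with_identifier(source):
        print(line)
```

A line such as "b.b" still passes this filter, because "b" on its own is a valid word under the stated rules; rejecting it would require extra conditions on the surrounding punctuation.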