How do I make this regular expression more general? - ruby

I'm using Ruby 1.8.7. I have a text file with following content:
"testhost-01.test.de|lan|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|test|wato|/" + FOLDER_PATH + "/",
"testhost-02.test.de|lan|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|prod|puppetagent|wato|/" + FOLDER_PATH + "/",
"testhost-03.test.de|wan|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|prod|puppetagent|wato|/" + FOLDER_PATH + "/",
"testhost-04.test.de|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|dmz|prod|puppetagent|wato|/" + FOLDER_PATH + "/",
"testhost-05.test.de|wan|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|prod|puppetagent|wato|/" + FOLDER_PATH + "/",
"testhost-06.test.de|lan|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|prod|wato|/" + FOLDER_PATH + "/",
"testhost-07.test.de|ip-v6|cmk-agent|tcp|site:tir_projects|ip-v6-only|dmz|prod|puppetagent|wato|/" + FOLDER_PATH + "/",
"testhost-08.test.de|ip-v4|snmp|snmp-only|ip-v4-only|critical|site:tir_projects|dmz|wato|/" + FOLDER_PATH + "/",
I'm trying to extract the hostnames (testhost-01.test.de - testhost-08.test.de) to an Array but only when "puppetagent" is in the same line.
The result should be:
[
"testhost-02.test.de",
"testhost-03.test.de",
"testhost-04.test.de",
"testhost-05.test.de",
"testhost-07.test.de"
]
Code Example:
path = "Textfile"
file = IO.read(path)
nodes = file.scan(/^"(.*)\|lan.*\|puppetagent/).flatten
The example above only matches lines where "lan" follows the first pipe and "puppetagent" appears later in the line,
so it only finds host 02.

If you don't want to restrict output to lines that include |lan, you can't include |lan in the expression. It looks like you want |lan to mark the end of your capture group - instead, you can restrict your capture group to not include | by using the character set [^|]. Then, even if the line doesn't include lan, you'll stop at the first |. After the |, you don't care about content until puppetagent, so we'll consume that with .*.
/^"([^|]*).*puppetagent/
In plain English, that's
^" Start with "
([^|]*) Capture anything that's not a |
.* Accept anything else on the line
puppetagent Require puppetagent to be present
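As a quick check, the pattern can be run against a few of the sample lines. This is only a minimal sketch: the data is inlined as a string instead of being read with IO.read, and just four of the eight lines are included.

```ruby
# Sample lines inlined here; in the question they come from IO.read(path).
file = <<'EOS'
"testhost-01.test.de|lan|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|test|wato|/" + FOLDER_PATH + "/",
"testhost-02.test.de|lan|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|prod|puppetagent|wato|/" + FOLDER_PATH + "/",
"testhost-04.test.de|ip-v4|cmk-agent|tcp|ip-v4-only|site:tir_projects|dmz|prod|puppetagent|wato|/" + FOLDER_PATH + "/",
"testhost-08.test.de|ip-v4|snmp|snmp-only|ip-v4-only|critical|site:tir_projects|dmz|wato|/" + FOLDER_PATH + "/",
EOS

# [^|]* stops the capture at the first pipe; .* then skips ahead to puppetagent.
nodes = file.scan(/^"([^|]*).*puppetagent/).flatten
p nodes
# => ["testhost-02.test.de", "testhost-04.test.de"]
```

Note that this pattern accepts "puppetagent" anywhere on the line, including inside a longer tag. If that matters for your data, requiring the surrounding pipes (\|puppetagent\|) would be stricter.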

Related

Parse multiline text with pattern

here is a little example:
02-09-17 1:01 PM - Some User (Add comments)
Hello,
How are you?
Regards,
02-09-17 3:29 PM - Another User (Add comments)
Hey,
Thanks, all is fine.
Some another text here.
02-09-17 4:30 AM - Just a User (Add comments)
some text
with
multiline
I want to parse and process these three comments. What is the best way to do this?
Tried a regex like this - http://www.rubular.com/r/k1CHJ1STTD but have problems with the /m flag. Without the multiline flag, the regex can't catch the "body" of the comment.
Also tried to split by regex:
text_above.split(/^(\d{1,2}-\d{1,2}-\d{2} \d{1,2}:\d{1,2} [AP]M - .+ \(Add comments\))/)
=> ["",
"02-09-17 1:01 PM - Some User (Add comments)",
"\n" + "Hello,\n" + "\n" + "How are you?\n" + "\n" + "Regards,\n" + "\n",
"02-09-17 3:29 PM - Another User (Add comments)",
"\n" + "Hey,\n" + "\n" + "Thanks, all is fine.\n" + "\n" + "Some another text here.\n" + "\n",
"02-09-17 4:30 AM - Just a User (Add comments)",
"\n" + "some text\n" + "with\n" + "multiline\n" + "\n",
"02-09-17 5:29 PM - Another User (Add comments)",
"\n" + "Hey,\n" + "\n" + "Thanks, all is fine.\n" + "\n" + "Some another text here.\n" + "\n",
"02-09-17 6:30 AM - Just a User (Add comments)",
"\n" + "some text\n" + "with\n" + "multiline\n"]
But this is not a convenient solution.
Ideally I want to get regex captures with three or two group matches, for example:
1. 02-09-17 1:01 PM
2. Some User (Add comments)
3. Hello,
How are you?
Regards,
for each comment, or, Array of comments:
[['02-09-17 1:01 PM - Some User (Add comments) Hello,
How are you?
Regards,'],[...]]
Any ideas? Thanks.
You can keep it simple using two splits (one for the whole string and one for each block):
text.split(/\n\n(?=\d\d-)/).map { |m| m.split(/ - |\n/, 3) }
You can also use the scan method, but it's a little more fiddly:
text.scan(/([\d-]+[^-]+) - (.*)\n(.*(?>\n.*)*?(?=\n\n\d\d-|\z))/)
slice_before might be easier to understand than a huge scan, and it has the advantage of keeping the pattern (split removes it)
data = text.each_line.slice_before(/^\d\d\-\d\d\-\d\d/).map do |block|
time, user = block.shift.strip.split(' - ')
[time, user, block.join.strip]
end
p data
# [["02-09-17 1:01 PM",
# "Some User (Add comments)",
# "Hello,\n\nHow are you?\n\nRegards,"],
# ["02-09-17 3:29 PM",
# "Another User (Add comments)",
# "Hey,\n\nThanks, all is fine.\n\nSome another text here."],
# ["02-09-17 4:30 AM",
# "Just a User (Add comments)",
# "some text\nwith\nmultiline"]]
You can use this regular expression:
(\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)) - (.*?)\r?\n((?:.|\r?\n)+?)(?=\r?\n\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM) - |$)
(\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)) matches the first group, the date and time. The date must consist of three two-digit numbers separated by dashes, followed by the time with AM/PM.
(.*?)\r?\n((?:.|\r?\n)+?) matches the username up to the first line break (\r?\n) as the second group. After that, anything including line breaks is matched and forms the third group, the comment.
On its own this won't work, because it would treat everything from the beginning of the comment up to the end of the file as one comment. Therefore, you need to look for the next date/time header so that the match stops there. You could do this just by repeating the date/time format after the comment and matching non-greedily, but that would consume the next date/time as part of the current match, so the next comment would no longer start with a header and every second comment would be skipped. To circumvent this, you can use a positive lookahead: (?=\r?\n\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM) - |$). This checks that the next date/time header follows, but does not include it in the match. The last comment has no header after it, so it must end at the end of the string ($).
You need to use the global flag /g but mustn't use the multi-line flag /m: with /m, $ also matches at the end of every line, so the non-greedy comment group would stop after its first line.
Here is a live example: https://regex101.com/r/o63GQE/2
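The expression above is written for a PCRE-style tester, so it needs small changes in Ruby. Ruby has no /g flag (String#scan already returns every match), and in Ruby $ matches at the end of each line, so the sketch below swaps it for \z. The header variable and the shortened sample text are introduced here only for illustration.

```ruby
# Shortened sample input (two comments), with blank lines between paragraphs.
text = <<EOS
02-09-17 1:01 PM - Some User (Add comments)

Hello,

How are you?

02-09-17 3:29 PM - Another User (Add comments)

Hey,

Thanks, all is fine.
EOS

# Date/time part of a comment header, reused in the capture and the lookahead.
header = /\d{2}-\d{2}-\d{2} \d{1,2}:\d{2} (?:AM|PM)/

# \z (end of string) replaces $, which in Ruby would match at every line end.
pattern = /(#{header}) - (.*?)\r?\n((?:.|\r?\n)+?)(?=\r?\n#{header} - |\z)/

comments = text.scan(pattern).map { |time, user, body| [time, user, body.strip] }
p comments
# => [["02-09-17 1:01 PM", "Some User (Add comments)", "Hello,\n\nHow are you?"],
#     ["02-09-17 3:29 PM", "Another User (Add comments)", "Hey,\n\nThanks, all is fine."]]
```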

Ruby XML Reading from one XML and parsing into another

XPath.each( xmldoc, "//speech/speaking") do |element|
# puts element.attributes['name']
# puts element.text
File.open(file_name + "_" + element.attributes['name'] + "-" + year + ".xml", 'a+') do |f|
f.write("<speaker>" + element.attributes['name'] + "</speaker>")
f.write("<speech>" + doc.xpath('//speech/speaking').text + "</speech>" + "\n")
end
end
Hello stackoverflow I am looking for help solving a logic issue I am having with XML files. The above code creates a file with the "speakers" name and then it should place what the speaker says into that file.
The problem that I am running into is that it places ALL of the speakers into the same file. So I am thinking the problem lies here:
f.write("<speech>" + doc.xpath('//speech/speaking').text + "</speech>" + "\n")
I am hoping that someone has a better way of doing this, but the idea would be to change the above code to:
doc.xpath('//speech/speaking').text WHERE speaker == element.attributes['name']
Ultimately I would like to have each speaker in their own XML file with their own speeches.
<speaking name="Mr. FAZIO">I appreciate my friend yielding.</speaking>
The above is a sample from the XML file.
The xpath you are looking for is:
doc.xpath("//speech/speaking[@name='#{element.attributes['name']}']").text
see XPath to select Element by attribute value
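To see the attribute predicate in isolation, here is a small sketch using REXML, which the question's XPath.each call appears to come from. Only the first speaking element is taken from the question; the other two elements and the variable names are hypothetical.

```ruby
require 'rexml/document'

# First element is from the question; the other two are made-up test data.
xml = <<XML
<speech>
  <speaking name="Mr. FAZIO">I appreciate my friend yielding.</speaking>
  <speaking name="Ms. EXAMPLE">Hypothetical second speaker.</speaking>
  <speaking name="Mr. FAZIO">Hypothetical follow-up remark.</speaking>
</speech>
XML

doc = REXML::Document.new(xml)
name = 'Mr. FAZIO'

# [@name='...'] keeps only the elements whose name attribute equals the value.
own_speeches = REXML::XPath.match(doc, "//speech/speaking[@name='#{name}']").map(&:text)
p own_speeches
# => ["I appreciate my friend yielding.", "Hypothetical follow-up remark."]
```

A speaker name containing a single quote would break the quoted predicate, so real data may need escaping or a different lookup.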

Join array of strings into 1 or more strings each within a certain char limit (+ prepend and append texts)

Let's say I have an array of Twitter account names:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
And a prepend and append variable:
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
How can I turn this into an array of as few strings as possible each with a maximum length of 140 characters, starting with the prepend text, ending with the append text, and in between the Twitter account names all starting with an #-sign and separated with a space. Like this:
tweets = ['Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday', 'Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday', 'Check out these cool people: #example18 #example19 #example20 #FollowFriday']
(The order of the accounts isn't important so theoretically you could try and find the best order to make the most use of the available space, but that's not required.)
Any suggestions? I'm thinking I should use the scan method, but haven't figured out the right way yet.
It's pretty easy using a bunch of loops, but I'm guessing that won't be necessary when using the right Ruby methods. Here's what I came up with so far:
# Create one long string of #usernames separated by a space
tmp = twitter_accounts.map!{|a| a.insert(0, '#')}.join(' ')
# alternative: tmp = '#' + twitter_accounts.join(' #')
# Number of characters left for mentioning the Twitter accounts
length = 140 - (prepend + append).length
# This method would split a string into multiple strings
# each with a maximum length of 'length' and it will only split on empty spaces (' ')
# ideally strip that space as well (although .map(&:strip) could be used too)
tweets = tmp.some_method(' ', length)
# Prepend and append
tweets.map!{|t| prepend + t + append}
P.S.
If anyone has a suggestion for a better title let me know. I had a difficult time summarizing my question.
The String rindex method has an optional parameter where you can specify where to start searching backwards in a string:
arr = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
str = arr.map{|name|"##{name}"}.join(' ')
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
max_chars = 140 - prepend.size - append.size
until str.size <= max_chars do
p str.slice!(0, str.rindex(" ", max_chars))
str.lstrip! #get rid of the leading space
end
p str unless str.empty?
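The loop above only prints the chunks. A possible way to collect complete tweets with the same rindex technique is sketched below; build_tweets and its limit parameter are hypothetical names, not part of the answer.

```ruby
# Hypothetical helper: splits the names into as few tweets as the rindex
# approach allows, each wrapped in prepend/append and at most `limit` chars.
def build_tweets(names, prepend, append, limit = 140)
  max_chars = limit - prepend.size - append.size
  str = names.map { |name| "##{name}" }.join(' ')
  tweets = []
  until str.size <= max_chars
    # Last space at or before max_chars marks the end of this chunk.
    cut = str.rindex(' ', max_chars)
    raise ArgumentError, 'a single name exceeds the limit' if cut.nil?
    tweets << prepend + str.slice!(0, cut) + append
    str.lstrip! # drop the space left at the front
  end
  tweets << prepend + str + append unless str.empty?
  tweets
end

names = (1..20).map { |i| "example#{i}" }
tweets = build_tweets(names, 'Check out these cool people: ', ' #FollowFriday')
puts tweets
puts tweets.map(&:size).inspect
```

With the 20 sample names this produces the three tweets shown in the question.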
I'd make use of reduce for this:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
prepend = 'Check out these cool people:'
append = '#FollowFriday'
# Extra -1 is for the space before `append`
max_content_length = 140 - prepend.length - append.length - 1
content_strings = string.reduce([""]) { |result, target|
result.push("") if result[-1].length + target.length + 2 > max_content_length
result[-1] += " ##{target}"
result
}
tweets = content_strings.map { |s| "#{prepend}#{s} #{append}" }
Which would yield:
"Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday"
"Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday"
"Check out these cool people: #example18 #example19 #example20 #FollowFriday"

How to insert line break in a return statement for D3

I have the following D3 code:
tooltip.select("#popupCount").text(function(){
if (varToGraph == "rough_top_cost"){
return " " + textValue + ": $" + addCommas(allCountyData[countyName][varToGraph]) + "\n" +
"Count:"
}})
I want the word count to appear on a new line. However, the above code results in everything being on one line. How can I get the output to be on two lines?
Thanks,
AH
Untested answer, but FWIW this may get close;
tooltip.select("#popupCount").html(function(){
if (varToGraph == "rough_top_cost"){
return " " + textValue + ": $" + addCommas(allCountyData[countyName][varToGraph]) + "<br/>" +
"Count:"
}})
This is based on the example on page 80 of D3 Tips and Tricks, which includes tooltips with line breaks.
It uses .html() instead of .text(), so the <br/> tag is rendered as a line break rather than shown as literal text. Check out the documentation for more detail.

QUrl containing parentheses

Our application is a 32 bit application. When it is installed in windows 7 64bit, typically it installs at “C:\Program Files (x86)”, instead of “C:\Program Files”. We are constructing a Url based on the install location and pass it around as part of a web service. We are constructing the Url like this:
ppmPath = "http://" + ipAddress + ":13007/" + folder + ".ppm" + "?filePath="
+ applicationDirPath + "/" + FIRMWARE;
QUrl ppmURL( ppmPath, QUrl::TolerantMode );
ppmPath = QString( ppmURL.toEncoded() );
The variable types and meaning are usual.
Since “applicationDirPath” on Windows 7 64 bit contains a closing bracket “)” (in the “(x86)” substring), the URL is apparently broken. If we install to any other location, it works perfectly, even if the location contains other special characters.
How can we deal with the “)” character in the URL so that it is not broken?
From the documentation it doesn't look like parentheses are automatically encoded by QUrl, even in tolerant mode. If you first wrap your URL in a QString and then replace all ( characters with "%28" and all ) characters with "%29" then it should behave like you expect.
QString ppmPath = QString("http://" + ipAddress + ":13007/" + folder + ".ppm" + "?filePath="
+ applicationDirPath + "/" + FIRMWARE);
QUrl ppmURL( ppmPath, QUrl::TolerantMode );
ppmPath = QString( ppmURL.toEncoded() );
ppmPath.replace(QChar('('), "%28");
ppmPath.replace(QChar(')'), "%29");
QString::replace inserts the replacement text literally, so a single % is enough here; doubling it ("%%28") would put two percent signs into the URL.
Alternatively, you could try playing with QUrl::toPercentEncoding() and skip the constructor altogether. It appears to convert parentheses.
QUrl ppmURL(QString("http://" + ipAddress + ":13007/" + folder + ".ppm"), QUrl::TolerantMode );
QString filepath = QUrl::toPercentEncoding(applicationDirPath + "/" + FIRMWARE);
ppmURL.addEncodedQueryItem("filePath", filepath.toLocal8Bit());
ppmPath = QString( ppmURL.toEncoded() );
