Ruby Regex to match multiple portions of a string - ruby

Using Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]
I have the following string:
str = 'Message relates to activity <a href="/activities/35">TU4 Sep 5 Activity 1</a> <img src="/images/layout/placeholder.png" width="222" height="149"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1.'
I want to match the following:
35 (the number that is part of the href attribute value)
TU4 Sep 5 Activity 1 (the text of the <a> tag)
First question from Manager on TU4 Sep 5 Activity 1. (the remaining text after the last <br/><br/> tags)
To achieve this I have written the following regex:
result = str.match(/<a href="\/activities\/(?<activity_id>\d+)">(?<activity_title>.*)<\/a>.*<br\/><br\/>(?<message>.*)/)
This produces following result:
#<MatchData "<a href=\"/activities/35\">TU4 Sep 5 Activity 1</a> <img src=\"/images/layout/placeholder.png\" width=\"222\" height=\"149\"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1."
activity_id:"35"
activity_title:"TU4 Sep 5 Activity 1"
message:"First question from Manager on TU4 Sep 5 Activity 1.">
But I suspect this is not efficient.
Is it possible to have only the required values (listed above under what I want to match) returned in the match result, and to exclude the following full-match value from it?
"<a href=\"/activities/35\">TU4 Sep 5 Activity 1</a> <img src=\"/images/layout/placeholder.png\" width=\"222\" height=\"149\"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1."
Thanks,
Jignesh

The appropriate way to do this is NOT to use regexen. Instead, use the Nokogiri library to easily parse your HTML:
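require 'nokogiri'
doc = Nokogiri::HTML.parse(str)
# The link whose href starts with "/activities"; the id is the trailing digits of that href.
link = doc.css('[href^="/activities"]')
activity_id = link.attr('href').value[/\d+$/]
activity_title = link[0].inner_text
# The last text node in the document is the message after the final <br/><br/>.
message = doc.search("//text()").last.text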
This will do exactly what your regexp was attempting, with much lower chance of random failure.
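If you do want to stay with a regex, note that the full match is always available as element 0 of the MatchData; it can simply be ignored, and the named groups read on their own. A minimal sketch, assuming the string really contains the <a href="/activities/35"> link that the pattern expects (the helper name extract_fields and the constant PATTERN are made up for illustration):

```ruby
str = 'Message relates to activity <a href="/activities/35">TU4 Sep 5 Activity 1</a> ' \
      '<img src="/images/layout/placeholder.png" width="222" height="149"/>' \
      '<br/><br/>First question from Manager on TU4 Sep 5 Activity 1.'

# Same idea as the regex in the question, with a non-greedy title group so it
# stops at the first </a>. %r{} avoids escaping every forward slash.
PATTERN = %r{<a href="/activities/(?<activity_id>\d+)">(?<activity_title>.*?)</a>.*<br/><br/>(?<message>.*)}m

# Hypothetical helper: returns only the named groups as a Hash, or nil when
# the text does not match.
def extract_fields(text)
  match = PATTERN.match(text)
  return nil unless match
  # MatchData#names and MatchData#captures are parallel arrays; neither
  # includes the full match.
  Hash[match.names.zip(match.captures)]
end

fields = extract_fields(str)
puts fields['activity_id']
puts fields['activity_title']
puts fields['message']
```

Individual groups can also be read directly, for example result[:activity_id] on the MatchData from the question.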

Related

console output of the current calendar month in Ruby

I need to output the calendar of the current month to the console in Ruby. The result should be similar to ncal on UNIX-like systems. I found a solution for C++ but can't adapt it to Ruby. So far, I have only worked out that I need nested loops to print the rows and columns. Which direction should I take?
require 'date'
days = %w[Mon Tue Wed Thu Fri Sat Sun]
puts " #{Date::MONTHNAMES[Date.today.month]} #{Date.today.year}"
i = 0
start_month = (Date.today - Date.today.mday + 1).strftime("%a")
while i < days.size
print days[i]
j = 1
while j <= 31
if days[i] == start_month
print " #{j}"
end
j += 7
end
i += 1
puts
end
I'll take your solution so far, and try to give some specific pointers for how to progress with it - but of course, there are many different ways to approach this problem in general, so this is by no means the only approach!
The first critical issue (as you're aware!) is that you're only printing things for the row starting on the 1st of the month, due to this line:
if days[i] == start_month
Sticking with the current overall design, we know we'll need to print something for every line, so clearly a conditional like this isn't going to work. Let's try removing it.
Firstly, it will be more convenient to know which day of the week the month started on as a number, not a string, so we can easily calculate offsets against another day. Let's do that with:
# e.g. for 1st July 2021 this was a Thursday, so we get `4`.
start_of_month_weekday = (Date.today - Date.today.mday + 1).cwday
Next (and this is the crucial step!), we can use the above information to find out "which day of the month is it, on this day of the week?"
Here is a first version of that calculation, incorporated into your solution so far:
require 'date'
days = %w[Mon Tue Wed Thu Fri Sat Sun]
puts " #{Date::MONTHNAMES[Date.today.month]} #{Date.today.year}"
i = 0
# e.g. for 1st July 2021 this was a Thursday, so we get `4`.
start_of_month_weekday = (Date.today - Date.today.mday + 1).cwday
while i < days.size
print days[i]
day_of_month = i - start_of_month_weekday + 2 # !!!
while day_of_month <= 31
print " #{day_of_month}"
day_of_month += 7
end
i += 1
puts
end
This outputs:
July 2021
Mon -2 5 12 19 26
Tue -1 6 13 20 27
Wed 0 7 14 21 28
Thu 1 8 15 22 29
Fri 2 9 16 23 30
Sat 3 10 17 24 31
Sun 4 11 18 25
Not bad! Now we're getting somewhere!
I'll leave you to figure out the rest 😉 .... But here are some clues, for what I'd tackle next:
This code, print " #{day_of_month}", needs to print a "blank space" if the day number is less than 1. This could be done with a simple if statement.
Similarly, since you want this calendar to line up neatly in a grid, you need this code to always print something two characters wide. sprintf is your friend here! Check out the "Examples of width", about halfway down the page.
You've hardcoded 31 for the number of days in the month. This should be fixed, of course. (Use the Date library!)
It's funny how you used strftime("%a") in one place, yet constructed the calendar title awkwardly in the line above! 😄 Take a look at the documentation for formatting dates; it's extremely flexible. I think you can use: Date.today.strftime("%B %Y").
If you'd like to add some colour (or background colour?) to the current day of the month, consider doing something like this, or use a library to assist.
Using while loops works OK, but is quite un-rubyish. In 99% of cases, ruby has even better tools for the job; it's a very expressive language - iterators are king! (I'm guessing you first learned another language, before ruby? Seeing while loops, and/or for loops, is a dead giveaway that you're more familiar with a different language.) Instead of the outer while loop (while i < days.size), you could use days.each_with_index. And instead of the inner while loop (while j <= 31), you could use day_of_month.step(31, 7) (how cool is that!!).
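For reference once you've had a go yourself, the clues above might combine into something like the following. This is only a sketch: ncal_lines and DAY_NAMES are made-up names, each cell is padded to three characters (two for the number plus a separating space), and colouring the current day is left out.

```ruby
require 'date'

DAY_NAMES = %w[Mon Tue Wed Thu Fri Sat Sun].freeze

# Hypothetical helper: returns the title plus one line per weekday.
def ncal_lines(year, month)
  first = Date.new(year, month, 1)
  # Day -1 means "last day of the month", so this is the month length.
  days_in_month = Date.new(year, month, -1).day

  rows = DAY_NAMES.each_with_index.map do |name, i|
    # Same offset as in the while-loop version: the day of the month that
    # falls on this weekday in the first (possibly partial) week.
    start = i - first.cwday + 2
    cells = start.step(days_in_month, 7).map do |day|
      # Days before the 1st become blank cells of the same width.
      day < 1 ? ' ' * 3 : format('%3d', day)
    end
    name + cells.join
  end

  [first.strftime('%B %Y')] + rows
end

# e.g. the Thursday row for July 2021 is "Thu  1  8 15 22 29"
puts ncal_lines(2021, 7)
```

Passing Date.today.year and Date.today.month would give the current month.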
This is one way:
Construct a one-dimensional array, beginning with the daynames (Mon Tue ...).
Figure out a way to determine with how many "blanks" the month starts (these are days from the previous month. wday might help). Attach that amount of empty strings to the array.
Determine how many days the month has (hint: Date.new(2021,7,-1)), and attach all these day numbers to the array.
Attach empty strings to the array until the size of the array is divisible by 7 (or better, calculate). Skip this if you're skipping the last bullet.
Convert all elements of this array to right-adjusted strings of size 3 or some-such.
Use each_slice(7) to slice the array into weeks.
If desired, transpose this array of week-slices to mimic the ncal output.
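The steps above can be sketched roughly as follows. The method name calendar_grid is made up for illustration, and cwday is used instead of wday so that a month starting on a Sunday gets six leading blanks rather than a negative count.

```ruby
require 'date'

# Hypothetical helper: returns one string per weekday, ncal style.
def calendar_grid(year, month)
  first = Date.new(year, month, 1)
  last_day = Date.new(year, month, -1).day

  cells = %w[Mon Tue Wed Thu Fri Sat Sun]
  # Blanks for the days that belong to the previous month.
  cells += [''] * (first.cwday - 1)
  cells += (1..last_day).to_a
  # Pad the last week so the total is divisible by 7.
  cells += [''] * ((-cells.size) % 7)

  cells.map { |cell| cell.to_s.rjust(3) } # right-adjust every cell
       .each_slice(7).to_a                # one slice per week (names first)
       .transpose                         # weeks become columns
       .map(&:join)
end

puts calendar_grid(2021, 7)
```

Without the padding step, transpose would raise an error because the last slice would be shorter than the others.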
Thank you for your help. It took me literally 10 hours, but I figured it out thanks to you. I apologize once again for the badly posed initial question.
With the help of the hints, I put together the following solution.
require 'date'
days = %w[Mon Tue Wed Thu Fri Sat Sun]
# Blank cells before the 1st: cwday is 1 (Mon) to 7 (Sun), so a Monday needs none.
blanks = Date.new(2021, 7, 1).cwday - 1
blanks.times do
days.push(' ')
end
days_in_month = Date.new(2021, 7, -1).day
day = 1
while day <= days_in_month
days.push(day)
day += 1
end
# Pad the final week so that every slice has exactly 7 cells.
until (days.size % 7) == 0
days.push(' ')
end
new_arr = days.each_slice(7).to_a
puts "Array of days: #{new_arr}"
# Print column by column so that each weekday becomes one row, as in ncal.
for i in 0...7
for j in 0...new_arr.size
print " #{new_arr[j][i]}"
end
puts
end
require 'date'
# init
DAYS_ORDER = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
today = Date.today
month = today.month
year = today.year
first_day = Date.new(year, month, 1)
last_day = Date.new(year, month, -1)
hash_days = {}
# get all current months days and add to hash_days
first_day.upto(last_day) { |day| hash_days[day.day] = day.strftime('%a') }
# group by wday
grouped_hash = hash_days.group_by { |day| day.pop }.transform_values { |days| days.flatten }
# sort by wday from DAYS_ORDER
sorted_arr = grouped_hash.sort_by { |k, v| DAYS_ORDER.index(k) }
# rendering current month's calendar with mark current day
## title
print "\x1b[4m#{today.strftime("%B %Y")}\x1b[0m\n"
## calendar
indent = true
sorted_arr.each do |wday, days|
print wday
if days[0] != 1 && indent == true
print " " * 3 # one empty cell, as wide as a printed day
else
indent = false
end
days.each do |value|
spaces = " " * (value > 9 ? 1 : 2)
str_day = spaces + value.to_s
current_day = "\x1b[1;31m#{str_day}\x1b[0m"
print value == today.day ? current_day : str_day
end
puts
end

Create monthly trigger for Scheduled Task in Powershell (With additional criteria)

I'm currently working on a script that, when run, creates some scheduled tasks that make the host machine do several things and then restart within a specified time span.
This script needs to be run on multiple domain controllers, and therefore I would like to "load balance" by using something like New-ScheduledTaskTrigger -RandomDelay so that they don't all reboot at once, but spread it out.
The goal is to be able to change some variables of when to restart, things like:
First Monday of the month between 18:00 and 23:59
Every Thursday between 01:00 and 06:00
Every day between 04:00 and ..... you see where I'm going
However, New-ScheduledTaskTrigger has no "-Monthly" parameter.
That's the first problem. I can probably solve it with the help of other posts, but if I do it for example like this I'm not able to use -RandomDelay, which I think is a major feature for this to work.
Here is how I imagine it should look if -Monthly did work (for a monthly trigger):
$rebootFrequency = MONTHLY # DAILY, WEEKLY, MONTHLY
$rebootWeek = FIRST # FIRST, SECOND, THIRD, FOURTH, LAST
$rebootDayOfWeek = MON # MON, TUE, WED, THU, FRI, SAT, SUN
$rebootTimeFrom = 10:00 # HH:MM[:SS]
$rebootTimeTo = 16:00 # HH:MM[:SS]
New-ScheduledTaskTrigger -"$rebootFrequency" -WeekOfMonth $rebootWeek;
-DayOfWeek $rebootDayOfWeek -At $rebootTimeFrom -RandomDelay $rebootTimeTo
Do you have any suggestions as to how I should solve this problem?
I could do the same thing with schtasks.exe, but I would end up having to write some kind of script to do the "RandomDelay" part.
Feel free to ask further if you have any questions.
Thanks in advance.
Challenge 1
I've now got it to work, and I'm trying to make the script a bit more intuitive, but I can't figure out how I would do it...
What I want to do is to "convert" from using the numbers for days (for example: 16 for Thursday) to being able to write "THU" instead.
Right now it looks something like this:
$rebootDaysOfWeek = "16" # SUN=1, MON=2, TUE=4, WED=8, THU=16 etc.
$trigger.DaysOfWeek = $rebootDaysOfWeek
But I would find it a lot cooler if it was something like this:
$rebootDaysOfWeek = "THU" # SUN, MON, TUE, WED, THU, FRI, SAT
$trigger.DaysOfWeek = $rebootDaysOfWeek
But I can't seem to find a way to "convert" $rebootDaysOfWeek to work with the bit mask.
Check out the Microsoft Docs:
https://learn.microsoft.com/en-us/windows/win32/taskschd/time-trigger-example--scripting-
The sample is in VB, but it looks like it's just a ComObject. I haven't had enough time to play around, but you can start like this:
$service = new-object -comobject Schedule.Service
$service.connect()
$taskDefinition = $service.NewTask(0)
There's lots of task definition stuff, but it gets down to the triggers and you'll do this:
$triggers = $taskDefinition.Triggers
$trigger = $triggers.Create(5) # 5 is the monthly day-of-week trigger type (TASK_TRIGGER_MONTHLYDOW)
$trigger.DaysOfWeek = 16 #Thursday
$trigger.WeeksOfMonth = 1 # First week, 2 for second, 4 for third, 8 for fourth
$trigger.MonthsOfYear = 4095 # all months
$trigger.RandomDelay = 'PT1H' # 1 hour random delay.
I'll let you take it from here. Links to some of the items above:
https://learn.microsoft.com/en-us/windows/win32/taskschd/monthlydowtrigger-daysofweek
https://learn.microsoft.com/en-us/windows/win32/taskschd/monthlydowtrigger-monthsofyear
https://learn.microsoft.com/en-us/windows/win32/taskschd/monthlydowtrigger-weeksofmonth
https://learn.microsoft.com/en-us/windows/win32/taskschd/monthlydowtrigger-randomdelay
UPDATE FOR CHALLENGE 1
In order to use "friendly" references to the bitwise decimal values you can either create a constants section or use a hashtable; either way you are going to have to do the conversion yourself:
# Constants
$SUN = 1
$MON = 2
$TUE = 4
$WED = 8
$THU = 16
$FRI = 32
$SAT = 64
# Hashtable - because why not!
$DaysOfWeek = @{
SUN = 1
MON = 2
TUE = 4
WED = 8
THU = 16
FRI = 32
SAT = 64
}
Then you can use:
$trigger.DaysOfWeek = $THU
or
$trigger.DaysOfWeek = $DaysOfWeek["THU"]

How to add zero to single digit in a string

I need to prepend a zero to single-digit days in the string below, without changing the Quantity digit (the string is system generated in my application, not created by the user):
Data Added Quantity:1 on Dec 9 2015 modified on Jun 7 2016
I need to change this string to the following:
Data Added Quantity:1 on Dec 09 2015 modified on Jun 07 2016
So far I have tried the regular expression below, but I am not getting the desired output.
str = "Data Added Quantity:1 on Dec 9 2015 modified on Jun 7 2016"
Set oReg = New RegExp
oReg.Pattern = "\s\d{1}\s"
Set obj = oReg.Execute(str)
For i = 0 To obj.Count-1
mD = obj.Item(i).Value
oReg.Replace(str, "0" & mD)
Next
How can we achieve this using VBScript?
If you adjust the pattern a little and set the Global option to True you can simply use the Replace method. No need to Execute and loop.
Set re = New RegExp
re.Pattern = "\s(\d)\s"
re.Global = True
str = re.Replace(str, " 0$1 ")
\d without a modifier already matches exactly one digit, so \d{1} is redundant. The parentheses around the \d define a capturing group that allows you to use the matched substring in the replacement ($1).

Splitting a String with Pig

I have a String in the following format :
Sat, 09 Jul 2011 05:38:24 GMT
I would like output like this:
09 Jul 2011
05:38:24
Thanks.
[EDIT]
I have tried many solutions, but I got errors. I will re-explain the problem. I have an XML file with a node containing Tue, 05 Jul 2011 10:10:30 GMT, from which I would like to extract two separate strings as illustrated above.
I have tried this code:
register /usr/lib/pig/piggybank.jar;
items = LOAD ' depeche/2011_7_10_12_30_rss.txt' USING org.apache.pig.piggybank.storage.XMLLoader('item') AS (item:chararray);
source_name = FOREACH items GENERATE REGEX_EXTRACT(item, '<link>(.*)</link>', 1) AS link:chararray,
REGEX_EXTRACT(item, '<title>(.*)</title>', 1) AS title:chararray,
REGEX_EXTRACT(item, '<description>(.*)</description>', 1) AS description:chararray,
REGEX_EXTRACT(item, '<pubDate>(.*)</pubDate>', 1) AS pubdate:chararray,
sortie = FOREACH pubdate GENERATE SUBSTRING((chararray)$0, 4, 25);
illustrate sortie;
error:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 21, column 333> mismatched input '=' expecting SEMI_COLON
EDITED ANSWER:
That example is a bit more clear ... I grabbed an RSS feed example, and did a quick test. The code below worked using a sample which contained all of the elements in your example above. I used REGEX_EXTRACT instead of SUBSTRING to get the pubdate, however.
--rss.pig
REGISTER piggybank.jar
items = LOAD 'rss.txt' USING org.apache.pig.piggybank.storage.XMLLoader('item') AS (item:chararray);
data = FOREACH items GENERATE REGEX_EXTRACT(item, '<link>(.*)</link>', 1) AS link:chararray,
REGEX_EXTRACT(item, '<title>(.*)</title>', 1) AS title:chararray,
REGEX_EXTRACT(item, '<description>(.*)</description>', 1) AS description:chararray,
REGEX_EXTRACT(item, '<pubDate>.*(\\d{2}\\s[a-zA-Z]{3}\\s\\d{4}\\s\\d{2}:\\d{2}:\\d{2}).*</pubDate>', 1) AS pubdate:chararray;
dump data;
--rss.txt
<rss version="2.0">
<channel>
<title>News</title>
<link>http://www.hannonhill.com</link>
<description>Hannon Hill News</description>
<language>en-us</language>
<pubDate>Tue, 10 Jun 2003 04:00:00 GMT</pubDate>
<generator>Cascade Server</generator>
<webMaster>webmaster@hannonhill.com</webMaster>
<item>
<title>News Item 1</title>
<link>http://www.hannonhill.com/news/item1.html</link>
<description>Description of news item 1 here.</description>
<pubDate>Tue, 03 Jun 2003 09:39:21 GMT</pubDate>
<guid>http://www.hannonhill.com/news/item1.html</guid>
</item>
<item>
<title>News Item 2</title>
<link>http://www.hannonhill.com/news/item2.html</link>
<description>Description of news item 2 here.</description>
<pubDate>Fri, 30 May 2003 11:06:42 GMT</pubDate>
<guid>http://www.hannonhill.com/news/item2.html</guid>
</item>
<item>
<title>News Item 3</title>
<link>http://www.hannonhill.com/news/item3.html</link>
<description>Description of news item 3 here.</description>
<pubDate>Tue, 20 May 2003 08:56:02 GMT</pubDate>
<guid>http://www.hannonhill.com/news/item3.html</guid>
</item>
</channel>
</rss>
Results for rss.pig:
(http://www.hannonhill.com/news/item1.html,News Item 1,Description of news item 1 here.,03 Jun 2003 09:39:21)
(http://www.hannonhill.com/news/item2.html,News Item 2,Description of news item 2 here.,30 May 2003 11:06:42)
(http://www.hannonhill.com/news/item3.html,News Item 3,Description of news item 3 here.,20 May 2003 08:56:02)
ORIGINAL ANSWER:
There are several methods that would work here, so I'll cover two: SUBSTRING and REGEX_EXTRACT.
If your string length is constant, then you can use the builtin SUBSTRING function. Think of it like the cut command in Linux.
OUTPUT = FOREACH INPUT GENERATE SUBSTRING((chararray)$0, 4, 25);
Otherwise, you can use the builtin REGEX_EXTRACT to pull the string that you're looking for. Given the example, the easiest regex match that I came up with was to begin the string with the first digit, and end with the last digit, capturing all characters in between.
OUTPUT = FOREACH INPUT GENERATE REGEX_EXTRACT((chararray)$0, '([\d].*[\d])', 1);

How to check whether the time of a DateTime is between 9 AM and 5 PM (Ruby)?

I have this datetime:
time = 2013-08-21 11:54:49 +0200
I also have two integers that represent a time interval. For example, if I wish it to be between 9 AM and 5 PM I will use:
start = 9
finish = 17
I now wish to examine if the datetime is within the interval of the two integers. In this example the result would be true, because the event at 11:54 is between 9 and 17...
How to do this?
If you don't care about timezones the code could look something like:
(start..finish).include? DateTime.now.strftime('%H').to_i
In your question the assignment is not correct. This would work:
require 'date'
event = DateTime.parse('2013-08-21 11:54:49 +0200')
If you have a DateTime instance you can simply use #hour:
event.hour # => 11
You can compare this integer with any other:
event.hour.between?(start, finish) # => true
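One thing to watch: between? is inclusive at both ends and only the hour is compared, so an event at 17:30 still counts as inside 9 to 17. If the interval should stop at 17:00 sharp, an exclusive range is one option. A small sketch (within_hours? is a made-up helper name, and treating the end as exclusive is an assumption about what you want):

```ruby
require 'date'

# Hypothetical helper: true when the hour of `moment` lies in the
# half-open interval start...finish (finish itself is excluded).
def within_hours?(moment, start, finish)
  (start...finish).cover?(moment.hour)
end

event = DateTime.parse('2013-08-21 11:54:49 +0200')
late  = DateTime.parse('2013-08-21 17:30:00 +0200')

puts within_hours?(event, 9, 17) # 11:54 is inside
puts within_hours?(late, 9, 17)  # 17:30 is outside with an exclusive end
puts late.hour.between?(9, 17)   # but inside with the inclusive check
```

Like the answers above, this compares the hour in the DateTime's own offset and does no timezone conversion.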

Resources