Scheduling Weekly Oozie - hadoop

I've just started with Oozie and am hoping someone here can offer some advice.
Here is a snippet of my coordinator.xml:
<coordinator-app name="weeklyABCFacts" frequency="${coord:days(7)}" start="${start}T00:00Z" end="${end}" timezone="CET" xmlns="uri:oozie:coordinator:0.1">
  <controls>
    <timeout>-1</timeout>
    <concurrency>1</concurrency>
    <execution>FIFO</execution>
  </controls>
  <datasets>
    <dataset name="weekly-f_stats-flag" frequency="${coord:days(7)}" initial-instance="2013-07-01T00:00Z" timezone="CET">
      <uri-template>${nameNode}/warehouse/hive/f_stats/dt=${YEAR}W${WEEK}</uri-template>
    </dataset>
  </datasets>
  ...
</coordinator-app>
My question relates to the part within the <uri-template> tag. Its value is normally expressed like this: "...revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}..."
Can this part be expressed in terms of a WEEK, i.e. in the format of the last column in the table below?
The reason for the question is that our date table has a column called 'iso_week' (e.g. 28, whose corresponding date range is 8 July to 14 July 2013). It looks like this:
+------------+----------+---------------+
| date_field | iso_week | iso_week_date |
+------------+----------+---------------+
| 2013-07-08 | 28       | 2013W28       |
| 2013-07-09 | 28       | 2013W28       |
| 2013-07-10 | 28       | 2013W28       |
| 2013-07-11 | 28       | 2013W28       |
| 2013-07-12 | 28       | 2013W28       |
| 2013-07-13 | 28       | 2013W28       |
| 2013-07-14 | 28       | 2013W28       |
+------------+----------+---------------+
I hope this is clear enough; if not, please let me know how I can make it clearer.

There isn't a WEEK variable (at least not in the 3.3.2 source I'm looking at), but nothing stops you from downloading the source and amending the core/java/org/apache/oozie/coord/CoordELEvaluator.java file, specifically the createURIELEvaluator(String) method:
public static ELEvaluator createURIELEvaluator(String strDate) throws Exception {
    ELEvaluator eval = new ELEvaluator();
    Calendar date = Calendar.getInstance(DateUtils.getOozieProcessingTimeZone());
    // always???
    date.setTime(DateUtils.parseDateOozieTZ(strDate));
    eval.setVariable("YEAR", date.get(Calendar.YEAR));
    eval.setVariable("MONTH", make2Digits(date.get(Calendar.MONTH) + 1));
    eval.setVariable("DAY", make2Digits(date.get(Calendar.DAY_OF_MONTH)));
    eval.setVariable("HOUR", make2Digits(date.get(Calendar.HOUR_OF_DAY)));
    eval.setVariable("MINUTE", make2Digits(date.get(Calendar.MINUTE)));
    // add the following lines:
    // optional: apply ISO-8601 week rules (weeks start on Monday, week 1 holds
    // the year's first Thursday) so WEEK matches an 'iso_week' column; without
    // them WEEK_OF_YEAR follows the default locale's week rules
    date.setFirstDayOfWeek(Calendar.MONDAY);
    date.setMinimalDaysInFirstWeek(4);
    eval.setVariable("WEEK", make2Digits(date.get(Calendar.WEEK_OF_YEAR)));
    return eval;
}
You should then be able to follow the instructions to recompile Oozie.
I would note that you should be wary of how week numbers and years don't always line up: for example, with the week rules used below (weeks starting on Sunday), week 1 of 2013 actually starts in 2012:
Tue Dec 25 11:11:52 EST 2012 : 2012 W 52
Wed Dec 26 11:11:52 EST 2012 : 2012 W 52
Thu Dec 27 11:11:52 EST 2012 : 2012 W 52
Fri Dec 28 11:11:52 EST 2012 : 2012 W 52
Sat Dec 29 11:11:52 EST 2012 : 2012 W 52
Sun Dec 30 11:11:52 EST 2012 : 2012 W 1 <= Here's your problem
Mon Dec 31 11:11:52 EST 2012 : 2012 W 1
Tue Jan 01 11:11:52 EST 2013 : 2013 W 1 <= 'Fixed' from here
Wed Jan 02 11:11:52 EST 2013 : 2013 W 1
Thu Jan 03 11:11:52 EST 2013 : 2013 W 1
Fri Jan 04 11:11:52 EST 2013 : 2013 W 1
Sat Jan 05 11:11:52 EST 2013 : 2013 W 1
Sun Jan 06 11:11:52 EST 2013 : 2013 W 2
Mon Jan 07 11:11:52 EST 2013 : 2013 W 2
Tue Jan 08 11:11:52 EST 2013 : 2013 W 2
As produced by the following test snippet:
import java.util.Calendar;
import java.util.TimeZone;
import org.junit.Test;

@Test
public void testDates() {
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    cal.set(2012, 11, 25); // months are zero-based: 11 = December
    for (int x = 0; x < 15; x++) {
        System.err.println(cal.getTime() + " : " + cal.get(Calendar.YEAR)
                + " W " + cal.get(Calendar.WEEK_OF_YEAR));
        cal.add(Calendar.DAY_OF_YEAR, 1);
    }
}
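The question's iso_week_date column follows ISO-8601 week dating, which avoids the year mismatch shown above by pairing the week number with a week-based year instead of the calendar year. A quick way to compute expected values for such a table is sketched below in Ruby; iso_week_date is a hypothetical helper name, and it relies on the standard %G (ISO week-based year) and %V (ISO week number) strftime directives.

```ruby
require 'date'

# Hypothetical helper: format a date like the question's iso_week_date column.
# %G = ISO week-based year, %V = ISO week number (weeks start on Monday and
# week 1 is the week containing the year's first Thursday).
def iso_week_date(date)
  date.strftime('%GW%V')
end

puts iso_week_date(Date.new(2013, 7, 8))   # 2013W28
puts iso_week_date(Date.new(2012, 12, 30)) # 2012W52
puts iso_week_date(Date.new(2012, 12, 31)) # 2013W01

# The calendar year and the ISO week-based year can differ near New Year.
d = Date.new(2012, 12, 31)
puts "#{d.year} vs #{d.cwyear}" # 2012 vs 2013
```

So a URI template built from the calendar YEAR plus an ISO WEEK can still produce a wrong path around New Year; the year part would also need to come from the week-based year.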

Related

How to check if a date value inside a hash is later than a reference date

I'm writing a validation and I have a hash with this structure:
elements.map{ |e| [e.id,e.coverable.published_at] }.to_h
=> {305=>Fri, 17 Apr 2020 15:23:00 CEST +02:00,
306=>Fri, 17 Apr 2020 13:00:00 CEST +02:00,
307=>Fri, 17 Apr 2020 09:20:00 CEST +02:00,
308=>Fri, 17 Apr 2020 12:59:00 CEST +02:00,
309=>Fri, 17 Apr 2020 11:39:00 CEST +02:00}
I have a reference date:
published_at
=> Mon, 04 May 2020 23:51:00 CEST +02:00
I have to check whether any of the elements has a published_at datetime value later than my published_at.
Is there a short way to do that?
Try something like this:
elements.any? { |e| e.coverable.published_at > your_published_at }
If you need the element that passes the condition, use find:
element = elements.find { |e| e.coverable.published_at > your_published_at }
# if element is not nil such element is present
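A self-contained sketch of both answers follows. Element and Coverable are hypothetical Struct stand-ins for the question's models, and it uses plain Ruby Time values instead of ActiveSupport time objects; the comparison with > should work the same way for both. It also shows the equivalent check on the id => published_at hash from the question.

```ruby
require 'time'

# Hypothetical stand-ins for the question's models.
Coverable = Struct.new(:published_at)
Element = Struct.new(:id, :coverable)

elements = [
  Element.new(305, Coverable.new(Time.parse('2020-04-17 15:23:00 +02:00'))),
  Element.new(306, Coverable.new(Time.parse('2020-04-17 13:00:00 +02:00'))),
  Element.new(307, Coverable.new(Time.parse('2020-04-17 09:20:00 +02:00')))
]
reference = Time.parse('2020-05-04 23:51:00 +02:00')

# any? returns true as soon as one element is later than the reference.
puts elements.any? { |e| e.coverable.published_at > reference } # false

# The same check on the id => published_at hash built in the question.
published = elements.map { |e| [e.id, e.coverable.published_at] }.to_h
puts published.any? { |_id, time| time > reference } # false

# find returns the first matching element, or nil; here with an earlier cutoff.
cutoff = Time.parse('2020-04-17 12:00:00 +02:00')
match = elements.find { |e| e.coverable.published_at > cutoff }
puts match.id # 305
```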

Sort a collection by a property of type Date

I have a collection like this:
content [Collection]
[0] [Content]
creationDate Thu Aug 22 11:50:37 GMT 2019
[1] [Content]
creationDate Thu Aug 22 11:45:37 GMT 2019
[2] [Content]
creationDate Thu Aug 22 11:54:37 GMT 2019
How can I sort this collection by date value?
so that it becomes:
content [Collection]
[0] [Content]
creationDate Thu Aug 22 11:45:37 GMT 2019
[1] [Content]
creationDate Thu Aug 22 11:50:37 GMT 2019
[2] [Content]
creationDate Thu Aug 22 11:54:37 GMT 2019
Create a DateComparator for sorting.
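import java.util.Comparator;
import java.util.Date;

// Orders Content objects by creationDate, oldest first.
// Assumes Content exposes a getCreationDate() getter returning java.util.Date.
class DateComparator implements Comparator<Content> {
    public int compare(Content c1, Content c2) {
        Date d1 = c1.getCreationDate();
        Date d2 = c2.getCreationDate();
        if (d1.before(d2)) {
            return -1;
        } else if (d1.after(d2)) {
            return 1;
        } else {
            return 0;
        }
    }
}
and call content.sort(new DateComparator()) (List.sort needs Java 8 or later; on older versions use Collections.sort(content, new DateComparator())).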
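For comparison, in Ruby the same ordering needs no comparator class: Enumerable#sort_by sorts on the value the block extracts. The Content struct and its creation_date field below are hypothetical stand-ins for the question's objects.

```ruby
# Hypothetical stand-in for the question's Content objects.
Content = Struct.new(:creation_date)

contents = [
  Content.new(Time.utc(2019, 8, 22, 11, 50, 37)),
  Content.new(Time.utc(2019, 8, 22, 11, 45, 37)),
  Content.new(Time.utc(2019, 8, 22, 11, 54, 37))
]

# sort_by compares the extracted Time values, oldest first.
sorted = contents.sort_by(&:creation_date)
sorted.each { |c| puts c.creation_date }
```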

I can't get the 15th weekday in Ruby

I can get the date 15 days from now, but not the 15th working day.
DateTime.now.next_day(+15).strftime('%d %^B %Y')
How can I get the 15th weekday from now?
You're just adding 15 days to the current date. What you want is to adjust the date:
date = DateTime.now
if date.mday > 15
  date = date.next_month
end
date = date.next_day(15 - date.mday)
This moves the date to the 15th of the next month if it is already past the 15th of the current month, and to the 15th of the current month otherwise.
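Wrapping that adjustment in a method with an explicit start date makes it easy to check; next_fifteenth is a hypothetical name. The last call shows that it also behaves at a month end, because Date#next_month clamps January 31st to the last day of February before the day offset is applied.

```ruby
require 'date'

# Hypothetical helper: the 15th of the current month, or of the next month
# if the 15th has already passed.
def next_fifteenth(date)
  date = date.next_month if date.mday > 15
  date.next_day(15 - date.mday)
end

puts next_fifteenth(Date.new(2019, 6, 14)) # 2019-06-15
puts next_fifteenth(Date.new(2019, 6, 20)) # 2019-07-15
puts next_fifteenth(Date.new(2019, 1, 31)) # 2019-02-15
```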
Now this can be extended to be an Enumerator:
def each_mday(mday, from: nil)
  from ||= DateTime.now
  Enumerator.new do |y|
    loop do
      if from.mday > mday
        from = from.next_month
      end
      from = from.next_day(mday - from.mday)
      y << from
      from += 1
    end
  end
end
This makes it possible to find the first date matching particular criteria, such as falling on a weekday:
each_mday(15, from: Date.parse('2019-06-14')).find { |d| (1..5).include?(d.wday) }
This returns July 15th, because June 15th, 2019 falls on a weekend (a Saturday).
The from argument is optional but useful for testing cases like this to ensure it's working correctly.
15.times.reduce(Date.civil(2019, 3, 24)) do |acc, _|
  begin
    acc += 1
  end while [0, 6].include?(acc.wday)
  acc
end
#⇒ #<Date: 2019-04-12 ((2458586j,0s,0n),+0s,2299161j)>
So you want to add 15 business days to the current date. You can go with iGian's or Aleksei's plain-Ruby answers, or use the business_time gem:
15.business_days.from_now
If I understood correctly, you want to get the next Monday if you land on a Saturday or Sunday. Since wday gives you 0 for Sunday and 6 for Saturday, you can use it as a condition to add days until Monday.
def date_add_next_week_day(days)
  date = Date.today + days
  date += 1 if date.wday == 6
  date += 1 if date.wday == 0
  date
end
date_add_next_week_day(15).strftime('%d %^B %Y')
If I understand correctly, you need to find the 15th day after a given date, moving past the weekend if it lands on one.
One possible option is to define a skip_weekend hash like this, based on the values returned by Date#wday:
skip_weekend = { 6 => 2, 0 => 1}
skip_weekend.default = 0
Then:
next15 = DateTime.now.next_day(15)
next15_working = next15.next_day(skip_weekend[next15.wday]).strftime('%d %B %Y')
Now if next15 falls on a working day, next15_working is the same day (the hash defaults to 0); otherwise it skips 2 days if it is a Saturday (wday 6, which the hash maps to 2) or 1 day if it is a Sunday (wday 0, which the hash maps to 1).
I assume that, given a starting date, ds (a Date object), and a positive integer n, the problem is to determine a later date, dt, such that there are n weekdays between ds+1 and dt, inclusive.
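The skip_weekend approach can be sketched with an explicit start date instead of DateTime.now, so the result is reproducible; fifteen_days_then_weekday is a hypothetical method name. Note that this answers a different question from counting 15 working days: it adds 15 calendar days and only then moves off a weekend.

```ruby
require 'date'

# Same mapping as skip_weekend above: Saturday (6) -> +2, Sunday (0) -> +1,
# any other weekday -> +0 via the hash default.
SKIP_WEEKEND = { 6 => 2, 0 => 1 }
SKIP_WEEKEND.default = 0

# Hypothetical helper: 15 calendar days after start, pushed to Monday if the
# result falls on a weekend.
def fifteen_days_then_weekday(start)
  target = start.next_day(15)
  target.next_day(SKIP_WEEKEND[target.wday])
end

puts fifteen_days_then_weekday(Date.new(2019, 4, 5)) # 2019-04-22 (Apr 20 is a Saturday)
puts fifteen_days_then_weekday(Date.new(2019, 4, 8)) # 2019-04-23 (already a weekday)
```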
require 'date'

def given_date_plus_week_days(dt, week_days)
  wday = dt.wday
  weeks, days = (week_days + {0=>4, 6=>4}.fetch(wday, wday-1)).divmod(5)
  dt - (wday.zero? ? 6 : (wday - 1)) + 7*weeks + days
end
The variable wday is assigned the day of the week of the start date, dt. The start date is moved back to the previous Monday, unless it falls on a Monday, in which case it is not changed. That is reflected in the expression
wday.zero? ? 6 : (wday - 1)
which is subtracted from dt. The number of week days is correspondingly adjusted to
week_days + { 0=>4, 6=>4 }.fetch(wday, wday-1)
The remaining calculations are straightforward.
def display(start_str, week_days)
  start = Date.parse(start_str)
  7.times.map { |i| start + i }.each do |ds|
    de = given_date_plus_week_days(ds, week_days)
    puts "#{ds.strftime("%a, %b %d, %Y")} + #{week_days} -> #{de.strftime("%a, %b %d, %Y")}"
  end
end
display("April 8", 15)
Mon, Apr 08, 2019 + 15 -> Mon, Apr 29, 2019
Tue, Apr 09, 2019 + 15 -> Tue, Apr 30, 2019
Wed, Apr 10, 2019 + 15 -> Wed, May 01, 2019
Thu, Apr 11, 2019 + 15 -> Thu, May 02, 2019
Fri, Apr 12, 2019 + 15 -> Fri, May 03, 2019
Sat, Apr 13, 2019 + 15 -> Fri, May 03, 2019
Sun, Apr 14, 2019 + 15 -> Fri, May 03, 2019
display("April 8", 17)
Mon, Apr 08, 2019 + 17 -> Wed, May 01, 2019
Tue, Apr 09, 2019 + 17 -> Thu, May 02, 2019
Wed, Apr 10, 2019 + 17 -> Fri, May 03, 2019
Thu, Apr 11, 2019 + 17 -> Mon, May 06, 2019
Fri, Apr 12, 2019 + 17 -> Tue, May 07, 2019
Sat, Apr 13, 2019 + 17 -> Tue, May 07, 2019
Sun, Apr 14, 2019 + 17 -> Tue, May 07, 2019
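A closed-form calculation like given_date_plus_week_days is easy to get subtly wrong, so a brute-force version is handy for cross-checking it. The sketch below (add_weekdays is a hypothetical name) steps forward one day at a time and counts only Monday to Friday; it is O(n) but obviously correct, and it reproduces several rows of the tables above.

```ruby
require 'date'

# Hypothetical brute-force helper: move forward day by day, counting only
# weekdays, until n weekdays after start have been counted.
def add_weekdays(start, n)
  date = start
  counted = 0
  while counted < n
    date += 1
    counted += 1 unless date.saturday? || date.sunday?
  end
  date
end

puts add_weekdays(Date.new(2019, 4, 8), 15)  # 2019-04-29
puts add_weekdays(Date.new(2019, 4, 13), 15) # 2019-05-03
puts add_weekdays(Date.new(2019, 4, 11), 17) # 2019-05-06
```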

Get min and max value from this array of hashes

I have an array where each row contains a created_at timestamp and a value. How do I get the min and max of the value fields?
The array is called channels_counts_for_history_graph, and
channels_counts_for_history_graph.max[1]
gives me the max date rather than the max value. Here is the array:
[[Sun, 30 Dec 2018 15:03:55 UTC +00:00, 4305],
[Sun, 30 Dec 2018 15:05:42 UTC +00:00, 4305],
[Mon, 31 Dec 2018 09:24:06 UTC +00:00, 4306],
[Sat, 05 Jan 2019 09:04:50 UTC +00:00, 4308],
[Tue, 01 Jan 2019 11:26:04 UTC +00:00, 4306],
[Wed, 02 Jan 2019 17:24:19 UTC +00:00, 4305]]
Any help appreciated.
Thanks
I suggest using Enumerable#minmax_by to get the min and the max value in just one method call:
array = [['Sun, 30 Dec 2018 15:03:55 UTC +00:00', 4305],['Sun, 30 Dec 2018 15:05:42 UTC +00:00', 4305],['Mon, 31 Dec 2018 09:24:06 UTC +00:00', 4306],['Sat, 05 Jan 2019 09:04:50 UTC +00:00', 4308],['Tue, 01 Jan 2019 11:26:04 UTC +00:00', 4306],['Wed, 02 Jan 2019 17:24:19 UTC +00:00', 4305]]
array.minmax_by(&:last)
#=> [["Sun, 30 Dec 2018 15:03:55 UTC +00:00", 4305], ["Sat, 05 Jan 2019 09:04:50 UTC +00:00", 4308]]
By default, comparing arrays (as sort and max do) compares their first elements first.
You can reverse each pair for the purposes of the comparison:
channels_counts_for_history_graph.map(&:reverse).max[0]
My guess is that this is what you were asking for:
[{ created_at: Date.new(2017, 1, 1) }, { created_at: Date.new(2019, 1, 1) }, { created_at: Date.new(2018, 1, 1) }]
.minmax_by { |value| value[:created_at] }
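To see why max[1] misbehaves and what the alternatives return, here is a sketch using the question's rows, with the timestamps written as plain ISO strings (an assumption made so the block runs without ActiveSupport; the strings still sort chronologically). Array#max compares rows element by element, so the timestamp decides which row wins, and [1] then just reads that row's value.

```ruby
rows = [
  ['2018-12-30 15:03:55 UTC', 4305],
  ['2018-12-30 15:05:42 UTC', 4305],
  ['2018-12-31 09:24:06 UTC', 4306],
  ['2019-01-05 09:04:50 UTC', 4308],
  ['2019-01-01 11:26:04 UTC', 4306],
  ['2019-01-02 17:24:19 UTC', 4305]
]

# Just the values: minmax on the second column.
min_value, max_value = rows.map(&:last).minmax
puts "min=#{min_value} max=#{max_value}" # min=4305 max=4308

# Whole rows, selected by their value.
lowest, highest = rows.minmax_by(&:last)
puts highest.first # 2019-01-05 09:04:50 UTC
```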

Ruby: Datetime to UTC conversion

I am trying to convert the following date and time combinations to UTC:
from_date: "2017-06-19",from_time: "14:00"
to_date: "2017-06-19", to_time: "23:00"
Timezone: EDT
I am using the following code for the conversion:
Date.parse(dt).to_datetime + Time.parse(t).utc.seconds_since_midnight.seconds
But it gives the wrong date for the to_date/to_time combination.
Output:
Date.parse(from_date).to_datetime +
  Time.parse(from_time).utc.seconds_since_midnight.seconds
#⇒ Mon, 19 Jun 2017 18:00:00 +0000
Date.parse(to_date).to_datetime +
  Time.parse(to_time).utc.seconds_since_midnight.seconds
#⇒ Mon, 19 Jun 2017 03:00:00 +0000
The second conversion should give "Tue, 20 Jun 2017 03:00:00 +0000" instead.
The following lines of code worked for me:
parsed_date = Time.zone.parse(from_date).strftime('%Y-%m-%d')
parsed_time = Time.zone.parse(from_time).strftime('%T')
Time.parse(parsed_date + ' ' + parsed_time).utc.strftime('%F %T')
require 'time'
from = Time.parse "2017-06-19 14:00 US/Eastern"
=> 2017-06-19 14:00:00 -0400
from.utc
=> 2017-06-19 18:00:00 UTC
to = Time.parse "2017-06-19 23:00 US/Eastern"
=> 2017-06-19 23:00:00 -0400
to.utc
=> 2017-06-20 03:00:00 UTC
Although you could also specify the timezone offset directly instead of a zone name, using the zone name handles Daylight Saving Time.
I think this is shorter:
from_date = "2017-06-19"
from_time = "14:00"
DateTime.strptime("#{from_date}T#{from_time}ZEDT", "%Y-%m-%dT%H:%MZ%z").utc
=> Mon, 19 Jun 2017 18:00:00 +0000
to_date = "2017-06-19"
to_time = "23:00"
DateTime.strptime("#{to_date}T#{to_time}ZEDT", "%Y-%m-%dT%H:%MZ%z").utc
=> Tue, 20 Jun 2017 03:00:00 +0000
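The underlying fix in all of these answers is to attach the zone to the combined date and time before converting, so the date rolls over correctly. A plain-Ruby sketch, assuming the offset is already known (-04:00 for EDT) and using a hypothetical local_to_utc helper; a fixed offset does not follow daylight-saving changes the way a zone name does.

```ruby
require 'time'

# Hypothetical helper: join date and time strings, attach a fixed UTC offset,
# then convert the resulting Time to UTC.
def local_to_utc(date_str, time_str, offset)
  Time.parse("#{date_str} #{time_str} #{offset}").utc
end

from_utc = local_to_utc('2017-06-19', '14:00', '-04:00')
to_utc   = local_to_utc('2017-06-19', '23:00', '-04:00')
puts from_utc # 2017-06-19 18:00:00 UTC
puts to_utc   # 2017-06-20 03:00:00 UTC
```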
