How to insert only the previous month's data into a database with NiFi? - apache-nifi

I have data in which I need to check the month of each record: if it falls in the previous month, the record should be inserted; otherwise it should not.
Example:
23.12.2016 12:02:23,Koji,24
22.01.2016 01:21:22,Mahi,24
I need to take the first column (23.12.2016 12:02:23) and extract its month (12).
Then I compare that with the month before the current month.
If the current month is 'JAN_2017', the month before it is 'Dec_2016'.
For the first row,
compare 'Dec_2016' [the month before] with the month of the data, 'Dec_2016' [23.12.2016].
They match, so the row is inserted into the database.
EDIT 1:
I have already tried your suggestion:
"UpdateAttribute to add a new attribute with the previous month value, and then RouteOnAttribute to determine if the flowfile should be inserted"
I used the following Expression Language in RouteOnAttribute:
${literal('Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec'):getDelimitedField(${csv.1:toDate('dd.MM.yyyy hh:mm:ss'):format('MM')}):equals(${literal('Dec,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov'):getDelimitedField(${now():toDate(' Z MM dd HH:mm:ss.SSS yyyy'):format('MM'):toNumber()})})}
It fails on data like the following:
23.12.2015,Andy,21
23.12.2017,Present,32
My data may contain past and future years.
These rows match my expression, so they are inserted as well.
I need to check the year as well as the month.
How can I do that?

The easiest answer is to use the ExecuteScript processor with simple date logic (this will allow you to use the Groovy/Java date framework to correctly handle things like leap years, time zones, etc.).
If you really don't want to do that, you could probably use a regex and Expression Language in UpdateAttribute to add a new attribute with the previous month value, and then RouteOnAttribute to determine if the flowfile should be inserted into the database.
Here's a simple Groovy test demonstrating the logic. You'll need to add the code to process the session, flowfile, etc.
@Test
public void textScriptShouldFindPreviousMonth() throws Exception {
    // Arrange
    def input = ["23.12.2016 12:02:23,Koji,24", "22.01.2016 01:21:22,Mahi,24"]
    def EXPECTED = ["NOV_2016", "DEC_2015"]

    // Act
    input.eachWithIndex { String data, int i ->
        Calendar calendar = Date.parse("dd.MM.yyyy", data.tokenize(" ")[0]).toCalendar()
        calendar.add(Calendar.MONTH, -1)
        String result = calendar.format("MMM_yyyy").toUpperCase()

        // Assert
        assert result == EXPECTED[i]
    }
}
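The month-and-year check the edit asks for might look roughly like the sketch below. It is written in Python purely for illustration, not as NiFi or Groovy code; the function name is_previous_month and the fixed "today" value are assumptions, and the session and flowfile handling is still left out.

```python
from datetime import datetime

def is_previous_month(line, today):
    """Return True if the record's date falls in the month before `today`."""
    # The first CSV column holds the date, optionally followed by a time.
    date_text = line.split(",")[0].split(" ")[0]
    stamp = datetime.strptime(date_text, "%d.%m.%Y")
    # Counting whole months makes the year part of the comparison, so
    # December 2015 and December 2017 no longer match January 2017.
    record_months = stamp.year * 12 + stamp.month
    current_months = today.year * 12 + today.month
    return current_months - record_months == 1

today = datetime(2017, 1, 15)  # assumed "current" date for the example
rows = [
    "23.12.2016 12:02:23,Koji,24",
    "22.01.2016 01:21:22,Mahi,24",
    "23.12.2015,Andy,21",
    "23.12.2017,Present,32",
]
matches = [row for row in rows if is_previous_month(row, today)]
print(matches)
```

Only the Koji row is kept here, because it is the only one dated December 2016.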

Related

How to get only first part of the date ('DD') from an entire date?

I've converted a date into string. Now, I want to parse a string to get the DD part from DD-MM-YYYY.
For e.g.
If the date is 03-05-2017 (DD-MM-YYYY) then the goal is to get only first part of the string i.e. 03 (DD).
You've tagged this question as a ServiceNow question, so I assume you're using the ServiceNow GlideDateTime Class to derive the date as a string. If that is correct, did you know that you can actually derive the day of the month directly from the GlideDateTime object? You can use the getDayOfMonth(), getDayOfMonthLocalTime(), or getDayOfMonthUTC().
You could, of course, also use String.prototype.indexOf() to get the first hyphen's location, and then return everything up to that location using String.prototype.slice().
Or, if you're certain that the day of the month in the string will contain an initial zero, you can simply .slice() out a new string from index 0 up to (but not including) index 2.
var date = '03-05-2017';
var newDate = date.slice(0, 2);
console.log(newDate); //==>Prints "03".
var alternateNewDate = date.slice(0, date.indexOf('-'));
console.log(alternateNewDate); //==>Prints "03".

Retrieve database records between two weekdays

I have several records in my database. The table has a column named "weekday" where I store a weekday like "mon" or "fri". When a user searches from the frontend, the parameters posted to the server are startDay and endDay.
Now I would like to retrieve all records between startDay and endDay. We can assume startDay is "mon" and endDay is "sun". I do not currently know how to do this.
Create another table with the names of the days and their corresponding number. Then you'd just need to join up your current table with the days table by name, and then use the numbers in that table to do your queries.
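A minimal sketch of that lookup-table join, using Python's built-in sqlite3 module as a stand-in for the real database. The table names days and records, the sample rows, and the Monday-first numbering are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table mapping each weekday name to a number (Monday first).
    CREATE TABLE days (name TEXT PRIMARY KEY, num INTEGER);
    INSERT INTO days VALUES
        ('mon', 1), ('tue', 2), ('wed', 3), ('thu', 4),
        ('fri', 5), ('sat', 6), ('sun', 7);
    -- Hypothetical data table with the "weekday" column from the question.
    CREATE TABLE records (id INTEGER, weekday TEXT);
    INSERT INTO records VALUES (1, 'mon'), (2, 'thu'), (3, 'sun'), (4, 'tue');
""")

def records_between(start_day, end_day):
    """Return records whose weekday lies between start_day and end_day."""
    query = """
        SELECT r.id, r.weekday
        FROM records r
        JOIN days d ON d.name = r.weekday
        WHERE d.num BETWEEN (SELECT num FROM days WHERE name = ?)
                        AND (SELECT num FROM days WHERE name = ?)
        ORDER BY d.num, r.id
    """
    return conn.execute(query, (start_day, end_day)).fetchall()

result = records_between("tue", "thu")
print(result)
```

This version does not handle ranges that wrap past the end of the week, such as "fri" to "mon"; those would need an extra OR condition.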
It's not exactly practical, but it is possible to convert sun, mon, tue to numbers in MySQL.
Pick a fixed year and week number, such as 201610 for the 10th week of 2016, then combine DATE_FORMAT with STR_TO_DATE:
DATE_FORMAT(STR_TO_DATE('201610 mon', '%X%V %a'), '%w')
DATE_FORMAT(STR_TO_DATE('201610 sun', '%X%V %a'), '%w')
DATE_FORMAT(STR_TO_DATE('201610 tue', '%X%V %a'), '%w')
These 3 statements evaluate to 1, 0, and 2 respectively.
The main thing this does is convert the %a format (Sun-Sat) to the %w format (0-6, where 0 is Sunday).
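I don't know the architecture of your application, and I don't think storing and querying a weekday string is appropriate, but I can suggest a workaround.
Write a helper function that returns an array of the weekdays in the range, i.e.
function getWeekDaysArray($startWeekDay, $endWeekDay) {
    $days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun'];
    $start = array_search($startWeekDay, $days);
    $length = (array_search($endWeekDay, $days) - $start + 7) % 7 + 1;
    // Doubling the list lets a range such as 'fri'..'mon' wrap past 'sun'.
    return array_slice(array_merge($days, $days), $start, $length);
}
$daysRangeArray = getWeekDaysArray('mon', 'wed'); // ['mon', 'tue', 'wed']
Now you can use this array to query the table:
DB::table('TableName')->whereIn('weekday', $daysRangeArray)->get();
Hope this helps.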

Subtract One row's value from another row in Pig

I'm trying to develop a sample program using Pig to analyse some log files. I want to analyze the running time of different jobs. When I read in the log file of the job, I get the start time and the end time of the job, like this:
(Wed,03/20/13,01:03:37,EDT)
(Wed,03/20/13,01:05:00,EDT)
Now, to calculate the elapsed time, I need to subtract these 2 timestamps, but since both timestamps are in the same bag, I'm not sure how to compare them. So I'm looking for an idea on how to do this. Thanks!
Is there a unique ID for the job that is in both log lines? Also is there something to indicate which event is start, and which is end?
If so, you could read the dataset twice, once for start events, once for end-events, and join the two together. Then you'll have one record with both events in it.
So:
A = FOREACH logline GENERATE id, type, timestamp;
START_EVENTS = FILTER A BY (type == 'start');
END_EVENTS = FILTER A BY (type == 'end');
JOINED = JOIN START_EVENTS BY id, END_EVENTS BY id;
ELAPSED = FOREACH JOINED GENERATE (END_EVENTS::timestamp - START_EVENTS::timestamp); -- or whatever
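The same split-and-join idea can be sketched outside Pig to show the arithmetic on the two timestamps from the question. This is an illustration in Python, not Pig code; the job id "job1" and the "start"/"end" type labels are assumptions, since the original log lines do not show them.

```python
from datetime import datetime

def parse_stamp(record):
    """Parse a (weekday, date, time, zone) tuple like the ones in the question."""
    _, date_part, time_part, _ = record
    # Both events are in the same zone, so the zone is ignored here.
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%y %H:%M:%S")

# Hypothetical events: (job id, event type, timestamp tuple).
events = [
    ("job1", "start", ("Wed", "03/20/13", "01:03:37", "EDT")),
    ("job1", "end", ("Wed", "03/20/13", "01:05:00", "EDT")),
]

# Split into start and end events, then join them on the job id.
starts = {job: parse_stamp(ts) for job, kind, ts in events if kind == "start"}
ends = {job: parse_stamp(ts) for job, kind, ts in events if kind == "end"}
elapsed = {
    job: (ends[job] - starts[job]).total_seconds()
    for job in starts
    if job in ends
}
print(elapsed)
```

For the two sample timestamps the difference is 83 seconds.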

SOQL - single row per each group

I have the following SOQL query to display List of ABCs in my Page block table.
Public List<ABC__c> getABC(){
    List<ABC__c> ListABC = [Select WB1__c, WB2__c, WB3__c, Number, tentative__c, Actual__c, PrepTime__c, Forecast__c from ABC__c ORDER BY WB3__c];
    return ListABC;
}
As you can see in the above image, WB3 has a number of records for A, B and C. But I want to display only 1 record for each WB3 group based on Actual__c. Only the latest Actual__c must be displayed for each WB3 group.
That is, ideally I want to display only 3 rows (one each for A, B, C) in this example.
For this, I have used GROUP BY and displayed the result using AggregateResult. Here is the result.
I got the latest Actual Date for each WB3 as shown above. But the Tentative Date does not correspond to it. The Tentative Date is also the MAX in the list.
Here is the code I used
public List<SiteMonitoringOverview> getSPM(){
    AggregateResult[] AgR = [Select WBS_3__c, MAX(Tentative_Date__c) dtTentativeDate, MAX(Actual_Date__c) LatestCDate FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
    if(AgR.size()>0){
        for(AggregateResult SalesList : AgR){
            CustSumList.add(new SiteMonitoringOverview(String.valueOf(SalesList.get('WBS_3__c')), String.valueOf(SalesList.get('dtTentativeDate')), String.valueOf(SalesList.get('LatestCDate'))));
        }
    }
    return CustSumList;
}
I am forced to use MAX() for tentative date. I want the corresponding Tentative date of the MAX Actual Date. Not the Max Tentative Date.
For group A, the Tentative Date of Max Actual Date is 12/09/2012. But it is displaying the MAX tentative date: 27/02/2013. It should display 12/09/2012. This is because I am using MAX(Tentative_Date__c) in my code. Every column in the SOQL query must be either GROUPED or AGGREGATED. That's weird.
How do I get the required 3 rows in this example?
Any suggestions? Is there a different approach, such as looping within groups? If so, how?
Just ran into this issue myself. The solution I came up with only works if you want the oldest or newest record from each grouping. Unfortunately it probably won't work in your case. I'll still leave this here in case it happens to help someone searching for a solution to this issue.
AggregateResult[] groupedResults = [Select Max(Id), WBS_3__c FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
Calling MAX or MIN on the Id will let you get 1 record per group condition. You can then query other information. In my case I just needed 1 record from each group and didn't really care which one it was.
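The "looping within groups" idea raised in the question could be sketched as follows: query all rows without aggregation, then keep the row with the latest actual date per group in code. This is a Python illustration rather than Apex; the dictionary keys are hypothetical, and the actual dates are invented sample values (only the two tentative dates for group A come from the question).

```python
from datetime import date

# Hypothetical rows standing in for the un-aggregated query result.
rows = [
    {"wbs3": "A", "tentative": date(2013, 2, 27), "actual": date(2012, 5, 1)},
    {"wbs3": "A", "tentative": date(2012, 9, 12), "actual": date(2012, 11, 30)},
    {"wbs3": "B", "tentative": date(2012, 6, 1), "actual": date(2012, 7, 15)},
]

def latest_per_group(rows):
    """Keep, for each group, the whole row that has the latest actual date."""
    latest = {}
    for row in rows:
        best = latest.get(row["wbs3"])
        if best is None or row["actual"] > best["actual"]:
            latest[row["wbs3"]] = row
    return latest

result = latest_per_group(rows)
for group, row in sorted(result.items()):
    print(group, row["actual"], row["tentative"])
```

Because the whole row is kept, the tentative date shown for group A is the one belonging to its latest actual date, not the maximum tentative date.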

why Mutation does not make inserts for existing columns

I am loading initial data (a URL list for a crawler) into Cassandra with status crawled=0. Then, using Hadoop, I crawl all the links and try to change crawled from 0 to something else, for example 1, 2, or 3. When I check in the Cassandra CLI with get ColumnFamily['www.somedomain.com'], the value of the crawled column remains the same. If I do not set the crawled column during the initial import, it is added correctly. This is only one part of the algorithm, and I need further updates of this column from other Map/Reduce jobs, etc.
The Thrift and Cassandra API documentation says that there are only inserts and deletions, so an insert should work as an update.
The crawled column has the UTF8 type.
The Mutation code looks like this:
private static Mutation getMutationCrawled(Text crawledVal)
{
    Text column = new Text();
    column.set("crawled");
    Column c = new Column();
    c.setName(ByteBuffer.wrap(Arrays.copyOf(column.getBytes(), column.getLength())));
    c.setValue(ByteBuffer.wrap(crawledVal.getBytes()));
    c.setTimestamp(System.currentTimeMillis());
    Mutation m = new Mutation();
    m.setColumn_or_supercolumn(new ColumnOrSuperColumn());
    m.column_or_supercolumn.setColumn(c);
    return m;
}
Cassandra resolves conflicts using the timestamp of the mutation, with the largest timestamp winning. You can set the timestamp to whatever value you want, but the convention is to use microseconds. In the example above, you set the timestamp with:
c.setTimestamp(System.currentTimeMillis());
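Most likely the initial import code sets the timestamp in microseconds. A microsecond timestamp is about 1000 times larger than a millisecond timestamp taken at the same moment, so your updates lose the conflict resolution and are ignored. Writing the update with a microsecond timestamp, for example System.currentTimeMillis() * 1000, should let it win.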
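To make the unit mismatch concrete, here is a small Python sketch of a largest-timestamp-wins comparison. It is a simplified model, not Cassandra's actual reconciliation code, and the resolve function and sample values are assumptions for illustration.

```python
import time

def resolve(existing, incoming):
    """Keep whichever (value, timestamp) pair has the larger timestamp."""
    return incoming if incoming[1] > existing[1] else existing

now_ms = int(time.time() * 1000)

# Initial import: crawled=0, timestamp in microseconds.
initial = ("0", now_ms * 1000)
# Update written a minute later, but with a millisecond timestamp.
update_ms = ("1", now_ms + 60_000)
# The same update with a microsecond timestamp.
update_us = ("1", (now_ms + 60_000) * 1000)

stored_after_ms = resolve(initial, update_ms)
stored_after_us = resolve(initial, update_us)
print(stored_after_ms[0], stored_after_us[0])
```

Even though the millisecond update happens later in wall-clock time, its numeric timestamp is far smaller than the microsecond one, so the original value "0" is kept.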
