I need to process a dataset in Pig that becomes available once per day at midnight. Therefore, I have an Oozie coordinator that takes care of the scheduling and spawns a workflow every day at 00:00.
The file names follow the URI scheme
hdfs://${dataRoot}/input/raw${YEAR}${MONTH}${DAY}${HOUR}.avro
where ${HOUR} is always '00'.
Each entry in the dataset contains a UNIX timestamp, and I want to filter out those entries which have a timestamp before 11:45pm (23:45). As I need to run on datasets from the past, the value of the timestamp defining the threshold needs to be set dynamically according to the day currently processed. For example, processing the dataset from December 12th, 2013 needs the threshold 1418337900. For this reason, setting the threshold must be done by the coordinator.
To the best of my knowledge, there is no way to transform a formatted date into a UNIX timestamp in EL. I came up with a quite hacky solution:
The coordinator passes date and time of the threshold to the respective workflow which starts the parameterized instance of the Pig script.
Excerpt of the coordinator.xml:
<property>
<name>threshold</name>
<value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -15, 'MINUTE'), 'yyyyMMddHHmm')}</value>
</property>
Excerpt of the workflow.xml:
<action name="foo">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<script>${applicationPath}/flights.pig</script>
<param>jobInput=${jobInput}</param>
<param>jobOutput=${jobOutput}</param>
<param>threshold=${threshold}</param>
</pig>
<ok to="end"/>
<error to="error"/>
</action>
The Pig script needs to convert this formatted datetime into a UNIX timestamp. Therefore, I have written a UDF:
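package mystuff.pig;

import java.io.IOException;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UnixTime extends EvalFunc<Long> {
    // Threshold in seconds since the epoch, computed once in the constructor.
    private long myTimestamp = 0L;

    // Note: SimpleDateFormat parses in the JVM's default time zone.
    private static long convertDateTime(String dt, String format)
            throws IOException {
        DateFormat formatter = new SimpleDateFormat(format);
        Date date;
        try {
            date = formatter.parse(dt);
        } catch (ParseException ex) {
            throw new IOException("Illegal Date: " + dt + " format: " + format);
        }
        return date.getTime() / 1000L;
    }

    public UnixTime(String dt, String format) throws IOException {
        myTimestamp = convertDateTime(dt, format);
    }

    @Override
    public Long exec(Tuple input) throws IOException {
        // The input tuple is ignored; the UDF acts as a constant.
        return myTimestamp;
    }
}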
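For comparison, the conversion itself can be sketched with java.time, which makes the time zone explicit instead of relying on the JVM default. This is only a minimal sketch: the class name ThresholdConverter and the zone parameter are assumptions for illustration, not part of the UDF above.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original UDF.
final class ThresholdConverter {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Interprets a 'yyyyMMddHHmm' string as wall-clock time in the given zone
    // and returns the number of seconds since the Unix epoch.
    static long toEpochSeconds(String formatted, ZoneId zone) {
        return LocalDateTime.parse(formatted, FORMAT)
                .atZone(zone)
                .toEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(toEpochSeconds("201412112245", ZoneId.of("UTC")));
    }
}
```

Passing the zone explicitly matters when the cluster nodes and the machine that scheduled the coordinator do not share the same default time zone.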
In the Pig script, a macro is created, initializing the UDF with the input of the coordinator/workflow. Then, you can filter the timestamps.
DEFINE THRESH mystuff.pig.UnixTime('$threshold', 'yyyyMMddHHmm');
d = LOAD '$jobInput' USING PigStorage(',') AS (time: long, value: chararray);
f = FILTER d BY time <= THRESH();
...
The problem that I have leads me to the more general question of whether it is possible to transform an input parameter in Pig and use it again as some kind of constant.
Is there a better way to solve this problem or is my approach needlessly complicated?
Edit: TL;DR
After more searching I found someone with the same problem:
http://grokbase.com/t/pig/user/125gszzxnx/survey-where-are-all-the-udfs-and-macros
Thanks Gaurav for recommending the UDFs in piggybank.
It seems that there is no performant solution without using declare and a shell script.
You can put the Pig script into a Python script and pass the value.
#!/usr/bin/python
import sys
import time
from org.apache.pig.scripting import Pig
P = Pig.compile("""d = LOAD '$jobInput' USING PigStorage(',') AS (time: long, value: chararray);
f = FILTER d BY time <= $thresh;
""")
jobinput = {whatever you defined}
thresh = {whatever you defined in the UDF}
# The keys must match the $-parameters used in the script above.
Q = P.bind({'thresh':thresh,'jobInput':jobinput})
results = Q.runSingle()
# isSuccessful() returns a boolean, not a status string.
if not results.isSuccessful():
    raise RuntimeError("Pig job failed")
Actually, I'm trying to shift the time: the application I'm working on is in UTC and I'm working in IST.
I've used both a BeanShell PreProcessor and the __timeShift function:
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
int AddSeconds= 00; //this variable needs to be customized as per your need
int AddMinutes= 392; //this variable needs to be customized as per your need
int AddHours= 00; //this variable needs to be customized as per your need
Date now = new Date();
Calendar c = Calendar.getInstance();
c.setTime(now);
c.add(Calendar.SECOND, AddSeconds);
c.add(Calendar.MINUTE, AddMinutes);
c.add(Calendar.HOUR, AddHours);
Date NewTime = c.getTime();
SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
String mytime = df.format(NewTime);
vars.put("NewTime",mytime);
Shift Time :
"${__timeShift(yyyy-MM-dd'T'HH:mm:ss.SSS'Z',,PT393M,,)}"
But when I run the HTTP Request in JMeter, the time comes out in 12-hour format instead of 24-hour.
The time shift also behaves strangely. I've tried all the options from Stack Overflow over the last 2 days and have been unable to convert IST to UTC.
This is what I'm using in the JMeter POST body:
[screenshot not included]
And this is what I'm getting as a result:
[screenshot not included]
The time formats are totally mismatched here. Can someone please help me convert IST to UTC correctly with these time formats and functions?
I don't think you can use the __timeShift() function to get the date in a different time zone; it will return the date in the current (default) one.
So if you need to add 392 minutes to the current time in a time zone different from yours, you will have to go for the __groovy() function and use the TimeCategory class.
Example code:
${__groovy(def now = new Date(); use(groovy.time.TimeCategory) { def nowPlusOneYear = now + 392.minute; return nowPlusOneYear.format("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"\,TimeZone.getTimeZone('IST')) },)}
If you need to get the time in UTC just change IST to UTC
${__groovy(def now = new Date(); use(groovy.time.TimeCategory) { def nowPlusOneYear = now + 392.minute; return nowPlusOneYear.format("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"\,TimeZone.getTimeZone('UTC')) },)}
More information: Creating and Testing Dates in JMeter - Learn How
Also be informed that since JMeter 3.1 you're supposed to be using JSR223 Test Elements and Groovy language for scripting
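As a rough illustration of the underlying idea, converting between IST and UTC is a change of time zone rather than an addition of minutes. The sketch below uses plain Java with java.time; the class name IstToUtc and the input format without a zone suffix are assumptions made for this example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only.
final class IstToUtc {
    // Input: wall-clock time in IST without a zone suffix (24-hour "HH").
    private static final DateTimeFormatter LOCAL =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");
    // Output: the same instant rendered in UTC with a literal 'Z'.
    private static final DateTimeFormatter UTC_OUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    static String convert(String istLocal) {
        return LocalDateTime.parse(istLocal, LOCAL)
                .atZone(ZoneId.of("Asia/Kolkata"))     // attach the IST zone
                .withZoneSameInstant(ZoneOffset.UTC)   // same instant, UTC clock
                .format(UTC_OUT);
    }
}
```

Because the pattern uses HH rather than hh, the output stays in 24-hour format.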
Just adding one more way to do this via a BeanShell PreProcessor; it worked pretty well for me.
Here is the code to use. Note that it applies -325 minutes, while the exact difference between IST (UTC+5:30) and UTC is 330 minutes.
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
int AddSeconds= 00; //this variable needs to be customized as per your need
int AddMinutes= -325; //this variable needs to be customized as per your need
int AddHours= 00; //this variable needs to be customized as per your need
Date now = new Date();
Calendar c = Calendar.getInstance();
c.setTime(now);
c.add(Calendar.SECOND, AddSeconds);
c.add(Calendar.MINUTE, AddMinutes);
c.add(Calendar.HOUR, AddHours);
Date NewTime = c.getTime();
SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
String mytime = df.format(NewTime);
vars.put("NewTime",mytime);
// NewTime is your jmeter variable
I've read through the available Q&A on SO, but nothing I have found answers my question of how to format my time in 12-hour format.
Following is my code, which runs a query on a MySQL database and checks whether an appointment is within 15 minutes of login so that an alert can pop up.
public void apptCheck(int userId) throws SQLException {
// this method checks for an appointment occurring within 15 minutes of login
Statement apptStatement = DBQuery.getStatement();
String apptQuery = "Select apt.start, cs.customerName from DBName.appointment apt "
+ "JOIN DBName.customer cs ON cs.customerId = apt.customerId WHERE "
+ "userId = " + userId + " AND start >= NOW() AND start < NOW() + interval 16 minute";
apptStatement.execute(apptQuery);
ResultSet apptRs = apptStatement.getResultSet();
while(apptRs.next()) {
Timestamp apptTime = apptRs.getTimestamp("start");
ResourceBundle languageRB = ResourceBundle.getBundle("wgucms/RB", Locale.getDefault());
Alert apptCheck = new Alert(AlertType.INFORMATION);
apptCheck.setHeaderText(null);
apptCheck.setContentText(languageRB.getString("apptSoon") + " " + apptTime.toInstant().atZone(ZoneId.systemDefault()));
apptCheck.showAndWait();
}
}
My result is:
I want the time to display 3:00, not the 19:00 - 06:00. How can I make that happen?
ZonedDateTime zonedDateTime=ZonedDateTime.of(apptTime.toLocalDateTime(),ZoneId.systemDefault());
You can use ZonedDateTime and format the time as you want.
ZonedDateTime has a lot of features, all listed at docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html; you can get the hour, minute, day, etc.
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("hh:mm:ss");
String formattedString = zonedDateTime.format(formatter);
If you only want the time in 12-hour format, you can use the formatter above.
I found the solution which will perform the UTC to local time conversion and then format the time so that the resulting alert is in 12 hour time format without the date or time zone info. Here is the full code:
while(apptRs.next()) {
Timestamp apptTime = apptRs.getTimestamp("start");
// perform time conversion from UTC to User Local Time
ZoneId zidApptTime = ZoneId.systemDefault();
ZonedDateTime newZDTApptTime = apptTime.toLocalDateTime().atZone(ZoneId.of("UTC"));
ZonedDateTime convertedApptTime = newZDTApptTime.withZoneSameInstant(zidApptTime);
ResourceBundle languageRB = ResourceBundle.getBundle("wgucms/RB", Locale.getDefault());
Alert apptCheck = new Alert(AlertType.INFORMATION);
apptCheck.setHeaderText(null);
// set the Alert text and format in 12 hour format
apptCheck.setContentText(languageRB.getString("apptSoon") +
convertedApptTime.toInstant().atZone(ZoneId.systemDefault())
.format(DateTimeFormatter.ofPattern("h:mm a")) + ".");
apptCheck.showAndWait();
}
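The conversion and formatting steps can also be pulled out into a small helper that is easy to check without a database or a JavaFX alert. This is a sketch under the same assumption as the code above, namely that the stored value is in UTC; the class name ApptTimeFormatter and the explicit Locale.US are illustrative choices.

```java
import java.sql.Timestamp;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class ApptTimeFormatter {
    // "h" is the 12-hour clock hour, "a" the AM/PM marker.
    private static final DateTimeFormatter TWELVE_HOUR =
            DateTimeFormatter.ofPattern("h:mm a", Locale.US);

    // Treats the timestamp's date-time fields as UTC, moves the same instant
    // to the user's zone, and formats only the time of day.
    static String toLocal12Hour(Timestamp utcTimestamp, ZoneId userZone) {
        return utcTimestamp.toLocalDateTime()
                .atZone(ZoneOffset.UTC)
                .withZoneSameInstant(userZone)
                .format(TWELVE_HOUR);
    }
}
```

In the alert code, ZoneId.systemDefault() would be passed as the user zone.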
I have a time format like this:
string s = DateTime.Now.ToString();
which gives me output like
11/29/2013 6:26:13PM
Now, how can I convert this output into milliseconds on Windows Phone?
Updated:
First, I want to save the current time when the user launches my app. Whenever the user launches the app again, I get the time again, compare the current launch time with the previously stored one, and check whether the difference has reached one day.
For this comparison I need to convert 11/29/2013 6:26:13PM into milliseconds.
Another question: how can I convert only "6:26:13PM" into milliseconds?
If I understood correctly just do this:
Create a date from your input:
DateTime yourInitialDateTime = DateTime.Parse("11/29/2013 6:26:13PM");
After that
TimeSpan span = DateTime.Now - yourInitialDateTime;
So in span.TotalDays you will have how many days have passed.
Edit
If you have only the time of day and want to know the millisecond of that time you must add a date and subtract it with hour 0:00:00 like this:
string dummyDate = "01/01/0001";
DateTime end = DateTime.Parse(dummyDate + " " + "6:26:13PM");
var milli = end.Subtract(new DateTime()).TotalMilliseconds;
That is it.
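The same two ideas (checking whether a day has elapsed between launches, and turning a time of day into milliseconds) can be sketched in Java with java.time. This is only an illustration of the logic, not Windows Phone code; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class LaunchTimes {
    // Accepts strings such as "6:26:13PM" (AM/PM marker in any case).
    private static final DateTimeFormatter TIME_OF_DAY = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("h:mm:ssa")
            .toFormatter(Locale.US);

    // True when at least 24 hours lie between the two launch times,
    // both given as milliseconds since the epoch.
    static boolean hasDayPassed(long previousLaunchMillis, long currentLaunchMillis) {
        Duration elapsed = Duration.ofMillis(currentLaunchMillis - previousLaunchMillis);
        return elapsed.compareTo(Duration.ofDays(1)) >= 0;
    }

    // Milliseconds since midnight for a time-of-day string.
    static long millisOfDay(String timeOfDay) {
        return LocalTime.parse(timeOfDay, TIME_OF_DAY).getLong(ChronoField.MILLI_OF_DAY);
    }
}
```

Storing the launch time as epoch milliseconds avoids having to parse a formatted date string back on the next launch.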
Try this.
var ThatDay = DateTime.Now.AddDays(-1); //This is hard coded but you have to get from where you are storing.
var Today = DateTime.Now;
var Diff = (Today - ThatDay).TotalMilliseconds; // .Milliseconds would return only the 0-999 millisecond component
var FriendlyDiff = (Today - ThatDay).ToFriendlyDisplay(5);
public static class TimeSpanExtensions
{
private enum TimeSpanElement
{
Millisecond,
Second,
Minute,
Hour,
Day
}
public static string ToFriendlyDisplay(this TimeSpan timeSpan, int maxNrOfElements)
{
maxNrOfElements = Math.Max(Math.Min(maxNrOfElements, 5), 1);
var parts = new[]
{
Tuple.Create(TimeSpanElement.Day, timeSpan.Days),
Tuple.Create(TimeSpanElement.Hour, timeSpan.Hours),
Tuple.Create(TimeSpanElement.Minute, timeSpan.Minutes),
Tuple.Create(TimeSpanElement.Second, timeSpan.Seconds),
Tuple.Create(TimeSpanElement.Millisecond, timeSpan.Milliseconds)
}
.SkipWhile(i => i.Item2 <= 0)
.Take(maxNrOfElements);
return string.Join(", ", parts.Select(p => string.Format("{0} {1}{2}", p.Item2, p.Item1, p.Item2 > 1 ? "s" : string.Empty)));
}
}
I need help. I have an ArrayList of objects. Each object contains multiple fields; for this question I'm interested in two date fields (date_panne and date_mise_en_marche) and two time fields (heure_panne and heure_mise_en_marche).
I would like to obtain the sum of the differences between (date_panne, heure_panne) and (date_mise_en_marche, heure_mise_en_marche) to get the total failure time.
If someone can help me, I would be grateful. This is my function:
public String disponibile() throws Exception {
int nbreArrets = 0;
List<Intervention> allInterventions = interventionDAO.fetchAllIntervention();
List<Intervention> listInterventions = new ArrayList<Intervention>();
for (Intervention currentIntervention : allInterventions) {
if (currentIntervention.getId_machine() == this.intervention.getId_machine()
&& currentIntervention.getDate_panne().compareTo(getProductionStartDate()) >= 0
&& currentIntervention.getDate_panne().compareTo(getProductionEndDate()) <= 0) {
listInterventions.add(currentIntervention);
}
}
savedInterventionList = listInterventions;
return "successView" ;
}
Assuming the dates are truncated to the day and are of type java.util.Date, and that the times only contain hours, minutes, seconds and milliseconds and are also of type Date, start by creating a method like
private Date combine(Date dateOnly, Date timeOnly) {
Calendar dateCalendar = Calendar.getInstance();
dateCalendar.setTime(dateOnly);
Calendar timeCalendar = Calendar.getInstance();
timeCalendar.setTime(timeOnly);
dateCalendar.add(Calendar.HOUR_OF_DAY, timeCalendar.get(Calendar.HOUR_OF_DAY));
dateCalendar.add(Calendar.MINUTE, timeCalendar.get(Calendar.MINUTE));
dateCalendar.add(Calendar.SECOND, timeCalendar.get(Calendar.SECOND));
dateCalendar.add(Calendar.MILLISECOND, timeCalendar.get(Calendar.MILLISECOND));
return dateCalendar.getTime();
}
Now, it's simply a matter of looping through the interventions you want to sum, computing the difference between the dates in milliseconds, and adding them up:
long totalMillis = 0L;
for (Intervention intervention : interventions) {
Date marche = combine(intervention.getDateMiseEnMarche(), intervention.getTimeMiseEnMarche());
Date panne = combine(intervention.getDatePanne(), intervention.getTimePanne());
long differenceInMillis = marche.getTime() - panne.getTime();
totalMillis += differenceInMillis;
}
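If the fields can be represented as java.time types instead of java.util.Date, the same summation might look like the sketch below. The Outage class is a hypothetical stand-in for the Intervention entity, and LocalDateTime arithmetic ignores daylight-saving shifts, which is an assumption that may or may not suit the data.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.List;

// Hypothetical sketch; Outage stands in for the Intervention entity.
final class DowntimeCalculator {
    static final class Outage {
        final LocalDateTime failure;  // date_panne + heure_panne
        final LocalDateTime restart;  // date_mise_en_marche + heure_mise_en_marche

        Outage(LocalDate failureDate, LocalTime failureTime,
               LocalDate restartDate, LocalTime restartTime) {
            this.failure = LocalDateTime.of(failureDate, failureTime);
            this.restart = LocalDateTime.of(restartDate, restartTime);
        }
    }

    // Sums the time between failure and restart over all outages.
    static Duration totalDowntime(List<Outage> outages) {
        Duration total = Duration.ZERO;
        for (Outage outage : outages) {
            total = total.plus(Duration.between(outage.failure, outage.restart));
        }
        return total;
    }
}
```

The resulting Duration can be turned into hours or minutes with toHours() or toMinutes() for display.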
I am trying to perform an incremental backup. I have already checked the Export option but couldn't figure out the start time option. Please also advise on CopyTable: how can I restore with it?
Using CopyTable, you just get a copy of the given table on the same or another cluster (CopyTable is actually a MapReduce job). No miracle.
It's your own decision how to restore. Obvious options are:
Use the same tool to copy the table back.
Just get / put selected rows (which is what I think you need here). Please note that you should keep the timestamps while putting data back.
Actually, for incremental backup it's enough to write a job which scans the table and gets/puts rows with the given timestamps into a table whose name is calculated from the date. Restore should work in the reverse direction: read the table with the calculated name and put its records back with the same timestamps.
I'd also recommend the following technique: table snapshots (CDH 4.2.1 uses HBase 0.94.2). It doesn't look applicable to incremental backup, but maybe you'll find something useful there, like an additional API. From the backup point of view it looks nice.
Hope this will help somehow.
The source code suggests
int versions = args.length > 2? Integer.parseInt(args[2]): 1;
long startTime = args.length > 3? Long.parseLong(args[3]): 0L;
long endTime = args.length > 4? Long.parseLong(args[4]): Long.MAX_VALUE;
The accepted answer doesn't pass version as a parameter. How did it work then?
hbase org.apache.hadoop.hbase.mapreduce.Export test /bkp_destination/test 1369060183200 1369063567260023219
From source code this boils down to -
1369060183200 - args[2] - version
1369063567260023219 - args[3] - starttime
Attaching source for ref:
private static Scan getConfiguredScanForJob(Configuration conf, String[] args) throws IOException {
Scan s = new Scan();
// Optional arguments.
// Set Scan Versions
int versions = args.length > 2? Integer.parseInt(args[2]): 1;
s.setMaxVersions(versions);
// Set Scan Range
long startTime = args.length > 3? Long.parseLong(args[3]): 0L;
long endTime = args.length > 4? Long.parseLong(args[4]): Long.MAX_VALUE;
s.setTimeRange(startTime, endTime);
// Set cache blocks
s.setCacheBlocks(false);
// set Start and Stop row
if (conf.get(TableInputFormat.SCAN_ROW_START) != null) {
s.setStartRow(Bytes.toBytesBinary(conf.get(TableInputFormat.SCAN_ROW_START)));
}
if (conf.get(TableInputFormat.SCAN_ROW_STOP) != null) {
s.setStopRow(Bytes.toBytesBinary(conf.get(TableInputFormat.SCAN_ROW_STOP)));
}
// Set Scan Column Family
boolean raw = Boolean.parseBoolean(conf.get(RAW_SCAN));
if (raw) {
s.setRaw(raw);
}
if (conf.get(TableInputFormat.SCAN_COLUMN_FAMILY) != null) {
s.addFamily(Bytes.toBytes(conf.get(TableInputFormat.SCAN_COLUMN_FAMILY)));
}
// Set RowFilter or Prefix Filter if applicable.
Filter exportFilter = getExportFilter(args);
if (exportFilter!= null) {
LOG.info("Setting Scan Filter for Export.");
s.setFilter(exportFilter);
}
int batching = conf.getInt(EXPORT_BATCHING, -1);
if (batching != -1){
try {
s.setBatch(batching);
} catch (IncompatibleFilterException e) {
LOG.error("Batching could not be set", e);
}
}
LOG.info("versions=" + versions + ", starttime=" + startTime +
", endtime=" + endTime + ", keepDeletedCells=" + raw);
return s;
}
Found out the issue here; the HBase documentation says
hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
so after trying a few combinations, I found out that it translates to a real example like the command below
hbase org.apache.hadoop.hbase.mapreduce.Export test /bkp_destination/test 1369060183200 1369063567260023219
where
test is tablename,
/bkp_destination/test is backup destination folder,
1369060183200 is starttime,
1369063567260023219 is endtime
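For a daily incremental backup, the start and end times are typically the epoch-millisecond bounds of one day. The sketch below builds the argument list in the order given by the documented usage line quoted above (versions before starttime); the class name ExportWindow is hypothetical, and treating the end time as exclusive is an assumption that should be checked against the HBase version in use.

```java
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper; it only builds arguments and does not call HBase.
final class ExportWindow {
    // Returns { tablename, outputdir, versions, starttime, endtime } with the
    // times as epoch milliseconds covering the given day in the given zone.
    static String[] exportArgs(String table, String outputDir, int versions,
                               LocalDate day, ZoneId zone) {
        long start = day.atStartOfDay(zone).toInstant().toEpochMilli();
        long end = day.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        return new String[] {
            table, outputDir,
            Integer.toString(versions),
            Long.toString(start),
            Long.toString(end)
        };
    }
}
```

The resulting values could be appended to the hbase org.apache.hadoop.hbase.mapreduce.Export command line shown above.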