What happens if a Hadoop map-reduce job or an HDFS command is running and the Kerberos ticket expires in the meantime? - hadoop

Let's say we have a Kerberos-secured Hadoop cluster. (For example, the KDC is configured to issue Kerberos tickets with a validity of 2 hours and a renewal period of 3 hours after that, so 5 hours in total.)
The cluster also includes tools like Hive and Pig.
With respect to this scenario, can anyone help me understand the behavior in each of the cases below? (Note: "job" means a map-reduce job, submitted through Hive or Pig.)
Let's say the job takes 4 hours to run. I trigger the job; for the next two hours the user who triggered it has a valid ticket and the job runs fine. After two hours the ticket is no longer valid, but it still has renewal period left. Will the job renew it? Will it stop executing? Is there any property to enable or disable such automatic renewal behavior?
Let's say the job takes 10 hours to run. I trigger the job; for the next two hours the user who triggered it has a valid ticket. This time, after two hours there is no renewal period left. Will the job continue? Will it fail with an authentication error? Will it request a new ticket?
Let's say the job takes 10 hours to run and the KDC is configured to issue 2-hour tickets. We set up a cron job which runs every 6 hours and gets a new ticket (so technically the user never runs out of a valid Kerberos ticket). Will the newly generated ticket cause any problem for the job that is already running on the old ticket?
I want to understand the behavior of these jobs as well as how Kerberos interacts with tools like Hive and Pig.
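(For context, broadly speaking a submitted map-reduce job does not authenticate with your Kerberos TGT at all: at submission time it obtains Hadoop delegation tokens, which YARN renews for the running application up to the tokens' maximum lifetime. The client process that submits the job, on the other hand, does need valid Kerberos credentials, and a long-running client usually keeps them by logging in from a keytab rather than relying on a kinit cron job. A minimal sketch using Hadoop's UserGroupInformation API; the principal and keytab path below are placeholders:)

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from a keytab instead of the kinit ticket cache
        // (principal and keytab path are placeholders).
        UserGroupInformation.loginUserFromKeytab(
                "etluser@EXAMPLE.COM", "/etc/security/keytabs/etluser.keytab");

        // In a long-running process, call this periodically (e.g. before each
        // HDFS or job-submission call); it re-logs in from the keytab if the
        // TGT is close to expiring, so the process never runs out of tickets.
        UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
    }
}
```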

Related

How to run a Spring Boot cron job at different times when the same code is deployed on 2 different servers that are started and running at the same time?

We are going to deploy the same code on two different servers which will be started and running at the same time. The problem is that both servers will run the cron job at the same time, so the work will be processed twice. I want the cron job on each server to start at a different initial time, so that one server can read some rows from the DB, finish its task and change the status of the rows it has processed, and the other server then reads only new entries.
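(A minimal sketch of the row-claiming idea described above, independent of when each server's cron fires: each server atomically flips a status column when it picks up a row, so only one server ever processes a given row even if both jobs run at the same moment. The table and column names -- jobs, id, status, owner -- are made up for illustration.)

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RowClaimer {

    /**
     * Tries to claim one row for this server. The UPDATE only succeeds while the
     * row is still in status 'NEW', so two servers can never claim the same row.
     *
     * @return true if this server won the claim and should process the row
     */
    public boolean claim(Connection conn, long rowId, String serverName) throws SQLException {
        String sql = "UPDATE jobs SET status = 'PROCESSING', owner = ? "
                   + "WHERE id = ? AND status = 'NEW'";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, serverName);
            ps.setLong(2, rowId);
            return ps.executeUpdate() == 1; // 0 means the other server claimed it first
        }
    }
}
```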

Scheduling the JDBC Consumer job in StreamSets

I need to schedule the JDBC Consumer job to run every morning at 5 am. As far as I know, I can make the job run at 5 am if I start it at 5 am and put 24 hours in the query interval.
But I need to schedule the first run to start at 5 am without starting it manually (I'm too lazy to wake up at 5 am :P). Is there a way to achieve this?
(Copying my answer from Ask StreamSets)
There is no built-in scheduler in SDC, but you could use cron and the StreamSets CLI to start the pipeline.

Spark Launcher jobs not starting because the token can't be found in cache after 24 hours

I have a Java application which runs continuously and checks a table in a database for new records. When a new record is added to the table, the Java application unzips a file and puts it into an HDFS location, and then a Spark job gets triggered (I am programmatically triggering the Spark job using the SparkLauncher class inside the Java application), which processes the newly added file in the HDFS location.
I have scheduled the Java application on the cluster using an Oozie Java action.
The cluster is a Kerberized HDP cluster.
The job works perfectly fine for 24 hours: the unzipping happens and the Spark job runs.
But after 24 hours the unzipping still happens in the Java application, while the Spark job is no longer triggered in the Resource Manager.
Exception : Exception encountered while connecting to the server :INFO: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (owner=****, renewer=oozie mr token, realUser=oozie, issueDate=1498798762481, maxDate=1499403562481, sequenceNumber=36550, masterKeyId=619) can't be found in cache
As per my understanding, after 24 hours Oozie renews the token, and that token is not getting updated for the SparkLauncher job. The SparkLauncher is still looking for the older token, which is no longer available in the cache.
Please help me understand how I can make SparkLauncher pick up the new token.
"As per my understanding, after 24 hours Oozie renews the token"
Why? Can you point to any documentation, source code, blog?
Remember that Oozie is a scheduler for batch jobs, and its canonical use case (at Yahoo!) is for triggering hourly jobs.
Only a pathological batch job would run for more than 24h; therefore, renewal of the Hadoop delegation token is not really useful in Oozie.
But your Java thing acts as a service, running continuously, and needing automatic restart if it ever crashes. So you should consider...
either Slider, if you really want to run it inside YARN (although there are many, many drawbacks -- how do you inspect the logs of a running YARN job? how can you make sure that the app starts on time and is not delayed by a lack of resources? how can you make sure that your app will not be killed because YARN needs resources for a high-priority job?), but it is probably overkill for simply running your toy app
or a plain Linux service running on some Edge Node -- it's a Do-It-Yourself task, but not extremely complicated, and there are tutorials on the web
If you insist on using Oozie, in spite of all the limitations of both YARN and Oozie, then you have to change the way your app runs -- for instance: schedule the Coordinator to launch a job every 12h, pass the "nominal time" as a Workflow property, edit the Workflow to pass that time to the Java app, and edit the Java code so that the app exits at (arg + 11:58) and clears the way for the next exec.
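(As a side note, independent of where the launcher itself runs: a Spark job on a Kerberized YARN cluster that must outlive the submitter's credentials is commonly given its own principal and keytab, so that Spark logs in and renews its own delegation tokens instead of depending on whatever token was current at launch time. A rough sketch with SparkLauncher; the jar, main class, principal and keytab path are placeholders, and on recent Spark versions the properties are spark.kerberos.principal / spark.kerberos.keytab instead of the spark.yarn.* names used here.)

```java
import java.io.IOException;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchWithKeytab {
    public static void main(String[] args) throws IOException {
        SparkAppHandle handle = new SparkLauncher()
                .setMaster("yarn")
                .setDeployMode("cluster")
                .setAppResource("/apps/myapp/my-spark-job.jar")   // placeholder jar
                .setMainClass("com.example.MySparkJob")           // placeholder class
                // Let Spark obtain and renew its own tokens from this keytab
                // (placeholder principal and keytab path).
                .setConf("spark.yarn.principal", "etluser@EXAMPLE.COM")
                .setConf("spark.yarn.keytab", "/etc/security/keytabs/etluser.keytab")
                .startApplication();

        System.out.println("Submitted, current state: " + handle.getState());
    }
}
```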

Oozie not cleaning up old jobs from Oozie database

I have the following properties set in my oozie-site.xml (using the safety valve in Cloudera Manager):
oozie.services.ext - org.apache.oozie.service.PurgeService
oozie.service.PurgeService.older.than - 15
oozie.service.PurgeService.coord.older.than - 7
oozie.service.PurgeService.bundle.older.than - 7
oozie.service.PurgeService.purge.interval - 60
However, I still see some old jobs, KILLED or completed, that are as old as September 2014.
To give an example,
I have a coordinator which is currently in RUNNING state. When I use the Oozie Web Console to list the instances of that coordinator (i.e. click on the Coordinator tab and then on my coordinator), the pop-up shows materialised workflow jobs (coordinator actions) as old as September 2014.
I assume the property responsible for cleaning this up is oozie.service.PurgeService.older.than which I have set to 15 days.
So what am I missing here?
The problem occurs for long-running coordinator jobs with a high frequency: the child workflows are never purged because the coordinator job is still running.
The solution is (quoting from the external link below):
What you can do as a workaround is split up your long-running Coordinators. For example, instead of making your Coordinator run for years (forever?), make it run for, say, 6 months, and have an identical Coordinator scheduled to start exactly when that one ends. This will allow Oozie to clean up the old child Workflows from that Coordinator every 6 months. Otherwise, you can schedule a cron job to manually delete old jobs from the database. However, please be careful about this: when deleting a workflow job from the WF_JOBS table, you'll also need to delete the workflow actions from the WF_ACTIONS table that belong to it, as well as the coordinator action from the COORD_ACTIONS table that it belongs to. If you miss something, it will likely cause problems.
References:
https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Oozie-not-cleaning-up-old-jobs-from-Oozie-database/m-p/30692#U30692
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/zkWa2kDMyyo
http://qnalist.com/questions/5404909/oozie-purging
JIRA Link:
https://issues.apache.org/jira/browse/OOZIE-1532

Oozie Hive action on AWS - unpredictable IP sources break the job

I've been having a few days of unalloyed torture getting Hive jobs to run via Oozie on a 5-machine AWS cluster. The simplest job that involves the live metastore succeeds or fails unpredictably. The error messages are pretty unhelpful:
Hive failed, error message[Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [1]]
Thanks Oozie!
After a lot of fun changing just about every imaginable setting, I studied hivemetastore.log carefully (we have MySQL as the metastore) and realised that every successful request came from 172.31.40.3. Unsuccessful requests came from 172.31.40.2, 172.31.40.4 and 172.31.40.5. The Hive console app makes requests without problems from 172.31.40.1.
This is getting somewhere after nearly a week of having no idea whatsoever what is going on. The question now is: what do I need to change to let requests from all of 172.31.40.1-5 in? Or, alternatively, to funnel Oozie requests solely through 172.31.40.1 or 172.31.40.3?
Why would only 172.31.40.1 and 172.31.40.3 work?
all ideas and suggestions warmly received.
many thanks
Toby
This was so simple in the end - the Oozie client was only installed on 2 of the 5 machines in the cluster, corresponding, of course, to the 2 IP addresses that could make successful requests to the Hive metastore.
Once we installed the Oozie client onto all the machines in the cluster, all the jobs were automatically accepted and ran OK.
obvious when you know the answer ...
