hadoop mapreduce - API for getting job log

I'm developing a Hadoop MapReduce application and I need to present the task log to the end user
(the same way Hue does).
Is there a Java API that extracts the logs of a specific job?
I tried the "JobClient" API without any success.

The Job Attempts API of the History Server provides a link to the logs of each attempt.
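A minimal sketch of calling that Job Attempts endpoint, using only the JDK's HTTP client. The history server address (19888 is the default HTTP port) and the job id below are placeholder assumptions; each `jobAttempt` entry in the response JSON carries a `logsLink` field pointing at that attempt's logs.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JobAttemptLogs {

    // Build the Job Attempts API URL for a given history server address and job id.
    static String jobAttemptsUrl(String historyServer, String jobId) {
        return "http://" + historyServer
                + "/ws/v1/history/mapreduce/jobs/" + jobId + "/jobattempts";
    }

    // Fetch the JSON response; each attempt in it carries a "logsLink" field.
    static String fetchJson(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }
}
```

Usage would be `fetchJson(jobAttemptsUrl("historyserver.example.com:19888", "job_1326381300833_2_2"))` with your own cluster's address and job id substituted in.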

Related

How to find hadoop applications run by an oozie (hadoop) job

We know that Oozie first runs a launcher hadoop job, and that job in turn launches other hadoop applications. I want to find the list of those hadoop applications (e.g. application_231232133) run by the oozie (hadoop) job. As far as I can tell, there is currently no such API or command.
If you're using Oozie 5.0 or higher, the application type of those launcher jobs is "Oozie Launcher" rather than "MapReduce", so they are easy to filter out.
You can also use the Oozie REST API http://oozie.apache.org/docs/4.2.0/WebServicesAPI.html#Job_Information which returns an externalId attribute for each action, populated with the hadoop application id.
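A small sketch of the two pieces involved, assuming a stock Oozie REST endpoint: building the job-info URL (whose JSON response has an `actions` array with `externalId` fields), and mapping an external id like `job_231232133_0001` to the corresponding YARN application id, which shares the same cluster-timestamp and sequence number. The Oozie base URL in the usage note is a placeholder.

```java
public class OozieExternalIds {

    // Build the Oozie v1 job-info REST URL; the response JSON contains an
    // "actions" array whose entries each carry an "externalId" field.
    static String jobInfoUrl(String oozieBase, String workflowId) {
        return oozieBase + "/v1/job/" + workflowId + "?show=info";
    }

    // An action's externalId is a MapReduce job id like "job_231232133_0001";
    // the matching YARN application id just swaps the "job_" prefix.
    static String toApplicationId(String externalId) {
        return externalId.replaceFirst("^job_", "application_");
    }
}
```

For example, `jobInfoUrl("http://oozie-host:11000/oozie", "0000001-200101000000001-oozie-W")` gives the URL to GET, and `toApplicationId` turns each returned `externalId` into the application id you can look up on the ResourceManager.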

Get status when running job without hadoop

When I run a hadoop job with the hadoop command-line application it prints a lot of output. Among other things, it shows the relative progress of the job ("map: 30%, reduce: 0%" and so on). But when running a job without the application it does not print anything, not even errors. Is there a way to get that level of logging without the application, i.e. without running [hadoop_folder]/bin/hadoop jar <my_jar> <indexer> <args>...?
You can get this information from the Application Master (assuming you use YARN and not MR1, where you would get it from the Job Tracker). There is usually a web UI where you can find this information; details will depend on your Hadoop installation/distribution.
In the case of Hadoop v1, check the Job Tracker web UI; in the case of Hadoop v2, check the Application Master web UI.
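On YARN, the same progress figure the CLI prints is also exposed programmatically: client-side via `Job.mapProgress()`/`Job.reduceProgress()`, or over HTTP via the ResourceManager's REST API, whose per-application JSON includes a `progress` field. A sketch of the REST route, with the RM address (8088 is the default HTTP port) and application id as placeholder assumptions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppProgress {

    // ResourceManager REST endpoint for a single application;
    // the JSON response includes a numeric "progress" field.
    static String appUrl(String rmAddress, String applicationId) {
        return "http://" + rmAddress + "/ws/v1/cluster/apps/" + applicationId;
    }

    // Pull the "progress" value out of the response JSON.
    static double extractProgress(String json) {
        Matcher m = Pattern.compile("\"progress\"\\s*:\\s*([0-9.]+)").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no progress field in response");
        }
        return Double.parseDouble(m.group(1));
    }
}
```

Polling `appUrl("rm-host:8088", "application_...")` in a loop and feeding the response to `extractProgress` reproduces the "map: 30%"-style progress reporting without the hadoop launcher.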

Hadoop ResourceManager not show any job's record

I installed a Hadoop multi-node cluster based on this link http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/
Then I tried to run the wordcount example in my environment, but when I access the ResourceManager at http://HadoopMaster:8088 to see the job's details, no records show up in the UI.
I also searched for this problem; one answer suggested the same solution as Hadoop is not showing my job in the job tracker even though it is running, but in my case I'm just running hadoop's wordcount example and didn't add any extra configuration for YARN.
Has anyone who has successfully installed a Hadoop 2 multi-node cluster with a working web UI got advice on this issue, or a link to a correct installation guide?
Did you get the output of the wordcount job?
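If the job produced output but never appeared in the ResourceManager, one common cause is that it ran with the local job runner instead of YARN, so the RM never saw it. A minimal fragment for mapred-site.xml, assuming a Hadoop 2 / YARN setup like the guide's, that routes MapReduce jobs through YARN:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN so they
     show up in the ResourceManager UI at http://HadoopMaster:8088 -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```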

Retrieving tasktracker logs for a particular job programmatically

Hi, I am working with the OozieClient API.
I need to retrieve the task tracker logs for a particular workflow job using the OozieClient API. If not with the OozieClient API, any other programmatic way is also fine. As of now, with the OozieClient I am able to get the job log using client.getJobLog(), but I need the task tracker logs and not the job logs. Kindly help.
Try retrieving the YARN application id from Oozie using the OozieClient API.
Once you have this id, you can make a call to the history server using its REST API (or the history server's client library) to fetch the log dir path using the "jobAttempts" API.
Now you can browse this directory using the hadoop client.
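The last step, browsing the log directory, is typically done with Hadoop's Java `FileSystem` client; a lighter-weight alternative sketched below lists the directory over the WebHDFS REST API instead, so it needs no Hadoop jars. The NameNode address (50070 is the classic Hadoop 2 HTTP port) and the log dir path are placeholder assumptions; the real path comes from the "jobAttempts" response.

```java
public class LogDirBrowser {

    // WebHDFS LISTSTATUS call for the log directory returned by the history
    // server's "jobAttempts" API; logDir must be an absolute HDFS path.
    static String listStatusUrl(String namenode, String logDir) {
        return "http://" + namenode + "/webhdfs/v1" + logDir + "?op=LISTSTATUS";
    }
}
```

GETting `listStatusUrl("namenode:50070", "/tmp/logs/someuser/logs/application_...")` returns a JSON `FileStatuses` listing of the per-container log files, which can then be fetched individually with `op=OPEN`.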

How to access Hive log information

I am trying to analyze the performance of Hive queries. I am able to issue Hive queries from Java, but I still need to access the log information generated after each query. Instead of the hack of reading the latest log file on disk and extracting the numbers with a regex, I am looking for a more graceful method, if one is already available.
Any pointers will be helpful. Thanks in advance.
Query execution details like Status, Finished at, and Finished in are displayed in the Job Tracker, and you can access the Job Tracker programmatically. Related info at this link:
How could I programmatically get all the job tracker and tasktracker information that is displayed by Hadoop in the web interface?
Once Hive starts running a query, a corresponding map-reduce job starts. The logs of this hadoop job can be found on the tasktracker on which each task runs.
Use the JobClient API to retrieve these logs programmatically.
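With the MR1 JobClient API, `RunningJob.getTaskCompletionEvents()` returns events whose `getTaskTrackerHttp()` gives each task's tasktracker HTTP address, and the per-attempt log lives under that tasktracker's tasklog servlet. A sketch of just the URL construction (the tasktracker address and attempt id below are hypothetical values standing in for what the completion events return):

```java
public class TaskLogUrl {

    // The classic MR1 tasktracker tasklog servlet. taskTrackerHttp is what
    // TaskCompletionEvent.getTaskTrackerHttp() returns and already includes
    // the scheme and port (e.g. "http://tt1:50060"); all=true requests
    // stdout, stderr and syslog together.
    static String tasklogUrl(String taskTrackerHttp, String attemptId) {
        return taskTrackerHttp + "/tasklog?attemptid=" + attemptId + "&all=true";
    }
}
```

Fetching that URL for each attempt of the Hive-generated job collects the task logs that hold the per-task execution details.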
