Log is not getting printed in Flume server console - client-server

I have a standalone Java application which uses log4j2 to print the logger statements.
I have configured a Flume client and a Flume server. The Flume server is working fine. When I
run the Java application, the connection is created on the server, but the logger statements
are not getting printed.
Here are the log statements in the Flume server console:
INFO [lifecycleSupervisor-1-5] (org.apache.flume.source.AvroSource.start:168) - Avro source r1 started.
INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x00633e5e, /127.0.0.1:56177 => /127.0.0.1:8800] OPEN
INFO [pool-8-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x00633e5e, /127.0.0.1:56177 => /127.0.0.1:8800] BOUND: /127.0.0.1:8800
INFO [pool-8-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x00633e5e, /127.0.0.1:56177 => /127.0.0.1:8800] CONNECTED: /127.0.0.1:56177
Please help me with this issue. How can I get the logger statements from the standalone Java application to appear in the Flume server console?
Thanks in advance.

Why would you want to see the log events in the server console? When your system is running at full capacity that is a lot of events, and printing them will unnecessarily slow your system down.
I would recommend using the Flume file roll sink to output the events to a log file so that you can confirm they are getting through (a sketch of such a configuration is shown after this answer).
If you really want to, you can turn up the logging on the Flume server itself. I think DEBUG or TRACE level will start producing a lot of information about what is happening, including the actual log messages.
Update
I think you are confusing two things.
Flume transports and stores log statements. Generally, those log statements are sent from the source system using a logging framework like log4j. Flume can then store them, for example in files using the file_roll sink.
Flume itself also logs its own behaviour. It, too, uses log4j and will write to a file. However, there is nothing in the Flume docs about Flume writing the events it is transporting to its own log.
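As a rough illustration of the file roll suggestion, a Flume agent configuration along the following lines pairs an Avro source with a file_roll sink. This is only a sketch: the component names (a1, r1, c1, k1) and the output directory are placeholders, and port 8800 is taken from the console output in the question.

# a1 is a placeholder agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Avro source on the port the client connects to
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 8800
a1.sources.r1.channels = c1

# simple in-memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# file_roll sink writes incoming events to rolled local files
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /var/log/flume
a1.sinks.k1.sink.rollInterval = 30
a1.sinks.k1.channel = c1

With this, events received on the Avro source end up in files under the sink directory, which is enough to confirm they are arriving without flooding the console.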

I got the answer. The downloaded files for the Flume server were not complete; I was missing flume.bat in the D:\apache-flume-1.3.1-bin\bin folder. So I downloaded the Windows-based Apache Flume server files once again. You can download them here.
Set FLUME_HOME in the flume.bat file, which is in the D:\apache-flume-1.3.1-bin\bin folder.
Now, in a command prompt under D:\apache-flume-1.3.1-bin\bin, run flume.bat to start the Flume server.
Now when I run the client application, all the logger statements are shown in the Flume server console and in the log file written by the file appender configured in the Flume server's log4j.properties.
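For completeness, the client side of this setup is typically a log4j2 configuration that routes the application's log statements to the Flume Avro source. The snippet below is only a sketch: it assumes the log4j-flume-ng module is on the classpath and reuses localhost:8800 from the console output above; adjust the host, port, and layout for your environment.

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn">
  <Appenders>
    <!-- Flume appender from the log4j-flume-ng module; the Agent host/port point at the Avro source -->
    <Flume name="FlumeAppender" compress="false">
      <Agent host="localhost" port="8800"/>
      <PatternLayout pattern="%d [%t] %-5level %logger{36} - %msg%n"/>
    </Flume>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="FlumeAppender"/>
    </Root>
  </Loggers>
</Configuration>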

Related

How to get app runtime on hadoop yarn

Does YARN store information about finished applications, including their runtime, on HDFS? I just want to get the application runtime from files on HDFS (if such files exist; I have checked the logs and there is no runtime information there) without using any monitoring software.
You can use the ResourceManager REST API to fetch the information of all FINISHED applications.
http://resource_manager_host:port/ws/v1/cluster/apps?state=FINISHED
A GET request to this URL will return a JSON response (XML can also be obtained). The response has to be parsed for the elapsedTime of each application to get its running time.
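A rough Java sketch of that lookup is below. The host and port are placeholders (8088 is only the usual ResourceManager web default), and the elapsedTime values are pulled out with a crude regex; a real implementation would use a proper JSON library.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FinishedAppRuntimes {
    public static void main(String[] args) throws Exception {
        // Hypothetical ResourceManager address; replace with your cluster's host:port
        URL url = new URL("http://resource_manager_host:8088/ws/v1/cluster/apps?state=FINISHED");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        // Read the whole JSON response into one string
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }

        // Crude extraction of application id and elapsedTime (milliseconds) per finished app
        Pattern p = Pattern.compile("\"id\":\"(application_[^\"]+)\".*?\"elapsedTime\":(\\d+)");
        Matcher m = p.matcher(body.toString());
        while (m.find()) {
            System.out.println(m.group(1) + " ran for " + m.group(2) + " ms");
        }
    }
}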
To look up persistent job history files, you will need to check the Job History Server or the Timeline Server instead of the ResourceManager:
Job history is aggregated onto HDFS and can be seen from the Job History Server UI (or REST API). The history files are stored under mapreduce.jobhistory.done-dir on HDFS.
Job history can also be aggregated by the Timeline Server (filesystem based, aka ATS 1.5) and can be seen from the Timeline Server UI (or REST API). Those history files are stored under yarn.timeline-service.entity-group-fs-store.done-dir on HDFS.

Does Embedded flume agent need hadoop to function on cluster?

I am trying to write an embedded Flume agent into my web service to transfer my logs to another Hadoop cluster where my Flume agent is running. To work with the embedded Flume agent, does Hadoop need to be running on the server where my web service is running?
TL;DR: I think not.
Longer version: I haven't checked, but the developer guide (https://flume.apache.org/FlumeDeveloperGuide.html#embedded-agent) says:
Note: The embedded agent has a dependency on hadoop-core.jar.
And in the User Guide (https://flume.apache.org/FlumeUserGuide.html#hdfs-sink), you can specify the HDFS path:
HDFS directory path (eg hdfs://namenode/flume/webdata/)
On the other hand, are you sure you want to work with the embedded agent instead of running Flume where you want to put the data and using, for example, the HTTP source (https://flume.apache.org/FlumeUserGuide.html#http-source), or any other source you can send data to?
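For context, the embedded agent is driven entirely from Java code, roughly along the lines of the developer guide example below. This is a sketch, not a drop-in implementation: the collector hostnames and ports are placeholders for Avro sources on the remote Flume agents, and the HDFS write would happen on those remote agents, not in the web service itself.

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.agent.embedded.EmbeddedAgent;
import org.apache.flume.event.EventBuilder;

public class EmbeddedAgentSketch {
    public static void main(String[] args) throws Exception {
        // Embedded agent configuration, modeled on the Flume Developer Guide example.
        // collector1/collector2 and the ports are hypothetical Avro sources on remote agents.
        Map<String, String> conf = new HashMap<>();
        conf.put("channel.type", "memory");
        conf.put("channel.capacity", "200");
        conf.put("sinks", "sink1 sink2");
        conf.put("sink1.type", "avro");
        conf.put("sink1.hostname", "collector1.example.com");
        conf.put("sink1.port", "41414");
        conf.put("sink2.type", "avro");
        conf.put("sink2.hostname", "collector2.example.com");
        conf.put("sink2.port", "41414");
        conf.put("processor.type", "load_balance");

        EmbeddedAgent agent = new EmbeddedAgent("myagent");
        agent.configure(conf);
        agent.start();

        // Send a single event; putAll(List<Event>) is available for batches
        agent.put(EventBuilder.withBody("hello from the web service", StandardCharsets.UTF_8));

        agent.stop();
    }
}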

Bluemix Analytics for Apache Hadoop Big SQL - How to access logs for debug?

I am using Big SQL from Analytics for Apache Hadoop in Bluemix and would like to look into the logs in order to debug (e.g. the MapReduce job log, usually available under http://my-mapreduce-server.com:19888/jobhistory, and bigsql.log from the Big SQL worker nodes).
Is there a way in Bluemix to access those logs?
Log files for most IOP components (e.g. the MapReduce Job History log, the Resource Manager log) are accessible from the Ambari console's Quick Links; just navigate to the respective service page. Log files for Big SQL are currently not available. Since the cluster is not hosted as Bluemix apps, the logs cannot be retrieved using the Bluemix cf command.

How can I configure Hadoop to send my MapReduce logs to graylog2

I'm working with Hadoop 1.2.1 to create a series of chained map reduce jobs which will be run regularly in our production environment. At the moment, we are using graylog2 to get centralized access to logs and I would like to have the logs from my job sent to our log server.
I've added the gelfj jar file to /usr/share/hadoop/lib and modified /etc/hadoop/task-log4j.properties to output logs to Graylog, but so far I am not seeing anything arrive on the graylog2 side. Can anyone confirm that using an alternate log appender is possible for Hadoop jobs, and which config file should be updated to do so?
Figured this out.
In the Hadoop configuration directory (/etc/hadoop in my case) there is a file named task-log4j.properties. Overriding the appender "log4j.appender.TLA" affects the log output from tasks.
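For anyone following along, the override can look roughly like the sketch below. The property names are the ones gelfj documents (graylogHost, graylogPort, facility, and so on), and graylog.example.com:12201 is a placeholder; double-check the names against the gelfj version you dropped into /usr/share/hadoop/lib.

# Replace the TLA (task log) appender with the gelfj GELF appender
log4j.appender.TLA=org.graylog2.log.GelfAppender
log4j.appender.TLA.graylogHost=graylog.example.com
log4j.appender.TLA.graylogPort=12201
log4j.appender.TLA.facility=hadoop-tasks
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.TLA.extractStacktrace=true
log4j.appender.TLA.addExtendedInformation=true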

Flume: Send files to HDFS via APIs

I am new to Apache Flume-ng. I want to send files from a client agent to a server agent, which will ultimately write the files to HDFS. I have seen http://cuddletech.com/blog/?p=795 . It is the best resource I have found so far, but it works via a script, not via APIs. I want to do it via the Flume APIs. Please help me in this regard, and tell me the steps for how to start and organize the code.
I think you should maybe explain more about what you want to achieve.
The link you post appears to be just fine for your needs. You need to start a Flume agent on your client to read the files and send them using the Avro sink. Then you need a Flume agent on your server which uses an Avro source to read the events and write them where you want.
If you want to send events directly from an application then have a look at the embedded agent in Flume 1.4 or the Flume appender in log4j2 or (worse) the log4j appender in Flume.
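Concretely, that two-agent setup can be sketched in two Flume configuration files along these lines. The hostnames, ports, directories, and the HDFS path are all placeholders, and the spooling directory source is just one possible way of picking up files on the client.

# client agent: read files from a spooling directory and forward via Avro
client.sources = src1
client.channels = ch1
client.sinks = avroSink

client.sources.src1.type = spooldir
client.sources.src1.spoolDir = /var/log/myapp/spool
client.sources.src1.channels = ch1

client.channels.ch1.type = file

client.sinks.avroSink.type = avro
client.sinks.avroSink.hostname = flume-server
client.sinks.avroSink.port = 41414
client.sinks.avroSink.channel = ch1

# server agent: receive Avro events and write them to HDFS
server.sources = avroSrc
server.channels = ch1
server.sinks = hdfsSink

server.sources.avroSrc.type = avro
server.sources.avroSrc.bind = 0.0.0.0
server.sources.avroSrc.port = 41414
server.sources.avroSrc.channels = ch1

server.channels.ch1.type = file

server.sinks.hdfsSink.type = hdfs
server.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/webdata/
server.sinks.hdfsSink.hdfs.fileType = DataStream
server.sinks.hdfsSink.channel = ch1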
Check this http://flume.apache.org/FlumeDeveloperGuide.html
You can write a client to send events, or use the embedded agent.
As for the code organization, it is up to you.
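If you take the write-your-own-client route, the developer guide's RPC client API boils down to something like the sketch below; flume-server and 41414 are placeholders for the server agent's Avro source.

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeRpcClientSketch {
    public static void main(String[] args) {
        // Connect to the server agent's Avro source (placeholder host and port)
        RpcClient client = RpcClientFactory.getDefaultInstance("flume-server", 41414);
        try {
            // Each file (or line of a file) becomes the body of an event
            Event event = EventBuilder.withBody("file contents go here", StandardCharsets.UTF_8);
            client.append(event);
        } catch (EventDeliveryException e) {
            // In real code, recreate the client and retry on delivery failure
            e.printStackTrace();
        } finally {
            client.close();
        }
    }
}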
