Mapreduce job failing even the required class is present - hadoop

Hi my job is failing due to runtime exception, saying, mapper class not found.
Below is the exception:
14/06/16 05:52:56 INFO mapred.JobClient: Task Id : attempt_201406071432_1142_m_000028_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.cloudera.sa.omniture.mr.OmnitureRawDataMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class com.cloudera.sa.omniture.mr.OmnitureRawDataMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
... 8 more
But in my jar i used to run the job had the Mapper class present, still i am facing the issue. pls guide.
Below info. on my jar file confirms the presence of Mapper class in jar file.
[svtdphpd#d-hcr75y1 testJARs]$ jar tvf MalformedAnalysis.jar | grep Omniture
1405 Mon Jun 16 15:59:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureRawDataMapper$HITDATA_PROBLEM.class
3728 Mon Jun 16 15:59:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureRawDataMapper.class
5280 Mon Jun 16 16:00:46 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureToRCFileJob.class
6436 Mon Jun 16 16:03:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureDataFileRecordReader.class
3642 Mon Jun 16 16:00:16 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureRawDataReducer.class
1792 Mon Jun 16 15:59:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureDataFileInputFormat.class
Below is the Job configuration:
// Create job
Job job = Job.getInstance(config, "LoadOmnitureData");
job.setJarByClass(OmnitureToRCFileJob.class);
// Add named output for malformed records
MultipleOutputs.addNamedOutput(job, "Malformed",
TextOutputFormat.class, NullWritable.class, Text.class);
FileInputFormat.addInputPath(job, new Path(inputPath));
job.setInputFormatClass(OmnitureDataFileInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(OmnitureRawDataReducer.class);
job.setMapperClass(OmnitureRawDataMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
TextOutputFormat.setOutputPath(job, new Path(outputPath));
job.setNumReduceTasks(numReduceTasks);

Related

mvn gluon:runagent failure - Unknown Attribute 'queryAllDeclaredMethods' .. in definition of class ..HelloController

trying to get my first native build working.
(using Windows 10, jdk 17, javafx17, gluon 1.0.9, gluon graalvm (graalvm-svm-windows-gluon-21.2.0-dev.zip))
I am able to run mvn gluonfx:run
(and click on the 1 button I have in my test UI)
However when I run: mvn gluonfx:runagent, I get:
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Error: Error parsing reflection configuration in file:/C:/devel/repos/Gluon-SingleViewProject-jdk17/target/classes/META-INF%5cnative-image%5creflect-config.json:
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Unknown attribute 'queryAllDeclaredMethods' (supported attributes: allDeclaredConstructors, allPublicConstructors, allDeclaredMethods, allPublicMethods, allDeclaredFields, allPublicFields, methods, fields) in defintion of class com.gluonapplication.HelloController
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Verify that the configuration matches the schema described in the -H:PrintFlags=+ output for option ReflectionConfigurationResources.
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] com.oracle.svm.core.util.UserError$UserException: Error parsing reflection configuration in file:/C:/devel/repos/Gluon-SingleViewProject-jdk17/target/classes/META-INF%5cnative-image%5creflect-config.json:
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Unknown attribute 'queryAllDeclaredMethods' (supported attributes: allDeclaredConstructors, allPublicConstructors, allDeclaredMethods, allPublicMethods, allDeclaredFields, allPublicFields, methods, fields) in defintion of class com.gluonapplication.HelloController
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Verify that the configuration matches the schema described in the -H:PrintFlags=+ output for option ReflectionConfigurationResources.
The helloController, simply consists of 1 method atm:
public class HelloController {
public void pressButton(ActionEvent ae){
System.out.println("hello, source pressed: " + ae.getSource());
}
}
Any suggestions/tips greatly appreciated...(based on the error above..looks like build process is may be calling an unsupported method for jdk 17?)

Apache Phoenix UPSERT timeouts on large data

Exception in thread "main"
org.apache.phoenix.exception.PhoenixIOException:
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Mon May 09 20:45:17 PDT 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=214280: row '�.�A`' on tabl..
Any suggestions?

Make Cygnus use WebHDFS to write to local HDFS

I'm trying to make a local Orion+Cygnus persist Orion's data on a local HDFS through WebHDFS.
On Cygnus' instructions on gitub, very little is mentioned about WebHDFS, as the configuration is more about HttpFS.
On the .md OrionHDFSsink it's said that hdfs_port=50070 is for WebHDFS, as indeed my HDFS is. So I would expect by setting the port this way cygnus would automatically use WebHDFS, but on my case it doesn't seem to be working this way.
So, here's my agent_1.conf:
cygnusagent.sources = http-source
cygnusagent.sinks = hdfs-sink
cygnusagent.channels = hdfs-channel
# source configuration
cygnusagent.sources.http-source.channels = hdfs-channel
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnusagent.sources.http-source.port = 5050
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
cygnusagent.sources.http-source.handler.notification_target = /notify
cygnusagent.sources.http-source.handler.default_service = def_serv
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
cygnusagent.sources.http-source.handler.events_ttl = 4
cygnusagent.sources.http-source.interceptors = ts gi
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# OrionHDFSSink configuration
cygnusagent.sinks.hdfs-sink.channel = hdfs-channel
cygnusagent.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.OrionHDFSSink
cygnusagent.sinks.hdfs-sink.hdfs_host = localHDFS.ip
cygnusagent.sinks.hdfs-sink.hdfs_port = 50070
cygnusagent.sinks.hdfs-sink.hdfs_username = HDFSrootUser
cygnusagent.sinks.hdfs-sink.attr_persistence = column
# hdfs-channel configuration
cygnusagent.channels.hdfs-channel.type = memory
cygnusagent.channels.hdfs-channel.capacity = 1000
cygnusagent.channels.hdfs-channel.transactionCapacity = 100
When I update an Entity on Orion, to whom Cygnus is subbed, Cygnus logs the following:
02 Sep 2015 20:09:12,353 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:150) - Starting transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:12,362 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:236) - Received data ({ "subscriptionId" : "55e735c9b89e8535f8ca5ef2", "originator" : "localhost", "contextResponses" : [ { "contextElement" : { "type" : "Reading", "isPattern" : "false", "id" : "Reading1.1", "attributes" : [ { "name" : "Cost", "type" : "double", "value" : "32" }, { "name" : "Reading_ID", "type" : "integer", "value" : "14" }, { "name" : "Threshold", "type" : "double", "value" : "30" }, { "name" : "email", "type" : "string", "value" : "arthurmvieira#hotmail.com" } ] }, "statusCode" : { "code" : "200", "reasonPhrase" : "OK" } } ]})
02 Sep 2015 20:09:12,366 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:258) - Event put in the channel (id=2020008711, ttl=4)
02 Sep 2015 20:09:12,432 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=4, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:12,549 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:12,557 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:12,558 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=3)
02 Sep 2015 20:09:12,558 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:13,560 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=3, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:13,574 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:13,574 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:13,575 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=2)
02 Sep 2015 20:09:13,575 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:15,576 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=2, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:15,590 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:15,599 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:15,600 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=1)
02 Sep 2015 20:09:15,600 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:18,601 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=1, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:18,615 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:18,618 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:18,621 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=0)
02 Sep 2015 20:09:18,621 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:22,622 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=0, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:22,635 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:22,635 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:22,635 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:163) - The event TTL has expired, it is no more re-injected in the channel (id=2020008711, ttl=0)
02 Sep 2015 20:09:22,635 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
So you can see it's trying to use HttpFS, as it logs the response:
HttpFS response: 503 Service unavailable
...on each writing try.
How should I configure the agent to use WebHDFS?
Thank you
I don't know what was happening, but the configuration mentioned is correct and is working now.
After several tries at rebooting the instance, rewriting the config files and other log errors than the one mentioned, it worked.
At some point Cygnus was trying to write to localhost:50075, instead of {localHDFS.ip}:50070, but that was gone after rebooting cygnus.
All instances are at their latest version (important).
Cygnus configuration for WebHDFS is just about setting the port to 50070, nothing else is required.
Regarding the connections you mention to 50075, they are correct as well, since that's the behaviour of WebHDFS: when you want to upload data to HDFS, first the client (in this case, Cygnus) accesses the Namenode through TCP/50070 port, then the namenode responds with a redirection location pointing to the datanode where the data will be effectively uploaded; such a redirection uses the TCP/50075 port, and thus that datanode:50075 must be accessible by the client (Cygnus). That's why we are using HttpFS in the global instance of Cosmos at FIWARE Lab: HttpFS works as a gateway hiding the details of the datanodes, and a single entry point and port (14000) is required.

hadoop too many logs on screen

I start learning hadoop using hive recently. As a beginner I am not so familiar with all the logs showing on the screen. So it's better to see a clean version of all important logs. I learn hive based on Rutberglen's "Programming Hive" book.
Just started, and I got numerous of logs after the first command. While on the book, it's just "OK, Time taken: 3.543 seconds".
Anyone has solution to reduce these logs?
PS:below are the logs I got from command "create table x (a int);"
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Sep 28, 2014 12:10:28 AM org.apache.hadoop.hive.conf.HiveConf <clinit>
WARNING: hive-site.xml not found on CLASSPATH
Logging initialized using configuration in jar:file:/Users/admin/Documents/Study/software /Programming/Hive/hive-0.9.0-bin/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Sep 28, 2014 12:10:28 AM SessionState printInfo
INFO: Logging initialized using configuration in jar:file:/Users/admin/Documents/Study/software/Programming/Hive/hive-0.9.0-bin/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/admin/hive_job_log_admin_201409280010_720612579.txt
Sep 28, 2014 12:10:28 AM hive.ql.exec.HiveHistory printInfo
INFO: Hive history file=/tmp/admin/hive_job_log_admin_201409280010_720612579.txt
hive> CREATE TABLE x (a INT);
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=Driver.run>
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=compile>
Sep 28, 2014 12:10:31 AM hive.ql.parse.ParseDriver parse
INFO: Parsing command: CREATE TABLE x (a INT)
Sep 28, 2014 12:10:31 AM hive.ql.parse.ParseDriver parse
INFO: Parse Completed
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.parse.SemanticAnalyzer analyzeInternal
INFO: Starting Semantic Analysis
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.parse.SemanticAnalyzer analyzeCreateTable
INFO: Creating table x position=13
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver compile
INFO: Semantic Analysis Completed
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver getSchema
INFO: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=compile start=1411877431127 end=1411877431388 duration=261>
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=Driver.execute>
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver execute
INFO: Starting command: CREATE TABLE x (a INT)
Sep 28, 2014 12:10:31 AM hive.ql.exec.DDLTask createTable
INFO: Default to LazySimpleSerDe for table x
Sep 28, 2014 12:10:31 AM hive.log getDDLFromFieldSchema
INFO: DDL: struct x { i32 a}
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.metastore.HiveMetaStore newRawStore
INFO: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.metastore.ObjectStore initialize
INFO: ObjectStore, initialize called
Sep 28, 2014 12:10:32 AM org.apache.hadoop.hive.metastore.ObjectStore getPMF
INFO: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Sep 28, 2014 12:10:32 AM org.apache.hadoop.hive.metastore.ObjectStore setConf
INFO: Initialized ObjectStore
Sep 28, 2014 12:10:33 AM org.apache.hadoop.hive.metastore.HiveMetaStore logInfo
INFO: 0: create_table: db=default tbl=x
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=Driver.execute start=1411877431389 end=1411877434527 duration=3138>
OK
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver printInfo
INFO: OK
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=releaseLocks>
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=releaseLocks start=1411877434529 end=1411877434529 duration=0>
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=Driver.run start=1411877431126 end=1411877434530 duration=3404>
Time taken: 3.407 seconds
Sep 28, 2014 12:10:34 AM CliDriver printInfo
INFO: Time taken: 3.407 seconds
Try starting hive shell as follows :
hive --hiveconf hive.root.logger=WARN,console
If you wanted to make this change persistent, modify the logger property file HIVE_CONF_DIR/hive-log4j.properties file. If you don't have this file in your HIVE_CONF_DIR, create this file by copying the contents of hive-log4j.default in the HIVE_CONF_DIR.

Writing Tweets to the HDFS using Flume doesn't work

I used the Cloudera CDH5 QuickStart VM with VMware and all the services are installed via the Cloudera Manager.
I created a /user/flume/tweets and a flume user and group. I restarted all the services but, doesn't matter how long I wait, no tweets will be written to the HDFS. The /user/flume/tweets/ directory is still EMPTY!
Why?
This is my flume.conf:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = **
TwitterAgent.sources.Twitter.consumerSecret = **
TwitterAgent.sources.Twitter.accessToken = **
TwitterAgent.sources.Twitter.accessTokenSecret = ***
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost.localdomain:804/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
This is what I get on flume log:
[cloudera#localhost ~]$ tail -f /var/log/flume-ng/flume.log
27 May 2014 21:40:28,536 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) - Processing:HDFS
27 May 2014 21:40:28,536 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) - Processing:HDFS
27 May 2014 21:40:28,536 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) - Processing:HDFS
27 May 2014 21:40:28,537 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) - Processing:HDFS
27 May 2014 21:40:28,537 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) - Processing:HDFS
27 May 2014 21:40:28,562 WARN [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid:319) - Agent configuration for 'agent' does not contain any channels. Marking it as invalid.
27 May 2014 21:40:28,564 WARN [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:127) - Agent configuration invalid for agent 'agent'. It will be removed.
27 May 2014 21:40:28,564 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140) - Post-validation flume configuration contains configuration for agents: [TwitterAgent]
27 May 2014 21:40:28,564 WARN [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.getConfiguration:138) - No configuration found for this host:agent
27 May 2014 21:40:28,592 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:138) - Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
How can I fix that?
Thanks in advance.
Did you setup the Flume Configuration using Cloudera Manager? Please follow this link http://javet.org/?p=279 for implementing twitter firehose in CDH5.

Resources