Cassandra hadoop word count example not working - hadoop

I am trying to run the hadoop examples provided in the cassandra examples/hadoop_cql3_word_count. I am using the cassandra-2.0 branch. I have a single node cassandra running locally. I was able to run the ./bin/word_count_setup step successfully but when I run the ./bin/word_count step I am getting the following error :
java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:661)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:297)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:163)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: InvalidRequestException(why:consistency level LOCAL_ONE not compatible with replication strategy (org.apache.cassandra.locator.SimpleStrategy))
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:52627)
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:52604)
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:52519)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1785)
at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1770)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:631)
... 6 more
----------
Has anyone seen this before ? Am I missing something ?

Related

Hive queries failing on Tez but succeeding on Map-Reduce when connecting from Beeline

I am running into a weird error. I am running a simple select * query with a where clause, following is the summary of query execution status
Connecting to Hive from EMR (Tez engine) - succeeding
Connecting to Hive from EMR (MR engine) - succeeding
Connecting to Hive from Beeline (Tez engine) - failing
Connecting to Hive from Beeline (MR engine) - succeeding
I need to solve for point 3.
This is the error trace I am getting and unable to find what the root cause of this failure is and what this error log is trying to convey.
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)' SQL<select `ID`, `ISDELETED`, `ACCOUNTID`, `CREATEDBYID`, `CREATEDDATE`, `FIELD`, `OLDVALUE`, `NEWVALUE`, `AUDIT_UPD_TS`, `SRC_OP_TYP`, `GG_INGEST_TS` from `t4i_ent_sfdc_b2b_psa`.`sf_accounthistory` x WHERE SRC_OP_TYP='NA'>```
I was able to solve this. The problem was I was connecting my application to Hive via JDBC without specifying a user. For queries where simple streaming of data was required, it was succeeding, but where Map-Reduce jobs were being triggered to write to HDFS, the writing operation was failing with the error
Failed to execute tez graph.
org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
To resolve this, I added the user=hadoop; in the JDBC URL and the queries run fine now.
Try invoking beeline (Tez engine) as below and then run your query :
beeline -u "jdbc:hive2://<host>:<port>,/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-batch?tez.queue.name=<yarn-queue-name>"
If above doesn't work then try to fix any issue in SQL. I see 'x' before Where clause in your sql query, that may be the issue. Try to remove that and run your query.
`sf_accounthistory` x WHERE SRC_OP_TYP='NA'
Hope this is helpful

Debugging Spark standalone cluster with idea

I am trying to debug a Spark Application on a local cluster using a master and a worker nodes. I have been successful at setting up the master node and worker nodes using Spark standalone cluster manager with start-master.sh and it works.But I want to how Spark Application works in the spark cluster, so I want to start the cluster in debug mode. I read the start-master.sh codeļ¼Œ mock the args and start org.apache.spark.deploy.master.Master main method.Unfortunately it gets a NoClassDefFoundError,I can't open the webui. I want to know where the problem is.
The Error is :
Exception in thread "dispatcher-event-loop-1" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/thread/ThreadPool
at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:81)
at org.apache.spark.deploy.master.ui.MasterWebUI.initialize(MasterWebUI.scala:48)
at org.apache.spark.deploy.master.ui.MasterWebUI.<init>(MasterWebUI.scala:43)
at org.apache.spark.deploy.master.Master.onStart(Master.scala:131)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:122)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:216)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.util.thread.ThreadPool
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
my debug configurations is:
enter image description here
Thanks!
I would suggest not to even use a spark standalone cluster for debugging.
You can run spark locally in the your IDE with breakpoints.
Spark provides you option to run locally pointing to local filesystem as HDFS.
Please follow the following link to know more about how to write test cases for local mode in spark
http://bytepadding.com/big-data/spark/word-count-in-spark/

Hive does not start: Error creating path /hive/cluster/delegation/METASTORE/keys

I ran into a problem on a kerberized cluster where hive would not start.
Symptoms:
Services start succesfully (and did not stop)
In Ambari an alert appeared which mentioned that the Hive metastore failed
Starting hive on the command line did not succeed (it just kept hanging)
Via beeline I was able to see metadata, but not get actual data
I found the following error in /var/log/hive/hivemetastore.log
2016-08-29 10:12:49,047 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5934)) - Metastore Thrift Server threw an exception...
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error creating path /hive/cluster/delegation/METASTORE/keys
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.ensurePath(ZooKeeperTokenStore.java:166)
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.initClientAndPaths(ZooKeeperTokenStore.java:236)
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.init(ZooKeeperTokenStore.java:469)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server.startDelegationTokenSecretManager(HadoopThriftAuthBridge.java:444)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6015)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5930)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hive/cluster/delegation/METASTORE/keys
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:691)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:675)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:672)
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:423)
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:257)
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:205)
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.ensurePath(ZooKeeperTokenStore.java:160)
... 11 more
Note that I actually tried several things, so I am not sure whether this is the full solution, but here is the final step, which I believe to be the critical one:
After a long search I indirectly found this site: https://community.hortonworks.com/articles/49040/hive-metastore-crashes-on-nullpointerexception-wit.html
Here is the relevant fragment that helped me resolve the issue:
This is a known issue being tracked in the following Hortonworks bug:
https://hortonworks.jira.com/browse/BUG-42602
WORKAROUND:
Set the hive.cluster.delegation.token.store.class to the following:
hive.cluster.delegation.token.store.class=org.apache.hadoop.hive.thrift.DBTokenStore
If using Ambari, this setting can be changed by clicking on the Hive
service on the Ambari Dashboard, navigating to the "Configs" tab, and
modifying the parameter in the "Advanced Hive-site" section of the
Hive configs. Save the changes and restart Hive from the Ambari User
Interface when prompted.
If not using ambari, this setting can be located in the
/etc/hive/conf/hive-site.xml file. Make sure this change is made on
all applicable nodes on the cluster. Once the changes are made, the
Hive services must be restarted.

How to set User/Group permission with Hadoop/Kerberos Setup?

I am trying to setup Hadoop with Kerberos
I am following the CDH3 Security Guide.
Things went pretty well so far (HFDS works ok etc), but I am getting the following error when I try to submit the Job.
I run HDFS server as user HDFS and Hadoop as user called mapred. I Submit the job using user called bob, who is in mapred group.
Following are values I have for taskcontroller.cfg
mapred.local.dir=/opt/hadoop-work/local/
hadoop.log.dir=/opt/hadoop-1.0.3/logs
mapreduce.tasktracker.group=mapred
min.user.id=1000
Error I am getting is
java.io.IOException: Job initialization failed (24) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
Can't get group information for mapred - Success.
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:192)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1228)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:185)
... 8 more
Error always comes with value given to "mapreduce.tasktracker.group=mapred" in the taskcontroller.cfg.
I have been debugging and looking in, and I think the problem is I have setup the permission among different users and groups wrong.
Any help is greatly appreciated.

Hbase Hadoop cluster.. java.io.IOException: java.lang.NoSuchMethodExceptio

I am trying to set up a hbase cluster which is running on top of a hadoop cluster.
Both clusters are up and running but
when I try to create a table in Hbase client..am seeing the following error in the logs!!
compute-0-11 : is the name node for the hadoop cluster.
2012-03-18 01:18:54,696 WARN org.apache.hadoop.hbase.util.FSUtils:
Unable to create version file at hdfs://compute-0-11:9000/hbase, retrying:
java.io.IOException: java.lang.NoSuchMethodException:
org.apache.hadoop.hdfs.protocol.ClientProtocol.create(java.lang.String, org.apache.hadoop.fs.permission.FsPermission, java.lang.String, boolean, boolean, short, long)
at java.lang.Class.getMethod(Class.java:1605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
Please help..
Usually it happens when there is a mismatch between versions of software in your cluster and your job / client.
Errors like class was expected / interface found usually happens when different versions of hadoop APIs are used (napred vs napreduce).

Resources