Hive does not start: Error creating path /hive/cluster/delegation/METASTORE/keys - hadoop

I ran into a problem on a kerberized cluster where hive would not start.
Symptoms:
Services start succesfully (and did not stop)
In Ambari an alert appeared which mentioned that the Hive metastore failed
Starting hive on the command line did not succeed (it just kept hanging)
Via beeline I was able to see metadata, but not get actual data
I found the following error in /var/log/hive/hivemetastore.log
2016-08-29 10:12:49,047 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5934)) - Metastore Thrift Server threw an exception...
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: Error creating path /hive/cluster/delegation/METASTORE/keys
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.ensurePath(ZooKeeperTokenStore.java:166)
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.initClientAndPaths(ZooKeeperTokenStore.java:236)
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.init(ZooKeeperTokenStore.java:469)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server.startDelegationTokenSecretManager(HadoopThriftAuthBridge.java:444)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6015)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5930)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hive/cluster/delegation/METASTORE/keys
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:691)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:675)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:672)
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:423)
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:257)
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:205)
at org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.ensurePath(ZooKeeperTokenStore.java:160)
... 11 more

Note that I actually tried several things, so I am not sure whether this is the full solution, but here is the final step, which I believe to be the critical one:
After a long search I indirectly found this site: https://community.hortonworks.com/articles/49040/hive-metastore-crashes-on-nullpointerexception-wit.html
Here is the relevant fragment that helped me resolve the issue:
This is a known issue being tracked in the following Hortonworks bug:
https://hortonworks.jira.com/browse/BUG-42602
WORKAROUND:
Set the hive.cluster.delegation.token.store.class to the following:
hive.cluster.delegation.token.store.class=org.apache.hadoop.hive.thrift.DBTokenStore
If using Ambari, this setting can be changed by clicking on the Hive
service on the Ambari Dashboard, navigating to the "Configs" tab, and
modifying the parameter in the "Advanced Hive-site" section of the
Hive configs. Save the changes and restart Hive from the Ambari User
Interface when prompted.
If not using ambari, this setting can be located in the
/etc/hive/conf/hive-site.xml file. Make sure this change is made on
all applicable nodes on the cluster. Once the changes are made, the
Hive services must be restarted.

Related

Apache Drill (Embedded): Failure setting up ZK for client

I am new to Apache Drill, and currently I am following the instructions from this link here to learn about it:
Drill in 10 minutes
However, after checking that I had the pre-requisites, I hit an error when I execute the steps in 'Start Drill on Windows' section.
Open Command Prompt.
Open the apache-drill- folder.
Go to the bin directory. For example: cd bin
Type the following command on the command line: sqlline.bat -u "jdbc:drill:zk=local"
Error: Failure in connecting to Drill:
org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for
client. (state= ,code=0) java.sql.SQLException: Failure in connecting
to Drill: org.apache.drill.exec.rpc.RpcException: Failure setting up
ZK for client.
at org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnectionImpl.java:167)
at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:72)
at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69)
at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:143)
at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
at sqlline.Commands.connect(Commands.java:1083)
at sqlline.Commands.connect(Commands.java:1015)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
at sqlline.SqlLine.dispatch(SqlLine.java:742)
at sqlline.SqlLine.initArgs(SqlLine.java:528)
at sqlline.SqlLine.begin(SqlLine.java:596)
at sqlline.SqlLine.start(SqlLine.java:375)
at sqlline.SqlLine.main(SqlLine.java:268)
Caused by: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for
client.
at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:329)
at org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnectionImpl.java:158)
... 18 more
Caused by: java.io.IOException: Failure to connect to the zookeeper cluster service within the allotted time of 10000 mi
lliseconds.
at org.apache.drill.exec.coord.zk.ZKClusterCoordinator.start(ZKClusterCoordinator.java:123)
at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:327)
... 19 more
local (The system cannot find the file specified)
apache drill 1.11.0
Where is the 'local' file, and where can I get it?
Try drill bit in the command instead of zk because zookeeper has nothing to do if you are using the drill in embedded mode
"jdbc:drill:drillbit=local"
I had this issue, but was using Powershell, instead of command prompt.
Try running cmd /r 'sqlline.bat -u "jdbc:drill:zk=local"'

org.apache.hadoop.hbase.TableNotFoundException: SYSTEM.CATALOG exception with phoenix 4.5.2

I've been trying to integrate Phoenix 4.5.2 to my existing hadoop cluster.
Hadoop Version : 2.7.1
HBase Version : 1.1.2
When I try to create table from my phoenix client I'm getting following exception. But I'm able to create table successfully from HBase console.
org.apache.phoenix.exception.PhoenixIOException: SYSTEM.CATALOG
at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1051)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1014)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1259)
at org.apache.phoenix.query.DelegateConnectionQueryServices.createTable(DelegateConnectionQueryServices.java:113)
at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:1937)
at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:751)
at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:186)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:320)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:312)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:310)
at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1422)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1927)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1896)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1896)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
at org.apache.tools.ant.taskdefs.JDBCTask.getConnection(JDBCTask.java:370)
at org.apache.tools.ant.taskdefs.SQLExec.getConnection(SQLExec.java:940)
at org.apache.tools.ant.taskdefs.SQLExec.execute(SQLExec.java:612)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:292)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:435)
at org.apache.tools.ant.Target.performTasks(Target.java:456)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1393)
at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
at org.apache.tools.ant.Project.executeTargets(Project.java:1248)
at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:440)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:292)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:435)
at org.apache.tools.ant.Target.performTasks(Target.java:456)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1393)
at org.apache.tools.ant.Project.executeTarget(Project.java:1364)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1248)
at org.apache.tools.ant.Main.runBuild(Main.java:851)
at org.apache.tools.ant.Main.startAnt(Main.java:235)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
Caused by: org.apache.hadoop.hbase.TableNotFoundException: SYSTEM.CATALOG
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1257)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1155)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1139)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1096)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:931)
at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:496)
at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:736)
at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:706)
at org.apache.hadoop.hbase.client.HTable.getStartKeysInRange(HTable.java:1760)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1715)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1695)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1034)
... 49 more
Please suggest what's going wrong here and Phoenix 4.5.2 is compatible with HBase 1.1.2 or not.
Clearing zookeeper will solve the problem.
bin/hbase clean --cleanZk
Note:- you need to shutdown your master and regionserver before using above command.
Here's what you need to do:
Login into hbase zookeeper cli: hbase zkcli
check to see if the SYSTEM.CATALOG, SYSTEM.SEQUENCE, SYSTEM.STATS, and SYSTEM.FUNCTION exists by issuing: ls /hbase/table
If so... then this is the solution:)
Remove all the aforementioned directories: i.e. from the same cli issue rmr /hbase/table/SYSTEM.CATALOG. Do this for all four of the aforementioned files
Retry saline... and smile
I have an embedded HBase, I had this error after deleting manually HBase data, so I have had to clean Zookeeper and the rest of HBase with:
bin/hbase clean --cleanAll
That solved the problem for me.
According to your given logs,
Pheonix was trying to create a table called SYSTEM.CATALOG in Hbase. But, due to some problem in Hbase, it wasn't able to create it.
The table "SYSTEM.CATALOG" is created the 1st time the connection is made from Pheonix to Hbase.
Therefore, I would recommend checking your Pheonix Configuration for connectivity issues like wrong mapping of IP-Address , forgetting to create a Password less SSH etc.
The above suggestions of cleaning hbase data with only zookeeper running did not help me. What fiannl helped me overcome this error and allow creation of the SYSTEM.CATALOG table and the other associated tables was by copying the hbase-site.xml from the hbase installation into the bin folder of the apache-phoenix installation.
In my case the hbase conf was in /usr/local/hbase/conf which I copied into /usr/local/apache-phoenix/bin. I just renamed the original hbase-site.xml that was already present in the bin folder to something else, just as a back up.
This solved the problem of connectivity from sqlline.py as well as SQuirreL client.
This was apart from ensuring the phoenix client libraries were in /squirrel/lib folder for squirrel to work.
Check your hdfs://.../hbase/data/default/ is exist SYSTEM.CATALOG ? enter image description here
if there is nothing, you must try to use bin/hbase clean --cleanZk before you use the command, you must stop hbase Master and regionServers,but still keep ZK alive.

Job via Oozie HDP 2.1 not creating job.splitmetainfo

When trying to execute a sqoop job which has my Hadoop program passed as a jar file in -jarFiles parameter, the execution blows off with below error. Any resolution seems to be not available. Other jobs with same Hadoop user is getting executed successfully.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1541)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1396)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1363)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:976)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1241)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1041)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1452)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1448)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1381)
So here is the way I solved it. We are using CDH5 to run Camus to pull data from kafka. We run CamusJob which is responsible for getting data from kafka using comman line:
hadoop jar...
The problem is that new hosts didn't get so-called "yarn-gateway". Cloudera names pack of configs related to service and copied to /etc/hadoop/conf
as "gateway". So I just clicked "deploy client configuration" in CM UI. YARN client conf has been copied to each YARN NodeManager node and it solved problem.

Accumulo on Cloudera CDH4 - Access denied when starting components

I have a small cluster up and running with Cloudera CDH4 Hadoop and Map Reduce v1. Namenode/Secondary Namenode/Jobtracker all on different machines. My three servers are also acting as Zookeeper servers.
I'm trying to install Accumulo 1.4.4 on top of this cluster. I get the same behavior with Accumulo 1.5.0. I am able to bin/accumulo init and initialize Accumulo, but starting the individual components fail. I'm trying to make my Namenode the Accumulo master.
bin/start-server.sh localhost monitor spits out a very encouraging Starting monitor on localhost, but nothing gets started. If I examine logs/monitor_localhost.err I find a stacktrace:
-bash-4.1$ cat logs/monitor_localhost.err
Thread "monitor" died null
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.accumulo.start.Main$1.run(Main.java:91)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.ExceptionInInitializerError
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2464)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2456)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2323)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
at org.apache.accumulo.core.file.FileUtil.getFileSystem(FileUtil.java:554)
at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceIDFromHdfs(ZooKeeperInstance.java:258)
at org.apache.accumulo.server.conf.ZooConfiguration.getInstance(ZooConfiguration.java:65)
at org.apache.accumulo.server.conf.ServerConfiguration.getZooConfiguration(ServerConfiguration.java:49)
at org.apache.accumulo.server.conf.ServerConfiguration.getSystemConfiguration(ServerConfiguration.java:58)
at org.apache.accumulo.server.monitor.Monitor.run(Monitor.java:440)
at org.apache.accumulo.server.monitor.Monitor.main(Monitor.java:433)
... 6 more
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission accessDeclaredMembers)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:399)
at java.security.AccessController.checkPermission(AccessController.java:557)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.Class.checkMemberAccess(Class.java:2237)
at java.lang.Class.getDeclaredFields(Class.java:1805)
at org.apache.hadoop.util.ReflectionUtils.getDeclaredFieldsIncludingInherited(ReflectionUtils.java:315)
at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder.initRegistry(MetricsSourceBuilder.java:92)
at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder.<init>(MetricsSourceBuilder.java:56)
at org.apache.hadoop.metrics2.lib.MetricsAnnotations.newSourceBuilder(MetricsAnnotations.java:42)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:212)
at org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:54)
at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:97)
at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:190)
... 18 more
The AccessControlException: access denied looks like the important line to me, but I can't imagine what access is being restricted. I'm running everything as the hdfs user, which owns the entire /opt/accumulo-1.4.4/ directory where accumulo is un-tarred. The /accumulo directory in HDFS is also owned by the hdfs user. SELinux is permissive. Searching online has proved fruitless, has anyone dealt with this error before?
Much thanks.
I started browsing the Apache accumulo-users mailing list archive and came across the solution.
http://mail-archives.apache.org/mod_mbox/accumulo-user/201312.mbox/%3CB9CB2B2BF27F0F46B8ECF781831E00E710970A9F%400015-its-exmb10.us.saic.com%3E
I was copying the accumulo.policy.example to accumulo.policy because I thought I needed it in my configuration. Once I deleted the accumulo.policy file my issues went away and I've been able to stand up Accumulo (1.5.0 at least, 1.4.4 still has some issues for me)

How to set User/Group permission with Hadoop/Kerberos Setup?

I am trying to setup Hadoop with Kerberos
I am following the CDH3 Security Guide.
Things went pretty well so far (HFDS works ok etc), but I am getting the following error when I try to submit the Job.
I run HDFS server as user HDFS and Hadoop as user called mapred. I Submit the job using user called bob, who is in mapred group.
Following are values I have for taskcontroller.cfg
mapred.local.dir=/opt/hadoop-work/local/
hadoop.log.dir=/opt/hadoop-1.0.3/logs
mapreduce.tasktracker.group=mapred
min.user.id=1000
Error I am getting is
java.io.IOException: Job initialization failed (24) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
Can't get group information for mapred - Success.
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:192)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1228)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:185)
... 8 more
Error always comes with value given to "mapreduce.tasktracker.group=mapred" in the taskcontroller.cfg.
I have been debugging and looking in, and I think the problem is I have setup the permission among different users and groups wrong.
Any help is greatly appreciated.

Resources