Ambari is not able to start the Namenode - hadoop

I have a problem with my Ambari server: it is not able to start the Namenode. I'm using HDP 2.0.6 and Ambari 1.4.1. It is worth mentioning that this happens only once I've enabled Kerberos security; when it is disabled there is no error.
The error is:
2015-02-04 16:01:48,680 ERROR namenode.EditLogInputStream (EditLogFileInputStream.java:nextOpImpl(173)) - caught exception initializing http://int-iot-hadoop-fe-02.novalocal:8480/getJournal?jid=integration&segmentTxId=1&storageInfo=-47%3A1493795199%3A0%3ACID-a5152e6c-64ab-4978-9f1c-e4613a09454d
org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Fetch of http://int-iot-hadoop-fe-02.novalocal:8480/getJournal?jid=integration&segmentTxId=1&storageInfo=-47%3A1493795199%3A0%3ACID-a5152e6c-64ab-4978-9f1c-e4613a09454d failed with status code 500
Response message:
getedit failed. java.lang.IllegalArgumentException: Does not contain a valid host:port authority: null at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:211) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:210) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.isValidRequestor(GetJournalEditServlet.java:93) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.checkRequestorOrSendError(GetJournalEditServlet.java:128) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.doGet(GetJournalEditServlet.java:174) at
...
It seems the problem is about retrieving the Secondary Namenode HTTP address, which in fact is set to null in hdfs-site.xml (I do not know why):
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>null</value>
</property>
I've tried to set that parameter's value to the appropriate one, but nothing works:
By manually editing the hdfs-site.xml files and running hdfs namenode, but nothing happens.
By manually editing the hdfs-site.xml files and starting the whole HDFS from Ambari, but nothing happens. What's more, the dfs.namenode.secondary.http-address parameter is set back to null!
Through Ambari UI > HDFS service > config tab > hdfs-site.xml list > add new property... the problem is that dfs.namenode.secondary.http-address is not listed, and the UI does not allow me to add it because it says it already exists! :)
I've tried to add the value in /usr/lib/ambari-server/web/data/configuration/hdfs-site.json, thinking this could be the place where Ambari stores the values that are shown in the UI, but no success.
I've also noted that a site-XXXX.pp file is created under /var/lib/ambari-agent/data/ each time the HDFS service is restarted from the Ambari UI, and each one of these files contains:
[root@int-iot-hadoop-fe-02 ~]# cat /var/lib/ambari-agent/data/site-3228.pp | grep dfs.namenode.secondary.http-address
"dfs.namenode.secondary.http-address" => 'null',
I think another candidate file for configuring this property could be /var/lib/ambari-agent/puppet/modules/hdp-hadoop/manifests/params.pp. There is a ### hdfs-site section, but I'm not able to figure out the name of the puppet variable associated with the dfs.namenode.secondary.http-address property.
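For reference, this is a quick way to confirm what value actually lands in the generated hdfs-site.xml (a minimal sketch in Python; it assumes the client configs live in the usual /etc/hadoop/conf location):
import xml.etree.ElementTree as ET

# Minimal sketch (assumption: client configs live in /etc/hadoop/conf)
root = ET.parse("/etc/hadoop/conf/hdfs-site.xml").getroot()
for prop in root.findall("property"):
    if prop.findtext("name") == "dfs.namenode.secondary.http-address":
        print(prop.findtext("value"))  # prints 'null' here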
Any ideas? Thanks!

I have a workaround to make it work under the Ambari environment:
On the Ambari node, modify:
/usr/lib/ambari-server/web/javascripts/app.js
/usr/lib/ambari-server/web/javascripts/app.js.map
changing from:
{
"name": "dfs.namenode.secondary.http-address",
"templateName": ["snamenode_host"],
"foreignKey": null,
"value": "<templateName[0]>:50090",
"filename": "hdfs-site.xml"
},
to the specific value for your secondary namenode and not the template one:
{
"name": "dfs.namenode.secondary.http-address",
"templateName": ["snamenode_host"],
"foreignKey": null,
"value": "my.secondary.namenode.domain:50090",
"filename": "hdfs-site.xml"
},
Rename /usr/lib/ambari-server/web/javascripts/app.js.gz to /usr/lib/ambari-server/web/javascripts/app.js.gz.old.
Gzip app.js so a new app.js.gz is generated in the same directory (a scripted version of these two steps is sketched below).
Refresh your Ambari web UI and force an HDFS restart; this will regenerate the appropriate /etc/hadoop/conf/hdfs-site.xml. If it does not, you could add a new property in the Ambari web UI and then delete it, in order to force the changes when you press the save button.
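A minimal Python sketch of the rename/gzip steps above, using the paths from this answer (the stdlib gzip module is used so it also works where the gzip binary lacks the -k flag):
import gzip
import os
import shutil

# Swap the old app.js.gz aside and regenerate it from the edited app.js
js_dir = "/usr/lib/ambari-server/web/javascripts"
src = os.path.join(js_dir, "app.js")
dst = os.path.join(js_dir, "app.js.gz")

os.rename(dst, dst + ".old")               # keep the original as app.js.gz.old
with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
    shutil.copyfileobj(fin, fout)          # write the new app.js.gz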
Hope this helps.
--mLG

Partially fixed: it is necessary to stop all the HDFS services (Journal Nodes, Namenodes and Datanodes) before editing the hdfs-site.xml file. Then, of course, Ambari's start button cannot be used because the configuration would be overwritten, so it is necessary to restart all the services manually. This is not the definitive solution, since it is desirable that these configuration changes could be made from the Ambari UI...

Related

Ambari User & Group Management for Custom Ambari Services

I have been working with Custom Ambari Services for quite some time. I have been able to install several different custom components. I have created several management packs and consider myself very experienced in making third party services work in Ambari.
Whenever I install a custom service I get a user KeyError, for example Elasticsearch:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 38, in <module>
BeforeAnyHook().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 31, in hook
setup_users()
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/shared_initialization.py", line 50, in setup_users
groups = params.user_to_groups_dict[user],
KeyError: u'elasticsearch'
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-15.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-15.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
A known workaround is to execute a Python command to turn off user/group management:
python /var/lib/ambari-server/resources/scripts/configs.py -u admin -p admin -n [CLUSTER_NAME] -l [CLUSTER_FQDN] -t 8080 -a set -c cluster-env -k ignore_groupsusers_create -v true
However, this leaves the cluster in an undesirable state if you want to install native services again. If I execute the Python command to turn user/group management back on, the next native service install will again fail with the same third-party user KeyError.
Is there a database table that contains the list or key value object of users and groups that ambari manages? Satisfying the original error seems like the only turnkey solution.
I have tried to locate the key/value object myself, I have tried creating the users and groups manually, and I have even tried modifying the agent/server code that executes the install. I will keep trying, but I thought this would make a good first post for SO.
I was stuck with the same error for a few hours; here is the result of the investigation.
First of all, we need to know that Ambari has one main group for all services in the stack.
Secondly, the creation of the user is quite hidden: at first glance you will never guess when and where the user gets created and where the parameters come from.
And the last question is: how do we set up params.user_to_groups_dict[user]?
The 'main group' is set in <stack_name>/<stack_version>/configuration/cluster-env.xml, for me it was HDP/3.0/configuration/cluster-env.xml:
<property>
<name>user_group</name>
<display-name>Hadoop Group</display-name>
<value>hadoop</value>
<property-type>GROUP</property-type>
<description>Hadoop user group.</description>
<value-attributes>
<type>user</type>
<overridable>false</overridable>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
That parameter is used everywhere in the services to claim the group; for example, ZooKeeper has an env.xml like this:
<property>
<name>zk_user</name>
<display-name>ZooKeeper User</display-name>
<value>sdp-zookeeper</value>
<property-type>USER</property-type>
<description>ZooKeeper User.</description>
<value-attributes>
<type>user</type>
<overridable>false</overridable>
<user-groups>
<property>
<type>cluster-env</type>
<name>user_group</name>
</property>
</user-groups>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
And there is the magic in value-attributes: user-groups with one property that links to the user_group parameter in cluster-env. This is the connection we are looking for.
The answer is to set up your service's user parameter like the ZooKeeper user above.
The wizard searches the stack and finds the right users/groups to manage for the services you have chosen.
The map behind params.user_to_groups_dict is created at runtime by the cluster wizard and is available in /var/lib/ambari-agent/data/command-xy.json (see the sketch after the excerpt below):
"clusterLevelParams": {
"stack_version": "3.0",
"not_managed_hdfs_path_list": "[\"/tmp\"]",
"hooks_folder": "stack-hooks",
"stack_name": "HDP",
"group_list": "[\"sdp-hadoop\",\"users\"]",
"user_groups": "{\"httpfs\":[\"hadoop\"],\"ambari-qa\":[\"hadoop\",\"users\"],\"hdfs\":[\"hadoop\"],\"zookeeper\":[\"hadoop\"]}",
"cluster_name": "test",
"dfs_type": "HDFS",
"user_list": "[\"httpfs\",\"hdfs\",\"ambari-qa\",\"sdp-zookeeper\"]"
},
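For illustration, this is roughly how that map can be read back from an agent command file (a minimal sketch; the command-15.json path is taken from the traceback above, and note that user_groups is itself a JSON-encoded string inside the JSON document):
import json

# Rebuild the user -> groups mapping from the agent command file
with open("/var/lib/ambari-agent/data/command-15.json") as f:
    cmd = json.load(f)

params = cmd["clusterLevelParams"]
user_to_groups_dict = json.loads(params["user_groups"])    # nested JSON string

print(user_to_groups_dict.get("elasticsearch"))            # None here, hence the KeyError in setup_users()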

Hive Browser Throwing Error

I am trying to run a basic query in the Hive editor in the Hue browser, but it returns the following error, whereas my Hive CLI works fine and is able to execute queries. Could someone help me?
Fetching results ran into the following error(s):
Bad status for request TFetchResultsReq(fetchType=1,
operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='r\t\x80\xac\x1a\xa0K\xf8\xa4\xa0\x85?\x03!\x88\xa9',
guid='\x852\x0c\x87b\x7fJ\xe2\x9f\xee\x00\xc9\xeeo\x06\xbc')),
orientation=4, maxRows=-1):
TFetchResultsResp(status=TStatus(errorCode=0, errorMessage="Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]",
sqlState=None,
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]:24:23",
'org.apache.hive.service.cli.operation.OperationManager:getOperationLogRowSet:OperationManager.java:229',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:687',
'sun.reflect.GeneratedMethodAccessor14:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:606',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:415',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy19:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745'], statusCode=3), results=None,
hasMoreRows=None)
This error could be due either to HiveServer2 not running or to Hue not having access to hive_conf_dir.
Check whether HiveServer2 has been started and is running; it uses port 10000 by default (a socket check is sketched at the end of this answer).
netstat -ntpl | grep 10000
If it is not running, start HiveServer2:
$HIVE_HOME/bin/hiveserver2
Also check the Hue configuration file hue.ini. The hive_conf_dir property must be set under the [beeswax] section; if it is not set, add this property under [beeswax]:
hive_conf_dir=$HIVE_HOME/conf
Restart supervisor after making these changes.
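If netstat is not handy, a small socket probe gives the same answer (a minimal sketch assuming HiveServer2 on localhost and the default port 10000):
import socket

# Returns True if something is listening on the HiveServer2 port
def hiveserver2_listening(host="localhost", port=10000, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(hiveserver2_listening())  # False -> start it with $HIVE_HOME/bin/hiveserver2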

PXF JSON plugin error

I am using HDP 2.4 and HAWQ 2.0.
I wanted to read JSON data kept in an HDFS path into a HAWQ external table.
I followed the steps below to add the new JSON plugin into PXF and read the data.
Download the plugin "json-pxf-ext-3.0.1.0-1.jar" from https://bintray.com/big-data/maven/pxf-plugins/view#
Copy the plugin into the path /usr/lib/pxf.
Create the external table:
CREATE EXTERNAL TABLE ext_json_mytestfile ( created_at TEXT,
id_str TEXT, text TEXT, source TEXT, "user.id" INTEGER,
"user.location" TEXT,
"coordinates.type" TEXT,
"coordinates.coordinates[0]" DOUBLE PRECISION,
"coordinates.coordinates[1]" DOUBLE PRECISION)
LOCATION ('pxf://localhost:51200/tmp/hawq_test.json'
'?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
'&ACCESSOR=org.apache.hawq.pxf.plugins.json.JsonAccessor'
'&RESOLVER=org.apache.hawq.pxf.plugins.json.JsonResolver'
'&ANALYZER=org.apache.hawq.pxf.plugins.hdfs.HdfsAnalyzer')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import')
LOG ERRORS INTO err_json_mytestfile SEGMENT REJECT LIMIT 10 ROWS;
When I execute the above DDL, the table is created successfully. After that I try to execute a select query:
select * from ext_json_mytestfile;
But I get the error:
ERROR: remote component error (500) from 'localhost:51200': type Exception report message java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor (libchurl.c:878) (seg4 sandbox.hortonworks.com:40000 pid=117710) (dispatcher.c:1801)
DETAIL: External table ext_json_mytestfile
Any help would be much appreciated.
It seems that the referenced jar file has the old package name com.pivotal.*. The JSON PXF extension is still incubating; the jar pxf-json-3.0.0.jar is built for JDK 1.7 (the single-node HDB VM uses JDK 1.7) and has been uploaded to Dropbox:
https://www.dropbox.com/s/9ljnv7jiin866mp/pxf-json-3.0.0.jar?dl=0
Echoing the details of the above comments so that the steps are performed correctly and the PXF service recognizes the jar file. The steps below assume that HAWQ/HDB is managed by Ambari; if not, the manual steps mentioned in the previous updates should work. A quick jar sanity check is sketched after the steps.
Copy the pxf-json-3.0.0.jar to /usr/lib/pxf/ of all your HAWQ nodes (master and segments).
In Ambari-managed PXF, add the line below by going through Ambari Admin -> PXF -> Advanced pxf-public-classpath:
/usr/lib/pxf/pxf-json-3.0.0.jar
In Ambari-managed PXF, add this snippet at the end of your PXF profiles XML by going through Ambari Admin -> PXF -> Advanced pxf-profiles:
<profile>
<name>Json</name>
<description>JSON Accessor</description>
<plugins>
<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.apache.hawq.pxf.plugins.json.JsonAccessor</accessor>
<resolver>org.apache.hawq.pxf.plugins.json.JsonResolver</resolver>
</plugins>
</profile>
Restart PXF service via Ambari
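As a quick sanity check before the restart, you can confirm that the class named in the ClassNotFoundException is actually inside the jar you deployed (a minimal sketch; the jar path is the one used in the steps above):
import zipfile

# Verify the accessor class from the error message is present in the deployed jar
jar_path = "/usr/lib/pxf/pxf-json-3.0.0.jar"
wanted = "org/apache/hawq/pxf/plugins/json/JsonAccessor.class"

with zipfile.ZipFile(jar_path) as jar:
    print(wanted in jar.namelist())   # False usually means an old com.pivotal.* build of the plugin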
Did you add the jar file location to /etc/pxf/conf/pxf-public.classpath?
Did you try:
copying PXF JSON jar file to /usr/lib/pxf
updating /etc/pxf/conf/pxf-profiles.xml to include the Json plug-in profile if not already present
(per comment above) updating the /etc/pxf/conf/pxf-public.classpath
restarting the PXF service either via Ambari or command line (sudo service pxf-service restart)
You likely didn't add the JSON jar to the classpath.
The CREATE EXTERNAL TABLE DDL will always succeed, as it is just a definition.
Only when you run queries will HAWQ check the runtime jar dependencies.
Yes, the jar json-pxf-ext-3.0.1.0-1.jar from https://bintray.com/big-data/maven/pxf-plugins/view# has the old package name com.pivotal.*. The previous update has been edited with details on downloading the correct jar from Dropbox.

Spring-xd strange too many open files error

I upgraded from spring-xd 1.2.1 to 1.3.0 and have both under /opt on my system. After starting xd in single-node mode (but configured to use ZooKeeper), I tried to create another stream (e.g. "time | log"), and spring-xd throws the following exception:
java.io.FileNotFoundException: /opt/spring-xd-1.2.1.RELEASE/xd/config/modules/modules.yml (Too many open files)
I changed ulimit -n to 60000, but it didn't solve the problem. The strange thing is why it still points to spring-xd-1.2.1.RELEASE: I have started both xd-singlenode and xd-shell under /opt/spring-xd-1.3.1.RELEASE.
EDIT: adding the xd-singlenode running process output just to show it's pointing to the new release:
/usr/java/default/bin/java -Dspring.application.name=admin
-Dlogging.config=file:/opt/spring-xd-1.3.0.RELEASE/xd/config//
/xd-singlenode-logback.groovy -Dxd.home=/opt/spring-xd-1.3.0.RELEASE/xd
-Dspring.config.location=file:/opt/spring-xd-1.3.0.RELEASE/xd/config//
-Dxd.config.home=file:/opt
/spring-xd-1.3.0.RELEASE/xd/config//
-Dspring.config.name=servers,application
-Dxd.module.config.location=file:/opt/spring-xd-1.3.0.RELEASE/xd/config//modules/
-Dxd.module.config.name=modules -classpath
/opt/spring-xd-1.3.0.RELEASE/xd/modules/processor/scripts:/opt/spring-xd
-1.3.0.RELEASE/xd/config:/opt/spring-xd-1.3.0.RELEASE/xd/lib/activation-
...
Have you updated your environment variables? Specifically XD_CONFIG_LOCATION, based on the error shown above.
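A quick way to spot a stale setting is to dump anything in the environment that still references the old install (a minimal sketch; XD_CONFIG_LOCATION is the variable mentioned above):
import os

# Print XD_* variables and anything still pointing at the 1.2.1 install tree
for name, value in sorted(os.environ.items()):
    if name.startswith("XD_") or "spring-xd-1.2.1" in value:
        print(name, "=", value)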

Oozie Workflow and Coordinator

I have 2 properties files, one for the workflow and one for the coordinator:
./job.properties and ./coordinator/job.properties
The 2 files are identical, except that in the coordinator a few additional variables are set. Below are those variables:
coordstartTime=2013-04-08T18:40Z
coordendTime=2020-04-08T18:40Z
coordTimeZone=GMT
oozie.coord.application.path=${workflowRoot}/coordinator
wfPath=${workflowRoot}/workflow-master.xml
Everything is fine when I run the workflow, but I am getting an error when I run the coordinator.
The error:
Error: E0301 : E0301: Invalid resource [filename]
That filename exists, and when I do hadoop fs -ls [filename] it is listed.
What am I doing wrong here?
thanks
The problem was that both oozie.wf.application.path and oozie.coord.application.path existed in the coordinator properties file.
I removed oozie.wf.application.path and the coordinator worked.
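A small check like the following catches that misconfiguration (a minimal sketch; the coordinator/job.properties path is the one from the question):
# Flag a coordinator properties file that sets both application paths
props = {}
with open("coordinator/job.properties") as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()

if "oozie.wf.application.path" in props and "oozie.coord.application.path" in props:
    print("Remove oozie.wf.application.path; a coordinator submission only needs oozie.coord.application.path")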
thanks
