Ambari User & Group Management for Custom Ambari Services - hortonworks-data-platform

I have been working with Custom Ambari Services for quite some time. I have been able to install several different custom components. I have created several management packs and consider myself very experienced in making third party services work in Ambari.
Whenever I install a custom service I get a user KeyError, for example Elasticsearch:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 38, in <module>
BeforeAnyHook().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 31, in hook
setup_users()
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/shared_initialization.py", line 50, in setup_users
groups = params.user_to_groups_dict[user],
KeyError: u'elasticsearch'
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-15.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-15.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
A known workaround is to run a Python command that turns off user/group management:
python /var/lib/ambari-server/resources/scripts/configs.py -u admin -p admin -n [CLUSTER_NAME] -l [CLUSTER_FQDN] -t 8080 -a set -c cluster-env -k ignore_groupsusers_create -v true
However, this leaves the cluster in an undesirable state if you want to install native services again. If I run the command again to turn user/group management back on, the next native service install will again fail with the third-party user KeyError.
Is there a database table that contains the list or key/value mapping of users and groups that Ambari manages? Satisfying the original error seems like the only turnkey solution.
I have tried to locate the key/value mapping myself, I have tried creating the users and groups manually, and I have even tried modifying the agent/server code that executes the install. I will keep trying, but I thought this would make a good first post for SO.

I was stuck on the same error for a few hours; here is the result of my investigation.
First of all, Ambari has one main group for all services in the stack.
Secondly, the user creation is quite well hidden: at first glance you will never guess when and where the users get created, or where the parameters come from.
And lastly, the question: how do we populate params.user_to_groups_dict[user]?
The 'main group' is set in <stack_name>/<stack_version>/configuration/cluster-env.xml, for me it was HDP/3.0/configuration/cluster-env.xml:
<property>
  <name>user_group</name>
  <display-name>Hadoop Group</display-name>
  <value>hadoop</value>
  <property-type>GROUP</property-type>
  <description>Hadoop user group.</description>
  <value-attributes>
    <type>user</type>
    <overridable>false</overridable>
  </value-attributes>
  <on-ambari-upgrade add="true"/>
</property>
That parameter is used throughout the services to claim the group; for example, ZooKeeper's env.xml contains:
<property>
  <name>zk_user</name>
  <display-name>ZooKeeper User</display-name>
  <value>sdp-zookeeper</value>
  <property-type>USER</property-type>
  <description>ZooKeeper User.</description>
  <value-attributes>
    <type>user</type>
    <overridable>false</overridable>
    <user-groups>
      <property>
        <type>cluster-env</type>
        <name>user_group</name>
      </property>
    </user-groups>
  </value-attributes>
  <on-ambari-upgrade add="true"/>
</property>
And here is the magic in value-attributes: user-groups contains a single property that links to the user_group parameter in cluster-env. This is the connection we are looking for.
The answer, then, is to define your service's user parameter the same way the ZooKeeper user is defined above.
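For a custom Elasticsearch service that means the stack definition needs a USER-typed property whose value-attributes link back to cluster-env. A minimal sketch of what such a property could look like in the service's configuration/elasticsearch-env.xml (the property name, file name, and default value are illustrative, not taken from any particular management pack):
<property>
  <name>elasticsearch_user</name>
  <display-name>Elasticsearch User</display-name>
  <value>elasticsearch</value>
  <property-type>USER</property-type>
  <description>User to run Elasticsearch as.</description>
  <value-attributes>
    <type>user</type>
    <overridable>false</overridable>
    <user-groups>
      <property>
        <type>cluster-env</type>
        <name>user_group</name>
      </property>
    </user-groups>
  </value-attributes>
  <on-ambari-upgrade add="true"/>
</property>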
The wizard searches the stack and finds the right users/groups to manage for the services you have chosen.
The map behind params.user_to_groups_dict is created at runtime by the cluster wizard and is available in /var/lib/ambari-agent/data/command-xy.json:
"clusterLevelParams": {
"stack_version": "3.0",
"not_managed_hdfs_path_list": "[\"/tmp\"]",
"hooks_folder": "stack-hooks",
"stack_name": "HDP",
"group_list": "[\"sdp-hadoop\",\"users\"]",
"user_groups": "{\"httpfs\":[\"hadoop\"],\"ambari-qa\":[\"hadoop\",\"users\"],\"hdfs\":[\"hadoop\"],\"zookeeper\":[\"hadoop\"]}",
"cluster_name": "test",
"dfs_type": "HDFS",
"user_list": "[\"httpfs\",\"hdfs\",\"ambari-qa\",\"sdp-zookeeper\"]"
},
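To see what the agent actually received for a failing install, you can inspect that JSON directly. A minimal Python sketch (command-15.json matches the command file from the traceback above; adjust the name to whichever command-xy.json your install produced):
import json

# Load the command file the Ambari agent wrote for the failing install.
with open("/var/lib/ambari-agent/data/command-15.json") as f:
    command = json.load(f)

params = command["clusterLevelParams"]
# Both fields are JSON strings embedded inside the command JSON.
user_groups = json.loads(params["user_groups"])  # e.g. {"hdfs": ["hadoop"], ...}
user_list = json.loads(params["user_list"])

print("managed users:", user_list)
print("user -> groups:", user_groups)
# If your service's user (e.g. "elasticsearch") is missing here, setup_users()
# raises exactly the KeyError from the question, because the stack definition
# never declared it as a USER property linked to cluster-env.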

Related

How to read TimeoutStartSec value in systemD configuration from Application via Dbus interfaces

In my service configuration TimeoutStartSec == 100s.
According to the man page, my application needs to notify systemd with sd_notify(READY=1) within 100s. If it does not, the service is put into the failed state.
https://www.freedesktop.org/software/systemd/man/systemd.service.html
But what if I want to do something (e.g. just print a log line saying that startup did not finish in time) before my service is actually set to the failed state? Is there any chance to do that?
My idea is to create a timer with the same value as TimeoutStartSec == xx s, so that I can do something before the timer expires.
But TimeoutStartSec == xx is dynamically configured by the user in my project, so I would expect some D-Bus interface that lets me read TimeoutStartSec from my application.
I checked
https://www.freedesktop.org/wiki/Software/systemd/dbus/
but did not find a corresponding property.
I am using systemd on Linux, so I am free to use the systemd D-Bus interfaces.
I found a solution.
systemd actually provides that info:
dbus-send --system --dest=org.freedesktop.systemd1 --print-reply /org/freedesktop/systemd1/unit/ServiceName_2eservice \
org.freedesktop.DBus.Properties.Get string:org.freedesktop.systemd1.Service string:TimeoutStartUSec
Note: the service name has to be adapted to get the exact object path: ServiceName.service becomes ServiceName_2eservice.
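The same property can also be read programmatically. Here is a minimal Python sketch using the dbus-python package (assuming it is installed), mirroring the dbus-send call above; replace ServiceName_2eservice with your own escaped unit name:
import dbus

# Connect to the system bus, where systemd (org.freedesktop.systemd1) lives.
bus = dbus.SystemBus()

# The unit's object path uses the escaped unit name:
# ServiceName.service becomes ServiceName_2eservice.
unit = bus.get_object("org.freedesktop.systemd1",
                      "/org/freedesktop/systemd1/unit/ServiceName_2eservice")

props = dbus.Interface(unit, "org.freedesktop.DBus.Properties")
timeout_usec = props.Get("org.freedesktop.systemd1.Service", "TimeoutStartUSec")

# The value is reported in microseconds.
print("TimeoutStartSec = %d s" % (int(timeout_usec) // 1000000))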

sctp_core_destroy(): SCTP API not initialized in kamailio start

Hi, I have installed Kamailio. It starts the first time, but when I stop and start it again it gives sctp_core_destroy(): SCTP API not initialized. I have already installed the SCTP module.
yyerror_at(): parse error in config file /etc/kamailio/kamailio.cfg
load_module(): could not find module <db_mysql> in </usr/lib/kamailio/modules>
[sctp_core.c:53]: sctp_core_destroy(): SCTP API not initialized
From the log it is obvious that you have successfully compiled & installed the SCTP module; however, it could NOT be initialized.
Note that this error is more often than not a result of other errors in your cfg file.
A few tips:
Run kamailio -c to be sure there is NO error in your cfg.
Found an error? Use this command to monitor what the exact issue is; run it in one terminal (terminal 1): tail -fn200 /var/log/syslog
In a second terminal, try restarting your Kamailio server: sudo service kamailio restart
Go back to terminal 1 and look out for the first line with CRITICAL output, like the one below: CRITICAL: <core> [core/cfg.y:3413]: yyerror_at(): parse error in config file /usr/local/etc/kamailio/kamailio.cfg, line 366, column 41: syntax error
Line 366 is most likely the issue, so open that file at that line (366) to fix the problem:
sudo nano +366 /usr/local/etc/kamailio/kamailio.cfg
Let me know if it helps

libnetwork: Error: unknown command "/var/run/docker/netns/582bd184e561" for "some_app"

I am trying to set up a network in the container (using Docker's libnetwork and libcontainer), but I keep running into this issue. As far as I can tell, it's looking into some_app to get some sandbox information?
INFO[3808] No non-localhost DNS nameservers are left in resolv.conf. Using default external servers : [nameserver 8.8.8.8 nameserver 8.8.4.4]
INFO[3808] IPv6 enabled; Adding default IPv6 external servers : [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]
Error: unknown command "/var/run/docker/netns/582bd184e561" for "some_app"
Run 'some_app --help' for usage.
ERRO[3808] Resolver Setup/Start failed for container 6b81802576bd4f16aa117061f81b5c3e, "setup not done yet"
ERRO[3808] failed to add interface vethef0a693 to sandbox: failed in prefunc: failed to set namespace on link "vethef0a693": invalid argument
ERRO[3808] failed to add interface vethef0a693 to sandbox: failed in prefunc: failed to set namespace on link "vethef0a693": invalid argument
I was wondering if anyone could help me make sense of this and perhaps prevent it. Are these two separate errors?
Thank you
Here is the library I am trying to use
It took me a while to figure this out, but here goes:
Just like in Docker, libnetwork creates a veth interface pair. It then moves one end of the veth pair into the container namespace. During this process libnetwork tries to execute commands registered at runtime on the current instance of the binary (some_app in this case).
These commands do not exist on the external interface of some_app however. They are injected later using a library called reexec. For this to work, reexec needs to be initialized like this:
// import "github.com/docker/docker/pkg/reexec"
// Call this at the very start of main(), before any other setup:
if reexec.Init() {
	return
}
Also note that according to this thread libnetwork is currently not supported for applications outside of Docker.
NB: I discovered this by reading the source code, so I might be wrong but my issue went away after this.

Chef::Exceptions::ValidationFailed error during EncryptedDataBagItem.load due to supposed regex mismatch

I'm bootstrapping a node with a cookbook that worked fine with chef-client as of November. Unfortunately, the following code:
45: #Configure PostgreSQL cluster -- create pertinent databases, users, and groups based on uploaded, decrypted shell here-document.
47>> here_doc_name = Chef::EncryptedDataBagItem.load("database_configs", "tlcworx_#{node["tlcworx_db"]["environment"]}")["filename"]
48: here_doc_content = Chef::EncryptedDataBagItem.load("database_configs", "tlcworx_#{node["tlcworx_db"]["environment"]}")["content"]
49:
50: open("#{node["tlcworx_db"]["tmp_dir"]}/#{here_doc_name}", 'w') { |f| f.puts here_doc_content }
has produced the following error, which halts the bootstrap:
Chef::Exceptions::ValidationFailed: Option data_bag's value {"encrypted_data"=>"PffgOkpIpdoEJO8khrUOUQwqv2/vqrtzOf1U/z/a5xD4KqSH2/CkD1zHndzW\nwJL1\n", "iv"=>"d/kiiPRQWQoKBTU5WF8NPw==\n", "version"=>1, "cipher"=>"aes-256-cbc"} does not match regular expression /^[\-[:alnum:]_]+$/
Obviously, I'm supplying the same --secret-file as I did back then via the knife CLI argument. Running knife data bag edit database_configs tlcworx_uat --secret-file /path/to/secret.pem decrypts the data bag content appropriately and doesn't error out. I've never seen this error before, and looking at other instances of it I see they involve direct CLI operations in which the data bag in question is not named, which is not the case here. Again, this only happens upon bootstrap, when a server's chef-client is communicating with the remote chef-server.
I was hoping someone could provide some insight as to what could be causing the error. Chef client version is 12.7.2.
Thanks in advance for any help on the matter!
For the future: we're pretty sure this is a side effect of a bug in DataBagItem.to_hash mutating its data. It will be fixed in the next release of Chef.

Ambari is not able to start the Namenode

I have a problem with my Ambari server: it is not able to start the NameNode. I'm using HDP 2.0.6 and Ambari 1.4.1. It is worth mentioning that this only happens once I've enabled Kerberos security; when it is disabled there is no error.
The error is:
2015-02-04 16:01:48,680 ERROR namenode.EditLogInputStream (EditLogFileInputStream.java:nextOpImpl(173)) - caught exception initializing http://int-iot-hadoop-fe-02.novalocal:8480/getJournal?jid=integration&segmentTxId=1&storageInfo=-47%3A1493795199%3A0%3ACID-a5152e6c-64ab-4978-9f1c-e4613a09454d
org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Fetch of http://int-iot-hadoop-fe-02.novalocal:8480/getJournal?jid=integration&segmentTxId=1&storageInfo=-47%3A1493795199%3A0%3ACID-a5152e6c-64ab-4978-9f1c-e4613a09454d failed with status code 500
Response message:
getedit failed. java.lang.IllegalArgumentException: Does not contain a valid host:port authority: null at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:211) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:210) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.isValidRequestor(GetJournalEditServlet.java:93) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.checkRequestorOrSendError(GetJournalEditServlet.java:128) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.doGet(GetJournalEditServlet.java:174) at
...
It seems the problem is about retrieving the Secondary NameNode HTTP address, which in fact is set to null in hdfs-site.xml (I do not know why):
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>null</value>
</property>
I've tried to set that parameter's value to the appropriate one, but nothing works:
By manually editing the hdfs-site.xml files and running hdfs namenode, but nothing changes.
By manually editing the hdfs-site.xml files and starting the whole HDFS from Ambari, but nothing changes. Worse, the dfs.namenode.secondary.http-address parameter is set back to null!
Through the Ambari UI > HDFS service > Configs tab > hdfs-site.xml list > add new property... the problem is that dfs.namenode.secondary.http-address is not listed, yet the UI does not allow me to add it because it says it already exists! :)
I've tried to add the value in /usr/lib/ambari-server/web/data/configuration/hdfs-site.json, thinking this could be the place where Ambari stores the values shown in the UI, but with no success.
I've also noticed that a site-XXXX.pp file is created under /var/lib/ambari-agent/data/ each time the HDFS service is restarted from the Ambari UI, and I've found that each of these files has:
[root@int-iot-hadoop-fe-02 ~]# cat /var/lib/ambari-agent/data/site-3228.pp | grep dfs.namenode.secondary.http-address
"dfs.namenode.secondary.http-address" => 'null',
I think another candidate file for configuring this property could be /var/lib/ambari-agent/puppet/modules/hdp-hadoop/manifests/params.pp. There is a ### hdfs-site section, but I'm not able to figure out the name of the Puppet variable associated with the dfs.namenode.secondary.http-address property.
Any ideas? Thanks!
I have a workaround to make it work in an Ambari environment:
On the Ambari node, modify:
/usr/lib/ambari-server/web/javascripts/app.js
/usr/lib/ambari-server/web/javascripts/app.js.map
changing from:
{
  "name": "dfs.namenode.secondary.http-address",
  "templateName": ["snamenode_host"],
  "foreignKey": null,
  "value": "<templateName[0]>:50090",
  "filename": "hdfs-site.xml"
},
to the specific value for your secondary namenode and not the template one:
{
  "name": "dfs.namenode.secondary.http-address",
  "templateName": ["snamenode_host"],
  "foreignKey": null,
  "value": "my.secondary.namenode.domain:50090",
  "filename": "hdfs-site.xml"
},
Rename /usr/lib/ambari-server/web/javascripts/app.js.gz to /usr/lib/ambari-server/web/javascripts/app.js.gz.old.
Gzip app.js so that a new app.js.gz is generated in the same directory.
Refresh the Ambari web UI and force an HDFS restart; this will regenerate the appropriate /etc/hadoop/conf/hdfs-site.xml. If it does not, you could add a new property in the Ambari web UI and then delete it, in order to force the changes when you press the save button.
Hope this helps.
--mLG
Partially fixed: it is necessary to stop all the HDFS services (JournalNodes, NameNodes and DataNodes) before editing the hdfs-site.xml file. Then, of course, the Ambari "start button" cannot be used, because the configuration would be overwritten... thus it is necessary to restart all the services manually. This is not the definitive solution, since it would be desirable for these configuration changes to be possible from the Ambari UI...
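As a complementary check, the value Ambari itself considers "desired" for hdfs-site can be read over its REST API rather than from files on disk. A minimal Python sketch, assuming admin/admin credentials, port 8080 and a cluster named test (all placeholders):
import requests

AMBARI = "http://ambari-host:8080/api/v1"
CLUSTER = "test"
AUTH = ("admin", "admin")

# Find which tag of hdfs-site is currently the desired configuration.
info = requests.get("%s/clusters/%s?fields=Clusters/desired_configs" % (AMBARI, CLUSTER),
                    auth=AUTH).json()
tag = info["Clusters"]["desired_configs"]["hdfs-site"]["tag"]

# Fetch that configuration version and print the property in question.
cfg = requests.get("%s/clusters/%s/configurations?type=hdfs-site&tag=%s" % (AMBARI, CLUSTER, tag),
                   auth=AUTH).json()
print(cfg["items"][0]["properties"].get("dfs.namenode.secondary.http-address"))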
