Install DataNode by Ambari - hadoop

I have
OS Red Hat Enterprise Linux Server release 7.4 (Maipo)
Ambari Version 2.5.1.0
HDP 2.6
After the component deployment finished, 2 DataNodes could not start.
Trying to start them returned this error:
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 127. -bash: /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh: No such file or directory
I tried to delete the component and do a fresh install through Ambari.
The installation completed without errors:
2018-02-27 20:47:31,550 - Execute['ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.6 | tail -1`'] {'only_if': 'ls -d /usr/hdp/2.6*'}
2018-02-27 20:47:31,554 - Skipping Execute['ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.6 | tail -1`'] due to only_if
2018-02-27 20:47:31,554 - FS Type:
2018-02-27 20:47:31,554 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {u'final': {u'fs.defaultFS': u'true'}}, 'owner': 'hdfs', 'only_if': 'ls /usr/hdp/current/hadoop-client/conf', 'configurations': ...}
2018-02-27 20:47:31,568 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2018-02-27 20:47:31,569 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2018-02-27 20:47:31,583 - Could not load 'version' from /var/lib/ambari-agent/data/structured-out-3374.json
Command completed successfully!
BUT starting the DataNode again shows the same error.
I checked the folder /usr/hdp/current/hadoop-client/
and the new files, for example sbin/hadoop-daemon.sh, did not appear there.
How can I redeploy the DataNode component through Ambari?

I'd guess the issue is caused by wrong symlinks under /usr/hdp. You may even try to fix them manually, as the structure is trivial enough; though the issue does not sound like a common one after a plain stack deployment.
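A minimal sketch of what checking and repairing the links could look like (the paths are assumptions based on a standard HDP 2.6 layout, and the hdp-select call just mirrors the one from the install log above):
# Which HDP versions are installed, and where does the 'current' link point?
hdp-select versions
ls -l /usr/hdp/current/hadoop-client
# Does the missing script exist in the versioned directory at all?
ls /usr/hdp/2.6*/hadoop/sbin/hadoop-daemon.sh
# Re-point all 'current' symlinks to the installed 2.6 version
sudo hdp-select set all $(hdp-select versions | grep ^2.6 | tail -1)
If hadoop-daemon.sh is missing from the versioned directory as well, the hadoop packages themselves never got installed and fixing the links alone will not help.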
Are you running Ambari with a non-root/custom user? Maybe Ambari does not have sufficient permissions? See https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-security/content/how_to_configure_ambari_server_for_non-root.html
Ambari Version 2.5.1.0 is considerably outdated, so it would make sense to update Ambari and see whether it helps.
Also, if you want to wipe out everything, see https://github.com/hortonworks/HDP-Public-Utilities/blob/master/Installation/cleanup_script.sh
Also, it may be more productive to ask Ambari-related questions here https://community.hortonworks.com/

Related

Ansible on SLES: zypper plugin not able to install PostgreSQL 14

I am trying my hand at Ansible after having had a very nice training in it. Currently, my task is to create a playbook that sets up a PostgreSQL cluster (with Patroni and etcd).
However, while installing PostgreSQL should be a pretty easy task, doing it using the zypper plugin throws an error. First, the part of the playbook that should install PostgreSQL:
- name: Installation PostgreSQL 14 Latest ohne Recommendations
  become: true
  zypper:
    disable_recommends: true
    name:
      postgresql14-server
      postgresql14-contrib
      postgresql14-devel
    update_cache: true
  when: ansible_host in pgservers
The error message given is this:
fatal: [goeccdb22l]: FAILED! => {"changed": false, "cmd": ["/usr/bin/zypper", "--quiet", "--non-interactive", "--xmlout", "install", "--type", "package", "--auto-agree-with-licenses", "--no-recommends", "--", "+postgresql14-server postgresql14-contrib postgresql14-devel"], "msg": "No provider of '+postgresql14-server postgresql14-contrib postgresql14-devel' found.", "rc": 104, "stderr": "", "stderr_lines": [], "stdout": "<?xml version='1.0'?>\n<stream>\n<message type=\"error\">No provider of &apos;+postgresql14-server postgresql14-contrib postgresql14-devel&apos; found.</message>\n</stream>\n", "stdout_lines": ["<?xml version='1.0'?>", "<stream>", "<message type=\"error\">No provider of &apos;+postgresql14-server postgresql14-contrib postgresql14-devel&apos; found.</message>", "</stream>"]}
Let's extract the error message:
"msg": "No provider of '+postgresql14-server postgresql14-contrib postgresql14-devel' found."
I tried to replicate the problem using the shell on the target server. However, running the command seems to be able to install the packages:
ansible#goeccdb22l:~> sudo /usr/bin/zypper install --type package --auto-agree-with-licenses --no-recommends -- +postgresql14-server postgresql14-contrib postgresql14-devel
Loading repository data...
Reading installed packages...
Resolving package dependencies...
The following 12 NEW packages are going to be installed:
libecpg6 libopenssl-1_1-devel libpq5 postgresql postgresql14 postgresql14-contrib postgresql14-devel postgresql14-server postgresql-contrib postgresql-devel postgresql-server zlib-devel
The following package needs additional customer contract to get support:
postgresql14
12 new packages to install.
Overall download size: 8.0 MiB. Already cached: 0 B. After the operation, additional 35.4 MiB will be used.
Continue? [y/n/v/...? shows all options] (y):
I've removed only the --quiet and --non-interactive options from the command, but kept all other given options.
My best guess is that user/privilege escalation works differently under Ansible than when I log in to the target as the Ansible user and simply prefix the command with sudo.
Edit 1: I have developed an idea what the problem could be. As I mentioned above, when I tested the command, I removed two options: --quiet and --non-interactive. Testing the command with those two options gives the message:
The flag --quiet is not known.
However, using man zypper, I can clearly see that --quiet is a documented option:
-q, --quiet
Suppress normal output. Brief (esp. result notification) messages and error messages will still be printed, though. If used together with conflicting --verbose option, the --verbose option takes preference.
Now, my idea is that Ansible runs the command it documents in the returned XML, but because --quiet is somehow not understood, zypper reports that nothing provides the requested package list. That would leave two questions:
Why is --quiet not understood, yet documented? Is that a problem of SLES vs. OpenSuse?
How to work around that?
As the Ansible zypper module has no option to suppress the --quiet flag, I don't see any chance of working around it with parameters. The last resort would be to split the zypper task into smaller shell tasks, which I would like to avoid if possible.
So, with the help of a knowledgeable sysadmin I was able to diagnose the problem.
The list of packages documented above was not in fact a list. As I missed the dashes in front of the packages, Ansible treated everything, newlines included, as a single package name and tried to install that.
The solution is to turn the packages into a proper YAML list by prefixing each of them with a dash/minus sign.
The problem with --quiet was that it is a positional argument, and I had used it in the wrong position when testing.
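For illustration, both effects can be reproduced with zypper on the target host (a sketch using the same package names as above; output will differ slightly):
# Global options such as --quiet have to precede the subcommand, which is where the module puts them:
sudo zypper --quiet --non-interactive install --no-recommends -- postgresql14-server postgresql14-contrib postgresql14-devel
# Passing all three names as ONE argument reproduces the module's error, because zypper then looks for a single package with that literal name:
sudo zypper --non-interactive install -- '+postgresql14-server postgresql14-contrib postgresql14-devel'
# -> No provider of '+postgresql14-server postgresql14-contrib postgresql14-devel' found.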

Datadog agent does not send data

I ran into an issue with my Datadog agent. I installed Agent version 7.35.0 on an EC2 Ubuntu machine. After I restarted the agent, I got this error:
Apr 10 11:24:24 ip-10-100-0-33 agent[9951]: 2022-04-10 11:24:24 UTC | CORE | WARN |(pkg/collector/python/datadog_agent.go:124 in LogMessage) | disk:e5dffb8bef24336f |(disk.py:136) | Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission denied: '/sys/kernel/debug/tracing'. You can exclude this mountpoint in the settings if it is invalid.
From what I've seen in other threads, this answer was given:
Can you add "tracefs" to the "file_system_blacklist" configuration to see if that unblocks you? We can add it by default if it does.
But I do not completely understand this answer, and I am not sure what I should change to fix this issue.
If anyone has experienced this kind of thing and can help me, it would be super helpful.
Thank you!
With Datadog Agent 7:
mv /etc/datadog-agent/conf.d/disk.d/conf.yaml.default /etc/datadog-agent/conf.d/disk.d/conf.yaml
Then in /etc/datadog-agent/conf.d/disk.d/conf.yaml, uncomment file_system_global_exclude and underneath it add - tracefs:
init_config:
  file_system_global_exclude:
    - tracefs
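For the change to take effect the agent has to be restarted; on a systemd-based Ubuntu host that would typically be (assuming the standard package install and service name):
sudo systemctl restart datadog-agent
# then confirm the disk check no longer logs the permission error
sudo datadog-agent status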

Spark Controller installation fails via ambari

When we try to install Spark Controller via Ambari, it gives an error.
Below is the error we are getting:
stderr: /var/lib/ambari-agent/data/errors-403.txt
File "/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/SparkController/package/scripts/controller_conf.py", line 10, in controller_conf
    recursive = True
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 147, in __init__
    raise Fail("%s received unsupported argument %s" % (self, key))
resource_management.core.exceptions.Fail: Directory['/usr/sap/spark/controller/conf'] received unsupported argument recursive
stdout: /var/lib/ambari-agent/data/output-403.txt
2016-12-15 08:44:36,441 - Skipping installation of existing package curl
2016-12-15 08:44:36,441 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-15 08:44:36,496 - Skipping installation of existing package hdp-select
Start installing
2016-12-15 08:44:36,668 - Execute['cp -r /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/SparkController/package/files/sap/spark /usr/sap'] {}
2016-12-15 08:44:36,685 - Execute['chown hanaes:sapsys /var/log/hanaes'] {}
Configuring...
Command failed after 1 tries
Versions:
Ambari : 2.4.2.0
Spark : 1.5.2.2.3
Spark Controller : 1.6.1
We raised a customer message with SAP and the resolution was: "Known issue for Spark Controller 1.6.2, so please upgrade to Spark Controller 2.0".
After upgrading to Spark Controller 2.0 the installation was successful. Hence I am closing this thread.
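For reference, the underlying cause is visible in the traceback: the SparkController stack script passes the old recursive argument to Ambari's Directory resource, which newer Ambari releases (2.4 and later, as far as I know) no longer accept. A quick way to confirm which script passes it, using the path from the error above:
grep -n recursive /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/SparkController/package/scripts/controller_conf.py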

Issue while installing custom logstash gem filter with logstash 2.3.1

I was using Logstash 1.5.4 and was able to install a custom Logstash filter received from a third-party team. Now I have updated Logstash from 1.5.4 to 2.3.1, and when I try to install the filter it gives me the following error:
failed: [site1elk01] => {"changed": true, "cmd": "/opt/logstash/bin/logstash-plugin install /opt/logstash/logstash-filter-abc-1.0.0.gem", "delta": "0:00:11.883231", "end": "2016-04-25 00:38:30.545048", "rc": 1, "start": "2016-04-25 00:38:18.661817", "warnings": []}
stderr: Error Bundler::GemspecError, retrying 1/10
There was a Errno::ENOENT while loading logstash-filter-abc.gemspec:
No such file or directory - git from
/opt/logstash/vendor/local_gems/bdd6b4de/logstash-filter-abc-1.0.0/logstash-filter-abc.gemspec:14:in `eval_gemspec'
No such file or directory - git from
/opt/logstash/vendor/local_gems/a945cf06/logstash-filter-abc-1.0.0/logstash-filter-abc.gemspec:14:in `eval_gemspec'
Too many retries, aborting, caused by Bundler::GemspecError
ERROR: Installation Aborted, message: There was a Errno::ENOENT while loading logstash-filter-abc.gemspec:
No such file or directory - git from
/opt/logstash/vendor/local_gems/a945cf06/logstash-filter-abc-1.0.0/logstash-filter-abc.gemspec:14:in `eval_gemspec'
stdout: Validating /opt/logstash/logstash-filter-abc-1.0.0.gem
Installing logstash-filter-abc
Earlier, in 1.5.4, I was using the command below to install the filter:
/opt/logstash/bin/plugin install /opt/logstash/logstash-filter-abc-1.0.0.gem
Now, after it gave me an error telling me to use logstash-plugin from the bin folder, I am using the following:
/opt/logstash/bin/logstash-plugin install /opt/logstash/logstash-filter-abc-1.0.0.gem
Any help as to what could be the issue?
Thanks in advance!!
If you are installing a plugin that is in your local directory rather than one from the official repository, you have to modify the
"Gemfile"
in the Logstash directory. That is why you get this error.
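A minimal sketch of what that could look like on Logstash 2.x (the gem name comes from the question; the local path and the --no-verify step are assumptions to illustrate the idea, not a verified fix for the git error):
# Point the Logstash Gemfile at a local copy of the plugin source
echo 'gem "logstash-filter-abc", :path => "/opt/logstash/logstash-filter-abc"' | sudo tee -a /opt/logstash/Gemfile
# Install it without contacting the remote repository
sudo /opt/logstash/bin/logstash-plugin install --no-verify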

EC2 AMI 2.4/2.5 dsc20/dsc21 dependency issue

I'm trying to create a 1-node DataStax Community Edition cluster using these guidelines on EC2 m3.xlarge (eu-west).
Here are the provided parameters:
--clustername cassandra
--totalnodes 1
--version community
As mentioned in the guidelines, I opened these ports:
22
8888
1024-65355
Here is the error I found in ~/datastax_ami/ami.log:
The following packages have unmet dependencies:
dsc20 : Depends: cassandra (= 2.0.14) but 2.1.4 is to be installed
[ERROR] 04/21/15-12:58:29 sudo service cassandra stop:
cassandra: unrecognized service
[EXEC] 04/21/15-12:58:29 sudo rm -rf /var/lib/cassandra
[EXEC] 04/21/15-12:58:29 sudo rm -rf /var/log/cassandra
[EXEC] 04/21/15-12:58:29 sudo mkdir -p /var/lib/cassandra
[EXEC] 04/21/15-12:58:29 sudo mkdir -p /var/log/cassandra
[ERROR] 04/21/15-12:58:29 sudo chown -R cassandra:cassandra /var/lib/cassandra:
chown: invalid user: `cassandra:cassandra'
[ERROR] 04/21/15-12:58:29 sudo chown -R cassandra:cassandra /var/log/cassandra:
chown: invalid user: `cassandra:cassandra'
[INFO] Reflector loop...
[INFO] 04/21/15-12:58:29 Reflector: Received 1 of 1 responses from: [u'172.31.46.236']
[INFO] Seed list: set([u'172.31.46.236'])
[INFO] OpsCenter: 172.31.46.236
[INFO] Options: {'username': None, 'cfsreplication': None, 'heapsize': None, 'reflector': None, 'clustername': 'cassandra', 'analyticsnodes': 0, 'seed_indexes': [0, 1, 1], 'realtimenodes': 1, 'java7': None, 'opscenter': 'no', 'totalnodes': 1, 'searchnodes': 0, 'release': None, 'opscenterinterface': None, 'version': 'community', 'dev': None, 'customreservation': None, 'password': None, 'email': None, 'raidonly': None, 'javaversion': None}
[ERROR] Exception seen in ds1_launcher.py:
Traceback (most recent call last):
File "/home/ubuntu/datastax_ami/ds1_launcher.py", line 33, in initial_configurations
ds2_configure.run()
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 1058, in run
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 521, in construct_yaml
IOError: [Errno 2] No such file or directory: '/etc/cassandra/cassandra.yaml'
Related GitHub issue: Add support for DSC 2.2 Versions #81
Does anyone know what I've done wrong?
Thanks
There was a bug with the dependencies, as you encountered (Add support for DSC 2.2 Versions #81), and it was fixed in AMI 2.5.
Therefore, be sure to use the new AMI. Do not use:
DataStax Auto-Clustering AMI
which is the AMI 2.4 version, instead use:
DataStax Auto-Clustering AMI 2.5.1-pv
or
DataStax Auto-Clustering AMI 2.5.1-hvm
Per the github issue:
Proposed fix committed to dev-2.5 and dev-2.6. Will test today and release today.
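If it helps, the 2.5.1 images can be looked up by name before launching (a hypothetical AWS CLI query; the AMI names are taken from the answer above):
aws ec2 describe-images --region eu-west-1 --filters "Name=name,Values=DataStax Auto-Clustering AMI 2.5.1*" --query "Images[].{Id:ImageId,Name:Name}" --output table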
