Spark Controller installation fails via Ambari - hadoop

When we try to install Spark Controller via Ambari, it fails with an error.
Below is the error we are getting:
stderr: /var/lib/ambari-agent/data/errors-403.txt
File "/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/SparkController/package/scripts/controller_conf.py", line 10, in controller_conf
    recursive = True
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 147, in __init__
    raise Fail("%s received unsupported argument %s" % (self, key))
resource_management.core.exceptions.Fail: Directory['/usr/sap/spark/controller/conf'] received unsupported argument recursive
stdout: /var/lib/ambari-agent/data/output-403.txt
2016-12-15 08:44:36,441 - Skipping installation of existing package curl
2016-12-15 08:44:36,441 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-15 08:44:36,496 - Skipping installation of existing package hdp-select
Start installing
2016-12-15 08:44:36,668 - Execute['cp -r /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/SparkController/package/files/sap/spark /usr/sap'] {}
2016-12-15 08:44:36,685 - Execute['chown hanaes:sapsys /var/log/hanaes'] {}
Configuring...
Command failed after 1 tries
Versions:
Ambari : 2.4.2.0
Spark : 1.5.2.2.3
Spark Controller : 1.6.1
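For context, the stderr trace fails inside the service's controller_conf.py because it passes recursive=True to Ambari's Directory resource; Ambari 2.4 dropped that keyword in favor of create_parents, which is why a stack script written for older Ambari versions breaks here. A minimal sketch of the failing call and its newer-API equivalent (the owner/group values are assumptions taken from the chown in the stdout log, not from the SAP script itself):

# Sketch of a controller_conf.py-style script, not the exact SAP package code.
from resource_management.core.resources.system import Directory

def controller_conf():
    Directory('/usr/sap/spark/controller/conf',
              owner='hanaes',            # assumption, based on the chown in the log
              group='sapsys',            # assumption, based on the chown in the log
              # recursive=True,          # accepted by older Ambari versions, rejected by 2.4+
              create_parents=True)       # equivalent keyword on Ambari 2.4+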

We raised a customer message with SAP, and the resolution was: "Known issue for Spark Controller 1.6.2, so please upgrade to Spark Controller 2.0".
After upgrading to Spark Controller 2.0 the installation was successful, so I am closing this thread.

Related

Running PySpark on PyCharm

On a Mac (v. 10.14.5), I am trying to run PySpark programs in PyCharm (professional edition, v. 19.2).
I know my simple PySpark program is fine, because when I run it with spark-submit outside PyCharm from the terminal, using Spark I installed via brew, it works as expected. I have tried linking PyCharm to this version of Spark, but am getting other issues.
I followed multiple sets of instructions online (for example, a Stack Overflow thread) to install pyspark within PyCharm (Preferences -> Project Interpreter) and set the SPARK_HOME environment variable to the appropriate venv directory (Run -> Edit Configurations -> Environment Variables).
But I get an error message when I run the program:
Failed to find Spark jars directory (/Users/rahul/PycharmProjects/spark-demoII/venv/assembly/target/scala-2.12/jars).
You need to build Spark with the target "package" before running this program.
Traceback (most recent call last):
File "/Users/rahul/PycharmProjects/spark-demoII/run.py", line 6, in <module>
sc = SparkContext("local", "SimpleApp")
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/context.py", line 133, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
return _launch_gateway(conf)
File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in _launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
Process finished with exit code 1
Does anyone know how to get PyCharm to run PySpark programs on a similar machine?
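One thing worth checking first: the "Failed to find Spark jars directory" message usually means SPARK_HOME points at a directory that is not a full Spark distribution (here, the venv root, which has no jars). A minimal diagnostic sketch, assuming pyspark is importable from the project interpreter PyCharm uses:

# Print what SPARK_HOME resolves to and whether it actually contains a jars/ directory.
import os
import pyspark

spark_home = os.environ.get("SPARK_HOME", os.path.dirname(pyspark.__file__))
print("SPARK_HOME          :", spark_home)
print("jars directory found:", os.path.isdir(os.path.join(spark_home, "jars")))

When pyspark is installed with pip, its bundled jars live under site-packages/pyspark/jars, so SPARK_HOME can usually be left unset or pointed at the pyspark package directory rather than the venv root.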
In response to @pissal's suggestion:
I had tried that previously, but that version of Spark did not work. I tried it again anyway: after switching to a virtual environment, I did a pip install pyspark. To check whether this version of Spark works, I ran spark-submit run.py (outside of PyCharm), and here is the error message.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/rahul/.virtualenvs/test1/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3720)
at java.base/java.lang.String.substring(String.java:1909)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
... 25 more
So the reason this was happening was that PySpark has not yet been updated to work with the latest version of Java. After removing Java 13, I made sure my Homebrew installation of Spark uses Java 1.8, then added the following to the environment variables in Run -> Edit Configurations in PyCharm:
SPARK_HOME=/usr/local/Cellar/apache-spark/2.4.4/libexec
With these settings I can run PySpark jobs in PyCharm.
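The same environment can be reproduced in code, which is handy for checking the fix outside of PyCharm's Run configuration. A minimal sketch, assuming a Homebrew Spark 2.4.4 under the path above and a Java 1.8 JDK at a hypothetical location (adjust both paths to your machine):

# Set the environment before the JVM gateway is launched, then run a tiny job.
import os
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.8.0_231.jdk/Contents/Home"  # hypothetical JDK path
os.environ["SPARK_HOME"] = "/usr/local/Cellar/apache-spark/2.4.4/libexec"

from pyspark import SparkContext

sc = SparkContext("local", "SimpleApp")   # same constructor as in run.py above
print(sc.parallelize(range(10)).sum())    # sanity check: prints 45
sc.stop()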

Install DataNode by Ambari

I have
OS Red Hat Enterprise Linux Server release 7.4 (Maipo)
Ambari Version 2.5.1.0
HDP 2.6
After the components were deployed, two DataNodes would not start.
Trying to start them returned this error:
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 127. -bash: /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh: No such file or directory
I tried deleting the component and reinstalling it through Ambari.
The install completed without errors:
2018-02-27 20:47:31,550 - Execute['ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.6 | tail -1`'] {'only_if': 'ls -d /usr/hdp/2.6*'}
2018-02-27 20:47:31,554 - Skipping Execute['ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.6 | tail -1`'] due to only_if
2018-02-27 20:47:31,554 - FS Type:
2018-02-27 20:47:31,554 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {u'final': {u'fs.defaultFS': u'true'}}, 'owner': 'hdfs', 'only_if': 'ls /usr/hdp/current/hadoop-client/conf', 'configurations': ...}
2018-02-27 20:47:31,568 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2018-02-27 20:47:31,569 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2018-02-27 20:47:31,583 - Could not load 'version' from /var/lib/ambari-agent/data/structured-out-3374.json
Command completed successfully!
But starting it again shows the same error.
I checked the folder /usr/hdp/current/hadoop-client/.
New files, for example sbin/hadoop-daemon.sh, did not appear in that folder.
How can I redeploy the DataNode component with Ambari?
I'd guess the issue is caused by wrong symlinks under /usr/hdp. You could even try to fix them manually; the structure is simple enough (a quick check is sketched below). That said, the issue does not sound like a common one after a plain stack deployment.
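A minimal sketch of such a check, assuming the standard HDP layout: it lists what each /usr/hdp/current/* symlink points at and flags targets that no longer exist (for example a missing hadoop-client/sbin).

# Walk /usr/hdp/current and report broken symlinks.
import os

current = "/usr/hdp/current"
for name in sorted(os.listdir(current)):
    link = os.path.join(current, name)
    target = os.path.realpath(link)
    status = "" if os.path.exists(target) else "  <-- BROKEN"
    print("%-35s -> %s%s" % (name, target, status))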
Are you running Ambari with a non-root/custom user? Maybe Ambari does not have sufficient permissions? See https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-security/content/how_to_configure_ambari_server_for_non-root.html
Ambari Version 2.5.1.0 is considerably outdated, so it would make sense to update Ambari and see whether it helps.
Also, if you want to wipe out everything, see https://github.com/hortonworks/HDP-Public-Utilities/blob/master/Installation/cleanup_script.sh
Also, it may be more productive to ask Ambari-related questions at https://community.hortonworks.com/

Metron installation over Ambari-managed cluster failed

I am using CentOS 6.5, on which I have successfully deployed Ambari Server 2.4 with 3 nodes, where Hadoop, YARN, and Storm are running successfully.
I am using this URL as a reference for the Metron installation:
https://community.hortonworks.com/articles/60805/deploying-a-fresh-metron-cluster-using-ambari-serv.html
I am getting stuck when I try to install Metron through Ambari: adding Metron to the Ambari repository list succeeds, but when installing the Metron nodes and Elasticsearch on the target node I get the exception below:
Ambari-server Console error:
2017-05-25 16:33:01,138 - Installing package elasticsearch-2.3.3 ('/usr/bin/yum -d 0 -e 0 -y install elasticsearch-2.3.3')
2017-05-25 16:33:02,132 - Execution of '/usr/bin/yum -d 0 -e 0 -y install elasticsearch-2.3.3' returned 1. Error: Cannot retrieve repository metadata (repomd.xml) for repository: METRON-0.4.0. Please verify its path and try again
2017-05-25 16:33:02,132 - Failed to install package elasticsearch-2.3.3. Executing '/usr/bin/yum clean metadata'
2017-05-25 16:33:02,497 - Retrying to install package elasticsearch-2.3.3 after 30 seconds
Command failed after 1 tries
Terminal error:
file:///localrepo/repodata/repomd.xml: [Errno 14] Could not open/read file:///localrepo/repodata/repomd.xml
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: METRON-0.4.0. Please verify its path and try again
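The terminal error shows yum reading a file:// repository whose metadata cannot be opened, so before retrying the Ambari install it may help to confirm that the local repository actually contains repodata. A minimal check, assuming the /localrepo path from the error above:

# Verify that the local yum repository used for METRON-0.4.0 has readable metadata.
import os

repomd = "/localrepo/repodata/repomd.xml"
print("exists  :", os.path.exists(repomd))
print("readable:", os.access(repomd, os.R_OK))
# If it is missing, the repository was never indexed (e.g. createrepo was not run there)
# or the baseurl in the METRON-0.4.0 .repo file points at the wrong directory.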

Issue while installing custom logstash gem filter with logstash 2.3.1

I was using Logstash 1.5.4 and was able to install a custom Logstash filter received from a third-party team. Now I have updated Logstash from 1.5.4 to 2.3.1, and when I try to install the filter it gives me the following error:
failed: [site1elk01] => {"changed": true, "cmd": "/opt/logstash/bin/logstash-plugin install /opt/logstash/logstash-filter-abc-1.0.0.gem", "delta": "0:00:11.883231", "end": "2016-04-25 00:38:30.545048", "rc": 1, "start": "2016-04-25 00:38:18.661817", "warnings": []}
stderr: Error Bundler::GemspecError, retrying 1/10
There was a Errno::ENOENT while loading logstash-filter-abc.gemspec:
No such file or directory - git from
/opt/logstash/vendor/local_gems/bdd6b4de/logstash-filter-abc-1.0.0/logstash-filter-abc.gemspec:14:in `eval_gemspec'
No such file or directory - git from
/opt/logstash/vendor/local_gems/a945cf06/logstash-filter-abc-1.0.0/logstash-filter-abc.gemspec:14:in `eval_gemspec'
Too many retries, aborting, caused by Bundler::GemspecError
ERROR: Installation Aborted, message: There was a Errno::ENOENT while loading logstash-filter-abc.gemspec:
No such file or directory - git from
/opt/logstash/vendor/local_gems/a945cf06/logstash-filter-abc-1.0.0/logstash-filter-abc.gemspec:14:in `eval_gemspec'
stdout: Validating /opt/logstash/logstash-filter-abc-1.0.0.gem
Installing logstash-filter-abc
Earlier, in 1.5.4, I was using the command below to install the filter:
/opt/logstash/bin/plugin install /opt/logstash/logstash-filter-abc-1.0.0.gem
Now, after it gave me an error telling me to use logstash-plugin from the bin folder, I am using the following:
/opt/logstash/bin/logstash-plugin install /opt/logstash/logstash-filter-abc-1.0.0.gem
Any idea what the issue could be?
Thanks in advance!
If you are installing a plugin that is in your local directory rather than one of the official ones, you should modify the Gemfile in the Logstash directory. That's why you get this error.

Hadoop and Hive Homes in CDH4

I'm trying to configure RHive in the CDH4 environment.
When loading the 'RHive' package in R, the error below is returned.
I'm guessing that's due to the wrong home directories.
If so, what would be the correct ones?
Or, if that's not the reason, what is wrong?
Any help would be much appreciated.
Thanks.
> Sys.setenv(HIVE_HOME="/etc/hive")
> Sys.setenv(HADOOP_HOME="/etc/hadoop")
> library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-7. For overview type '?RHive'.
HIVE_HOME=/etc/hive
[1] "there is no slaves file of HADOOP. so you should pass hosts argument when you call rhive.connect()."
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
In addition: Warning message:
In file(file, "rt") :
cannot open file '/etc/hadoop/conf/slaves': No such file or directory
Error: package/namespace load failed for 'RHive'
I had the same problem but solved it. The downside is that I have to keep track of a bunch of symlinks.
After struggling to install RHive_0.0-7.tar.gz on CDH 4.7.x and getting:
Warning in file(file, "rt") :
cannot open file '/etc/hadoop/conf/slaves': No such file or directory
[1] "there is no slaves file of HADOOP. so you should pass hosts argument when you call rhive.connect()."
In /etc/hadoop/conf I added the following symlink:
ln -s /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/etc/hadoop/conf.empty/slaves slaves
(Why Cloudera CDH 4.7 installs in /opt without creating the proper symlinks from /usr/lib is puzzling.)
I also defined the following in /usr/lib64/R/etc/Renviron:
## set hive paths
HIVE_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive'
HADOOP_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
LD_LIBRARY_PATH='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
At a shell prompt I ran R CMD INSTALL RHive_0.0-7.tar.gz
Installation Happiness!!
++++++
Inside R-Studio (server)
>
> library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-7. For overview type ‘?RHive’.
HIVE_HOME=/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
>
+++++++
You should set HADOOP_CONF_DIR separately.
Try export HADOOP_CONF_DIR=/etc/hadoop/conf/conf.pseudo
The conf.pseudo directory has the slaves file.
Though I'd be curious to see if you can make RHive work with CDH4.
