Installing Apache Spark using yum - hadoop

I am in the process of installing Spark on my organization's HDP box. I ran yum install spark and it installed Spark 1.4.1. How do I install Spark 2.0? Please help!

Spark 2 is supported (as a technical preview) in HDP 2.5. Add the HDP 2.5 repo to your yum repo directory and then install Spark 2 from it. Spark 1.6.2 is the default version in HDP 2.5.
wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.0.0/hdp.repo
sudo cp hdp.repo /etc/yum.repos.d/hdp.repo
sudo yum install spark2-master
or
sudo yum install spark2 (this seemed to do the same thing when I tried it)
See what's new in HDP 2.5: http://hortonworks.com/products/data-center/hdp/
For the full list of repos, see https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/download-links-250.html
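Once the install finishes, it may be worth confirming which Spark version you actually got. A minimal sketch, assuming spark-submit is on your PATH and prints its usual banner with a line ending in "version X.Y.Z" (the helper name spark_version is made up for this example):

```shell
# Hypothetical helper: pull the version number out of the banner text
# that `spark-submit --version` prints. Reads the banner on stdin.
spark_version() {
    # Print the digits-and-dots run that follows "version ";
    # keep only the first match, since later lines may mention
    # other versions (Scala, for instance).
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Typical use (the banner usually goes to stderr, hence 2>&1):
# spark-submit --version 2>&1 | spark_version
```

If this reports a 1.x version, the Spark 1 client is probably the one being picked up first.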

Related

pyspark 3.x in pypi limited to hadoop 2.7.4

I want to pip install pyspark into my python3 virtual environment but the only choice I have is the PyPI version compiled with Hadoop 2.7.4 dependencies. I need a Hadoop 3.x version since 2.7.4 is too old for modern AWS S3 integration.
Does anyone know why there isn't an option to pip install pyspark with Hadoop 3.x support?
Is my only option to build my own pyspark from source?

Homebrew install specific version of hadoop 2.8.0 instead of 3.1.1

How do I install Hadoop 2.8.0 instead of Hadoop 3.1.1 via brew?
Alternatively, how can I use brew to install a Hadoop 2.8.0 that I downloaded to my desktop?

RabbitMQ RPM installation failed on Red Hat Enterprise Linux 7.2 (even with erlang installed)

The RabbitMQ homepage says:
First install erlang
Then install RabbitMQ by
rpm --import https://www.rabbitmq.com/rabbitmq-release-signing-key.asc
yum install rabbitmq-server-3.6.3-1.noarch.rpm
I installed Erlang, but when I tried to install RabbitMQ it failed. The error says:
Requires: erlang>=R16B-03
But I have already installed Erlang 19.0, so what's the problem? Someone in another article suggested that RabbitMQ doesn't support Erlang 19.0 right now, so what should I do? I also tried installing Erlang 18.3 without deleting Erlang 19.0, because I don't know how to uninstall Erlang :( , and it still fails (even though $erl shows the version is 18.3).
RabbitMQ will support Erlang 19.0 starting from version 3.6.4 (currently in RC1).
I suggest using the zero-dependency Erlang/OTP 18.3.4 package from here:
https://github.com/rabbitmq/erlang-rpm/releases/tag/v1.3.0
Install it this way:
wget https://github.com/rabbitmq/erlang-rpm/releases/download/v1.3.0/erlang-18.3.4-1.el7.centos.x86_64.rpm
rpm -i erlang-18.3.4-1.el7.centos.x86_64.rpm
To remove your current Erlang installation, try:
sudo yum remove 'erlang*'
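As far as I know, the "Requires: erlang" error comes from the RPM dependency check, which only sees Erlang packages registered in the RPM database, not whatever erl happens to be on your PATH. A rough sketch for listing what RPM thinks is installed (the function name erlang_rpms is just for illustration):

```shell
# Hypothetical helper: keep only package names that start with "erlang".
# Reads `rpm -qa` style output (one package per line) on stdin.
erlang_rpms() {
    grep '^erlang' || true   # "|| true" so an empty result is not an error
}

# Typical use:
# rpm -qa | erlang_rpms
# If this prints nothing, RPM does not know about your Erlang install,
# even if erl itself runs fine.
```

Running it before and after the yum remove step shows whether the old packages are really gone.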

How to uninstall all versions of hadoop completely from the system?

I had installed CDH5 with MRv1 on Ubuntu 14.04 LTS (single node) in pseudo-distributed mode using this tutorial:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_qs_mrv1_pseudo.html
I used the command
sudo apt-get install hadoop-0.20-conf-pseudo
to install the package in pseudo-distributed mode.
I then tried to uninstall it and migrate to YARN (MRv2), but since doing so my datanode fails to start up every time. I removed MRv1 and installed YARN using this tutorial:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_qs_yarn_pseudo.html.
I used the command
sudo apt-get remove hadoop-0.20-conf-pseudo hadoop-0.20-mapreduce-*
to uninstall MRv1
and
sudo apt-get install hadoop-conf-pseudo
to install YARN.
Can you suggest how to remove all versions of Hadoop completely from my system and verify that no files remain before I do a fresh installation?
Do a:
sudo dpkg -l | grep hadoop
to see which packages are installed, then go through the list running:
sudo apt-get remove <package-name>
for each package that shows up. That should remove Hadoop completely from your system.
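The grep above also matches lines where only the description mentions hadoop. Here is a small sketch that filters on the package name column instead and prints just the names, so they can be handed to apt-get in one go. The function name hadoop_packages is made up here, and using purge rather than remove is my own suggestion, since purge also deletes the packages' configuration files:

```shell
# Hypothetical helper: print the names of packages whose name contains
# "hadoop". Reads `dpkg -l` style output on stdin, where column 1 is the
# status (ii, rc, ...) and column 2 is the package name.
hadoop_packages() {
    awk '$2 ~ /hadoop/ { print $2 }'
}

# Typical use. Check the list first:
# dpkg -l | hadoop_packages
# Then remove everything on it, including config files:
# sudo apt-get purge $(dpkg -l | hadoop_packages)
```

Packages in state rc (removed, config files left behind) are listed too, which is useful here because leftover configuration is one thing that can confuse a fresh install.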

How to run a recent version of pypy (e.g., 2.3.1) on Heroku, Rackspace, AWS?

I'd like to use PyPy 2.3.1 as the runtime environment for a Flask app with numpy. I've tested this on Heroku, but it only supports PyPy 1.9. Has anyone had luck running a recent version of PyPy, e.g., PyPy 2.3.1, on Heroku, Rackspace, AWS or similar?
On an Ubuntu Rackspace cloud server (or any other Ubuntu machine) you can get the latest version of PyPy by running:
sudo add-apt-repository ppa:pypy/ppa
sudo apt-get update
sudo apt-get install pypy pypy-dev
