How to install Cloudera Impala on EMR? - hadoop

Is there any way I can install only Impala, without Cloudera Manager and without CDH? I will be using the Apache version of Hadoop.

Yes, it is absolutely possible. Add the repository entries below to your sources.list file and run apt-get update afterwards.
deb [arch=amd64] http://archive.cloudera.com/impala/ubuntu/precise/amd64/impala precise-impala1 contrib
deb-src http://archive.cloudera.com/impala/ubuntu/precise/amd64/impala precise-impala1 contrib
After that, it's merely:
sudo apt-get install impala (binaries for the daemons)
sudo apt-get install impala-server (service start/stop script for impalad)
sudo apt-get install impala-state-store (service start/stop script for the statestore)
But do not forget to meet all the prerequisites first. For detailed information you can go here.
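Putting that together, a minimal sketch of the whole sequence on an Ubuntu precise host; the list file name impala.list is just an illustrative choice, and impala-shell is an optional extra:
# add the repo (file name is illustrative), refresh the package lists, install the Impala packages
echo "deb [arch=amd64] http://archive.cloudera.com/impala/ubuntu/precise/amd64/impala precise-impala1 contrib" | sudo tee /etc/apt/sources.list.d/impala.list
sudo apt-get update
sudo apt-get install impala impala-server impala-state-store impala-shell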

You can view detailed instructions on how to install and use Impala with Amazon EMR here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-impala.html

EMR is based on an Amazon Hadoop distribution that runs on top of Debian Squeeze, so yes, it's possible using Cloudera's DEB repo.
You will need to SSH into your EMR master node; you can find its address in the EMR console.
You will also need to add rules to the security group assigned to your EMR cluster if you intend to connect to Impala with a JDBC/ODBC client from the outside world.
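For illustration only, a sketch of connecting to the master node and opening Impala's HiveServer2-compatible JDBC port (21050) with the AWS CLI; the key file, security group ID, and source address are placeholders:
ssh -i mykey.pem hadoop@<master-public-dns>        # "hadoop" is the default EMR SSH user
# allow a single trusted address to reach the Impala JDBC port
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 21050 --cidr 203.0.113.10/32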

Related

Installing and Managing Jenkins on Amazon Linux

I'm looking to move Jenkins to Amazon EC2 running Amazon Linux.
Currently we have Jenkins installed as a package (via yum). I'm considering running Jenkins as the contained jenkins.war on EC2 (for auto-upgrades and ease of deployment).
Unfortunately I've been unable to find much documentation regarding managing jenkins as the latter.
I'm trying to determine:
Which installation is preferred, and why?
If running as a contained jar:
How do I start/stop jenkins?
Should I create a jenkins user?
Installation Steps:
Launch an Amazon Linux instance using the Amazon Linux AMI.
Log in to your Amazon Linux instance.
Become root using the "sudo su -" command.
Update your repositories
yum update
Get the Jenkins repository using the command below
wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
Get Jenkins repository key
rpm --import http://pkg.jenkins-ci.org/redhat-stable/jenkins-ci.org.key
Install the Jenkins package
yum install jenkins
Start Jenkins and make sure it starts automatically at system startup
service jenkins start
chkconfig jenkins on
Open your browser and navigate to http://<Elastic-IP>:8080. You will see the Jenkins dashboard.
That's it. Your Jenkins setup is up and running. Now you can create jobs to build your code.
Reference: http://sanketdangi.com/post/62715793234/install-configure-jenkins-on-amazon-linux
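The original question also asks about running the contained jenkins.war directly instead of the package. A rough sketch of that approach (not an official recommendation), run as root like the steps above, assuming Java is already installed; /opt/jenkins.war is a hypothetical location and the download URL was current at the time of writing:
useradd -r -m -d /var/lib/jenkins jenkins        # dedicated service user, as the question suggests
wget -O /opt/jenkins.war https://get.jenkins.io/war-stable/latest/jenkins.war
# start Jenkins in the background as the jenkins user; JENKINS_HOME defaults to the user's home directory
nohup sudo -u jenkins java -jar /opt/jenkins.war --httpPort=8080 >> /var/log/jenkins-war.log 2>&1 &
# stop it later by terminating the process
pkill -u jenkins -f jenkins.war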
Jenkins Installation on Ubuntu 14.04/16.04
Please follow the steps given below.
Switch to the root user: sudo su -
sudo apt-get update
sudo apt-get install default-jdk
sudo apt-get install default-jre
wget -q -O - https://pkg.jenkins.io/debian/jenkins-ci.org.key | sudo apt-key add -
echo deb https://pkg.jenkins.io/debian-stable binary/ | sudo tee /etc/apt/sources.list.d/jenkins.list
sudo apt-get update
apt-get install jenkins
Get the initial Jenkins admin password from: /var/lib/jenkins/secrets/initialAdminPassword
Browse to e.g. http://192.168.xx.xx:8080
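As a quick sanity check after either install (these commands assume the packaged service, not a standalone WAR):
sudo service jenkins status                    # confirm the service is running
sudo tail -n 50 /var/log/jenkins/jenkins.log   # default package log location, useful if the dashboard does not load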

On-Prem Install without using the yum command

If the datacenter doesn't allow commands like yum or rpm, is there an alternative way to do an on-prem install of OPDK?
They will actually have to install their own local yum repository -- this means building an RPM repository outside the data center and then bringing it into the data center. Then you point yum on the Apigee machine to the internal yum repository.
We have a couple of clients who have done this (they probably don't want it advertised, so contact me directly to find out who has done it this way).
You can set up a yum client and a yum server: the yum client being the machine in the datacenter, and the yum server being the machine from which you can pull all required repositories.
Yum will be used to download packages for OpenLDAP, Postgres, and Qpid.
If, for initial testing, you are setting up an "sa" (standalone) installation without analytics, you will not require yum. (If you use analytics, all of its dependencies, Postgres and Qpid, are installed via yum.)
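A minimal sketch of the local-repository approach described above, assuming the mirrored RPMs have already been copied to /srv/localrepo on an internal host; the host name and paths are illustrative:
# on the internal yum server
yum install createrepo
createrepo /srv/localrepo                      # generate repodata for the mirrored RPMs
# serve /srv/localrepo over HTTP with any web server, then on each datacenter machine:
cat > /etc/yum.repos.d/local.repo <<'EOF'
[local]
name=Internal mirror
baseurl=http://repo.internal.example/localrepo
enabled=1
gpgcheck=0
EOF
yum clean all && yum repolist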

Cloudera installation error: can we install Cloudera Manager for a single-node Hadoop cluster on Ubuntu?

I am using Ubuntu 12.04 64-bit, and I installed and ran sample Hadoop programs on a single node successfully.
I am getting the following error while installing Cloudera Manager on my Ubuntu machine:
Refreshing repository metadata failed. See
/var/log/cloudera-manager-installer/2.refresh-repo.log for details.
Click OK to revert this installation.
I want to know whether we can install Cloudera for a single-node Hadoop cluster on Ubuntu. Please let me know whether it is possible to install Cloudera Manager on a single node, or whether I need to create multiple nodes to use Cloudera with my Hadoop setup.
Yes, CM can run on a single node.
This error occurs because CM cannot use apt-get install to fetch the packages. Which tutorial are you following?
However, you can manually add the Cloudera repo. See this thread.
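To narrow down why the installer's repository refresh failed, a quick check (the log path comes from the error message above):
cat /var/log/cloudera-manager-installer/2.refresh-repo.log   # shows the underlying apt error
sudo apt-get update                                          # verify the machine can reach package repositories at all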

How to have LZO compression in Hadoop MapReduce?

I want to use LZO to compress map output, but I can't get it to run. The version of Hadoop I use is 0.20.2. I set:
conf.set("mapred.compress.map.output", "true")
conf.set("mapred.map.output.compression.codec",
"org.apache.hadoop.io.compress.LzoCodec");
When I run the jar file in Hadoop it throws an exception saying it can't write the map output.
Do I have to install lzo?
What do I have to do to use lzo?
LZO's licence (GPL) is incompatible with Hadoop's (Apache), so it cannot be bundled with Hadoop. You need to install LZO separately on the cluster.
The following steps are tested on Cloudera's Demo VM (CentOS 6.2, x64) that comes with full stack of CDH 4.2.0 and CM Free Edition installed, but they should work on any Linux based on Red Hat.
The installation consists of the following steps:
Installing LZO
sudo yum install lzop
sudo yum install lzo-devel
Installing ANT
sudo yum install ant ant-nodeps ant-junit java-devel
Downloading the source
git clone https://github.com/twitter/hadoop-lzo.git
Compiling Hadoop-LZO
ant compile-native tar
For further instructions and troubleshooting see https://github.com/twitter/hadoop-lzo
Copying the Hadoop-LZO jar to Hadoop libs
sudo cp build/hadoop-lzo*.jar /usr/lib/hadoop/lib/
Moving native code to Hadoop native libs
sudo mv build/hadoop-lzo-0.4.17-SNAPSHOT/lib/native/Linux-amd64-64/ /usr/lib/hadoop/lib/native/
cp /usr/lib/hadoop/lib/native/Linux-amd64-64/libgplcompression.* /usr/lib/hadoop/lib/native/
Adjust the version number to match the version you cloned
When working with a real cluster (as opposed to a pseudo-cluster) you need to rsync these files to the rest of the machines:
rsync /usr/lib/hadoop/lib/ to all hosts.
You can do a dry run of this first with -n
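For example (the worker host name is a placeholder; run with -n first to see what would be copied):
rsync -avn /usr/lib/hadoop/lib/ worker1:/usr/lib/hadoop/lib/   # dry run
rsync -av  /usr/lib/hadoop/lib/ worker1:/usr/lib/hadoop/lib/   # actual copy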
Login to Cloudera Manager
Select from Services: mapreduce1->Configuration
Client->Compression
Add to Compression Codecs:
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
Search "valve"
Add to MapReduce Service Configuration Safety Valve
io.compression.codec.lzo.class=com.hadoop.compression.lzo.LzoCodec
mapred.child.env="JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64/"
Add to MapReduce Service Environment Safety Valve
HADOOP_CLASSPATH=/usr/lib/hadoop/lib/*
That's it.
Your MapReduce jobs that use TextInputFormat should work seamlessly with .lzo files. However, if you choose to index the LZO files to make them splittable (using com.hadoop.compression.lzo.DistributedLzoIndexer), you will find that the indexer writes a .index file next to each .lzo file. This is a problem because your TextInputFormat will interpret these as part of the input. In this case you need to change your MR jobs to work with LzoTextInputFormat.
As for Hive, as long as you don't index the LZO files, the change is also transparent. If you start indexing (to take advantage of a better data distribution) you will need to update the input format to LzoTextInputFormat. If you use partitions, you can do it per partition.
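For reference, indexing is usually run from the hadoop-lzo jar built above; a sketch with a placeholder input path:
# writes data.lzo.index next to the file so MapReduce can split it
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo*.jar com.hadoop.compression.lzo.DistributedLzoIndexer /user/me/data.lzo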

How can I use/install "make" on the Amazon Linux AMI for EC2?

I'm a new user of Amazon EC2.
I want to compile the pptpd package on EC2, but receive the following error:
[root@ip-10-112-xxx-xxx /]# /var/tmp/rpm-tmp.2eILT0: line 58: /usr/bin/make: No such file or directory
I searched the entire root directory tree, but make isn't available:
[root@ip-10-112-59-187 /]# find . -name "make"
./etc/mail/make
I'm wondering whether make is actually installed on the Amazon Linux AMI initially? If not, how do I install it?
Preface
The Amazon Linux AMI is (loosely) based on CentOS and a perfectly decent OS for EC2, in fact it has been tailored by Amazon for EC2 specifically:
The Amazon Linux AMI is a supported and maintained Linux image
provided by Amazon Web Services for use on Amazon Elastic Compute
Cloud (Amazon EC2). It is designed to provide a stable, secure, and
high performance execution environment for applications running on
Amazon EC2. It also includes packages that enable easy integration
with AWS, [...]. Amazon Web Services provides ongoing security and
maintenance updates to all instances running the Amazon Linux AMI. [...] [emphasis mine]
However, it is indeed not as widely used yet as some other distributions, with the most popular likely being Ubuntu due to its popularity in general and its dedicated long time tailored support of EC2 in particular (see e.g. the EC2StartersGuide, the Ubuntu Cloud Images or the convenient listing of the Ubuntu AMIs for Amazon EC2 on alestic). This yields two drawbacks:
You'll find many more examples/tutorials/etc. for EC2 based on Ubuntu, which eventually makes things easier.
You'll find slightly fewer precompiled packages available for CentOS, which eventually requires compiling your own (but see below).
Solution
That said, CentOS (and the Amazon Linux AMI in turn) uses the Yum package manager to install and update packages from CentOS (and 3rd party) Repositories (Debian/Ubuntu use the APT package manager instead - the inherent concepts are very similar though), see e.g. section Adding Packages in Amazon Linux AMI Basics:
In addition to the packages included in the Amazon Linux AMI, Amazon
provides a yum repository consisting of common Linux applications for
use inside of Amazon EC2. The Amazon Linux AMI is configured to point
to this repository by default for all yum actions. The packages can be
installed by issuing yum commands. For example:
# sudo yum install httpd
Accordingly, you can install make via yum install make (you can get a listing of all readily available packages via yum list all).
Be advised though, that you might actually not need to do that, insofar as the Amazon Linux AMI has been built to be binary-compatible with the CentOS series of releases, and therefore packages built to run on CentOS should also run on the Amazon Linux AMI. [emphasis mine]
The desired package pptpd is not part of the standard repositories on CentOS either, but it is available in the 3rd-party Extra Packages for Enterprise Linux (EPEL) repository (see Letter P) - I can't comment on the viability of using this one vs. compiling your own, though.
Good luck!
Make is not installed by default on Amazon Linux AMIs. However, you can install it quite easily with yum. If you choose to only install make, you might get some errors later for other packages in the compilation process. If you are going to compile software, you might want to just install all of the development tools at once.
sudo yum groupinstall "Development Tools"
According to the documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/compile-software.html
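A quick way to confirm the tools are in place afterwards (purely illustrative):
make --version    # should now print the GNU Make version
which make        # typically /usr/bin/make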
