Hadoop installation using Cloudera VMware - hadoop

Can anyone please let me know the minimum RAM required (of the host machine) for running Cloudera's hadoop on VMware workstation?
I have 6GB of RAM. The documentation says that the RAM required by the VM is 4 GB.
Still, when I run it, the CentOS is loaded and the VM crashes. I have no other active application running at the time.
Are there any other options apart from installing hadoop manually?

You may be running into your localhost running out or memory or some other issue preventing the machine from booting completely. There are a couple of other options if you don’t want to deal with a manual install:
If you have access to a docker environment try the the docker image they provide.
Run it in the cloud with AWS, GCE, Azure, they usually have a small allotment of personal/student credits available.
For AWS, EMR also makes it easy for you to run something repeatedly.
For really short durations, you could try the demo from Bitnami (https://bitnami.com/stack/hadoop) and just run whatever you need to there.

Related

Too little RAM in Kaa Server

I want to run a test with KAA, so I was trying to install the sandbox in my laptop but it has only 4GB in RAM, so when I try to set up the Virtual Machine the system won't let me set up over 1,6GB and the VM won't start.
So I was trying to install in other old laptop so I installed Ubuntu 16,04 and I followed all the step by step instructions in Kaaproyect's WEB. I could do it, but when I try to start the server can't do it. I was checking the Log error and say me that the problem is in the Java's Virtual machine, can't start because only have 2GB in RAM. I need to test a Little application so is it possible change this requirement in the Java machine and start the system?
PS: I can't buy more Ram.
I recommend you to use amazon AWS. The basic installation where you can run Kaa is free for one year, and it runs quite well there.

Hadoop Hortonworks servers down

I installed Hortonworks Hadoop system on my MacBook yesterday. Everything was fine, servers were working but I turned off and turn on again Virtual box today and try to connect Hadoop I saw so many servers are not working(MapReduce, Hive Yarn..).
It might be a silly question but I am so new. Why happened? When I turn on the virtual box Do I have to wait to see anything?
Here is the picture of my Dashboard
If everything is OK then probably you should check your virtual programs (virtual box or VMWare).
Virtual box is slower than VMWare. When you try to connect Hadoop database and you should see the screen that says you can go http://127.0.... then you can connect otherwise you can see your systems are down.

Mesos slaves on windows server 2012r2, what are my options?

I have a cluster of machines running windows server 2012R2.
I would like to manage them with mesos.
To the best of my knowledge, microsoft is actively contributing to mesos (DC/OS) and will support containers natively on windows server 2016. Furthermore, it looks like there is another type of container flavour using hyper-v.
I can run my mesos masters on linux hosts. However I need my slaves on windows server 2012R2 hosts. It is not clear to me which technologies are already available (and production-ready) for my windows server version.
What are my options to use mesos to manage the resources of my windows server machines ?
Is the mesos-agent for windows (server 2012 R2) production ready ?
Can I use containers (hyper-v or docker) ? If not, is the resource isolation working in Windows (in linux you can use cgroups) ?
Can I run any framework I like or there are some not compatible with windows ?
Mesos version 1.0.0 was recently released that allows you to run the slave and launcher on windows. Not the master unfortunately. Its still Linux, but it doesn't really ever need to be Windows? The slave was the important bit for bringing Windows machines into the Mesos domain.
I've just been investigating using the Mesos-Slave on windows. Pleased to say that it appears to be working OK (this opinion is subject to change as I'm still testing it). Production ready is something any business would have to decide for themselves.
Mesos have always had their own isolation technology, interestingly they have redone their own containerizer implementation and this now takes a number of container image formats, so you can use your Docker images as well as a few others, so this is going to suit you. There was a good presentation on this at MesosCon https://www.youtube.com/watch?v=rHUngcGgzVM
Docker's been stealing the show to some extent. But if you use Mesos-Agent, Windows 2016 and its container technology (Docker) isn't needed and therefore it should run on Windows 2012. I've not got around to trying this yet but its definitely a test worth trying, it opens up deployment options. Anyone?
One thing to remember about containers, they are not VM's. The guest image must be a derivative of the hosts OS, you can't run a Linux image on a Windows machine. Causing me a headache, I can't use servernano at the moment, so my image sizes are 4Gb+, the initial deploy time is hours.

How to install pyspark & spark for learning purpose on a laptop with limited resources?

I have a windows 7 laptop with 6GB RAM . What is the most RAM/resource efficient way to install pyspark & spark on this laptop just for learning purpose. I don't want to work on actual big data but small dataset is ideal since this is just for learning pyspark & spark in general. I would prefer the latest version of Spark.
FYI: I don't have hadoop installed.
Thanks
You've basically got three options:
Build everything from source
Install Virtualbox and use a pre-built VM like Cloudera Quickstart
Install Docker and find a suitable container
Getting everything up and running when you choose to build from source can be a pain. You've got to install the JDK, build hadoop and spark (both of which require you to install additional software to build them), set up a bunch of environment variables and then pray that didn't mess anything up.
VMs are nice, particularly the one from Cloudera, but you'll often be stuck with an older version of Spark and it might be tight with the resources you described.
I'd go with Docker.
Once you've got docker installed, it becomes very easy to try Spark (and lots of other technologies). My favorite containers for playing around use ipython or jupyter notebooks.
Install Docker:
https://docs.docker.com/installation/windows/
Jupyter Notebook Python, Spark, Mesos Stack
https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook
One thing to keep in mind is that you are going to have to allocate a certain amount of memory for the VM and the remaining memory still has to operate Windows. Windows 7 requires a minimum of 1 GB for a 32-bit OS or 2 GB for a 64-bit OS. So likely you are only going to wind up with around 4 GB of RAM for running the VM, which is not much.
Assuming you are 64-bit, note that Cloudera requires a minimum of 4 GB RAM to run CDH 5, but if you want to run Cloudera Express, you need 8 GB.
Running Docker from Windows will require you to use boot2docker, which keeps the entire VM in memory. It uses minimal memory (like around 27 MB) to run, so you should be fine there. A MUCH better solution than running VirtualBox!
Another option to consider would be to spin up a free machine on something like Amazon Web Services (http://aws.amazon.com) or Google Cloud (http://cloud.google.com). Particularly with the later, you can get a free trial amount of credits, which you could use to spin up a machine with more RAM than you would typically get with AWS.

Setup multinode Hadoop cluster using virtual machines on my laptop

I have a windows 7 laptop and I need to setup hadoop (mutlinode) cluster on it.
I have the following things ready -
virtual softwares, i.e. virtualbox and vmware player.
Two virtual machines, i.e.
Ubuntu - for Hadoop master and
Ubuntu - for (1X) Hadoop slave
Has anyone done a setup of such a cluster using Virtual machines on
your laptop ?
If yes please help me to install it.
I've searched over google but I am not getting how to configure this multi-node cluster on hadoop using VMs?
How to run two Ubuntu OS on windows 7 using VMware or virtualbox?
Should we use same Ubuntu version VM image or
vm images with different versions of Ubuntu linux?
Yes you can use ubuntu two node. I am using five nodes(1 master, 4 datanodes).
If you want install multi node in vm ware.
Just download ubutnu from this link: http://www.ubuntu.com/download/desktop
And install two machine. And install java and openssh.
And download shell script for multinode from this link::
https://github.com/tonyreddy/Apache-MultiNode-Insatallation-Shellscript
And try it .....
All the best............
Since you're running Hadoop on your laptop, obviously you're doing it for learning purposes or building POC or functional debugging.
Instead of going through the hassles of installing and setting up Hadoop and related Big-Data softwares, you can simply install a pre-configured pseudo-distributed VM.
Some good options are:
Cloudera QuickStart VM
Hortonworks Sandbox
I've been using the Cloudera's VM on my laptop for quite sometime now and it's been working great.
Cloudera and Hortonworks are the fastest way to get it up and running.
Make sure you have enough RAM installed on your laptop for the Operating system already running, else your laptop will restart abruptly often while you use the Virtual machines.
Let me give you an example -
If you are using Windows 10, it needs 3-5GB RAM to be used to work smoothly,
This means if you load a Virtual Machine of 5GB size in your RAM, Windows may crash when it does not find enough RAM to operate.
You must upgrade the RAM from 8GB to 12GB or best 16GB for smooth operation of your laptop.
Hope it helps

Resources