Can I run Hadoop on a mid-2010 13-inch MacBook Pro? - hadoop

My laptop has 4 GB of RAM, and I want to upgrade to 8 GB for Hadoop, but I am not sure whether that will work.

Yes, you can start every Hadoop component I can think of on such a machine (even with 4 GB, if you tune it well). I suggest using a virtual environment for this (e.g. VirtualBox).
However, I am not sure your workload will fit, since your jobs might be resource-hungry.

Related

System too slow after starting minikube (4 GB RAM + Core i3)

I have a 7th-generation Core i3 processor and 4 GB of RAM in my system. I installed VirtualBox and then ran minikube start in my shell.
As soon as minikube starts, the system slows down heavily and hangs at the drop of a hat. I am learning Kubernetes and want to use YAML files to deploy and learn, which I can't do in the playground.
As soon as I delete minikube, the system comes back to life.
So I have two questions. First, is the issue with the RAM or with the Core i3? The prerequisites for minikube list 2 CPUs. Does that mean minikube will take two CPUs for itself and the host will have none?
What is causing the issue?
Second, is there any other way I can learn Kubernetes without minikube? The playground doesn't provide a way to add YAML files.
The machine probably does not have enough memory, so it starts paging/swapping. Too little CPU is a problem too, of course.
If you are using Windows, try stopping all other programs and Docker Desktop while using minikube (minikube uses 2 GB of RAM by default).
Try upgrading to at least 8 GB of RAM when working with virtual machines (minikube and Docker Desktop both create VMs for the Linux environment).
If you are using a Linux machine, use the docker driver for minikube.
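To make the driver and resource settings explicit, here is a minimal Python sketch that only builds the `minikube start` command line. The helper name `build_minikube_start` and its default values are my own assumptions for illustration; `--driver`, `--memory` (in MB) and `--cpus` are standard minikube flags, but check `minikube start --help` on your version before relying on them.

```python
import shlex


def build_minikube_start(driver="docker", memory_mb=2048, cpus=2):
    """Return the argument list for a `minikube start` invocation.

    The defaults are assumptions for illustration: the docker driver
    (suggested above for Linux hosts), 2 GB of RAM and 2 CPUs.
    """
    return [
        "minikube",
        "start",
        f"--driver={driver}",
        f"--memory={memory_mb}",
        f"--cpus={cpus}",
    ]


if __name__ == "__main__":
    # Only print the command; run it yourself once the values look right.
    print(shlex.join(build_minikube_start()))
```

Printing instead of executing keeps the sketch harmless on a machine that is already short on memory. To actually start the cluster, pass the list to `subprocess.run`.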

Too little RAM in Kaa Server

I want to run a test with Kaa, so I tried to install the sandbox on my laptop. It has only 4 GB of RAM, so when I try to set up the virtual machine, the system won't let me allocate more than 1.6 GB and the VM won't start.
So I tried another, older laptop: I installed Ubuntu 16.04 and followed all the step-by-step instructions on the Kaa project's website. The installation worked, but the server won't start. The error log says the problem is in the Java virtual machine, which can't start because the machine has only 2 GB of RAM. I only need to test a small application, so is it possible to change this requirement in the Java virtual machine and start the system?
PS: I can't buy more RAM.
I recommend using Amazon AWS. The basic installation on which you can run Kaa is free for one year, and it runs quite well there.

How to install pyspark & spark for learning purpose on a laptop with limited resources?

I have a Windows 7 laptop with 6 GB of RAM. What is the most RAM/resource-efficient way to install PySpark and Spark on this laptop, just for learning purposes? I don't want to work on actual big data; a small dataset is ideal, since this is just for learning PySpark and Spark in general. I would prefer the latest version of Spark.
FYI: I don't have Hadoop installed.
Thanks
You've basically got three options:
Build everything from source
Install Virtualbox and use a pre-built VM like Cloudera Quickstart
Install Docker and find a suitable container
Getting everything up and running when you choose to build from source can be a pain. You've got to install the JDK, build Hadoop and Spark (both of which require you to install additional software to build them), set up a bunch of environment variables, and then pray that you didn't mess anything up.
VMs are nice, particularly the one from Cloudera, but you'll often be stuck with an older version of Spark and it might be tight with the resources you described.
I'd go with Docker.
Once you've got docker installed, it becomes very easy to try Spark (and lots of other technologies). My favorite containers for playing around use ipython or jupyter notebooks.
Install Docker:
https://docs.docker.com/installation/windows/
Jupyter Notebook Python, Spark, Mesos Stack
https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook
One thing to keep in mind is that you are going to have to allocate a certain amount of memory for the VM and the remaining memory still has to operate Windows. Windows 7 requires a minimum of 1 GB for a 32-bit OS or 2 GB for a 64-bit OS. So likely you are only going to wind up with around 4 GB of RAM for running the VM, which is not much.
Assuming you are 64-bit, note that Cloudera requires a minimum of 4 GB RAM to run CDH 5, but if you want to run Cloudera Express, you need 8 GB.
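The arithmetic above can be written down as a small Python sketch. The function names are made up for illustration, and the numbers are just the ones quoted in this answer (6 GB laptop, 2 GB minimum for 64-bit Windows 7, 4 GB for CDH 5, 8 GB for Cloudera Express); treat it as a rough budget check, not a sizing tool.

```python
def vm_budget_gb(total_ram_gb, host_os_reserve_gb):
    """RAM left for a VM after the host OS takes its minimum share."""
    return max(total_ram_gb - host_os_reserve_gb, 0)


def fits(budget_gb, requirement_gb):
    """True if the VM budget meets the stated minimum requirement."""
    return budget_gb >= requirement_gb


if __name__ == "__main__":
    budget = vm_budget_gb(total_ram_gb=6, host_os_reserve_gb=2)
    print("VM budget (GB):", budget)
    print("CDH 5 minimum (4 GB) fits:", fits(budget, 4))
    print("Cloudera Express (8 GB) fits:", fits(budget, 8))
```

Note that the 2 GB reserve is only the stated OS minimum, so in practice the budget is tighter than this suggests.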
Running Docker from Windows will require you to use boot2docker, which keeps the entire VM in memory. It uses minimal memory (like around 27 MB) to run, so you should be fine there. A MUCH better solution than running VirtualBox!
Another option to consider would be to spin up a free machine on something like Amazon Web Services (http://aws.amazon.com) or Google Cloud (http://cloud.google.com). Particularly with the latter, you can get a free trial amount of credits, which you could use to spin up a machine with more RAM than you would typically get with AWS.

Setup multinode Hadoop cluster using virtual machines on my laptop

I have a Windows 7 laptop and I need to set up a multinode Hadoop cluster on it.
I have the following things ready:
Virtualization software, i.e. VirtualBox and VMware Player.
Two virtual machines, i.e.
Ubuntu - for the Hadoop master and
Ubuntu - for (1X) Hadoop slave
Has anyone set up such a cluster using virtual machines on a laptop?
If yes, please help me install it.
I've searched on Google, but I can't work out how to configure a multinode Hadoop cluster using VMs.
How do I run two Ubuntu guests on Windows 7 using VMware or VirtualBox?
Should we use VM images with the same Ubuntu version, or images with different versions of Ubuntu Linux?
Yes, you can use two Ubuntu nodes. I am using five nodes (1 master, 4 datanodes).
If you want to install a multinode cluster in VMware:
Download Ubuntu from this link: http://www.ubuntu.com/download/desktop
Install it on two machines, then install Java and OpenSSH on each.
Then download the multinode shell script from this link:
https://github.com/tonyreddy/Apache-MultiNode-Insatallation-Shellscript
And try it.
All the best.
Since you're running Hadoop on your laptop, you're presumably doing it for learning purposes, building a POC, or functional debugging.
Instead of going through the hassle of installing and setting up Hadoop and related big-data software, you can simply install a pre-configured pseudo-distributed VM.
Some good options are:
Cloudera QuickStart VM
Hortonworks Sandbox
I've been using Cloudera's VM on my laptop for quite some time now and it's been working great.
Cloudera and Hortonworks are the fastest way to get up and running.
Make sure your laptop has enough RAM for the host operating system as well as the VMs; otherwise your laptop will often restart abruptly while you use the virtual machines.
Let me give you an example:
If you are using Windows 10, it needs 3-5 GB of RAM to run smoothly.
This means that if you load a 5 GB virtual machine into RAM, Windows may crash when it cannot find enough RAM to operate.
For smooth operation, you should upgrade the RAM from 8 GB to 12 GB or, better, 16 GB.
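Here is that example as a quick Python sketch, using only the figures from this answer (a 5 GB VM, a host that needs 3 to 5 GB, and 8/12/16 GB of installed RAM). The function name `host_headroom_gb` is hypothetical, and the model is deliberately crude: it just subtracts.

```python
def host_headroom_gb(total_ram_gb, vm_sizes_gb, host_need_gb):
    """RAM left once the VMs and the host OS have their share.

    A negative result means a shortfall, i.e. the host will start swapping.
    """
    return total_ram_gb - sum(vm_sizes_gb) - host_need_gb


if __name__ == "__main__":
    vm_sizes = [5]  # one 5 GB VM, as in the example above
    for total in (8, 12, 16):
        best = host_headroom_gb(total, vm_sizes, host_need_gb=3)
        worst = host_headroom_gb(total, vm_sizes, host_need_gb=5)
        print(f"{total} GB installed: headroom between {worst} and {best} GB")
```

For a multinode setup, put one entry per VM in `vm_sizes` (for instance a master and a slave) and the same check applies.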
Hope it helps

Better performance from Windows VirtualBox guests on Ubuntu, or from Ubuntu VirtualBox guests on Windows?

I am planning to develop an automated test solution with multiple Windows machines and multiple Ubuntu machines that perform related/interdependent tasks. To start the project, I'd like to have one or two Windows machines (virtual) and a few Ubuntu machines (virtual) running on a single desktop. It seems likely that I will be pushing a single desktop to the limit here, so I am trying to guess whether I will have better luck if my host OS is Ubuntu or Windows 7. I would be able to use the host OS as one of the machines in my environment. The desktop is some sort of above-average Dell, but nothing really impressive.
Does anyone have any insight here? I've worked mostly with VMware in the past, and my host was Windows along with my VMs.
Note: VirtualBox is a type-2 hypervisor (it runs on the host OS, not on the hardware like a type-1 hypervisor) and tends to offer weaker performance than, for example, Hyper-V, ESX or XEN (type-1 hypervisors).
Therefore, if performance is a considerable concern, you may squeeze more juice out of a Win8 or Windows Server 2012 box running, for example, Hyper-V. Further reading on this here and here (YMMV).
How your environment will run when hosted by a Windows vs. a Linux box is, frankly, impossible to tell. I suggest you build your VMs, dual-boot your machine into Windows and Linux, and measure your scenario on each. Be sure to have enough RAM in the host to allocate enough working RAM to each VM, and enough IO throughput that your host doesn't end up dragging down the perf of all VMs if one VM saturates the machine's IO.
One last note of caution though: Don't completely trust fine-grained perf results measured on VM's - even the best hypervisors cannot truly replicate the perf' characteristics of code running on bare-metal. Treat your measurements as a guideline only.
Measure, then measure again. Measure again just to be sure ... and THEN tweak your config and re-measure, measure, measure ;)
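If you want something concrete to start from, here is a minimal Python sketch of "measure several times and look at the spread". The `measure` helper and `sample_task` workload are placeholders I made up; swap in whatever your test scenario actually does inside each VM.

```python
import statistics
import time


def measure(task, repeats=5):
    """Run `task` several times and summarize the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return {
        "runs": timings,
        "median": statistics.median(timings),
        # stdev needs at least two samples
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0.0,
    }


def sample_task():
    """Hypothetical stand-in workload; replace with your own scenario."""
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    result = measure(sample_task)
    print(f"median {result['median']:.4f}s, stdev {result['stdev']:.4f}s")
```

Run it under each host OS with the same VM configuration. A large spread between runs is a hint that the numbers are too noisy to compare.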
My $0.02:
If it's VirtualBox you are using, I would go with Ubuntu for certain. I have an AMD 945 Phenom with 16 GB of RAM running 12.04 LTS 64-bit. I can usually have 2 VMs running Windows and/or Ubuntu guests and never consume more than 7 GB of RAM. If you're running them in a testing solution, you could probably expect to see 12, maybe 13 GB of RAM in use, but the CPU might be your problem. My AMD Phenom runs great, but it would be maxed out for sure. I use VMware at work and on my laptop and would recommend it if you were running a Windows host. I also have VMware on my Ubuntu host, but it just does not run as well as it does on Windows, at least for me.
