Set up a Hadoop cluster at home (2 PCs) - macOS

Hi, I was wondering if anyone has some recommendations for setting up my cluster. It's mainly for my own learning purposes. I am scraping news articles and want to try out some machine learning stuff for clustering etc. My data is around 1-10 GB.
At my disposal I have:
MacBook Pro with an SSD / 8 GB memory / i5 (2 cores)
MacBook Pro with an SSD / 8 GB memory / i5 (2 cores)
Desktop PC with Ubuntu / 1.5 TB of HDD space / 8 GB memory / i5 (4 cores)
My idea as of now is to use my MacBook as the master node and set up 2-4 slave nodes via VMs on my desktop PC. Maybe I can get 8-16 GB of extra memory.
I am not so much concerned about performance.
Or should I drop my idea and go the Amazon EC2 route?
Thanks in advance

If your data is less than 10 GB and performance is not a concern, the configuration you have should be good enough to run MapReduce or many of the machine learning programs. I set up a 2-node cluster at my home on laptops that are no match for your configuration (less memory, same number of cores), and I could still run my complex Hadoop jobs at a considerable pace. Instead of spending money on Amazon EC2, you can go ahead with this.
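For reference, a minimal sketch of how the pieces could be wired together (the hostnames master-mbp, worker-vm1 and worker-vm2 are placeholders, and the file names assume Hadoop 3.x, where the worker list lives in etc/hadoop/workers rather than slaves):

    # on the master, list the worker VMs
    cat > $HADOOP_HOME/etc/hadoop/workers <<EOF
    worker-vm1
    worker-vm2
    EOF

    # point every node at the same NameNode in etc/hadoop/core-site.xml
    cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<EOF
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master-mbp:9000</value>
      </property>
    </configuration>
    EOF

    # format HDFS once on the master, then start the daemons
    hdfs namenode -format
    start-dfs.sh && start-yarn.sh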

Related

Usergrid system requirements?

Question: how many resources do I need to run Apache Usergrid?
I mean hardware resources: RAM, CPU.
I want to deploy Apache Usergrid to be used as a backend in our apps. The apps have low traffic right now; they are custom projects to be used by small groups of users (<10k).
I want to know the minimum requirements to decide whether it is viable for us, thanks.
From what I see of Usergrid, I think the most resource-hungry component will be Elasticsearch, so to have a production environment that works well, I guess you should start by following ES' requirements:
At least 8 GB of RAM
At least 4 cores (the more cores you give Elasticsearch, the more love you get, as it tends to work with a lot of threading, i.e. give more cores rather than more CPU processing power)
Fast HDDs should perform fine
See this article on Elasticsearch. A last thing: depending on your system, you can tune several settings on Elasticsearch to achieve better throughput. (For instance, see https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html)
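As a small, hedged example of that kind of tuning (the index name my-index is a placeholder), settings from that guide can be applied through the standard _settings API while bulk indexing and reverted afterwards:

    # relax the refresh interval and drop replicas during a bulk load
    curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
    {
      "index": {
        "refresh_interval": "30s",
        "number_of_replicas": 0
      }
    }'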
I have deployed the latest version of Usergrid, i.e. 2.1, which is working smoothly with apache-cassandra-2.1.20, apache-tomcat-7.0.85, and elasticsearch-1.7.6 on a single Cassandra node on Ubuntu 16.04 with 8 GB of RAM and a 180 GB SSD. Hope this will help you.

Can I run Hadoop on a mid-2010 13-inch MacBook Pro?

My laptop has 4 GB of RAM, and I want to upgrade to 8 GB for Hadoop, but I am not sure whether that will work.
Yes, you can start every Hadoop component I can think of on such a machine (even with 4 GB, if you optimize well). I suggest using virtual environments for a task like this (e.g. VirtualBox).
But I am not sure whether your workload will survive (your jobs might be greedy).
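If you go the VirtualBox route, a rough sketch of carving out a small Hadoop VM from the command line (the VM name hadoop-node and the sizes are assumptions; leave a few GB of RAM for the host OS):

    # create and register an Ubuntu 64-bit VM
    VBoxManage createvm --name hadoop-node --ostype Ubuntu_64 --register
    # give it 3 GB of RAM and 2 CPUs; keep this well below the host total
    VBoxManage modifyvm hadoop-node --memory 3072 --cpus 2
    # create a 20 GB virtual disk (attach it with storagectl/storageattach or the GUI)
    VBoxManage createmedium disk --filename hadoop-node.vdi --size 20480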

With VirtualBox, how does my system's CPU map to the virtual machine?

On my Mac, I have 8 cores working when I run htop.
I can use modifyvm to give my virtual machine 8 cores as well, but is it a one-to-one mapping?
Thus, if I set 4 virtual cores, do I have the power of 4 physical cores, or do I have the power of 8 distributed over 4? Hence, does it make sense to have multiple cores in a VM for overall performance, or does it only make sense if you have processes that do not work well without parallelisation?
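For reference, the knobs being referred to (the VM name my-vm is a placeholder; the VM must be powered off to change these settings):

    # assign 4 virtual CPUs to the VM
    VBoxManage modifyvm "my-vm" --cpus 4
    # optionally cap how much of each host core the VM may use (percent)
    VBoxManage modifyvm "my-vm" --cpuexecutioncap 80
    # confirm the current CPU settings
    VBoxManage showvminfo "my-vm" | grep -i cpu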

Unreal Engine complains about needing 8 GB of memory even though I have 8 GB of memory

I have a Mac running OS X with 8 GB of 1600 MHz DDR3 RAM, and when I start Unreal Engine it complains about low RAM and says that for best performance I should have at least 8 GB of RAM.
I'm not sure why this is the case; is it possible that it's getting a lower share of the RAM or something similar?
The Unreal Editor version is 4.7.5.
Edit: The processor is a 2.3 GHz Intel Core i7, and it is 64-bit.
Edit: The graphics processor is an Intel Iris Pro with 1024 MB.
Why would UE4 not be able to access all of your RAM? I do not consider that case very likely.
Also, I would not worry about the RAM usage of UE4 too much. I do a lot of work with the engine and it rarely uses more than 3GB of RAM. Just make sure that your system as a whole has enough RAM for the running processes to prevent swapping.
Anyway, the bottleneck in your system is probably the graphics processor, so if your engine runs too slowly you should reduce the performance setting inside the engine.

CPU utilization of virtual machine

I have one physical machine which has 4 CPUs, and I want to run some VMs on it. The goal of my work is to find the CPU utilization, but I am confused about how the CPU usage of the VMs and of the physical machine are related. Is there a relation between the CPU utilization of the VMs and that of the physical machine? How should I measure the CPU utilization of each VM? What is the CPU utilization of the physical machine?
If you are using any Xen-enabled hypervisor, you can use xenmon or xentop in your Dom0 (the physical machine) to check the utilization and performance of your VMs.
You can do so by typing xentop (it is /usr/sbin/xentop in my case) on the command line, which will give you all the information you are looking for. Alternatively, you can use the xenmon -l command (the /usr/sbin/xenmon.py Python script in my case), which shows live information about all your VMs.
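For example (these are the standard xentop flags; exact paths and options may vary with your Xen version):

    # batch mode: print utilization for every domain three times, two seconds apart
    xentop -b -d 2 -i 3
    # live, per-domain monitoring of CPU activity via xenmon
    /usr/sbin/xenmon.py -l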
