Hadoop on Windows [closed] - windows

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
I am a Microsoft Windows user and new to Apache Hadoop. Most of the Hadoop tutorials I found are Linux based so I am finding it difficult to learn Hadoop on Windows. Any pointers to learn Hadoop on Windows would be best.

If your primary objective is to learn Hadoop then it does not matter you learn it on Windows or Linux, because everything is exactly same on both platforms. I have extensively used Hadoop on both platform and found all the commands and processing are identical on Windows and Linux. So here are my suggestions:
Download VMware VMPlayer on your Windows Machine
Download CDH Virtual Machine for VMware
https://ccp.cloudera.com/display/SUPPORT/Downloads
Access virtual machine in your Windows box and follow the tutorials exactly they are on Linux.

You can also try "Syncfusion BigData Studio" to run a single node cluster in your local machine along with required ecosystems installed with it; Also "Syncfusion BigData Platform" allows you to manage clusters in a much easy way without any manual configuration from user's end; These 2 setups are "100% free for everyone";
To download setup and know more, please refer: http://www.syncfusion.com/products/big-data

You can also try Amazon Elastic MapReduce, this is more efficient if you don't have any linux experience.

I managed to port Hadoop-1.0.1 on windows-7, cygwin-1.7, jdk1.7_x64.
but it's not for beginners: you will need to patch and recompile hadoop.
http://sourceforge.net/p/win-hadoop/wiki/Hadoop-on-Cygwin/

I use Hadoop natively on Windows as a virtual 2-node cluster running on one machine. It runs inside Cygwin (so no VM). Works well to try Hadoop out and I still use it to test new code in small before putting it on the cluster. You basically get every bit of functionality as with a full cluster. Getting it to work can be a bit tricky though.
I used the following short guide: Stanford Hadoop for Windows guide
Which worked fine for me. Very important is that you use 0.20.0! Higher version do not run under Cygwin. I think it is best to leave the number of default nodes to 2. This way you can test if splitting the work across multiple nodes works, but more simultaneous nodes can give you memory problems.

With the latest release of Hadoop 2.2 I see that the release notes mentions that this version has significant improvements for running Hadoop on Windows. I downloaded Hadoop 2.2 yesterday and I saw lot of .cmd file alon with .sh files which ensures that this version has scripts and batch files for running Hadoop on Windows environment. However while looking at the Apache Hadoop documentation I couldn't find any step-by-step instructions on how to install and run this newer version on Windows. Besides this it looks like that the newer version has YARN architecture embedded in it and the old configurations provided on some of the tutorials online may be outdated and not applicable anymore. Is there any good documentation for Hadoop 2.2 available online ? I want it specifically for running Hadoop under Win

I compiled Hadoop on Windows and it's released as zetabyte's "Apache Hadoop for Windows". There is a core/common package available and also a package with a GNU environment (bash, etc.)

Related

Suitable hadoop framework for ubuntu

I want to start working with Hadoop and BigData. I need an easy graphical interface to start. I try Hue but I couldn't get it configured.
Please help me to choose my suitable Hadoop.
I use Ubuntu 14.04.
I think Cloudera,sandbox(by hortonworks) is a easy way.Hard way is installation to Ubuntu.Also i have ubuntu 14.04 and Hadoop(hive,pig),Apache spark exist and i dont need open virtual machine.
There are 3 major Hadoop distributions that you can start with.
Cloudera
Hortonworks
MapR
Each one of them has a UI installer and manager. I think the best for you would be though, to use the virtual environment that these vendors provide.
The Hortonworks Developer Sandbox is an image including Hue as UI to get started. However, the downloadable sandbox image is based on CentOS.
If you want to install a Hortonworks Distribution on Ubuntu, you need to run an Ambari installation (Downloads - Hortonworks Hadoop). Be aware that Hue is not included into the default Ambari installation, but Hue can be installed easily separately. To run properly, Hue on Hortonworks still needs Python 2.6.x.
There are some distributions like Cloudera or Hortonworks but their package needs high machine configuration. For example RAM + 16GB and sometimes it's not possible for the user. In addition, they include some Hadoop related project that user doesn't need at all. If you want to enter this field seriously I strongly recommend installing Hadoop on your own. Doing that you do some configuration and will get familiar with many Hadoop concepts.
You can start using this install tutorial.

scidb installation on single debian server

I would like to try scidb as a replacement for hdf5. I would like to test it on my Debian laptop (no clusters) to give it a try.
Is this possible? Might be that Debian (as opposed to Ubuntu) is not supported?
I had no luck with the installation instructions. The deployment script tells that my OS is not supported. The scidb userguide says about some pre-built packages (for Ubuntu, at least). But there is no hint on how to obtain them.
SciDB is limited to RedHat / CentOS, and to Ubuntu as of the 14.9 release. Folk who want to run it on other distros generally compile from code.
Information about how to obtain the sources (as well as current documentation and community discussion) can be found on the forums here ... http://www.scidb.org/forum/. You'll need to register as a forum user.
Specifically, have a look at http://www.scidb.org/forum/viewtopic.php?f=16&t=364. There's a list of releases and links to code bundles there.
I installed SciDB several times using several ways (building from sources and installing from packages, installing the cluster version and the dev version).
Installation from packages
First, if you choose to install from packages (the easiest and fastest way), SciDB is very very sensitive about your Linux version. For example, for the last version of SciDB (14.8), if you choose to install on a Ubuntu, it has to be a Ubuntu 12.04 (and not a 14.04, a common mistake) 64 bits (meaning you have to install the AMD64 version even if you have an Intel processor). It won't work if you have a different version.
If you have an Ubuntu 12.04 AMD64, Paradigm4 provides a deployment script and a documentation with very simple steps:
https://github.com/Paradigm4/deployment
Installation from sources
It's not so difficult but it can be painful and time consuming. I did it because we had to compile a custom plugin for SciDB. You have two types of installation: dev install (in SciDB user directory) and cluster install (in /opt/ directory).
You have to be registered on their forum to have the link to the source code. They provide a specific documentation to build from source.
Good luck.
Several months ago I have dealt with porting SciDB 14.12 to an unsupported Linux - Fedora 19. If your OS is not supported, it will neither be supported if you try to install from the sources. You have to start from the sources, but then you have to adapt the deployment and installation scripts. The sources can be downloaded from SciDB forum.
Namely, add a new platform to deployment/common/os_detect.sh. Then, there are multiple platform specific deployment scripts, such as deployment/common/prepare_toolchain.sh, deployment/common/prepare_coordinator.sh and deployment/common/prepare_chroot.sh. You need to make sure those prepare the environment as they would on the supported OS'. I used Red Hat 6 and CentOS 6 as a reference, as those are both more similar to Fedora. Since your OS is Debian, you can first try falling back to Ubuntu deployment (in os_detect.sh).
Another problem you may encounter are the 3rd party tools, specially Boost. In my case, I had to build it manually from sources.
Sometimes when porting and debugging it is not convenient to run the scripts with deploy.sh, but it's better to run the deployment scripts directly on the target machine (e.g. coordinator).
Probably the best way to install and to start with SciDB is to download a standard image. With this image you only have to import the virtual machine with a software to virtualize. Moreover there are some characteristics of this virtual machine that are great to develop your first applications.
The main advantage, is that you have an API to SciDB queries and another to R. Then you can explore all options and to test SciDB.
This is the version that I downloaded few months ago: http://www.paradigm4.com/forum/viewtopic.php?f=14&t=1329&sid=606f614e401900cfa750375ba56de656
Nevertheless, there is a problem, the community is too poor. There are little people developing with SciDB.

How to install windows updates on multiple clients using YUM on CentOS server

I need to do a installation of windows updates (OS and Microsoft Security Essentials) on multiple clients using Cent-OS Server. I'm not very familiar with Linux systems and I cant find an appropriate tutorial On the internet.
Give OPSI a try, this is an OpenSource Deployment Solution which works on CentOS:
http://www.opsi.org/
This is an integrated system to deploy full installation as well as simple updates or rollouts.
yum installs RedHat/CentOS/Fedora RPM packages on RedHat/CentOS/Fedora systems. It doesn't have anything to do with Windows. It doesn't understand exe files or anything like that.
I'm not even sure where to begin to understand what the question you are actually trying to ask is... unless your question is really just as confused as it sounds and you are failing to understand the difference between package managed linux systems and Windows systems.

Documentation for installing and running hadoop 2.2 on Windows [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Improve this question
With the latest release of Hadoop 2.2 I see that the release notes mentions that this version has significant improvements for running Hadoop on Windows. I downloaded Hadoop 2.2 yesterday and I saw lot of .cmd file alon with .sh files which ensures that this version has scripts and batch files for running Hadoop on Windows environment. However while looking at the Apache Hadoop documentation I couldn't find any step-by-step instructions on how to install and run this newer version on Windows. Besides this it looks like that the newer version has YARN architecture embedded in it and the old configurations provided on some of the tutorials online may be outdated and not applicable anymore. Is there any good documentation for Hadoop 2.2 available online ? I want it specifically for running Hadoop under Windows.
If we directly take the binary distribution of Apache Hadoop 2.2.0 release and try to run it on Microsoft Windows, then we'll encounter ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path.
The binary distribution of Apache Hadoop 2.2.0 release does not contain some windows native components (like winutils.exe, hadoop.dll etc). These are required (not optional) to run Hadoop on Windows.
So you need to build windows native binary distribution of hadoop from source codes following "BUILD.txt" file located inside the source distribution of hadoop. You can follow the following post as well for step by step guide with screen shot
Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS
i was searching for the same thing. I found hortonworks interesteing. They dedicated themselves to installing hadoop on windows. I tried it out, but i still get errors on launching all the services. The only advice i have received in their forum was to delete everything installed and reinstall the whole stuff. I haven't done that yet.
As a prerequesite, you will need to have your windows running on 64 bit.
Try that out and let me know if it worked on your site. There IS A STEP-TO-STEP guide on the hortonworks website.
Cheers Jan

Is there a good online tutorial for Hadoop development on a Windows 7 machine? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm getting stumped by the HDFS section (Module 2) and think it might be easier if I had a Windows specific tutorial. I tried following this one, but some of the steps weren't quite right. I've been trying to find a good tutorial that will work for me on my Windows 7 machine, but am a bit stuck. Is there a good place to go for this? Hadoop seems to be very geared toward Linux users, and unfortunately I have to use my work laptop, which is Windows 7. Can I make this work or does it really only work for Linux users?
The Hadoop tutorial on the Yahoo Developer Network is outdated and problematic. Half of the steps didn't work for me at all (I was running their image in VMware Player on Windows 7), and the other half were vague. The Java code examples were poorly written and wouldn't compile. At any rate, they are written for the old Hadoop API.
I gave up on that tutorial and instead used the Cloudera Demo VM image. This comes pre-configured with Hadoop, Pig, Hive, HBase, etc. I was in business at once and had no problems compiling and running Hadoop jobs and Pig scripts.
The Cloudera Demo VM downloads on their main support page (https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM) are all 64-bit. If you are looking for a 32-bit version like I was, you can get one here: https://downloads.cloudera.com/cloudera-demo-0.3.7.vmwarevm.tar.bz2
This one has a slightly older version of the Cloudera distro (CDH3u0) running on Ubuntu 10.10 with Gnome desktop. I installed Eclipse for compiling my Hadoop jobs, but didn't bother trying to install the Hadoop plugin, which I've heard is problematic. The first time around, I made the mistake of accidentally updating the Cloudera distro to CDH3u3 via the system's Update Manager and this messed up my Hadoop configuration. I didn't know how to reconfigure it properly, so I just started over from the original image.
To get Pig running, you need to first set the JAVA_HOME variable: export JAVA_HOME=/usr/lib/jvm/java-6-sun
Unfortunately, I wasted a ton of time with that old YDN tutorial before a Java developer friend familiar with Hadoop pointed me to the Cloudera distribution.
I was completely new to hadoop and honestly I found the cloudera tutorials and information completely unhelpful. Give the IBM ones a shot, they're super helpful and they are very friendly for beginners. Step by step instructions for pretty much all of the core hadoop applications and a few specific to IBM's distro.
Here's the download link. --
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-ibmibqsevmw&S_TACT=109HF38W&S_CMP=109HF
You have to make an account but it's free and doesn't take that long.
I can't post more than one link right now but is pretty easy to find the tutorials online and they also exist within the VM.
Also there's a forum that I've posted my questions on when I get stuck and somebody from IBM has always helped me out within an hour to a day. Cant post the link but if you google "IBM InfoSphere BigInsights Forum", its the first hit.
Good Luck!
I am trying to learn Hadoop right now also and what I did was download virtual box ( http://www.virtualbox.org/ ) and load some linux images on it and started following tutorials.
You can even get a pre-made hadoop setup image from cloudera. I think this approach is far better than installing and setting up on your prime machine because in the event there's a problem you're main machine won't be effected(you can simply revert to an old copy of your virtual linux image or scrape it and start again without any impact).
Good luck!
Developing Hadoop on windows is doable but hard to get right. It requires installing Cygwin and getting all the environment variables right can be tricky.
To get started developing on windows I recommend installing vmware player and run the pre configured virtual machine by Cloudera. This simply means you will be doing the Hadoop development in linux without rebooting or reinstalling your windows system and without the installation troubles assiciated with cygwin.
https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM
I've been banging my head against the yahoo tutorial for a long time as well. The Eclipse plugin is no longer maintained and is pretty unreliable. Hopefully the cloudera image will do the the trick.
I have just finished the "Hadoop Fundamentals I - Version 2 " at http://bigdatauniversity.com.
It comes with IBM BigBisunessInsight VMWare images and works very well.
The images include a local mode one and a cluster mode one. It is able to simulate a multiple nodes cluster in my Windows 8 workstation with 8GB RAM.
Hope this information be helpful:-)

Resources