Documentation for installing and running Hadoop 2.2 on Windows [closed]

With the latest release of Hadoop 2.2, I see that the release notes mention significant improvements for running Hadoop on Windows. I downloaded Hadoop 2.2 yesterday and saw a lot of .cmd files along with the .sh files, which suggests that this version ships scripts and batch files for running Hadoop in a Windows environment. However, while looking through the Apache Hadoop documentation I couldn't find any step-by-step instructions on how to install and run this newer version on Windows. Besides this, it looks like the newer version has the YARN architecture built in, so the old configurations provided in some of the tutorials online may be outdated and no longer applicable. Is there any good documentation for Hadoop 2.2 available online? I want it specifically for running Hadoop under Windows.

If we take the binary distribution of the Apache Hadoop 2.2.0 release as-is and try to run it on Microsoft Windows, we'll encounter ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path.
The binary distribution of the Apache Hadoop 2.2.0 release does not contain some Windows native components (like winutils.exe, hadoop.dll, etc.). These are required (not optional) to run Hadoop on Windows.
So you need to build a Windows native binary distribution of Hadoop from the source code, following the "BUILD.txt" file located inside the Hadoop source distribution. You can also follow the post below for a step-by-step guide with screenshots:
Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS
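For reference, here is a minimal sketch of what that build boils down to, based on the Windows section of BUILD.txt (you still need the prerequisites it lists: a JDK, Maven, Protocol Buffers 2.5.0, CMake and the Windows SDK; the JDK path below is just an example):

    REM Run from the "Windows SDK 7.1 Command Prompt", inside the extracted hadoop-2.2.0-src directory.
    REM JAVA_HOME must point to a JDK installed at a path without spaces (example path, adjust to yours).
    set JAVA_HOME=C:\Java\jdk1.7.0_45
    set Platform=x64
    mvn package -Pdist,native-win -DskipTests -Dtar
    REM The Windows-ready package (including winutils.exe and hadoop.dll) ends up under hadoop-dist\target.

Once the build finishes, extract the generated package somewhere convenient and point HADOOP_HOME at it; the winutils error then goes away.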

I was searching for the same thing. I found Hortonworks interesting; they have dedicated themselves to getting Hadoop running on Windows. I tried it out, but I still get errors when launching all the services. The only advice I have received on their forum was to delete everything installed and reinstall the whole stack. I haven't done that yet.
As a prerequisite, you will need to be running a 64-bit version of Windows.
Try that out and let me know if it works on your side. There IS a step-by-step guide on the Hortonworks website.
Cheers, Jan

Related

Not able to download Cloudera

I am trying to find a link to download the Cloudera zip file for VMware, but I am unable to find one.
I tried searching on Google and on the Cloudera website, but in vain.
Can somebody share some pointers on this?
Cloudera indeed no longer provides a QuickStart VM for the legacy CDH 5 platform; this can be seen here, as you get redirected to CDP Data Center.
However, be aware that CDH 5 is a very old distribution. I believe CDH 5 goes end of life this year. Even CDH 6 is not recommended for new clusters, as CDP 7 has been GA for a while. CDP, the Cloudera Data Platform, is the successor to both CDH and HDP.
If you want to check out the latest version, there is a trial which should cover most of the purposes for which you may have wanted the QuickStart VM. This can be downloaded here.
Full disclosure: I am an employee of Cloudera, the company behind both CDH and CDP.
It seems Cloudera no longer supports the QuickStart VM. I too faced this issue a month ago. I finally found a link to an archived version here: https://www.youtube.com/watch?v=nnvheRZYLP4
In the description of the video above, you will find a Google Drive link for Cloudera 5.13.0. That's all I was able to find.

How/where to download OpenJDK/OpenJRE for Windows [duplicate]

This question already has answers here: OpenJDK availability for Windows OS [closed] (11 answers).
How do I go about downloading OpenJDK and OpenJRE for Windows?
Is there a Server version of the OpenJRE?
The reason I'm asking is that googling around didn't get me anywhere. Since more and more companies have started looking at OpenJDK/OpenJRE, and some of us need to deploy/develop on Windows, this is a valid question.
If you think building the OpenJDK/JRE is the only solution for now, please say so.
All: it's not a duplicate. Since the original question was asked (and the corresponding answers written), JDK 8 has been released. The OpenJDK site does not have OpenJDK 8 or OpenJRE 8 binaries, and it's quite difficult for a Java developer to build one themselves. I'm looking for an "official" OpenJDK, client OpenJRE, and server OpenJRE that I can download and redistribute as per the license.
How do I go about downloading OpenJDK and OpenJRE for Windows?
On the OpenJDK home page it states
Download and install the open-source JDK 8 for most popular Linux distributions. If you came here looking for Oracle JDK 8 product binaries for Solaris, Linux, Mac OS X, or Windows, which are based largely on the same code, you can download them from java.oracle.com.
You can download and build the OpenJDK yourself, as others have done; however, I am not sure this is a good idea for a production instance, and it is a pretty complicated product to build and test.
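For completeness, the upstream build is roughly the following (a sketch only, based on the OpenJDK 8u build instructions; on Windows you would run this under Cygwin and also need a supported Visual Studio toolchain and freetype, and the Mercurial URL may have moved since):

    # Fetch and build OpenJDK 8u from source (details are in README-builds inside the repository).
    hg clone http://hg.openjdk.java.net/jdk8u/jdk8u jdk8u
    cd jdk8u
    bash ./get_source.sh     # pull the nested sub-repositories (hotspot, jdk, langtools, ...)
    bash ./configure         # check the toolchain and prerequisites
    make images              # produce a JDK image under build/<config>/images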
Is there a Server version of the OpenJRE?
Yes, the server JVM runs by default on Linux and 64-bit Windows.
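A quick way to check which VM you actually got (the -server flag is a standard HotSpot option; on platforms that ship only one VM it is effectively a no-op):

    java -version            # the output names the VM in use (Client VM or Server VM)
    java -server -version    # explicitly request the server VM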
If you think building the OpenJDK/JRE is the only solution for now
There are plenty of other free JDKs, including Oracle's and IBM's.
If you want support I suggest considering Azul's Zulu.

SciDB installation on a single Debian server

I would like to try SciDB as a replacement for HDF5, and I would like to test it on my Debian laptop (no clusters) to give it a try.
Is this possible? Might it be that Debian (as opposed to Ubuntu) is not supported?
I had no luck with the installation instructions. The deployment script says that my OS is not supported. The SciDB user guide mentions some pre-built packages (for Ubuntu, at least), but there is no hint on how to obtain them.
SciDB is limited to RedHat / CentOS, and to Ubuntu as of the 14.9 release. Folks who want to run it on other distros generally compile it from source.
Information about how to obtain the sources (as well as current documentation and community discussion) can be found on the forums here ... http://www.scidb.org/forum/. You'll need to register as a forum user.
Specifically, have a look at http://www.scidb.org/forum/viewtopic.php?f=16&t=364. There's a list of releases and links to code bundles there.
I have installed SciDB several times in several ways (building from source and installing from packages, both the cluster version and the dev version).
Installation from packages
First, if you choose to install from packages (the easiest and fastest way), be aware that SciDB is very, very sensitive to your Linux version. For example, for the latest version of SciDB (14.8), if you choose to install on Ubuntu it has to be Ubuntu 12.04 (not 14.04, a common mistake) and 64-bit (meaning you have to install the AMD64 build even if you have an Intel processor). It won't work if you have a different version.
If you have Ubuntu 12.04 AMD64, Paradigm4 provides a deployment script and documentation with very simple steps:
https://github.com/Paradigm4/deployment
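Roughly, and only as a hypothetical sketch (the actual script names and arguments live in the Paradigm4/deployment repository and vary between releases, so check its README), the package route looks like this:

    # Hypothetical sketch of the package-based install on a supported Ubuntu 12.04 AMD64 host.
    # The install script name below is a placeholder - use whatever the repository README documents.
    sudo apt-get install -y git
    git clone https://github.com/Paradigm4/deployment.git
    cd deployment
    sudo ./install.sh    # placeholder name; follow the repository documentation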
Installation from sources
It's not that difficult, but it can be painful and time-consuming. I did it because we had to compile a custom plugin for SciDB. You have two types of installation: a dev install (in the SciDB user's directory) and a cluster install (in the /opt/ directory).
You have to be registered on their forum to get the link to the source code. They provide specific documentation for building from source.
Good luck.
Several months ago I dealt with porting SciDB 14.12 to an unsupported Linux distribution, Fedora 19. If your OS is not supported by the packages, it won't be supported when you install from the sources either. You have to start from the sources and then adapt the deployment and installation scripts. The sources can be downloaded from the SciDB forum.
Namely, add a new platform to deployment/common/os_detect.sh. Then there are multiple platform-specific deployment scripts, such as deployment/common/prepare_toolchain.sh, deployment/common/prepare_coordinator.sh and deployment/common/prepare_chroot.sh. You need to make sure those prepare the environment as they would on the supported OSes. I used Red Hat 6 and CentOS 6 as references, as those are the most similar to Fedora. Since your OS is Debian, you can first try falling back to the Ubuntu deployment (in os_detect.sh); see the illustrative snippet below.
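Purely as an illustration of the kind of change involved (the real os_detect.sh in the SciDB sources is more involved, and the mappings below are only placeholders), the new platform branch looks something like this:

    # Illustrative sketch only - adapt to the actual structure of deployment/common/os_detect.sh.
    # Map the locally detected distro onto a platform string the rest of the scripts already handle.
    OS=$(lsb_release -si 2>/dev/null || echo unknown)
    REL=$(lsb_release -sr 2>/dev/null || echo unknown)
    case "${OS}" in
        Fedora) echo "RedHat ${REL}" ;;   # reuse the RedHat/CentOS 6 code paths
        Debian) echo "Ubuntu 12.04" ;;    # fall back to the Ubuntu deployment
        *)      echo "${OS} ${REL}" ;;
    esac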
Another problem you may encounter is the third-party tools, especially Boost. In my case, I had to build it manually from sources.
Sometimes when porting and debugging it is not convenient to run the scripts through deploy.sh; it can be better to run the deployment scripts directly on the target machine (e.g. the coordinator).
Probably the best way to install and start with SciDB is to download a standard image. With this image you only have to import the virtual machine into your virtualization software. Moreover, this virtual machine has some characteristics that are great for developing your first applications.
The main advantage is that you get an API for SciDB queries and another for R, so you can explore all the options and test SciDB.
This is the version that I downloaded a few months ago: http://www.paradigm4.com/forum/viewtopic.php?f=14&t=1329&sid=606f614e401900cfa750375ba56de656
Nevertheless, there is a drawback: the community is quite small, and few people are developing with SciDB.

Hadoop on Windows [closed]

I am a Microsoft Windows user and new to Apache Hadoop. Most of the Hadoop tutorials I found are Linux-based, so I am finding it difficult to learn Hadoop on Windows. Any pointers for learning Hadoop on Windows would be appreciated.
If your primary objective is to learn Hadoop, then it does not matter whether you learn it on Windows or Linux, because everything is essentially the same on both platforms. I have used Hadoop extensively on both and found that all the commands and processing are identical on Windows and Linux. So here are my suggestions:
Download VMware Player on your Windows machine
Download the CDH virtual machine for VMware
https://ccp.cloudera.com/display/SUPPORT/Downloads
Access the virtual machine from your Windows box and follow the tutorials exactly as they are written for Linux.
You can also try "Syncfusion BigData Studio" to run a single node cluster in your local machine along with required ecosystems installed with it; Also "Syncfusion BigData Platform" allows you to manage clusters in a much easy way without any manual configuration from user's end; These 2 setups are "100% free for everyone";
To download setup and know more, please refer: http://www.syncfusion.com/products/big-data
You can also try Amazon Elastic MapReduce, which is easier if you don't have any Linux experience.
I managed to port Hadoop 1.0.1 to Windows 7 with Cygwin 1.7 and 64-bit JDK 1.7,
but it's not for beginners: you will need to patch and recompile Hadoop.
http://sourceforge.net/p/win-hadoop/wiki/Hadoop-on-Cygwin/
I use Hadoop natively on Windows as a virtual 2-node cluster running on one machine. It runs inside Cygwin (so no VM). It works well for trying Hadoop out, and I still use it to test new code on a small scale before putting it on the cluster. You basically get every bit of functionality of a full cluster. Getting it to work can be a bit tricky, though.
I used the following short guide: Stanford Hadoop for Windows guide
It worked fine for me. It is very important that you use 0.20.0! Higher versions do not run under Cygwin. I think it is best to leave the number of nodes at the default of 2. That way you can test whether splitting the work across multiple nodes works, but more simultaneous nodes can give you memory problems.
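For reference, once the configuration from that guide is in place, starting and stopping the cluster from a Cygwin shell uses the standard 0.20-era scripts (run from inside the Hadoop installation directory):

    bin/hadoop namenode -format   # one-time: format HDFS before the first start
    bin/start-all.sh              # start the NameNode, DataNode(s), JobTracker and TaskTracker(s)
    bin/hadoop dfsadmin -report   # quick sanity check that the DataNodes registered
    bin/stop-all.sh               # shut everything down again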
I compiled Hadoop on Windows and it's released as zetabyte's "Apache Hadoop for Windows". There is a core/common package available, and also a package with a GNU environment (bash, etc.).

Is there a good online tutorial for Hadoop development on a Windows 7 machine? [closed]

I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm getting stumped by the HDFS section (Module 2) and think it might be easier if I had a Windows specific tutorial. I tried following this one, but some of the steps weren't quite right. I've been trying to find a good tutorial that will work for me on my Windows 7 machine, but am a bit stuck. Is there a good place to go for this? Hadoop seems to be very geared toward Linux users, and unfortunately I have to use my work laptop, which is Windows 7. Can I make this work or does it really only work for Linux users?
The Hadoop tutorial on the Yahoo Developer Network is outdated and problematic. Half of the steps didn't work for me at all (I was running their image in VMware Player on Windows 7), and the other half were vague. The Java code examples were poorly written and wouldn't compile. At any rate, they are written for the old Hadoop API.
I gave up on that tutorial and instead used the Cloudera Demo VM image. This comes pre-configured with Hadoop, Pig, Hive, HBase, etc. I was in business at once and had no problems compiling and running Hadoop jobs and Pig scripts.
The Cloudera Demo VM downloads on their main support page (https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM) are all 64-bit. If you are looking for a 32-bit version like I was, you can get one here: https://downloads.cloudera.com/cloudera-demo-0.3.7.vmwarevm.tar.bz2
This one has a slightly older version of the Cloudera distro (CDH3u0) running on Ubuntu 10.10 with Gnome desktop. I installed Eclipse for compiling my Hadoop jobs, but didn't bother trying to install the Hadoop plugin, which I've heard is problematic. The first time around, I made the mistake of accidentally updating the Cloudera distro to CDH3u3 via the system's Update Manager and this messed up my Hadoop configuration. I didn't know how to reconfigure it properly, so I just started over from the original image.
To get Pig running, you need to first set the JAVA_HOME variable: export JAVA_HOME=/usr/lib/jvm/java-6-sun
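If you want that setting to survive new terminal sessions inside the VM, the usual trick is to append it to your shell profile (the JDK path is the one mentioned above; adjust if your VM differs):

    echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> ~/.bashrc   # persist for future shells
    source ~/.bashrc                                               # apply to the current shell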
Unfortunately, I wasted a ton of time with that old YDN tutorial before a Java developer friend familiar with Hadoop pointed me to the Cloudera distribution.
I was completely new to Hadoop, and honestly I found the Cloudera tutorials and information completely unhelpful. Give the IBM ones a shot; they're super helpful and very friendly for beginners. There are step-by-step instructions for pretty much all of the core Hadoop applications, plus a few specific to IBM's distro.
Here's the download link:
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-ibmibqsevmw&S_TACT=109HF38W&S_CMP=109HF
You have to make an account but it's free and doesn't take that long.
I can't post more than one link right now, but it's pretty easy to find the tutorials online, and they also exist within the VM.
Also, there's a forum where I've posted my questions when I get stuck, and somebody from IBM has always helped me out within an hour to a day. I can't post the link, but if you Google "IBM InfoSphere BigInsights Forum", it's the first hit.
Good Luck!
I am trying to learn Hadoop right now as well, and what I did was download VirtualBox (http://www.virtualbox.org/), load some Linux images onto it, and start following tutorials.
You can even get a pre-made Hadoop setup image from Cloudera. I think this approach is far better than installing and setting everything up on your main machine, because in the event there's a problem your main machine won't be affected (you can simply revert to an old copy of your virtual Linux image, or scrap it and start again without any impact).
Good luck!
Developing Hadoop on Windows is doable but hard to get right. It requires installing Cygwin, and getting all the environment variables right can be tricky.
To get started developing on Windows, I recommend installing VMware Player and running the pre-configured virtual machine from Cloudera. This means you will be doing the Hadoop development in Linux without rebooting or reinstalling your Windows system, and without the installation troubles associated with Cygwin.
https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM
I've been banging my head against the Yahoo tutorial for a long time as well. The Eclipse plugin is no longer maintained and is pretty unreliable. Hopefully the Cloudera image will do the trick.
I have just finished "Hadoop Fundamentals I - Version 2" at http://bigdatauniversity.com.
It comes with IBM BigInsights VMware images and works very well.
The images include a local-mode one and a cluster-mode one. It is able to simulate a multi-node cluster on my Windows 8 workstation with 8 GB of RAM.
Hope this information is helpful :-)

Resources