Is there a good online tutorial for Hadoop development on a Windows 7 machine? [closed] - windows

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm getting stumped by the HDFS section (Module 2) and think it might be easier if I had a Windows specific tutorial. I tried following this one, but some of the steps weren't quite right. I've been trying to find a good tutorial that will work for me on my Windows 7 machine, but am a bit stuck. Is there a good place to go for this? Hadoop seems to be very geared toward Linux users, and unfortunately I have to use my work laptop, which is Windows 7. Can I make this work or does it really only work for Linux users?

The Hadoop tutorial on the Yahoo Developer Network is outdated and problematic. Half of the steps didn't work for me at all (I was running their image in VMware Player on Windows 7), and the other half were vague. The Java code examples were poorly written and wouldn't compile. At any rate, they are written for the old Hadoop API.
I gave up on that tutorial and instead used the Cloudera Demo VM image. This comes pre-configured with Hadoop, Pig, Hive, HBase, etc. I was in business at once and had no problems compiling and running Hadoop jobs and Pig scripts.
The Cloudera Demo VM downloads on their main support page (https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM) are all 64-bit. If you are looking for a 32-bit version like I was, you can get one here: https://downloads.cloudera.com/cloudera-demo-0.3.7.vmwarevm.tar.bz2
This one has a slightly older version of the Cloudera distro (CDH3u0) running on Ubuntu 10.10 with Gnome desktop. I installed Eclipse for compiling my Hadoop jobs, but didn't bother trying to install the Hadoop plugin, which I've heard is problematic. The first time around, I made the mistake of accidentally updating the Cloudera distro to CDH3u3 via the system's Update Manager and this messed up my Hadoop configuration. I didn't know how to reconfigure it properly, so I just started over from the original image.
To get Pig running, you need to first set the JAVA_HOME variable: export JAVA_HOME=/usr/lib/jvm/java-6-sun
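If you want to avoid exporting a path that does not exist on your image, a small guard like the one below may help. This is only a sketch: the function name `set_java_home` is made up for illustration, and the default path is simply the one quoted above, which may differ on other images or JDK versions.

```shell
# Sketch: export JAVA_HOME only when the directory really exists.
# set_java_home is a hypothetical helper; the default path is the one
# quoted in the answer above and is an assumption for other images.
set_java_home() {
    candidate="${1:-/usr/lib/jvm/java-6-sun}"
    if [ -d "$candidate" ]; then
        JAVA_HOME="$candidate"
        export JAVA_HOME
        echo "JAVA_HOME set to $JAVA_HOME"
    else
        # Leave any existing JAVA_HOME untouched and signal failure.
        echo "No JDK directory at $candidate; JAVA_HOME left unchanged" >&2
        return 1
    fi
}
```

Call `set_java_home` with no argument to use the default path, or pass a different JDK directory as the first argument, before starting Pig in the same shell session.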
Unfortunately, I wasted a ton of time with that old YDN tutorial before a Java developer friend familiar with Hadoop pointed me to the Cloudera distribution.

I was completely new to Hadoop and honestly I found the Cloudera tutorials and information completely unhelpful. Give the IBM ones a shot: they're super helpful and very friendly for beginners, with step-by-step instructions for pretty much all of the core Hadoop applications and a few specific to IBM's distro.
Here's the download link. --
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-ibmibqsevmw&S_TACT=109HF38W&S_CMP=109HF
You have to make an account but it's free and doesn't take that long.
I can't post more than one link right now, but it is pretty easy to find the tutorials online, and they also exist within the VM.
There's also a forum where I've posted my questions when I got stuck, and somebody from IBM has always helped me out within an hour to a day. I can't post the link, but if you google "IBM InfoSphere BigInsights Forum", it's the first hit.
Good Luck!

I am also trying to learn Hadoop right now. What I did was download VirtualBox ( http://www.virtualbox.org/ ), load some Linux images on it, and start following tutorials.
You can even get a pre-made Hadoop setup image from Cloudera. I think this approach is far better than installing and setting up on your primary machine, because if there's a problem your main machine won't be affected (you can simply revert to an old copy of your virtual Linux image, or scrap it and start again, without any impact).
Good luck!

Developing Hadoop on Windows is doable but hard to get right. It requires installing Cygwin, and getting all the environment variables right can be tricky.
To get started developing on Windows, I recommend installing VMware Player and running the pre-configured virtual machine by Cloudera. This simply means you will be doing the Hadoop development in Linux without rebooting or reinstalling your Windows system, and without the installation troubles associated with Cygwin.
https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM

I've been banging my head against the Yahoo tutorial for a long time as well. The Eclipse plugin is no longer maintained and is pretty unreliable. Hopefully the Cloudera image will do the trick.

I have just finished the "Hadoop Fundamentals I - Version 2 " at http://bigdatauniversity.com.
It comes with IBM BigInsights VMware images and works very well.
The images include a local-mode one and a cluster-mode one. The latter is able to simulate a multi-node cluster on my Windows 8 workstation with 8GB RAM.
Hope this information is helpful :-)

Related

Not able to download Cloudera

I am trying to find a link to download the Cloudera zip file for VMware, but I can't find one.
I tried searching on Google and on the Cloudera website, but in vain.
Can somebody share some pointers?
Cloudera indeed no longer provides a QuickStart VM for the legacy CDH 5 platform. This can be seen here, as you get redirected to CDP Data Center.
However, be aware that CDH 5 is a very old distribution. I believe CDH 5 goes end of life this year. Even CDH 6 is not recommended for new clusters, as CDP 7 has already been GA for a while. CDP, the Cloudera Data Platform, is the successor to both CDH and HDP.
If you want to check out the latest version, there is a trial that should serve most of the purposes for which you may have wanted to use the QuickStart VM. This can be downloaded here.
Full disclosure: I am an employee of Cloudera, the company behind both CDH and CDP.
It seems Cloudera no longer supports the QuickStart VM. I faced this issue too a month ago. I finally found a link to an archived version here: https://www.youtube.com/watch?v=nnvheRZYLP4
In the description of that video, you will find the Google Drive link for Cloudera 5.13.0. That's all I was able to find.

How to Start working with Hadoop

Hi, I want to learn Hadoop. I have a basic idea of how Hadoop works with the MapReduce framework.
Now I want to practice on my local PC, so I want to know how to install Hadoop on a single node.
I installed VMware Workstation 10 and tried to install a Linux distribution to host Hadoop, but I am not able to load Ubuntu into VMware Workstation. I get an error along the lines of "Exiting intel ...,Operating Not found".
Can anyone please provide the steps to get started with a Hadoop installation?
Should I go for one of the distributions (Cloudera, Hortonworks, MapR)? If that is simpler, please tell me how to install those distributions. (I even tried importing the Cloudera VMware file into VMware Workstation, but it did not work for me.)
You can use the VM given by Udacity for its course on Hadoop. I found it really easy to set up.

Documentation for installing and running hadoop 2.2 on Windows [closed]

Closed 1 year ago.
With the latest release of Hadoop 2.2, I see that the release notes mention that this version has significant improvements for running Hadoop on Windows. I downloaded Hadoop 2.2 yesterday and saw a lot of .cmd files along with .sh files, which suggests that this version has scripts and batch files for running Hadoop in a Windows environment. However, while looking at the Apache Hadoop documentation I couldn't find any step-by-step instructions on how to install and run this newer version on Windows. Besides this, it looks like the newer version has the YARN architecture embedded in it, and the old configurations provided in some of the tutorials online may be outdated and no longer applicable. Is there any good documentation for Hadoop 2.2 available online? I want it specifically for running Hadoop under Windows.
If you take the binary distribution of the Apache Hadoop 2.2.0 release as-is and try to run it on Microsoft Windows, you'll encounter ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path.
The binary distribution of the Apache Hadoop 2.2.0 release does not contain some Windows native components (like winutils.exe, hadoop.dll, etc.). These are required (not optional) to run Hadoop on Windows.
So you need to build a Windows native binary distribution of Hadoop from source, following the "BUILDING.txt" file located inside the Hadoop source distribution. You can also follow this post for a step-by-step guide with screenshots:
Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS
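Before starting Hadoop, it can save time to check whether the native components mentioned above are actually present. The sketch below is meant for a Unix-style shell on Windows (Cygwin or Git Bash, for example). The function name `check_hadoop_native` is hypothetical, and the assumption that both files live under the `bin` directory of the Hadoop install should be verified against your own build.

```shell
# Sketch: report which Windows native components are present in a
# Hadoop install. check_hadoop_native is a hypothetical helper; the
# bin/ location of both files is an assumption about the layout.
# Usage: check_hadoop_native /path/to/hadoop   (defaults to $HADOOP_HOME)
check_hadoop_native() {
    hadoop_home="${1:-${HADOOP_HOME:-}}"
    missing=0
    for f in winutils.exe hadoop.dll; do
        if [ -f "$hadoop_home/bin/$f" ]; then
            echo "found:   bin/$f"
        else
            echo "missing: bin/$f"
            missing=1
        fi
    done
    # Non-zero exit status when at least one component is missing.
    return "$missing"
}
```

If either file is reported missing, the stock binary distribution is most likely in use and a native build is still needed.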
I was searching for the same thing. I found Hortonworks interesting: they have put real effort into installing Hadoop on Windows. I tried it out, but I still get errors when launching all the services. The only advice I have received in their forum was to delete everything installed and reinstall the whole stack. I haven't done that yet.
As a prerequisite, you will need to be running 64-bit Windows.
Try it out and let me know if it works on your side. There is a step-by-step guide on the Hortonworks website.
Cheers Jan

Hadoop on Windows [closed]

Closed 6 years ago.
I am a Microsoft Windows user and new to Apache Hadoop. Most of the Hadoop tutorials I found are Linux based so I am finding it difficult to learn Hadoop on Windows. Any pointers to learn Hadoop on Windows would be best.
If your primary objective is to learn Hadoop, then it does not matter whether you learn it on Windows or Linux, because everything is exactly the same on both platforms. I have used Hadoop extensively on both platforms and found all the commands and processing to be identical on Windows and Linux. So here are my suggestions:
Download VMware VMPlayer on your Windows Machine
Download CDH Virtual Machine for VMware
https://ccp.cloudera.com/display/SUPPORT/Downloads
Access the virtual machine from your Windows box and follow the tutorials exactly as they are written for Linux.
You can also try "Syncfusion BigData Studio" to run a single-node cluster on your local machine with the required ecosystem components installed alongside it. "Syncfusion BigData Platform" also lets you manage clusters much more easily, without any manual configuration on the user's end. Both setups are "100% free for everyone".
To download setup and know more, please refer: http://www.syncfusion.com/products/big-data
You can also try Amazon Elastic MapReduce; this is more efficient if you don't have any Linux experience.
I managed to port Hadoop-1.0.1 to windows-7, cygwin-1.7, jdk1.7_x64.
But it's not for beginners: you will need to patch and recompile Hadoop.
http://sourceforge.net/p/win-hadoop/wiki/Hadoop-on-Cygwin/
I use Hadoop natively on Windows as a virtual 2-node cluster running on one machine. It runs inside Cygwin (so no VM). Works well to try Hadoop out and I still use it to test new code in small before putting it on the cluster. You basically get every bit of functionality as with a full cluster. Getting it to work can be a bit tricky though.
I used the following short guide: Stanford Hadoop for Windows guide
It worked fine for me. It is very important that you use 0.20.0! Higher versions do not run under Cygwin. I think it is best to leave the default number of nodes at 2. This way you can test whether splitting the work across multiple nodes works, whereas more simultaneous nodes can give you memory problems.
I compiled Hadoop on Windows and it's released as zetabyte's "Apache Hadoop for Windows". There is a core/common package available and also a package with a GNU environment (bash, etc.)

Testing web application on Mac/Safari when I don't own a Mac [closed]

Closed 2 months ago.
The community reviewed whether to reopen this question 2 months ago and left it closed:
Original close reason(s) were not resolved
Having been caught out recently when a web site I launched displayed perfectly on IE, Firefox, Chrome and Safari on Windows but was corrupted when viewed using Safari on the Mac (by a potential customer), I need to start testing how my sites look when viewed on a Mac.
Problem is, I don't own a Mac.
I've tried BrowsrCamp, which claims to provide VNC access to a Mac with lots of browsers installed, but after finding it unreliable (so far, it's worked 1 day in the last 5) I need another solution.
Any suggestions?
The best site for testing websites and seeing them in real time in Safari on a Mac is
Browserstack
They give you about 25 free minutes of testing the first time and then 10 free minutes each day. You can even test pages from your local PC by using their WEB TUNNEL feature.
I tested 7 to 8 pages in Browserstack, and I think they have some java debugging tool in the upper right corner that is a great help.
For my case (a small, personal project) https://www.lambdatest.com/ was very helpful. Free tier allows for 6 sessions per month.
Meanwhile, MacOS High Sierra can be run in VirtualBox (on a PC) for Free.
It's not really fast but it works for general browser testing.
How to setup see here: https://www.howtogeek.com/289594/how-to-install-macos-sierra-in-virtualbox-on-windows-10/
I'm using this for a while now and it works quite well
You don't have to pay for those online services; you can still use the latest Safari for free with these choices:
A) Install VMware
Use Google to find VMware + a free macOS ISO image. This solution is significantly faster than VirtualBox.
B) Install VirtualBox and download free MacOS High Sierra image
See tutorial here: https://www.wikigain.com/install-macos-high-sierra-virtualbox-windows/
Use these vbox settings to increase resolution and memory, but it is still very laggy and slow:
cd "C:\Program Files\Oracle\VirtualBox\"
VBoxManage setextradata "macOS" VBoxInternal2/EfiGraphicsResolution 1920x1080
VBoxManage modifyvm "macOS" --vram 256
If it's a major concern to start doing a lot of testing on a Mac, then I would definitely suggest buying a second hand Mac, or perhaps building a Hackintosh. The former gets you up and running quickly, the latter gives you a lot of power for the same price.
For just the odd piece of testing, running OS X in VMWare on your current PC is a cheaper option.
Unfortunately you cannot run MacOS X on anything but a genuine Mac.
MacOS X Server however can be run in VMWare. A stopgap solution would be to install it inside a VM. But you should be aware that MacOS X Server and MacOS X are not exactly the same, and your testing is not going to be exactly what the user has. Not to mention the $499 price tag.
Simplest way is to buy yourself a cheap mac mini or a laptop with a broken screen used on ebay, plug it onto your network and access it via VNC to do your testing.
Amazon AWS recently launched macOS EC2 instances.
As of now (Dec 2020) they are pretty pricey, you have to reserve them minimum for 24h.
You can connect to the instance via VNC (sample guide for connecting from Windows) and test your browser.
https://turbo.net/ offers a browser sandbox in which containerised virtual machines run browser sessions for you. I tried it with Safari on my Windows development machine and it seems to work very well.
Litmus may help you. It will take screenshots of your webpage(s) in a wide variety of browsers so you can make sure that your site works in all of them. A free alternative (Litmus is a paid service) is Browsershots, but you do get what you pay for. (In some screenshots that Browsershots returns, the browser hasn't yet finished loading the webpage...)
Of course, as other people have suggested, buying a Mac is also a good solution (and may be better, depending on the kind of testing you need to do), because then you can test your website yourself in any of the browsers that run under Mac OS X or Windows.
