How Do I Build Hadoop From Source Without Errors - hadoop

I have spent weeks trying to resolve different errors in building Hadoop. SO was helpful in pointing me towards the answer to an occasional problem, but after a lot of searching here on SO, I was never able to get the whole thing to build.
It’s been a couple of weeks since all this started so I have forgotten most of the explicit error messages, but the problems I had included
Protobuf versions being wrong
SSH connections not working
MojoFailureException errors during the build
Incorrect Java versions being used
C++ sanity checks failing
a host of other crap that made no sense to me and I couldn't decipher root causes for
Today I finally got Hadoop to build from the git repo source and wanted to record the process for the SO community members that face similar problems.
For those of you trying to build Hadoop from source, here is how I got everything to compile from source.
Some notes on configuration:
I am installing Hadoop in a virtual environment, in my case VirtualBox.
The Host machine runs Windows 7 x64
The Guest VM runs CentOS 7 x64
I am aiming for the bare minimum installation

How to Build Hadoop From Source Without Errors
Preliminary Downloads:
You need to download the following before you begin.
Virtual Box (I used version 4.3.16 r95972 available here: old VB builds)
CentOS 7 minimal iso file from - http://www.centos.org/download/
WinSCP (version 5.7.4) - https://winscp.net/eng/download.php
This walk through consists of 4 Phases
Create a CentOS Appliance inside VirtualBox that can support building Hadoop
Add SSH capabilities to the Appliance so that downloaded prerequisites can be scp’ed from the Host to the Guest VM
Install all the things (utilities and dependencies) needed to build Hadoop
Build Hadoop without errors
Phase 1 - Creating a CentOS Appliance for VirtualBox
Start by opening VirtualBox and clicking on the “New” button in the top left corner. This will open a new window asking for some information about the virtual machine appliance you want to create.
Name it “CentOS x64 – Hadoop Base”
Select Linux as the “Type” of operating system
Select RedHat (64 Bit) as the “Version.”
Click “Next”
Follow the remaining prompts in the VM creation wizard. The only things I changed from the defaults were on the “Memory size” page (I used 4096 MB) and the “File location and size” page (I used 128 GB). I would encourage you to do the same if your system can support it. Leave all other defaults alone.
Click “Create” on the last page of the VM creation wizard
Once created, the VM will show up on the left hand pane of the VirtualBox Window.
Double click on the VM you just created and wait for the dialog to come up asking you for the iso file you want to use.
When the dialog appears, click on the folder icon on the right and navigate to / select the “CentOS minimal iso” you downloaded during the Preliminary steps.
Once the iso is listed in the drop down box Click “Start”
When prompted, after the VM boots, select “Install CentOS 7” (this is not the default, you have to press the “up” arrow) and press “Enter”. When the setup program loads, the first thing it will ask you about is your keyboard layout. I leave the defaults in place and just click the “Continue” button in the lower right corner. This brings up the Installation Summary page on which you need to make changes to 2 areas: “Installation Destination” and “Network & Host Name”
Click “Installation Destination”
Double Click the virtual disk (make sure that the background is blue and the check mark is there)
Click “Done” to go back to the "Installation Summary" page.
Back on the Installation Summary page,
- Click “Network and Host Name”
- In this menu screen turn on Ethernet networking by clicking the toggle switch on the right.
- Click “Done” in the top left corner.
With both modifications complete you can click the “Begin Installation” button in the bottom right corner. As the iso installs to your system you should take the time to provide a root password by
Clicking on that option at the top left of the page
Filling out the form it brings up
Clicking “Done” (if you select a password considered weak, you have to double click “Done” to accept anyway).
I added a password, but I did not bother to add any non-root users.
Once everything is installed click on the “Reboot” button that appears in the bottom right of the screen.
Once the system reboots select CentOS 7 and allow it to boot. Check your credentials by logging in as root, and then close the CentOS VM by clicking on the red X button at the top right of the window and selecting “Power off the machine” when prompted.
This completes Phase 1
You should now be looking at just VirtualBox
Phase 2 - Adding SSH capabilities to the VM to support download transfers
Open the settings of your CentOS Appliance by first clicking the appliance
Next, click the “Settings” button on the top left of VirtualBox’s main menu. This will bring up a new window.
In the left hand pane of the new window, click on “Network” which will display a set of adapter tabs.
Now click on the Triangle to the left of the label “Advanced”.
This will reveal a series of options, but the one you need to click on is the button labeled “Port Forwarding”
This will bring up another window where you can set port forwarding rules.
Click the green plus sign in the top right corner. This will produce a row where you can enter in a port forwarding rule.
Add the following rule to the row
Name= ssh, Host port =2222, Guest port = 22
Click the “OK” button on the Port Forwarding window
Click the “OK” button on the Appliance Settings window.
With this rule in place you should now be able to ssh from your Windows Host to the CentOS Guest on port 2222 and avoid the following error:
ssh: connect to host localhost port 22: Connection refused
You should now be looking at just VirtualBox again.
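The same port forwarding rule can also be set from a command line with VBoxManage instead of the GUI. The following is a minimal sketch, assuming VBoxManage is on your PATH and a Unix-style shell (such as Cygwin) is available on the Host. The helper name natpf_rule and the VM_NAME value are illustrative; substitute the exact name your appliance shows in VirtualBox.

```shell
# Build a NAT port-forwarding rule in the form VBoxManage expects:
#   name,protocol,host-ip,host-port,guest-ip,guest-port
# Leaving both IP fields empty applies the rule to any host interface.
natpf_rule() {
  printf '%s,tcp,,%s,,%s\n' "$1" "$2" "$3"
}

# Assumption: replace this with the exact VM name shown in VirtualBox.
VM_NAME="CentOS x64 - Hadoop Base"
RULE="$(natpf_rule ssh 2222 22)"

# Print the command rather than running it, so it can be reviewed first.
# modifyvm only works while the VM is powered off.
echo "VBoxManage modifyvm \"$VM_NAME\" --natpf1 \"$RULE\""
```

Copy the printed command and run it once the VM is powered off; the rule should then appear in the Port Forwarding window described above.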
Start the CentOS VM appliance and log in as root.
Once logged in, execute the following line from the command prompt.
yum -y install openssh-server openssh-clients
This command will install an ssh server on the CentOS VM. After the install, confirm that the ssh server is running by typing the following command.
ps aux | grep sshd
This command should return 2 processes showing sshd (the ssh daemon). One is the grep command itself. The other is your server running in the background.
Now we need to make sure that ssh did in fact generate the keys it will need to communicate with WinSCP. Issue the following command and make sure that all keys’ byte size values are non-zero.
ls -l /etc/ssh
If the sizes of the keys are 0 bytes, you need to remove them, restart the sshd daemon, and validate that the keys were regenerated when sshd restarted. To do all that, execute the following commands
rm -rf /etc/ssh/ssh*key*
systemctl restart sshd
ls -l /etc/ssh
This process will help avoid unexpected “connection closed by 127.0.0.1” errors.
Now that we have an ssh daemon up and keys generated, we are going to test the connection. Start by opening WinSCP and entering the following values in the login dialog that pops up.
Host name = localhost, Port number = 2222, User name = root, Password = (the root password you set in Phase 1), File Protocol = SCP.
Note that you need to set “File Protocol” last. If you don’t, it will try to outsmart you when you enter a “Port number” that it isn’t expecting. When all the values are entered, click the “Login” button and accept / click Update or OK on any security warnings you get.
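Rather than reading the sizes by eye, the zero-byte check can be scripted. This is a small sketch, not part of the original process; the function name empty_host_keys is made up for illustration, and it assumes the host keys follow the usual ssh_host_*key* naming.

```shell
# List host key files of size zero in the given directory (default /etc/ssh).
empty_host_keys() {
  find "${1:-/etc/ssh}" -maxdepth 1 -type f -name 'ssh_host_*key*' -size 0 2>/dev/null
}

if [ -n "$(empty_host_keys)" ]; then
  echo "Empty host keys found; remove them and restart sshd:"
  empty_host_keys
else
  echo "No empty host keys found"
fi
```

If it lists any files, the rm / systemctl restart sshd sequence above applies.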
Once you have logged in, move a file between the Host and VM Guest to confirm everything is working.
Though I won’t focus on it here, you can also use Cygwin to connect to the VM, and it is useful for diagnosing connection problems. The command you need to enter to get verbose diagnostic output is
ssh -vvv -p 2222 root@localhost
This completes Phase 2
Phase 3 - Install Utilities and Dependencies Needed to Build Hadoop
Our CentOS distribution really is “barebones” and so we need to install everything required to build Hadoop. We will do this by downloading most things in Windows and then moving them over to the VM via WinSCP.
Before we start, we need to add a “downloads” directory to the home directory of the root user on the CentOS VM by issuing the following command at the CentOS command line.
mkdir ~/downloads/
We can now begin downloading Hadoop dependencies. We will download everything to Windows and then use WinSCP to move it over to the VM.
Start by downloading the Java 7 JDK from - http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Ignore the “End of Public Updates” error message at the top of the page. Java 7 is what Apache recommends.
You want to download the jdk-7u79-linux-x64.rpm file
Once downloaded use WinSCP to navigate to the Downloads directory of the Host computer and the newly created “downloads” directory of the Guest VM (you may need to click the refresh icon on the VM side of the WinSCP pane to see the directory). Drag and drop the jdk file from the Host over to the VM Guest.
Now we just need to install the JDK on the CentOS VM. From the CentOS command line, change your directory to the “downloads” folder we created under root’s home. Once in the “downloads” directory, use rpm to install Java 7.
cd ~/downloads
rpm -ihv jdk-7u79-linux-x64.rpm
Once installation is complete, you can verify it by typing
java -version
Which will produce output stating that you have a Java Run Time Environment installed.
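Since “incorrect Java versions being used” was one of the problems listed at the top, it can be worth checking which version is actually first on the PATH. The snippet below is a sketch under the assumption that the first line of the java -version output has the form java version "1.7.0_79"; the helper name java_version_of is illustrative.

```shell
# Pull the quoted version string out of the first line of `java -version` output.
java_version_of() {
  printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# `java -version` writes to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
  found="$(java_version_of "$(java -version 2>&1)")"
  case "$found" in
    1.7.*) echo "Java 7 is active: $found" ;;
    *)     echo "Unexpected Java version on PATH: $found" ;;
  esac
else
  echo "java is not on PATH"
fi
```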
Next we are going to install a subset of the packages Hadoop needs to build successfully. The list is taken straight from the Apache website: https://wiki.apache.org/hadoop/HowToContribute and the command we need to enter on the command line to retrieve them is:
yum -y install lzo-devel zlib-devel gcc autoconf automake libtool openssl-devel fuse-devel
Next we are going to install Apache’s Maven. You can download it here:
https://archive.apache.org/dist/maven/binaries/
Apache’s website says you can use version 3+. I used version 3.2.2 so download this file to follow along:
apache-maven-3.2.2-bin.tar.gz
Once you have the file downloaded, use WinSCP to move it from your host computer to the Guest VM’s “downloads” folder just like you did with the JDK file. From the “downloads” folder, we then untar the file into the /usr/local/ directory and create a symbolic link in /usr/local/ that points to the maven folder, using the following three commands.
tar xzf apache-maven-3.2.2-bin.tar.gz -C /usr/local
cd /usr/local
ln -s apache-maven-3.2.2 maven
We now need to add Maven’s bin directory to the $PATH variable. We do so by editing the .bashrc file in root’s home directory. Open the file for editing in vi by using the following command
vi ~/.bashrc
This will bring up the bash file in the vi editor (if you need it, a tutorial on vi can be found here: http://www.unix-manuals.com/tutorials/vi/vi-in-10-1.html). Follow these instructions to correctly update the file.
Enter Edit mode by pressing the “a” key
Add the following lines to the file:
export M2_HOME=/usr/local/maven
export PATH=$M2_HOME/bin:$PATH
Press the “Esc” key to leave Edit mode
Type “:wq” – it will automatically show up at the vi command line (bottom left of the screen)
Press “Enter”
Now log out of CentOS. Log back into CentOS, and check to make sure that the new PATH variable is appropriately set using the following commands.
exit
<log back in as root>
mvn -version
You should see output indicating that Maven is currently installed.
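If you would rather not edit the file in vi, the two export lines can be appended from the command line. The helper below is a sketch; append_once is a hypothetical name, and it only adds a line when that exact line is not already in the file, so running it twice does not duplicate entries.

```shell
# Append a line to a file only if that exact line is not already present.
#   $1 = file to modify, $2 = line to add
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage on the Guest VM as root (single quotes keep $M2_HOME and $PATH
# unexpanded, so they are written literally into the file):
#   append_once ~/.bashrc 'export M2_HOME=/usr/local/maven'
#   append_once ~/.bashrc 'export PATH=$M2_HOME/bin:$PATH'
```

As with the vi approach, log out and back in before checking mvn -version.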
Next we need to install C++ support for gcc. We do that with the following one line command
yum -y install gcc-c++.x86_64
Next we need to install git so that we can pull down the Hadoop source code.
yum -y install git
Once you have git, go ahead and pull down the Hadoop source. There is still one more thing (ProtocolBuffer) we need before we can build the source code, but we need to see the BUILDING.txt file in the repo before we download ProtocolBuffer to make sure that we get the right version.
To get the Hadoop source we run the git clone command. Simply execute the following commands from the CentOS command line to download the Hadoop repo.
cd /usr/local
git clone git://git.apache.org/hadoop.git
The clone operation will place a “hadoop” directory in your /usr/local directory. When the operation has completed and you have the command prompt back, take a look at the BUILDING.txt file in your new hadoop directory using the following command:
less /usr/local/hadoop/BUILDING.txt
In the “Requirements” section of the file it states the version of ProtocolBuffer we need for Hadoop to build correctly. In this case it’s ProtocolBuffer 2.5.0. With this information in hand we go back to the command prompt by pressing “q” for quit.
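The version can also be pulled out of the file without paging through it. This sketch assumes the requirement appears on a line of the form "* ProtocolBuffer 2.5.0"; if your checkout words it differently, the pattern will need adjusting. The function name required_protobuf is illustrative.

```shell
# Print the first ProtocolBuffer version number mentioned in the given file.
required_protobuf() {
  sed -n 's/.*ProtocolBuffer \([0-9][0-9.]*\).*/\1/p' "$1" | head -n 1
}

BUILDING="/usr/local/hadoop/BUILDING.txt"
if [ -f "$BUILDING" ]; then
  echo "Hadoop expects ProtocolBuffer $(required_protobuf "$BUILDING")"
fi
```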
Now we can finally install the last of the things Hadoop needs: ProtocolBuffer. To get the right version of ProtocolBuffer, we visit the ProtocolBuffer release page:
https://github.com/google/protobuf/releases
and scroll down until we see the version needed for Hadoop to compile. For this walkthrough you want to download the following file.
protobuf-2.5.0.tar.gz
Once downloaded, use WinSCP and transfer it to the VM’s “downloads” folder like you did earlier for the other downloads. Once the file is sitting in the VM’s “downloads” folder, issue the following commands to install ProtocolBuffer on CentOS
cd ~/downloads
tar xzf protobuf-2.5.0.tar.gz -C /usr/local
cd /usr/local/protobuf-2.5.0
./configure
make
make install
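To confirm that the version now on the PATH is the one Hadoop expects, you can compare the output of protoc --version, which prints a line such as "libprotoc 2.5.0". A minimal sketch follows; the helper name protoc_matches is made up for illustration.

```shell
# Succeed when the `protoc --version` output matches the required version.
#   $1 = required version, $2 = output of `protoc --version`
protoc_matches() {
  [ "$2" = "libprotoc $1" ]
}

if command -v protoc >/dev/null 2>&1; then
  reported="$(protoc --version 2>&1)"
  if protoc_matches 2.5.0 "$reported"; then
    echo "protoc 2.5.0 is on PATH"
  else
    echo "protoc reports: $reported"
  fi
else
  echo "protoc is not on PATH"
fi
```

If protoc fails with a shared-library error instead, running ldconfig as root is a commonly suggested remedy, though that step is not part of the process recorded here.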
Once this is done all the prerequisite utilities and dependencies needed for building Hadoop will be installed.
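Before moving on to the build, a quick pre-flight check can confirm that the tools installed in this phase are actually reachable from the shell. This is a sketch; the tool list simply mirrors what was installed above, and missing_tools is a hypothetical helper name.

```shell
# Print the name of every listed tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools java mvn git gcc g++ make protoc)"
if [ -n "$missing" ]; then
  echo "Not on PATH:"
  echo "$missing"
else
  echo "All build tools found"
fi
```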
This completes Phase 3
Phase 4 - Build Hadoop Without Errors
Go to the Hadoop directory, and run Maven skipping the tests using the following commands:
cd /usr/local/hadoop
mvn clean install -DskipTests
The build should now complete without any problems. When everything is finished, Maven prints a summary listing each Hadoop module, followed by BUILD SUCCESS.
This completes the walk through
I hope some of you find it helpful.

I know this was a question of how to build hadoop from source, but after running into a variety of errors throughout the build process, I found this extremely helpful. Someone has already built Hadoop on Windows and posted the binaries. I setup this version on my Windows machine and it is working great:
http://www.barik.net/archive/2015/01/19/172716/

Related

Vagrant dev build throwing errors

So I am having some issues with vagrant. I had initially tried to report this as an issue on the vagrant github issue boards, but they kept closing the issues without responding to them. I guess they decided I wasn't worth their time, or they were just behaving unprofessionally. Anyway, Here is the problem: I use vagrant with virtualbox, and a new version of virtualbox was recently released that is, unfortunately, not compatible with the latest vagrant installation.
However, the people at hashicorp have already updated the source code so that it is compatible with the new version of virtualbox, but you have to build the vagrant executable from the source repo (instructions here). So I followed the instructions and vagrant is working just like it used to.....when the only command I need to run is vagrant up. I should also mention ahead of time that, in order to run the vagrant dev build, the current working directory needs to be the root of the source code repo and the dev build can only be run using the following command with ruby:
bundle exec vagrant
With that being said, I needed to update one of my custom boxes, so I built a vm in the updated version of virtualbox and ran the below command
bundle exec vagrant package --base go --vagrantfile ../../vagrant/vagrantfile
After an extended period of time, vagrant spat back out the following error
The executable 'bsdtar' Vagrant is trying to run was not found in the %PATH% variable. This is an error. Please verify this software is installed and on the path.
I should also note that I use a Windows machine and that this error never occurred when using the installed version of vagrant. At this point, I had posted the issue on github to get some input from the devs, but they (very unprofessionally) decided to ignore my requests for help and close the issues without providing any response. I used the GNUwin32 project to make numerous unix commands available to my Windows environment and added the folder to my PATH environment variable. I then ran the same command again to create my new box and it worked!! So then I uploaded it to the vagrant cloud and attempted to update the vagrant box that is stored on my system by running the following command:
bundle exec vagrant box update
Then, after waiting for a while, vagrant then spat this error out at me:
The box failed to unpackage properly. Please verify that the box
file you're trying to add is not corrupted and that enough disk space
is available and then try again.
The output from attempting to unpackage (if any):
C:\gnuwin32\bin/bsdtar.EXE: invalid option -- s
Usage:
List: bsdtar.EXE -tf <archive-filename>
Extract: bsdtar.EXE -xf <archive-filename>
Create: bsdtar.EXE -cf <archive-filename> [filenames...]
Help: bsdtar.EXE --help
Another error, and still involving this bsdtar tool. It does not appear that anyone else is reporting the issue I am running into because I think they are just waiting for hashicorp to release the new official installation, but, just to give you a look into their priorities, the version of virtualbox that was released which no longer worked with vagrant was released back on December 10. It has been over a month since and there is still no updated release.
So, I am hoping that someone out there might be able to find out why I keep running into these errors when trying to use vagrant's dev build and provide a solution. If not, then maybe if someone else is able to reproduce the issue and report it to hashicorp, maybe they will listen to someone else.
If you are on Ubuntu 20.04 then bsdtar was removed. Try to install libarchive-tools package.
$ sudo apt-get install libarchive-tools
I figured it out. My original hypothesis was correct: since vagrant is a tool that was built primarily to be run on Linux machines, when vagrant runs on Windows the installation includes a mingw environment with all of the dependencies vagrant needs to function, which the installed vagrant executable imports into the console session when run. This is why the dev build kept failing: it was not importing this mingw environment. So, in order to fix the issue, I first cloned the vagrant source code repo from github and followed the instructions I linked to above to build the executable from the source repo. I then copied all of the files in the source repo into the following folder:
<hashicorp install folder root>\Vagrant\embedded\gems\2.2.6\gems\vagrant-<version num>
So, for me, the destination directory is C:\HashiCorp\Vagrant\embedded\gems\2.2.6\gems\vagrant-2.2.6
This directory is identical to the source code repo, and copying the source code repo to the above folder replaces the installation version of vagrant with the dev build. After I did this, running the vagrant commands which had failed previously normally (as in, without using ruby or bundle) worked. I hope this helps someone else out there who Hashicorp has decided is not worth their time.

How to copy intl.so to /usr/lib/php/extensions

I wanted to try Moodle (LMS) on my computer. It requires intl extension.
I used sudo port install php71-intl and successfully installed php71-intl. But it also installed php71 and other dependencies on /opt/local/.
I am using Mac built-in php. Its directories are
/usr/lib/php/extensions/no-debug-non-zts-20160303
/usr/bin/php
I intended to copy the /opt/local/lib/php71/extensions/no-debug-non-zts-20160303/intl.so to /usr/lib/php/extensions/no-debug-non-zts-20160303. But terminal said Operation not permitted.
I tried sudo pecl install intl but failed with make error.
The sudo port install php71-intl method installs intl.so successfully, but in its own directory.
So how can I copy the intl.so? Thank you!
You have to disable SIP to do that. First, restart your Mac and before OS starts up, hold down Command-R and keep it held down until you see an Apple icon and a progress bar. Release. This boots you into Recovery. From the Utilities menu, select Terminal and at the prompt type exactly the following and then press Return: csrutil disable
Terminal should display a message that SIP was disabled. From the menu, select Restart. You can re-enable SIP by following the above steps, but using csrutil enable instead.
Find the file "environment.xml", comment out all lines that contain "intl", and reload the page. The Continue button appears and the installation completes without any error. My instance started working like a charm. I use it to test and create courses with SCORM packages, and have seen no errors yet.
The environment.xml file is located in the "admin" folder of the Moodle directory.

How to check Vagrant up progress

I am using vagrant for the first time.
I am trying to download a VM by running the "vagrant up" command.
The corresponding vagrant file is https://github.com/aalkilani/spark-kafka-cassandra-applying-lambda-architecture/tree/master/vagrant
I have a slow internet connection. It's been around an hour and I am not sure how much of the download has completed. A few questions:
How do I check the % of the download completed? (I know it will tell me when it reaches 20%, but how do I check the % downloaded so far?)
Which temp directory does vagrant download to? (If I have to stop the download partway and resume tomorrow, I am not sure whether I need to clean up or whether it will resume from where it left off.)
I am using Vagrant 2.0.0 on Windows 7.
Looking forward to learning from your experience.
Actually, when you execute vagrant up in the console, it will show the download progress.
But to answer your question, all the downloaded boxes are housed in the "C:\Users\USERNAME\.vagrant.d\boxes" folder.
Basically, because of the poor connection, vagrant downloads the boxes very slowly, so it is highly recommended to download your base box from http://www.vagrantbox.es/ or https://app.vagrantup.com/boxes/search with a download tool. Then you can add it with
vagrant box add <title> <path_to_file>
vagrant init <title>
vagrant up
Boxes are first downloaded to ~/.vagrant.d/tmp, so if you interrupt the download, the partial file will remain there. If you have many unsuccessful downloads, you might want to clean this directory.
If a download is in progress, you can watch the size of the file in this directory to see whether anything is moving.
Once the box file is fully downloaded, it will be installed in ~/.vagrant.d/boxes
Vagrant internally uses curl to download the box. You can use curl or wget to download the particular box file yourself and follow the progress directly from there.
For your particular case, the direct URL for the box is https://vagrantcloud.com/aalkilani/boxes/spark-kafka-cassandra-applying-lambda-architecture/versions/0.0.6/providers/virtualbox.box, you can download this file directly using your preferred tool. Once you have downloaded, you can install the box directly from the file.
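To follow the size of the partial download as described above, something like the following can be run from a Unix-style shell (Git Bash or Cygwin on Windows). It is only a sketch: tmp_bytes is a hypothetical helper, and the VAGRANT_TMP variable is an assumption added so the directory can be overridden.

```shell
# Sum the byte sizes of the regular files directly inside a directory.
tmp_bytes() {
  total=0
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    total=$((total + size))
  done
  echo "$total"
}

# Assumption: Vagrant's temp directory is in the default location.
VAGRANT_TMP="${VAGRANT_TMP:-$HOME/.vagrant.d/tmp}"
echo "Bytes currently in $VAGRANT_TMP: $(tmp_bytes "$VAGRANT_TMP")"
```

Running it a few times a minute apart shows whether the number is still growing. If you download the box file yourself with curl instead, its -C - option asks curl to resume a partial file.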

Docker "Can't add file <path> to tar: readlink <path> The system cannot find the file specified"

I'm a beginner at docker trying to get it working on my Windows 10 machine using creator's update bash subsystem. My dockerfile builds fine on my mac, but when I try docker build from the same file on windows, I get errors like the following:
time="2017-08-28T14:44:36-07:00" level=error
msg="Can't add file \\\\?\\C:\\Users\\username\\Workspace\\...\\node_modules\\.bin\\nodemon
to tar: readlink \\\\?\\C:\\Users\\username\\Workspace\\...\\node_modules\\.bin\\nodemon:
The system cannot find the file specified."
This is printed out for all of my node_modules dependencies. My first question is: What is triggering this "add" operation for all of my node_modules? These error messages are printed out before the first line of my dockerfile is executed
Step 1/25 : FROM ubuntu:14.04
Second, does this issue have something to do with different paths in windows? To me, the weirdest part about the errors is the strange path \\\\?\\C:\\Users\\username. What is going on here with the excessive slashes and a question mark in the path?
Third, is it just me? Or does everyone encounter problems when using docker on windows bash subsystem?
NOTE: I've tried setting up the Docker Toolbox and running the quickstart terminal. I get the following error from running the quickstart terminal
Error with pre-create check:
"This computer is running Hyper-V. VirtualBox won't boot a 64bits VM when Hyper-V is activated. Either use Hyper-V as a driver, or disable the Hyper-V hypervisor. (To skip this check, use --virtualbox-no-vtx-check)"
Looks like something went wrong in step ´Checking if machine default exists´... Press any key to continue...
So I've tried making sure hyper-v is enabled from this article here
NOTE 2: I've also made sure that my C drive is shared with my containers

Issue with setting up Vagrant

I have just set up a new Linux box and am trying to install vagrant on it. The issue is that when I run the vagrant up command, I get the following error:
Vagrant failed to initialize at a very early stage:
The directory Vagrant will use to store local environment-specific
state is not accessible. The directory specified as the local data
directory must be both readable and writable for the user that is
running Vagrant.
Any idea how to fix this?
I think a better way is to give your user the required permissions by making that user the owner of the directory where you want vagrant to boot:
$ sudo chown -R <user> <directory>
and then you will be easily able to do:
$ vagrant up
Running vagrant up with sudo is unusual, since there is rarely a reason to run your virtual machine as the root user.
I met the same problem and solved it by running the terminal with "Run as administrator". It's quite easy.
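To see whether permissions are really the problem before changing ownership, you can test the directory directly. A minimal sketch, assuming the local data directory is the .vagrant folder inside the project directory (falling back to the project directory itself when .vagrant does not exist yet); check_rw is an illustrative name.

```shell
# Report whether the current user can both read and write the given path.
check_rw() {
  if [ -r "$1" ] && [ -w "$1" ]; then
    echo "ok: $1"
  else
    echo "not accessible: $1"
    return 1
  fi
}

# Run from the directory where you type `vagrant up`.
target=".vagrant"
[ -d "$target" ] || target="."
check_rw "$target" || echo "Fix ownership, e.g. sudo chown -R <user> <directory>"
```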
Hope this can help you.
I encountered the same issue four years later and could not fix it using chmod or even @Ziya's comment under the initial question (which brought me closer to the resolution though).
In my case, I use Vagrant 2.2.6 on Windows 10, and use Cygwin as a command line interface.
For the error to disappear, I had to :
open Windows Explorer
right-click .vagrant folder in the location where I typed vagrant up
access the "Properties" menu
then, in the "Security" tab, update the authorizations for my user, granting total control
Hope this can help someone else.
Please follow these steps:
1) install vagrant 1.7.1
2) install virtual box 4.1, 4.2, or 4.3
3) use the administrator name in the custom directory (e.g., for windows users c:\users\AdminName\myvagrant or for Mac/Linux users /home/Admin/myvagrant)
For instance: c:\users\safwan\myvagrant where safwan is the user with administrator rights/privileges.
Copy the file named Vagrantfile into the myvagrant folder.
4) Now open DOS window as shown in the picture and follow the steps in the DOS window changing the admin name
