Unable to configure Apache Sqoop on a Windows machine - hadoop

I am trying to configure Apache Sqoop on my system, but I am getting the error below.
May I know what the root cause is?
Thanks

Your error clearly indicates that you have not set up the required environment variables. Go to System Properties and add HBASE_HOME etc., pointing each at the right folder location, and then it will work. Since Sqoop also supports Hive tables, which in turn depend on other Hadoop ecosystem projects, it needs those supporting tools and libraries.
I would rather suggest using the Cloudera or Hortonworks distribution to avoid such configurations.
You can check your environment variables by running the set command:
c:\> set
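To set them permanently, you can use the setx command from a command prompt and then open a new console so the changes take effect (the folder locations below are just examples; point them at wherever you actually unpacked Hadoop, HBase, and Sqoop):
c:\> REM example paths only -- substitute your real install folders
c:\> setx HADOOP_HOME "C:\hadoop"
c:\> setx HBASE_HOME "C:\hbase"
c:\> setx SQOOP_HOME "C:\sqoop"
Note that setx affects new console windows, not the one you ran it in.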

Related

Installing multiple Oracle homes on the same machine

I have Oracle 11g installed on my system and want to install 12c now. I have read articles (Oracle docs and general ones) which suggest that I can do so in different homes.
But when I try to install 12c (12.2.0.1 release 2), it does not allow me to do so, saying "ORACLE_HOME (in environment variables) already defined and does not match the path specified (during installation)".
Am I supposed to specify the paths manually (or change them in the environment variables), and if so, how do I accommodate different paths for two ORACLE_HOMEs, one per version, in the environment variables?
A search on this portal gives results that do not answer my query.
How do I get around the installation?
You should not install more than one Oracle Client (i.e. more than one each for 32-bit and 64-bit) on one machine; I also cannot imagine any reason for doing so.
Anyway, if you want to install more than one Oracle Client, delete the ORACLE_HOME and ORACLE_BASE environment variables from your computer, if they exist. Then modify the PATH environment variable and remove all directories of your first Oracle installation from it.
After that it should be no problem to install another Oracle Client. You must install it into a different directory; otherwise you will mess up the installation, and I assume afterwards neither of them will work properly.
In order to use one or the other, you have to set the ORACLE_HOME and PATH environment variables accordingly; you cannot use both at the same time. According to Managing Oracle Home Directories you should have a "Home Selector" tool, but I have never used it.
Note that some components (e.g. "Oracle Provider for OLE DB") can be installed only once (i.e. once each for 32-bit and 64-bit). This limitation is caused by Windows COM. Other drivers, for example "Oracle Data Provider for .NET", may also fail due to a version mismatch and/or policy settings in the GAC.
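One way to switch between two homes without a machine-wide ORACLE_HOME is a small batch file per home that configures only the current console session (a sketch; the home path below is a hypothetical example for the 12c client):
REM use-oracle12.cmd -- switches this console session only; the path is an example
set ORACLE_HOME=C:\oracle\product\12.2.0\client_1
set PATH=%ORACLE_HOME%\bin;%PATH%
Run the matching script before starting a tool and that session uses the home you picked, leaving other sessions untouched.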

Talend - System env variables not reflecting without restart

I'm using system environment variables to parameterise jobs in Talend, but every time I change a value, the change is not reflected unless I restart Talend. Is there any workaround? I don't want to use Context groups or Implicit Context Load. I'm using Talend Open Studio, the free edition. Is this any different in the Enterprise version?
This has to do with how Talend handles environment variables: Talend reads them at startup and stores them. There is a good answer here which explains this behavior in more detail with regard to Java (Talend is built on Java).
There are also some tricks listed there for getting at the variables, depending on your OS.
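One such trick on Linux or a Mac is to export the new value in a terminal and launch the Studio from that same terminal, so the shell environment it snapshots at startup already contains the change (both names below are examples: the variable is hypothetical, and the launcher binary's name depends on your Studio version and OS):
$ export JOB_INPUT_DIR=/data/new/location
$ ./TOS_DI-linux-gtk-x86_64
You still have to restart the Studio itself, but not the whole desktop session.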

Installing Cloudera Hadoop without an internet connection

I am trying to install a Cloudera Hadoop cluster on a few CentOS VMs, but this project is in a secure environment where I can't use the internet. I have tried various tutorials, but every one of them needed an internet connection at some point. I downloaded a few things manually instead of using the wget command, but I still couldn't make it work.
Can anyone share how I can do that, either using Cloudera Manager or manually (without needing any internet connection)?
You can do that by selecting Path B (manual installation of Cloudera) specified here, which gives you the option of downloading the parcels online or providing them from a local repository.
OR
You can install the packages individually by using Path C for installation, which is explained here in the Cloudera documentation.
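As a rough sketch of the local-repository route (the parcel file name below is an example; use the parcel matching your CDH version and OS), you copy the parcel and its .sha checksum file into Cloudera Manager's local parcel repository on the CM host, after which it appears on the Parcels page for distribution and activation:
$ # run on the Cloudera Manager host; file names are examples
$ cp CDH-5.x.x-1.cdh5.x.x.p0.x-el6.parcel /opt/cloudera/parcel-repo/
$ cp CDH-5.x.x-1.cdh5.x.x.p0.x-el6.parcel.sha /opt/cloudera/parcel-repo/
$ chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/*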

How to install Jenkins under the current user (not 'jenkins') on Mac OS X

I have configured a Mac OS X environment (SDKs, licenses, etc.) under the current user for a build server and would like to reuse all those settings from a build agent. Jenkins was chosen as a good option, but for some reason the installer created a new user, jenkins, and launches the app under it, making the environment setup inaccessible (no SDKs or licenses are found anymore).
Is it possible to install Jenkins under the current user?
Could it perhaps be installed under jenkins but then launched under the current user?
Any other good options for me to consider are appreciated.
Try this: http://www.sailmaker.co.uk/blog/2013/04/02/advanced-jenkins-for-ios-and-mac/#Installing-Jenkins-itself
I’m also going to recommend installing Jenkins via Homebrew, to avoid some nonsense in Jenkins’ own installer whereby it puts itself in /Users/Shared/. You really don’t want that.
If you're free to reinstall however you'd like, I'd recommend re-installing as the user you want to use, using whatever type of install you prefer, then simply copying the old Jenkins data directory over to the new installation's location and changing the permissions on that directory.
That is to say, the directory containing the config, plugin, and job information (it may be something like /usr/lib/jenkins, but this can vary).
Then chown -R the data directory with the user:group you want, so Jenkins has access to the files.
I have used this method in the past to transfer all the data from one install to a totally separate install on the same box, and it has worked well (you could use it to transfer the data to an install on another box as well).
Note: I would highly recommend making a full backup of the data directory before doing this, in case anything goes awry.
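Concretely, the move might look something like this (a sketch only; the old home under /Users/Shared/ matches the stock installer layout mentioned above, and the new path and user:group are assumptions, so substitute your own):
$ # back up, then copy the old data directory into the new home
$ cp -R /Users/Shared/Jenkins/Home /Users/builduser/.jenkins
$ # hand ownership to the user Jenkins now runs as
$ sudo chown -R builduser:staff /Users/builduser/.jenkins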

How to Get Pig to Work with lzo Files?

So, I've seen a couple of tutorials for this online, but each seems to say to do something different. Also, none of them seems to specify whether you're trying to get things working on a remote cluster, to interact locally with a remote cluster, etc.
That said, my goal is just to get my local computer (a Mac) to make Pig work with lzo-compressed files that live on a Hadoop cluster that has already been set up to work with lzo files. I already have Hadoop installed locally and can get files from the cluster with hadoop fs -[command].
I also already have Pig installed locally, and it communicates with the Hadoop cluster when I run scripts or just run things through grunt. I can load and play around with non-lzo files just fine. My problem is only figuring out a way to load lzo files. Maybe I can just process them through the cluster's instance of ElephantBird? I have no idea, and have found only minimal information online.
So, any sort of short tutorial or answer for this would be awesome, and would hopefully help more people than just me.
I recently got this working and wrote up a wiki page on it for my coworkers. Here's an excerpt detailing how to get PIG to work with lzos. Hope this helps someone!
NOTE: This is written with a Mac in mind. The steps will be almost identical for other OSes, and this should definitely give you what you need to know to configure things on Windows or Linux, but you will need to extrapolate a bit (obviously, change the Mac-centric folders to whatever your OS uses, etc.).
Hooking PIG up to be able to work with LZOs
This was by far the most annoying and time-consuming part for me-- not because it's difficult, but because there are 50 different tutorials online, none of which are all that helpful. Anyway, what I did to get this working is:
Clone hadoop-lzo from github at https://github.com/kevinweil/hadoop-lzo.
Compile it to get a hadoop-lzo*.jar and the native *.o libraries. You'll need to compile this on a 64-bit machine.
Copy the native libs to $HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/.
Copy the Java jar to $HADOOP_HOME/lib and $PIG_HOME/lib.
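In shell terms, those steps look roughly like this (a sketch: the ant targets are the ones the hadoop-lzo README documents, and the jar's version number is just an example from my build, so check both against your checkout):
$ git clone https://github.com/kevinweil/hadoop-lzo.git
$ cd hadoop-lzo
$ ant compile-native tar
$ mkdir -p $HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64
$ cp build/native/Mac_OS_X-x86_64-64/lib/* $HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/
$ cp build/hadoop-lzo-0.4.15.jar $HADOOP_HOME/lib/
$ cp build/hadoop-lzo-0.4.15.jar $PIG_HOME/lib/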
Then configure hadoop and pig so that the property java.library.path points to the lzo native libraries. You can do this in $HADOOP_HOME/conf/mapred-site.xml with:
<property>
  <name>mapred.child.env</name>
  <value>JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/</value>
</property>
Now try out the grunt shell by running pig again, and make sure everything still works. If it doesn't, you probably messed something up in mapred-site.xml and should double-check it.
Great! We're almost there. All you need to do now is install elephant-bird. You can get that from https://github.com/kevinweil/elephant-bird (clone it).
Now, in order to get elephant-bird to work, you'll need quite a few prerequisites. These are listed on the page mentioned above and might change, so I won't repeat them here. What I will mention is that the versions of these are very important: if you get an incorrect version and try running ant, you will get errors. So don't grab the prerequisites from brew or macports, as you'll likely get a newer version. Instead, just download the tarballs and build each one.
Run ant in the elephant-bird folder to create a jar.
For simplicity's sake, move all the relevant jars (hadoop-lzo-x.x.x.jar and elephant-bird-x.x.x.jar) that you'll need to register frequently to somewhere you can easily find them. /usr/local/lib/hadoop/... works nicely.
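Those two steps condensed (a sketch; the x.x.x placeholders and the path/to/ locations depend on where your builds put the jars):
$ cd elephant-bird
$ ant
$ mkdir -p /usr/local/lib/hadoop
$ cp path/to/hadoop-lzo-x.x.x.jar path/to/elephant-bird-x.x.x.jar /usr/local/lib/hadoop/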
Try things out! Play around with loading normal files and lzos in grunt shell. Register the relevant jars mentioned above, try loading a file, limiting output to a manageable number, and dumping it. This should all work fine whether you're using a normal text file or an lzo.
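For example, a first smoke test in grunt might look like this (the jar versions, the file path, and the loader class are assumptions; ElephantBird ships several lzo loaders, so pick the one that matches your data and version):
grunt> REGISTER /usr/local/lib/hadoop/hadoop-lzo-x.x.x.jar;
grunt> REGISTER /usr/local/lib/hadoop/elephant-bird-x.x.x.jar;
grunt> logs = LOAD '/path/on/cluster/file.lzo' USING com.twitter.elephantbird.pig.load.LzoTextLoader();
grunt> few = LIMIT logs 10;
grunt> DUMP few;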
