Hadoop and Hive Homes in CDH4

I'm trying to configure RHive in a CDH4 environment.
When I load the 'RHive' package in R, I get the error below.
I'm guessing it's due to wrong home directories.
If so, what would be the correct ones?
And if that's not the reason, what else could be wrong?
Any help would be much appreciated.
Thanks.
> Sys.setenv(HIVE_HOME="/etc/hive")
> Sys.setenv(HADOOP_HOME="/etc/hadoop")
> library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-7. For overview type '?RHive'.
HIVE_HOME=/etc/hive
[1] "there is no slaves file of HADOOP. so you should pass hosts argument when you call rhive.connect()."
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
In addition: Warning message:
In file(file, "rt") :
cannot open file '/etc/hadoop/conf/slaves': No such file or directory
Error: package/namespace load failed for 'RHive'

I had the same problem but solved it. The downside is that I have to keep track of a bunch of symlinks.
After struggling to install RHive_0.0-7.tar.gz on CDH 4.7.x and getting:
Warning in file(file, "rt") :
cannot open file '/etc/hadoop/conf/slaves': No such file or directory
[1] "there is no slaves file of HADOOP. so you should pass hosts argument when you call rhive.connect()."
In /etc/hadoop/conf I added the following symlink:
ln -s /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/etc/hadoop/conf.empty/slaves slaves
(Why Cloudera CDH 4.7 installs into /opt without creating the proper symlinks from /usr/lib is puzzling.)
I also defined the following in /usr/lib64/R/etc/Renviron:
## set hive paths
HIVE_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive'
HADOOP_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
LD_LIBRARY_PATH='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
At a shell prompt I ran R CMD INSTALL RHive_0.0-7.tar.gz
Installation Happiness!!
++++++
Inside RStudio Server:
>
> library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-7. For overview type ‘?RHive’.
HIVE_HOME=/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
>
+++++++
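For reference, here are the same steps condensed into shell commands; the parcel path is the one on my box (CDH-4.4.0-1.cdh4.4.0.p0.39), so substitute whatever parcel directory your CDH install actually uses:
# 1. Give /etc/hadoop/conf a slaves file by linking to the parcel's copy
cd /etc/hadoop/conf
sudo ln -s /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/etc/hadoop/conf.empty/slaves slaves
# 2. Point R at the parcel homes (append to /usr/lib64/R/etc/Renviron)
sudo tee -a /usr/lib64/R/etc/Renviron <<'EOF'
## set hive paths
HIVE_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive'
HADOOP_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
LD_LIBRARY_PATH='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
EOF
# 3. Reinstall the package
R CMD INSTALL RHive_0.0-7.tar.gz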

You should set HADOOP_CONF_DIR separately.
Try export HADOOP_CONF_DIR=/etc/hadoop/conf/conf.pseudo
The conf.pseudo directory contains the slaves file.
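A minimal sketch of that setup before starting R; /usr/lib/hadoop and /usr/lib/hive are the usual CDH4 package locations and are an assumption about your layout:
export HADOOP_CONF_DIR=/etc/hadoop/conf/conf.pseudo
export HADOOP_HOME=/usr/lib/hadoop
export HIVE_HOME=/usr/lib/hive
R
Then, in R, pass the host explicitly as the loading message suggests (the hostname is a placeholder):
library(RHive)
rhive.connect(hosts="your-hadoop-master")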
Though I'd be curious to see if you can make RHive work with CDH4.

Related

Getting an error while starting Presto with sudo bin/launcher run

When I start Presto like this, it starts fine:
sys2079@sys2079:~/Music/presto-server-0.149$ sudo bin/launcher start
Started as 10672
But when I then try to run Presto in the foreground, I get this error:
sys2079@sys2079:~/presto-server-0.149$ sudo bin/launcher run
Unrecognized VM option 'G1HeapRegionSize = 32M'
Did you mean 'G1HeapRegionSize='?
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
What do I have to do?
Try changing the memory settings in the etc/config.properties file inside your Presto installation:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://<your IP or localhost>:8080
Hope this helps
The problem is in the jvm.config file inside your Presto installation. JVM options must not have spaces around the equals sign: write the option as -XX:G1HeapRegionSize=32M instead of -XX:G1HeapRegionSize = 32M.
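For reference, a minimal jvm.config in the style of the Presto deployment docs looks like this (the heap and region sizes here are illustrative; one option per line, no spaces around =):
-server
-Xmx4G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p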

Running AppManage raises an error

Running path\to\tibco\tra\5.8\bin\AppManage.exe (from the TIBCO distribution) gives the following error:
Failed to open properties file : AppManage.tra
Adjustment failed.
How can I fix this?
Running AppManage.exe from inside the folder path\to\tibco\tra\5.8\bin, or adding this folder to PATH, solves the issue: AppManage needs to find its AppManage.tra properties file, which sits next to the executable.
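For example, from a Windows command prompt (the install root here is hypothetical; use your actual TIBCO path):
cd /d C:\tibco\tra\5.8\bin
AppManage.exe
rem or make the folder available for the whole session:
set PATH=%PATH%;C:\tibco\tra\5.8\bin
AppManage.exe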

Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found (Spark 1.6 Windows)

I am trying to access S3 files from a local Spark context using pySpark.
I keep getting:
File "C:\Spark\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o20.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
I had set os.environ['AWS_ACCESS_KEY_ID'] and
os.environ['AWS_SECRET_ACCESS_KEY'] before I called df = sqc.read.parquet(input_path). I also added these lines:
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsSecretAccessKey", os.environ["AWS_SECRET_ACCESS_KEY"])
hadoopConf.set("fs.s3.awsAccessKeyId", os.environ["AWS_ACCESS_KEY_ID"])
I have also tried changing s3 to s3n and to s3a. Neither worked.
Any idea how to make it work?
I am on Windows 10, pySpark, Spark 1.6.1 built for Hadoop 2.6.0
I run pyspark with the hadoop-aws libraries appended:
$SPARK_HOME/bin/pyspark --packages org.apache.hadoop:hadoop-aws:2.7.1
You will need to use s3n in your input path. I'm running this on macOS, so I'm not sure whether it will work on Windows.
The same package declaration also works in spark-shell:
spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.1
and then specify the credentials in the shell:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxxxxxxxxxxxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxxxxxxxxxxxxxxxx")

install_driver(Oracle) failed: Can't load '/usr/local/lib/perl5/auto/DBD/Oracle/Oracle.so' for module DBD::Oracle: libocci.so.11.1

I have been facing this error for two days. I am able to get the expected output via the command line on the Nagios host:
/usr/local/nagios/libexec/check_oracle_health --connect 192.168.2.92:1521/modula --user nagios --password nagios --mode tnsping
The output is:
OK - connection established to 192.168.2.92:1521/modula.
But when I run the same check from the GUI, it gives me this error:
CRITICAL - cannot connect to 192.168.2.92:1521/modula.
install_driver(Oracle) failed:
Can't load '/usr/local/lib/perl5/auto/DBD/Oracle/Oracle.so' for module DBD::Oracle:
libocci.so.11.1: cannot open shared object file:
No such file or directory at /usr/lib/perl5/DynaLoader.pm line 200.
at (eval 18) line 3
Compilation failed in require at (eval 18) line 3.
Perhaps a required shared library or dll isn't installed where expected
at /usr/local/nagios/libexec/check_oracle_health line 5837
\n
Please help me resolve the error.
I had this issue on CentOS 6 and here is how I resolved it:
echo "$ORACLE_HOME/lib" >> /etc/ld.so.conf.d/oracle-x86_64.conf && ldconfig
The answer by Jordan Neufeld is good and may be enough for you (I've tested it on CentOS 7), but I recommend setting these environment variables:
export ORACLE_HOME=/usr/lib/oracle/11.2/client64
export LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client64/lib:$LD_LIBRARY_PATH
export PATH=/usr/lib/oracle/11.2/client64/bin:$PATH
[examples are for oracle-instantclient11.2-basic-11.2 rpm, change path if needed]
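If the command line works but the GUI still fails, a quick sanity check is to confirm that the loader can resolve the Oracle client library for the Perl module (the path below is the one from the error message):
ldd /usr/local/lib/perl5/auto/DBD/Oracle/Oracle.so | grep libocci
If libocci.so.11.1 shows up there as "not found", the environment of the process running the plugin (e.g. the web server) is not picking up LD_LIBRARY_PATH, and the ld.so.conf.d + ldconfig approach from the first answer is the more robust fix.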

HPCC/HDFS Connector

Does anyone know about the HPCC/HDFS connector? We are using both HPCC and Hadoop. There is a utility (the HPCC/HDFS connector) developed by HPCC which allows an HPCC cluster to access HDFS data.
I have installed the connector, but when I run the program to access data from HDFS it gives an error saying libhdfs.so.0 doesn't exist.
I tried to build libhdfs.so using the command
ant compile-libhdfs -Dlibhdfs=1
which gives the error
target "compile-libhdfs" does not exist in the project "hadoop"
I also tried
ant compile-c++-libhdfs -Dlibhdfs=1
which fails with
ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
[get] To: /home/hadoop/hadoop-0.20.203.0/ivy/ivy-2.1.0.jar
[get] Error getting http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
to /home/hadoop/hadoop-0.20.203.0/ivy/ivy-2.1.0.jar
BUILD FAILED java.net.ConnectException: Connection timed out
Any suggestion would be a great help.
Chhaya, you might not need to build libhdfs.so at all; depending on how you installed Hadoop, you might already have it.
Check for HADOOP_LOCATION/c++/Linux-<arch>/lib/libhdfs.so, where HADOOP_LOCATION is your Hadoop install location and <arch> is the machine's architecture (i386-32 or amd64-64).
Once you locate the lib, make sure the H2H connector is configured correctly (see page 4 here).
It's just a matter of updating the HADOOP_LOCATION var in the config file:
/opt/HPCCSystems/hdfsconnector.conf
good luck.
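As a quick sketch of that check, assuming the Hadoop 0.20.203.0 tree from the question on a 64-bit box (use Linux-i386-32 on 32-bit):
# look for a prebuilt libhdfs shipped with the Hadoop tarball
ls /home/hadoop/hadoop-0.20.203.0/c++/Linux-amd64-64/lib/libhdfs.so*
# if it is there, point the H2H connector at that install
sudo vi /opt/HPCCSystems/hdfsconnector.conf    # set HADOOP_LOCATION=/home/hadoop/hadoop-0.20.203.0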
