I am trying to install RHadoop on top of my Hadoop cluster. While installing some of the required packages, I am facing the following errors:
> install.packages("Megh/rmr2_3.3.1.tar.gz")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from 'pkgs'
Error in rawToChar(block[seq_len(ns)]) :
embedded nul in string: 'rmr2/man/fromdfstodfs.Rd\0\0\0\0erties\n i-_". '
Warning message:
In install.packages("Megh/rmr2_3.3.1.tar.gz") :
installation of package ‘Megh/rmr2_3.3.1.tar.gz’ had non-zero exit status
>
> install.packages("Megh/plyrmr_0.6.0.tar.gz")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from 'pkgs'
Warning in untar2(tarfile, files, list, exdir, restore_times) :
checksum error for entry 'plyrmr/man/as.data.framed'
Warning in readBin(con, "raw", n = 512L) :
invalid or incomplete compressed data
Error in untar2(tarfile, files, list, exdir, restore_times) :
incomplete block on file
Warning message:
In install.packages("Megh/plyrmr_0.6.0.tar.gz") :
installation of package ‘Megh/plyrmr_0.6.0.tar.gz’ had non-zero exit status
I have also installed RHive on the cluster. I'm able to execute relatively small queries through RHive, but larger queries fail:
> rhive.query("SELECT COUNT(*) FROM tradehistory")
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> rhive.query("SELECT tradeno FROM tradehistory LIMIT 10")
tradeno
1 34232193
2 34232198
3 34232199
4 34232200
5 34232201
6 34232202
7 34232203
8 34232204
9 34232205
10 34232206
If anybody has any idea please help me out with this! Thanks a lot in advance!
For the installation error that I was facing, I figured out that it was an issue with the tar file.
I had downloaded the tar file on a Windows system and was transferring it to my cluster using WinSCP.
When transferring zip/archive files, binary transfer mode should be used; otherwise some bytes of the tar file can be lost or mangled in transit.
That, in turn, results in the errors above.
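A quick way to confirm the archive arrived intact, as a minimal R sketch (the file name comes from the question; compare the checksum with one computed on the Windows side):
# compare this md5 with the one computed on the machine the file was downloaded to
tools::md5sum("Megh/rmr2_3.3.1.tar.gz")
# a corrupted archive usually also fails a plain listing
untar("Megh/rmr2_3.3.1.tar.gz", list = TRUE)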
In the case of Tez, a query that has to invoke multiple MapReduce tasks cannot execute without proper authorization.
So when I tried the same RHive query while supplying the username and password, I got the desired results.
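For reference, a minimal sketch of that connection (the host and credentials are placeholders, and argument names can differ between RHive versions, so check args(rhive.connect) on your install):
library(RHive)
rhive.init()
# connect to HiveServer2 with explicit credentials before running the large query
rhive.connect(host = "hive-server-host", port = 10000, hiveServer2 = TRUE,
              user = "username", password = "password")
rhive.query("SELECT COUNT(*) FROM tradehistory")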
My Python version is 3.7. After running pip3 install happybase, I started the thrift server with hbase thrift start and wrote a brief .py file as follows:
import happybase

connection = happybase.Connection('master')
table = connection.table('jmlr') #'jmlr' is a table in hbase
for i in table.scan():
    print(i)
table.put('001', {'title':'dasds'}) #error here
connection.close()
When it gets to table.put(), it reports this error:
thriftpy2.transport.base.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')
At the same time, the thrift server reported an error:
ERROR [thrift-worker-1] thrift.TBoundedThreadPoolServer: Error occurred during processing of message. java.lang.IllegalArgumentException: Invalid famAndQf provided.
But when I ran this Python file again just now, it gave me a different error in thrift:
thrift.TBoundedThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Bad version in readMessageBegin
I have tried adding parameters like protocol='compact' and transport='framed', but this didn't work; even table.scan() failed.
Everything in the HBase shell is OK, so I can't figure out what went wrong; I'm about to collapse.
I ran into the same issue and found this solution. You need to add a column qualifier, even an empty one (the ':' symbol is the delimiter between column family and column qualifier), to the column name you pass to put():
table.put('001', {'title:': 'dasds'})
Also, you get a different error message on the second run of the script because the thrift server has already crashed by then.
I hope it will help you.
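For completeness, a minimal corrected sketch of the whole script (it assumes, as in the question, a table 'jmlr' whose column family is 'title'; restart the thrift server first if it has crashed):
import happybase

connection = happybase.Connection('master')
table = connection.table('jmlr')

# column names passed to put() must be 'family:qualifier';
# an empty qualifier still needs the trailing ':'
table.put('001', {'title:': 'dasds'})

for key, data in table.scan():
    print(key, data)

connection.close()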
Following the official Titan DB guide and trying to run the command:
graph = TitanFactory.open('conf/titan-cassandra-es.properties')
I got this error:
Backend shorthand unknown: conf/titan-cassandra-es.properties
Obviously, the reason is the incorrect path to the
titan-cassandra-es.properties
file. So I changed it to:
graph = TitanFactory.open('../conf/titan-cassandra-es.properties')
and got this error:
Encountered unregistered class ID: 141.
The error happens in the following version:
titan-0.5.4-hadoop2
On titan-1.0.0-hadoop2 instead of this error message I get this one:
Invalid import definition: 'com.thinkaurelius.titan.hadoop.MapReduceIndexManagement'; reason: startup failed: script14747941661821834264593.groovy: 1: unable to resolve class com.thinkaurelius.titan.hadoop.MapReduceIndexManagement # line 1, column 1. import com.thinkaurelius.titan.hadoop.MapReduceIndexManagement ^
1 error
And on titan-1.0.0-hadoop2 I get this one:
The input line is too long.
The syntax of the command is incorrect.
Does anyone know how to handle this issue?
It seems like you have not even managed to get Titan 1 to start up yet.
I do not believe Titan 1 supports Windows out of the box, i.e. the downloadable package will not just work on Windows.
That said, I have managed to get Titan DB 1 to work on Windows. To do so, all you have to do is install Cassandra 2.x on Windows. This guide may help you out. Start Cassandra and enable thrift connections.
With that done, you should be able to get Titan doing basic operations on Windows. From there you may find dealing with your current errors easier.
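As a starting point, here is a minimal sketch for opening Titan against that local Cassandra from the Gremlin console (the hostname and the cassandrathrift backend are assumptions for a default local install with thrift enabled):
conf = new org.apache.commons.configuration.BaseConfiguration()
conf.setProperty('storage.backend', 'cassandrathrift')   // talk to Cassandra over thrift
conf.setProperty('storage.hostname', '127.0.0.1')
graph = TitanFactory.open(conf)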
Side note: Windows support for Titan 0.5.x may be more substantial, so you could look into that as well.
I recently installed OPNET 14 and I have been unable to run simulations. I keep getting the error below when I try to run any:
<<< Recoverable Error >>>
Object repository construction failed
due to errors encountered by the binder program (bind_so_msvc)
T (0), EV (-), MOD (NONE), PROC (sim_load_repos_rebuild)
----
Errors reported by the binder program follow
(these messages have been saved in (C:\Users\Karl\op_admin\tmp\bind_err_7640):
LINK : fatal error LNK1181: cannot open input file 'kernel32.lib'
----
<<< Program Abort >>>
Error encountered rebuilding repository -- unable to proceed
T (0), EV (-), MOD (NONE), PROC (sim_load_repos_load)
----
Kindly advise on how I can overcome this.
Appreciate the help.
Select Edit --> Preferences
Type "network sim" in the "Search for" field
Click the value field of Network Simulation Repositories
Add stdmod to the value field of Network Simulation Repositories
I'm trying to configure RHive in the CDH4 environment.
When loading the 'RHive' package in R, the error below is returned.
I'm guessing that's due to wrong home directories (HIVE_HOME / HADOOP_HOME).
If so, what would be the correct ones?
Or if that's not the reason, what's wrong with that?
Any help would be very appreciated.
Thanks.
> Sys.setenv(HIVE_HOME="/etc/hive")
> Sys.setenv(HADOOP_HOME="/etc/hadoop")
> library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-7. For overview type '?RHive'.
HIVE_HOME=/etc/hive
[1] "there is no slaves file of HADOOP. so you should pass hosts argument when you call rhive.connect()."
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
In addition: Warning message:
In file(file, "rt") :
cannot open file '/etc/hadoop/conf/slaves': No such file or directory
Error: package/namespace load failed for 'RHive'
I had the same problems but solved them. The downside is that I have to keep track of a bunch of symlinks.
After struggling to install RHive_0.0-7.tar.gz on CDH 4.7.x and getting:
Warning in file(file, "rt") :
cannot open file '/etc/hadoop/conf/slaves': No such file or directory
[1] "there is no slaves file of HADOOP. so you should pass hosts argument when you call rhive.connect()."
In /etc/hadoop/conf
I added the following symlink: ln -s /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/etc/hadoop/conf.empty/slaves slaves
(Why Cloudera CDH 4.7 installs in /opt without creating the proper symlinks from /usr/lib is puzzling.)
I also defined the following in /usr/lib64/R/etc/Renviron:
## set hive paths
HIVE_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive'
HADOOP_HOME='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
LD_LIBRARY_PATH='/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop'
At a shell prompt I ran R CMD INSTALL RHive_0.0-7.tar.gz
Installation Happiness!!
++++++
Inside R-Studio (server)
>
> library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-7. For overview type ‘?RHive’.
HIVE_HOME=/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
>
+++++++
You should set the HADOOP_CONF_DIR separately.
Try export HADOOP_CONF_DIR=/etc/hadoop/conf/conf.pseudo
The conf.pseudo has the slaves file.
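If you prefer to set it from inside R before loading RHive, a minimal sketch using the same path:
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf/conf.pseudo")
library(RHive)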
Though I'd be curious to see if you can make RHive work with CDH4.
I'm trying to build a simple cluster based on Windows XP. I compiled OpenMPI-1.4.2 successfully, and tools like mpicc and ompi_info work too, but I can't get my mpirun working properly. The only output I can see is
Z:\>orterun --hostfile z:\hosts.txt -np 2 hostname
[host0:04728] Failed to initialize COM library. Error code = -2147417850
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\mca\ess\hnp\ess_hnp_module.c at line 218
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_plm_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\runtime\orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_set_name failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\..\openmpi
-1.4.2\orte\tools\orterun\orterun.c at line 543
Where z:\hosts.txt appears as follows:
host0
host1
Z: is a shared network drive available to both host0 and host1.
What is my problem, and how do I fix it?
Upd:
OK, this problem seems to be fixed. It seems that the WideCap driver and/or its software components cause this error to appear. A "clean" machine runs a local task successfully. Anyway, I still cannot run a task across at least two machines; I'm getting the following message:
Z:\>mpirun --hostfile z:\hosts.txt -np 2 hostname
connecting to host1
username:MAIN\cluster
password:********
Save Credential?(Y/N) y
[host0:04728] This feature hasn't been implemented yet.
[host0:04728] Could not connect to namespace cimv2 on node host1. Error code =-2147217400
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error.
More information may be available above.
--------------------------------------------------------------------------
I googled a little and did all the things as described here: http://www.open-mpi.org/community/lists/users/2010/03/12355.php but I'm still getting the same error. Can anyone help me?
Upd2:
Error code -2147217400 might be the WMI error WBEM_E_INVALID_PARAMETER (0x80041008), which occurs when one of the parameters passed to the WMI call is not correct. Does this mean that the problem is in the OpenMPI source code itself? Or maybe it's because of the wrong/outdated wincred.h and credui.lib I used while building OpenMPI from the source code?