Hadoop HDFS connection in PowerCenter

I have installed Cloudera's Hadoop QuickStart VM and I am attempting to pass records from my local database to HDFS using a PowerCenter mapping.
I've set up the Hadoop_HDFS_Connection in PowerCenter Workflow Manager, but when I run the workflow I get the following error: "Unable to establish a connection with the specified HDFS host". It throws a "java.net.ConnectException" when trying to connect to the host name and port.
I think the error may be in the hostname notation. Cloudera Manager on the VM lists the host name as 'localhost.localdomain', but I don't know how to translate this into the PowerCenter connection settings.
Anybody got this connection to work?
Many thanks.
Brian
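
Not a PowerCenter-specific fix, but one thing worth ruling out first: 'localhost.localdomain' resolves to 127.0.0.1 on the PowerCenter machine, not to the VM, so the connection generally needs the VM's external IP (or an /etc/hosts entry mapping a hostname to it). A quick way to test the host/port outside PowerCenter is a minimal Hadoop client. A sketch, where the IP 192.168.56.101 and the CDH default NameNode port 8020 are placeholder assumptions:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectTest {
    public static void main(String[] args) throws Exception {
        // Placeholder values: substitute your VM's IP and NameNode port.
        URI nameNode = URI.create("hdfs://192.168.56.101:8020");
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(nameNode, conf)) {
            // Listing the root directory only needs the NameNode to be
            // reachable, which is exactly the step that is failing here.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}

If this succeeds, the same host and port should work in the Hadoop_HDFS_Connection settings.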

Related

Cdap connectivity with Apache HIVE

I have a Linux box with CDAP installed, and I have configured the Hive Import and Export plugins in CDAP.
On the same machine I have Hadoop with Hive installed. I am able to start all of the Hadoop services (verified using the jps command) and to create and query Hive tables.
The actual problem is when I try to connect to Hive from CDAP. It is unable to connect and throws the error message below.
Connection string: jdbc:hive2://localhost:10000/defaultdb;auth=deligateToken;
Output Directory: /tmp/hive - this directory already exists
Error:
I tried changing the connection string:
Option 1: jdbc:hive2://localhost:10000/defaultdb;auth=deligateToken; - connection refused error
Option 2: jdbc:hive2:// - unable to instantiate error
Option 3: jdbc:hive2://localhost:10001/defaultdb;auth=deligateToken; - still not working
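
Before debugging the CDAP side, it may help to confirm that HiveServer2 accepts a plain JDBC connection at all. Also worth noting: as far as I know, the value Hive's JDBC documentation uses for token-based auth is auth=delegationToken, so the "deligateToken" spelling above would not be recognized. A minimal sketch, assuming HiveServer2 runs unauthenticated on localhost:10000 and the hive-jdbc driver and its dependencies are on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcTest {
    public static void main(String[] args) throws Exception {
        // Hive's JDBC driver class, from the hive-jdbc artifact.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumption: no authentication; "default" is the stock database name
        // ("defaultdb" from the question only exists if it was created explicitly).
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

If this is refused as well, the problem is with HiveServer2 itself (not started, wrong port, or bound to a different interface) rather than with CDAP.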

Write to HDFS/Hive using NiFi

I'm using NiFi 1.6.0.
I'm trying to write to HDFS and to Hive (Cloudera) with NiFi.
On "PutHDFS" I configured "Hadoop Configuration Resources" with the hdfs-site.xml and core-site.xml files and set the directories, but when I try to start it I get the following error:
"Failed to properly initialize processor, If still shcedule to run,
NIFI will attempt to initalize and run the Processor again after the
'Administrative Yield Duration' has elapsed. Failure is due to
java.lang.reflect.InvocationTargetException:
java.lang.reflect.InvicationTargetException"
On "PutHiveStreaming" I'm configure the "Hive Metastore URI" with
thrift://..., the database and the table name and on "Hadoop
Confiugration Resources" I'm put the Hive-site.xml location and when
I'm trying to Start it I got the following error:
"Hive streaming connect/write error, flow file will be penalized and routed to retry.
org.apache.nifi.util.hive.HiveWritter$ConnectFailure: Failed connectiong to EndPoint {metaStoreUri='thrift://myserver:9083', database='mydbname', table='mytablename', partitionVals=[]}:".
How can I solve the errors?
Thanks.
For #1, if you got your *-site.xml files from the cluster, it's possible that they refer to components like the DataNodes by internal IPs, which you won't be able to reach directly. Try setting dfs.client.use.datanode.hostname to true in the hdfs-site.xml on the client; there's a sketch of how to verify this after this answer.
For #2, I'm not sure PutHiveStreaming will work against Cloudera; IIRC they ship Hive 1.1.x while PutHiveStreaming is built against 1.2.x, so there may be some Thrift incompatibilities. If that doesn't seem to be the issue, make sure the client can connect to the metastore port (looks like 9083).
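
For #1, a small standalone client can reproduce the DataNode-address issue outside NiFi. A sketch with placeholder host, port, and path; note that an actual read is needed, because a directory listing only talks to the NameNode while reads go to the DataNodes:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DatanodeHostnameCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Programmatic equivalent of the hdfs-site.xml property suggested above:
        // connect to DataNodes by hostname instead of the (possibly
        // cluster-internal) IPs that the NameNode returns.
        conf.setBoolean("dfs.client.use.datanode.hostname", true);
        // Placeholders: substitute your NameNode address and an existing file.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);
             FSDataInputStream in = fs.open(new Path("/tmp/test-file"))) {
            System.out.println("first byte: " + in.read());
        }
    }
}

If the read only succeeds with the property set, add it to the hdfs-site.xml that PutHDFS points at.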

Unable to connect to HBase using Phoenix JDBC driver (Can't get the locations error)

I am working with HBase (1.2.6) and Phoenix (4.10.0-HBase-1.2).
I am getting this error:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations error
Below is the code with which I am trying to connect to HBase using Phoenix:
Connection connection = DriverManager.getConnection("jdbc:phoenix:localhost");
Below is my hdfs-site.xml file, where I have made some changes:
What changes do I need to make? Please suggest.
A combined answer with @vrb:
The ZooKeeper port here is non-standard, so it needs to be specified in the JDBC URL passed to DriverManager:
Connection connection = DriverManager.getConnection("jdbc:phoenix:localhost:12181");
Check the conf/regionservers file in HBase for hostnames, and use the same hostname to connect to HBase:
connection = DriverManager.getConnection("jdbc:phoenix:{hostname_in_regionserver_conf_file}:2181");
Also ensure that the phoenix-X.X.X-HBase-X.X-client.jar is on your Java application's classpath.
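
Putting the above together, a self-contained version of the connectivity check; the host and the non-standard ZooKeeper port 12181 are the placeholders discussed above, and recent Phoenix client jars register the JDBC driver automatically:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixConnectTest {
    public static void main(String[] args) throws Exception {
        // Placeholder: ZooKeeper host and the non-standard port from the answer above.
        String url = "jdbc:phoenix:localhost:12181";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // SYSTEM.CATALOG exists in every Phoenix-enabled cluster,
             // so this is a cheap end-to-end check.
             ResultSet rs = stmt.executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}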

Hive: The application won't work without a running HiveServer2

I am new to this field. I was checking the CDH 5.8 QuickStart VM to try some basic Hive/Impala examples.
But I hit an issue: when I open HUE it gives the error below. I searched for a solution but didn't find anything that resolved my issue.
Configuration files located in /etc/hue/conf.empty
Potential misconfiguration detected. Fix and restart Hue.
Hive The application won't work without a running HiveServer2.
I checked the HiveServer2 service and it's up & running. Tried restarting the service & CDH; it didn't help.
Hive Server2 is running [ OK ]
When I navigated to Hive and tried some commands, it gave me the below error.
Could not connect to quickstart.cloudera:10000 (code THRIFTTRANSPORT): TTransportException('Could not connect to quickstart.cloudera:10000',)
For Impala I am getting:
AnalysisException: This Impala daemon is not ready to accept user requests. Status: Waiting for catalog update from the StateStore.
I tried starting hive --service metastore but got an error:
[cloudera@quickstart conf.empty]$ hive --service metastore
2017-03-03 05:37:14,502 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
Starting Hive Metastore Server
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.
Not sure what is wrong or if I need to change some config. Can anyone guide me towards a solution?
Your HiveServer2 requires the Metastore to be up and running. It seems your Metastore server cannot start because port 9083 is already in use by another service. Check it:
netstat -tulpn | grep 9083
If something is using this port, you need to either change the metastore port in your Hive configuration or stop the application that is already using it.
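
If netstat isn't available, the same port check can be done from any JVM; a quick sketch for the default metastore port:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) {
        try (Socket socket = new Socket()) {
            // Try to connect to the default metastore port with a 2s timeout.
            socket.connect(new InetSocketAddress("localhost", 9083), 2000);
            System.out.println("Something is already listening on 9083.");
        } catch (IOException e) {
            System.out.println("Nothing is listening on 9083: " + e.getMessage());
        }
    }
}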

Phoenix HBase not connecting remotely

I have two Cloudera VMs, and on both I've configured Phoenix; it works fine as long as everything is on localhost.
When I try to connect to HBase on one VM from Phoenix on the other VM, I use this command:
$ ./sqlline.sh xxx.xx.xx.xx:2181
The connection is successful, but Phoenix is still referencing the local HBase and not the remote one. Can anyone tell me where the problem is?
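
One thing worth checking: an hbase-site.xml from the local install on sqlline's classpath supplies defaults for anything the connection URL leaves out, so it pays to spell out quorum, port, and znode explicitly. A sketch of the fully qualified Phoenix JDBC URL (the IP, the standard port 2181, and the default znode /hbase are all placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class RemotePhoenixTest {
    public static void main(String[] args) throws Exception {
        // Placeholders: remote ZooKeeper quorum, its port, and the HBase znode.
        // Spelling out all three parts avoids falling back to values from a
        // local hbase-site.xml that happens to be on the classpath.
        String url = "jdbc:phoenix:xxx.xx.xx.xx:2181:/hbase";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}

The same fully qualified form should also work with sqlline: ./sqlline.sh xxx.xx.xx.xx:2181:/hbase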
