IllegalArgumentException with Hive when executing query - hadoop

Ok, so I'm trying to execute SHOW DATABASES on Hive, but when I do, it returns this error:
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D
I've checked around in my XML to see what's going on, but nothing I change fixes this error.
Here's the relevant parts of the XML:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive-${user.name}</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
<description>The permission for the user specific scratch directories that get created.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp/${system:user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>
Other than that, I can't think of any other property causing these errors, but then again I'm completely new to this. I still haven't figured out a lot of things, for example how to connect the program in my Windows VM to Hive in my Linux VM.
Anyway, if I can get any help I'd appreciate it.

What is ${system:user.name}? Why not use ${user.name} for hive.exec.local.scratchdir and hive.querylog.location? It seems ${system:user.name} is not being expanded.
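For illustration, a minimal sketch of those two properties with ${user.name} instead (the /tmp locations are just assumptions; use whatever scratch/log paths you actually want):
<property>
<name>hive.exec.local.scratchdir</name>
<!-- assumption: /tmp/${user.name}; adjust to your environment -->
<value>/tmp/${user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.querylog.location</name>
<!-- assumption: /tmp/${user.name}; adjust to your environment -->
<value>/tmp/${user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>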

Related

File is not loaded into HDFS from local using StreamSets (validated successfully!)

I have just started using StreamSets, and I'm trying to load a text file from local into HDFS.
Please note: I'm using Cloudera Manager. Here is a view of "core-site.xml":
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.sdc.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.sdc.groups</name>
<value>*</value>
</property>
</configuration>
The local file is a text file stored in "/home/cloudera/Desktop".
Here is a view of the source (Local) configuration in StreamSets:
Here is a view of the Hadoop FS configuration in StreamSets:
It was validated successfully!
After I played the pipeline, I expected to find the file in the HDFS directory I specified, namely "/user/cloudera".
But when I run it, the file is not loaded.
I'm sure I missed something, and I couldn't find an answer for this.
Could you please help?
Thanks,
You need to play the pipeline, not just validate it.
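Once the pipeline has actually been run, you can confirm the file arrived by listing the destination directory on HDFS. A quick check, assuming the /user/cloudera target mentioned above:
hdfs dfs -ls /user/cloudera    # the ingested file should appear here after the pipeline runs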

hadoop fs -ls outputs the current working directory's files rather than the HDFS volume's files

I have set up a single pseudo-distributed node (localhost) with Hadoop 2.8.2 on openSUSE Tumbleweed 20170703. The Java version is 1.8.0_151. Generally, it seems to be set up correctly; I can format the namenode with no errors, etc.
However, when I try hadoop fs -ls, files/dirs from the current working directory are returned rather than the expected behaviour of returning the HDFS volume's files (which should be nothing at the moment).
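To make the symptom concrete, this is roughly what happens (sketch):
hadoop fs -ls    # actually lists the files/dirs of my shell's current working directory
                 # expected: the contents of the HDFS volume, which should be empty right now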
I was originally following this guide for CentOS (making changes as required) and the Apache Hadoop guide.
I'm assuming it's a config issue, but I can't see why it would be. I've played around with core-site.xml and hdfs-site.xml as per below, with no luck.
/opt/hadoop-hdfs-volume/ exists and is assigned to user hadoop in user group hadoop. As is the /opt/hadoop/ directory (for bin stuff).
EDIT:
/tmp/hadoop-hadoop/dfs/name is where the hdfs namenode -format command ends up writing. /tmp/ also seems to hold my own user's directory (/tmp/hadoop-dijksterhuis) and the hadoop user's directory.
This seems odd to me considering the *-site.xml config files below.
I have tried restarting the dfs and yarn services with the .sh scripts in the hadoop/sbin/ directory, and have also rebooted. No luck!
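For reference, the restart attempts looked roughly like this (a sketch, assuming the standard Hadoop 2.x scripts under /opt/hadoop/sbin):
cd /opt/hadoop/sbin
./stop-yarn.sh && ./stop-dfs.sh    # stop the YARN and HDFS daemons
./start-dfs.sh && ./start-yarn.sh  # start them again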
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-hdfs-volume/${user.name}</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Anyone have any ideas? I can provide more details if needed.
Managed to hack a fix via another SO answer:
Add export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop to the hadoop user's .bashrc.
This overrides the value in etc/hadoop/hadoop-env.sh, which kept pointing the namenode at the default /tmp/hadoop-${user.name} directory.
source .bashrc, et voilà! Problem fixed.
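Concretely, the change amounts to something like this (a sketch; the home directory and install path are assumptions, use whatever your setup has):
# appended to the hadoop user's ~/.bashrc (install assumed to live under /opt/hadoop, as above)
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
# then reload the shell environment
source ~/.bashrc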

How to set up Hadoop without changing `/etc/hosts`?

In order to test network performance in our cluster, I have to deploy Hadoop on the nodes. Every setup guide I can find has a step that changes the /etc/hosts file. The problem is that the network I'm testing is not the one normally used, so if I edit this file directly, it may cause existing programs to fail.
I've tried using IP addresses instead of host names in the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml). For example, in core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://10.1.0.50:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadooptmp</value>
</property>
</configuration>
But this does not work without changing /etc/hosts.
Is there any way to specify a hosts file just for Hadoop?

Issues saving bulk data in HBase in pseudo-distributed mode

I am setting up CDH4 in pseudo-distributed mode.
I have set up Hadoop and, as suggested in the CDH4 installation guide, have also completed the HDFS demo successfully.
I have also set up Hive and HBase.
To populate data in HBase, I have written a Java client that bulk-loads the data (around 1M rows in each of 4 tables).
Now I am facing two issues:
While the Java client is loading the dummy data into HBase, the region server shuts down after roughly 450,000 rows have been inserted in total.
Using Hive, I am not able to access the tables created in HBase, or worse, I cannot even create tables from the Hive shell, although the HBase shell does show me the data/table structure (whatever was generated before the region server shut down).
I have seen other posts about the same problems. It seems the second issue is related to my /etc/hosts or hive-site.xml, so I am pasting the contents of both.
/etc/hosts
198.251.79.225 u17162752.onlinehome-server.com u17162752
198.251.79.225 default-domain.com
198.251.79.225 hbase.zookeeper.quorum localhost
198.251.79.225 cloudera-vm # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
127.0.1.1 cloudera-vm-local localhost
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mypassword</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>
</configuration>
These issues are keeping me from accomplishing the task I am supposed to do.
Thanks in advance,
Abhiskek
PS: This is my first post to this forum, so apologies for anything inappropriate you might have found! Thanks for bearing with me.
Hi Tariq, thanks for your response. I have somehow managed to get past this. Now I am facing another issue.
I already have 4 tables in HBase for which I want to create external tables in the Hive shell, but running the CREATE EXTERNAL TABLE commands in the Hive shell gives the following error:
'ERROR: org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in -ROOT- for region .META.,,1.1028785192 containing row'
This error also appears when I do anything in the HBase shell.
The other error that accompanies it in the HBase shell is related to ZooKeeper. Stack trace:
'WARN zookeeper.ZKUtil: catalogtracker-on- org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation#6a9a56bf- 0x1413718482c0010 Unable to get data of znode /hbase/unassigned/1028785192
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/1028785192'
Please help. Thanks!

Error starting the Hadoop JobTracker

I tried to run a simple program on Hadoop using Windows/Cygwin.
I am able to start the namenode.
Starting the JobTracker, however, fails with this exception:
FATAL mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2560)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2200)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)
I tried all possible methods to resolve this, but in vain. Any pointers would greatly help me.
hdfs-site.xml configuration:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The problem is that the following lines should go into mapred-site.xml, NOT hdfs-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
By the way, why are you trying to run Hadoop on Windows? For development? Do you not have a Linux machine, or are you reluctant to install one?
One more thing: you usually put this property in core-site.xml, not hdfs-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
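Putting it together, a minimal sketch of how the properties you posted would be split across the three files (only these properties are shown; anything else in your files stays where it is):
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>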
I faced the same issue when working through the "Pseudo-Distributed" examples on this page: http://hadoop.apache.org/docs/r1.1.2/single_node_setup.html#PseudoDistributed
It turned out that Hadoop simply wasn't picking up my conf files. The examples at the link above assume you are running from your Hadoop install directory (i.e. /Usr/jane/hadoop-1.1.2). I was trying to run the examples from another directory. I'm sure you could configure Hadoop to recognize other 'conf' directories, but I took the easy route and just started running from my Hadoop directory.
This thread helped me figure it out: https://issues.apache.org/jira/browse/HDFS-2515
