Netezza utility NZLOAD to point -df location to the hdfs location - hadoop

Currently we are copying the files from HDFS to the local filesystem and using the NZLOAD utility to load the data into Netezza, but I wanted to know whether it is possible to provide the HDFS location of the files directly, as below:
nzload -host ${NZ_HOST} -u ${NZ_USER} -pw ${NZ_PASS} -db ${NZ_DB} -t ${TAR_TABLE} -df "hdfs://${HDFS_Location}"

Since HDFS is a different file system, nzload will not recognise the file if you provide an HDFS path in the -df option of Netezza nzload.
You can pipe hdfs dfs -cat into nzload to load a Netezza table from an HDFS directory:
$ hdfs dfs -cat /data/stud_dtls/stud_detls.csv | nzload -host 192.168.1.100 -u admin -pw password -db training -t stud_dtls -delim ','
Load session of table 'STUD_DTLS' completed successfully
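The same piping approach should also work with the parametrised command from the question; a minimal sketch, assuming the same environment variables and a comma-delimited file (drop -df, since the data now arrives on stdin):
hdfs dfs -cat "hdfs://${HDFS_Location}" | nzload -host ${NZ_HOST} -u ${NZ_USER} -pw ${NZ_PASS} -db ${NZ_DB} -t ${TAR_TABLE} -delim ','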
Load HDFS file into Netezza Table Using nzload and External Tables

Related

How to create a new user in Hadoop

I am new to Hadoop. I have done an Apache Hadoop multi-node installation and the user name is hadoop.
I am using 3 nodes in total: 1 namenode and 2 datanodes.
I have to create a new user for data isolation. I have found a few links on Google, but those are not working and I am unable to access HDFS.
[user1@datanode1 ~]# hdfs dfs -ls -R /
bash: hdfs: command not found...
Can someone help me with the steps to create a new user which can access HDFS for data isolation? And on which node should I create the new user?
Thanks
Hadoop doesn't have users like Linux does. Users are generally managed by external LDAP/Kerberos systems. By default there are no security features at all: user names are taken from the HADOOP_USER_NAME environment variable and can be overridden with an export command. Also, by default, the user is the current OS username, so your command [user1@datanode1 ~]# hdfs dfs -ls would actually run hdfs dfs -ls /user/user1 and return an error if that folder doesn't exist yet.
However, your actual error says that your OS PATH variable does not include $HADOOP_HOME/bin. Edit your .bashrc to fix this.
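For example, a minimal .bashrc addition, assuming Hadoop is installed under /opt/hadoop (adjust to your actual install path):
# Hypothetical install location; point HADOOP_HOME at your real installation
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin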
You'd create an HDFS home folder for a user named "username" with:
hdfs dfs -mkdir /user/username
hdfs dfs -chown username /user/username
hdfs dfs -chmod -R 770 /user/username
You should also run the useradd command on the namenode machine to make sure it knows about a user named "username".
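A short sketch of that step together with the HADOOP_USER_NAME override mentioned above ("username" is a placeholder):
# On the namenode, as root: create the OS-level user
useradd username
# Without Kerberos, HDFS trusts this environment variable for identity
export HADOOP_USER_NAME=username
hdfs dfs -ls /user/username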

Cannot see the file I have uploaded to Hadoop

pscp tells me it was successful:
pscp -P 22 part-00003 username@172.31.143.131:/home/username/lab2_hdfs
username@172.31.143.131's password:
part-00000 | 758 kB | 758.9 kB/s | ETA: 00:00:00 | 100%
But I don't see it in Hadoop when I use hdfs dfs -ls. Why?
HDFS is not the same as your local filesystem. You can't upload files to HDFS using SCP.
According to your command, you have just transferred your local file to a remote host, into a remote directory (/home/username/lab2_hdfs). At that stage HDFS wasn't involved at all and therefore doesn't know about the new file.
You may have a look at articles like Hadoop: copy a local file to HDFS and use commands like:
hadoop fs -put part-00003 /hdfs/path
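Putting it together, a sketch of the full two-step flow, assuming lab2_hdfs is a directory on the remote host and /user/username is the intended HDFS target (both are assumptions):
# 1) Copy the file to the remote host's local filesystem (what pscp already did)
pscp -P 22 part-00003 username@172.31.143.131:/home/username/lab2_hdfs
# 2) Log in to the remote host and push the local copy into HDFS
ssh username@172.31.143.131
hdfs dfs -put /home/username/lab2_hdfs/part-00003 /user/username/
hdfs dfs -ls /user/username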

How to copy a file from a remote server to HDFS

I have a remote server and an authenticated Hadoop environment.
I want to copy a file from the remote server to the Hadoop machine's HDFS.
Please advise an efficient approach/HDFS command to copy files from the remote server to HDFS.
Any example will be helpful.
The ordinary way to copy a file from the remote server to the server itself is
scp -rp file remote_server:/tmp
but this approach does not support copying directly to HDFS.
You can try this:
ssh remote-server "hadoop fs -put - /tmp/file" < file
Here, by remote server I take it you mean a machine that is not in the same network as the Hadoop nodes. If that is the case, you can scp from the remote machine to a Hadoop node's local filesystem and then use the -put or -copyFromLocal command to move the file to HDFS.
Example: hadoop fs -put file-name hdfs://namenode-uri/path-to-hdfs
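A sketch of that two-step approach, with hypothetical hostnames and paths:
# 1) From the remote server, copy the file to a Hadoop node's local disk
scp -rp /data/file-name hadoop-node:/tmp/
# 2) On the Hadoop node, load the local copy into HDFS
ssh hadoop-node "hadoop fs -put /tmp/file-name hdfs://namenode-uri/path-to-hdfs/"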

beeline can't connect when passing an init script from HDFS

Is it possible to initialize the connection with Hive by passing an init script from HDFS?
I tried beeline -u <<url>> -i wasb://<path-to-init-script>; the connection is established but the script initialization fails with 'no such file or directory', even though the file exists and is listed both with hdfs dfs -ls <path-to-file> and with hdfs dfs -ls wasb://<path-to-file>. The same happens if I try beeline -u <<url>> -f wasb://<path-to-file>.
I am running inside an HDInsight cluster (HDI 3.5), connected over SSH, and the connection to Hive runs in HTTP mode. The Hadoop version is 2.7.
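One workaround to try, assuming beeline's -i and -f options only read local files: pull the script down from WASB/HDFS first and pass the local path (paths below are hypothetical):
# Fetch the init script to local disk, then hand beeline the local copy
hdfs dfs -get wasb://<path-to-init-script> /tmp/init.hql
beeline -u <<url>> -i /tmp/init.hql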

How to determine Hive database size?

How do I determine a Hive database's size from Bash or from the Hive CLI?
hdfs and hadoop commands are also available in Bash.
A database in Hive is metadata storage, meaning it holds information about tables and has a default location. Tables in a database can also be stored anywhere in HDFS if a location is specified when creating the table.
You can see all tables in a database using the show tables command in the Hive CLI.
Then, for each table, you can find its location in HDFS using describe formatted <table name> (again in the Hive CLI).
Last, for each table you can find its size using hdfs dfs -du -s -h /table/location/
I don't think there's a single command to measure the sum of the sizes of all tables in a database. However, it should be fairly easy to write a script that automates the above steps. Hive can also be invoked from the Bash CLI using hive -e '<hive command>'.
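A rough sketch of such a script, assuming a hypothetical database name and that describe formatted prints a Location: line; treat it as a starting point rather than a finished tool:
#!/usr/bin/env bash
# Sum the HDFS sizes of all tables in one Hive database (DB name is a placeholder)
DB=mydb
total=0
for t in $(hive -e "use ${DB}; show tables;" 2>/dev/null); do
  # Pull the table's HDFS location out of 'describe formatted'
  loc=$(hive -e "describe formatted ${DB}.${t};" 2>/dev/null | awk -F'\t' '/Location:/ {gsub(/ /,"",$2); print $2}')
  # First column of 'hdfs dfs -du -s' is the size in bytes
  size=$(hdfs dfs -du -s "${loc}" | awk '{print $1}')
  echo "${t} ${size}"
  total=$((total + size))
done
echo "Total size of ${DB}: ${total} bytes"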
Show Hive databases on HDFS:
sudo hadoop fs -ls /apps/hive/warehouse
Show Hive database size:
sudo hadoop fs -du -s -h /apps/hive/warehouse/{db_name}
If you want the size of your complete database, run this on your warehouse directory:
hdfs dfs -du -h /apps/hive/warehouse
This gives you the size of each DB in your warehouse.
If you want the size of the tables in a specific DB, run:
hdfs dfs -du -h /apps/hive/warehouse/<db_name>
Run a "grep warehouse" on hive-site.xml to find your warehouse path.
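For example, on many distributions the client config lives under /etc/hive/conf (an assumption; adjust to your cluster):
# The warehouse path is stored in the hive.metastore.warehouse.dir property
grep -A1 'hive.metastore.warehouse.dir' /etc/hive/conf/hive-site.xml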
