Ambari - Import multiple files to Hive - hadoop

I have a Python script that generates schemas, drop table, and load table commands for the files in a directory that I want to import into Hive. I can then run these in Ambari to import the files. Multiple 'create table' commands can be executed, but when uploading files to import into their respective Hive tables, I can only upload one file at a time.
Is there a way to perhaps put these commands in a file and execute them all at once so that all tables are created and the relevant files are subsequently uploaded to their respective tables?
I have also tried importing the files into HDFS, with the aim of then sending them to Hive from Linux using commands such as 'hdfs dfs -copyFromLocal /home/ixroot/Documents/ImportToHDFS /hadoop/hdfs', but errors such as 'no such directory' crop up regarding '/hadoop/hdfs'. I have tried changing permissions using chmod, but that doesn't seem to be effective either.
I would be very grateful if anyone could tell me which route is better to pursue for efficiently importing multiple files into their respective tables in Hive.

1) Is there a way to perhaps put these commands in a file and execute them all at once so that all tables are created and the relevant files are subsequently uploaded to their respective tables?
You can put all the queries in a .hql file, something like test.hql, and run hive -f test.hql to execute all the commands in one shot.
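For example, test.hql might look like the sketch below; the table definitions, delimiter, and file names are only illustrative, not taken from the original script:
-- test.hql: create each table and load its file (illustrative names and schemas)
CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/home/ixroot/Documents/ImportToHDFS/sales.csv' INTO TABLE sales;
CREATE TABLE IF NOT EXISTS customers (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/home/ixroot/Documents/ImportToHDFS/customers.csv' INTO TABLE customers;
Running hive -f test.hql executes the statements top to bottom, so every table is created and its file loaded in a single pass.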
2) errors such as 'no such directory'
Run hadoop fs -mkdir -p /hadoop/hdfs and then hadoop fs -copyFromLocal /home/ixroot/Documents/ImportToHDFS /hadoop/hdfs
Edit: for permissions:
hadoop fs -chmod -R 777 /user/ixroot
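Putting those steps together, a minimal sketch for copying every file in the question's import directory in one go (the wildcard copy is just one way to do it):
# create the target directory and open up permissions (as above)
hadoop fs -mkdir -p /hadoop/hdfs
hadoop fs -chmod -R 777 /user/ixroot
# copy all files from the local import directory into HDFS in one command
hadoop fs -copyFromLocal /home/ixroot/Documents/ImportToHDFS/* /hadoop/hdfs/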

Related

How to stop hive from moving data when hive loads files from HDFS into tables?

Hive version is 3.1.0 and the SQL is LOAD DATA INPATH 'filepath' OVERWRITE INTO TABLE tablename. filepath can refer to a file (in which case Hive will move the file into the table) or it can be a directory (in which case Hive will move all the files within that directory into the table). I hope Hive only copies the files rather than moving them to the Hive warehouse directory, because the files are also used elsewhere. What should I do?
The LOAD DATA command moves files. If you want to copy instead, use one of the following commands:
Use copyFromLocal command:
hdfs dfs -copyFromLocal <localsrc> URI
or put command:
hdfs dfs -put <localsrc> ... <dst>
Alternatively, if your files are already in HDFS, you can create the table/partition on top of that directory by specifying its location, without copying the files at all. ALTER TABLE ... SET LOCATION will also work.
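For example, a rough sketch of both options, assuming the files already sit in an HDFS directory such as /data/events (the table name, schema, and path are placeholders):
-- create an external table directly on top of the existing directory
CREATE EXTERNAL TABLE events (id INT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/events';
-- or repoint an existing table (or partition) at that directory
ALTER TABLE events SET LOCATION 'hdfs:///data/events';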

How does the Sqoop append command work in Hadoop

I have a question about the Sqoop --append command. As we know, append adds values to an existing table or record, but in Hadoop/HDFS updating files in place is prohibited, so how does it work?
From the documentation,
By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory’s contents. If you use the --append argument, Sqoop will import data to a temporary directory and then rename the files into the normal target directory in a manner that does not conflict with existing filenames in that directory.
Hadoop also has a provision to append to an existing file using the -appendToFile command; there the data is appended to the existing data, whereas with Sqoop --append the new data arrives under different file names.
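A rough sketch of what this looks like in practice; the connection string, credentials, table, and target directory below are placeholders:
# each run adds new part files next to the existing ones instead of rewriting them
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --target-dir /user/ixroot/orders \
  --append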

Having a file in the Hive warehouse

I have a file sample.txt and I want to place it in the Hive warehouse directory (not under a database directory like xyz.db, but directly as an immediate child of the warehouse directory). Is it possible?
To answer your question: since /user/hive/warehouse is just another folder on HDFS, you can move any file to that location without actually creating a table.
From the Hadoop Shell, you can achieve it by doing:
hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
From the Hive Prompt, you can do that by giving this command:
!hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
Here the first path is the source location of your file and the second is the destination, i.e. the Hive warehouse, where you wish to move your file.
But such a situation does not generally occur in a real scenario.

How to view the hadoop data directory structure?

I have a partitioned table in Hive, so I want to see the directory structure in HDFS.
From the documentation, I found the following command:
hadoop fs -ls /app/hadoop/tmp/dfs/data/
and /app/hadoop/tmp/dfs/data/ is my data path. But this command returns
ls: Cannot access /app/hadoop/tmp/dfs/data/: No such file or
directory.
Am I missing something there?
Unless I'm mistaken, it seems you are looking for a temporary directory that you probably defined in the property hadoop.tmp.dir. This is a local directory, but when you do hadoop fs -ls you are looking at what files are available in HDFS, so you won't see anything.
Since you're looking for the Hive directories, you are looking for the following property in your hive-site.xml:
hive.metastore.warehouse.dir
The default is /user/hive/warehouse, so if you haven't changed this property you should be able to do:
hadoop fs -ls /user/hive/warehouse
And this should show you your table directories.
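If you are not sure which warehouse sub-directory belongs to a particular partitioned table, a small sketch (mytable is just a placeholder):
# ask Hive where the table actually lives
hive -e "DESCRIBE FORMATTED mytable;" | grep -i location
# list it recursively to see the partition sub-directories (e.g. dt=2020-01-01/)
hadoop fs -ls -R /user/hive/warehouse/mytable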
Also check whether the temporary directory is correctly set in your core-site.xml and hdfs-site.xml files.
If it is not set, the operating system's temporary directory (/tmp on Ubuntu, %temp% on Windows) will be used as the Hadoop tmp folder, and you may lose your data after restarting your computer. Set hadoop.tmp.dir in the XML configuration and restart your cluster; it should work fine then.
If it is still not resolved after this, please give more details about the partitioned table's creation code and the table data too.
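For reference, a sketch of how that property is typically set in core-site.xml; the path shown is only an example:
<property>
  <name>hadoop.tmp.dir</name>
  <!-- example value; point this at a persistent location, not the OS temp directory -->
  <value>/app/hadoop/tmp</value>
</property>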

Using multiple local folders as source in a Hadoop MapReduce job

I have data in multiple local folders, i.e. /usr/bigboss/data1, /usr/bigboss/data2, and many more. I want to use all of these folders as the input source for my MapReduce job and store the result in HDFS. I cannot find a working command to do this with the Hadoop grep example.
The data will need to reside in HDFS for you to process it with the grep example. You can upload the folders to HDFS using the -put FsShell command:
hadoop fs -mkdir bigboss
hadoop fs -put /usr/bigboss/data* bigboss
This will create a folder in the current user's HDFS home directory and upload each of the data directories into it.
Now you should be able to run the grep example over the data.
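A sketch of that final step, assuming the stock examples jar shipped with Hadoop; the jar path, output directory, and regex are illustrative:
# usage of the grep example: <input dir> <output dir> <regex>
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  grep bigboss bigboss-output 'dfs[a-z.]+'
# inspect the result
hadoop fs -cat bigboss-output/*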
