I have a question about the Sqoop --append command. As we know, --append adds values to an existing table or record, but in Hadoop/HDFS updating a file in place is not allowed, so how does it work?
From the documentation,
By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory’s contents. If you use the --append argument, Sqoop will import data to a temporary directory and then rename the files into the normal target directory in a manner that does not conflict with existing filenames in that directory.
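For illustration, a minimal Sqoop invocation with --append might look like the sketch below; the JDBC URL, credentials, table name, and target directory are made-up placeholders, not from the original question.

# Hypothetical example: each run writes new part files into /user/data/orders
# alongside the ones from earlier runs, instead of modifying existing files.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/data/orders \
  --append \
  -m 1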
Hadoop also provides a way to append to a file with the -appendToFile command, which adds the data to the existing file; with Sqoop's --append, by contrast, the new data arrives in additional files whose names differ from the existing ones.
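For comparison, the HDFS append command looks like this (the file names below are only an example):

# Appends the local file's contents to an existing HDFS file in place.
hdfs dfs -appendToFile /home/user/new_records.txt /user/data/orders/records.txt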
Hive version is 3.1.0 and the SQL is LOAD DATA INPATH 'filepath' OVERWRITE INTO TABLE tablename. filepath can refer to a file (in which case Hive will move the file into the table) or to a directory (in which case Hive will move all the files within that directory into the table). I was hoping Hive would only copy the files rather than move them into the Hive warehouse directory, because the files are also used elsewhere. What should I do?
The LOAD DATA command moves files. If you want to copy instead, use one of the following commands:
Use the copyFromLocal command:
hdfs dfs -copyFromLocal <localsrc> URI
or the put command:
hdfs dfs -put <localsrc> ... <dst>
If your files are already in HDFS, you can alternatively create a table/partition on top of that directory by specifying its location, without copying the files at all. ALTER TABLE with SET LOCATION will also work.
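As a sketch of that last option, assuming the files sit under a shared directory such as /data/shared/ratings and a table name chosen for the example (both are assumptions, not from the question):

# Option A: create an external table directly on top of the existing directory,
# so nothing is moved or copied out of it.
hive -e "CREATE EXTERNAL TABLE ratings_ext (user_id INT, movie_id INT, rating INT)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/data/shared/ratings';"

# Option B: repoint an existing table at that directory.
hive -e "ALTER TABLE ratings_ext SET LOCATION 'hdfs:///data/shared/ratings';"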
I have a table in HDFS with the current path of /apps/hive/warehouse/ratings. I tried to download this to my local file system with the copyToLocal function in Hadoop.
The call worked and showed no errors, but when I check the result, the downloaded table is just a folder containing a file of an unfamiliar type.
Do you know what is the proper function call to download the table from HDFS as a CSV file?
This is the command that I am using at the moment
hadoop fs -copyToLocal /apps/hive/warehouse/ratings /home/maria_dev
This was to check what type of file I had.
You can try
hadoop fs -get /apps/hive/warehouse/ratings /home/maria_dev
Once your file is in your local file system, you can rename the file to whatever you want and add your preferred file extension.
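If the goal is a single CSV rather than the raw warehouse files, one option (a sketch, assuming the table is called ratings, which is only inferred from the warehouse path) is to have Hive write a comma-delimited copy locally first:

# Writes the table out as comma-delimited text under /home/maria_dev/ratings_csv;
# the resulting file(s) can then be renamed with a .csv extension.
hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/home/maria_dev/ratings_csv'
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         SELECT * FROM ratings;"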
I have a python script that generates schemas, drop table and load table commands for files in a directory that I want to import into Hive. I can then run these in Ambari to import files. Multiple 'create table' commands can be executed, but when uploading files to import into their respective Hive tables, I can only upload one file at a time.
Is there a way to perhaps put these commands in a file and execute them all at once so that all tables are created and the relevant files are subsequently uploaded to their respective tables?
I have also tried importing files to HDFS, with the aim of then sending them to Hive from Linux, using commands such as 'hdfs dfs -copyFromLocal /home/ixroot/Documents/ImportToHDFS /hadoop/hdfs', but errors such as 'no such directory' crop up regarding 'hadoop/hdfs'. I have tried changing permissions using chmod, but these don't seem to be effective either.
I would be very grateful if anyone could tell me which route would be better to pursue for efficiently importing multiple files into their respective Hive tables.
1) Is there a way to perhaps put these commands in a file and execute them all at once so that all tables are created and the relevant files are subsequently uploaded to their respective tables?
You can put all the queries in a .hql file, something like test.hql, and run hive -f test.hql to execute all the commands in one shot (a sketch follows this answer).
2) errors such as 'no such directory'
Run hadoop fs -mkdir -p /hadoop/hdfs first, and then hadoop fs -copyFromLocal /home/ixroot/Documents/ImportToHDFS /hadoop/hdfs
Edit: for permissions:
hadoop fs -chmod -R 777 /user/ixroot
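A minimal sketch of the .hql approach; the table definition, column names, and file path below are hypothetical, just to show the shape of the file:

# Write all DROP/CREATE/LOAD statements into one file...
cat > test.hql <<'EOF'
DROP TABLE IF EXISTS ratings;
CREATE TABLE ratings (user_id INT, movie_id INT, rating INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/hadoop/hdfs/ratings.csv' INTO TABLE ratings;
EOF
# ...then execute the whole file in one shot:
hive -f test.hql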
I have a file sample.txt and i want to place it in hive warehouse directory (Not under the database xyz.db but directly into immediate subdirectory of warehouse). Is it possible?
To answer your question: since /user/hive/warehouse is just another folder on HDFS, you can move any file to that location without actually creating a table over it.
From the Hadoop Shell, you can achieve it by doing:
hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
From the Hive Prompt, you can do that by giving this command:
!hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
Here the first path is the source location of your file and the second is the destination, i.e. the Hive warehouse directory where you wish to move your file.
But such a situation does not generally occur in a real scenario.
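If you want to confirm the move, a quick listing of the warehouse directory should show the file:

hadoop fs -ls /user/hive/warehouse/sample.txt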
I have created a folder to hold the result files from a Pig process using the STORE command. It works the first time, but the second time it complains that the folder already exists. What is the best practice for this situation? Documentation is sparse on this topic.
My next step will be to rename the folder to the original file name, to reduce the impact of this. Any thoughts?
You can execute fs commands from within Pig, and should be able to delete the directory by issuing an fs -rmr command before running the STORE command:
fs -rmr dir
STORE A INTO 'dir' USING PigStorage();
The only subtlety is that the fs command doesn't expect quotes around the directory name, whereas the STORE command does.
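Alternatively, the cleanup can be done outside the script before invoking Pig; a shell sketch, where the output directory and script name are placeholders:

# Remove the previous output directory; -f keeps the command from failing
# when the directory does not exist yet, then run the Pig script.
hadoop fs -rm -r -f /user/maria_dev/pig_output
pig myscript.pig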