Failed to make Hive table on desired path and insert the values - hadoop

I want to make a table in Hive containing only one column and two values: 'Y' and 'N'.
I already tried this:
create external table if not exists tx_test_table (FLAG string)
row format delimited fields terminated by ','
stored as textfile location "/user/hdd/data/";
My question is: why does it behave as if it were at the default location?
How do I make it use the path I desire?
When I query the table I just made (using select * from), it fails to show the data:
Bad status for request TFetchResultsReq(fetchType=0,
operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None,
operationType=0,
operationId=THandleIdentifier(secret='pE\xff\xfdu\xf6B\xd4\xb3\xb7\x1c\xdd\x16\x95\xb85',
guid="\n\x05\x16\xe7'\xe4G \xb6R\xe06\x0b\xb9\x04\x87")),
orientation=4, maxRows=100):
TFetchResultsResp(status=TStatus(errorCode=0,
errorMessage='java.io.IOException: java.io.IOException: Not a file:
hdfs://nameservice1/user/hdd/data/AC22', sqlState=None,
infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException:
java.io.IOException: Not a file: hdfs://nameservice1/user/hdd/data/AC22:14:13',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:496',
'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:297',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:869', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:507',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:708',
'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717',
'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:605',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748',
'*java.io.IOException:java.io.IOException: Not a file: hdfs://nameservice1/user/hdd/data/AC22:18:4',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:521'
, 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:428',
'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:146',
'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2227',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:491',
'*java.io.IOException:Not a file: hdfs://nameservice1/user/hdd/data/AC22:21:3',
'org.apache.hadoop.mapred.FileInputFormat:getSplits:FileInputFormat.java:329',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:372',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:304',
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:459'], statusCode=3),
results=None, hasMoreRows=None)

Each table in HDFS has its own location, and the location you specified for your table appears to be a common location in which other tables' folders reside.
According to the exception (java.io.IOException: Not a file: hdfs://nameservice1/user/hdd/data/AC22), at least one folder (not a file) was found in the /user/hdd/data/ location. I guess it belongs to some other table.
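You can confirm this by listing the location; a hypothetical check (the AC22 directory should show up alongside any files):
hdfs dfs -ls /user/hdd/data/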
You should specify a table location that will contain only the files belonging to this table, not the common data warehouse location in which other tables' directories sit.
Usually the table location is named after the table: /user/hdd/data/tx_test_table
Fixed CREATE TABLE statement:
create external table if not exists tx_test_table (FLAG string)
row format delimited fields terminated by ','
stored as textfile location "/user/hdd/data/tx_test_table";
Now the table has its own location containing only its files, not mixed with other tables' folders or files.
You can put files into the /user/hdd/data/tx_test_table location, or load data into the table using INSERT; the files will be created in that location.
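For the two values from the question, a minimal sketch (assuming Hive 0.14 or later, which supports INSERT ... VALUES):
INSERT INTO TABLE tx_test_table VALUES ('Y'), ('N');
-- or place a delimited file there yourself; flags.csv is a hypothetical file name:
-- hdfs dfs -put flags.csv /user/hdd/data/tx_test_table/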

Related

How to store multiple files under the same directory in hive?

I'm using Hive to process my CSV files. I've stored the CSV files in HDFS and want to create tables from those files.
I use the following command:
create external table if not exists csv_table (dummy STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/hive'
TBLPROPERTIES ("skip.header.line.count"="1");
LOAD DATA INPATH '/CsvData/csv_table.csv' OVERWRITE INTO TABLE csv_table;
So the file under /CsvData will be moved into /user/hive. That makes sense.
But what if I want to create another table?
create external table if not exists csv_table2 (dummy STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/hive'
TBLPROPERTIES ("skip.header.line.count"="1");
LOAD DATA INPATH '/CsvData/csv_table2.csv' OVERWRITE INTO TABLE csv_table2;
It will raise an exception complaining that the directory is not empty.
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Directory hdfs://localhost:9000/user/hive could not be cleaned up.
So it is hard for me to understand: does it mean I can store only one file under one directory? To store multiple files do I have to create one directory for every file?
Is it possible to store all the files together?
The CREATE TABLE statement will NOT raise an exception complaining that the directory is not empty, because creating a table on top of an existing directory is a perfectly normal scenario. The error in your case comes from LOAD DATA ... OVERWRITE, which first tries to clean up the target directory.
You can store as many files in the directory as necessary, and all of them will be accessible to the table built on top of the folder.
A table location is a directory, not a file. If you need to create a new table and keep its files separate from other tables' files, create a separate folder, as sketched below.
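A sketch of the per-table layout, using hypothetical sub-directories under the question's path:
create external table if not exists csv_table (dummy STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/hive/csv_table'
TBLPROPERTIES ("skip.header.line.count"="1");
-- and likewise LOCATION 'hdfs://localhost:9000/user/hive/csv_table2' for the second table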
Read also this answer for clear understanding: https://stackoverflow.com/a/54038932/2700344

No rows selected when trying to load csv file in hdfs to a hive table

I have a csv file called test.csv in HDFS. The file was placed there through FileZilla. I am able to view the path as well as the contents of the file when I log in to the edge node through PuTTY, using the same account credentials that I used to place the file into HDFS. I then connect to Hive and try to create an external table specifying the location of my csv file in HDFS, using the statement below:
CREATE EXTERNAL TABLE(col1 string, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC LOCATION '/file path'
When I execute this command it creates an external table in Hive, but the table is empty, with only the columns from the create statement showing up. My question is: am I specifying the correct path in the LOCATION parameter of the create statement above? I tried using the path I see in FileZilla when I placed my csv file into HDFS, which is in the format home/servername/username/directory/subdirectory/file,
but this returns an error saying the user whose username is specified in the path above does not have ALL privileges on the file path.
NOTE: I checked the permissions on the file and the directory in which it resides, and the user has all permissions (read, write and execute).
I then tried changing the path into the format user/username/directory/subdirectory/file, and with this I was able to create the external table; however, the table is empty and does not load the data in the csv file on which it was created.
I also tried the alternative method of creating an internal table as below and then using the LOAD DATA INPATH command. But this also failed, with an error saying "there are no files existing at the specified path".
CREATE TABLE foobar(key string, stats map<string, bigint>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':' ;
LOAD DATA INPATH '/tmp/foobar.csv' INTO TABLE foobar;
First, you can't load a csv file directly into a Hive table that was created with the ORC file format. ORC is a columnar file format that stores data in an optimized, compressed way. You can load your data into an ORC-format table by following the steps below.
Create a temp table with the text file format.
Load data into it using the command:
hive> load data inpath .....
or you can use the LOCATION parameter while creating the table itself.
Now create a Hive table with your required file format (RC, ORC, Parquet, etc.).
Now load data into it using the following command:
hive> insert overwrite table foobar select * from temptbl;
You will get the table in ORC file format.
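Putting those steps together, a minimal sketch (temptbl, foobar_orc, the columns and the csv path are all illustrative):
CREATE TABLE temptbl (name string, age int, gender string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
LOAD DATA INPATH '/tmp/mydata.csv' INTO TABLE temptbl;
CREATE TABLE foobar_orc (name string, age int, gender string) STORED AS ORC;
INSERT OVERWRITE TABLE foobar_orc SELECT * FROM temptbl;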
On your second issue: if you load data into the table using the LOAD DATA command, the file is moved from its original path; a new directory is created in the default location (/user/hive/warehouse/) with the table name, and the data is moved into it. So check in that location and you will see the data.

Query using hive CLI data is visible but if query using HUE no data found for multiple directory hdfs location [duplicate]

CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/LOGS/test';
Under the 'test' folder I am writing files daily, e.g.:
/user/hive/warehouse/LOGS/test/20170420
/user/hive/warehouse/LOGS/test/20170421
/user/hive/warehouse/LOGS/test/20170422
I cannot see any data inside the LOGS table that I have created.
But if I create the table using
LOCATION '/user/hive/warehouse/LOGS/test/20170422';
I can see that day's records.
I want to see all the data under the /test directory in my Hive table; the /test directory is also populated daily with new files.
Option 1
In order to support sub-directories
set mapred.input.dir.recursive=true;
and if your Hive version is lower than 2.0.0, then also
set hive.mapred.supports.subdirectories=false;
Option 2
Create a partitioned table
CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)
partitioned by (dt date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/LOGS/test';
alter table LOGS add if not exists partition (dt=date '2017-04-20') LOCATION '/user/hive/warehouse/LOGS/test/20170420';
alter table LOGS add if not exists partition (dt=date '2017-04-21') LOCATION '/user/hive/warehouse/LOGS/test/20170421';
alter table LOGS add if not exists partition (dt=date '2017-04-22') LOCATION '/user/hive/warehouse/LOGS/test/20170422';
It would be easier to manage if you kept your directories using the standard convention, e.g. dt=2017-04-20 instead of 20170420.
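With that convention you no longer need to add partitions one by one; a sketch, assuming the directories are renamed to /user/hive/warehouse/LOGS/test/dt=2017-04-20 and so on:
MSCK REPAIR TABLE LOGS;
-- scans the table location and registers any dt=... directories as partitions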
By default, Hive reads only the files (not the directories) inside the location specified for an external table. If you want it to pick up sub-directories as well, set the parameter below:
set mapred.input.dir.recursive=true;

How to point one Hive Table to Multiple External Files?

I would like to be able to append multiple HDFS files to one Hive table while leaving the HDFS files in their original directory. These files are located in different directories.
The LOAD DATA INPATH command moves the HDFS file into the Hive warehouse directory.
As far as I can tell, an external table must point to one file, or to one directory within which multiple files with the same schema can be placed. However, my files are not underneath a single directory.
Is it possible to point a single Hive table to multiple external files in separate directories, or to otherwise copy multiple files into a single Hive table without moving the files from their original HDFS location?
Expanded solution based on Pradeep's answer:
For example, my files look like this:
/root_directory/<job_id>/input/<dt>
Pretend the schema of each is (foo STRING, bar STRING, job_id STRING, dt STRING)
I first create an external table. However, note that my DDL does not contain an initial location, and it does not include the job_id and dt fields:
CREATE EXTERNAL TABLE hivetest (
foo STRING,
bar STRING
) PARTITIONED BY (job_id STRING, dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
Let's say I have two files I wish to insert located at:
/root_directory/b1/input/2014-01-01
/root_directory/b2/input/2014-01-02
I can load these two external files into the same Hive table like so:
ALTER TABLE hivetest
ADD PARTITION(job_id = 'b1', dt='2014-01-01')
LOCATION '/root_directory/b1/input/2014-01-01';
ALTER TABLE hivetest
ADD PARTITION(job_id = 'b2', dt='2014-01-02')
LOCATION '/root_directory/b2/input/2014-01-02';
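You can verify which directories are registered with:
SHOW PARTITIONS hivetest;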
If anyone happens to require the use of Talend to perform this, they can use the tHiveLoad component [edit: this doesn't work; see below].
The code Talend produces for tHiveLoad is actually LOAD DATA INPATH ..., which will remove the file from its original location in HDFS.
You will have to issue the earlier ALTER TABLE statements in a tHiveLoad instead.
The short answer is yes. A Hive external table can point to multiple files/directories. The long answer will depend on the directory structure of your data. The typical way you do this is to create a partitioned table whose partition columns map to parts of your directory path.
E.g. we have a use case where an external table points to thousands of directories on HDFS. Our paths conform to the pattern /prod/${customer-id}/${date}/. In each of these directories we have approximately 100 files. Mapping this into a Hive table, we created two partition columns, customer_id and date, so every day we're able to load the data into Hive by doing
ALTER TABLE x ADD PARTITION (customer_id = "blah", dt = "blah_date") LOCATION '/prod/blah/blah_date';
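Queries can then prune down to a single directory via the partition columns; a hypothetical example against the table above:
SELECT * FROM x WHERE customer_id = 'blah' AND dt = 'blah_date';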
Try this:
LOAD DATA LOCAL INPATH '/path/local/file_1' INTO TABLE tablename;
LOAD DATA LOCAL INPATH '/path/local/file_2' INTO TABLE tablename;
(With LOCAL, the files are read from the local filesystem and copied into the table's location rather than moved.)

Hive table not retrieving rows from external file

I have a text file called sample.txt. The file looks like:
abc,23,M
def,25,F
efg,25,F
I am trying to create a table in hive using:
CREATE EXTERNAL TABLE ppldb(name string, age int,gender string)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/path/to/sample.txt';
But the data isn't getting into the table. When I run the query:
select count(*) from ppldb
I get 0 in the output.
What could be the reason for data not getting loaded into the table?
The location of an external table in Hive should be an HDFS directory, not the full path of a file.
If that directory does not exist, the location we give will be created automatically. In your case /path/to/sample.txt is being treated as a directory.
So just give /path/to/ in the LOCATION and keep the sample.txt file inside that directory. It will work.
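A sketch of the corrected DDL, reusing the question's column list (with /path/to/ being the directory that contains sample.txt):
CREATE EXTERNAL TABLE ppldb(name string, age int, gender string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/path/to/';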
Hope it helps!
The LOCATION clause indicates where the table will be stored, not where to retrieve data from. After moving the samples.txt file into HDFS with something like
hdfs dfs -copyFromLocal ~/samples.txt /user/tables/
you could load the data into a table in hive with
create table temp(name string, age int, gender string)
row format delimited fields terminated by ','
stored as textfile;
load data inpath '/user/tables/samples.txt' into table temp;
That should work.
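After the load, a count query like the one from the question should return 3 for the sample file above:
select count(*) from temp;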
