Unable to insert data in Hive Table with Load data - hadoop

I am trying to insert data into Hive Server with the command "load data local inpath 'C:\User\HiveData_Employ.csv' into table table1swa;". The csv is on my local machine, and the data in the CSV is {21,Name1}. But I am getting an error like the one below:
FAILED: IllegalArgumentException java.net.URISyntaxException: Relative path in absolute URI: C:%5CSwarup%5CHiveData_Employ.csv (state=42000,code=40000)
What am I doing wrong here? I think I should specify LOCAL since I am loading data from my local machine and not from an HDFS path. Also, please confirm that the input data is correct.

Try changing the backslashes to forward slashes, as below:
C:/User/HiveData_Employ.csv' into table table1swa
Also, the input data looks fine.
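For reference, a minimal sketch of the full corrected statement, assuming the table table1swa already exists and using the file path from the question:
-- forward slashes avoid the URISyntaxException caused by backslashes in the URI
load data local inpath 'C:/User/HiveData_Employ.csv' into table table1swa;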

Related

No rows selected when trying to load csv file in hdfs to a hive table

I have a csv file called test.csv in hdfs. The file was placed there through FileZilla. I am able to view the path as well as the contents of the file when I log in to the edge node through PuTTY using the same account credentials that I used to place the file into hdfs. I then connect to Hive and try to create an external table specifying the location of my csv file in hdfs using the statement below:
CREATE EXTERNAL TABLE(col1 string, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC LOCATION '/file path'
when I execute this command it creates an external table in Hive, but the table is empty, with only the columns I mentioned in the create statement showing up. My question is, am I specifying the correct path in the LOCATION parameter in the create statement above? I tried using the path which I see in FileZilla when I placed my csv file into hdfs, which is in the format home/servername/username/directory/subdirectory/file
but this returns an error saying the user whose username is specified in the path above does not have ALL privileges on the file path.
NOTE: I checked the permissions on the file and the directory in which it resides and the user has all permissions (read, write and execute).
I then tried changing the path to the format user/username/directory/subdirectory/file and when I did this I was able to create the external table; however, the table is empty and does not contain the data from the csv file.
I also tried the alternative method of creating an internal table as below and then using the LOAD DATA INPATH command. But this also failed as I am getting an error saying that "there are no files existing at the specified path".
CREATE TABLE foobar(key string, stats map<string, bigint>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':' ;
LOAD DATA INPATH '/tmp/foobar.csv' INTO TABLE foobar;
First, you can't load a csv file directly into a Hive table that was created with the ORC file format. ORC is an optimized, compressed columnar storage format, so the data has to be converted rather than loaded as-is. You can load your data into an ORC-format table by following the steps below.
You should create a temp table in text file format.
Load data into it using the command:
hive> load data inpath .....
or else you can use the LOCATION parameter while creating the table itself.
Now create a Hive table in your required file format (RC, ORC, Parquet, etc.).
Now load data into it using the following command:
hive> insert overwrite table foobar select * from temptbl;
You will get the table in ORC file format.
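Putting those steps together, a minimal sketch assuming a text-format staging table named temptbl and the foobar schema from the question (table names and the HDFS path are illustrative):
-- 1. staging table in plain text format
CREATE TABLE temptbl(key string, stats map<string, bigint>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;
-- 2. load the csv from HDFS into the staging table
LOAD DATA INPATH '/tmp/foobar.csv' INTO TABLE temptbl;
-- 3. final table in ORC format
CREATE TABLE foobar_orc(key string, stats map<string, bigint>)
STORED AS ORC;
-- 4. convert by rewriting the data from the staging table
INSERT OVERWRITE TABLE foobar_orc SELECT * FROM temptbl;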
On the second issue: when you load data into a table using the LOAD DATA command, the file is moved, not copied, so the original location becomes empty. A new directory is created under the default warehouse location (/user/hive/warehouse/) with the table name, and the data is moved into that directory. Check that location and you will see the data.
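To verify, you can list the table's directory from the Hive shell, assuming the default warehouse location and the foobar table used above:
hive> dfs -ls /user/hive/warehouse/foobar/;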

Inserting local csv to a Hive table from Qubole

I have a csv on my local machine, and I access Hive through the Qubole web console. I am trying to upload the csv as a new table, but couldn't figure out how. I have tried the following:
LOAD DATA LOCAL INPATH <path> INTO TABLE <table>;
I get an error saying "No files matching path file".
I am guessing that the csv has to be on some remote server where Hive is actually running, and not on my local machine. The solutions I saw don't explain how to handle this issue. Can someone help me out regarding this?
Qubole allows you to define Hive external/managed tables on data sitting in your cloud storage (S3 or Azure storage), so LOAD from your local box won't work. You will have to upload the file to your cloud storage and then define an external table against it:
CREATE EXTERNAL TABLE orc1ext(
`itinid` string, `itinid1` string)
STORED AS ORC
LOCATION
's3n://mybucket/def.us.qubole.com/warehouse/testing.db/orc1';
INSERT INTO TABLE orc1ext SELECT itinid, itinid
FROM default.default_qubole_airline_origin_destination LIMIT 5;
First, create a table in Hive using the field names present in your csv file. The syntax you are using seems correct.
Use below syntax for creating table
CREATE TABLE foobar(key string, stats map<string, bigint>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':' ;
and then load the data using the format below, making sure the path name is correct:
LOAD DATA LOCAL INPATH '/yourfilepath/foobar.csv' INTO TABLE foobar;

nameservice1 error when loading data in hive

I am trying to load a flat file into a table in Hive and get the error below.
FAILED: IllegalArgumentException java.net.UnknownHostException: nameservice1
Not sure what needs to be done here.
The table is created as
CREATE TABLE IF NOT EXISTS poc_yi2 ( IndexValid_fg STRING ) ROW FORMAT delimited fields terminated by ',' STORED AS TEXTFILE
The data file contains one line which is
Yes,
The command to load the data is:
load data local inpath '/home/user1/testx/1' overwrite into table poc_yi2;
Is this a configuration parameter? I am relatively new to Hive. Can someone please assist?
Looks like a problem with your cluster configuration. Please make sure you have properly set properties like:
dfs.nameservices=nameservice1
dfs.ha.namenodes.nameservice1=namenode1,namenode2
Stop the daemons, make all the necessary modifications and restart your cluster. If the problem still persists, please show me your log files along with the config files.
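For reference, a fuller sketch of the HA-related properties that usually have to be visible to the Hive/HDFS client (typically in hdfs-site.xml); the nameservice and namenode IDs follow the lines above, while the host names and port are placeholders you would replace with your own values:
dfs.nameservices=nameservice1
dfs.ha.namenodes.nameservice1=namenode1,namenode2
dfs.namenode.rpc-address.nameservice1.namenode1=namenode1-host:8020
dfs.namenode.rpc-address.nameservice1.namenode2=namenode2-host:8020
dfs.client.failover.proxy.provider.nameservice1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider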

Partitioning a table in Hadoop

I am going through an example in the O'Reilly Hadoop book about partitioning a table. Here is the code I am running.
This code creates a table; it seems to execute without errors.
CREATE TABLE logs (ts BIGINT, line STRING)
PARTITIONED BY (dt STRING, country STRING);
When I run the command below, it returns nothing, which is suspicious.
SHOW PARTITIONS logs;
When I run the next part of the example code, I get an Invalid path error.
LOAD DATA LOCAL INPATH '/user/paul/files/dt=2010-01-01/country=GB/test.out'
INTO TABLE logs
PARTITION (dt='2001-01-01', country='GB');
I have definitely created the file, and I can browse it through Hue at the following location.
/user/paul/files/dt=2010-01-01/country=GB
This is the specific error.
FAILED: SemanticException Line 1:23 Invalid path ''/user/paul/files/dt=2010-01-01/country=GB/test.out'': No files matching path file:/user/paul/files/dt=2010-01-01/country=GB/test.out
Am I missing something blatantly obvious here?
It just means file not found on the local file system at '/user/paul/files/dt=2010-01-01/country=GB/test.out'.
Is the file that you created, '/user/paul/files/dt=2010-01-01/country=GB/test.out', stored in HDFS or on the local file system? If it is in HDFS, then you can't use LOCAL INPATH.
Remove LOCAL from the command. I don't exactly remember, but you may also need to alter the table beforehand: ALTER TABLE table_name ADD PARTITION (partCol = 'value1') location 'loc1';
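Spelled out, a minimal sketch of both options, assuming the file actually sits in HDFS at the path shown in the question (the dt value here follows the file path):
-- option 1: drop LOCAL so Hive looks in HDFS instead of the local file system
LOAD DATA INPATH '/user/paul/files/dt=2010-01-01/country=GB/test.out'
INTO TABLE logs
PARTITION (dt='2010-01-01', country='GB');
-- option 2: register the existing HDFS directory as a partition without moving the file
ALTER TABLE logs ADD PARTITION (dt='2010-01-01', country='GB')
LOCATION '/user/paul/files/dt=2010-01-01/country=GB';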

Moving partial data across hadoop instances

I have to copy a certain chunk of data from one hadoop cluster to another. I wrote a hive query which dumps the data into hdfs. After copying the file to the destination cluster, I tried to load the data using the command "load data inpath '/a.txt' into table data". I got the following error message
Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
I had dumped the data as a sequence file. Can anybody let me know what I am missing here?
You should use STORED AS SEQUENCEFILE while creating the table if you want to store sequence files in it. Also, you have written that you dumped the data as a sequence file, but your file name is a.txt, which I don't understand.
If you want to load a text file into a table that expects a sequence file as its data source, you could do the following: first create a normal (text) table and load the text file into it, then do:
insert into table seq_table select * from text_table;
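As a concrete sketch, with hypothetical table names text_table and seq_table and an illustrative single-column schema (adjust the columns and the path to match your data):
-- staging table that reads the plain-text dump
CREATE TABLE text_table (col1 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
LOAD DATA INPATH '/a.txt' INTO TABLE text_table;
-- destination table stored as a sequence file
CREATE TABLE seq_table (col1 STRING)
STORED AS SEQUENCEFILE;
-- rewrite the data in the sequence file format
INSERT INTO TABLE seq_table SELECT * FROM text_table;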
