How to extract Apache Phoenix table/view data to a file - hadoop

How can I extract data from an Apache Phoenix table/view to a CSV/PSV/text file?
For example, the query:
select * from test_view

After connecting to Phoenix with sqlline.py:
phoenix-sqlline zk4-habsem.lzmf1fzmprtezol2fr25obrdth.jx.internal.cloudapp.net,zk5-habsem.lzmf1fzmprtezol2fr25obrdth.jx.internal.cloudapp.net,zk1-habsem.lzmf1fzmprtezol2fr25obrdth.jx.internal.cloudapp.net:2181:/hbase-unsecure
Choose the format for the exported file:
!outputformat csv
Specify the local file path to record the output to:
!record data.csv
Run the Phoenix query whose output you want to export:
select * from tableName
Stop recording, then quit:
!record
!quit
The file is saved at /home/user/data.csv (a relative path like data.csv resolves against the directory sqlline was started from, here the user's home directory).
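If you prefer to script this instead of typing the commands interactively, sqlline.py can also execute a command file passed as an argument after the ZooKeeper quorum; a minimal sketch, assuming a file named export.sql containing:
!outputformat csv
!record /home/user/data.csv
select * from test_view;
!record
!quit
which you would then run with (ZooKeeper quorum shortened to a placeholder):
phoenix-sqlline <zookeeper-quorum>:2181:/hbase-unsecure export.sql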

You can use the SQuirreL SQL client to extract the data to a local file.
Refer to:
https://community.hortonworks.com/articles/44350/setting-up-squirrel-and-phoenix-integration.html

Related

Creating Hive table: no files matching path file... but the file exists in the path

I'm trying to create a Hive ORC table using a file stored in HDFS.
I have a "partsupp.tbl" file where each line has the format below:
1|25002|8076|993.49|ven ideas. quickly even packages print. pending multipliers must have to are fluff|
I create a hive table like this:
create table if not exists partsupp (PS_PARTKEY BIGINT,
PS_SUPPKEY BIGINT,
PS_AVAILQTY INT,
PS_SUPPLYCOST DOUBLE,
PS_COMMENT STRING)
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")
;
Now I'm trying to load the data from the .tbl file into the table like this:
LOAD DATA LOCAL INPATH '/tables/partsupp/partsupp.tbl' INTO TABLE partsupp
But I'm getting this error:
No files matching path file:/tables/partsupp/partsupp.tbl
But the file exists in HDFS...
LOCAL signifies that the file is present on the local file system. If 'LOCAL' is omitted, Hive looks for the file in HDFS.
So in this case, use the following query:
LOAD DATA INPATH '/tables/partsupp/partsupp.tbl' INTO TABLE partsupp
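As a quick sanity check, you can confirm the file really is in HDFS before loading; if it turns out to live on the local disk after all, keep the LOCAL keyword (the local path below is a hypothetical placeholder):
hadoop fs -ls /tables/partsupp/partsupp.tbl
LOAD DATA LOCAL INPATH '/home/user/partsupp.tbl' INTO TABLE partsupp;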

Hadoop backend with millions of records insertion

I am new to Hadoop. Can someone please suggest how to upload millions of records to Hadoop? Can I do this with Hive, and where can I see my Hadoop records?
So far I have used Hive to create the database on Hadoop, and I am accessing it at localhost:50070. But I am unable to load data from a CSV file into Hadoop from the terminal, as it gives me this error:
FAILED: Error in semantic analysis: Line 2:0 Invalid path ''/user/local/hadoop/share/hadoop/hdfs'': No files matching path hdfs://localhost:54310/usr/local/hadoop/share/hadoop/hdfs
Can anyone suggest a way to resolve it?
I suppose the data is initially in the local file system.
So a simple workflow could be: load the data from the local file system into the Hadoop file system (HDFS), create a Hive table over it, and then load the data into the Hive table.
Step 1:
# put the files in HDFS
$~ hadoop fs -put /local_path/file_pattern* /path/to/your/HDFS_directory
# check the files
$~ hadoop fs -ls /path/to/your/HDFS_directory
Step 2:
CREATE EXTERNAL TABLE if not exists mytable (
Year int,
name string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as TEXTFILE;
-- display the table structure
describe mytable;
Step 3:
LOAD DATA INPATH '/path/to/your/HDFS_directory'
OVERWRITE INTO TABLE mytable;
-- a simple Hive statement to fetch the top 10 records
SELECT * FROM mytable LIMIT 10;
You should use LOAD DATA LOCAL INPATH <local-file-path> to load files from a local directory into Hive tables.
If you don't specify LOCAL, the load command assumes the given file path is an HDFS location.
Please refer to the link below:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables
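For example, a minimal sketch (the local CSV path is a hypothetical placeholder):
LOAD DATA LOCAL INPATH '/home/user/records.csv' INTO TABLE mytable;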

Hive: How to execute a query from a file and dump the output in HDFS

I can execute a query from a SQL file and store the output in a local file using:
hive -f /home/Prashasti/test.sql > /home/Prashasti/output.csv
Also, I can store the output of a Hive query in HDFS using:
insert overwrite directory 'user/output' select * from folders;
Is there any way I can run the query from a SQL file and store the output in HDFS too?
Just modify the SQL file and add insert overwrite directory 'user/output' in front of the query, as shown below.
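For example, test.sql would then contain (using the query from the question):
insert overwrite directory 'user/output'
select * from folders;
and you would run it the same way as before:
hive -f /home/Prashasti/test.sql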

Exporting Data from Hive table to Local Machine File System

Using the following command:
insert overwrite local directory '/my/local/filesystem/directory/path'
select * from Emp;
overwrites all the data already in /my/local/filesystem/directory/path with the data of Emp.
What I want is to just copy the data of Emp to /my/local/filesystem/directory/path without overwriting; how can I do that?
The following are my failed attempts:
hive> insert into local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:12 mismatched input 'local' expecting TABLE near 'into' in insert clause
hive> insert local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:0 cannot recognize input near 'insert' 'local' 'directory' in insert clause
Can you please tell me how I can get this solved?
To append to a Hive table you need to use INSERT INTO:
INSERT INTO will append to the table or partition, keeping the existing data intact. (Note: INSERT INTO syntax is only available starting in version 0.8.)
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
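A minimal sketch of that syntax, reusing the question's table name (the staging source table is a hypothetical placeholder):
INSERT INTO TABLE appdata
SELECT * FROM appdata_staging;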
But you can't use this to append to an existing local file, so another option is to use a bash command.
If you have a file called 'export.hql' and in that file your code is:
select * from Emp;
Then your bash command can be:
hive -f 'export.hql' >> localfile.txt
The -f option executes the Hive file, and >> appends the results to the text file.
EDIT:
The command:
hive -f 'export.hql' > localfile.txt
will write the query output to a fresh file, overwriting rather than appending.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-SQLOperations
When using 'LOCAL', 'OVERWRITE' is also needed in your hql.
For example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out' SELECT * FROM test
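If you need delimited output rather than Hive's default separator, Hive 0.11 and later also accept a row format clause in this statement; a minimal sketch:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM test;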

Greenplum loading data from table to file using external table

I ran the create script below and it created the table:
Create writable external table FLTR (like dbname.FLTR)
LOCATION ('gpfdist://172.90.38.190:8081/fltr.out')
FORMAT 'CSV' (DELIMITER ',' NULL '')
DISTRIBUTED BY (fltr_key);
But when I tried inserting into the file, like insert into fltr.out select * from dbname.fltr, I got the error below: cannot find server connection.
Please help me out.
I think your gpfdist is probably not running. Try:
gpfdist -p 8081 -l ~/gpfdist.log -d ~/ &
on 172.90.38.190.
This will start gpfdist using your home directory as the data directory.
When I do that, my inserts work and create the file ~/fltr.out.
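For reference, the insert should target the writable external table by name, not the file; a minimal sketch:
INSERT INTO FLTR SELECT * FROM dbname.FLTR;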
