How to access Hadoop HDFS with a Greenplum external table

Our data warehouse is based on Hive. Now we need to move data from Hive to Greenplum, and we want to use an external table with gphdfs, but it looks like something is going wrong.
The table creation script is:
CREATE EXTERNAL TABLE flow.http_flow_data(like flow.zb_d_gsdwal21001)
LOCATION ('gphdfs://mdw:8081/user/hive/warehouse/flow.db/d_gsdwal21001/prov_id=018/day_id=22/month_id=201202/data.txt')
FORMAT 'TEXT' (DELIMITER ' ');
When we run:
bitest=# select * from flow.http_flow_data limit 1;
ERROR: external table http_flow_data command ended with error. sh: java: command not found (seg12 slice1 sdw3:40000 pid=17778)
DETAIL: Command: gphdfs://mdw:8081/user/hive/warehouse/flow.db/d_gsdwal21001/prov_id=018/day_id=22/month_id=201202/data.txt
Our Hadoop is 1.0 and Greenplum is 4.1.2.1.
I want to know whether we need to configure something to let Greenplum access Hadoop.

Have you opened the port (8081) to listen for the month_id=201202 directory?

I would double-check the admin guide; I think you can use gphdfs, but not until Greenplum 4.2.

Have you checked that Java is installed on your Greenplum system? It is required for gphdfs to work.
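As a quick sanity check, the sketch below (run from the master host as gpadmin) verifies that java is on the PATH of every segment host; the host-list file name seg_hosts is just an illustrative placeholder:
# seg_hosts lists the segment hostnames, one per line
gpssh -f seg_hosts 'which java'
# any host where java is not found needs a JDK installed and JAVA_HOME/bin
# added to the gpadmin user's PATH before gphdfs can work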

Related

Error while creating Hive table

Before creating the twitter table I added this:
ADD JAR hdfs:///user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar;
I got the following error when creating the twitter table in Hive:
Error while processing statement: FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde:
com.cloudera.hive.serde.JSONSerDe
Move the JAR from HDFS to the local file system.
Then try to add the JAR in the Hive terminal.
Then try to run the query on the twitter table.
Ideally you can add JARs from either the local file system or HDFS, so the problem could be something else here.
I would recommend the following sequence of steps:
List the file on HDFS to make sure it exists:
hadoop fs -ls hdfs://namenode_hostname:8020/user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar
Add the JAR in Hive using the full path as above, and verify the addition using the list jars command in the Hive CLI:
hive> list jars;
Use the SerDe in the create table statement with the proper syntax, as shown here for example:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
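Putting those steps together, a minimal session might look like the sketch below; the namenode host/port and the column list are illustrative, while the jar path and SerDe class are the ones from the question. If your Hive version cannot add jars from HDFS, use a local path as suggested above instead:
hadoop fs -ls hdfs://namenode_hostname:8020/user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar
hive> add jar hdfs:///user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar;
hive> list jars;
hive> create table tweets (id bigint, created_at string)
    > row format serde 'com.cloudera.hive.serde.JSONSerDe'
    > stored as textfile;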

Unable to create table in Hive

I am running the simple query below to create a simple table.
create table test (id int, name varchar(20));
But I am getting the error below; please let me know what exactly needs to be done.
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user
/hive/warehouse/test is not a directory or unable to create one)
I have given full read/write access to the /user/hive/warehouse folder.
Your Hive user doesn't have permission to create directories in HDFS. Whenever you create a table, Hive makes a directory under /user/hive/warehouse/<table>, but here it is not able to create a directory under /user/hive/warehouse/, so grant your user permission on this directory to allow it to create a table.
http://www.cloudera.com/documentation/archive/cdh/4-x/4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_7.html
Sounds like a permissions issue. The change mode command below may help.
hadoop fs -chmod -R 755 /user/hive/warehouse/
The error message says:
file:/user /hive/warehouse/test
Setting aside the space between /user and the rest of the path, file:/ means that Hive is trying to create that directory on your local file system instead of on HDFS. There is probably a problem with accessing the configuration. I would check whether the HADOOP_CONF_DIR environment variable is properly initialized.
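A quick check might look like the sketch below; fs.default.name is the older Hadoop 1.x spelling of fs.defaultFS, and the paths depend on your installation:
echo $HADOOP_CONF_DIR    # should point at the directory containing core-site.xml
grep -A 1 'fs.default' $HADOOP_CONF_DIR/core-site.xml
hive -e "set fs.defaultFS; set hive.metastore.warehouse.dir;"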
For me, the issue was exactly like yours (creating an internal table); it was not related to permissions but to storage, which was 100% used. Try checking for the same.

Issue with loading data into Hive

We have launched two EMR clusters in AWS and installed Hadoop, with hive-0.11.0 on one cluster and hive-0.13.1 on the other.
Everything seems to be working fine, but when trying to load data into a table we get the error below, and it happens on both Hive servers.
ERROR MESSAGE:
An error occurred when executing the SQL command: load data inpath
's3://buckername/export/employee_1/' into table employee_2 Query
returned non-zero code: 10028, cause: FAILED: SemanticException [Error
10028]: Line 1:17 Path is not legal
''s3://buckername/export/employee_1/'': Move from:
s3://buckername/export/employee_1 to:
hdfs://XXX.XX.XXX.XX:X000/mnt/hive_0110/warehouse/employee_2 is not
valid. Please check that values for params "default.fs.name" and
"hive.metastore.warehouse.dir" do not conflict. [SQL State=42000, DB
Errorcode=10028]
I searched for the reason and meaning of this message and found this link, but when I tried to execute the command suggested in the given link, it also gave the error below.
Command:
--service metatool -updateLocation hdfs://XXX.XX.XXX.XX:X000 hdfs://XXX.XX.XXX.XX:X000
Initializing HiveMetaTool.. HiveMetaTool:Parsing failed. Reason:
Unrecognized option: -hiveconf
Any help in this will be really appreciated.
LOAD does not support S3. It is best practice to leave data in S3 and just use it as a Hive external table instead of copying the data to HDFS. Some references: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html and "When you create an external table in Hive with an S3 location, is the data transferred?"
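A minimal sketch of the external-table approach, reusing the S3 path from the question; the column names, types, and field delimiter are made up and need to match the actual employee export:
CREATE EXTERNAL TABLE employee_1_ext (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://buckername/export/employee_1/';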
If you have installed Hive on your Hadoop cluster, the default storage for Hive data is HDFS (hive.metastore.warehouse.dir=/user/hive/warehouse).
As a workaround, you can copy the file from the S3 file system to HDFS and then load the file into Hive from HDFS.
You will most probably also need to modify the parameter hive.exim.uri.scheme.whitelist=hdfs,pfile to allow loading data from the S3 file system.
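A sketch of the copy-then-load workaround; the HDFS staging directory is arbitrary, and depending on the Hadoop/EMR version the S3 URI scheme (s3:// vs s3n://) and credential configuration may differ:
hadoop distcp s3://buckername/export/employee_1 /tmp/employee_1_staging
hive -e "LOAD DATA INPATH '/tmp/employee_1_staging' INTO TABLE employee_2;"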

Export data into Hive from a node without Hadoop (HDFS) installed

Is it possible to export data to a Hive server from a node that does not have Hadoop (HDFS) or Sqoop installed?
I would read the data from a source, which could be MySQL or just files in some directory, and then use the Hadoop core classes or something like Sqoop to export the data into my Hadoop cluster.
I am programming in Java.
Since your final destination is a Hive table, I would suggest the following:
Create the final Hive table.
Use the following command to load data from the other node:
LOAD DATA LOCAL INPATH '<full local path>/kv1.txt' OVERWRITE INTO TABLE table_name;
Refer to this.
Using Java, you could use the JSch library to invoke these shell commands.
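For example, the shell commands such Java code would invoke remotely might look like the sketch below; the hostname, user, and file paths are made up:
scp /local/data/kv1.txt hiveuser@hive-node:/tmp/kv1.txt
ssh hiveuser@hive-node "hive -e \"LOAD DATA LOCAL INPATH '/tmp/kv1.txt' OVERWRITE INTO TABLE table_name;\""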
Hope this helps.

How to add SerDe jar

I use Hive to create a table stored as SequenceFile. The row format is the SerDe class myserde.TestDeserializer in hiveserde-1.0.jar.
In the command line I use this command to add the jar file:
hive> ADD JAR hiveserde-1.0.jar;
Then I create a table, and the file loads successfully.
But now I want to run it and create a table on the client by using mysql jdbc.
The error is :
SerDe: myserde.TestDeserializer does not exist.
How do I run it? Thanks.
So, there are a few options. In all of them, the jar needs to be present on the cluster where Hive is installed. The JDBC client code, of course, can be run from anywhere inside or outside the cluster.
Option 1: You issue an HQL query before you run any of your other HQL commands (a sketch follows the options below):
ADD JAR hiveserde-1.0.jar
Option 2: You can update your hive-site.xml to have the
hive.aux.jars.path property set to the complete path to your jar hiveserde-1.0.jar
Option 3: Go to your hive-env.sh and append the following to the bottom of the file:
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH:/<path-to-jar>
You can then source this file. Not ideal, but it works.
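For Option 1 via JDBC, the statements sent over the connection before querying the table might look like the sketch below; the jar path must be valid on the machine running the Hive server, and the column list is illustrative:
ADD JAR /path/on/hive/server/hiveserde-1.0.jar;
CREATE TABLE my_seq_table (key STRING, value STRING)
ROW FORMAT SERDE 'myserde.TestDeserializer'
STORED AS SEQUENCEFILE;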
Are you saying that you'd like to create the table via JDBC rather than doing it in the CLI? In that case, you should add the jar to your classpath when you run your JDBC code.
Yes, this can be a little bit confusing; it seems half the time Hive reads from the cluster and the other half from the local file system (the machine the Hive server is installed on).
To overcome this, simply copy the .jar file to the Hive server machine; you can then reference it in your Hive query, for example:
add jar /tmp/json-serde.jar;
create table tweets (
name string,
address1 string,
address2 string,
address3 string,
postcode string
)
...
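For reference, the abbreviated statement above would normally finish with a ROW FORMAT clause naming the SerDe class packaged in json-serde.jar; org.openx.data.jsonserde.JsonSerDe is shown only as a common example, so substitute the class from the jar you actually use:
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;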
And then onto the next problem ;)

Resources