external table gphdfs protocol command ended with error. Error occurred during initialization of VM - greenplum

testdb=# CREATE EXTERNAL TABLE sales_fact_1997 ( product_id int, time_id int, customer_id int, promotion_id int, store_id int, store_sales decimal, store_cost decimal, unit_sales decimal ) LOCATION ('gphdfs://hz-cluster2/user/nrpt/hive-server/foodmart.db/sales_fact_1997') FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE
testdb=#
testdb=#
testdb=#
testdb=# select * from sales_fact_1997 ;
ERROR: external table gphdfs protocol command ended with error. Error occurred during initialization of VM (seg0 slice1 sdw1:40000 pid=3450)
DETAIL:
Could not reserve enough space for object heap
Could not create the Java virtual machine.
Command: 'gphdfs://le/user/nrpt/hive-server/foodmart.db/sales_fact_1997'
External table sales_fact_1997, file gphdfs://hz-cluster2/user/nrpt/hive-server/foodmart.db/sales_fact_1997
I changed the -Xmx value in the hadoop-2.5.2/etc/hadoop/hadoop-env.sh file, and as far as I can see there is enough free memory for the JVM, but I still get this error.
Here is what I have:
export GP_JAVA_OPT='-Xms20m -Xmx20m -XX:+DisplayVMOutputToStderr'
[localhost ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:            993         114         393         219         485         518
Swap:           819           0         819
Can anyone help? The CREATE EXTERNAL TABLE succeeded, but I can't read the data from HDFS.

You either don't have enough memory on your segment hosts or you need to allocate more memory for the JVM. Here is how you can configure the JVM for gphdfs:
http://gpdb.docs.pivotal.io/4380/admin_guide/load/topics/g-gphdfs-jvm.html
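If I remember that page correctly, the gphdfs JVM options live in a hadoop_env.sh file under the Greenplum installation, not in Hadoop's own hadoop-env.sh, and the change has to be made on every segment host. A minimal sketch, where the file path and heap size are assumptions to adapt to your install:
# on every segment host, edit the gphdfs environment file (path assumed)
vi $GPHOME/lib/hadoop/hadoop_env.sh
# give the gphdfs JVM more headroom than -Xmx20m, within what the host can spare
export GP_JAVA_OPT='-Xmx512m -XX:+DisplayVMOutputToStderr'
If even a small heap fails to start, the free output above suggests the host itself is very tight on memory, which points back to the first option: add memory to the segment hosts.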

Related

HiveAccessControlException Permission denied: user [hive] does not have [ALL] privilege on [hdfs://sandbox-....:8020/user/..] (state=42000,code=40000)

When I try to load a CSV file from local Hadoop on the sandbox into a Hive table, the statement fails on the LOCATION clause with the following exception:
LOCATION 'hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice';
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hive] does not have [ALL] privilege on [hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice] (state=42000,code=40000)
This is the full statement I used; can you please suggest a solution?
CREATE TABLE Sales_transactions(
Transaction_date DATE,
Product STRING,
Price FLOAT,
Payment_Type STRING,
Name STRING,
City STRING,
State STRING,
Country STRING,
Account_Created TIMESTAMP,
Last_Login TIMESTAMP,
Latitude FLOAT,
Longitude FLOAT,
Zip STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice';   -- the error points to this line
It is actually a two-step process, and I think you missed step 1 (assuming your user has all the proper access).
Step 1 - Load local file into hdfs file system.
hdfs dfs -put ~/Sales_transactions.csv hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice
Step 2 - Then load above hdfs data into the table.
load data inpath 'hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice/Sales_transactions.csv' into table myDB.Sales_transactions_table;
Alternatively, you can use this:
LOAD DATA LOCAL INPATH '~/Sales_transactions.csv' INTO TABLE mydb.Sales_transactions_table;
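If you go the two-step route, it can also help to confirm that step 1 actually put the file in HDFS before running the load. A quick check, using the same paths as above:
hdfs dfs -ls hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice
hdfs dfs -cat hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice/Sales_transactions.csv | head -3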

Getting NULL values after loading data into Hive tables from an online dataset

I am trying to load data from an online dataset into my Hive table using the Hue interface, but I am getting NULL values.
Here's my dataset:
https://www.kaggle.com/psparks/instacart-market-basket-analysis?select=aisles.csv
Here's my code:
CREATE TABLE IF NOT EXISTS AISLES (aisles_id INT, aisles STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
tblproperties("skip.header.line.count"="1");
Here's how I loaded the data:
LOAD DATA LOCAL INPATH '/home/hadoop/aisles.csv' INTO TABLE aisles;
My workarounds, none of which helped:
FIELDS TERMINATED BY ','
FIELDS TERMINATED BY '\t'
FIELDS TERMINATED BY ''
FIELDS TERMINATED BY ' '
Also tried removing LINES TERMINATED BY '\n'
This is how I downloaded the data:
[hadoop@ip-172-31-76-58 ~]$ wget -O aisles.csv "https://www.kaggle.com/psparks/instacart-market-basket-analysis?select=aisles.csv"
--2020-10-14 23:50:06-- https://www.kaggle.com/psparks/instacart-market-basket-analysis?select=aisles.csv
Resolving www.kaggle.com (www.kaggle.com)... 35.244.233.98
Connecting to www.kaggle.com (www.kaggle.com)|35.244.233.98|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘aisles.csv’
I checked the location of the table I created, and this is what it says:
hdfs://ip-172-31-76-58.ec2.internal:8020/user/hive/warehouse/aisles
I tried browsing the directory to see where the file was saved:
[hadoop@ip-172-31-76-58 ~]$ hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxrwxrwt - arjiesaenz hadoop 0 2020-10-15 00:57 /user/hive/warehouse/aisles
So I tried to change my load script like this:
LOAD DATA INPATH '/user/hive/warehouse/aisles.csv' INTO TABLE aisles;
But I got an error:
Error while compiling statement: FAILED: SemanticException line 6:61 Invalid path ''/user/hive/warehouse/aisles.csv'': No files matching path hdfs://ip-172-31-76-58.ec2.internal:8020/user/hive/warehouse/aisles.csv
Hopefully someone can help me pinpoint the problem with my code.
Thanks.
I tried the same on my hadoop cluster. The code worked without any issues.
Here's my execution snippet:
hive> CREATE TABLE IF NOT EXISTS AISLES (aisles_id INT, aisles STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> tblproperties("skip.header.line.count"="1");
OK
Time taken: 0.034 seconds
hive> load data inpath '/user/hirwuser1448/aisles.csv' into table AISLES;
Loading data to table revisit.aisles
Table revisit.aisles stats: [numFiles=1, totalSize=2603]
OK
Time taken: 0.183 seconds
hive> select * from AISLES limit 10;
OK
1 prepared soups salads
2 specialty cheeses
3 energy granola bars
4 instant foods
5 marinades meat preparation
6 other
7 packaged meat
8 bakery desserts
9 pasta sauce
10 kitchen supplies
Time taken: 0.038 seconds, Fetched: 10 row(s)
I think you need to cross-check that your dataset aisles.csv is actually at the HDFS location and not only in a local directory.
The problem is with your load command:
LOAD DATA INPATH '/user/hive/warehouse/aisles.csv' INTO TABLE aisles;
I see you tried browsing the directory to find the saved file. Do you see aisles.csv under that directory? If the file is there, then you are giving the wrong path in your load command; otherwise the file is not there at all.
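If it turns out the file is not in HDFS, one way to check and then copy it up from the local path used earlier (a sketch; /user/hadoop is an assumed HDFS home directory for the hadoop user):
hdfs dfs -ls /user/hive/warehouse/aisles
hdfs dfs -put /home/hadoop/aisles.csv /user/hadoop/aisles.csv
LOAD DATA INPATH '/user/hadoop/aisles.csv' INTO TABLE aisles;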
I found a workaround: I downloaded the dataset, uploaded it to an Amazon S3 bucket, and used the S3 path in the LOAD command.
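For example (a sketch; the bucket name is hypothetical, and this assumes a setup such as EMR where Hive can resolve s3:// paths):
LOAD DATA INPATH 's3://my-bucket/instacart/aisles.csv' INTO TABLE aisles;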

Error while unmounting mmcblk1p1 on beaglebone black - during repartitioning and formatting

Hi, I'm a newbie to embedded Linux. I'm following this tutorial (https://e2e.ti.com/support/embedded/linux/f/354/t/398780?Script-to-Erase-Emmc-independently-Beagle-Bone-Black) for flashing my Linux system to the BeagleBone eMMC.
But I have an error: umount: can't umount /dev/mmcblk1p1: Invalid argument
This is my fdisk session:
Disk /dev/mmcblk1: 3825 MB, 3825205248 bytes
4 heads, 16 sectors/track, 116736 cylinders
Units = cylinders of 64 * 512 = 32768 bytes
Device Boot Start End Blocks Id System
/dev/mmcblk1p1 * 2048 2536 15648 e Win95 FAT16 (LBA)
/dev/mmcblk1p2 1 2047 65496 83 Linux
Partition table entries are not in disk order
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table
[ 235.274729] mmcblk1: p1 p2
root@beaglebone:/# umount /dev/mmcblk1p1
umount: can't umount /dev/mmcblk1p1: Invalid argument
Sorry, my English is not good. Does anybody have an idea what I did wrong, or did I miss something?
This is an error in the script you are following. If you have created new partitions without a file system, you would not expect them to be mounted.
Creating the second partition in sectors 1-2047 is probably not what you want to do. You should use all the space after partition 1.
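A rough sketch of how that repartitioning might look in fdisk (the prompts and defaults vary between fdisk builds, so treat this as an outline rather than an exact transcript):
fdisk /dev/mmcblk1
Command (m for help): d          (delete the misplaced partition 2)
Command (m for help): n          (create it again; accept the defaults so it starts right after partition 1 and runs to the end of the disk)
Command (m for help): w          (write the new table)
mkfs.ext4 /dev/mmcblk1p2         (then put a file system on it before mounting)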

Presto Query HIVE Table Exception: Failed to list directory

I'm new to Presto. I have two machines running Presto 0.160; one is the coordinator, the other is a worker. I want to query a table in Hive. I can run "show tables" and "desc tablename", but when I run "select * from tablename" an exception occurs: "Query 20170728_123013_00011_q4s3a failed: Failed to list directory: hdfs://cdh-test/user/hive/warehouse/employee_hive"
presto> desc hive.default.employee_hive;
Column | Type | Comment
-------------+---------+---------
eid | integer |
name | varchar |
salary | varchar |
destination | varchar |
(4 rows)
Query 20170728_123001_00010_q4s3a, FINISHED, 2 nodes
Splits: 2 total, 2 done (100.00%)
0:00 [4 rows, 268B] [40 rows/s, 2.68KB/s]
presto> select * from hive.default.employee_hive;
Query 20170728_123013_00011_q4s3a, FAILED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20170728_123013_00011_q4s3a failed: Failed to list directory: hdfs://cdh-test/user/hive/warehouse/employee_hive
Here is my configuration for hive catalog:
connector.name=hive-cdh4
hive.metastore.uri=thrift://***:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Where am I going wrong?
The path that the table is stored on needs to exist on HDFS for Presto to open it successfully. From the path it appears your table is an "internal" hive table, meaning hive should have created the path itself. Since it hasn't, you could create it yourself using a command similar to hdfs dfs -mkdir hdfs://cdh-test/user/hive/warehouse/employee_hive, although the exact command depends on your HDFS set up.
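For example, on an unsecured cluster it might look like this (a sketch; adjust the ownership to match how your Hive warehouse is set up):
hdfs dfs -mkdir -p hdfs://cdh-test/user/hive/warehouse/employee_hive
hdfs dfs -chown hive:hive hdfs://cdh-test/user/hive/warehouse/employee_hive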
You can't access the Hadoop directory directly. I assume you created the table as a text file and that it is stored in the internal directory of the respective user.
Just create the table as an external table instead, and you will be able to access it via Presto:
Create External Table tablename (columnames datatypes) row format delimited fields terminated by '\t' stored as textfile;
load data inpath 'Your_hadoop_directory' into table tablename;
Otherwise, create an internal table, load it into an external ORC table, and access that via Presto:
Create Table internal_tablename (columnames datatypes) row format delimited fields terminated by '\t' stored as textfile;
load data inpath 'Your_hadoop_directory' into table internal_tablename;
Create External Table orc_tablename (columnames datatypes) STORED AS ORC;
insert into orc_tablename select * from internal_tablename;
I solved the above issue by creating an ORC table.
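Spelled out for this table, using the column names from the desc output above, that ORC route might look like the following (a sketch; the input path and table names are hypothetical):
Create Table employee_text (eid int, name string, salary string, destination string) row format delimited fields terminated by '\t' stored as textfile;
load data inpath '/user/yourname/employee.tsv' into table employee_text;
Create External Table employee_orc (eid int, name string, salary string, destination string) stored as ORC;
insert into employee_orc select * from employee_text;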

SQL Loader error for opening log file

I have created an external table:
CREATE TABLE XX_Lookup_EXT
(
LOOKUP_TYPE varchar2(200),
LOOKUP_CODE varchar2(200),
MEANING varchar2(200),
ENABLED_FLAG varchar2(10)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY INTF_DIR1
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE SKIP 1
NODISCARDFILE
FIELDS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
)
LOCATION (INTF_DIR1:'LOOKUP_CODE.csv')
)
REJECT LIMIT UNLIMITED
NOPARALLEL
nomonitoring;
When I query this table, it gives me the following error:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
error opening file /orabin/tst/test/XX_LOOKUP_EXT_30723.log
29913. 00000 - "error in executing %s callout"
*Cause: The execution of the specified callout caused an error.
*Action: Examine the error messages take appropriate action.
I have tried everything, but I am still getting this error.
@alex poole is right. The /orabin/tst/test/ directory must be local to the database server, and the database server account, usually 'oracle', needs read and write permissions on the directory.
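A sketch of the checks that usually follow from that (the directory object name and path come from the question; the grantee and the OS group are assumptions, so substitute your own):
-- as a privileged database user: confirm where the directory object points and who may use it
CREATE OR REPLACE DIRECTORY INTF_DIR1 AS '/orabin/tst/test';
GRANT READ, WRITE ON DIRECTORY INTF_DIR1 TO your_schema;
-- on the database host OS: make sure the oracle software owner can write the log file there
chown oracle:oinstall /orabin/tst/test
chmod 775 /orabin/tst/test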
