Presto: Cannot INSERT values - hadoop

I can create and drop tables and run queries normally in Presto, but when I use INSERT it always fails, as shown below:
presto:default> create table test.lll (a int);
CREATE TABLE
presto:default> insert into test.lll select 1;
Query 20180104_091933_00007_k8e78, FAILED, 5 nodes
Splits: 84 total, 30 done (35.71%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20180104_091933_00007_k8e78 failed: No page sink provider for connector 'hive'
What is the reason, and how can I address it? Any help is appreciated. The full error details follow:
Error Type: INTERNAL_ERROR
Error Code: GENERIC_INTERNAL_ERROR (65536)
Full stack trace:
java.lang.IllegalArgumentException: No page sink provider for connector 'hive'
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
at com.facebook.presto.split.PageSinkManager.providerFor(PageSinkManager.java:67)
at com.facebook.presto.split.PageSinkManager.createPageSink(PageSinkManager.java:61)
at com.facebook.presto.operator.TableWriterOperator$TableWriterOperatorFactory.createPageSink(TableWriterOperator.java:97)
at com.facebook.presto.operator.TableWriterOperator$TableWriterOperatorFactory.createOperator(TableWriterOperator.java:88)
at com.facebook.presto.operator.DriverFactory.createDriver(DriverFactory.java:92)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.createDriver(SqlTaskExecution.java:515)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.access$1400(SqlTaskExecution.java:490)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:616)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at com.facebook.presto.execution.executor.LegacyPrioritizedSplitRunner.process(LegacyPrioritizedSplitRunner.java:23)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:492)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
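For context, this particular error usually means the node that was asked to write the rows has no catalog named hive registered, typically because etc/catalog/hive.properties is missing or differs on some of the workers, so reads plan fine while writes fail. A quick, hedged way to check from the CLI (assuming the catalog really is called hive) is to confirm the catalog is visible and that every expected worker is active:
presto:default> SHOW CATALOGS;
presto:default> SELECT node_id, http_uri, node_version, coordinator, state FROM system.runtime.nodes;
If a node turns out to be missing the catalog file, copying over the same hive.properties used on the coordinator and restarting that node is the usual remedy.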

Related

ORACLE: Table or View Not found using Semantic Sparql query

On a Windows Server 2019 based Oracle 19c Enterprise Edition database, as the EE user, I created a triple table, a user-owned network, and a user-owned model:
CREATE TABLE EE.RDF_WORDNET (TRIPLE MDSYS.SDO_RDF_TRIPLE_S)
COLUMN TRIPLE NOT SUBSTITUTABLE AT ALL LEVELS TABLESPACE USERS
LOGGING COMPRESS NOCACHE PARALLEL MONITORING;
exec sem_apis.create_sem_network('semts', network_owner=>'EE', network_name=>'EE_WordNet' );
exec sem_apis.create_sem_model('wn','RDF_WORDNET','triple', network_owner=>'EE', network_name=>'EE_WordNet');
Then I bulk-loaded a ton of data from the Princeton WordNet, which all went in without error, and created the entailment:
exec SEM_APIS.CREATE_ENTAILMENT('rdfs_rix_wn',
SEM_Models('wn'),
SEM_Rulebases('RDFS'),
network_owner=>'EE',
network_name=>'EE_WordNet');
I can check the tables and all the data looks good, and the views/tables created in the network_owner (EE) schema look right, but when I run a SPARQL query as the EE user (the network/model owner), I get ORA-00942: table or view does not exist, and I can't figure out what it can't see.
Select *
From Table(Sem_Match('(?s <wn20schema:containsWordSense> ?ws)
( ?ws <wn20schema:word> ?w)
( ?w <wn20schema:lexicalForm> ?l )
( ?s <wn20schema:containsWordSense> ?ws2)
( ?ws2 <wn20schema:word> ?w2)
( ?w2 <wn20schema:lexicalForm> ?l2 )',
Sem_Models('wn'), Null, Null, Null))
Where Upper(L) = Upper('Gold');
Results in:
ORA-00942: table or view does not exist
ORA-06512: at "MDSYS.RDF_MATCH_IMPL_T", line 161
ORA-06512: at "MDSYS.RDF_APIS_INTERNAL", line 8702
ORA-06512: at "MDSYS.S_SDO_RDF_QUERY", line 26
ORA-06512: at "MDSYS.RDF_APIS_INTERNAL", line 8723
ORA-06512: at "MDSYS.RDF_MATCH_IMPL_T", line 144
ORA-06512: at line 4
There are 25 tables created in the EE schema when the model is created:
EE_WORDNET#RDF_CLIQUE$
EE_WORDNET#RDF_COLLISION$
EE_WORDNET#RDF_CRS_URI$
EE_WORDNET#RDF_DELTA$
EE_WORDNET#RDF_GRANT_INFO$
EE_WORDNET#RDF_HIST$
EE_WORDNET#RDF_LINK$
EE_WORDNET#RDF_MODEL$_TBL
EE_WORDNET#RDF_MODEL_INTERNAL$
EE_WORDNET#RDF_NAMESPACE$
EE_WORDNET#RDF_NETWORK_INDEX_INTERNAL$
EE_WORDNET#RDF_PARAMETER
EE_WORDNET#RDF_PRECOMP$
EE_WORDNET#RDF_PRECOMP_DEP$
EE_WORDNET#RDF_PRED_STATS$
EE_WORDNET#RDF_RI_SHAD_2$
EE_WORDNET#RDF_RULE$
EE_WORDNET#RDF_RULEBASE$
EE_WORDNET#RDF_SESSION_EVENT$
EE_WORDNET#RDF_SYSTEM_EVENT$
EE_WORDNET#RDF_TERM_STATS$
EE_WORDNET#RDF_TS$
EE_WORDNET#RDF_VALUE$
EE_WORDNET#RENAMED_APPTAB_RDF_MODEL_ID_1
EE_WORDNET#SEM_INDEXTYPE_METADATA$
and 41 views, including the RDF_WORDNET, which was originally created as the Triple table above...
EE_WORDNET#RDFI_RDFS_RIX_WN
EE_WORDNET#RDFM_WN
EE_WORDNET#RDFR_OWL2EL
EE_WORDNET#RDFR_OWL2RL
EE_WORDNET#RDFR_OWLPRIME
EE_WORDNET#RDFR_OWLSIF
EE_WORDNET#RDFR_RDF
EE_WORDNET#RDFR_RDFS
EE_WORDNET#RDFR_RDFS++
EE_WORDNET#RDFR_SKOSCORE
EE_WORDNET#RDFT_WN
EE_WORDNET#RDF_DTYPE_INDEX_INFO
EE_WORDNET#RDF_MODEL$
EE_WORDNET#RDF_PRIV$
EE_WORDNET#RDF_RULEBASE_INFO
EE_WORDNET#RDF_RULES_INDEX_DATASETS
EE_WORDNET#RDF_RULES_INDEX_INFO
EE_WORDNET#RDF_VMODEL_DATASETS
EE_WORDNET#RDF_VMODEL_INFO
EE_WORDNET#SEMI_RDFS_RIX_WN
EE_WORDNET#SEMM_WN
EE_WORDNET#SEMP_WN
EE_WORDNET#SEMR_OWL2EL
EE_WORDNET#SEMR_OWL2RL
EE_WORDNET#SEMR_OWLPRIME
EE_WORDNET#SEMR_OWLSIF
EE_WORDNET#SEMR_RDF
EE_WORDNET#SEMR_RDFS
EE_WORDNET#SEMR_RDFS++
EE_WORDNET#SEMR_SKOSCORE
EE_WORDNET#SEMT_WN
EE_WORDNET#SEM_DTYPE_INDEX_INFO
EE_WORDNET#SEM_INF_HIST
EE_WORDNET#SEM_MODEL$
EE_WORDNET#SEM_NETWORK_INDEX_INFO
EE_WORDNET#SEM_RULEBASE_INFO
EE_WORDNET#SEM_RULES_INDEX_DATASETS
EE_WORDNET#SEM_RULES_INDEX_INFO
EE_WORDNET#SEM_VMODEL_DATASETS
EE_WORDNET#SEM_VMODEL_INFO
RDF_WORDNET
As a test, I granted SELECT, INSERT, and UPDATE to MDSYS on all these tables and views; it made no difference.
I might add that on an Oracle 12.2 database with a system-owned network, and the same data bulk-loaded into the same model, this query returns the expected data, so this seems to have something to do with it being a user-owned network.
Any thoughts as to who needs permissions to what for this to work?
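One thing worth checking (an assumption on my part, not something confirmed above): with a schema-private, user-owned network, SEM_MATCH also has to be told which network to query; without the network arguments it looks for the default MDSYS-owned network, which does not exist on this database. A rough sketch of the same query with the network_owner/network_name arguments added (the exact positional signature varies by release, so verify against the 19c SEM_MATCH documentation):
Select l
From Table(Sem_Match('(?s <wn20schema:containsWordSense> ?ws)
( ?ws <wn20schema:word> ?w)
( ?w <wn20schema:lexicalForm> ?l )',
Sem_Models('wn'),
null, null, null, null, null, null, null,
'EE',            -- network_owner
'EE_WordNet'))   -- network_name
Where Upper(l) = Upper('Gold');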

Error opening Hive split hdfs Can not read SQL type double from ORC stream .commission of type STRING

select commission from dby_pro.ods_bi_user_event_ih limit 5;
Query 20201203_081000_00431_49yi4, FAILED, 1 node
Splits: 117 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20201203_081000_00431_49yi4 failed: Error opening Hive split hdfs://mycluster/warehouse/tablespace/external/hive/dby_pro.db/ods_bi_user_event_ih/day=20201130/ods_bi_user_event_ih__516fdf2e_af6b_476a_bf46_e1479d487f85 (offset=0, length=2849658): java.io.IOException: Malformed ORC file. Can not read SQL type double from ORC stream .commission of type STRING [hdfs://mycluster/warehouse/tablespace/external/hive/dby_pro.db/ods_bi_user_event_ih/day=20201130/ods_bi_user_event_ih__516fdf2e_af6b_476a_bf46_e1479d487f85]
When I query with Hive SQL there is no problem, but the same query through Presto SQL gives this error.
Can anyone help me? Thank you very much!
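For what it is worth, the message suggests the table (or that partition) declares commission as double while the ORC files were written with commission as a string; Hive coerces this at read time, but Presto's ORC reader refuses. A hedged sketch of one workaround, assuming the files really do contain strings:
-- in Hive: make the declared type match the data in the files
-- (add CASCADE if existing partitions should pick up the change)
ALTER TABLE dby_pro.ods_bi_user_event_ih CHANGE COLUMN commission commission STRING;
-- in Presto: convert at query time once the schema matches the files
select cast(commission as double) as commission from dby_pro.ods_bi_user_event_ih limit 5;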

How to import data from a hbase table to hive table?

I've created an HBase table like this,
create 'student','personal'
and I've put some data into it like this.
ROW COLUMN+CELL
1 column=personal:age, timestamp=1456224023454, value=20
1 column=personal:name, timestamp=1456224008188, value=pesronA
2 column=personal:age, timestamp=1456224891317, value=13
2 column=personal:name, timestamp=1456224868967, value=pesronB
3 column=personal:age, timestamp=1456224935178, value=21
3 column=personal:name, timestamp=1456224921246, value=personC
4 column=personal:age, timestamp=1456224951789, value=20
4 column=personal:name, timestamp=1456224961845, value=personD
5 column=personal:age, timestamp=1456224983240, value=20
5 column=personal:name, timestamp=1456224972816, value=personE
I want to import this data into a Hive table. I wrote a Hive query for that, like this:
CREATE TABLE hbaseStudent(key INT,name STRING,age INT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal:age,personal:name") TBLPROPERTIES("hbase.table.name" = "student")
But when I execute the query, this error comes out:
Driver returned: 1. Errors: OK
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hbase/HBaseConfiguration
What should I do?
I tried this and it worked: replace all the double quotes (") with single quotes ('). It will work; also add the terminating ; on the last line.
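Applying that suggestion, the statement would look roughly like this (a sketch; note that hbase.columns.mapping is positional, so the entries have to line up with the Hive column order key, name, age):
CREATE TABLE hbaseStudent(key INT, name STRING, age INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,personal:name,personal:age')
TBLPROPERTIES ('hbase.table.name' = 'student');
If the DDLTask still fails with org/apache/hadoop/hbase/HBaseConfiguration, that usually points to the HBase jars missing from Hive's classpath rather than to the statement itself.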

Too many counter groups while storing Hive partitioned table as parquet

I have created a table sample with id as its partition column and stored it in Parquet format.
create table sample(uuid String,date String,Name String,EmailID String,Comments String,CompanyName String,country String,url String,keyword String,source String) PARTITIONED BY (id String) Stored as parquet;
Then I inserted values into it using the command below:
INSERT INTO TABLE sample PARTITION (id) Select uuid,date,Name,EmailID,Comments,CompanyName,country,url,keyword,source,id from inter distribute by id;
This query results in the following issue:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
    at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:97)
    at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounterImpl(AbstractCounterGroup.java:123)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:113)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:130)
    at org.apache.hadoop.mapred.Counters$Group.findCounter(Counters.java:369)
    at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:314)
    at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
    at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
    at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper$ReportStats.func(ExecMapper.java:261)
    at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:850)
    at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:853)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:289)
    ... 7 more
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
    at org.apache.hadoop.mapreduce.counters.Limits.checkGroups(Limits.java:118)
    at org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:230)
    at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:113)
    at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
    at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
    at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
    at org.apache.hadoop.hive.ql.stats.CounterStatsPublisher.publishStat(CounterStatsPublisher.java:54)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1167)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1017)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
    ... 7 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
NOTE: the id column has 1 million distinct values.
Can anyone help me with this?
You should raise the counter limits, for example:
mapreduce.job.counters.limit=1000
mapreduce.job.counters.max=1000
mapreduce.job.counters.groups.max=500
mapreduce.job.counters.group.name.max=1000
mapreduce.job.counters.counter.name.max=500
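If you cannot change the cluster-wide defaults in mapred-site.xml, a hedged alternative is to raise the limits just for the Hive session before re-running the insert (whether this is sufficient depends on where your Hadoop version enforces the limit):
set mapreduce.job.counters.groups.max=500;
set mapreduce.job.counters.max=1000;
INSERT INTO TABLE sample PARTITION (id) Select uuid,date,Name,EmailID,Comments,CompanyName,country,url,keyword,source,id from inter distribute by id;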

Hive 0.13.0, partition and buckets

I am getting an error message, and the behavior is very different between my two test runs. I verified the data types and the data (the values are exactly double), but there is still a type-cast issue. Why does this occur? Please help me fix it.
DROP TABLE XXSCM_SRC_SHIPMENTS;
CREATE TABLE IF NOT EXISTS XXSCM_SRC_SHIPMENTS(
INVENTORY_ITEM_ID DOUBLE
,ORDERED_ITEM STRING
,SHIP_FROM_ORG_ID DOUBLE
,QTR_START_DATE STRING
,QTR_END_DATE STRING
,SEQ DOUBLE
,EXTERNAL_SHIPMENTS DOUBLE
-- ,PREV_EXTERNAL_SHIPMENTS DOUBLE
,INTERNAL_SHIPMENTS DOUBLE
--,PREV_INTERNAL_SHIPMENTS DOUBLE
,AVG_SELL_PRICE DOUBLE)
--,PREV_AVG_SELL_PRICE DOUBLE)
COMMENT 'DIMENTION FOR THE SHIPMENTS LOCAL AND GLOBAL'
PARTITIONED BY (ORGANIZATION_CODE STRING, FISCAL_PERIOD STRING)
CLUSTERED BY (INVENTORY_ITEM_ID, ORDERED_ITEM, SHIP_FROM_ORG_ID, QTR_START_DATE, QTR_END_DATE, SEQ)
SORTED BY (INVENTORY_ITEM_ID ASC, ORDERED_ITEM ASC, SHIP_FROM_ORG_ID ASC, QTR_START_DATE ASC, QTR_END_DATE ASC, SEQ ASC)
INTO 256 BUCKETS
STORED AS ORC TBLPROPERTIES("orc.compress"="SNAPPY");
1) Fails with an error
SELECT inventory_item_id,ordered_item,ship_from_org_id,qtr_start_date,qtr_end_date,seq,external_shipments FROM supply_chain_pcam.XXSCM_SRC_SHIPMENTS limit 100
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
2) Got the result successfully
hive -e "set hive.cli.print.header=true;select * from supply_chain_pcam.xxscm_src_shipments limit 100"
There is a problem with the SHIP_FROM_ORG_ID field. You will need to re-create the table XXSCM_SRC_SHIPMENTS with the correct data type; Hive is not able to parse that field. You almost had the answer: if
select *
is fetching results, then try out the individual fields. Since the error reports a cast exception to double, query only the double fields to find the mismatched column.
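As a rough illustration of that advice, you could probe the DOUBLE columns one at a time to find the one whose ORC data was actually written as an integer (the IntWritable in the exception):
SELECT INVENTORY_ITEM_ID FROM supply_chain_pcam.XXSCM_SRC_SHIPMENTS LIMIT 100;
SELECT SHIP_FROM_ORG_ID FROM supply_chain_pcam.XXSCM_SRC_SHIPMENTS LIMIT 100;
SELECT SEQ FROM supply_chain_pcam.XXSCM_SRC_SHIPMENTS LIMIT 100;
Whichever single-column query reproduces the cast exception is the column whose stored data does not match the declared DOUBLE type; re-create the table with the matching type (or rewrite the data as DOUBLE) and reload.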
