Using Elasticsearch version 6.8.0.
hive> select * from provider1;
OK
{"id","k11",}
{"id","k12",}
{"id","k13",}
{"id","k14",}
{"id":"K1","name":"Ravi","salary":500}
{"id":"K2","name":"Ravi","salary":500}
{"id":"K3","name":"Ravi","salary":500}
{"id":"K4","name":"Ravi","salary":500}
{"id":"K5","name":"Ravi","salary":500}
{"id":"K6","name":"Ravi","salary":"sdfgg"}
{"id":"K7","name":"Ravi","salary":"sdf"}
{"id":"k8"}
{"id":"K9","name":"r1","salary":522}
{"id":"k10","name":"r2","salary":53}
Time taken: 0.179 seconds, Fetched: 14 row(s)
ADD JAR /home/smrafi/elasticsearch-hadoop-6.8.0/dist/elasticsearch-hadoop-6.8.0.jar;
CREATE external TABLE hive_es_with_handler( data STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource' = 'test_eshadoop/healthCareProvider',
'es.nodes' = 'vpc-pid-pre-prod-es-cluster-b7thvqfj3tp45arxl34gge3yyi.us-east-2.es.amazonaws.com',
'es.input.json' = 'yes',
'es.index.auto.create' = 'true',
'es.write.operation'='upsert',
'es.nodes.wan.only' = 'true',
'es.port' = '443',
'es.net.ssl'='true',
'es.batch.size.entries'='1',
'es.mapping.id' ='id',
'es.batch.write.retry.count'='-1',
'es.batch.write.retry.wait'='60s',
'es.write.rest.error.handlers' = 'es, ignoreBadRecords',
'es.write.data.error.handlers' = 'customLog',
'es.write.data.error.handler.customLog' = 'com.verisys.elshandler.CustomLogOnError',
'es.write.rest.error.handler.es.client.resource'="error_es_index/error",
'es.write.rest.error.handler.es.return.default'='HANDLED',
'es.write.rest.error.handler.log.logger.name' = 'BulkErrors',
'es.write.data.error.handler.log.logger.name' = 'SerializationErrors',
'es.write.rest.error.handler.ignoreBadRecords' = 'com.verisys.elshandler.IgnoreBadRecordHandler',
'es.write.rest.error.handler.es.return.error'='HANDLED');
insert into hive_es_with_handler10 select * from provider1;
Below is the exception trace; the job failed complaining that the error-handler index is not present:
Caused by: org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character (',' (code 44)): was expecting a colon to separate field name and value at [Source: [B#1e3f0aea; line: 1, column: 7]
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:95)
at org.elasticsearch.hadoop.serialization.ParsingUtils.doFind(ParsingUtils.java:168)
at org.elasticsearch.hadoop.serialization.ParsingUtils.values(ParsingUtils.java:151)
at org.elasticsearch.hadoop.serialization.field.JsonFieldExtractors.process(JsonFieldExtractors.java:213)
at org.elasticsearch.hadoop.serialization.bulk.JsonTemplatedBulk.preProcess(JsonTemplatedBulk.java:64)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:54)
at org.elasticsearch.hadoop.hive.EsSerDe.serialize(EsSerDe.java:171)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
... 9 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character (',' (code 44)): was expecting a colon to separate field name and value at [Source: [B#1e3f0aea; line: 1, column: 7]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:442)
at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:500)
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:93)
... 22 more
I tried to use the custom SerializationErrorHandler, but it is of no use: the handler never comes into context. The job stops completely instead of continuing with the good records, even though the default return value is HANDLED.
It seems you have invalid JSON.
As mentioned in the docs, this is not handled for Hive:
Serialization Error Handlers are not yet available for Hive. Elasticsearch for Apache Hadoop uses Hive’s SerDe constructs to convert data into bulk entries before being sent to the output format. SerDe objects do not have a cleanup method that is called when the object ends its lifecycle. Because of this, we do not support serialization error handlers in Hive as they cannot be closed at the end of the job execution
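Since serialization/data error handlers cannot run under Hive, one workaround is to keep malformed rows away from the storage handler in the first place. A minimal sketch, assuming provider1 exposes the raw JSON in a single string column (called json_line here purely for illustration) and relying on Hive's built-in get_json_object returning NULL for rows it cannot parse:

-- Sketch: forward only rows whose JSON parses and carries an id field;
-- malformed rows such as {"id","k11",} are dropped instead of failing the job.
-- json_line is a placeholder for provider1's actual string column name.
INSERT INTO hive_es_with_handler
SELECT json_line
FROM provider1
WHERE get_json_object(json_line, '$.id') IS NOT NULL;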
Related
I am getting an error on JoinEnrichment and don't know why it is occurring.
My query in JoinEnrichment:
SELECT original.*, enrichment.*
FROM original
FULL JOIN enrichment
ON original.name = enrichment.name
Error
JoinEnrichment[id=a17b1f06-f80c-3641-945f-c2ff331f8028] Failed to join 'original' FlowFile FlowFile[filename=cmdbci-mtaas] and 'enrichment' FlowFile FlowFile[filename=cmdbci-mtaas]; routing to failure: java.sql.SQLException: Error while preparing statement [SELECT original.*, enrichment.*
FROM original
FULL JOIN enrichment
ON original.name = enrichment.name]
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 4, column 4 to line 4, column 34: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <JAVATYPE(CLASS JAVA.LANG.STRING)>'. Supported form(s): '<COMPARABLE_TYPE> = <COMPARABLE_TYPE>'
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <JAVATYPE(CLASS JAVA.LANG.STRING)>'. Supported form(s): '<COMPARABLE_TYPE> = <COMPARABLE_TYPE>'
Sample input from original
Location,Environment,ip_address,category,name,dv_sys_updated_on
"",,,Hardware,Ndiggan,2022-12-17 22:37:28
"",,,Hardware,class,2022-12-31 22:37:38
"",,,,Vlan2,2022-12-27 02:17:13
Sample input from enrichment
Location,Environment,ip_address,category,name,dv_sys_updated_on
"",,,Hardware,vpna,2022-12-17 22:36:02
"",,,Hardware,dlcccno,2022-12-17 22:37:04
"",,,Hardware,Ndiggan,2022-12-17 22:37:28
While working with Flume (1.6 & 1.7) I am experiencing the error below:
2016-12-02 00:57:11,634 (pool-3-thread-1) [WARN - org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:143)] Line length exceeds max (2048), truncating line!
2016-12-02 00:57:11,777 (pool-3-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:280)] FATAL: Spool Directory source r2: { spoolDir: /home/h/flume/forex }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input within/between OBJECT entries
at [Source: java.io.ByteArrayInputStream#319bed83; line: 1, column: 4097]
at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:136)
at org.apache.flume.sink.solr.morphline.MorphlineInterceptor$LocalMorphlineInterceptor.intercept(MorphlineInterceptor.java:163)
at org.apache.flume.sink.solr.morphline.MorphlineInterceptor$LocalMorphlineInterceptor.intercept(MorphlineInterceptor.java:152)
at org.apache.flume.sink.solr.morphline.MorphlineInterceptor.intercept(MorphlineInterceptor.java:74)
at org.apache.flume.interceptor.InterceptorChain.intercept(InterceptorChain.java:62)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:148)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:258)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input within/between OBJECT entries
at [Source: java.io.ByteArrayInputStream#319bed83; line: 1, column: 4097]
at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:98)
at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:186)
at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:130)
... 13 more
Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input within/between OBJECT entries
at [Source: java.io.ByteArrayInputStream#319bed83; line: 1, column: 4097]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1524)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWS(UTF8StreamJsonParser.java:2547)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:709)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:217)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:63)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:14)
at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:189)
at org.kitesdk.morphline.json.ReadJsonBuilder$ReadJson.doProcess(ReadJsonBuilder.java:110)
at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96)
... 17 more
My agent pipeline is set up to ingest JSON data from a spooldir source, extracted and transformed using a morphline interceptor.
See excerpts of the Flume config below:
#GENERIC
a1.channels = mem-channel
a1.sources = r2
a1.sinks = k2 k3
#CHANNEL
a1.channels.mem-channel.type = memory
a1.channels.mem-channel.capacity = 10000000
a1.channels.mem-channel.transactionCapacity = 1000
#SRC
a1.sources.r2.type = spooldir
a1.sources.r2.channels = mem-channel
a1.sources.r2.spoolDir = /path/to/some/directory
a1.sources.r2.interceptors = morphline
#INTERCEPTOR
a1.sources.r2.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
a1.sources.r2.interceptors.morphline.morphlineFile =/some/morphline/forex.conf
a1.sources.r2.interceptors.morphline.morphlineId = convertJsonToCSV
#SINK
a1.sinks.k2.type = logger
a1.sinks.k2.channel = mem-channel
a1.sinks.k3.type = file_roll
a1.sinks.k3.channel = mem-channel
a1.sinks.k3.sink.directory = /tmp/flume
a1.sinks.k3.batchSize = 1
RCA
The answer was staring at me all the while. The issue was neither the morphline configuration nor the input JSON file.
A closer look at the Flume WARNING line
[WARN - org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:143)] Line length exceeds max (2048), truncating line!
gave good direction to the solution. By default, the spooldir source uses the LINE deserializer, which has a parameter deserializer.maxLineLength with a default value of 2048. In the Flume User Guide, this parameter is defined as:
Maximum number of characters to include in a single event. If a line exceeds this length, it is truncated, and the remaining characters on the line will appear in a subsequent event
The JSON objects could not be parsed because the data was truncated during spooling, leaving an incomplete JSON byte array for the morphline to process.
Solution
I increased the value of deserializer.maxLineLength to 10000 (to be safe, since the number of characters per line in the JSON file could be larger in future).
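Concretely, the fix is a single extra property on the spooldir source (r2, matching the config excerpt above):
a1.sources.r2.deserializer.maxLineLength = 10000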
I am trying to add some computed columns to a SparkR data frame, as follows:
Orders <- withColumn(Orders, "Ready.minus.In.mins",
(unix_timestamp(Orders$ReadyTime) - unix_timestamp(Orders$InTime)) / 60)
Orders <- withColumn(Orders, "Out.minus.In.mins",
(unix_timestamp(Orders$OutTime) - unix_timestamp(Orders$InTime)) / 60)
The first command executes ok, and head(Orders) reveals the new column. The second command throws the error:
15/12/29 05:10:02 ERROR RBackendHandler: col on 359 failed
Error in select(x, x$"*", alias(col, colName)) :
error in evaluating the argument 'col' in selecting a method for function
'select': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
org.apache.spark.sql.AnalysisException: Cannot resolve column name
"Ready.minus.In.mins" among (ASAP, AddressLine, BasketCount, CustomerEmail, CustomerID, CustomerName, CustomerPhone, DPOSCustomerID, DPOSOrderID, ImportedFromOldDb, InTime, IsOnlineOrder, LineItemTotal, NetTenderedAmount, OrderDate, OrderID, OutTime, Postcode, ReadyTime, SnapshotID, StoreID, Suburb, TakenBy, TenderType, TenderedAmount, TransactionStatus, TransactionType, hasLineItems, Ready.minus.In.mins);
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:159)
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:159)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:158)
at org.apache.spark.sql.DataFrame$$anonfun$col$1.apply(DataFrame.scala:650)
at org.apa
Do I need to do something to the data frame after adding the new column before it will accept another one?
From the link, just use backticks when accessing the column, e.g.:
Instead of using
df['Fields.fields1']
use:
df['`Fields.fields1`']
Found it here: spark-issues mailing list archives
SparkR isn't entirely happy with "." in a column name.
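A hedged SparkR sketch applying this to the question's DataFrame (the backtick form mirrors the linked answer; renaming the new columns to avoid "." is only a suggested alternative):

# Quote the dotted name with backticks when accessing it, as in the linked answer:
Orders['`Ready.minus.In.mins`']

# Alternatively, avoid "." in new column names so no quoting is needed at all:
Orders <- withColumn(Orders, "Out_minus_In_mins",
  (unix_timestamp(Orders$OutTime) - unix_timestamp(Orders$InTime)) / 60)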
Are the properties below in hive-site.xml correct for Hive access to Cassandra?
(I have copied the entire hive-default.xml content but have changed only the properties below.)
javax.jdo.option.ConnectionURL: cassandra://localhost:9160
javax.jdo.option.ConnectionDriverName: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbclass: jdbc:cassandra
hive.stats.jdbcdriver: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbconnectionstring: jdbc:cassandra:;databaseName=TempStatsStore;create=true
I am running 1-node Cassandra. But, later would make it a minimum 2 node cluster.
When I run the table creation command below, I get an error:
CREATE EXTERNAL TABLE MyHiveTable
(m string, n string, o string, p string)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
TBLPROPERTIES ( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "test",
"cassandra.cql3.type" = "text, text, text, text");
Error:
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I don't know about the JDO settings, but you could try this link, which is a far better option for integrating Hive with Cassandra:
https://github.com/milliondreams/hive/tree/cas-support-cql/cassandra-handler
When I am trying to execute this query in BIRT:
select a.ag_code , COUNT(distinct(a.usr_id)),b.AG_NAME
from photo a,
(select ag_code,AG_NAME from agent
WHERE AG_TYPE = 'AS' AND AG_USEFLAG = 'Y' AND AG_NAME LIKE 'M%')b
where a.upload_time BETWEEN TO_DATE('20131116000000','yyyymmddhh24miss')
AND TO_DATE('20131129235959','yyyymmddhh24miss')
and a.status = 'S'
and a.ag_code = b.ag_code
group by a.ag_code,b.AG_NAME
order by a.Ag_CODE,b.AG_NAME;
This exception arises:
org.eclipse.birt.report.engine.api.EngineException: Error happened while running the report.
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.doRun(DatasetPreviewTask.java:318)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.runDataset(DatasetPreviewTask.java:280)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.execute(DatasetPreviewTask.java:91)
at org.eclipse.birt.report.designer.data.ui.dataset.DataSetPreviewer.preview(DataSetPreviewer.java:68)
at org.eclipse.birt.report.designer.data.ui.dataset.ResultSetPreviewPage$5.run(ResultSetPreviewPage.java:366)
at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:121)
Caused by: org.eclipse.birt.data.engine.odaconsumer.OdaDataException: Cannot get the result set metadata.
org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object.
SQL error #1:ORA-00911: invalid character
;
java.sql.SQLException: ORA-00911: invalid character
at org.eclipse.birt.data.engine.odaconsumer.ExceptionHandler.newException(ExceptionHandler.java:52)
at org.eclipse.birt.data.engine.odaconsumer.ExceptionHandler.throwException(ExceptionHandler.java:108)
at org.eclipse.birt.data.engine.odaconsumer.ExceptionHandler.throwException(ExceptionHandler.java:84)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.getRuntimeMetaData(PreparedStatement.java:414)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.getProjectedColumns(PreparedStatement.java:377)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.doGetMetaData(PreparedStatement.java:347)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.execute(PreparedStatement.java:563)
at org.eclipse.birt.data.engine.executor.DataSourceQuery.execute(DataSourceQuery.java:972)
at org.eclipse.birt.data.engine.impl.PreparedOdaDSQuery$OdaDSQueryExecutor.executeOdiQuery(PreparedOdaDSQuery.java:503)
at org.eclipse.birt.data.engine.impl.QueryExecutor.execute(QueryExecutor.java:1208)
at org.eclipse.birt.data.engine.impl.ServiceForQueryResults.executeQuery(ServiceForQueryResults.java:233)
at org.eclipse.birt.data.engine.impl.QueryResults.getResultIterator(QueryResults.java:178)
at org.eclipse.birt.data.engine.impl.QueryResults.getResultMetaData(QueryResults.java:132)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.extractQuery(DatasetPreviewTask.java:352)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.doRun(DatasetPreviewTask.java:309)
... 5 more
Caused by: org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object.
SQL error #1:ORA-00911: invalid character
;
java.sql.SQLException: ORA-00911: invalid character
at org.eclipse.birt.report.data.oda.jdbc.Statement.executeQuery(Statement.java:481)
at org.eclipse.birt.report.data.oda.jdbc.Statement.getMetaUsingPolicy1(Statement.java:420)
at org.eclipse.birt.report.data.oda.jdbc.Statement.getMetaData(Statement.java:316)
at org.eclipse.birt.report.data.oda.jdbc.bidi.BidiStatement.getMetaData(BidiStatement.java:56)
at org.eclipse.datatools.connectivity.oda.consumer.helper.OdaQuery.doGetMetaData(OdaQuery.java:423)
at org.eclipse.datatools.connectivity.oda.consumer.helper.OdaQuery.getMetaData(OdaQuery.java:390)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.getRuntimeMetaData(PreparedStatement.java:407)
... 16 more
Caused by: java.sql.SQLException: ORA-00911: invalid character
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:330)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:287)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:744)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:218)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:812)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1048)
at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:853)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1153)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3369)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3414)
at org.eclipse.birt.report.data.oda.jdbc.Statement.executeQuery(Statement.java:477)
... 22 more
Try aliasing your distinct count column - like so:
select a.ag_code , COUNT(distinct(a.usr_id)) as distinct_users, b.AG_NAME
...
You need to remove the ";" at the end of your query. Keeping the ";" might work in most tools (like PL/SQL Developer), but those tools remove the ";" before sending the statement to Oracle.
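For example, the statement handed to BIRT should simply end without the trailing semicolon:

group by a.ag_code,b.AG_NAME
order by a.Ag_CODE,b.AG_NAME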
This section of your error message
Caused by: org.eclipse.birt.data.engine.odaconsumer.OdaDataException: Cannot get the result set metadata.
org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object.
SQL error #1:ORA-00911: invalid character
indicates that your SQL is bad; Mark Bannister has suggested a solution. Depending on how bad your SQL is, this part of the error message can be more helpful, calling out specific areas to review.