AvroRuntimeException occurs when executing some hql in hive - hadoop

I was doing the Hadoop(2.6.0) twitter example by Flume(1.5.2) and Hive(0.14.0). I got data from twitter successfully via Flume and stored them to the my own hdfs.
But when I wanted to use hive to deal with these data to do some analyzing (only select one field from a table), the "Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException" exception happened and little useful information I could find related to this exception.
Actuall I can fetch most records of a file successfully (like the information below, I fetched 5100 rows successfully) but it would fail in the end. As a result I cannot deal with all the tweets files together.
Time taken: 1.512 seconds, Fetched: 5100 row(s)
Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
15/04/15 19:59:18 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.EOFException
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.avro.AvroRuntimeException: java.io.EOFException
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:222)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:153)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:629)
... 15 more
Caused by: java.io.EOFException
at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
... 18 more
I use the hql below to create a table
CREATE TABLE tweets
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='file:///home/hduser/hive-0.14.0-bin/tweetsdoc_new.avsc');
then load tweets file from hdfs
LOAD DATA INPATH '/user/flume/tweets/FlumeData.1429098355304' OVERWRITE INTO TABLE tweets;
Could anyone tell me the possible reason, or an effective way to find more details of the exception?

I had this annoying problem as well.
I looked at the produced binary file and debugged Avro deserialization of bits.
The reason for this EOFException was that Flume inserts new line character byte after every event (you can notice 0x0A after every record).
Avro deserializer thinks the file hasn't finished and interprets that character as some number of blocks to read, but then can't read out that number of blocks without hitting EOF.

Related

You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved

I am using DataBricks as a service on Azure. This is my cluster info :
I ran below command and everythings was o.
%sql
Select
*
from db_xxxxx.t_fxxxxxxxxx
limit 10
Then I have updated some rows in above table. When I run above command again i have this error :
Error in SQL statement: SparkException: Job aborted due to stage failure: Task 3 in stage 2823.0 failed 4 times, most recent failure: Lost task 3.3 in stage 2823.0 (TID 158824, 10.11.49.6, executor 14): com.databricks.sql.io.FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART#storxfadev0501.dfs.core.windows.net/xsi-ed-faits/t_fait_xxxxxxxxxxx/_delta_log/00000000000000000022.json. It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:251)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:205)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:354)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:205)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
at org.apache.spark.scheduler.Task.run(Task.scala:112)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: HEAD https://storxfadev0501.dfs.core.windows.net/devdledxsi01/xsi-ed-faits/t_fait_photo_impact/_delta_log/00000000000000000022.json?timeout=90
StatusCode=404
StatusDescription=The specified path does not exist.
ErrorCode=
ErrorMessage=
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:912)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.open(AzureBlobFileSystem.java:169)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at com.databricks.spark.metrics.FileSystemWithMetrics.open(FileSystemWithMetrics.scala:282)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.<init>(HadoopFileLinesReader.scala:65)
at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.readFile(JsonDataSource.scala:134)
at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:138)
at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:136)
at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:147)
at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:134)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:235)
... 26 more
Caused by: HEAD https://storxfadev0501.dfs.core.windows.net/devdledxsi01/xsi-ed-faits/t_fait_photo_impact/_delta_log/00000000000000000022.json?timeout=90
StatusCode=404
StatusDescription=The specified path does not exist.
ErrorCode=
ErrorMessage=
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134)
at shaded.databricks.v20180920_b
This is expected behaviour when you update some rows in the table and immediately querying the table.
From the error message: It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
To resolve this issue, refresh all cached entries that are associated with the table.
REFRESH TABLE [db_name.]table_name
Refresh all cached entries associated with the table. If the table was previously cached, then it would be cached lazily the next time it is scanned.
In summary, you can either refresh the table (previous to execution ) name or restart the cluster
spark.sql("refresh TABLE schema.table")
It is possible the underlying files have been updated. You can
explicitly invalidate the cache in Spark by running 'REFRESH TABLE
tableName' command in SQL or by recreating the Dataset/DataFrame
involved. If Delta cache is stale or the underlying files have been
removed, you can invalidate Delta cache manually by restarting the
cluster.

Kafka Connect MySQL to Redshift - Dates Cause Error

I am getting this error when attempting to stream an RDS MySQL table into Redshift: Error converting data, invalid type for parameter
The problem field is a DATETIME in MySQL and timestamp without time zone in Redshift (same happens for timestamp with time zone). Note: pipeline was working fine until I populated the date field.
We are using Debezium as the Kafka Connect source for getting data from RDS into Kafka. And the JDBC sink connector with Redshift JDBC driver for the sink.
Also... I am able to get the data flowing if I make the Redshift field a varchar or a bigint. When I do this, I see that the data is coming across as a unix epoch integer in ms. But we'd really like a timestamp!
Error message in context:
2018-10-18 22:48:32,972 DEBUG || INSERT sql: INSERT INTO "funschema"."test_table"("user_id","subscription_code","source","receipt","starts_on") VALUES(?,?,?,?,?) [io.confluent.connect.jdbc.sink.BufferedRecords]
2018-10-18 22:48:32,987 WARN || Write of 28 records failed, remainingRetries=7 [io.confluent.connect.jdbc.sink.JdbcSinkTask]
java.sql.BatchUpdateException: [Amazon][JDBC](10120) Error converting data, invalid type for parameter: 5.
at com.amazon.jdbc.common.SStatement.createBatchUpdateException(Unknown Source)
at com.amazon.jdbc.common.SStatement.access$100(Unknown Source)
at com.amazon.jdbc.common.SStatement$BatchExecutionContext.createBatchUpdateException(Unknown Source)
at com.amazon.jdbc.common.SStatement$BatchExecutionContext.createResults(Unknown Source)
at com.amazon.jdbc.common.SStatement$BatchExecutionContext.doProcess(Unknown Source)
at com.amazon.jdbc.common.SStatement$BatchExecutionContext.processInt(Unknown Source)
at com.amazon.jdbc.common.SStatement.processBatchResults(Unknown Source)
at com.amazon.jdbc.common.SPreparedStatement.executeBatch(Unknown Source)
at io.confluent.connect.jdbc.sink.BufferedRecords.flush(BufferedRecords.java:138)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:66)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:75)
Thanks,
Tom

[Vertica][VJDBC](100172) One or more rows were rejected by the server

I got the following error when loading data from Impala to Vertica with Sqoop.
Error: java.io.IOException: Can't export data, please check failed map
task logs at
org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at
org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused
by: java.io.IOException: java.sql.BatchUpdateException:
[Vertica]VJDBC One or more rows were rejected by the server.
at
org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:233)
at
org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at
org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more Caused by: java.sql.BatchUpdateException:
[Vertica]VJDBC One or more rows were rejected by the server.
at com.vertica.jdbc.SStatement.processBatchResults(Unknown Source)
at com.vertica.jdbc.SPreparedStatement.executeBatch(Unknown Source)
at
org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:231)
And I was running the following command:
sudo -u impala sqoop export -Dsqoop.export.records.per.statement=xxx
--driver com.vertica.jdbc.Driver --connect jdbc:vertica://host:5433/db --username name --password pw --table table --export-dir /some/dir -m 1 --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'
--batch
This error was not raised every time. I had several successful tests loading over 2 million rows of data. So I guess there might be some bad data that contains special characters in the rejected rows. This is very annoying because when this error raised, mapreduce job would rollback and retry. In this case, there would be lots of duplicate data in the target table.
Does anyone have idea if there is any sqoop export parameter that can be set to deal with special characters or if there is any way to skip the bad data, which means to disable rollback? Thanks!
This may not be just special characters. If you try to stuff 'abc' into a numeric field, for example, that row would get rejected. Even though you get this error, I believe it not until after the load and all data should be committed that could be committed (but I would verify that). If you isolate the "missing" rows you might be able to figure out what is wrong with the data or the field definition.
Common things to look for:
Stuffing character type data into numeric fields (maybe implicit conversions, or only show up when the values are non-NULL).
NULL values into NOT NULL fields
Counting characters and VARCHAR octets as equivalent. VARCHAR(x) represents octets, but a UTF-8 character can have multiple octets.
Similar to #3, strings too long to fit in designated fields.
In the driver, the batch inserts are being replaced with a COPY FROM STDIN statement. You might be able to find the statement in query_requests although I'm not sure it will help.
Sqoop doesn't really give you much opportunity to investigate this further (as far as I am aware, I checked the generic JDBC Loader). One could look at the return array for executeBatch() and tie this to your execution batch. Maybe modify the generic JDBC loader?
Hope this helps.

Hive unable to perform queries other than SELECT *

I am running hive on my system where I have successfully created a database and a table. I have loaded that table with a csv file which is located on my HDFS.
I am successfully able to describe the table in hive, seeing all of the columns that I intended to be created.
I am also successfully able to run the simple SELECT * FROM table; query which returns an enormous list of data.
My problem starts whenever I try to run a query that is any more complex than that. Specifically, when I try to run a query that is selecting a specific column name or selecting any aggregate of data. If I try anything else, I receive this error message after my map and reduce tasks have sat at 0% for a while.
Diagnostic Messages for this Task:
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:230)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:381)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:536)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NullPointerException
at org.ap
I have tried many different syntax techniques and performed numerous sanity checks to confirm that the table is actually there. What confuses me is that the SELECT * works while all other queries fail.
Any advice is appreciated.
Here is a query I ran with as many NULL checks as would allow: SELECT year FROM flights WHERE year != NULL AND length(year) > 0 AND year <> ''; This query still failed.
SELECT * doesn't invoke mapreduce jobs.
But any complex queries involve map reduce jobs.
Please check the MR job logs.
Also this can be a data issue, Data might be incompatible with the table schema.
Please check with fewer rows.
May be your input data consists any null values. Because,
if you use select all command that job will not enter into mapreduce phase.
if you select any specific column it will enter into mapreduce phase. so you may get this error.
What is happening here is that none of the queries involving mapreduce jobs are running.
The "select *" query doesn't invoke any mapreduce and just displays the data as it is. Please check your mapreduce logs and see if you can find something which is causing this.

Hibernate Persist for large field values gives ConstraintViolationException

I am facing an issue with persisting records for an Entity(YieldCurveArchive) with a field (reason) having a length equivalent to 2048 characters. Following are the scenarios and their results:
Upload new Entity with smaller reason fields: Works fine.
Upload Entity with reason fields
changed with larger (2048 characters) data set: Works fine. Again as
above. This is an update of the existing records.
Upload new Entity
with larger (2048 characters) reason fields: Fails.
I have also tried flushing the hibernate query buffer using entityManager.flush() but the above test results do not change.
I suspect there could be an issue with the buffering that Hibernate performs before actually doing a final insert into database:
Selection of the available records in database to compare with
existing dataset.
Insert being fired but only kept in the buffer of
hibernate to fire a bulk update.
Another select being fired for a
different client. Upon seeing another select, hibernate decides to
fire the insert into the database and then fails.
An excerpt of the trace from the failed logs:
2013-07-02 12:46:00,792 WARN [pool-1-thread-4]: org.hibernate.util.JDBCExceptionReporter - SQL Error: 1400, SQLState: 23000
2013-07-02 12:46:00,792 ERROR [pool-1-thread-4]: org.hibernate.util.JDBCExceptionReporter - ORA-01400: cannot insert NULL into ("CORE_TOTEM"."YC_MONTHLY_ARCHIVE"."PKEY")
2013-07-02 12:46:00,792 WARN [pool-1-thread-4]: org.hibernate.util.JDBCExceptionReporter - SQL Error: 1400, SQLState: 23000
2013-07-02 12:46:00,792 ERROR [pool-1-thread-4]: org.hibernate.util.JDBCExceptionReporter - ORA-01400: cannot insert NULL into ("CORE_TOTEM"."YC_MONTHLY_ARCHIVE"."PKEY")
2013-07-02 12:46:00,985 ERROR [pool-1-thread-4]: org.hibernate.event.def.AbstractFlushingEventListener - Could not synchronize database state with session
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:94)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:275)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:167)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321)
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:64)
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:996)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1141)
at org.hibernate.impl.QueryImpl.list(QueryImpl.java:102)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:65)
at markit.totem.dao.DaoHelper.list(DaoHelper.java:19)
at com.markit.totem.rates.yieldcurve.dao.YieldCurveArchiveDao.getArchivesMonthly(YieldCurveArchiveDao.java:172)
at com.markit.totem.rates.yieldcurve.dao.YieldCurveArchiveDao$$FastClassByCGLIB$$55ef1569.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:628)
at com.markit.totem.rates.yieldcurve.dao.YieldCurveMonthlyArchiveDao$$EnhancerByCGLIB$$6aa04fe7.getArchivesMonthly(<generated>)
at com.markit.totem.rates.yieldcurve.results.upload.monthly.YieldCurveMonthlyArchivePersister.getExisting(YieldCurveMonthlyArchivePersister.java:38)
at com.markit.totem.rates.yieldcurve.results.upload.YieldCurveArchivePersister.persist(YieldCurveArchivePersister.java:68)
at com.markit.totem.rates.yieldcurve.results.upload.YieldCurveArchivePersister$$FastClassByCGLIB$$ac0db3fb.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:628)
at com.markit.totem.rates.yieldcurve.results.upload.monthly.YieldCurveMonthlyArchivePersister$$EnhancerByCGLIB$$4b00cfa0.persist(<generated>)
at com.markit.totem.rates.yieldcurve.results.upload.YieldCurveResultsUploader.upload(YieldCurveResultsUploader.java:158)
at com.markit.totem.rates.yieldcurve.results.upload.YieldCurveResultsUploadTask.run(YieldCurveResultsUploadTask.java:53)
at com.markit.totem.workflow.WorkflowExecutor.executeWorkflowTask(WorkflowExecutor.java:258)
at com.markit.totem.workflow.WorkflowExecutor.executeSubWorkflow(WorkflowExecutor.java:227)
at com.markit.totem.workflow.WorkflowExecutor.access$000(WorkflowExecutor.java:17)
at com.markit.totem.workflow.WorkflowExecutor$1.run(WorkflowExecutor.java:72)
at markit.totem.dao.Transactionator.execute(Transactionator.java:19)
at markit.totem.dao.Transactionator$$FastClassByCGLIB$$c9204755.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:700)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:635)
at markit.totem.dao.Transactionator$$EnhancerByCGLIB$$4eec78e2.execute(<generated>)
at com.markit.totem.workflow.WorkflowExecutor.execute(WorkflowExecutor.java:83)
at com.markit.totem.workflow.WorkflowExecutor.execute(WorkflowExecutor.java:213)
at com.markit.totem.workflow.WorkflowManager$OldWorkflow.execute(WorkflowManager.java:218)
at com.markit.totem.workflow.WorkflowManager$1.run(WorkflowManager.java:119)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.sql.BatchUpdateException: ORA-01400: cannot insert NULL into ("CORE_TOTEM"."YC_MONTHLY_ARCHIVE"."PKEY")
at oracle.jdbc.driver.DatabaseError.throwBatchUpdateException(DatabaseError.java:367)
at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:9055)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:70)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:268)
My initial thoughts were that it fails due to the encoding difference in database and default encoding from Hibernate.
Like underlying database uses Oracle and the NLS_CHARACTERSET is defined as:
NLS_CHARACTERSET AL32UTF8
Hence it stores, 1 character as 4 bytes and any special character as 6 bytes. My insert into the Oracle fails for 1001 but passes till i insert 1000 characters. However, if I do an update with more characters thats not the case, which makes it more confusing.
Any pointers would be of great help?

Resources