Cassandra 1.2.5 - invalid UTF8 bytes - utf-8

I am reading and writing large amounts of data to and from a column family (CF).
After a while, I get the following error:
INFO [MemoryMeter:1] 2013-07-03 09:41:34,438 Memtable.java (line 238) CFS(Keyspace='amlear', ColumnFamily='tmp2_rpt_rptStats_popkeywrd_sp_G') liveRatio is 4.12192 (just-counted was 4.12192). calculation took 168ms for 2048 columns
ERROR [ReadStage:706] 2013-07-03 09:41:56,187 CassandraDaemon.java (line 175) Exception in thread Thread[ReadStage:706,5,main]
java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes 37464646464646464646464638333943c08074656c65666f6e6f73206170617261746f7320792061636365736f72696f73
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1582)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes 37464646464646464646464638333943c08074656c65666f6e6f73206170617261746f7320792061636365736f72696f73
at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:54)
at org.apache.cassandra.dht.AbstractBounds.format(AbstractBounds.java:103)
at org.apache.cassandra.dht.AbstractBounds.getString(AbstractBounds.java:96)
at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1387)
at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1443)
at org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:46)
at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1076)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
... 3 more
NOTE, I recently upgraded from cassandra 1.1.4 to cassandra 1.2.5 (I don't know if it's relevant or not)
java version: 1.6.0_32
Does anyone have any idea how to solve this?

Caused by: org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes 37464646464646464646464638333943c08074656c65666f6e6f73206170617261746f7320792061636365736f72696f73
You have invalid UTF-8 bytes in the middle of this.
Specifically, the 2-byte sequence c080 starting at the 17th byte is invalid. It is an overlong encoding, and I'm not sure what character was intended; probably the NUL character, which should just be the single byte 00 in UTF-8. The first valid 2-byte sequence in UTF-8 is c280, corresponding to Unicode U+0080.
Broken UTF-8 encoder?
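To pin down where a key like this goes wrong, a strict decoder can report the offset of the first malformed byte. The sketch below is one way to do that with the JDK's CharsetDecoder; the class and method names (Utf8Check, fromHex, firstInvalidOffset) are made up for illustration. StandardCharsets needs Java 7 or later, so on the Java 6 from the question Charset.forName("UTF-8") would be the substitute. As a side note, c0 80 is the two-byte form that Java's "modified UTF-8" (for example DataOutputStream.writeUTF) uses for U+0000, so a writer using that encoding is one possible source, though that is only a guess.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Converts a hex string such as "c080" into raw bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Returns the zero-based offset of the first malformed byte, or -1 if
    // the whole array is valid UTF-8.
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never produces more chars than input bytes.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On an error result the input position is left at the malformed bytes.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // Key bytes copied from the MarshalException in the question.
        String hex = "37464646464646464646464638333943c080"
                + "74656c65666f6e6f73206170617261746f7320792061636365736f72696f73";
        System.out.println("first invalid offset: " + firstInvalidOffset(fromHex(hex)));
    }
}
```

For the bytes in the question this should report offset 16 (zero-based), which is the 17th byte mentioned above.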

Log4j2 encoding issue

When I run Elasticsearch on Windows 10 with English as the main language, everything works fine. But if I change the main language to Turkish, I get error messages such as:
2018-07-26 14:42:39,485 main ERROR Unable to locate plugin type for IfFileName
2018-07-26 14:42:39,633 main ERROR Unable to locate plugin for IfAccumulatedFileSize
2018-07-26 14:42:39,634 main ERROR Unable to locate plugin for IfFileName
2018-07-26 14:42:39,637 main ERROR Unable to invoke factory method in class org.apache.logging.log4j.core.appender.rolling.action.DeleteAction for element Delete: java.lang.NullPointerException java.lang.NullPointerException
at org.apache.logging.log4j.core.config.plugins.visitors.PluginElementVisitor.findNamedNode(PluginElementVisitor.java:103)
at org.apache.logging.log4j.core.config.plugins.visitors.PluginElementVisitor.visit(PluginElementVisitor.java:87)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:248)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:958)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:898)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:890)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:890)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:890)
at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:513)
at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:237)
at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:249)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:545)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:261)
at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:163)
at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:119)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:291)
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121)
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
at org.elasticsearch.cli.Command.main(Command.java:90)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85)
2018-07-26 14:42:39,645 main ERROR Null object returned for Delete in DefaultRolloverStrategy.
So it seems like a charset problem. The file is encoded as UTF-8; I checked it with Notepad++. Elasticsearch has the JVM option -Dfile.encoding=UTF-8. I double-checked the log4j2.properties file, and IfFileName has no space after it.
And if I change IfFileName to ıfFileName (ı is the Turkish dotless lowercase i), the error becomes:
2018-07-26 14:54:25,819 main ERROR Unable to locate plugin type for ıfFileName
Does anyone have an idea about how to fix that?
Adding -Duser.language=en JVM parameter fixed the problem.
I had the same problem but didn't know where to add -Duser.language=en. It turned out to be in sonar.properties: find the line containing sonar.search.javaAdditionalOpts=, remove the # at the beginning, change it to sonar.search.javaAdditionalOpts=-Duser.language=en, and save the file.
This is a bug in Log4j2, which uses String#toLowerCase() without a locale parameter: in the Turkish locale IfFileName is lowercased as ıffilename (with a dotless i). I have reported this as GH issue #1281.
Until this is fixed you can write plugin types in all lowercase (English) letters: e.g. iffilename instead of IfFileName.
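The locale dependence described above is easy to reproduce outside Log4j2. The following is a small sketch, not Log4j2's actual code; the class name TurkishLowercase and the helper lower are made up for illustration.

```java
import java.util.Locale;

class TurkishLowercase {
    // Lowercases s using an explicit locale instead of the JVM default.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under Turkish rules, capital I maps to the dotless ı (U+0131).
        System.out.println(lower("IfFileName", turkish));
        // Locale.ROOT gives the locale-independent result, with a plain i.
        System.out.println(lower("IfFileName", Locale.ROOT));
    }
}
```

String#toLowerCase() with no argument uses the JVM's default locale, which is why starting the JVM with -Duser.language=en hides the problem.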

Cache creation fails EhCache on IBM z/OS USS

I am trying to create a cache that can write entries to disk. It runs fine in my tests on Windows, but when I deploy it on IBM z/OS USS I get the error shown below. The directory has 0777 permissions and there is enough space on the disk: df -k gives me 405591/901440 for Available/Total. Any insight into where I should look to diagnose this would be helpful. This is my cache configuration, followed by the error:
CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
    .with(new CacheManagerPersistenceConfiguration(new File(CACHE_DIR, "DictionaryCache")))
    .withCache("dictionaryCache",
        CacheConfigurationBuilder.newCacheConfigurationBuilder(Integer.class, KeywordDictionary.class,
            ResourcePoolsBuilder.newResourcePoolsBuilder()
                ./*offheap(200, MemoryUnit.MB).*/disk(200, MemoryUnit.MB).heap(20, EntryUnit.ENTRIES))
        .build())
    .build(true);
Caused by: java.lang.IllegalStateException: Cache 'dictionaryCache' creation in EhcacheManager failed.
at org.ehcache.core.EhcacheManager.createCache(EhcacheManager.java:287) ~[Classification-Engine-Scan-Job-2.0-SNAPSHOT.jar:na]
at org.ehcache.core.EhcacheManager.init(EhcacheManager.java:566) ~[Classification-Engine-Scan-Job-2.0-SNAPSHOT.jar:na]
... 19 common frames omitted
Caused by: org.ehcache.StateTransitionException: Initial table allocation failed.
Initial Table Size (slots) : 64
Allocation Will Require : 1KB
Table Page Source : org.terracotta.offheapstore.disk.paging.MappedPageSource#e88c7380
at org.ehcache.core.StatusTransitioner$Transition.succeeded(StatusTransitioner.java:209) ~[Classification-Engine-Scan-Job-2.0-SNAPSHOT.jar:na]
at org.ehcache.core.Ehcache.init(Ehcache.java:567) ~[Classification-Engine-Scan-Job-2.0-SNAPSHOT.jar:na]
at org.ehcache.core.EhcacheManager.createCache(EhcacheManager.java:260) ~[Classification-Engine-Scan-Job-2.0-SNAPSHOT.jar:na]
... 20 common frames omitted
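One thing that might be worth checking: the trace names MappedPageSource, which suggests the disk tier is backed by a memory-mapped file. Directory permissions and free space do not guarantee that mapping a file works on a given filesystem. The sketch below is a standalone probe, not Ehcache code; the class name MmapProbe, the method canMap and the probe file name are hypothetical. It tries to map a small file read-write in the given directory and reports whether that succeeds.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MmapProbe {
    // Tries to memory-map `size` bytes of a scratch file in `dir`,
    // write one byte through the mapping and read it back.
    static boolean canMap(File dir, int size) {
        File probe = new File(dir, "mmap-probe.tmp"); // hypothetical file name
        try (RandomAccessFile raf = new RandomAccessFile(probe, "rw");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buffer.put(0, (byte) 1);
            buffer.force(); // flush the mapped page to the file
            return buffer.get(0) == 1;
        } catch (IOException | RuntimeException e) {
            System.err.println("mapping failed: " + e);
            return false;
        } finally {
            probe.delete(); // best effort; may fail on some platforms while mapped
        }
    }

    public static void main(String[] args) {
        // Pass the cache directory as the first argument; defaults to the temp dir.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println("mmap works in " + dir + ": " + canMap(dir, 1024));
    }
}
```

If the probe fails in CACHE_DIR on z/OS but succeeds on Windows, that would point at the filesystem or JVM rather than at the cache configuration.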

HIVE - ORC read Issue with NULL Decimal Values - java.io.EOFException: Reading BigInteger past EOF

I encountered an issue in HIVE when loading an ORC external table with NULLs in a column defined as DECIMAL(31,8). It looks like Hive is unable to read the ORC file after loading and can no longer return the records with a NULL in that field. Other records in the same ORC file can be read fine.
This has only started recently, and we have made no changes to our HIVE version. Surprisingly, previous ORC files that were loaded into the same table and have NULLs in the DECIMAL field are queryable without issue.
We are using HIVE 1.2.1. The full stack trace produced by HIVE is below; I've replaced the actual HDFS location with <hdfs location>:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.io.IOException: Error reading file: <hdfs location>
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:352)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:685)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: Error reading file: <hdfs location>
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:347)
... 13 more
Caused by: java.io.IOException: Error reading file: <hdfs location>
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1051)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1235)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1219)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1151)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1137)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:474)
... 17 more
Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 6 kind DATA position: 201 length: 201 range: 0 offset: 289 limit: 289 range 0 = 0 to 201 uncompressed: 362 to 362
at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readBigInteger(SerializationUtils.java:176)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$DecimalTreeReader.next(TreeReaderFactory.java:1264)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1044)
... 24 more
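As a workaround, set hive.fetch.task.conversion=none in your session before running the query. With this setting Hive no longer converts simple queries into a direct fetch task (the FetchOperator path in the stack trace) and runs them as a regular job instead.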

Unable to initialize any output collector in CDH5.3

15/05/24 06:11:40 INFO mapreduce.Job: Task Id : attempt_1432456238397_0004_m_000000_0, Status : FAILED
Error: java.io.IOException: Unable to initialize any output collector
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:439)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
I am using the CDH 5.3 Cloudera QuickStart VM. I wrote a MapReduce program, and when I run it from the shell I get the above exception.
Can anyone please help me resolve this?
The error "Unable to initialize any output collector" indicates that the task failed while starting up in its container, and there can be multiple reasons for that. Review the container logs on HDFS to identify the actual cause.
In this specific instance, mapreduce.task.io.sort.mb was set to a value greater than 2047 MB. The maximum allowed value is 2047 MB, so anything above it is rejected as invalid and the job fails.
Solution:
Set the value of mapreduce.task.io.sort.mb < 2048MB
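Why 2047 and not some rounder number? As far as I recall, Hadoop's MapTask converts the setting to a byte count in a Java int (sort MB shifted left by 20) and rejects values that do not fit in 11 bits. Treat that as my recollection rather than a quote from the source. The standalone sketch below mirrors that idea without depending on Hadoop; the class SortMbCheck and its methods are hypothetical.

```java
class SortMbCheck {
    // True if the value fits in 11 bits (0..2047), which keeps
    // (sortMb << 20) within the range of a positive int.
    static boolean isValidSortMb(int sortMb) {
        return (sortMb & 0x7FF) == sortMb;
    }

    // Buffer size in bytes as an int; overflows for values of 2048 and above.
    static int bufferBytes(int sortMb) {
        return sortMb << 20;
    }

    public static void main(String[] args) {
        for (int mb : new int[] {100, 2047, 2048}) {
            System.out.println(mb + " MB -> valid=" + isValidSortMb(mb)
                    + ", bytes as int=" + bufferBytes(mb));
        }
    }
}
```

2047 MB is 2146435072 bytes, which still fits in an int, while 2048 MB is exactly 2^31 and wraps around to a negative number.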
Reference:
https://support.pivotal.io/hc/en-us/articles/205649987-Map-Reduce-job-failed-with-Unable-to-initialize-any-output-collector-
CDH5.2: MR, Unable to initialize any output collector
https://community.cloudera.com/t5/Storage-Random-Access-HDFS/HBase-MapReduce-Job-Error-java-io-IOException-Unable-to/td-p/23786

SEVERE error writing to S3 backup

I'm running OpsCenter 5.1.1 with DataStax Enterprise 4.5.1. It's a 3-node cluster on AWS and I'm backing up to S3 (still...). I've started seeing a new error. I think this is a different error from any I've posted before.
$ cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8.39 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
I am seeing this error in the agent.log file
node1_agent.log: SEVERE: error after writing 15736832/16777216 bytes to https://cassandra-dev-bkup.s3.amazonaws.com/snapshots/407bb4b1-5c91-43fe-9d4f-767115668037/sstables/1430904167-reporting_test-transaction_lookup-jb-288-Index.db?partNumber=2&uploadId=.MA3X4RYssg7xL_Hr7Msgze.J4exDq9zZ_0Y7qEj9gZhJ570j73kZNr5_nbxactmPMJeKf0XyZfEC0KAplWOz9lpyRCtNeeDCvCmtEXDchH8F1J2c57aq4MrxfBcyiZr
java.io.IOException: Error writing request body to server
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3192)
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3175)
at com.google.common.io.CountingOutputStream.write(CountingOutputStream.java:53)
at com.google.common.io.ByteStreams.copy(ByteStreams.java:179)
at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.writePayloadToConnection(JavaUrlHttpCommandExecutorService.java:308)
at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:192)
at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:72)
at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:95)
at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.invoke(InvokeSyncToAsyncHttpMethod.java:128)
at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.apply(InvokeSyncToAsyncHttpMethod.java:94)
at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.apply(InvokeSyncToAsyncHttpMethod.java:55)
at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:156)
at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
at com.sun.proxy.$Proxy48.uploadPart(Unknown Source)
at org.jclouds.aws.s3.blobstore.strategy.internal.SequentialMultipartUploadStrategy.prepareUploadPart(SequentialMultipartUploadStrategy.java:111)
at org.jclouds.aws.s3.blobstore.strategy.internal.SequentialMultipartUploadStrategy.execute(SequentialMultipartUploadStrategy.java:93)
at org.jclouds.aws.s3.blobstore.AWSS3BlobStore.putBlob(AWSS3BlobStore.java:89)
at org.jclouds.blobstore2$put_blob.doInvoke(blobstore2.clj:246)
at clojure.lang.RestFn.invoke(RestFn.java:494)
at opsagent.backups.destinations$create_blob$fn__12007.invoke(destinations.clj:69)
at opsagent.backups.destinations$create_blob.invoke(destinations.clj:64)
at opsagent.backups.destinations$fn__12170.invoke(destinations.clj:192)
at opsagent.backups.destinations$fn__11799$G__11792__11810.invoke(destinations.clj:24)
at opsagent.backups.staging$start_staging_BANG_$fn__12338$state_machine__7576__auto____12339$fn__12344$fn__12375.invoke(staging.clj:61)
at opsagent.backups.staging$start_staging_BANG_$fn__12338$state_machine__7576__auto____12339$fn__12344.invoke(staging.clj:59)
at opsagent.backups.staging$start_staging_BANG_$fn__12338$state_machine__7576__auto____12339.invoke(staging.clj:56)
at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:940)
at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:944)
at clojure.core.async.impl.ioc_macros$take_BANG_$fn__7592.invoke(ioc_macros.clj:953)
at clojure.core.async.impl.channels.ManyToManyChannel$fn__4097.invoke(channels.clj:102)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
TL;DR -
Your SSTable, which is 38866048 bytes, is both on your filesystem and on S3. This means the file was transferred and you are in good shape. There is no need to worry about this error (though I opened an internal ticket to handle this kind of exception rather than dump a stack trace).
Details - A summary of what I suspect happened
1) There was a file transfer error when you reached 15736832 out of the 16777216 bytes of that slice of the sstable.
2) At this point OpsCenter did not finish transferring the sstable or leave a partial version in S3.
3) Another backup attempt later on moved the sstable with no error, and a valid backup exists.
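To put the numbers from the log line in context, here is a small arithmetic sketch. It assumes the upload is split into fixed-size parts of 16777216 bytes (the slice size in the log) and that part numbers start at 1, as in S3 multipart uploads; the class MultipartMath and its methods are hypothetical.

```java
class MultipartMath {
    // Number of fixed-size parts needed for a file (ceiling division).
    static long partCount(long fileSize, long partSize) {
        return (fileSize + partSize - 1) / partSize;
    }

    // Offset within the whole file, given a 1-based part number and the
    // number of bytes already written inside that part.
    static long absoluteOffset(int partNumber, long partSize, long bytesIntoPart) {
        return (partNumber - 1) * partSize + bytesIntoPart;
    }

    public static void main(String[] args) {
        long fileSize = 38866048L; // SSTable size from the answer
        long partSize = 16777216L; // slice size from the log line
        System.out.println("parts: " + partCount(fileSize, partSize));
        System.out.println("failed at file offset: "
                + absoluteOffset(2, partSize, 15736832L)); // partNumber=2 in the URL
    }
}
```

Under those assumptions the file needs 3 parts, and the failure occurred 32514048 bytes into the file, near the end of the second part.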
