Error with reverse scan in HBase - hadoop

I'm getting an exception while performing a reverse scan on an HBase table. There seems to be a problem with seeking to the previous row. Any suggestions would be highly appreciated. The error log follows:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Mon May 02 10:59:29 CEST 2016, RpcRetryingCaller{globalStartTime=1462179569123, pause=100, retries=35}, java.io.IOException: java.io.IOException: Could not seekToPreviousRow StoreFileScanner[HFileScanner for reader reader=file:/data/hbase-1.1.2/data/hbase/data/default/table/c8cdadcd1247e04720972ab5a25597a7/outlinks/3eac358ffb9d43018221fbddf9274ffd, compression=none, cacheConf=blockCache=LruBlockCache{blockCount=149348, currentSize=9919772624, freeSize=2866589744, maxSize=12786362368, heapSize=9919772624, minSize=12147044352, minFactor=0.95, multiSize=6073522176, multiFactor=0.5, singleSize=3036761088, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=Danmark2010-01-26T21:02:50Z/outlinks:.dk/1459765153334/Put, lastKey=Motorveje i Danmark2010-08-24T14:03:07Z/outlinks:\xC3\x98ver\xC3\xB8d/1459766037971/Put, avgKeyLen=70, avgValueLen=20, entries=49195292, length=4896832843, cur=Hj\xC3\xA6lp:Sandkassen2010-11-02T21:40:44Z/outlinks:Adriaterhav/1459771842796/Put/vlen=20/seqid=0] to key Hj\xC3\xA6lp:Sandkassen2010-11-02T21:34:14Z/outlinks:\xC4\x8Crnomelj/1459771842779/Put/vlen=20/seqid=0
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:457)
at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:596)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: On-disk size without header provided is 196736, but block header contains 65582. Block offset: -1, data starts with: DATABLK*\x00\x01\x00.\x00\x01\x00\x1A\x00\x00\x00\x00\x8D\xA08\xE2\x01\x00\x00#\x00\x00\x01\x00
at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:500)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:85)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1625)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:673)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:646)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:425)
... 13 more
The source code is:
Table WHtable = connection.getTable(TableName.valueOf("table"));
Scan lowerClosestRowScan = new Scan();
lowerClosestRowScan.addFamily(Bytes.toBytes("outlinks"));
lowerClosestRowScan.setStartRow(Bytes.toBytes("A row"));
lowerClosestRowScan.setReversed(true); // to fetch last
ResultScanner lowerClosestRowScanner = WHtable.getScanner(lowerClosestRowScan);
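For reference, here is a minimal, self-contained sketch of how a reversed scan like this is usually wired up and consumed with the HBase 1.x client API. The table name, column family, and start row are taken from the snippet above; the configuration setup, try-with-resources handling, and class name are additions for illustration, and none of this changes the HFile read problem reported in the stack trace.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReverseScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table whTable = connection.getTable(TableName.valueOf("table"))) {

            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("outlinks"));
            scan.setStartRow(Bytes.toBytes("A row")); // reversed scans walk backwards from this row
            scan.setReversed(true);

            try (ResultScanner scanner = whTable.getScanner(scan)) {
                Result first = scanner.next();        // closest row at or before "A row"
                if (first != null) {
                    System.out.println(Bytes.toString(first.getRow()));
                }
            }
        }
    }
}

The exception itself ("Could not seekToPreviousRow ... On-disk size without header provided is 196736, but block header contains 65582") is raised while reading a block of the store file, which suggests the file on disk, rather than the client code above, is the thing to investigate.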

Related

CommunicationsException: Communications link failure Caused by: org.hibernate.exception.JDBCConnectionException: could not extract ResultSet

When I run multiple methods concurrently, I get a Communications link failure with a JDBCConnectionException. The code below is causing the issue.
Stream<Supplier<List<?>>> tasks = Stream.of(
        () -> dashboardResultSDHDao.getMarketResult(),
        () -> dashboardResultSDHDao.getBatchResult(),
        () -> dashboardResultSDHDao.getModalityChatList(),
        () -> dashboardResultSDHDao.getCoadingTimeList(),
        () -> dashboardResultSDHDao.getCoderPerformersList());

lists = tasks
        // Supply all the tasks for execution and collect CompletableFutures
        .map(CompletableFuture::supplyAsync).collect(Collectors.toList())
        // Join all the CompletableFutures to gather the results
        .stream().map(CompletableFuture::join).collect(Collectors.toList());
Please let me know how we can resolve it.
ERROR o.h.e.jdbc.spi.SqlExceptionHelper.logExceptions - Communications link failure
The last packet successfully received from the server was 420,608 milliseconds ago. The last packet sent successfully to the server was 19,142 milliseconds ago.
03-02-2018 19:32:02.277 [http-nio-8099-exec-9] ERROR c.o.oscar.dao.impl.AccuracyDaoImpl.getAccuracyResult - Exception: org.hibernate.exception.JDBCConnectionException: could not extract ResultSet
javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: could not extract ResultSet
at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1692)
at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1602)
at org.hibernate.jpa.internal.QueryImpl.getResultList(QueryImpl.java:492)
at com.omega.oscar.dao.impl.AccuracyDaoImpl.getAccuracyResult(AccuracyDaoImpl.java:98)
at com.omega.oscar.dao.impl.AccuracyDaoImpl$$FastClassBySpringCGLIB$$9b8ed9bf.invoke(<generated>)
at java.lang.Thread.run(Unknown Source)
Caused by: org.hibernate.exception.JDBCConnectionException: could not extract ResultSet
at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:115)
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:70)
... 177 common frames omitted
Caused by: java.net.SocketTimeoutException: Read timed out
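For context, here is a minimal sketch of the same fan-out pattern, but with the tasks handed to an explicitly bounded executor via supplyAsync(supplier, executor) instead of the default ForkJoinPool.commonPool(). The nested DAO interface, the pool size of 4, and the class name are assumptions for illustration, not the original code, and this is a sketch of the pattern rather than a confirmed fix for the connection failure.

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DashboardFanOutSketch {

    // Minimal stand-in for the DAO used in the question (assumed signatures)
    interface DashboardResultSDHDao {
        List<?> getMarketResult();
        List<?> getBatchResult();
        List<?> getModalityChatList();
    }

    private final DashboardResultSDHDao dao;
    // Bounded pool: only a fixed number of JDBC calls run at once (size 4 is an assumption)
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public DashboardFanOutSketch(DashboardResultSDHDao dao) {
        this.dao = dao;
    }

    public List<List<?>> loadAll() {
        Stream<Supplier<List<?>>> tasks = Stream.of(
                dao::getMarketResult,
                dao::getBatchResult,
                dao::getModalityChatList);

        return tasks
                // supplyAsync(supplier, executor) runs each task on the bounded pool
                // instead of the shared ForkJoinPool.commonPool()
                .map(task -> CompletableFuture.supplyAsync(task, pool))
                .collect(Collectors.toList())
                .stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}

The design point of the sketch is simply that the executor caps how many DAO calls, and therefore how many database connections, are in flight at the same time.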

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable

hive> select * from tweets_text ORDER BY created_time ASC LIMIT 10;
URL:
http://standbynamenode-zat6kzjl.canopy.com:8088/taskdetails.jsp?jobid=job_1483098353987_0020&tipid=task_1483098353987_0020_m_000000
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":814718430955900928,"created_unixtime":1483078985333,"created_time":"Fri Dec 30 06:23:05 +0000 2016","lang":"it","displayname":"VentagliP","time_zone":"Rome","msg":"RT cristinafalasch Mi fa male al cuoreche la virtù non possa viver liberadal morso dell’invidia
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":814718430955900928,"created_unixtime":1483078985333,"created_time":"Fri Dec 30 06:23:05 +0000 2016","lang":"it","displayname":"VentagliP","time_zone":"Rome","msg":"RT cristinafalasch Mi fa male al cuoreche la virtù non possa viver liberadal morso dell’invidia
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:543)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Unterminated string at 272 [character 273 line 1]
at org.openx.data.jsonserde.JsonSerDe.onMalformedJson(JsonSerDe.java:412)
at org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:174)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:143)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:107)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 9 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
First, check whether hive-site.xml has been updated with all the required properties, then go to the code that is creating the issue. The URL given in the output does not open for me. Note that the trace itself points at the immediate cause: the JsonSerDe reports "Row is not a valid JSON Object - Unterminated string", so at least one record in the tweets_text data is malformed JSON and worth inspecting.

Apache Phoenix UPSERT timeouts on large data

Exception in thread "main" org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Mon May 09 20:45:17 PDT 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=214280: row '�.�A`' on tabl..
Any suggestions?

Hive compression in ORC using Lz4

I am trying to compress RC and ORC files using LZ4. I have installed Hadoop 2.7.1 and Hive 1.2.1. With LZ4 I can compress an RC file without any problem, but when I try to load data into an ORC file using LZ4 it does not work. I created the ORC table like this:
CREATE TABLE FINANCE_orc(
PERMNO STRING,
DATE STRING,
CUSIP STRING,
NCUSIP STRING,
COMNAM STRING,
TICKET STRING,
PERMCO STRING,
SHRCD STRING,
EXCHCD STRING,
HEXCD STRING,
SICCD STRING,
HSLCCD STRING,
PRC STRING,
VOL STRING,
RET STRING,
SHROUT STRING,
DLRET STRING,
VWRETD STRING,
EWRETD STRING,
SPRTRN STRING)
STORED AS ORC tblproperties ("orc.compress"="Lz4");
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.type = BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
INSERT OVERWRITE table finance_orc select * from finance;
But when loading the data it gives the following error:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:622)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:566)
... 16 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
at java.lang.Enum.valueOf(Enum.java:236)
at org.apache.hadoop.hive.ql.io.orc.CompressionKind.valueOf(CompressionKind.java:25)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getOptions(OrcOutputFormat.java:143)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:203)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:52)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
... 18 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I have used Snappy and Zlib with the same command and they work fine. The problem is only with LZ4, and I do not know why.
The compression codecs that can be used for ORC's columnar compression are NONE, ZLIB, and SNAPPY; the default codec is ZLIB, and codecs other than these are not accepted.
In general, to get an idea of what went wrong, read the error log all the way through. Here it says:
"org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4"
In other words, the ORC writer in this Hive version has no Lz4 entry in its CompressionKind enum. If you really need LZ4 you could try manually substituting a newer jar that adds LZ4 support (for example from the official Spark repo), but it is not among the codecs this release accepts.
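To see where that IllegalArgumentException comes from, here is a small sketch that performs the same enum lookup the stack trace shows OrcOutputFormat doing on the "orc.compress" value. It assumes the hive-exec 1.2.1 jar is on the classpath; the class name is made up for illustration.

import org.apache.hadoop.hive.ql.io.orc.CompressionKind;

public class OrcCompressionCheck {
    public static void main(String[] args) {
        // An accepted value: it must exactly match an enum constant name
        System.out.println(CompressionKind.valueOf("SNAPPY"));

        // "Lz4" is not a constant of this enum in Hive 1.2.1, so Enum.valueOf
        // throws the IllegalArgumentException seen in the job log
        try {
            CompressionKind.valueOf("Lz4");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}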

ERROR 2103: doing work on Longs

I have the following data:
store  trn_date    dept_id  sale_amt
1      2014-12-15  101      10007655
1      2014-12-15  101      10007654
1      2014-12-15  101      10007544
6      2014-12-15  104      100086544
8      2014-12-14  101      1000000
8      2014-12-15  101      100865761
I'm trying to aggregate the data using the code below. I tried loading the data both with HCatLoader() and with PigStorage():
data = LOAD 'data' USING org.apache.hcatalog.pig.HCatLoader();
group_table = GROUP data BY (store, trn_date, dept_id);
group_gen = FOREACH group_table GENERATE
            FLATTEN(group) AS (store, trn_date, dept_id),
            SUM(data.sale_amt) AS total_sale_amt;
Below is the error stack trace I get while running the job:
================================================================================
Pig Stack Trace
---------------
ERROR 2103: Problem doing work on Longs
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grouped_all: Local Rearrange[tuple]{tuple}(false) - scope-1317 Operator Key: scope-1317): org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:183)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:84)
at org.apache.pig.builtin.AlgebraicLongMathBase$Intermediate.exec(AlgebraicLongMathBase.java:108)
at org.apache.pig.builtin.AlgebraicLongMathBase$Intermediate.exec(AlgebraicLongMathBase.java:102)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:369)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:333)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:77)
================================================================================
While looking for a solution, many people said this is caused by loading the data with the HCatalog loader, so I also tried loading it with PigStorage(), but I still get the same error.
This may be because of the way you are storing the data in Hive: if a column is going to be aggregated, declare it with an integer or numeric data type.
Each aggregation function also returns its result with a default data type, for example:
AVG returns DOUBLE
SUM returns DOUBLE
COUNT returns LONG
Since you already tried PigStorage(), I don't think the problem is how the data is stored in Hive; it is a data type issue in the value being passed to the aggregation. Change (or cast) the column to a numeric type before passing it to the aggregation and try again.
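For what it's worth, the ClassCastException at the bottom of the trace is just plain Java refusing to treat a String as a Number. A tiny stand-alone illustration (ordinary Java, not Pig; the sample value is taken from the data above):

public class CastIllustration {
    public static void main(String[] args) {
        Object saleAmt = "10007655";  // sale_amt arriving as a chararray, i.e. a Java String
        // Roughly the cast Pig's SUM machinery (AlgebraicLongMathBase.doTupleWork) performs:
        Number n = (Number) saleAmt;  // throws ClassCastException: String cannot be cast to Number
        System.out.println(n);
    }
}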
