Error in Accumulo's tablet server when scanning for data - hadoop

I have an Accumulo cluster with one master and 2 tablet servers, holding a bunch of tables that store millions of records. The problem is that whenever I scan the tables to get a few records out, the tablet server logs keep throwing this error:
2015-11-12 04:38:56,107 [hdfs.DFSClient] WARN : Failed to connect to /192.168.250.12:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 received exception java.io.IOException: Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
java.io.IOException: Got error, status message opReadBlock BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 received exception java.io.IOException: Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.accumulo.core.file.rfile.bcfile.Utils$Version.<init>(Utils.java:264)
at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:823)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:246)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:257)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:137)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:209)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:368)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:137)
at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:843)
at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:69)
at org.apache.accumulo.tserver.tablet.Compactor.openMapDataFiles(Compactor.java:279)
at org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:322)
at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:214)
at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1976)
at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2093)
at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44)
at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:745)
I think this is more of an HDFS issue than an Accumulo one, so I checked the DataNode logs and found the same message:
Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
There, however, it is logged at INFO level. What I don't understand is why I am getting this error.
I can see that the pool name of the file I am trying to access (BP-1881591466-192.168.1.111-1438767154643) contains an IP address (192.168.1.111) that does not match the IP address of any of the servers (self and remote). 192.168.1.111 was actually the old IP address of the Hadoop master server, which I have since changed. I use domain names instead of IP addresses, so the only places where I made changes were the hosts files of the machines in the cluster. None of the Hadoop/Accumulo configurations use IP addresses. Does anyone know what the issue is here? I have spent days on it and still cannot figure it out.

The error you are receiving indicates that Accumulo cannot read part of one of its files from HDFS. The NameNode is reporting that a block is located on a particular DataNode (in your case, 192.168.250.12). However, when Accumulo attempts to read from that DataNode, it fails.
This likely indicates a corrupt block in HDFS, or a temporary network issue. You can try to run hadoop fsck / (the exact command may vary, depending on version) to perform a health check of HDFS.
Also, the IP address mismatch in the DataNode appears to indicate that the DataNode is confused about the HDFS pool it is a part of. You should restart that DataNode after double-checking its configuration, DNS, and /etc/hosts for any anomalies.
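To make the health check more targeted, the log line itself can be picked apart. The following is a minimal sketch, not part of Hadoop or Accumulo: the helper names summarize_read_error and fsck_command are made up for illustration, and the regular expressions assume the message format shown in the question. It only builds the fsck command for the affected file (hdfs fsck on recent versions, hadoop fsck on older ones) and does not run it. As far as I know, the block pool ID is generated when the NameNode is formatted and embeds the NameNode's address at that time, so an old IP inside it may simply be historical; treat that as an assumption to verify against your version.

```python
import re

# Assumption: block pool IDs look like BP-<number>-<IP>-<timestamp>,
# as they do in the log lines from the question.
BLOCK_RE = re.compile(
    r"(?P<pool>BP-\d+-(?P<pool_ip>\d+\.\d+\.\d+\.\d+)-\d+):(?P<block>blk_\d+)_\d+"
)
FILE_RE = re.compile(r"for file (?P<path>\S+?),")
REMOTE_RE = re.compile(r"remote=/(?P<ip>\d+\.\d+\.\d+\.\d+):\d+")


def summarize_read_error(message):
    """Extract block pool, block ID, file path, and DataNode IP from a read error."""
    block = BLOCK_RE.search(message)
    path = FILE_RE.search(message)
    remote = REMOTE_RE.search(message)
    if not (block and path and remote):
        raise ValueError("message does not look like an OP_READ_BLOCK error")
    return {
        "pool": block.group("pool"),
        "pool_ip": block.group("pool_ip"),
        "block": block.group("block"),
        "path": path.group("path"),
        "datanode_ip": remote.group("ip"),
    }


def fsck_command(path):
    """Build (but do not run) an fsck command limited to one file."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]
```

Passing the Accumulo file path from the error to fsck_command gives a check that reports the blocks and replica locations for just that RFile instead of walking the whole filesystem.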

Database is not starting up: error on startup

I am getting the error below while starting the database:
startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/mis/PARAMETERFILE/spfile.276.967375255'
ORA-17503: ksfdopn:10 Failed to open file +DATA/mis/PARAMETERFILE/spfile.276.967375255
ORA-04031: unable to allocate 56 bytes of shared memory ("shared pool","unknown object","KKSSP^24","kglseshtSegs")
Your database either cannot find the SPFILE (the newer replacement for init.ora, which holds the actual system parameters) within ASM, or has no permission to access it.
Either your Grid Infrastructure stack or the dbs/spfile.ora is pointing to the wrong file.
To find out what the Grid Infrastructure stack is using, run srvctl, which should display the name of the parameter file the database is expected to use:
srvctl config database -d <dbname>
...
Spfile: +DATA/<dbname>/PARAMETERFILE/spfile.269.1066152225
...
Then check, as the grid user, whether that file is actually visible, using asmcmd:
asmcmd
ASMCMD> ls +DATA/<dbname>/PARAMETERFILE/
spfile.269.1066152225
If the name is different, you have found the issue, and you have to point the database to the correct file.
If the name is correct, the cause could be wrong permissions on the oracle executable(s) (check My Oracle Support):
RAC Database Can't Start: ORA-01565, ORA-17503: ksfdopn:10 Failed to open file +DATA/BPBL/spfileBPBL.ora (Doc ID 2316088.1)
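The comparison between the two outputs can also be scripted. This is a minimal sketch that assumes you have already captured the text of srvctl config database and of the asmcmd ls listing; the function names spfile_from_srvctl and spfile_is_listed are hypothetical and not part of any Oracle tooling.

```python
import posixpath


def spfile_from_srvctl(output):
    """Return the path on the 'Spfile:' line of srvctl output, or None if absent."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "spfile":
            return value.strip()
    return None


def spfile_is_listed(spfile_path, asmcmd_listing):
    """Check whether the file name from srvctl appears in an asmcmd ls listing."""
    name = posixpath.basename(spfile_path)
    listed = {entry.strip() for entry in asmcmd_listing.splitlines() if entry.strip()}
    return name in listed
```

If spfile_is_listed returns False, the registered name and the file in ASM differ, which is the first case described above.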

DynamoDBStorageHandler Hive connector

When I run this command from the Hive shell in our EMR cluster:
CREATE EXTERNAL TABLE my_db.my_table
(col1 string, ...)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "table_name",
"dynamodb.column.mapping" = "col1:col1 ... "
);
I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.net.ConnectTimeoutException Call From ip-xx-xx-xx-xxx.ec2.internal/xx.xx.xx.xxx to ip-yy-yy-yy-yyy.ec2.internal:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=ip-yy-yy-yy-yyy.ec2.internal/yy.yy.yy.yyy:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout)
The EMR cluster is in a VPC.
I tried editing the inbound/outbound rules of the master node's security group, so far with no success.
Thanks, Michael
AWS Support was able to assist me: the problem was that the database location in Glue was pointing to an old HDFS address, ip-yy-yy-yy-yyy.ec2.internal (different from xx.xx.xx.xxx), which belonged to the master node of the previous cluster. I changed the location to point to S3 and the problem was resolved.
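If several databases might be affected, the same symptom can be looked for in bulk. This is a minimal sketch that assumes you have already collected each database's location URI into a dictionary (for example by copying them from the Glue console); the function name stale_hdfs_locations and the host names in the usage note are placeholders, not real values.

```python
from urllib.parse import urlparse


def stale_hdfs_locations(locations, current_namenode_host):
    """Return names whose location is an hdfs:// URI on a host other than the current one."""
    current = current_namenode_host.lower()
    stale = []
    for name, uri in locations.items():
        parsed = urlparse(uri)
        # Locations on S3, or hdfs:/// URIs without an explicit host, are not flagged.
        if parsed.scheme == "hdfs" and parsed.hostname is not None:
            if parsed.hostname.lower() != current:
                stale.append(name)
    return sorted(stale)
```

Called with a mapping such as {"my_db": "hdfs://old-master.example:8020/user/hive/warehouse/my_db.db"} and the current master's host name, it returns the databases whose location still refers to another cluster.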

org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 21

I have a YARN cluster with Spark (1.6.1), HDFS and Hive (2.1). My workflows worked fine for a few months until today (without any changes in code or in the environments). I started to get errors like this:
org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 21
Serialization trace:
outputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:238)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:226)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:745)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:131)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at org.apache.hadoop.hive.ql.exec.spark.KryoSerializer.deserialize(KryoSerializer.java:49)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:318)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Using Hive I can do simple selects, but every other operation that needs Spark ends with Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3) in the console, and the error above in the YARN logs.
Now every one of my Hive databases is paralyzed (I have a few). I spent the whole day trying to solve this problem, but nothing helped (restarting Hive, restarting YARN nodes, changing the YARN master).
What do you think causes the problem and how can it be solved?
I figured it out.
After restarting hive-server2, for a short period of time I got a different error. Instead of org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 26, I got org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hadoop.hive.ql.io.RCFileOutputFormat. With the second form it was obvious that Spark, as executed on the nodes, didn't have some jars on its classpath. I don't know why Spark suddenly became unable to load these jars, but after copying them manually to its lib folder on every node and restarting each node, everything went back to normal.
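To confirm which nodes are missing the class before copying anything, a jar directory can be scanned directly, since a jar is a zip archive. This is a minimal sketch; the function name jars_containing is hypothetical, and the lib directory you pass in depends on your Spark installation.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, lib_dir):
    """Return the names of jars directly under lib_dir that contain the given Java class."""
    # A class a.b.C is stored in the archive as a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return matches
```

An empty result for org.apache.hadoop.hive.ql.io.RCFileOutputFormat on a node would be consistent with the "Unable to find class" error above.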

Talend workflow for importing multiple gzip files, un-archiving them and creating a calculated field

I want to 1) read multiple gzip files from a path, 2) un-archive them and 3) create a calculated field. So far I have been successful in doing 1 and 2. For 3, I thought that tMap would do the job, but for some reason I am unable to connect the un-archive component to tMap.
Edit1:
I don't know why tdelemited and tMap are showing an error message.
Below is the message I got:
Starting job Migration_1 at 09:36 04/04/2017.
[statistics] connecting to socket on port 3336
[statistics] connected
[statistics] disconnected
Job Migration_1 ended at 09:36 04/04/2017. [exit code=0]
Edit2: I tried all the suggested steps, yet the job does not give me the required output and, to my surprise, there is no error message in the log to debug anything.
Starting job Migration_1 at 12:36 04/04/2017.
[statistics] connecting to socket on port 3463
[statistics] connected
[statistics] disconnected
Job Migration_1 ended at 12:36 04/04/2017. [exit code=0]
tFileUnarchive will just unarchive the zip files, but you will still have to read the files contained in these zips. The tFileUnarchive component does not provide this reading part.
After the tFileList-->tFileUnarchive subjob, you should have a file-reading subjob, such as:
tFileList--iterate-->tFileInput*-->tMap
This tFileList should be set to read the directory where you extracted the gzip files.

Hive Browser Throwing Error

I am trying to run a basic query in the Hive editor in the Hue browser, but it returns the following error, whereas my Hive CLI works fine and is able to execute queries. Could someone help me?
Fetching results ran into the following error(s):
Bad status for request TFetchResultsReq(fetchType=1,
operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='r\t\x80\xac\x1a\xa0K\xf8\xa4\xa0\x85?\x03!\x88\xa9',
guid='\x852\x0c\x87b\x7fJ\xe2\x9f\xee\x00\xc9\xeeo\x06\xbc')),
orientation=4, maxRows=-1):
TFetchResultsResp(status=TStatus(errorCode=0, errorMessage="Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]",
sqlState=None,
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]:24:23",
'org.apache.hive.service.cli.operation.OperationManager:getOperationLogRowSet:OperationManager.java:229',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:687',
'sun.reflect.GeneratedMethodAccessor14:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:606',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:415',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy19:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745'], statusCode=3), results=None,
hasMoreRows=None)
This error could be due either to HiveServer2 not running or to Hue not having access to hive_conf_dir.
Check whether HiveServer2 has been started and is running. It uses port 10000 by default.
netstat -ntpl | grep 10000
If it is not running, start HiveServer2:
$HIVE_HOME/bin/hiveserver2
Also check the Hue configuration file hue.ini. The hive_conf_dir property must be set under the [beeswax] section. If it is not set, add this property under [beeswax]:
hive_conf_dir=$HIVE_HOME/conf
Restart supervisor after making these changes.
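Both checks can be scripted as well. The following is a minimal sketch and not part of Hue: the function names is_port_open and beeswax_hive_conf_dir are hypothetical, the host and port are whatever your HiveServer2 uses, and the hue.ini scan is a simple line-based reader (it assumes top-level sections are written as [name] and nested ones as [[name]]), not a full parser.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def beeswax_hive_conf_dir(ini_text):
    """Return hive_conf_dir set directly under [beeswax], or None if unset or commented out."""
    in_beeswax = False
    in_subsection = False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[["):
            in_subsection = True  # nested section such as [[ssl]]
            continue
        if line.startswith("["):
            in_beeswax = line == "[beeswax]"
            in_subsection = False
            continue
        if in_beeswax and not in_subsection:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "hive_conf_dir":
                return value.strip()
    return None
```

For example, is_port_open("localhost", 10000) mirrors the netstat check when run on the HiveServer2 host, and beeswax_hive_conf_dir(open("hue.ini").read()) returns None when the property still needs to be added.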
