Connection Refused Exception while using Teradata Hadoop Connector - hadoop

I am currently using the free version of the Teradata Hadoop connector (teradata-connector 1.3.4) to load data into Teradata, using the internal.fastload method.
Database version is 14.10
JDBC driver version is 15.0
Sometimes I get a Connection refused exception while running the job, but the issue goes away after resubmitting the load job 2-3 times. It also has nothing to do with load on the Teradata database, as that load is pretty normal. The exception that is thrown is below:
15/10/29 22:52:54 INFO mapreduce.Job: Running job: job_1445506804193_290389
com.teradata.connector.common.exception.ConnectorException: Internal fast load socket server time out
at com.teradata.connector.teradata.TeradataInternalFastloadOutputFormat$InternalFastloadCoordinator.beginLoading(TeradataInternalFastloadOutputFormat.java:642)
at com.teradata.connector.teradata.TeradataInternalFastloadOutputFormat$InternalFastloadCoordinator.run(TeradataInternalFastloadOutputFormat.java:503)
at java.lang.Thread.run(Thread.java:745)
15/10/29 23:39:29 INFO mapreduce.Job: Job job_1445506804193_290389 running in uber mode : false
15/10/29 23:39:29 INFO mapreduce.Job: map 0% reduce 0%
15/10/29 23:40:08 INFO mapreduce.Job: Task Id : attempt_1445506804193_290389_m_000001_0, Status : FAILED
Error: com.teradata.connector.common.exception.ConnectorException: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:211)
at com.teradata.connector.teradata.TeradataInternalFastloadOutputFormat.getRecordWriter(TeradataInternalFastloadOutputFormat.java:301)
at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.<init>(ConnectorOutputFormat.java:84)
at com.teradata.connector.common.ConnectorOutputFormat.getRecordWriter(ConnectorOutputFormat.java:33)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:624)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:744)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1591)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Any pointers in this regard will definitely help.
Thanks in advance.

Root cause: com.teradata.connector.common.exception.ConnectorException: Internal fast load socket server time out
Internal fast load server socket time out
When running an export job using the "internal.fastload" method, the following error may occur: Internal fast load socket server time out
This error occurs because the number of map tasks currently available is less than the number of map tasks specified on the command line with the "-nummappers" parameter.
This error can occur in the following conditions:
(1) There are other map/reduce jobs running concurrently in the Hadoop cluster, so there are not enough resources to allocate the specified number of map tasks to the export job.
(2) The maximum number of map tasks in the Hadoop cluster is smaller than the number of existing map tasks plus the expected map tasks of the export job.
When the above error occurs, try to increase the maximum number of map tasks in the Hadoop cluster, or decrease the number of map tasks for the export job.
There is a good troubleshooting PDF available #teradata
If you get any of these errors, have a look at the above PDF and get them fixed.
Also have a look at the other map reduce properties if you have to fine-tune them.
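For reference, a TDCH export invocation with an explicit mapper count might look like the following sketch. The JDBC URL, credentials, paths, and table name are placeholders, not values from the original post; only the -method internal.fastload and -nummappers parameters correspond to the discussion above.

```shell
# Hypothetical TDCH export job -- host, credentials, paths, and table are placeholders.
# Keep -nummappers at or below the number of map slots actually free in the cluster.
NUM_MAPPERS=4

TDCH_CMD="hadoop jar teradata-connector-1.3.4.jar \
com.teradata.connector.common.tool.ConnectorExportTool \
-url jdbc:teradata://td-host/database=mydb \
-username myuser -password mypass \
-jobtype hdfs -fileformat textfile \
-sourcepaths /user/me/export_data \
-targettable mydb.mytable \
-method internal.fastload \
-nummappers $NUM_MAPPERS"

echo "$TDCH_CMD"
```

If the socket-server timeout persists, lowering NUM_MAPPERS is usually the first thing to try, since internal.fastload requires all mapper sessions to connect back before loading begins.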

ravindra-babu's answer is correct, since the answer is buried in the PDF docs. The support.teradata.com KB article KB0023556 also offers more detail on the why.
Resolution
All mappers should run simultaneously. If they are not running simultaneously, try to reduce the number of mappers in the TDCH job through the -nummappers argument.
Resubmit the TDCH job after changing -nummappers.
Honestly, it's a very confusing error and could be reported better.

Related

Hbase Bulk load - Map Reduce job failing

I have a map reduce job for HBase bulk load. The job converts data into HFiles and loads them into HBase, but after a certain map % the job fails. Below is the exception that I am getting:
Error: java.io.FileNotFoundException: /var/mapr/local/tm4/mapred/nodeManager/spill/job_1433110149357_0005/attempt_1433110149357_0005_m_000000_0/spill83.out.index
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:198)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800)
at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:74)
at org.apache.hadoop.mapred.MapRFsOutputBuffer.mergeParts(MapRFsOutputBuffer.java:1382)
at org.apache.hadoop.mapred.MapRFsOutputBuffer.flush(MapRFsOutputBuffer.java:1627)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:709)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
The only thing I noticed is that the job works fine for a small set of data, but as the data grows the job starts failing.
Let me know if anyone has faced this issue.
Thanks
This was a bug in MapR. I got a reply on the MapR forum. If someone is facing a similar issue, refer to the link below:
http://answers.mapr.com/questions/163440/hbase-bulk-load-map-reduce-job-failing-on-mapr.html

Unable to close file because the last block does not have enough number of replicas

From the error message it is quite obvious that there was a problem saving a replica of a particular block belonging to a file. The reason might be that there was a problem accessing a datanode to save a particular block (replica of a block).
Please refer below for the complete log.
I found that another user, "huasanyelao" (https://stackoverflow.com/users/987275/huasanyelao), also had a similar exception/problem, but the use case was different.
Now, how do we solve these kinds of problems? I understand that there is no fixed solution that handles all scenarios.
1. What immediate steps do I need to take to fix errors of this kind?
2. If there are jobs whose logs I am not monitoring at the time, what approaches do I need to take to fix such issues?
P.S.: Apart from fixing network or access issues, what other approaches should I follow?
Error Log:
15/04/10 11:21:13 INFO impl.TimelineClientImpl: Timeline service address: http://your-name-node/ws/v1/timeline/
15/04/10 11:21:14 INFO client.RMProxy: Connecting to ResourceManager at your-name-node/xxx.xx.xxx.xx:0000
15/04/10 11:21:34 WARN hdfs.DFSClient: DataStreamer Exception
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:29)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:512)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1516)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
15/04/10 11:21:40 INFO hdfs.DFSClient: Could not complete /user/xxxxx/.staging/job_11111111111_1212/job.jar retrying...
15/04/10 11:21:46 INFO hdfs.DFSClient: Could not complete /user/xxxxx/.staging/job_11111111111_1212/job.jar retrying...
15/04/10 11:21:59 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/xxxxx/.staging/job_11111111111_1212
Error occured in MapReduce process:
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:54)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1903)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1871)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1836)
at org.apache.hadoop.mapreduce.JobSubmitter.copyJar(JobSubmitter.java:286)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:254)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at com.xxx.xxx.xxxx.driver.GenerateMyFormat.runMyExtract(GenerateMyFormat.java:222)
at com.xxx.xxx.xxxx.driver.GenerateMyFormat.run(GenerateMyFormat.java:101)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.xxx.xxx.xxxx.driver.GenerateMyFormat.main(GenerateMyFormat.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
We had a similar issue. It was primarily attributed to dfs.namenode.handler.count not being high enough. Increasing it may help in some small clusters, but the underlying problem is a DoS-like situation where the NameNode couldn't handle the number of connections or RPC calls, and your pending-deletion block count grows humongous. Validate the HDFS audit logs, look for any mass deletions or other HDFS actions, and match them against the jobs that might be overwhelming the NameNode. Stopping those tasks will help HDFS recover.
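As a sketch, the handler count is raised in hdfs-site.xml on the NameNode; the value below is purely illustrative (a common rule of thumb scales it with cluster size), not a value from the original answer.

```xml
<!-- hdfs-site.xml on the NameNode; restart the NameNode after changing it. -->
<property>
  <name>dfs.namenode.handler.count</name>
  <!-- illustrative value; tune to cluster size and RPC load -->
  <value>40</value>
</property>
```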

Cascading 2.0.0 job failing on hadoop FileNotFoundException job.split

When I run my job on a larger dataset, lots of mappers/reducers fail, causing the whole job to crash. Here's the error I see on many mappers:
java.io.FileNotFoundException: File does not exist: /mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201405050818_0001/job.split
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1933)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1924)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:608)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:429)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:385)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Has anybody been able to solve this problem? I see another person experiencing the same pain as me (here); sadly, he could not be saved in time.
After hours of debugging, I found absolutely nothing useful in the Hadoop logs (as usual). Then I tried the following changes:
Increasing the cluster size to 10
Increasing the failure limits:
mapred.map.max.attempts=20
mapred.reduce.max.attempts=20
mapred.max.tracker.failures=20
mapred.max.map.failures.percent=20
mapred.max.reduce.failures.percent=20
I was subsequently able to run my Cascading job on large amounts of data. It seems like a problem caused by Cascading.
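Those limits can also be passed per job via -D flags on the command line instead of editing mapred-site.xml, assuming the driver uses ToolRunner/GenericOptionsParser; the jar and class names below are placeholders.

```shell
# Sketch: raising retry/failure thresholds per job (old mapred.* property names,
# matching the Hadoop 0.20/1.x era of this question). Jar and class are placeholders.
JOB_CMD="hadoop jar my-cascading-job.jar com.example.Main \
-Dmapred.map.max.attempts=20 \
-Dmapred.reduce.max.attempts=20 \
-Dmapred.max.tracker.failures=20 \
-Dmapred.max.map.failures.percent=20 \
-Dmapred.max.reduce.failures.percent=20 \
input_path output_path"

echo "$JOB_CMD"
```

Note that the two *.percent properties let a job succeed even when some tasks fail permanently, which can mask data loss; they are a workaround, not a fix.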

Hadoop Mapper running slow

I am trying to run a job with both mappers and reducers, but the mappers are running slowly.
If, for the same input, I disable the reducers, the mappers finish in 3 minutes,
while for the mapper-reducer job the mappers are not finished even at the end of 30 minutes.
I am using Hadoop 1.0.3. I tried both with and without compression of map output. I removed the older version of Hadoop, 0.20.203, and reinstalled everything from scratch for 1.0.3.
Also, the JobTracker logs are filled with:
2012-10-03 10:26:20,138 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54311: readAndProcess threw exception java.lang.RuntimeException: readObject can't find class . Count of bytes read: 0
java.lang.RuntimeException: readObject can't find class
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:185)
at org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:102)
at org.apache.hadoop.ipc.Server$Connection.processData(Server.java:1303)
at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1282)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1182)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:537)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:344)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.ClassNotFoundException:
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:183)
Can anyone tell me what may be wrong?
If your mapper completes in 3 minutes, then it is not slow given the batch-processing nature of MapReduce. With the version of MapReduce you are using, you do need to make sure that you are using the correct number of reducers: if your cluster size is X, then try using X-1 reducers. See if this helps or not.
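A minimal sketch of that rule of thumb follows; the cluster size, jar, and driver class are hypothetical, and mapred.reduce.tasks is the Hadoop 1.x property for the reducer count.

```shell
# Sketch: pick reducers = cluster size - 1, per the advice above.
CLUSTER_SIZE=8                      # hypothetical number of worker nodes
NUM_REDUCERS=$((CLUSTER_SIZE - 1))

# Placeholder jar, class, and paths -- only the -D flag is the point here.
echo "hadoop jar myjob.jar com.example.MyDriver -Dmapred.reduce.tasks=$NUM_REDUCERS in out"
```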

Does hadoop really handle datanode failure?

In our Hadoop setup, when a datanode crashes (or Hadoop doesn't respond on the datanode), the reduce task fails, unable to read from the failed node (exception below). I thought Hadoop handled datanode failures, and that that is the main purpose of creating Hadoop. Is anybody facing a similar problem with their clusters? If you have a solution, please let me know.
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1547)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1483)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1391)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1302)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1234)
When a task of a MapReduce job fails, Hadoop will retry it on another node.
You can take a look at the JobTracker (:50030/jobtracker.jsp) and see the blacklisted nodes (nodes that have problems with their keep-alive), or drill into a running/completed job and see the number of killed tasks/retries, as well as dead nodes, decommissioned nodes, etc.
I've had a similar problem on a cluster where tasks failed on some nodes due to "out of memory" problems. They were definitely restarted on other nodes. The computation eventually failed because it was badly designed: it caused all nodes to run out of memory, and eventually the threshold for cancelling the job was reached.
