Cannot connect to Zookeeper when running a mapreduce job

I am running a MapReduce job that uses an Accumulo table as input and stores the results in another Accumulo table. This is the run method:
public int run(String[] args) throws Exception {
Opts opts = new Opts();
opts.parseArgs(PivotTable.class.getName(), args);
Configuration conf = getConf();
conf.set("formula", opts.formula);
Job job = Job.getInstance(conf);
job.setJobName("Pivot Table Generation");
job.setJarByClass(PivotTable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(PivotTableMapper.class);
job.setCombinerClass(PivotTableCombiber.class);
job.setReducerClass(PivotTableReducer.class);
AccumuloInputFormat.setInputTableName(job, opts.dataTable);
BatchWriterConfig bwConfig = new BatchWriterConfig();
AccumuloOutputFormat.setBatchWriterOptions(job, bwConfig);
AccumuloOutputFormat.setDefaultTableName(job, opts.pivotTable);
AccumuloOutputFormat.setCreateTables(job, true);
job.setInputFormatClass(AccumuloInputFormat.class);
job.setOutputFormatClass(AccumuloOutputFormat.class);
opts.setAccumuloConfigs(job);
return job.waitForCompletion(true) ? 0 : 1;
}
The problem is that when I run the job, I get an exception saying that it cannot connect to ZooKeeper:
Error: java.lang.RuntimeException: Failed to connect to zookeeper (zookeeper.1:22181) within 2x zookeeper timeout period 30000
at org.apache.accumulo.fate.zookeeper.ZooSession.connect(ZooSession.java:124)
at org.apache.accumulo.fate.zookeeper.ZooSession.getSession(ZooSession.java:164)
at org.apache.accumulo.fate.zookeeper.ZooReader.getSession(ZooReader.java:43)
at org.apache.accumulo.fate.zookeeper.ZooReader.getZooKeeper(ZooReader.java:47)
at org.apache.accumulo.fate.zookeeper.ZooCache.getZooKeeper(ZooCache.java:59)
at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:159)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:169)
at org.apache.accumulo.core.client.ZooKeeperInstance.<init>(ZooKeeperInstance.java:159)
at org.apache.accumulo.core.client.ZooKeeperInstance.<init>(ZooKeeperInstance.java:140)
at org.apache.accumulo.core.client.mapreduce.RangeInputSplit.getInstance(RangeInputSplit.java:364)
at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat$AbstractRecordReader.initialize(AbstractInputFormat.java:495)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I checked that ZooKeeper was running, and I used telnet to confirm that the port was open.
I am using $ACCUMULO_HOME/bin/tool.sh to run the job. Any help would be appreciated.

It was an issue with the hosts file on my Hadoop slave nodes. The hostname mappings there were not correct.
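
Because the stack trace comes from a map task (YarnChild), the lookup that matters is the one made on the worker nodes, not on the machine that submits the job. A minimal sketch of a check that could be run on each worker is shown here; it is not part of Accumulo or Hadoop, the class name ZkReachability and its helpers are hypothetical, and the default host and port are simply the ones from the exception above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical helper (not an Accumulo or Hadoop class): run it on each worker node.
public class ZkReachability {

    // Splits a "host:port" connect string such as "zookeeper.1:22181".
    static String[] splitHostPort(String connect) {
        int idx = connect.lastIndexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("Expected host:port but got: " + connect);
        }
        return new String[] { connect.substring(0, idx), connect.substring(idx + 1) };
    }

    // Returns the resolved IP address, or null if this node cannot resolve the name.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // Returns true if a TCP connection succeeds within timeoutMs milliseconds.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the host and port shown in the exception message.
        String connect = args.length > 0 ? args[0] : "zookeeper.1:22181";
        String[] hostPort = splitHostPort(connect);
        int port = Integer.parseInt(hostPort[1]);
        String ip = resolve(hostPort[0]);
        System.out.println(hostPort[0] + " resolves to " + (ip == null ? "<unresolved>" : ip));
        System.out.println("TCP connect ok: " + (ip != null && canConnect(hostPort[0], port, 3000)));
    }
}
```

If a worker prints an unresolved or unexpected address while the submitting machine resolves the name correctly, the hosts file on that worker is a likely place to look.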

Related

storm-twitter hdfsBolt hiveBolt failure on Kerberized cluster

I am trying to set up a storm-twitter stream on a cluster, using hdfsBolt and HiveBolt to flush data to disk and to a Hive table, with https://github.com/pvillard31/storm-twitter as a reference. I followed all the instructions for passing the keytabs and principal both inside the topology and in storm.yaml, as per https://github.com/apache/storm/blob/master/external/storm-hive/README.md, but I still get an error on both bolts.
For HDFSBolt I get:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
For HiveBolt I get:
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
I tried many different approaches from other posts to make Storm aware of the secure cluster, but it still seems to expect SIMPLE authentication.
My storm.yaml is as follows, where 'xyz' is the user that runs ZooKeeper, Nimbus, the supervisor, and the topology in a Docker container:
storm.zookeeper.servers:
- "localhost"
nimbus.host: "127.0.0.1"
nimbus.seeds: ["localhost"]
ui.port: 5555
logviewer.port: 5566
hive.keytab.file : "/home/user/.kt/xyz.keytab"
hive.kerberos.principal : "hive/_HOST@HADOOP.DOMAIN.ORG"
hive.metastore.uris : "thrift://<f.q.d.n>:9083"
hdfs.keytab.file : "/home/user/.kt/xyz.keytab"
hdfs.kerberos.principal : "xyz@HADOOP.DOMAIN.ORG"
topology.auto-credentials : ["org.apache.storm.hive.security.AutoHive", "org.apache.storm.hdfs.security.AutoHDFS"]
hiveCredentialsConfigKeys : ["hivecluster"]
"hivecluster": {"hive.keytab.file": "/home/user/storm/hive.keytab", "hive.kerberos.principal": "hive/_HOST@HADOOP.DOMAIN.ORG", "hive.metastore.uris": "thrift://<f.q.d.n>:9083"}
hdfsCredentialsConfigKeys : ["hdfscluster"]
"hdfscluster": {"hdfs.keytab.file": "/home/user/.kt/xyz.keytab", "hdfs.kerberos.principal": "xyz@HADOOP.DOMAIN.ORG"}
I also included the keytab info in the topology config:
Config config = new Config();
config.put(HdfsSecurityUtil.STORM_KEYTAB_FILE_KEY, "/home/user/.kt/xyz.keytab");
config.put(HdfsSecurityUtil.STORM_USER_NAME_KEY, "xyz@HADOOP.DOMAIN.ORG");
And I added the cluster XML files in hdfsBolt:
public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException {
this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
this.hdfsConfig.addResource(new Path("/etc/hive/conf/hive-site.xml"));
this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);
}
I built a shaded jar that includes everything except storm-core.
Any help would be appreciated.

java.net.ConnectException: Your endpoint configuration is wrong;

I am running a word count program from my Windows machine on a Hadoop cluster that is set up on a remote Linux machine.
The program runs successfully and I get the output, but I also get the following exception, and waitForCompletion(true) does not return true.
java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:430)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:870)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:331)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:328)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:328)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:612)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1629)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
at practiceHadoop.WordCount$1.run(WordCount.java:60)
at practiceHadoop.WordCount$1.run(WordCount.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at practiceHadoop.WordCount.main(WordCount.java:24)
Caused by: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
at org.apache.hadoop.ipc.Client.call(Client.java:1437)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy16.getJobReport(Unknown Source)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:326)
... 17 more
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:409)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1383)
... 26 more
My MapReduce program, which I run from Eclipse (Windows):
UserGroupInformation ugi = UserGroupInformation.createRemoteUser("admin");
ugi.doAs(new PrivilegedExceptionAction<Void>() {
public Void run() throws Exception {
try {
Configuration configuration = new Configuration();
configuration.set("yarn.resourcemanager.address", "192.168.33.75:50001"); // see step 3
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.app.mapreduce.am.env",
"HADOOP_MAPRED_HOME=/home/admin/hadoop-3.1.0");
configuration.set("mapreduce.map.env", "HADOOP_MAPRED_HOME=/home/admin/hadoop-3.1.0");
configuration.set("mapreduce.reduce.env", "HADOOP_MAPRED_HOME=/home/admin/hadoop-3.1.0");
configuration.set("fs.defaultFS", "hdfs://192.168.33.75:54310"); // see step 2
configuration.set("mapreduce.app-submission.cross-platform", "true");
configuration.set("mapred.remote.os", "Linux");
configuration.set("yarn.application.classpath",
"{{HADOOP_CONF_DIR}},{{HADOOP_COMMON_HOME}}/share/hadoop/common/*,{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*,"
+ " {{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*,{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*,"
+ "{{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/*,{{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/lib/*,"
+ "{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*,{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*");
configuration.set("mlv_construct", "min");
configuration.set("column_name", "TotalCost");
Job job = Job.getInstance(configuration);
job.setJar("C:\\Users\\gauravp\\Desktop\\WordCountProgam.jar");
job.setJarByClass(WordCount.class); // use this when uploaded the Jar to the server and
// running the job directly and locally on the server
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(MapForWordCount.class);
job.setReducerClass(ReduceForWordCount.class);
Path input = new Path("/user/admin/wordCountInput.txt");
Path output = new Path("/user/admin/output");
FileSystem fs = FileSystem.get(configuration);
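fs.delete(output, true); // recursively remove any previous output directory
FileInputFormat.addInputPath(job, input);
FileOutputFormat.setOutputPath(job, output);
if (job.waitForCompletion(true)) {
System.out.println("Job done...");
}
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
});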
One more observation:
My connections from the Windows machine to the ports on the remote Linux machine (54310 and 50001) vanish after some time.
[Screenshot: HDFS port connection status]
[Screenshot: YARN port connection status]
I have been stuck on this for the last 5 days. Please help me. Thanks in advance.

Check whether your ResourceManager and NodeManager services are up and running, using the jps command. In my case only the NameNode and DataNode services were up, and the two YARN services were not running. So when I ran an INSERT query on Hive and it tried to launch a MapReduce job, the job failed with the above error.
Starting the YARN services mentioned above fixed the issue for me.
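
The jps check can also be scripted. The sketch below takes the text printed by jps (one "pid name" pair per line) and reports which expected daemons are absent. The class name DaemonCheck, the sample PIDs, and the list of required daemons are assumptions for illustration; adjust the list to the roles a given node is meant to run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: finds expected daemons that are absent from jps output.
public class DaemonCheck {

    // Parses jps output lines of the form "<pid> <name>" and collects the names.
    static Set<String> runningNames(String jpsOutput) {
        Set<String> names = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                names.add(parts[1]);
            }
        }
        return names;
    }

    // Returns the required daemons that do not appear in the jps output, in the given order.
    static List<String> missing(String jpsOutput, List<String> required) {
        Set<String> running = runningNames(jpsOutput);
        List<String> result = new ArrayList<>();
        for (String name : required) {
            if (!running.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample resembling the situation in the answer: only HDFS daemons are up.
        String sample = "4821 NameNode\n4990 DataNode\n5300 Jps\n";
        List<String> required =
                Arrays.asList("NameNode", "DataNode", "ResourceManager", "NodeManager");
        System.out.println("Missing daemons: " + missing(sample, required));
    }
}
```

On a single-node setup all four daemons would normally appear on the same machine; on a multi-node cluster the ResourceManager and NodeManagers usually run on different hosts, so the required list differs per node.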

java.lang.NullPointerException: writeSupportClass should not be null while writing parquet file in a spark streaming job

In a Spark Streaming job, I am saving my RDD data to a Parquet file in HDFS using the code snippet below:
readyToSave.foreachRDD((VoidFunction<JavaPairRDD<Void, MyProtoRecord>>) rdd -> {
Configuration configuration = rdd.context().hadoopConfiguration();
Job job = Job.getInstance(configuration);
ParquetOutputFormat.setWriteSupportClass(job, ProtoWriteSupport.class);
ProtoParquetOutputFormat.setProtobufClass(job, MyProtoRecord.class);
rdd.saveAsNewAPIHadoopFile("path-to-hdfs", Void.class, MyProtoRecord.class, ParquetOutputFormat.class, configuration);
});
and I get the exception below:
java.lang.NullPointerException: writeSupportClass should not be null
at parquet.Preconditions.checkNotNull(Preconditions.java:38)
at parquet.hadoop.ParquetOutputFormat.getWriteSupport(ParquetOutputFormat.java:326)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:272)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
How can I solve the problem?

Found the problem!
When calling the saveAsNewAPIHadoopFile() method, you should pass your job's configuration (job.getConfiguration()):
readyToSave.foreachRDD((VoidFunction<JavaPairRDD<Void, MyProtoRecord>>) rdd -> {
Configuration configuration = rdd.context().hadoopConfiguration();
Job job = Job.getInstance(configuration);
ParquetOutputFormat.setWriteSupportClass(job, ProtoWriteSupport.class);
ProtoParquetOutputFormat.setProtobufClass(job, MyProtoRecord.class);
rdd.saveAsNewAPIHadoopFile("path-to-hdfs", Void.class, MyProtoRecord.class, ParquetOutputFormat.class, job.getConfiguration());
});
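
The reason the original call failed is that Job.getInstance(configuration) works on its own copy of the configuration, so the settings applied through the job never reach the configuration object that was originally passed to saveAsNewAPIHadoopFile(). The toy model below uses plain Java maps rather than Hadoop classes to illustrate that copy behaviour; ToyJob and the key name are hypothetical stand-ins, not real Hadoop or Parquet names.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a job that copies the configuration it is given.
// This is NOT Hadoop's Job class; it only models the copy-on-construct behaviour.
public class ToyJob {
    private final Map<String, String> conf;

    ToyJob(Map<String, String> original) {
        this.conf = new HashMap<>(original); // the job keeps its own copy
    }

    void set(String key, String value) {
        conf.put(key, value); // modifies only the job's copy
    }

    Map<String, String> getConfiguration() {
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> configuration = new HashMap<>();
        ToyJob job = new ToyJob(configuration);
        job.set("write.support.class", "ProtoWriteSupport"); // hypothetical key name

        // The original map never sees the setting; the job's copy does.
        System.out.println("original: " + configuration.get("write.support.class"));
        System.out.println("job copy: " + job.getConfiguration().get("write.support.class"));
    }
}
```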

org.apache.thrift.transport.TTransportException: Cannot open without port?

I am quite new to Hive, and I have a Java app that accesses Hive with Kerberos authentication, as shown below:
try
{
System.setProperty("java.security.krb5.conf", "/haManage/krb5.conf");
StringBuilder sBuilder = new StringBuilder();
sBuilder.append("jdbc:hive2://ha-cluster/default");
sBuilder.append(";zk.quorum=").append("x.x.x.x,x.x.x.x");//ip list
sBuilder.append(";zk.port=").append("24002");
if (isSecureVer) {
sBuilder.append(";user.principal=")
.append("hadoop@HADOOP.COM")
.append(";user.keytab=")
.append("/home/hdclient/gyj/user.keytab")
.append(";sasl.qop=auth-conf;auth=KERBEROS;principal=hive/" +
"hadoop.hadoop.com@HADOOP.COM;zk.principal=zookeeper/hadoop.hadoop.com");
}
url = sBuilder.toString();
logger.info(url);
Class.forName("org.apache.hive.jdbc.HiveDriver");
connToHive = DriverManager.getConnection(url,"","");
} catch (Exception e)
{
logger.error("Error occurs",e);
}
But an exception occurs, shown below:
Caused by: org.apache.thrift.transport.TTransportException: Cannot open without port.
at org.apache.thrift.transport.TSocket.open(TSocket.java:172) ~[hive-exec-0.14.0.jar:0.14.0]
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248) ~[hive-exec-0.14.0.jar:0.14.0]
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[hive-exec-0.14.0.jar:0.14.0]
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-exec-0.14.0.jar:0.14.0]
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-exec-0.14.0.jar:0.14.0]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_45]
at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_45]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) ~[hadoop-common-2.6.4.jar:na]
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-exec-0.14.0.jar:0.14.0]
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:190) ~[hive-jdbc-1.1.0.jar:1.1.0]
... 6 common frames omitted
Any help would be appreciated.

You have the ZooKeeper port specified as a query string parameter (needed for Kerberos auth), but you also need to give the Hive port after the hostname part of the URL. The port normally used by Hive is 10000, so your URL might start like this:
sBuilder.append("jdbc:hive2://ha-cluster:10000/default");
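
To catch this kind of mistake before opening a connection, the host part of the URL can be inspected for an explicit port. The sketch below relies only on java.net.URI and does not use or replicate the Hive driver's own URL parser; the class name HiveUrlCheck is hypothetical, and it assumes a URL of the jdbc:hive2://host[:port]/... shape with a single host, as in the question.

```java
import java.net.URI;

// Hypothetical pre-flight check; it does not use the Hive JDBC driver's parser.
public class HiveUrlCheck {

    // Returns the port written after the host, or -1 if the URL has none.
    static int explicitPort(String jdbcUrl) {
        if (!jdbcUrl.startsWith("jdbc:")) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        // Drop the "jdbc:" prefix so the rest parses as an ordinary hierarchical URI.
        URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
        return uri.getPort();
    }

    public static void main(String[] args) {
        String withoutPort = "jdbc:hive2://ha-cluster/default;zk.port=24002";
        String withPort = "jdbc:hive2://ha-cluster:10000/default;zk.port=24002";
        System.out.println(withoutPort + " -> port " + explicitPort(withoutPort));
        System.out.println(withPort + " -> port " + explicitPort(withPort));
    }
}
```

A result of -1 means the URL names no port after the host, which is the condition that leads to the "Cannot open without port" error in the trace above.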

FTP file name in FileInputFormat.setInputPath

I have code that reads data from an FTP server in a MapReduce job. The code we use to connect to the FTP server is as follows:
String inputPath = args[0];
String outputPath = args[1];
Configuration conf1 = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf1, args).getRemainingArgs();
Path arg = new Path(inputPath);
FTPFileSystem ftpfs = new FTPFileSystem();
Path arg1 =new Path(outputPath);
ftpfs.setConf(conf1);
String ftpUser = URLEncoder.encode("username", "UTF-8");
String ftpPass = URLEncoder.encode("password", "UTF-8");
String url = String.format("ftp://%s:%s@ftpserver.com",
ftpUser, ftpPass);
ftpfs.initialize(new URI(url), conf1);
JobConf conf = new JobConf(FTPIF.class);
FileOutputFormat.setOutputPath(conf, arg1);
FileInputFormat.setInputPaths(conf, ftpfs.makeQualified(arg));
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(NullWritable.class);
conf.setOutputFormat(TextOutputFormat.class);
conf.setInputFormat(CustomInputFormat.class);
conf.setMapperClass(CustomMap.class);
conf.setReducerClass(CustomReduce.class);
JobClient.runJob(conf);
The problem is that this code works perfectly fine in pseudo-distributed mode but gives a "Login failed on server" error when run on a cluster. The error stack trace is:
ERROR security.UserGroupInformation: PriviledgedActionException as:username (auth:SIMPLE) cause:java.io.IOException: Login failed on server - 0.0.0.0, port - 21
Exception in thread "main" java.io.IOException: Login failed on server - 0.0.0.0, port - 21
at org.apache.hadoop.fs.ftp.FTPFileSystem.connect(FTPFileSystem.java:133)
at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:389)
at org.apache.hadoop.fs.FileSystem.getFileStatus(FileSystem.java:2106)
at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1566)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1503)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:174)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1041)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1033)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
at FTPIF.run(FTPIF.java:164)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at FTPIF.main(FTPIF.java:169)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
The cluster has connectivity to the FTP server, and the credentials used are correct. Any ideas why the code is not able to connect to the FTP server?

If you have many nodes in your cluster and multiple mappers try to open connections to your FTP server at the same time, you can exceed the maximum number of concurrent users that the FTP server supports.
