Spark/Hadoop/Yarn cluster communication requires external ip? - hadoop

I deployed Spark (1.3.1) with yarn-client on Hadoop (2.6) cluster using bdutil, by default, the instances are created with Ephemeral external ips, and so far spark works fine. With some security concerns, and assuming the cluster is internal accessed only, I removed the external ips from the instances; after that, the spark-shell will not even run, and seemed it cannot communicate with Yarn/Hadoop, and just stuck indefinitely. Only after I added the external ips back, the spark-shell starts working properly.
My question is, is external ips of the nodes required to run spark over yarn, and why? If yes, will there be any concerns regarding security, etc? Thanks!

Short Answer
You need external IP addresses to access GCS, and default bdutil settings set GCS as the default Hadoop filesystem, including for control files. Use ./bdutil -F hdfs ... deploy to use HDFS as the default instead.
Security shouldn't be a concern when using external IP addresses unless you've added too many permissive rules to your firewall rules in your GCE network config.
EDIT: At the moment there appears to be a bug where we set spark.eventLog.dir to a GCS path even if the default_fs is hdfs. I filed https://github.com/GoogleCloudPlatform/bdutil/issues/35 to track this. In the meantime just manually edit /home/hadoop/spark-install/conf/spark-defaults.conf on your master (you might need to sudo -u hadoop vim.tiny /home/hadoop/spark-install/conf/spark-defaults.conf to have edit permissions on it) to set spark.eventLog.dir to hdfs:///spark-eventlog-base or something else in HDFS, and run hadoop fs -mkdir -p hdfs:///spark-eventlog-base to get it working.
Long Answer
By default, bdutil also configures Google Cloud Storage as the "default Hadoop filesystem", which means that control files used by Spark and YARN require access to Google Cloud Storage. Additionally, external IPs are required in order to access Google Cloud Storage.
I did manage to partially repro your case after manually configuring intra-network SSH; during startup I actually see the following:
15/06/26 17:23:05 INFO yarn.Client: Preparing resources for our AM container
15/06/26 17:23:05 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.0-hadoop2
15/06/26 17:23:26 WARN http.HttpTransport: exception thrown while executing request
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:625)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:371)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:933)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getBucket(GoogleCloudStorageImpl.java:1557)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1512)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage.getItemInfo(CacheSupplementedGoogleCloudStorage.java:516)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1016)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.exists(GoogleCloudStorageFileSystem.java:382)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configureBuckets(GoogleHadoopFileSystemBase.java:1639)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.configureBuckets(GoogleHadoopFileSystem.java:71)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1587)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:776)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:739)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:216)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:384)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:102)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1016)
at $line3.$read$$iwC$$iwC.<init>(<console>:9)
at $line3.$read$$iwC.<init>(<console>:18)
at $line3.$read.<init>(<console>:20)
at $line3.$read$.<init>(<console>:24)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.<init>(<console>:7)
at $line3.$eval$.<clinit>(<console>)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:973)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:157)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As expected, simply by calling org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start it tries to contact Google Cloud Storage, and fails because there's not GCS access without external IPs.
To get around this, you can simply use -F hdfs when creating your cluster to use HDFS as your default filesystem; in that case everything should work intra-cluster even without external IP addresses. In that mode, you can still even continue to use GCS whenever you have external IP addresses assigned by specifying full gs://bucket/object paths as your Hadoop arguments. However, note that in that case, as long as you've removed the external IP addresses, you won't be able to use GCS unless you also configure a proxy server and funnal all data through your proxy; the GCS configs for that is fs.gs.proxy.address.
In general, there's no need to worry about security just because of having external IP addresses unless you've opened up new permissive rules in your "default" network firewall rules in Google Compute Engine.

Related

hdfs put fails from laptop to remote hadoop cluster

I have my hadoop cluster set up on a different network. Because of this, hdfs put is failing when I run it from my laptop.
Is there a port I should forward or something to access the datanodes remotely? I see it's using the local ip address in the error message.
Here is the command: hdfs dfs -put ~/Documents/reddit-streaming/redditStreaming/target/redditStreaming-1.0-SNAPSHOT.jar hdfs://mydns.asuscomm.com:8021/user/me/jars/
and here is the error message:
2021-10-14 18:04:55,704 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742036_1212
java.net.UnknownHostException
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1757)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
2021-10-14 18:04:55,708 WARN hdfs.DataStreamer: Abandoning BP-668799564-192.168.50.7-1633461871664:blk_1073742036_1212
2021-10-14 18:04:55,752 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.50.31:9866,DS-60974173-31d6-4dcb-a2ba-05ab6431db66,DISK]
2021-10-14 18:05:00,801 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742037_1213
java.net.UnknownHostException
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1757)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
2021-10-14 18:05:00,801 WARN hdfs.DataStreamer: Abandoning BP-668799564-192.168.50.7-1633461871664:blk_1073742037_1213
2021-10-14 18:05:00,833 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.50.19:9866,DS-aeaca5a1-562c-4f35-b2fb-6f0b51c5f695,DISK]
2021-10-14 18:05:00,869 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/me/jars/redditStreaming-1.0-SNAPSHOT.jar._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2329)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2942)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:915)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:593)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:600)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:568)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:552)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2966)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1573)
at org.apache.hadoop.ipc.Client.call(Client.java:1519)
at org.apache.hadoop.ipc.Client.call(Client.java:1416)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:530)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1084)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1898)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1700)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
I have this property in my hdfs-site.xml file on my laptop:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
I can also see in the UI that both datanodes are running.
I assume you've forwarded the namenode port (8021) since it can see that 2 datanodes exist?
Yes, the datanodes have their own ports that need to be available to the client for data to actually be written
Check the value for dfs.datanode.address and make sure you can establish a connection to the port listed there for each datanode.
If you look at the error, you can see this is 9866
Excluding datanode DatanodeInfoWithStorage[192.168.50.31:9866
And also, IIUC, the use.datanode.hostname config needs to actually be in the cluster, not your local laptop config, for the protocol to return the hostnames rather than the IPs
There is also an HTTP Port you can open if you want to see each Datanode's web-portal (should be available to be accessed from the Namenode UI as well)
The alternative, more secure / less exposed, option is to establish an edge-node between the networks that you can only SSH to & SFTP files into (assuming you don't otherwise have a shared fileserver), then run your hdfs commands from there. You can setup a SOCKS proxy if you needed to access a Web UI in that network
To re-iterate, you should not expose a Hadoop cluster without Kerberos & TLS over dynamic DNS through any internet-facing router

Setup multiple kafka connect sinks

I am working on streaming the data from postgreSQL to HDFS. I had setup confluent environment on HDP 2.6 sandbox. My jdbc source configs for postgreSQL are
name=jdbc_1
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:postgresql://host:port/db?currentSchema=schema&user=user&password=password
mode=timestamp
timestamp.column.name=col1
validate.non.null=false
topic.prefix=psql-
All other properties for connection are also fine and i am running it by
./bin/connect-standalone ./etc/kafka/connect-standalone.properties ./etc/kafka-connect-jdbc/source.properties
Its working fine and creating topics based on the number of tables in the database as
psql-table1
psql-table2
Now i want to run HDFS sinks on all the topics to create separate dir for every table in the postgreSQL database.
But when i run HDFS sink with command
./bin/connect-standalone ./etc/kafka/connect-standalone.properties ./etc/kafka-connect-hdfs/hdfs-postGres.properties
by running the source i am getting error
ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:113)
org.apache.kafka.connect.errors.ConnectException: Unable to start REST server
at org.apache.kafka.connect.runtime.rest.RestServer.start(RestServer.java:214)
at org.apache.kafka.connect.runtime.Connect.start(Connect.java:53)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:95)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:331)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:299)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:235)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:398)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.kafka.connect.runtime.rest.RestServer.start(RestServer.java:212)
... 2 more
and if i stop the source connection and start the sink it works fine.
Anyone can help me that how i can setup multiple sink connectors.
Kafka Connect starts a rest server on port 8083.
If you run more that one standalone connector on a single machine, you need to change it with the rest.port property
Or you can run connect-distributed, then POST your source and sink configurations individually as JSON payloads running on a single Connect server, then you wouldn't have this Address already in use issue.

How install alluxio1.2 on openstack

I try to install alluxio1.2 on a VM centos on openstack with spark and hdfs but the installation doesn't works. Spark and hdfs are already install and work
ERROR logger.type (AlluxioMaster.java:main) - Uncaught exception while running Alluxio master, stopping it and exiting.
java.lang.RuntimeException: java.net.BindException: Address already in use
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at alluxio.web.UIWebServer.startWebServer(UIWebServer.java:164)
at alluxio.master.AlluxioMaster.startServingWebServer(AlluxioMaster.java:467)
at alluxio.master.AlluxioMaster.startServing(AlluxioMaster.java:452)
at alluxio.master.AlluxioMaster.startServing(AlluxioMaster.java:447)
at alluxio.master.AlluxioMaster.start(AlluxioMaster.java:389)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:86)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at alluxio.web.UIWebServer.startWebServer(UIWebServer.java:154)
... 5 more
Are there a special installation to install alluxio on one openstack machine ?
It looks like the Alluxio master's web UI cannot start because the address is already in use. This happens if the port is taken by another process. Alluxio web UI uses the port 19999 for the web UI by default. If you expect another process to be using that port, you can change the Alluxio master web UI port by changing the configuration parameter (http://www.alluxio.org/docs/master/en/Configuration-Settings.html#master-configuration), alluxio.master.web.port, to another port number.

Giraph ZooKeeper port problems

I am trying to run the SimpleShortestPathsVertex (aka SimpleShortestPathComputation) example described in the Giraph Quick Start. I am running this on a Hortonworks Sandbox instance (HDP 2.1) using VirtualBox, and I packaged giraph.jar using profile hadoop_2.0.0.
When I try to run the example using
hadoop jar giraph.jar org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsVertex -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip
/user/hue/tinygraph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/hue/output/shortestpaths -w 1
I get the following exception
2014-04-30 07:22:15,390 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to sandbox.hortonworks.com:22181 with poll msecs = 3000
2014-04-30 07:22:15,396 WARN [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
I have found a work around - it seems that Giraph expects ZooKeeper to be running on port 22181, while it is actually running on 2181. I have simply used the Ambari interface to set ZooKeeper to run on 22181 (go to http://127.0.0.1:8080/, login admin/admin, Services tab, ZooKeeper and change the port to 22181, save and Service Actions -> Restart All.
Does anyone have a better solution for this problem? Is there a config via which the port should be specified, or is this port in the Giraph source code a typo?
Yes, you can specify each time you run a Giraph job via using option -Dgiraph.zkList=localhost:2181
Also you can set it up in Hadoop configs and then you don't have to pass on this option each time you submit a Giraph job. For that add the following line in conf/core-site.xml file :
<property><name>giraph.zkList</name><value>localhost:2181</value></property>
[Please check the syntax, I don't recall it on top my head and currently I don't have access to a cluster to check it]

Accumulo on Cloudera CDH4 - Access denied when starting components

I have a small cluster up and running with Cloudera CDH4 Hadoop and Map Reduce v1. Namenode/Secondary Namenode/Jobtracker all on different machines. My three servers are also acting as Zookeeper servers.
I'm trying to install Accumulo 1.4.4 on top of this cluster. I get the same behavior with Accumulo 1.5.0. I am able to bin/accumulo init and initialize Accumulo, but starting the individual components fail. I'm trying to make my Namenode the Accumulo master.
bin/start-server.sh localhost monitor spits out a very encouraging Starting monitor on localhost, but nothing gets started. If I examine logs/monitor_localhost.err I find a stacktrace:
-bash-4.1$ cat logs/monitor_localhost.err
Thread "monitor" died null
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.accumulo.start.Main$1.run(Main.java:91)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.ExceptionInInitializerError
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2464)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2456)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2323)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
at org.apache.accumulo.core.file.FileUtil.getFileSystem(FileUtil.java:554)
at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceIDFromHdfs(ZooKeeperInstance.java:258)
at org.apache.accumulo.server.conf.ZooConfiguration.getInstance(ZooConfiguration.java:65)
at org.apache.accumulo.server.conf.ServerConfiguration.getZooConfiguration(ServerConfiguration.java:49)
at org.apache.accumulo.server.conf.ServerConfiguration.getSystemConfiguration(ServerConfiguration.java:58)
at org.apache.accumulo.server.monitor.Monitor.run(Monitor.java:440)
at org.apache.accumulo.server.monitor.Monitor.main(Monitor.java:433)
... 6 more
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission accessDeclaredMembers)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:399)
at java.security.AccessController.checkPermission(AccessController.java:557)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.Class.checkMemberAccess(Class.java:2237)
at java.lang.Class.getDeclaredFields(Class.java:1805)
at org.apache.hadoop.util.ReflectionUtils.getDeclaredFieldsIncludingInherited(ReflectionUtils.java:315)
at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder.initRegistry(MetricsSourceBuilder.java:92)
at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder.<init>(MetricsSourceBuilder.java:56)
at org.apache.hadoop.metrics2.lib.MetricsAnnotations.newSourceBuilder(MetricsAnnotations.java:42)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:212)
at org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:54)
at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:97)
at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:190)
... 18 more
The AccessControlException: access denied looks like the important line to me, but I can't imagine what access is being restricted. I'm running everything as the hdfs user, which owns the entire /opt/accumulo-1.4.4/ directory where accumulo is un-tarred. The /accumulo directory in HDFS is also owned by the hdfs user. SELinux is permissive. Searching online has proved fruitless, has anyone dealt with this error before?
Much thanks.
I started browsing the Apache accumulo-users mailing list archive and came across the solution.
http://mail-archives.apache.org/mod_mbox/accumulo-user/201312.mbox/%3CB9CB2B2BF27F0F46B8ECF781831E00E710970A9F%400015-its-exmb10.us.saic.com%3E
I was copying the accumulo.policy.example to accumulo.policy because I thought I needed it in my configuration. Once I deleted the accumulo.policy file my issues went away and I've been able to stand up Accumulo (1.5.0 at least, 1.4.4 still has some issues for me)

Resources