Elassandra Node failing to start - amazon-ec2

We are building Elassandra Cluster on a Single AWS Region with the DC being spread over 3 AZ(s). We are using ec2snitch and facing the following exception while starting the 2nd node in the cluster. The 1st one comes up fine. Version used is Elassandra 5.5.0.24
2018-11-09 03:22:38,945 INFO [main] org.elassandra.discovery.CassandraDiscovery.doStart(CassandraDiscovery.java:152) localNode name=10.XX.YY.ZZZ id=<<UUID>> localAddress=/10.XX.YY.ZZZ publish_host=10.XX.YY.ZZZ:1234
{5.5.0}: Initialization Failed ...
- NullPointerException[null]
2018-11-09 03:22:38,949 ERROR [main] org.apache.cassandra.service.ElassandraDaemon.main(ElassandraDaemon.java:588) Exception
java.lang.NullPointerException: null
at org.elassandra.discovery.CassandraDiscovery.updateClusterGroupsFromGossiper(CassandraDiscovery.java:286)
at org.elassandra.discovery.CassandraDiscovery.doStart(CassandraDiscovery.java:172)
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:69)
at org.elasticsearch.node.Node.start(Node.java:757)
at org.apache.cassandra.service.ElassandraDaemon.activate(ElassandraDaemon.java:232)
at org.apache.cassandra.service.ElassandraDaemon.main(ElassandraDaemon.java:551)

Related

Expected hostname at index 7 for neo4j bolt (3.5.21)

We run neo4j (3.5.21) in an EC2 instance. Today, after I restarted the server, noticed this error:
Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start
Service start logs:
Active database: graph.db
Directories in use:
home: /var/lib/neo4j
config: /etc/neo4j
logs: /var/log/neo4j
plugins: /var/lib/neo4j/plugins
import: /var/lib/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/run/neo4j
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 22577). It is available at http://0.0.0.0:7474/
There may be a short delay until the server is ready.
See /var/log/neo4j/neo4j.log for current status.
This is what I see in neo4j.log:
2022-12-03 20:29:49.886+0000 INFO Bolt enabled on 0.0.0.0:7687.
2022-12-03 20:29:51.968+0000 INFO Started.
2022-12-03 20:29:52.121+0000 INFO Stopping...
2022-12-03 20:29:52.231+0000 INFO Stopped.
2022-12-03 20:29:52.233+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:124)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:91)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:180)
... 3 more
Caused by: org.neo4j.graphdb.config.InvalidSettingException: Unable to construct bolt discoverable URI using '' as hostname: Expected hostname at index 7: bolt://:7687
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:133)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.lambda$addBoltConnectorFromConfig$1(DiscoverableURIs.java:155)
at java.util.Optional.ifPresent(Optional.java:159)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.addBoltConnectorFromConfig(DiscoverableURIs.java:145)
at org.neo4j.server.rest.discovery.CommunityDiscoverableURIs.communityDiscoverableURIs(CommunityDiscoverableURIs.java:38)
at org.neo4j.server.CommunityNeoServer.lambda$createDBMSModule$0(CommunityNeoServer.java:99)
at org.neo4j.server.modules.DBMSModule.start(DBMSModule.java:59)
at org.neo4j.server.AbstractNeoServer.startModules(AbstractNeoServer.java:249)
at org.neo4j.server.AbstractNeoServer.access$700(AbstractNeoServer.java:102)
at org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter.start(AbstractNeoServer.java:541)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: java.net.URISyntaxException: Expected hostname at index 7: bolt://:7687
at java.net.URI$Parser.fail(URI.java:2847)
at java.net.URI$Parser.failExpecting(URI.java:2853)
at java.net.URI$Parser.parseHostname(URI.java:3389)
at java.net.URI$Parser.parseServer(URI.java:3235)
at java.net.URI$Parser.parseAuthority(URI.java:3154)
at java.net.URI$Parser.parseHierarchical(URI.java:3096)
at java.net.URI$Parser.parse(URI.java:3052)
at java.net.URI.<init>(URI.java:673)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:128)
... 15 more
2022-12-03 20:29:52.243+0000 INFO Neo4j Server shutdown initiated by request
EC2: t3.large
OS: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1092-aws x86_64)
I have already tried restarting the server, restarting the service multiple times without any success. We have not changed anything on the networking (vpc, subnet, security groups, network interface, etc)
Curious if there's a config I am missing. Any help will be much appreciated.

FileNotFoundException while executing spark job

I am trying to execute a program on Spark. I have a cluster with a master and two slave nodes. I am receiving the following error during execution.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 4.0 failed 4 times, most recent failure: Lost task 3.3 in stage 4.0 (TID 44, hadoopslave3): java.lang.RuntimeException: java.io.FileNotFoundException: File /home/ubuntu/hadoop/hadoop-te/dl4j/1485860107978_-4ccc8c8/0/data/dataset_4-4ccc8c8_68.bin does not exist
Driver stacktrace is as follows:
Driver stacktrace:
at og.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
17/01/31 10:56:08 INFO scheduler.TaskSetManager: Lost task 1.3 in stage 4.0 (TID 45) on executor hadoopslave3: java.lang.RuntimeException (java.io.FileNotFoundException: File /home/ubuntu/hadoop/hadoop-te/dl4j/1485860107978_-4ccc8c8/0/data/dataset_2-4ccc8c8_77.bin does not exist) [duplicate 3]
However, I can see all the dataset objects (.bin files) created on the HDFS. Any suggesstions ?
Since you have a cluster with "two slave nodes" set up: do you also have a Hadoop filesystem set up? If not, then that's your issue.
The example you're linking to, when using a non-local cluster, transfers references to datasets using Hadoop. The behavior of this example has been made more predictable (with a big error message) in release 0.8.0.

Spark NullPointerException on SQLListener.onTaskEnd while finishing task

I have a Spark application using Scala which perform series of transformation, then writing the result to parquet file.
The transformation part finished without problem, the result output is written to HDFS correctly. The application is running on top of YARN cluster of 30 nodes.
However, the Spark application itself will not complete and exit the YARN. It will remain in resource manager.
After hanging for about an hour (consuming resources and vcores), then either it finishes or throw an error and killed itself.
Here is the error log of the application. Appreciate if anyone can shed some light on this matter.
16/08/24 14:51:12 INFO impl.ContainerManagementProtocolProxy: Opening proxy : phhdpdn013x.company.com:8041
16/08/24 14:51:22 INFO cluster.YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (phhdpdn013x.company.com:54175) with ID 1
16/08/24 14:51:22 INFO storage.BlockManagerMasterEndpoint: Registering block manager phhdpdn013x.company.com:24700 with 2.1 GB RAM, BlockManagerId(1, phhdpdn013x.company.com, 24700)
16/08/24 14:51:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
16/08/24 14:51:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
16/08/24 15:11:00 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception
java.lang.NullPointerException
at org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
16/08/24 15:11:46 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception
java.lang.NullPointerException
aa
What is your version of Spark?
Your ERROR looks a lot like this issue
https://issues.apache.org/jira/browse/SPARK-12339

Exception thrown by Agent Spark Job Server

I am trying to run Spark Job server on a multi node server.
I have set master="yarn-client" on the namenode
When I run server_start.sh
I get an error
Error: Exception thrown by the agent : java.lang.NullPointerException
The error is not coming due to used port. Port 8090 is free

Remotely connect to spark on yarn cluster in client mode

I have a remote spark on yarn cluster that if I use rstudio server(web version) hosted on that cluster to connect in client mode I can do the following:
sc <- SparkR::sparkR.init(master = "yarn-client")
However if I try to use rstudio on my local machine to connect to that spark cluster the same way then I have errors:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master
...
ERROR Utils: Uncaught exception in thread nioEventLoopGroup-2-2
java.lang.NullPointerException
...
ERROR RBackendHandler: createSparkContext on org.apache.spark.api.r.RRDD failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
A more detailed error message on hadoop application tracking page is like this:
User: blueivy
Name: SparkR
Application Type: SPARK
Application Tags:
State: FAILED
FinalStatus: FAILED
Started: 27-Oct-2015 11:07:09
Elapsed: 4mins, 39sec
Tracking URL: History
Diagnostics:
Application application_1445628650748_0027 failed 2 times due to AM Container for appattempt_1445628650748_0027_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://master:8088/proxy/application_1445628650748_0027/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1445628650748_0027_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:618)
at java.lang.Thread.run(Thread.java:785)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
I have the same configurations and environment for hadoop and spark with remote cluster: spark 1.5.1, hadoop 2.6.0 and ubuntu 14.04. Anyone can help me find what's my mistake here?

Resources