Cassandra Snapshot and Restart - bash

Being a level 1 novice in Linux (Ubuntu 9), shell and cron, I've had some difficulty figuring this out. Each night, I'd like to take a snapshot of our Cassandra nodes and restart the process.
Why? Because our team is hunting down a memory leak that requires a process restart every 3 weeks or so. The root cause has been difficult to track down. In the meantime, I'd like to put these cron jobs in place to reduce service interruption.
Thanks in advance for anyone who has some of these already figured out!

The general procedure is:
Run nodetool drain (http://www.riptano.com/docs/0.6/utilities/nodetool#nodetool-drain) on the node
Run nodetool snapshot
Kill the cassandra process
Start the cassandra process
When running nodetool snapshot, it is very important that you have JNA set up and working. This includes:
Having jna.jar in Cassandra's lib directory and either:
Running Cassandra as root, or
Increasing the memory locking limit using 'ulimit -l' or something like /etc/security/limits.conf
If this is all correct, you should see a message about "mlockall" succeeding in the logs on startup.
The other thing to keep an eye on is your disk space usage; this will grow as compactions occur and the old SSTables are replaced (but their snapshots remain).

Related

In a Ceph FS setup, how do I order the backup MDS?

I have a CephFS Octopus system running with two active meta data servers (MDS) and seven in standby for any failures. The two active MDS run on more up-to-date machines with more RAM and CPU power, while the backup MDS are on older systems.
Of the backup MDS, one is preferred to take over (reasons do not matter, only that it has good hardware capabilities). How can I set an order in which the backup deamons take over when an active MDS fails? Is there even such a possibility?
I found no options in the documentation and have been searching for a while now already; the search results all link me to the general MDS setup.
What you could do (although it's not always recommended and depends on your actual use-case) is to set allow_standby_replay to true. This would assign two daemons as "hot standby" daemons for each of the active daemons. If those are not the ones you prefer, stop them and other daemons will take over. After your desired daemon is standby, you can start the other again.
If one active daemon crashes, the standby-replay daemon takes over. In the meantime you need to fix why it crashed, bring it back online and then it is a standby again.

Manually start HDFS every time I boot?

Laconically: Should I start HDFS every that I come back to the cluster after a power-off operation?
I have successfully created a Hadoop cluster (after loosing some battles) and now I want to be very careful on proceeding with this.
Should I execute start-dfs.sh every time I power on the cluster, or it's ready to execute my application's code? Same for start-yarn.sh.
I am afraid that if I run it without everything being fine, it might leave garbage directories after execution.
Just from playing around with the Hortonworks and Cloudera sandboxes, I can say turning them on and off doesn't seem to demonstrate any "side-effects".
However, it is necessary to start the needed services everytime the cluster starts.
As far as power cycling goes in a real cluster, it is recommended to stop the services running on the respective nodes before powering them down (stop-dfs.sh and stop-yarn.sh). That way there are no weird problems and any errors on the way to stopping the services will be properly logged on each node.

Elasticsearch Out of Memory Crash -- How to Delete Data?

Well, I started piping data into ES until it ran itself out of memory and crashed. I run free and i see that all memory is entirely used up.
I want to delete some data out of it (old data) but i can't query against localhost:9200, it rejects the connection.
How to fix the fact that i can't delete out the old data?
If you want to go hardcore about it, you can always delete anything in your data folder:
> rm $ES_HOME/data/<clustername>
Note: replace <clustername> with your real cluster name (the default is elasticsearch)
Stop indexing. If it stabilizes itself after few minute then try deleting the data again. Restart the cluster.
If it's still stuck, stop the indexing and restart the cluster.
In any case, if the nodes went OOM they need to be restarted, as the state the JVM is in is unknown.

SparkException: Master removed our application

I know there are other very similar questions on Stackoverflow but those either didn't get answered or didn't help me out. In contrast to those questions I put much more stack trace and log file information into this question. I hope that helps, although it made the question to become sorta long and ugly. I'm sorry.
Setup
I'm running a 9 node cluster on Amazon EC2 using m3.xlarge instances with DSE (DataStax Enterprise) version 4.6 installed. For each workload (Cassandra, Search and Analytics) 3 nodes are used. DSE 4.6 bundles Spark 1.1 and Cassandra 2.0.
Issue
The application (Spark/Shark-Shell) gets removed after ~3 minutes even if I do not run any query. Queries on small datasets run successful as long as they finish within ~3 minutes.
I would like to analyze much larger datasets. Therefore I need the application (shell) not to get removed after ~3 minutes.
Error description
On the Spark or Shark shell, after idling ~3 minutes or while executing (long-running) queries, Spark will eventually abort and give the following stack trace:
15/08/25 14:58:09 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask
This is not very helpful (to me), that's why I'm going to show you more log file information.
Error Details / Log Files
Master
From the master.log I think the interesing parts are
INFO 2015-08-25 09:19:59 org.apache.spark.deploy.master.DseSparkMaster: akka.tcp://sparkWorker#172.31.46.48:46715 got disassociated, removing it.
INFO 2015-08-25 09:19:59 org.apache.spark.deploy.master.DseSparkMaster: akka.tcp://sparkWorker#172.31.33.35:42136 got disassociated, removing it.
and
ERROR 2015-08-25 09:21:01 org.apache.spark.deploy.master.DseSparkMaster: Application Shark::ip-172-31-46-49 with ID app-20150825091745-0007 failed 10 times, removing it
INFO 2015-08-25 09:21:01 org.apache.spark.deploy.master.DseSparkMaster: Removing app app-20150825091745-0007
Why do the worker nodes get disassociated?
In case you need to see it, I attached the master's executor (ID 1) stdout as well. The executors stderr is empty. However, I think it shows nothing useful to tackle the issue.
On the Spark Master UI I verified to see all worker nodes to be ALIVE. The second screenshot shows the application details.
There is one executor spawned on the master instance while executors on the two worker nodes get respawned until the whole application is removed. Is that okay or does it indicate some issue? I think it might be related to the "(it) failed 10 times" error message from above.
Worker logs
Furthermore I can show you logs of the two Spark worker nodes. I removed most of the class path arguments to shorten the logs. Let me know if you need to see it. As each worker node spawns multiple executors I attached links to some (not all) executor stdout and stderr dumps. Dumps of the remaining executors look basically the same.
Worker I
worker.log
Executor (ID 10) stdout
Executor (ID 10) stderr
Worker II
worker.log
Executor (ID 3) stdout
Executor (ID 3) stderr
The executor dumps seem to indicate some issue with permission and/or timeout. But from the dumps I can't figure out any details.
Attempts
As mentioned above, there are some similar questions but none of those got answered or it didn't help me to solve the issue. Anyway, things I tried and verified are:
Opened port 2552. Nothing changes.
Increased spark.akka.askTimeout which results in the Spark/Shark app to live longer but eventually it still gets removed.
Ran the Spark shell locally with spark.master=local[4]. On the one hand this allowed me to run queries longer than ~3 minutes successfully, on the other hand it obviously doesn't take advantage of the distributed environment.
Summary
To sum up, one could say that the timeouts and the fact long-running queries are successfully executed in local mode all indicate some misconfiguration. Though I cannot be sure and I don't know how to fix it.
Any help would be very much appreciated.
Edit: Two of the Analytics and two of the Solr nodes were added after the initial setup of the cluster. Just in case that matters.
Edit (2): I was able to work around the issue described above by replacing the Analytics nodes with three freshly installed Analytics nodes. I can now run queries on much larger datasets without the shell being removed. I intend not to put this as an answer to the question as it is still unclear what is wrong with the three original Analytics nodes. However, as it is a cluster for testing purposes, it was okay to simply replace the nodes (after replacing the nodes I performed a nodetool rebuild -- Cassandra on each of the new nodes to recover their data from the Cassandra datacenter).
As mentioned in the attempts, the root cause is a timeout between the master node, and one or more workers.
Another thing to try: Verify that all workers are reachable by hostname from the master, either via dns or an entry in the /etc/hosts file.
In my case, the problem was that the cluster was running in an AWS subnet without DNS. The cluster grew over time by spinning up a node, the adding the node to the cluster. When the master was built, only a subset of the addresses in the cluster was known, and only that subset was added to the /etc/hosts file.
When dse spark was run from a "new" node, then communication from the master using the worker's hostname failed and the master killed the job.

Postgresql slow - how best to begin tuning?

My postgresql server seems to be intermittently going down. I have PgBouncer pool in front of it, so the website hits are well managed, or were until recently.
When I explore what's up with top command, I see the postmaster doing some CLUSTER. There's no cluster command in any of my cronjobs though. Is this what autovacuum is called these days?
How can I start to find out what's happening. What commands are the usual tricks in a PGSQL DBA's toolbox? I'm a bit new to this database, and only looking for starting points.
Thank you!
No, autovacuum never runs CLUSTER. You have something on your system that's doing so - daemon, cron job, or whatever. Check individual user crontabs.
CLUSTER takes an exclusive lock on the table. So that's probably why you think the system is "going down" - all queries that access this table will wait for the CLUSTER to complete.
The other common cause of people reporting intermittent issues is checkpoints taking a long time on slow disks. You can enable checkpoint logging to see if that's an issue. There's lots of existing info on ealing with checkpointing performance issues, so I won't repeat it here.
The other key tools are:
The pg_stat_activity and pg_locks views
pg_stat_statements
The PostgreSQL logs, with a useful log_line_prefix, log_checkpoints enabled, a log_min_duration_statement, etc
auto_explain

Resources