In a Ceph FS setup, how do I order the backup MDS? - cluster-computing

I have a CephFS Octopus system running with two active metadata servers (MDS) and seven in standby for failover. The two active MDS run on more up-to-date machines with more RAM and CPU power, while the backup MDS are on older systems.
Of the backup MDS, one is preferred to take over (the reasons do not matter, only that it has good hardware capabilities). How can I set an order in which the backup daemons take over when an active MDS fails? Is there even such a possibility?
I found no options in the documentation and have been searching for a while now; the search results all point me to the general MDS setup.

What you could do (although it's not always recommended and depends on your actual use case) is to set allow_standby_replay to true. This assigns one standby daemon as a "hot standby" (standby-replay) follower to each active daemon, so in your case two standbys would be taken for that role. If those are not the ones you prefer, stop them and other standby daemons will take their place. Once your desired daemon is in the standby-replay role, you can start the others again.
If an active daemon crashes, its standby-replay daemon takes over. In the meantime you need to figure out why it crashed and bring it back online, at which point it becomes a regular standby again.
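As a minimal sketch (assuming your filesystem is named cephfs and the MDS daemons run as systemd services; the daemon name below is just an example), enabling standby-replay and nudging the preferred standby into place could look like this:

    # Enable standby-replay for the filesystem (assumes it is named "cephfs")
    ceph fs set cephfs allow_standby_replay true

    # Check which daemons are active, standby-replay, and plain standby
    ceph fs status cephfs

    # If an undesired daemon became the standby-replay follower, stop it on its
    # host so another standby takes over, then start it again afterwards
    systemctl stop ceph-mds@mds-old-1    # example daemon name; use the one shown by "ceph fs status"
    systemctl start ceph-mds@mds-old-1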

Related

How to reconfigure a non-HA HDFS cluster to HA with minimal/no downtime?

I have a single namenode HDFS cluster with multiple datanodes that store many terabytes of data. I want to enable high availability on that cluster and add another namenode. What is the most efficient and least error-prone way to achieve that? Ideally that would work without any downtime or with a simple restart.
The two options that came to mind are:
Edit the configuration of the namenode to facilitate the HA features and restart it. Afterwards add the second namenode and reconfigure and restart the datanodes, so that they are aware that the cluster is HA now.
Create an identical cluster in terms of data, but with two namenodes. Then migrate the data from the old datanodes to the new datanodes and finally adjust the pointers of all HDFS clients.
The first approach seems easier, but requires some downtime and I am not sure whether it is even possible. The second one is somewhat cleaner, but there are potential problems with the data migration and the pointer adjustments.
You won't be able to do this in-place without any downtime; a non-HA setup is exactly that, not highly available, so any code/configuration change requires downtime.
To incur the least amount of downtime while doing this in-place, you probably want to:
Set up configurations for an HA setup. This includes things like a shared edits directory or journal nodes - see https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html or https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html.
Create a new fsImage using the hdfs dfsadmin command. This will ensure that the NameNode is able to restart quickly (on startup, the NN will read the most recent fsImage, then apply all edits from the EditLog that were created after that fsImage).
Restart your current NameNode and put it into active mode.
Start the new NameNode in standby.
Update configurations on DataNodes and restart them to apply.
Update configurations on other clients and restart to apply.
At this point everything will be HA-aware, and the only downtime incurred was a quick restart of the active NN - equivalent to what you would experience during any code/configuration change in a non-HA setup.
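A hedged command sketch of steps 2-4 (assuming the NameNode IDs are nn1 for the existing NameNode and nn2 for the new one, as defined under dfs.ha.namenodes.<nameservice> in your HA configuration):

    # Step 2: checkpoint the namespace so the NameNode can restart quickly
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave

    # Step 4: copy the active NameNode's metadata onto the new standby
    # (run on the new NameNode host after its HA configuration is in place)
    hdfs namenode -bootstrapStandby

    # Verify and, if needed, set which NameNode is active
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    hdfs haadmin -transitionToActive nn1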
Your second approach should work, but remember that you will need twice as much hardware, and maintaining consistency across the two clusters during the migration may be difficult.
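If you do go the second route, the bulk copy itself is usually done with DistCp; a hedged sketch (the cluster addresses are placeholders):

    # Copy everything from the old cluster to the new one; re-run with -update
    # to pick up changes made since the previous pass
    hadoop distcp -update hdfs://old-nn.example.com:8020/ hdfs://new-nn.example.com:8020/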

HDFS migrate datanodes servers to new servers

I want to migrate our Hadoop servers, with all the data and components, to new servers (a newer version of Red Hat).
I saw a post on the Cloudera site about how to move the namenode,
but I don't know how to move all the datanodes without data loss.
We have a replication factor of 2.
If I shut down one datanode at a time, will HDFS generate new replicas?
Is there a way to migrate all the datanodes at once? What is the correct way to transfer all the datanodes (about 20 servers) to a new cluster?
I also wanted to know whether HBase will have the same problem, or whether I can just delete the roles and add them on the new servers.
Update for clarification:
My Hadoop cluster already contains two sets of servers (they are in the same Hadoop cluster; I have just split them logically for the example):
The first set consists of servers running the older Linux version.
The second set consists of servers running the newer Linux version.
Both sets already share data and components (the namenode is in the old set of servers).
I want to remove the old set of servers so that only the new set remains in the Hadoop cluster.
Should the procedure be:
shut down one datanode (from the old server set)
run the balancer and wait for it to finish
do the same for the remaining datanodes
If so, each balancer run takes a lot of time and the whole migration will take a very long time.
The same problem applies to HBase:
the HBase RegionServers and Master are currently only on the old set of servers, and I want to remove them and install them on the new set of servers without data loss.
Thanks
New Datanodes can be freely added without touching the namenode. But you definitely shouldn't shut down more than one at a time.
As an example, if you pick two servers to shut down at random and both happen to hold the only two replicas of a block, that block becomes unavailable and cannot be re-replicated. Therefore, upgrade one node at a time if you're reusing the same hardware.
In an ideal scenario, your OS disk is separated from the HDFS disks. In which case, you can unmount them, upgrade the OS, reinstall HDFS service, remount the disks, and everything will work as previously. If that isn't how you have the server set up, you should do that before your next upgrade.
In order to get replicas added to any new datanodes, you'll need to either 1) increase the replication factor, or 2) run the HDFS balancer to ensure that the replicas are shuffled across the cluster.
I'm not too familiar with HBase, but I know you'll need to flush the regionservers before you install and migrate that service to other servers. But if you flush the majority of them without rebalancing the regions, you'll end up with one server holding all the data. I'm sure the master server has similar caveats, although hbase backup seems to be a command worth trying.
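A rough, hedged sketch of the flushing step from the HBase shell (the table name is a placeholder; flush and balance_switch are standard shell commands):

    # Flush a table's memstores to HFiles before moving regionservers
    # ("my_table" is a placeholder; repeat per table)
    echo "flush 'my_table'" | hbase shell

    # Optionally disable the balancer while regionservers are being moved,
    # then re-enable it afterwards
    echo "balance_switch false" | hbase shell
    echo "balance_switch true" | hbase shell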
#guylot - After adding the new nodes and running the balancer, take the old nodes out of the cluster by going through the decommissioning process. Decommissioning moves the data to other nodes in your cluster. As a precaution, only decommission one node at a time; this limits the potential for a data-loss incident.
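A hedged command sketch of that decommissioning flow (the exclude-file path and hostname below are only examples; the file is whatever dfs.hosts.exclude points at in hdfs-site.xml):

    # Add one old datanode to the exclude file and tell the NameNode to re-read it
    echo "old-datanode-01.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch the node move from "Decommission In Progress" to "Decommissioned"
    hdfs dfsadmin -report

    # Optionally even out block placement across the remaining datanodes
    hdfs balancer -threshold 10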

Manually start HDFS every time I boot?

Laconically: should I start HDFS every time I come back to the cluster after a power-off?
I have successfully created a Hadoop cluster (after losing some battles) and now I want to be very careful in proceeding with this.
Should I execute start-dfs.sh every time I power on the cluster, or is it ready to execute my application's code? The same applies to start-yarn.sh.
I am afraid that if I run it without everything being fine, it might leave garbage directories after execution.
Just from playing around with the Hortonworks and Cloudera sandboxes, I can say turning them on and off doesn't seem to demonstrate any "side-effects".
However, it is necessary to start the needed services every time the cluster starts.
As far as power cycling goes in a real cluster, it is recommended to stop the services running on the respective nodes before powering them down (stop-dfs.sh and stop-yarn.sh). That way there are no weird problems and any errors on the way to stopping the services will be properly logged on each node.
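A minimal sketch of that routine, assuming the standard Hadoop sbin scripts are on the PATH of the user that owns the cluster:

    # After powering the nodes on: start HDFS first, then YARN
    start-dfs.sh
    start-yarn.sh

    # Quick sanity check on each node that the expected daemons are running
    # (NameNode/DataNode, ResourceManager/NodeManager, etc.)
    jps

    # Before powering the cluster off: stop the services in reverse order
    stop-yarn.sh
    stop-dfs.sh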

Postgresql slow - how best to begin tuning?

My PostgreSQL server seems to be intermittently going down. I have a PgBouncer pool in front of it, so the website hits are well managed, or at least they were until recently.
When I explore what's going on with the top command, I see the postmaster doing some CLUSTER. There's no CLUSTER command in any of my cron jobs, though. Is this what autovacuum is called these days?
How can I start to find out what's happening? What commands are the usual tricks in a PostgreSQL DBA's toolbox? I'm a bit new to this database and only looking for starting points.
Thank you!
No, autovacuum never runs CLUSTER. You have something on your system that's doing so - a daemon, a cron job, or whatever. Check the individual user crontabs.
CLUSTER takes an exclusive lock on the table. So that's probably why you think the system is "going down" - all queries that access this table will wait for the CLUSTER to complete.
The other common cause of intermittent issues people report is checkpoints taking a long time on slow disks. You can enable checkpoint logging to see if that's an issue. There's lots of existing info on dealing with checkpoint performance issues, so I won't repeat it here.
The other key tools are:
The pg_stat_activity and pg_locks views
pg_stat_statements
The PostgreSQL logs, with a useful log_line_prefix, log_checkpoints enabled, a log_min_duration_statement, etc
auto_explain
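A hedged starting point on the command line (assumes psql connects as a superuser to the affected database; the setting names are standard PostgreSQL):

    # Who is running CLUSTER right now, and what is waiting on locks?
    psql -c "SELECT pid, usename, state, query FROM pg_stat_activity WHERE query ILIKE '%cluster%';"
    psql -c "SELECT locktype, relation::regclass, mode, granted, pid FROM pg_locks WHERE NOT granted;"

    # Look for a scheduled CLUSTER in system-wide and per-user crontabs
    grep -ri cluster /etc/cron* /var/spool/cron 2>/dev/null

    # Turn on checkpoint and slow-statement logging (a reload is enough, no restart)
    psql -c "ALTER SYSTEM SET log_checkpoints = on;"
    psql -c "ALTER SYSTEM SET log_min_duration_statement = '1s';"
    psql -c "SELECT pg_reload_conf();"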

How to configure Hadoop 2.2 to fit this situation?

I have installed Hadoop 2.2 on four machines. They are:
namenodes: master1,master2
datanodes: slave1,slave2
master1 is installed on my notebook, and I want to shut down the notebook when I sleep.
When master1 is in the active state, master2 is in the standby state. When I shut down my notebook, will the Hadoop cluster automatically switch the active namenode to master2?
I don't know if I understand the purpose of Hadoop v2's multiple namenodes correctly. Does the feature fit the situation described above? Thanks.
You need to use the NameNode High Availability feature with automatic failover. When the failover controller detects the active NameNode going offline, the standby NameNode will automatically take over.
There will be a brief hiccup in the operation of the cluster, however, as the newly active NameNode will delay responding to block location requests for a short time (usually 30 to 60 seconds), giving all datanodes enough time to report to the new NameNode.
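A hedged sketch of what enabling automatic failover involves on Hadoop 2.x, assuming the HA properties (dfs.nameservices, dfs.ha.namenodes.*, dfs.ha.automatic-failover.enabled=true, ha.zookeeper.quorum) are already set in hdfs-site.xml/core-site.xml and the NameNode IDs are master1 and master2:

    # One-time: initialise the failover state znode in ZooKeeper (run on one NameNode)
    hdfs zkfc -formatZK

    # Start a ZooKeeper Failover Controller alongside each NameNode (Hadoop 2.x daemon script)
    hadoop-daemon.sh start zkfc

    # Verify which NameNode is currently active and which is standby
    hdfs haadmin -getServiceState master1
    hdfs haadmin -getServiceState master2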
