Can Minio repair the damaged node? - minio

I built a 4-node MinIO cluster. After running for some time, one node was accidentally deleted. Since I brought the node back online, no data has been uploaded to it; only the other 3 nodes are working.
Is there some way to repair the damaged node and make it work?

As far as I can tell, the current setup forces you to first heal the node and then the bucket. That means if you have something like:
minio.exe --config-dir c:\data_config server --address ":8001" d:\node1 e:\node2 f:\node3 g:\node4
And if disk f: goes bad, you replace that drive and then, assuming you've set up local as an alias, you have to run:
// this will bring your disk up to speed
mc admin heal local
And then:
// this will bring your data up to speed
mc admin heal local/bucket
where bucket is the name of the bucket you want to sync.
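For reference, if local has not been set up as an alias yet, a minimal sketch would be (the access key and secret key below are placeholders, and older mc releases use mc config host add instead of mc alias set):
mc alias set local http://127.0.0.1:8001 YOUR_ACCESS_KEY YOUR_SECRET_KEY
After that, mc admin heal local and mc admin heal local/bucket address the server and the bucket through that alias.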

Yes, it can, using the mc admin heal command. It is currently available as a beta feature - it is a work in progress and should be fully ready in future releases.

Related

Problem with ShadowCopy, error 0x80042306

I have a problem with the Shadow Copy. Specifically, when I try to set up a Shadow Copy of a given volume, error 0x80042306 appears.
Additionally, there is no possibility to choose a Shadow Copy on the same volume; I simply cannot select my own partition to perform the copy on the same volume.
The second issue is that the partition to which the error pertains is part of a larger disk. We have a 30TB disk and expanded it by creating a new 70TB partition, and the error is related to this second one. Other disks perform correctly. The entire disk is on a disk array.
To preempt the question, all other backup applications have been removed and no other applications are using VSS.
There are only two Microsoft providers in the registry.
I would be grateful for any information.
Best regards,
We have uninstalled all backup applications.
We have tried to set up Shadow Copy on other disks/partitions.

How to proceed with a data node with corrupt disk file system

I would really appreciate help on the correct course of action. The setup is 3 ELK nodes, each of which has all roles.
No shard replication is done. Node 3 experienced a failure on the disk which contains the data folder. An old copy (about a month old) of that folder exists, and I know it would not be sufficient to simply copy that data back in.
My question is, what is the correct course of action at this point which would return the stack to normal operation mode:
install a new disk and just launch the node? By a stroke of luck, that was our least important data.
install the new disk, copy in the old data, and see if it can recover that data?
Also, would it work to do option 1 while launching an experimental node with the old data folder mounted, restoring whatever data is recoverable, and re-indexing it remotely into the original cluster?
Another option is to try to use the bin/elasticsearch-shard tool to see if you can repair part of the data.
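For instance, a rough sketch of that approach (the index name and shard number are hypothetical; the tool has to be run on the node that holds the shard's data, with that node shut down, and it discards the corrupted segments, so the data in them is lost):
bin/elasticsearch-shard remove-corrupted-data --index myindex --shard-id 0
Note that the remove-corrupted-data command is only available in newer Elasticsearch releases, so check whether your version ships it.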

Datadog monitoring Disk usage

I want to use Datadog to monitor my EC2 instance's disk utilization and create alerts for it. I am using the system.disk.in_use metric, but I am not getting my root mount point in the "from" section of the query - for example avg:system.disk.in_use{device:/dev/loop0} by {host} - and my root mount point is /dev/root. I can see every loop mount point in the list but can't see the root. Because of this, the data I am getting in the monitor differs from the actual server: for example, df -hT shows root at 99% on the server, but Datadog monitoring shows 60%.
I am not too familiar with how to use datadog, can someone please help?
I tried to research it but was not able to resolve the issue.
You can also try to use the device label to read in only the root volume such as:
avg:system.disk.in_use{device_label:/} by {host}
I personally found the metric system.disk.in_use to just equal the total, so instead I added a formula that calculates the utilization from system.disk.total and system.disk.free, which was more accurate.
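As a sketch, assuming the root volume reports itself under device:/dev/root (adjust the tag filter to whatever your host actually reports), such a formula could look like:
(avg:system.disk.total{device:/dev/root} by {host} - avg:system.disk.free{device:/dev/root} by {host}) / avg:system.disk.total{device:/dev/root} by {host} * 100
This computes used space as total minus free and expresses it as a percentage per host.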

Expanding root partition on AWS EC2

I created a public VPC and then added a bunch of nodes to it so that I can use it for a spark cluster. Unfortunately, all of them have a partition setup that looks like the following:
ec2-user@sparkslave1: lsblk
/dev/xvda 100G
/dev/xvda1 5.7G /
I set up a cloud manager on top of these machines, and all of the nodes only have 1G left for HDFS. How do I extend the partition so that it takes up all of the 100G?
I tried creating /dev/xvda2, then created a volume group and added all of /dev/xvda* to it, but /dev/xvda1 doesn't get added because it's mounted. I cannot boot from a live CD in this case, as it's on AWS. I also tried resize2fs, but it says that the root partition already takes up all of the available blocks, so it cannot be resized. How do I solve this problem, and how do I avoid it in the future?
Thanks!
I don't think you can just resize the running root volume. This is how you'd go about increasing the root size:
create a snapshot of your current root volume
create a new volume from this snapshot of the size that you want (100G?)
stop the instance
detach the old small volume
attach the new bigger volume
start instance
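Roughly, the same steps with the AWS CLI (all IDs, the size, and the availability zone below are placeholders to adapt):
# snapshot the current root volume
aws ec2 create-snapshot --volume-id vol-OLDVOLUME --description "root before resize"
# create a bigger volume from that snapshot, in the instance's availability zone
aws ec2 create-volume --snapshot-id snap-FROMABOVE --size 100 --availability-zone us-east-1a
# swap the volumes with the instance stopped
aws ec2 stop-instances --instance-ids i-INSTANCE
aws ec2 detach-volume --volume-id vol-OLDVOLUME
aws ec2 attach-volume --volume-id vol-NEWVOLUME --instance-id i-INSTANCE --device /dev/xvda
aws ec2 start-instances --instance-ids i-INSTANCE
After the instance is back up, the partition/filesystem may still need to be grown to use the extra space.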
I had the same problem before, but I can't remember the exact solution. Did you try running:
e2resize /dev/xvda1
This applies when you're using ext3, which is usually the default. The e2resize command will "grow" the ext3 filesystem to use the remaining free space.
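For completeness, a hedged alternative sketch: the lsblk output above shows /dev/xvda1 at only 5.7G on a 100G disk, so the partition itself has to be enlarged before a filesystem resize can do anything. Assuming the image ships the cloud-utils growpart tool, that would look roughly like:
# grow partition 1 of /dev/xvda to fill the disk, then grow the filesystem on it
sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1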

Storing mapreduce intermediate output on a remote server

I use a hadoop (version 1.2.0) cluster of 16 nodes, one with a public IP (the master) and 15 connected through a private network (the slaves).
Is it possible to use a remote server (in addition to these 16 nodes) for storing the output of the mappers? The problem is that the nodes are running out of disk space during the map phase and I cannot compress map output any more.
I know that mapred.local.dir in mapred-site.xml is used to set a comma-separated list of dirs where the tmp files are stored. Ideally, I would like to have one local dir (the default one) and one directory on the remote server. When the local disk fills up, I would like to use the remote disk.
I am not very sure about this, but as per the link (http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml), it says:
The local directory is a directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
Also there are some other properties which you should check out. These might be of help:
mapreduce.tasktracker.local.dir.minspacestart: If the space in mapreduce.cluster.local.dir drops under this, do not ask for more tasks. Value in bytes
mapreduce.tasktracker.local.dir.minspacekill: If the space in mapreduce.cluster.local.dir drops under this, do not ask more tasks until all the current ones have finished and cleaned up. Also, to save the rest of the tasks we have running, kill one of them, to clean up some space. Start with the reduce tasks, then go with the ones that have finished the least. Value in bytes.
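As an illustration, a hedged mapred-site.xml snippet for those two properties (the byte thresholds are arbitrary examples, and on Hadoop 1.x the older property names mapred.local.dir.minspacestart/minspacekill may be the ones that actually apply):
<property>
  <name>mapreduce.tasktracker.local.dir.minspacestart</name>
  <value>1073741824</value> <!-- example: stop accepting new tasks below 1 GB free -->
</property>
<property>
  <name>mapreduce.tasktracker.local.dir.minspacekill</name>
  <value>536870912</value> <!-- example: start killing tasks below 512 MB free -->
</property>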
The solution was to use iSCSI. A technician helped us out to achieve that, so unfortunately I am not able to provide more details on it.
We mounted the remote disk to a local path (/mnt/disk) on each slave node, and created a tmp directory there with rwx privileges for all users.
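As an illustration of that setup (the target IP and device name below are assumptions; your array's details will differ), the mount on each slave could look roughly like:
# discover and log in to the iSCSI target (IP is a placeholder)
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node --login
# mount the exported disk (device name is a placeholder) and create a world-writable tmp dir
mount /dev/sdb /mnt/disk
mkdir -p /mnt/disk/tmp
chmod 777 /mnt/disk/tmp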
Then, we changed the $HADOOP_HOME/conf/mapred-site.xml file and added the property:
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/disk/tmp</value>
</property>
Initially, we had two comma-separated values for that property, with the first being the default value, but it still didn't work as expected (we still got some "No space left on device" errors), so we left only one value there.
