Fix boot disk critical message - boot

I got this message from WHM.
Do I have to fix it? Thank you.
The filesystem “/dev/sda1”, which is mounted at “/boot”, has reached “critical” status because it is 95.63% full.
My disk is:
Device       Size  Used  Available  Percent Used  Mount Point
/dev/sda1     99M   90M       4.4M           96%  /boot
/dev/sda2    451G   63G       366G           15%  /
/usr/tmpDSK  4.0G  435M       3.4G           12%  /tmp

The message just tells you that the disk partition /dev/sda1 is 95.63% full. This partition is mounted at /boot, and its contents rarely change: /boot usually only holds the kernel images, the initramfs images, and the bootloader files, nothing more. Therefore it isn't very critical in my eyes.
You can also list the folder's contents yourself; if the GRUB bootloader is used, it should look something like this:
drwxr-xr-x 6 root root     4096 Mar  6 09:25 grub
-rw-r--r-- 1 root root 18777509 Jun 19 19:45 initramfs-linux-fallback.img
-rw-r--r-- 1 root root  3577244 Jun 19 19:45 initramfs-linux.img
-rw-r--r-- 1 root root  4517872 Jun  8 08:42 vmlinuz-linux
In case you are using GRUB (otherwise there might be a different directory): the /boot/grub folder contains the current GRUB configuration (which menu entries are displayed, the menu timeout, ...), which is generated from your /etc/default/grub configuration. The tools update-grub/grub-mkconfig have to be called after changing that configuration in order to update the /boot/grub folder.
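For example (a minimal sketch; update-grub is just the Debian/Ubuntu wrapper, on other distributions you call grub-mkconfig directly and the output path may differ):
# regenerate /boot/grub/grub.cfg after editing /etc/default/grub
grub-mkconfig -o /boot/grub/grub.cfg
# or, on Debian/Ubuntu:
update-grub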
The /boot/initramfs-*.img files are the initial ramdisk images, which contain the modules loaded at startup. They can be modified by editing the /etc/mkinitcpio.conf configuration file and running mkinitcpio with the appropriate parameters afterwards. The fallback image should always stay there in case something goes wrong with the normal boot. The vmlinuz-linux file is basically the kernel executable.
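As a sketch, on a setup that uses mkinitcpio with the stock kernel, the images above would typically be rebuilt via the kernel's preset (adjust the preset name to your kernel):
# rebuilds initramfs-linux.img and initramfs-linux-fallback.img
mkinitcpio -p linux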

To fix the issue, you can move old kernel files to another partition so that you don't run into disk space problems on /boot. For example, create a /Boot-backup directory on the root partition and move the old files there, as sketched below.
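A rough sketch of that approach, assuming the old images follow the vmlinuz-*/initramfs-* naming scheme; <old-version> is a placeholder, never move the kernel and initramfs you are currently booting from (check uname -r first), and on a package-managed system removing old kernel packages is usually the cleaner fix:
uname -r                                  # note the running kernel version
mkdir -p /Boot-backup                     # backup directory on the large / partition
mv /boot/vmlinuz-<old-version> /Boot-backup/
mv /boot/initramfs-<old-version>.img /Boot-backup/
df -h /boot                               # verify the freed space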

Related

Is it right to limit /tmp retention to one day in a Hadoop cluster?

We have an HDP cluster, version 2.6.4.
The cluster is installed on Red Hat machines, version 7.2.
We noticed the following issue on the JournalNode machines (the master machines).
We have 3 JournalNode machines, and under the /tmp folder we have thousands of empty folders such as
drwx------. 2 hive hadoop 6 Dec 20 09:00 a962c02e-4ed8-48a0-b4bb-79c76133c3ca_resources
and also a lot of folders such as
drwxr-xr-x. 4 hive hadoop 4096 Dec 12 09:02 hadoop-unjar6426565859280369566
with content such as
beeline-log4j.properties BeeLine.properties META-INF org sql-keywords.properties
/tmp should be purged every 10 days according to the configuration file:
more /usr/lib/tmpfiles.d/tmp.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
# See tmpfiles.d(5) for details
# Clear tmp directories separately, to make them easier to override
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d
# Exclude namespace mountpoints created with PrivateTmp=yes
x /tmp/systemd-private-%b-*
X /tmp/systemd-private-%b-*/tmp
x /var/tmp/systemd-private-%b-*
X /var/tmp/systemd-private-%b-*/tmp
So we decreased the retention to 1d instead of 10d in order to avoid this issue.
After that, /tmp indeed only holds one day's worth of folders.
But I want to ask the following questions:
Is it OK to configure the /tmp retention in a Hadoop cluster to 1 day?
(I'm almost sure it is, but I want to hear more opinions.)
Second: why does Hive generate thousands of empty folders like XXXX_resources,
and is it possible to solve this from the Hive service itself, instead of limiting the retention on /tmp?
It is quite normal to have thousands of folders in /tmp as long as there is still enough free space for normal operation. Many processes use /tmp, including Hive, Pig, etc. A one-day retention period for /tmp may be too small, because Hive or other MapReduce jobs can run for more than one day, though that depends on your workload. HiveServer should remove its temp files, but when tasks fail or are aborted the files may remain; this also depends on the Hive version. It is better to configure some retention, because when there is no space left in /tmp, everything stops working.
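If you do change the retention, it is cleaner to override the vendor file with a copy under /etc/tmpfiles.d (a file with the same name there takes precedence over the one in /usr/lib/tmpfiles.d) instead of editing the packaged file. A sketch, using a 3-day age purely as an example value:
cp /usr/lib/tmpfiles.d/tmp.conf /etc/tmpfiles.d/tmp.conf
# then change the /tmp line in /etc/tmpfiles.d/tmp.conf, e.g.:
# v /tmp 1777 root root 3d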
Read also this Jira about HDFS scratch dir retention.

SPARK_HOME/work filling worker nodes disk

I'm running Spark jobs on a standalone cluster (created with spark-ec2, version 1.5.1) from crontab, and my worker nodes are getting hammered by the app files that each job creates.
java.io.IOException: Failed to create directory /root/spark/work/app-<app#>
I've looked at http://spark.apache.org/docs/latest/spark-standalone.html and changed my spark-env.sh (located in spark/conf on the master and worker nodes) to reflect the following:
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=3600"
Am I doing something wrong? I've added the line to the end of each spark-env.sh file on the master and both workers.
On a possibly related note, what are these mounts pointing to? I would like to use them, but I don't want to do so blindly.
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 8256952 8256952 0 100% /
tmpfs 3816808 0 3816808 0% /dev/shm
/dev/xvdb 433455904 1252884 410184716 1% /mnt
/dev/xvdf 433455904 203080 411234520 1% /mnt2
Seems like a 1.5.1 issue - I'm no longer using the spark-ec2 script to spin up the cluster. I ended up creating a cron job to clear out the directory, as mentioned in my comment.
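For reference, a hedged sketch of such a cron job, assuming the work directory is /root/spark/work as in the error above and that deleting per-application directories older than one day is acceptable for your jobs:
# crontab entry on each worker: at 03:00 remove application work
# directories that have not been modified for more than a day
0 3 * * * find /root/spark/work -mindepth 1 -maxdepth 1 -type d -mtime +1 -exec rm -rf {} +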

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- (on Windows)

I am running Spark on Windows 7. When I use Hive, I see the following error
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
The permissions are set as the following
C:\tmp>ls -la
total 20
drwxr-xr-x 1 ADMIN Administ 0 Dec 10 13:06 .
drwxr-xr-x 1 ADMIN Administ 28672 Dec 10 09:53 ..
drwxr-xr-x 2 ADMIN Administ 0 Dec 10 12:22 hive
I have set "full control" to all users from Windows->properties->security->Advanced.
But I still see the same error.
I have checked a bunch of links; some say this is a bug in Spark 1.5?
First of all, make sure you are using the correct winutils for your OS. The next step is permissions.
On Windows, you need to run the following command in cmd:
D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
I hope you have already downloaded winutils and set the HADOOP_HOME variable.
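For completeness, a sketch of that environment setup in cmd; D:\winutils is just the location assumed in the command above, so adjust it to wherever your winutils.exe actually lives:
rem set for the current cmd session (use setx to persist across sessions)
set HADOOP_HOME=D:\winutils
set PATH=%PATH%;%HADOOP_HOME%\bin
winutils.exe chmod 777 D:\tmp\hive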
First things first, check your computer's domain. Try
c:\work\hadoop-2.2\bin\winutils.exe ls c:/tmp/hive
If this command reports access denied or FindFileOwnerAndPermission error (1789): The trust relationship between this workstation and the primary domain failed,
it means your computer's domain controller is not reachable. A possible reason is that you are not on the same VPN as your domain controller. Connect to the VPN and try again.
Then try the solution provided by Viktor or Nishu.
You need to set this directory's permissions on HDFS, not on your local filesystem. /tmp doesn't mean C:\tmp unless you set fs.defaultFS in core-site.xml to file://c:/, which is probably a bad idea.
Check it using
hdfs dfs -ls /tmp
Set it using
hdfs dfs -chmod 777 /tmp/hive
The next solution worked on Windows for me:
First, I defined HADOOP_HOME. It is described in detail here.
Next, I did the same as Nishu Tayal, but with one difference: C:\temp\hadoop\bin\winutils.exe chmod 777 \tmp\hive
\tmp\hive is not a local directory
Error while starting the spark-shell on a VM running on Windows:
Error msg: The root scratch dir: /tmp/hive on HDFS should be writable. Permission denied
Solution:
/tmp/hive is a temporary directory; only temporary files are kept in this location. There is no problem even if we delete this directory, as it will be recreated with the proper permissions when required.
Step 1) In HDFS, remove the /tmp/hive directory ==> hdfs dfs -rm -r /tmp/hive
Step 2) At the OS level too, delete the /tmp/hive directory ==> rm -rf /tmp/hive
After this, I started the spark-shell and it worked fine.
This is a simple 4-step process:
For Spark 2.0+:
Download Hadoop for Windows / Winutils
Add this to your code (before SparkSession initialization):
if (System.getProperty("os.name").toLowerCase().contains("windows")) {
    System.setProperty("hadoop.home.dir", "C:/Users//winutils-master/hadoop-2.7.1");
}
Add this to your SparkSession builder (you can change it to C:/Temp instead of Desktop):
.config("hive.exec.scratchdir","C:/Users//Desktop/tmphive")
Open cmd.exe and run:
"path\to\hadoop-2.7.1\bin\winutils.exe" chmod 777 C:\Users\\Desktop\tmphive
The main reason is that you started Spark from the wrong directory. Please create the folder D:\tmp\hive (give it full permissions) and start Spark from the D: drive:
D:\> spark-shell
Now it will work. :)
Can you please try giving 777 permissions to the /tmp/hive folder? I think Spark runs as an anonymous user (which falls into the "other" user category), and the permissions need to be recursive.
I had the same issue with Spark 1.5.1 for Hive, and it worked after setting 777 permissions with the command below on Linux:
chmod -R 777 /tmp/hive
There is a Spark Jira bug for this, which was resolved a few days back. Here is the link:
https://issues.apache.org/jira/browse/SPARK-10528
The comments cover all the options, but there is no guaranteed solution.
The issue is resolved in Spark 2.0.2 (Nov 14, 2016); use that version.
The 2.1.0 release (Dec 28, 2016) has the same issue.
Use the latest version of "winutils.exe" and try. https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe
I also faced this issue. It is related to the network. I installed Spark on Windows 7 while on a particular domain.
The domain name can be checked under
Start -> Computer -> Right click -> Properties -> Computer name, domain and workgroup settings -> Change -> Computer Name (tab) -> Change -> Domain name.
When I run the spark-shell command on that domain, it works fine, without any error.
On other networks I received the write permission error.
To avoid this error, run the Spark command while on the domain shown in the path above.
I was getting the same error "The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-" on Windows 7. Here is what I did to fix the issue:
I had installed Spark under C:\Program Files (x86)..., and it was looking for /tmp/hive under C:, i.e. C:\tmp\hive.
I downloaded winutils.exe from https://github.com/steveloughran/winutils. I chose the same version as the Hadoop package I had chosen when installing Spark, i.e. hadoop-2.7.1.
(You can find it under the bin folder, i.e. https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin)
I then used the following command to make the C:\tmp\hive folder writable:
winutils.exe chmod 777 \tmp\hive
Note: With a previous version of winutils, the chmod command also set the required permissions without error, but Spark still complained that the /tmp/hive folder was not writable.
Using the correct version of winutils.exe did the trick for me: winutils should come from the version of Hadoop that Spark has been pre-built for.
Set the HADOOP_HOME environment variable to the directory whose bin folder contains winutils.exe. I stored winutils.exe alongside the C:\Spark\bin files, so now my SPARK_HOME and HADOOP_HOME both point to the same location, C:\Spark.
Now that winutils is on the path, give permissions to the hive folder using winutils.exe chmod 777 C:\tmp\hive
You don't have to fix the permissions of the /tmp/hive directory yourself (as some of the answers suggest); winutils can do that for you. Download the appropriate version of winutils from https://github.com/steveloughran/winutils and move it to Spark's bin directory (e.g. C:\opt\spark\spark-2.2.0-bin-hadoop2.6\bin). That will fix it.
I was running a Spark test from IDEA, and in my case the issue was the wrong winutils.exe version. I think you need to match it with your Hadoop version. You can find winutils.exe here
/*
Spark and hive on windows environment
Error: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
Pre-requisites: Have winutils.exe placed in c:\winutils\bin\
Resolve as follows:
*/
C:\user>c:\Winutils\bin\winutils.exe ls
FindFileOwnerAndPermission error (1789): The trust relationship between this workstation and the primary domain failed.
// Make sure you are connected to the domain controller, in my case I had to connect using VPN
C:\user>c:\Winutils\bin\winutils.exe ls c:\user\hive
drwx------ 1 BUILTIN\Administrators PANTAIHQ\Domain Users 0 Aug 30 2017 c:\user\hive
C:\user>c:\Winutils\bin\winutils.exe chmod 777 c:\user\hive
C:\user>c:\Winutils\bin\winutils.exe ls c:\user\hive
drwxrwxrwx 1 BUILTIN\Administrators PANTAIHQ\Domain Users 0 Aug 30 2017 c:\user\hive

Restarting Amazon EMR cluster

I'm using Amazon EMR (Hadoop2 / AMI version:3.3.1) and I would like to change the default configuration (for example replication factor). In order for the change to take effect I need to restart the cluster and that's where my problems start.
How do I do that? The script I found at ./.versions/2.4.0/sbin/stop-dfs.sh doesn't work, and the slaves file ./.versions/2.4.0/etc/hadoop/slaves is empty anyway. There are some scripts in init.d:
$ ls -l /etc/init.d/hadoop-*
-rwxr-xr-x 1 root root 477 Nov 8 02:19 /etc/init.d/hadoop-datanode
-rwxr-xr-x 1 root root 788 Nov 8 02:19 /etc/init.d/hadoop-httpfs
-rwxr-xr-x 1 root root 481 Nov 8 02:19 /etc/init.d/hadoop-jobtracker
-rwxr-xr-x 1 root root 477 Nov 8 02:19 /etc/init.d/hadoop-namenode
-rwxr-xr-x 1 root root 1632 Oct 27 21:12 /etc/init.d/hadoop-state-pusher-control
-rwxr-xr-x 1 root root 484 Nov 8 02:19 /etc/init.d/hadoop-tasktracker
but if I, for example, stop the namenode, something starts it again immediately. I looked for documentation; Amazon provides a 600-page user guide, but it is more about how to use the cluster and not so much about maintenance.
EMR 3.x.x uses traditional SysV init scripts for managing services; ls /etc/init.d/ lists them. You can restart a service like so:
sudo service hadoop-namenode restart
But if I, for example, stop the namenode, something starts it again
immediately.
However, EMR also has a process called service-nanny that monitors the Hadoop-related services and ensures all of them are always running. This is the mystery process that brings the namenode back.
So, to truly restart a service, you need to stop service-nanny for a while and then restart/stop the necessary processes. Once you bring service-nanny back, it will resume its job. You might run commands like:
sudo service service-nanny stop
sudo service hadoop-namenode restart
sudo service service-nanny start
Note that this behavior is different on the 4.x.x and 5.x.x AMIs, where Upstart is used to stop/start applications and service-nanny no longer brings applications back.

Hadoop HDFS - Cannot give +x permission to files

So, I used Cloudera's installation and management tool to get a 3 node cluster of servers up and running.
I have HDFS running and can see / create directories etc.
I went ahead and installed the Fuse plugin which allows me to mount the HDFS as a file system. Everything works fine. I can write files to the folders etc.
Problem:
When I run 'chmod 777 ./file.sh' on the mounted drive, it doesn't give any errors, but when I do an 'ls -l' it only shows:
'-rw-rw-rw- 1 root nobody 26 Oct 5 08:57 run.sh'
When I run 'sudo -u hdfs hadoop fs -chmod 777 /run.sh' it still has the same permissions. No matter what I do, I cannot get execute permission on any file.
I have disabled permissions in Cloudera Manager and also chown'd the folder (and chmod -R 777 it as well), but nothing seems to work.
Any ideas?
It seems that adding "umask=000" to the fstab mount line did the trick (I also added exec for good measure).
Thanks!
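For reference, a sketch of what the resulting fstab entry might look like, assuming Cloudera's hadoop-fuse-dfs helper; the namenode hostname, port, and mount point below are placeholders, not values from the question:
# /etc/fstab - FUSE mount of HDFS with permissive umask and exec enabled
hadoop-fuse-dfs#dfs://namenode.example.com:8020 /mnt/hdfs fuse usetrash,rw,umask=000,exec 0 0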
