Suppressing warnings for hadoop fs -get -p command - shell

I am copying a huge number of files using the hadoop fs -get -p command because I want to retain timestamps and ownerships. Many of the files cannot retain their ownership, as the user IDs are not available on the local machine, so for these files I get: "get: chown: changing ownership /a/b/c.txt: Operation not permitted".
Is it possible to suppress this error? Other issues might come up as well, and if I do 2>/dev/null it will suppress everything, so I don't want to use that option. Is there any way I can suppress ONLY the privilege-related issues?
Any hint would be really helpful.

Not very elegant, but functional: use grep -v your_undesired_pattern
hadoop fs -get -p command 2>&1 | grep -v "changing ownership"
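If you want the remaining errors to stay on stderr instead of being merged into stdout, a bash-specific sketch using process substitution would be (the source and destination paths are placeholders):
hadoop fs -get -p /source/path /local/destination 2> >(grep -v "changing ownership" >&2)
Normal output stays on stdout, and only the ownership messages are filtered out of the error stream.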

From the Hadoop side, no. The error is printed using System.err.println and is coming from the OS as the command execs chown.

Related

Bash Script Can't Write To Log Files

I've created a simple bash script that grabs some data and then outputs it to a log file. When I run the script without sudo it fails to write to the logs and says they are write-protected. It then asks me if it should un-write-protect them, but this fails (permission denied).
If I run the script as sudo it appears to work without issue. How can I make these log files available to the script?
cd /home/pi/scripts/powermonitor/
python /home/pi/powermonitor/plugpower.py > plug.log
echo -e "$(sed '1d' /home/pi/scripts/powermonitor/plug.log)\n" > plug.log
sed 's/^.\{139\}//' plug.log > plug1.log
rm plug.log
grep -o -E '[0-9]+' plug1.log > plug.log
rm plug1.log
sed -n '1p' plug.log > plug1.log
rm plug.log
perl -pe '
I was being dumb. I just needed to set the write permissions on the log files.
The ability to write a file depends on the file permissions that have been assigned to that file or, if the file does not exist but you want to create a new file, then the permissions on the directory in which you want to write the file. If you use sudo, then you are temporarily becoming the root user, and the root user can read/write/execute any file at all without restriction.
If you run your script first using sudo and the script ends up creating a file, that file is probably going to be owned by the root user and will not be writable by your typical user. If you run your script without using sudo, then it's going to run under the username you used to connect to the machine and that user will need to have permission to write the log files.
You can change the ownership and permissions of directories and files by using the chown, chmod, chgrp commands. If you want to always run your script as sudo, then you don't have much to worry about. If you want to run these commands without sudo, that means you're running them as some other user and you will need to grant write permission to that user, whoever it is, in order to write the files/folders where the log files get written.
For instance, if I wanted to run the script as user sneakyimp and wanted the files written to /home/sneakyimp/logs/ then I'd need to make sure that directory was writable by sneakyimp:
sudo chown -R sneakyimp:sneakyimp /home/sneakyimp/logs
This command changes ownership of that directory and its contents to the user sneakyimp. You might also need to run some chmod commands to make sure they are writable by owner.
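For example, assuming the logs live under /home/sneakyimp/logs as above (adjust the paths to your own setup), something like this should make the existing files writable by their owner:
chmod u+rwx /home/sneakyimp/logs        # owner may enter the directory and create files in it
chmod u+rw /home/sneakyimp/logs/*.log   # owner may read and write the existing log files
ls -l /home/sneakyimp/logs              # verify the owner and permission bits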

Rights of complex command (pipe)

I have a somewhat complex command using a pipe
python3 wlan.py -p taken | awk '{$10 = sprintf( "%.1f", $10 / 60); print $4 $6 $8 $10 ",min"}' | awk '{gsub(/,/," ");print}' >> /tmp/missed.log
and I get a permission error when this command is executed from a program, but not when it is run from the command line (with sudo). So obviously there is an issue with the rights of the program. I have set the permissions of python and awk to 777, to no avail. But the main question is: what are the rights of the >> redirection and how can I change them?
The error message is "writing missed.log - permission denied".
File access in a Unix-like environment is tied to who you are, not to what programs you run.* When you run sudo python3 ..., you are changing who you are to a more privileged user for the duration of the python3 command. Once Python stops running, you are back to your normal self. Imagine that sudo is Clark Kent taking off his glasses and putting on his cape. Once the bad guys have been defeated, Superman goes back to being an ordinary Joe.
Your error message indicates your normal user account does not have the necessary permissions to access / and /tmp, and to write /tmp/missed.log. The permissions on wlan.py and /usr/bin/python3 aren't the issue here. I can think of four options (best to worst):
Put the output file somewhere other than in /tmp. You should always be able to write to your home directory, so you should be able to run without sudo, with >> ~/missed.log instead of >> /tmp/missed.log (see the sketch after this list).
When you run your pipeline "from a program," as you said, just include the sudo as if you were running it from the command line. That way you get consistent results.
Add yourself to the group owning /tmp. Do stat -c '%G' /tmp. That will tell you which group owns /tmp. Then, if that group is not root, do sudo usermod -a -G <that group name> <your username>.
Change the permissions on /tmp. This is the bludgeon: possible, but not recommended. sudo rm -f /tmp/missed.log and sudo chmod o+rwx /tmp should make it work, but may open other vulnerabilities you don't want.
* Ignoring setuid, which doesn't seem to be the case here.
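As a quick sketch of the first option (the file name follows the question; the home-directory location is just a suggestion):
ls -ld /tmp /tmp/missed.log   # see which user and group may write the directory and any existing file
python3 wlan.py -p taken | awk '{$10 = sprintf("%.1f", $10 / 60); print $4 $6 $8 $10 ",min"}' | awk '{gsub(/,/," "); print}' >> ~/missed.log
Because the output lands under your own home directory, no sudo is needed for the redirection.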

File not found exception while starting Flume agent

I have installed Flume for the first time. I am using Hadoop 1.2.1 and Flume 1.6.0.
I tried setting up a Flume agent by following this guide.
I executed this command: $ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
It says log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: ./logs/flume.log (No such file or directory)
Isn't the flume.log file generated automatically? If not, how can I rectify this error?
Try this:
mkdir ./logs
sudo chown `whoami` ./logs
bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
The first line creates the logs directory in the current directory if it does not already exist. The second one sets the owner of that directory to the current user (you) so that flume-ng running as your user can write to it.
Finally, please note that this is not the recommended way to run Flume, just a quick hack to try it.
You are probably getting this error because you are running the command directly from the console; first go to Flume's bin directory and try running your command from there.
As @Botond says, you need to set the right permissions.
However, if you run Flume within a program, like supervisor or with a custom script, you might want to change the default path, as it's relative to the launcher.
This path is defined in your /path/to/apache-flume-1.6.0-bin/conf/log4j.properties. There you can change the line
flume.log.dir=./logs
to use an absolute path that you would like to use - you still need the right permissions, though.
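For example, the edited property might look like this (the absolute path below is just an illustration; pick any directory the Flume user can write to):
flume.log.dir=/var/log/flume
and then create that directory with matching ownership before starting the agent:
sudo mkdir -p /var/log/flume
sudo chown `whoami` /var/log/flume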

How to view FsImage/Edit Logs file in hadoop

I'm a beginner in Hadoop. I wanted to view the fsimage and edit logs in Hadoop. I have searched many blogs, but nothing is clear. Can anyone please tell me the step-by-step procedure for viewing the edit log/fsimage file in Hadoop?
My version: Apache Hadoop 1.2.1
My installation directory is /home/students/hadoop-1.2.1
I'm listing the steps I have tried, based on some blogs.
Ex.1. $ hdfs dfsadmin -fetchImage /tmp
Ex.2. hdfs oiv -i /tmp/fsimage_0000000000000001386 -o /tmp/fsimage.txt
Neither works for me.
It says that hdfs is not a directory or a file.
For the edit log, navigate to
/var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
then run
ls -l
to see the complete name of the edits file you want to extract; after that, run
hdfs oev -i editFileName -o /home/youraccount/Desktop/edits_your.xml -p XML
For the fsimage:
hdfs oiv -i fsimage -o /home/youraccount/Desktop/fsimage_your.xml
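Putting it together, a typical session might look like this (the transaction IDs in the file names are hypothetical; use whatever ls -l actually shows on your NameNode):
cd /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
ls -l
hdfs oev -i edits_0000000000000000001-0000000000000000042 -o /home/youraccount/Desktop/edits_your.xml -p XML
hdfs oiv -i fsimage_0000000000000000042 -o /home/youraccount/Desktop/fsimage_your.xml -p XML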
Go to the bin directory and try to execute the same commands there.

How to fix corrupt HDFS files

How does someone fix an HDFS that's corrupt? I looked on the Apache Hadoop website and it pointed to the fsck command, which doesn't fix it. Hopefully someone who has run into this problem before can tell me how to fix this.
Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures.
When I ran bin/hadoop fsck / -delete, it listed the files that were corrupt or had missing blocks. How do I make it not corrupt? This is on a practice machine so I COULD blow everything away, but when we go live I won't be able to "fix" it by blowing everything away, so I'm trying to figure it out now.
You can use
hdfs fsck /
to determine which files are having problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so I normally get down to the meaningful output with
hdfs fsck / | egrep -v '^\.+$' | grep -v eplica
which ignores lines with nothing but dots and lines talking about replication.
Once you find a file that is corrupt
hdfs fsck /path/to/corrupt/file -locations -blocks -files
Use that output to determine where blocks might live. If the file is larger than your block size, it might have multiple blocks.
You can use the reported block numbers to go around to the datanodes and the namenode logs, searching for the machine or machines on which the blocks lived. Try looking for filesystem errors on those machines: missing mount points, a datanode not running, a filesystem reformatted or reprovisioned. If you can find a problem that way and bring the block back online, that file will be healthy again.
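For example, once fsck has reported a block ID, something along these lines can help track it down (the block ID, log directory, and DataNode data directory are hypothetical; adjust them to your cluster layout):
grep -r "blk_1073741825" /var/log/hadoop-hdfs/          # look for mentions of the block in NameNode/DataNode logs
find /data/dfs/dn -name "blk_1073741825*" 2>/dev/null   # check whether a DataNode still has the block file on disk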
Lather, rinse, and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.
Once you determine what happened and you cannot recover any more blocks, just use the
hdfs dfs -rm /path/to/file/with/permanently/missing/blocks
command to get your HDFS filesystem back to healthy so you can start tracking new errors as they occur.
If you just want to get your HDFS back to a normal state and don't worry much about the data, then:
This will list the corrupt HDFS blocks:
hdfs fsck / -list-corruptfileblocks
This will delete the corrupted HDFS files:
hdfs fsck / -delete
Note that you might have to prefix these commands with sudo -u hdfs if you are not running them as the HDFS superuser (assuming "hdfs" is the name of that user).
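For example, on a cluster where the HDFS superuser is named hdfs (adjust if yours differs), the two commands above would be:
sudo -u hdfs hdfs fsck / -list-corruptfileblocks
sudo -u hdfs hdfs fsck / -delete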
The solution here worked for me: https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html
su - <$hdfs_user>
hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
Start all the daemons and run the command hadoop namenode -recover -force, then stop the daemons and start them again. Wait some time for the data to be recovered.
