Find, unzip and grep the content of multiple files in one step/command - shell

First I asked a question here: Unzip a file and then display it in the console in one step
It works and helped me a lot. (please read)
Now I have a second issue. I do not have a single zipped log file; instead, I have many of them in different folders, which I need to find first. The files all have the same name. For example:
/somedir/server1/log.gz
/somedir/server2/log.gz
/somedir/server3/log.gz
and so on...
What I need is a way to:
find all the files like: find /somedir/server* -type f -name log.gz
unzip the files like: gunzip -c log.gz
use grep on the content of the files
Important! The whole thing should be done in one step.
I cannot first store the extracted files in the filesystem because it is a read-only filesystem. I need to somehow connect, with pipes, the output of one command to the input of the next.
Before, the log files were in text format (.txt), so I did not have to unzip them first. In that case it was easy:
ex.
find /somedir/server* -type f -name log.txt | xargs grep "term"
Now I have to deal with zipped files. That means, after I find the files, I first need to somehow unzip them and then send the contents to grep.
With one file I do:
gunzip -c /somedir/server1/log.gz | grep term
But for multiple files I don't know how to do it. For example, how do I pass the output of find to gunzip and then to grep?
Also, if there is another way / "best practice" to do this, it is welcome :)

find lets you invoke a command on the files it finds:
find /somedir/server* -type f -name log.gz -exec gunzip -c '{}' + | grep ...
From the man page:
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of
invocations of the command will be much less than the number
of matched files. The command line is built in much the same
way that xargs builds its command lines. Only one instance of
{} is allowed within the command, and (when find is being
invoked from a shell) it should be quoted (for example, '{}')
to protect it from interpretation by shells. The command is
executed in the starting directory. If any invocation with
the + form returns a non-zero value as exit status, then
find returns a non-zero exit status. If find encounters an
error, this can sometimes cause an immediate exit, so some
pending commands may not be run at all. This variant of -exec
always returns true.
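For example, to run the whole search in one step (here "term" stands in for whatever you are grepping for, as in the question):
find /somedir/server* -type f -name log.gz -exec gunzip -c '{}' + | grep "term"
If you also need to know which file each match came from, zgrep (where available) decompresses and greps in one tool, and -H forces the filename prefix:
find /somedir/server* -type f -name log.gz -exec zgrep -H "term" '{}' +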

Related

Create text document of every file in a directory recursively

Part of a script I currently use is using "ls -FCRlhLoprt" to list every file inside of a root directory recursively to a text document. The problem is, every time I run the script, ls includes that document in its output, so the text document grows each time I run it. I believe I can use -I or --ignore, but how can I use that when the ls invocation uses a few variables? I keep getting errors:
ls "$lsopt" "$masroot"/ >> "$masroot"/"$client"_"$jobnum"_"$mas"_drive_contents.txt . #this works
If I try:
ls -FCRlhLoprt --ignore=""$masroot"/"$client"_"$jobnum"_"$mas"_drive_contents.txt"" "$masroot"/ >> "$masroot"/"$client"_"$jobnum"_"$mas"_drive_contents.txt #this does not work
I get errors. I basically want the output file not to be included the next time I run this command.
Additional, all I am trying to do is create an easy to read document of every file inside of a directory recursively. If there is a better way, please let me know.
To list every file in a directory recursively, the find command does exactly what you want, and admits further programmatic manipulation of the files found if you wish.
Examples:
To list every file under the current directory, recursively:
find ./ -type f
To list files under /etc/ and /usr/share, showing their owners and permissions:
find /etc /usr/share -type f -printf "%-100p %#m %10u %10g\n"
To show line counts of all files recursively, but ignoring subdirectories of .git:
find ./ -type f ! -regex ".*\.git.*" -exec wc -l {} +
To search under $masroot but ignore files generated by past searches, and dump the results into a file:
find "$masroot" -type f ! -regex ".*/[a-zA-Z]+_[0-9]+_.+_drive_contents.txt" | tee "$masroot/${client}_${jobnum}_${mas}_drive_contents.txt"
(Some of that might be slightly different on a Mac. For more information see man find.)
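If you prefer to match the output file by its exact path rather than a regex, here is a sketch of the same idea using ! -path and plain redirection:
find "$masroot" -type f ! -path "$masroot/${client}_${jobnum}_${mas}_drive_contents.txt" > "$masroot/${client}_${jobnum}_${mas}_drive_contents.txt"
The shell creates the output file before find starts walking, so the ! -path test is what keeps it out of its own listing.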

Linux command to copy recently created/updated files?

I want to copy recently created/updated files to another folder. Say, for example, files created in the last 3 days should be copied to another folder (/tmp). How do I do that? Is it possible?
You can use the find command's -mtime argument to find files that were last modified within a certain time, and then use its -exec argument to copy them somewhere.
For example, this command will find files modified within three days in your current directory and copy them to your /tmp directory:
find . -mtime -3 -type f -exec cp "{}" /tmp \;
-mtime n
    File's data was last modified n*24 hours ago. See the comments for
    -atime to understand how rounding affects the interpretation of
    file modification times.
-exec command ;
    Execute command; true if 0 status is returned. All following
    arguments to find are taken to be arguments to the command until
    an argument consisting of ';' is encountered. The string '{}' is
    replaced by the current file name being processed everywhere it
    occurs in the arguments to the command, not just in arguments
    where it is alone, as in some versions of find. Both of these
    constructions might need to be escaped (with a '\') or quoted to
    protect them from expansion by the shell. See the EXAMPLES section
    for examples of the use of the -exec option. The specified command
    is run once for each matched file. The command is executed in the
    starting directory. There are unavoidable security problems
    surrounding use of the -exec action; you should use the -execdir
    option instead.
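Following that last note from the man page, a safer variant of the same command (a sketch; the effect is identical, but the command runs from each file's own directory) would be:
find . -mtime -3 -type f -execdir cp "{}" /tmp/ \;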

mv Bash Shell Command (on Mac) overwriting files even with a -i?

I am flattening a directory of nested folders/picture files down to a single folder. I want to move all of the nested files up to the root level.
There are 3,381 files (no directories included in the count). I calculate this number using these two commands and subtracting the directory count (the second command):
find ./ | wc -l
find ./ -type d | wc -l
To flatten, I use this command:
find ./ -mindepth 2 -exec mv -i -v '{}' . \;
Problem is that when I get a count after running the flatten command, my count is off by 46. After going through the list of files before and after (I have a backup), I found that the mv command is overwriting files sometimes even though I'm using -i.
Here's details from the log for one of these files being overwritten...
.//Vacation/CIMG1075.JPG -> ./CIMG1075.JPG
..more log
..more log
..more log
.//dog pics/CIMG1075.JPG -> ./CIMG1075.JPG
So I can see that it is overwriting. I thought -i was supposed to stop this. I also tried -n and got the same count. Note, I do have about 150 duplicate filenames; I was going to rename those manually after I had flattened everything I could.
Is it a timing issue?
Is there a way to resolve?
NOTE: it is prompting me that some of the files are overwrites. On those prompts I just press Enter so as not to overwrite. In the case above, there is no prompt. It just overwrites.
As it turns out, the manual entry states it clearly:
The -n and -v options are non-standard and their use in scripts is not recommended.
In other words, you should mimic the -n option yourself. To do that, just check if the file exists and act accordingly. In a shell script where the file is supplied as the first argument, this could be done as follows:
[ -f "${1##*/}" ]
The first argument is a path whose directory components can be stripped with ##*/; for example, if $1 is .//Vacation/CIMG1075.JPG, then ${1##*/} expands to CIMG1075.JPG. Now simply chain in the mv using ||, since we want it to run only when the file doesn't exist.
[ -f "${1##*/}" ] || mv "$1" .
Using this, you can edit your find command as follows:
find ./ -mindepth 2 -exec bash -c '[ -f "${0##*/}" ] || mv "$0" .' '{}' \;
Note that we now use $0 because of the bash -c usage. Its first argument becomes $0, which can't be the script name here because there is no script file. This means the argument order is shifted with respect to a usual shell script.
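If you would rather keep both files than skip the move (the question mentions renaming the ~150 duplicates by hand anyway), here is a minimal sketch of the same bash -c pattern that appends a numeric suffix on collisions instead:
find ./ -mindepth 2 -type f -exec bash -c '
  base=${0##*/}                 # filename without directory components
  target=$base
  n=1
  while [ -e "$target" ]; do    # on collision, try name.1, name.2, ...
    target="$base.$n"
    n=$((n+1))
  done
  mv "$0" "$target"
' '{}' \;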
Why not check if the file exists prior to moving? Then you can leave the file where it is, or you can rename it, or do something else...
test -f or [ ] should do the trick.
I am on a tablet and cannot easily include the source.

What does this command actually do? Is it correct?

I found this command line when I was checking a Bash script! My question is: what does this command do, and is it correct?
find / -name "*.src" | xargs cp ~/Desktop/Log.txt
It finds the files or directories with the .src extension in / and copies the file ~/Desktop/Log.txt to the filenames from the find results.
for example if the output of the find command is
file.src
directory1.src
file2.src
the xargs command will execute cp ~/Desktop/Log.txt file.src directory1.src file2.src, which does not make any sense.
What the command is
find / -name "*.src"
Explanation: Recursively find all regular files, directories, and symlinks in /, for which the filename ends in .src
|
Explanation: Redirect stdout from the command on the left side of the pipe to stdin of the command on the right side
xargs cp ~/Desktop/Log.txt
Explanation: Build a cp command, taking arguments from stdin and appending them as a space-delimited list at the end of the command. If the buffer space of xargs (generally bounded by ARG_MAX) is exhausted, multiple cp commands will be executed,
e.g. with inputs a1 ... a900, this could be processed as cp a1 ... a500; cp a501 ... a900
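You can observe this batching directly with the -n option of xargs, which caps the number of arguments per invocation (echo stands in for cp here so nothing is actually copied):
printf '%s\n' a b c d e | xargs -n 2 echo cp
which prints:
cp a b
cp c d
cp e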
Note: the behavior of cp varies a lot depending on its arguments
If ~/Desktop/Log.txt is the only argument, cp will fail with a usage error on stderr
If the last argument to cp is a directory
All preceding arguments that are regular files will be copied into it
For all preceding arguments that are directories, nothing will happen except an error printed to stderr
If the last argument is a regular file
If there are 2 total arguments to cp and the first one is a regular file, then the contents of the second argument file will be overwritten by the contents of the first argument
If there are more than 2 total arguments, cp will fail with an error on stderr
So all in all, there are too many variables here for the behavior of your command to really ever be precisely defined. As such, I suspect you wanted to do something else.
What it probably should have been
My guess is the author of the command probably wanted to redirect stdout to a log file (note the log file will be overwritten each time you run the command)
find / -name "*.src" > ~/Desktop/Log.txt
Additionally, if you are just looking for regular files with the .src extension, you should also add the -type f option to find
find / -type f -name "*.src" > ~/Desktop/Log.txt
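If, on the other hand, the intent really was to copy the found files somewhere, the argument order would have to be reversed. A sketch assuming GNU cp (whose -t option names the target directory up front) and a hypothetical destination directory ~/src-backup:
find / -type f -name "*.src" -exec cp -t ~/src-backup '{}' +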
It finds every file or directory that matches the pattern *.src starting from /.
Then it copies the file ~/Desktop/Log.txt to every result of the previous command.

Understand pipe and redirection command

I want to understand the real power of the pipe and redirection operators. As I understand it, | takes the output of one command as the input of the next, and > redirects output. If that is so,
find . -name "*.swp" | rm
find . -name "*.swp" > rm
why are these commands not working as expected? To me, the above commands mean:
Find all files recursively whose extension is .swp in the current directory.
Take the output of 1. and remove all of the resulting files.
FYI, yes, I know how to accomplish this task. It can be done by passing the -exec flag:
find . -name "*.swp" -exec rm -rf {} \;
But as I already mentioned, I want to accomplish it with > or |.
If I am wrong and going in the wrong direction, please correct me and explain redirection and pipes. Where are they used? Please don't give the simple book examples; I have read all of those. Try to explain something more involved.
I'll break this down by the three methods you have shown:
> will redirect all output from find into a file named rm (will not delete anything, because you're just writing the file list to a file).
| will pipe output from find into the rm command (will not work, because rm does not read on stdin)
-exec rm -rf {} \; will run rm -rf on each item ({}) that find finds (will work, because it passes the files as argument to rm).
You will want to use the -exec flag, or pipe into the xargs command (man xargs), rather than | or > alone, in order to achieve the desired behavior.
EDIT: as #dmckee said, you can also use the $() operator for command substitution, i.e.: rm -rf $(find . -name "*.swp") (this will fail if you have a large number of files, due to argument length limits).
> simply redirects to a file named rm.
Piping via | to rm doesn't work because rm doesn't expect filenames via STDIN.
So you have to use xargs, which passes values from STDIN as arguments:
find . -name "*.swp"|xargs rm
This is dangerous because a filename may contain characters your shell considers field separators ($IFS).
So, you use:
find . -name "*.swp" -print0|xargs -0 rm
Which causes find to print the filenames \0-separated to STDOUT and xargs to read the filenames \0-separated and pass them as arguments to rm.
Of course, the easiest way to achieve this would have been:
rm **/*.swp
assuming you use bash with the globstar option enabled (shopt -s globstar; without it, ** behaves like a single *).
You should take some time and read about the basics of shell redirection again :) I think this is a good document: http://wiki.bash-hackers.org/howto/redirection_tutorial
I'll try to explain what went wrong for you:
find . -name "*.swp" | rm
This command redirects the find results, i.e. the stdout of find, to the stdin of the program rm. However, rm does not read on stdin (this is something you can read in the documentation of rm). rm is controlled via command line arguments, not via stdin. I think there is no way to make rm read from stdin at all. That's why nothing is deleted.
find . -name "*.swp" > rm
This command redirects newline-delimited find results (stdout of find) to a file called 'rm'. Again, nothing is deleted :)
Basically, the <, >, >>, &>, &>> operators perform redirection from/to a file that actually exists in the file system. The pipe | redirects the standard output of one command to the standard input of another command. Simply put, there are no files involved here. However, this approach only makes sense if the program to the left of the pipe actually writes something to stdout and the program to the right of the pipe reads from stdin, and both programs understand each other, i.e. the reading program (the consumer) understands the output of the feeding program.
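A minimal illustration of the difference, with printf and wc standing in for arbitrary producer and consumer commands:
printf 'a\nb\n' > list.txt        # redirection: writes the output into the file list.txt
wc -l < list.txt                  # reads from the file; prints 2
printf 'a\nb\n' | wc -l           # pipe: same result, but no file on disk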
Redirection creates a file. So your >rm example just creates a file named ./rm into which the output of your command is saved.
Pipes are essentially a shorthand. one | two is like one >tmp; two <tmp except without the (explicit) temporary file.
Of course, rm doesn't read file names from standard input, so cmd | rm is basically useless (apart from situations where the pipeline continues with yet another command which does something with the input which rm didn't read). If you want that, there's xargs.
find . -name "*.swp" | xargs rm
