limit number of results of find, head is not working [duplicate] - bash

This question already has answers here:
How do I limit the results of the command find in bash?
(2 answers)
Closed 5 years ago.
What can I use besides head to limit my find command's results to only 50?
find -L $line | head 50 | -exec ls {} > $a$(basename $line) \;
The above is not working. I want to limit the output because the task is taking a long time and degrading my computer's performance. :<

How about using xargs instead of -exec?
Check it out; it works well.
find / -type f | head -n5 | tr '\n' '\0' | xargs -0 ls
Your case:
find -L "$line" | head -n50 | tr '\n' '\0' | xargs -0 ls > "$a$(basename "$line")"

To handle filenames with <newline>, it's best to use a null delimiter:
find -L "$line" -print0 | head -z -n 50 | xargs -0 ls > "$a$(basename "$line")"

Related

bash command inside a function to delete all files except the recent 5

I have a delete-backup-files function that takes a directory name and a file pattern as arguments, so it can clean up backup files of a specific type in a specific directory. It is called like this: delete_old_backup_files $(dirname $abc) "$abc.*"
The function body is:
local fpath=$1
local fexpr=$2
# delete backup files older than a day
find $fpath -name "${fexpr##*/}" -mmin +1 -type f | xargs rm -f
It currently deletes files that are older than a day. Now I want to modify the function so that it deletes all backup files of type $abc.*, except the last 5 backup files created. I tried various commands using stat or -printf but couldn't succeed.
What is the correct way to complete this function?
Assuming the filenames do not contain newline characters, would you please
try:
delete_old_backup_files() {
    local fpath=$1
    local fexpr=$2
    find "$fpath" -type f -name "${fexpr##*/}" -printf "%T@\t%p\n" | sort -nr | tail -n +6 | cut -f2- | xargs -d '\n' rm -f --
}
-printf "%T#\t%p\n" prints the seconds since epoch (%T#) followed
by a tab character (\t) then the filename (%p) and a newline (\n).
sort -nr numerically sorts the lines in descending order (newer first,
older last).
tail -n +6 prints the 6th and following lines.
cut -f2- removes the prepended timestamp leaving the filename only.
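As a quick sanity check, you can run the same pipeline without the final xargs rm to see which files would be removed; the /var/backups path and abc.* pattern below are only placeholders for illustration:
# list the deletion candidates only; nothing is removed yet
find /var/backups -type f -name "abc.*" -printf "%T@\t%p\n" | sort -nr | tail -n +6 | cut -f2-
# then call the function for real
delete_old_backup_files /var/backups "abc.*"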
[Edit]
In the case of macOS, please try the following instead (not tested):
find "$fpath" -type f -print0 | xargs -0 stat -f "%m%t%N" | sort -nr | tail -n +6 | cut -f2- | xargs rm --
In the stat command, %m is expanded to the modification time (seconds since epoch), %t is replaced with a tab, and %N to be a filename.
I would use sorting instead of find. You can use ls -t
$ touch a b c
$ sleep 3
$ touch d e f
$ ls -t | tr ' ' '\n' | tail -n +4
a
b
c
$ ls -t | tr ' ' '\n' | tail -n +4 | xargs rm
$ ls
d e f
From man ls:
-t sort by modification time, newest first
Make sure you create backups before you delete stuff :-)
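If the file names might contain spaces, a slightly safer variant (assuming GNU xargs; names containing newlines will still break it) skips the tr step and splits on newlines only:
ls -t | tail -n +4 | xargs -r -d '\n' rm --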

How to display the tail of multiple files in bash

I am trying to monitor the progress of the newest five files and I want to see the last few lines of each of them.
I was able to get the newest five files using the command:
ls *.log -lt | head -5
but I want to iterate through these five files and display the last 10 lines of each file. I was wondering if it can be done in a single bash command instead of a loop, but if it can't, I would appreciate a bash loop implementation too.
tail can take multiple file names.
tail -n 10 $(ls -t *.log | head -5)
Add -F to monitor them continuously for changes.
If the file names might have spaces xargs will be more robust:
ls -t *.log | head -5 | xargs -d '\n' tail -n 10
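To combine this with the -F suggestion above and keep following the files, a possible one-liner (assuming GNU xargs and no newlines in the names) is:
ls -t *.log | head -5 | xargs -d '\n' tail -F -n 10
Note that the set of five files is fixed when the command starts; re-run it if newer log files appear.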
Assuming the file names and path names do not contain special characters such as TAB or newline, how about:
while true; do
    find . -type f -name "*.log" -printf "%T@\t%p\n" | sort -n | tail -5 | cut -f2 | xargs tail -10
    sleep 1
done
ls *.log -1t | head -5 | while IFS= read -r file; do tail -10 "$file"; done

Count the number of files modified in a specific range of time [duplicate]

This question already has an answer here:
Getting all files which have changed in specific time range
(1 answer)
Closed 3 years ago.
I'm trying to build a one-line command in bash which counts the number of files of type *.lu in the current directory that were modified between 15:17 and 15:47 (the date does not matter here). I'm not allowed to use find (otherwise it would have been easy). I'm allowed to use basic commands like ls, grep, cut, wc and so on.
What I tried to do:
ls -la *.lu | grep <MISSING> | wc -l
First of all, I'll list all *.lu files, then I need to check the date with grep (which I'm not sure how to do), and then we need to count the number of lines. I think we also need to insert cut to get to the date and check it, but how? Also, if the current directory does not contain any *.lu files, it will fail rather than returning 0.
How to solve it?
ls -l *.lu | grep -E '15:[2-3][0-9]|15:1[7-9]|15:4[0-7]' | wc -l
Should do it.
With awk:
ls -al *.lu | awk 'BEGIN{count=0} {if((substr($8,0,2) == "15") && (int(substr($8,4)) >=17 && int(substr($8,4)) <= 47 )){count++}} END{print count}'
UPDATE:
Without -E
ls -l *.lu | grep '15:[2-3][0-9]\|15:1[7-9]\|15:4[0-7]' | wc -l
Redirect errors in case there are zero files:
ls -l *.lu 2> /dev/null | grep '15:[2-3][0-9]\|15:1[7-9]\|15:4[0-7]' | wc -l
This is pretty ugly and can probably be done better. However, I wanted to challenge myself to do this without regexes (excepting the sed one). I don't guarantee it'll handle all of your use cases (directories might be an issue).
ls -l --time-style="+%H%M" *.lu 2>/dev/null |
sed '/total/d' |
tr -s ' ' |
cut -d ' ' -f6 |
xargs -r -n 1 -I ARG bash -c '(( 10#ARG >= 1517 && 10#ARG <= 1547)) && echo ARG' |
wc -l
There is probably a way to avoid parsing ls by using stat --format=%Y.
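For example, a rough sketch of that stat-based idea (assuming GNU stat and GNU date; the 1517/1547 bounds mirror the 15:17-15:47 window in the question):
count=0
for f in *.lu; do
    [ -e "$f" ] || continue                                    # handles the no-match case
    hhmm=$(date -d "@$(stat --format=%Y "$f")" +%H%M)          # mtime as HHMM
    (( 10#$hhmm >= 1517 && 10#$hhmm <= 1547 )) && count=$((count + 1))
done
echo "$count"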

Largest file in the system and move it in unix

I am new to bash and I'm struggling with it. I have an assignment whose question is:
Try to find the top 5 largest files in the entire file system, ordered by size, move them to the /tmp folder, and rename each file with the current datetime format.
I tried with the following code
du -a /sample/ | sort -n -r | head -n 5
I'm getting the list, but I'm not able to move the files.
Suggestions, please.
Looks like a simple case of xargs:
du -a /sample/ | sort -n -r | head -n 5 | cut -f2- | xargs -I{} mv {} /tmp
xargs here simply reads lines from standard input and appends them as arguments to the command, mv in this case. Because -I{} is specified, the {} string is replaced with each input line by xargs, and cut -f2- first strips the size column that du prepends. So mv {} /tmp is executed as mv <the first file> /tmp, then mv <the second file> /tmp, and so on. You can e.g. add the -t option to xargs, or add echo, to see what's happening: xargs -I{} -t echo mv {} /tmp.
Instead of running 5 processes, we could add /tmp on the end of the stream and run only one mv command:
{ du -a /sample/ | sort -n -r | head -n 5 | cut -f2-; echo /tmp; } | xargs mv
or like:
du -a . | sort -n -r | head -n 5 | cut -f2- | { tee; echo /tmp; } | xargs mv
Note that using du -a will most probably not work with filenames containing special characters, spaces, tabs and newlines. It will also include directories in its output. If you want to filter files only, move to the much safer find:
find /sample/ -type f -printf '%s\t%p\n' | sort -n -r | cut -f2- | head -n5 | xargs -I{} mv {} /tmp
First we print each filename with its size in bytes. Then we numerically sort the stream. Then we remove the size, i.e. cut the stream at the first '\t' tab. Then we take the head -n5 lines. Lastly, we move the files with xargs. It will work for filenames that do not contain special characters such as unreadable bytes, spaces, newlines and tabs.
For such corner cases it's preferable to use find and handle zero-terminated strings, like this (note that just the -z and -0 options were added):
find /sample/ -type f -printf '%s\t%p\0' | sort -z -n -r | cut -z -f2- | head -z -n5 | xargs -0 -I{} mv {} /tmp
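The assignment also asks to rename each moved file with the current datetime, which none of the pipelines above do. A possible sketch of that last step (assuming GNU find, bash and GNU date; the timestamp format is arbitrary):
find /sample/ -type f -printf '%s\t%p\0' | sort -z -n -r | cut -z -f2- | head -z -n5 |
while IFS= read -r -d '' f; do
    # move each of the five largest files and append the current datetime to its name
    mv -- "$f" "/tmp/$(basename -- "$f").$(date +%Y%m%d_%H%M%S)"
done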

Refining bash script with multiple find regex sed awk to array and functions that build a report

The following code works, but it takes too long, and everything I've tried to reduce it bombs, either due to white space, inconsistent access.log syntax, or something else.
Any suggestions to help cut the finds down to a single find $LOGS -mtime -30 -type f -print0 and one pass of grep/sed/awk/sort, compared to multiple finds like this, would be appreciated:
find $LOGS -mtime -30 -type f -print0 | xargs -0 grep -B 2 -w "RESULT err=0 tag=97" | grep -w "BIND" | sed '/uid=/!d;s//&\n/;s/.*\n//;:a;/,/bb;$!{n;ba};:b;s//\n&/;P;D' | sed 's/ //g' | sed s/$/,/g |awk '{a[$1]++}END{for(i in a)print i a[i]}' |sort -t , -k 2 -g > $OUTPUT1;
find $LOGS -mtime -30 -type f -print0 | xargs -0 grep -B 2 -w "RESULT err=0 tag=97" | grep -E 'BIND|LDAP connection from*' | sed '/from /!d;s//&\n/;s/.*\n//;:a;/:/bb;$!{n;ba};:b;s//\n&/;P;D' | sed 's/ //g' | sed s/$/,/g |awk '{a[$1]++}END{for(i in a)print i a[i]}' |sort -t , -k 2 -g > $IPAUTH0;
find $LOGS -mtime -30 -type f -print0 | xargs -0 grep -B 2 -w "RESULT err=49 tag=97" | grep -w "BIND" | sed '/uid=/!d;s//&\n/;s/.*\n//;:a;/,/bb;$!{n;ba};:b;s//\n&/;P;D' | sed 's/ //g' | sed s/$/,/g |awk '{a[$1]++}END{for(i in a)print i a[i]}' |sort -t , -k 2 -g > $OUTPUT2;
I've tried for find | while read -r file; do grep1>output1 grep2>output2 grep3>output3 done and a few others, but I cannot seem to get the syntax right and am hoping to cut down on the repetition here.
The full script (stripped of some content) can be found here and runs against a Java program I wrote for an email report. NOTE: This runs against access logs in about 60GB of combined text.
I haven't looked closely at the sed/awk/etc section (and they'll be hard to work on without some example data), but you should be able to share the initial scans by grepping for lines matching any of the patterns, storing that in a temp file, and then searching just that for the individual patterns. I'd also use find ... -exec instead of find ... | xargs:
tempfile=$(mktemp "${TMPDIR:-/tmp}/logextract.XXXXXX") || {
    echo "Error creating temp file" >&2
    exit 1
}
find $LOGS -mtime -30 -type f -exec grep -B 2 -Ew "RESULT err=(0|49) tag=97" {} + >"$tempfile"
grep -B 2 -w "RESULT err=0 tag=97" "$tempfile" | grep -w "BIND" | ...
grep -B 2 -w "RESULT err=0 tag=97" "$tempfile" | grep -E 'BIND|LDAP connection from*' | ...
grep -B 2 -w "RESULT err=49 tag=97" "$tempfile" | grep -w "BIND" | ...
rm "$tempfile"
BTW, you probably don't mean to search for LDAP connection from* -- the from* at the end means "fro" followed by 0 or more "m" characters.
A couple of general scripting recommendations: use lower- or mixed-case variables to avoid accidental conflicts with the various all-caps names that have special meanings. (Except when you want the special meaning, e.g. setting PATH.)
Also, putting double-quotes around variable references is generally a good idea to prevent unexpected word splitting and wildcard expansion... except that in some places your script depends on this, like setting LOGS="/log_dump/ldap/c*", and then counting on wildcard expansion happening when the variable is used. In these cases, it's usually better to use a bash array to store each item (e.g. filename) as a separate element:
logs=(/log_dump/ldap/c*) # Wildcard gets expanded when it's defined
...
find "${logs[#]}" -mtime ... # All that syntax gets all array elements in unmangled form
Note that this isn't really needed in cases like this where you know there aren't going to be any unexpected wildcards or spaces in the variable, but when you're dealing with unconstrained data this method is safer. (I work mostly on macOS, where spaces in filenames are just a fact of life, and I've learned the hard way to use script idioms that aren't confused by them.)
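Putting the temp-file and array suggestions together, a minimal sketch of how the top of the script could look (the log path is the placeholder from the question, and the comment marks where the existing pipelines go):
logs=(/log_dump/ldap/c*)
tempfile=$(mktemp "${TMPDIR:-/tmp}/logextract.XXXXXX") || exit 1
find "${logs[@]}" -mtime -30 -type f -exec grep -B 2 -Ew "RESULT err=(0|49) tag=97" {} + >"$tempfile"
# the three grep | sed | awk | sort pipelines then read "$tempfile" instead of re-scanning the logs
rm "$tempfile"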
