How to make this script grep only the 1st line - bash

for i in USER; do
find /home/$i/public_html/ -type f -iname '*.php' \
| xargs grep -A1 -l 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST'
done
It's ignoring the -A1. The end result is I just want it to show me files that contain any of the matching words, but only on the first line of the script. If there is a better, more efficient, less resource-intensive way, that would be great as well, as this will be run on very large shared servers.

Use awk instead:
for i in USER; do
find /home/$i/public_html/ -type f -iname '*.php' -exec \
awk 'FNR == 1 && /GLOBALS|preg_replace|array_diff_ukey|gzuncompress|gzinflate|post_var|sF=|qV=|_REQUEST/ { print FILENAME }' {} +
done
This will print the current input file if the first line matches. It's not ideal, since it will read all of each file. If your version of awk supports it, you can use
awk '/GLOBALS|.../ { print FILENAME } {nextfile}'
The nextfile command will execute for the first line, effectively skipping the rest of the file after awk tests if it matches the regular expression.
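For instance, combining that with the original find would look something like this (a sketch, untested; as noted, nextfile is not available in every awk):
for i in USER; do
find /home/$i/public_html/ -type f -iname '*.php' -exec \
awk '/GLOBALS|preg_replace|array_diff_ukey|gzuncompress|gzinflate|post_var|sF=|qV=|_REQUEST/ { print FILENAME } { nextfile }' {} +
done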

The following code is untested:
for i in USER; do
find /home/$i/public_html/ -type f -iname '*.php' | while read -r; do
head -n1 "$REPLY" | grep -q 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST' \
&& echo "$REPLY"
done
done
The idea is to loop over each find result, explicitly test the first line, and print the filename if a match was found. I don't like it though because it feels so clunky.
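A variant of the same idea that drops the shell read loop and lets find batch the filenames is sketched below (untested; the pattern list is the one from the question, and the command slots into the same for i in USER loop):
find /home/$i/public_html/ -type f -iname '*.php' -exec sh -c '
for f in "$@"; do
head -n1 "$f" | grep -q "GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST" && printf "%s\n" "$f"
done' sh {} +
This still starts head and grep once per file, but it only ever reads one line of each file.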

for j in $(find /home/$i/public_html/ -type f -iname '*.php'); do
result=$(head -n1 "$j" | grep "$stuff")
[[ -n $result ]] && echo "$j: $result"
done
You'll need a little more effort to skip leading blank lines. fgrep (grep -F) will save resources.
A little perl would bring great improvement, but it's hard to type it on a phone.
Edit:
On a less cramped keyboard, I have inserted a less brief solution.
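As for the fgrep remark: every pattern in the question is a plain string, so the loop body above could use grep -F, something like this (a sketch, untested):
result=$(head -n1 "$j" | grep -F -e GLOBALS -e preg_replace -e array_diff_ukey -e gzuncompress -e gzinflate -e post_var -e 'sF=' -e 'qV=' -e _REQUEST)
[[ -n $result ]] && echo "$j: $result"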

Related

Chain grep commands to search for a pattern inside files that match another pattern

How can I chain multiple grep commands?
For example, if I want to search recursively for all PHP files that are publicly accessible, i.e. those which contain $_user_location = 'public; and then search for SendQueue() inside all these files, what should I do?
A few of my failed attempts:
grep -rnw ./* -e "^.*user_location.*public" *.php | grep -i "^.*SendQueue().*" --color
grep -rnw ./* -e "^.*user_location.*public" *.php | xargs -0 -i "^.*SendQueue().*" --color
Print the grep results with the filename, extract the filenames, and pass those filenames to a second grep.
grep -H ..... | cut -d: -f1 | xargs -d'\n' grep ....
This works as long as there are no : in the filenames, and usually there are none.
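For illustration, with the patterns from the question filled in (untested; the sort -u just deduplicates filenames that matched more than once):
grep -rH "_user_location.*public" --include='*.php' . | cut -d: -f1 | sort -u | xargs -r -d '\n' grep --color 'SendQueue()'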
You could always do a plain old loop:
for i in *.php; do
if grep -q .... "$i"; then
grep .... "$i"
fi
done
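Filled in with the patterns from the question, that might look like this (untested):
for i in *.php; do
if grep -q "_user_location.*=.*'public" "$i"; then
grep --color -H 'SendQueue()' "$i"
fi
done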
Using awk:
$ awk '
/SendQueue\(\)/ { # hash all SendQueue() containing records
a[++i]=$0
}
/.*user_location.*public/ { # if condition met, flag up
f=1
}
END {
if(f) # if flag up
for(j=1;j<=i;j++) # output all hashed records
print a[j]
}' file
Testfile:
$_user_location = 'public;
SendQueue()
In the absence of sample output you only get:
SendQueue()
For multiple files:
$ for f in *.php ; do awk ... $f ; done
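Spelled out with the same awk program as above (untested), that would be:
for f in *.php; do
awk '/SendQueue\(\)/{a[++i]=$0} /.*user_location.*public/{f=1} END{if(f) for(j=1;j<=i;j++) print a[j]}' "$f"
done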
If you add the -l option to your first grep, you'll get all the file names, which you can feed to your second grep, like:
grep -i "^.*SendQueue().*" --color $(grep -l ...)
assuming you don't have special characters in file names.
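For instance (untested), with the first grep filled in; the command substitution relies on word splitting, which is why the caveat about special characters in file names applies:
grep -i --color 'SendQueue()' $(grep -rl --include='*.php' "_user_location.*public" .)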
Some alternatives, which could be quicker...
1. Using sed
sed -s '/\(SendQueue()\|_user_location = \o47public\)/H;${ x;s/\n/ /g;/SendQueue.*_user_location\|_user_location.*SendQueue/F;};d' *.php
Which could be written as:
sed -s '
/\(SendQueue()\|_user_location = \o47public\)/H;
${
x;
s/\n/ /g;
/SendQueue.*_user_location\|_user_location.*SendQueue/F;
};
d' *.php
Or with find:
find /path -type f -name '*.php' -exec sed -s '
/\(SendQueue()\|_user_location = \o47public\)/H;
${
x;
s/\n/ /g;
/SendQueue.*_user_location\|_user_location.*SendQueue/F;
};
d' {} +
2. Using grep
But reading each file only once:
grep -c "\(SendQueue()\|_user_location = 'public\)" *.php | grep :2$
or
grep -c "\(SendQueue()\|_user_location = 'public\)" *.txt | sed -ne 's/:2$//p'
Then
find /path -type f -name '*.php' -exec grep -c \
"\(SendQueue()\|_user_location = 'public\)" {} + |
sed -ne 's/:2$//p'
Of course, this works only if you're sure each string can be present only once per file.
Remark
To ensure no commented line will pollute the result, you could replace the regex with
"^[^#/]*\(SendQueue()\|_user_location = 'public\)"
in all submitted alternatives.
I can mention two ways of doing this:
POSIX way
You can use find(1) in order to do recursive search. find is defined by POSIX and is most likely included in your system.
find . -type f -name '*.php' -exec grep -q "\$_user_location.*=.*'public" {} \; -exec grep 'SendQueue()' {} +
Here is the explanation for what this command does:
-type f Look for files
-name '*.php' With the suffix .php
-exec grep -q ... {} \; Run the first grep sequence individually.
-exec grep {} + Run the second grep sequence on the files that were matched previously.
Ripgrep way
ripgrep is a really fast recursive grep tool. This will take much less search time, but you will need to obtain it separately.
rg --glob '*.php' -l "\$_user_location.*=.*'public" | xargs rg 'SendQueue\(\)'
Here is the explanation for what this command does:
--glob '*.php' Only looks inside files with the suffix .php
-l Only lists files that match
We enter the first query and pipe all the matching files to xargs
xargs runs rg with the second query and adds the received files as arguments so that ripgrep only searches those files.
Which one to use
ripgrep really shines on huge directories, but it isn't necessary for what you are asking; find is enough for most cases. The time you spend obtaining ripgrep will probably be more than the time you save by using it for this specific operation. ripgrep is a really nice tool regardless.
EDIT:
The find command has 2 -exec options:
-exec grep (...) {} \; This calls the grep command for each file match. This will run the following:
grep (query) file1.php
grep (query) file2.php
grep (query) file3.php
find tracks the command result for each file, and passes them to the next test if they succeed.
-exec grep (...) {} + This calls the command with all the files attached as arguments. This will expand as:
grep (query) file1.php file2.php file3.php

Select parent directory if non-unique directory is found

Hello I am trying to figure out how I can parse directories using built-in bash functionality.
The directory structure would look something like.
/home/mikal/PluginSDK/vendor_name1/ver1/plugin_name/plugin-config.json
/home/mikal/PluginSDK/vendor_name1/ver2/plugin_name/plugin-config.json
/home/mikal/PluginSDK/vendor_name2/ver1/plugin_name/plugin-config.json
/home/mikal/PluginSDK/vendor_name3/plugin_name/plugin-config.json
So far I have narrowed down to the name of the plugin which covers most of what I needed for the rest of the script.
find /home/mikal/PluginSDK -type f -name plugin-config.json | sed -r 's|/[^/]+$||' | awk -F "/" '{print $NF}'
The problem that I am running into is when the same vendor has different versions of the plugin available for the same release. We may not always want to run a newer version of the plugin due to compatibility or performance, so having these show as something like ver1-plugin_name or similar would be preferable. I can't find anything that would be able to pick out the non-unique plugin/version so that I can make an array with all of the options.
This is the entirety of what I have written right now for this section of the script I am writing to make configuration changes to the system.
options=()
while IFS= read -r line; do
options+=( "$line" )
done < <( find /home/mikal/PluginSDK -type f -name plugin-config.json | sed -r 's|/[^/]+$||' | awk -F "/" '{print $NF}' )
select opt_number in "${options[@]}" "Quit";
do
if [[ $opt_number == "Quit" ]];
then
echo "Quitting"
break;
else
find /home/mikal/PluginSDK -type f -name plugin-config.json -exec sed -i 's/"preferred": true/"preferred": false/g' {} \;
find /home/mikal/PluginSDK/${options[$(($REPLY-1))]} -type f -name plugin-config.json -exec sed -i 's/"preferred": false/"preferred": true/g' {} \;
break;
fi
done
Desired output for the entire thing would be something like.
1.) Ver1-Plugin_name
2.) Ver2-Plugin_name
3.) Plugin_name
4.) Plugin_name
5.) Quit
I apologize if my formatting is bad. First time posting.
Maybe
lst=( Quit
$( find /home/mikal/PluginSDK -type f -name plugin-config.json |
awk -F/ '{ if (7==NF) { print $6 } else { print $6"-"$7 } }' ) )
select opt_number in "${lst[@]}"
. . .
You might want to consult BashFAQ #20 if your filenames could have any weirdness like embedded spaces.
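A fleshed-out version of that idea might look like this (a sketch, untested; the echo is a hypothetical placeholder for the sed calls from the question):
mapfile -t lst < <(
find /home/mikal/PluginSDK -type f -name plugin-config.json |
awk -F/ '{ if (NF == 7) print $6; else print $6 "-" $7 }'
)
select opt in "${lst[@]}" Quit; do
if [[ $opt == Quit || -z $opt ]]; then
echo "Quitting"
break
fi
echo "selected: $opt" # placeholder for the sed calls
break
done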

How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
costumerA_2019Q2_something.pdf
costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerB_2019Q3_something.pdf
costumerC_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerD2020Q2something.pdf
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
EDIT:
This is what I have so far. It is able to extract the substrings, but it still has duplicates, and since I'm already in the loop I don't see how I can remove them.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
echo "$line" | grep -oP '[0-9]{4}Q[0-9]'
done
# list all _filenames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -printf '%f\n' |
# extract the pattern
sed -E 's/.*([0-9]{4}Q[0-9]).*/\1/' |
# iterate
while IFS= read -r file; do
echo "$file"
done
I used -printf '%f\n' to print just the filename, instead of the full path. GNU sed has a -z option that you can use with -print0 (or -printf '%f\0').
With how you have wanted to do this, if your files have no newlines in their names, there is no need to loop over the list in bash (as a rule of thumb, try to avoid while read line, it's very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero-separated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
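Putting that together, a loop over the unique quarters could look like this (a sketch; word splitting is safe here because every matched string has the form YYYYQn with no whitespace):
for q in $(find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]' | sort -u); do
echo "processing $q" # placeholder for the real work
done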
Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in *; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo "$x"; done
costumerA_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerB_2019Q2_something.pdf
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u
2019Q2
2019Q3
2020Q1

How to use find utility with logical operators and post processing

Is there a way to find all directories that have an executable file that matches a partial name of the parent directory?
Situation
/distribution/software_a_v1.0.0/software_a
/distribution/software_a_v1.0.1/software_a
/distribution/software_a_v1.0.2/config.cfg
I need result
/distribution/software_a_v1.0.0/software_a
/distribution/software_a_v1.0.1/software_a
I've gotten only so far
find /distribution -maxdepth 1 -type d #and at depth 2 -type f -perm /u=x and binary name matches directory name, minus version
Another way using awk:
find /path -type f -perm -u=x -print | awk -F/ '{ rec=$0; sub(/_v[0-9].*$/,"",$(NF-1)); if( $NF == $(NF-1) ) print rec }'
The awk part is based on your sample and stated condition ... name matches directory name, minus version. Modify it if needed.
I would use grep:
find /distribution -maxdepth 2 -type f -perm /u=x | grep -P "/distribution/software_\w_v\d*?\.\d*?\.\d*?/software_\w"
I don't know if this is the most efficient, but here's one way you could do it, using just bash...
for f in /distribution/*/*
do
if [[ -f "${f}" && -x "${f}" ]] # it's a file and executable
then
b="${f##*/} # get just the filename
[[ "${f}" =~ "/distribution/${b}*/${b}" ]] && echo "${f}"
fi
done

Bash script to limit a directory size by deleting files accessed last

I had previously used a simple find command to delete tar files not accessed in the last x days (in this example, 3 days):
find /PATH/TO/FILES -type f -name "*.tar" -atime +3 -exec rm {} \;
I now need to improve this script by deleting in order of access date and my bash writing skills are a bit rusty. Here's what I need it to do:
1. check the size of the directory /PATH/TO/FILES
2. if the size in 1) is greater than X, get a list of the files by access date
3. delete files in order until the size is less than X
The benefit here is that for cache and backup directories, I will only delete what I need to in order to keep them within a limit, whereas the simplified method might go over the size limit if one day is particularly large. I'm guessing I need to use stat and a bash for loop?
I improved brunner314's example and fixed the problems in it.
Here is a working script I'm using:
#!/bin/bash
DELETEDIR="$1"
MAXSIZE="$2" # in MB
if [[ -z "$DELETEDIR" || -z "$MAXSIZE" || "$MAXSIZE" -lt 1 ]]; then
echo "usage: $0 [directory] [maxsize in megabytes]" >&2
exit 1
fi
find "$DELETEDIR" -type f -printf "%T#::%p::%s\n" \
| sort -rn \
| awk -v maxbytes="$((1024 * 1024 * $MAXSIZE))" -F "::" '
BEGIN { curSize=0; }
{
curSize += $3;
if (curSize > maxbytes) { print $2; }
}
' \
| tac | awk '{printf "%s\0",$0}' | xargs -0 -r rm
# delete empty directories
find "$DELETEDIR" -mindepth 1 -depth -type d -empty -exec rmdir "{}" \;
Here's a simple, easy to read and understand method I came up with to do this:
DIRSIZE=$(du -s /PATH/TO/FILES | awk '{print $1}')
if [ "$DIRSIZE" -gt "$SOMELIMIT" ]
then
for f in $(ls -rt --time=atime /PATH/TO/FILES/*.tar); do
FILESIZE=$(stat -c "%s" "$f")
FILESIZE=$(($FILESIZE/1024))
rm -f "$f" # delete the least recently accessed file first
DIRSIZE=$(($DIRSIZE - $FILESIZE))
if [ "$DIRSIZE" -lt "$LIMITSIZE" ]; then
break
fi
done
fi
I didn't need to use loops, just some careful application of stat and awk. Details and explanation below, first the code:
find /PATH/TO/FILES -name '*.tar' -type f \
| sed 's/ /\\ /g' \
| xargs stat -f "%a::%z::%N" \
| sort -r \
| awk -v maxsize="$X_SIZE" '
BEGIN{curSize=0; FS="::"}
{curSize += $2}
curSize > maxsize {print $3}
' \
| sed 's/ /\\ /g' \
| xargs rm
Note that this is one logical command line, but for the sake of sanity I split it up.
It starts with a find command based on the one above, without the parts that limit it to files older than 3 days. It pipes that to sed, to escape any spaces in the file names find returns, then uses xargs to run stat on all the results. The -f "%a::%z::%N" tells stat the format to use, with the time of last access in the first field, the size of the file in the second, and the name of the file in the third. I used '::' to separate the fields because it is easier to deal with spaces in the file names that way. Sort then sorts them on the first field, with -r to reverse the ordering.
Now we have a list of all the files we are interested in, in order from latest accessed to earliest accessed. Then the awk script adds up all the sizes as it goes through the list, and begins outputting them when it gets over $X_SIZE. The files that are not output this way will be the ones kept; the other file names go to sed again to escape any spaces and then to xargs, which runs rm on them.
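As an aside, the -f format string above is BSD stat syntax; GNU coreutils stat uses -c instead. A sketch of the same pipeline on a GNU/Linux system (untested; %X is the access time in epoch seconds, %s the size in bytes, %n the file name):
find /PATH/TO/FILES -name '*.tar' -type f \
| sed 's/ /\\ /g' \
| xargs stat -c '%X::%s::%n' \
| sort -rn \
| awk -v maxsize="$X_SIZE" 'BEGIN{FS="::"} {curSize += $2} curSize > maxsize {print $3}' \
| sed 's/ /\\ /g' \
| xargs rm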
