Script to search for a specific string within a directory - bash

I am new to this site and to shell scripting. I am still very much a novice and haven't had much success scripting because I am "attempting" to learn on my own. I was hoping one of you script gurus could get me on the right track. Here's the situation: I am a network engineer, and I often need to find specific lines of code within hundreds of files. For instance, I might need to find out which devices are running specific code. Typically I do the following, which does exactly what I need:
fgrep -w "" * | sort -t/ -k5 -n
I normally have to go to the directory where my configuration files are located and then put whatever I am looking for between the quotation marks to get my search results. What I would like to do is write a script that asks me what I am searching for, searches the directory I am in, and then returns the results. Any help would be greatly appreciated.
Many Thanks,
Diz

Add this to your .bashrc file, or whatever config file is loaded when you login:
mygrep() { fgrep -w "$1" * | sort -t/ -k5 -n; }
export -f mygrep
This defines a shell function that you can then use to search (the export -f makes it available to subshells) - use double quotes if your search string contains spaces:
$ mygrep SEARCH_PATTERN
$ mygrep "SEARCH WITH SPACES"

You can do it as follows:
#!/bin/bash
read -p "Enter string you want to search?" str
find . -type f -exec grep "${str}" {} \;
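If you also want to keep the original sort order and prompt for the term in one go, a standalone script along the same lines could look like this (a sketch; save it, chmod +x it, and run it from the directory you want to search):
#!/bin/bash
# Ask for the search term, then run the same fgrep/sort pipeline as the original command
read -p "Enter the string to search for: " str
fgrep -w "$str" * | sort -t/ -k5 -n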

Related

How can I iterate over a list of source files and locate those files on my disk drive? I'm using FD and RIPGREP

I have a very long list of files stored in a text file (missing-files.txt) that I want to locate on my drive. These files are scattered across different folders on my drive. I want to get whatever is the closest available match that can be found.
missing-files.txt
wp-content/uploads/2019/07/apple.jpg
wp-content/uploads/2019/08/apricots.jpg
wp-content/uploads/2019/10/avocado.jpg
wp-content/uploads/2020/04/banana.jpg
wp-content/uploads/2020/07/blackberries.jpg
wp-content/uploads/2020/08/blackcurrant.jpg
wp-content/uploads/2021/06/blueberries.jpg
wp-content/uploads/2021/01/breadfruit.jpg
wp-content/uploads/2021/02/cantaloupe.jpg
wp-content/uploads/2021/03/carambola.jpg
....
Here's my working bash code:
while read p;
do
file="${p##*/}"
/usr/local/bin/fd "${file}" | /usr/local/bin/rg "${p}" | /usr/bin/head -n 1 >> collected-results.txt
done <missing-files.txt
What's happening in my bash code:
I iterate over my list of files
I use FD (https://github.com/sharkdp/fd) to locate those files on my drive
I then pipe the result to RIPGREP (https://github.com/BurntSushi/ripgrep) to filter the results and find the closest match. The match I'm looking for should have the same file name and folder structure. I limit it to one result.
Finally, I store the result in another text file so I can evaluate the list in a later step
Where I need help:
Is this the most efficient way to do this? I have over 2,000 files that I need to locate. I'm open to other solutions; this is just something I devised.
For some reason my code broke: it stopped returning results to "collected-results.txt". My guess is that it broke somewhere in the second pipe, right after the FD command. I haven't set up any handling for when it hits an error or can't find a file, so it's hard for me to tell.
Additional Information:
I'm using Mac, and running on Catalina
Clearly this is not my area of expertise
"Missing" sounds like they do not exist where expected.
What makes you think they would be somewhere else?
If they are, I'd put the filenames in a list.txt file with enough of a minimal pattern to pick them out of the output of find.
$: cat list.txt
/apple.jpg$
/apricots.jpg$
/avocado.jpg$
/banana.jpg$
/blackberries.jpg$
/blackcurrant.jpg$
/blueberries.jpg$
/breadfruit.jpg$
/cantaloupe.jpg$
/carambola.jpg$
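If you'd rather not build list.txt by hand, one way to derive it from missing-files.txt is to strip the directories and anchor each basename (a sketch):
# keep only the basename, prefix it with / and anchor it with $
sed 's!.*/!/!; s!$!$!' missing-files.txt > list.txt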
Then search the whole machine, which is gonna take a bit...
$: find / | grep -f list.txt
/tmp/apricots.jpg
/tmp/blackberries.jpg
/tmp/breadfruit.jpg
/tmp/carambola.jpg
Or if you want those longer partial paths,
$: find / | grep -f missing-files.txt
That should show you the actual paths to wherever those files exist IF they do exist on the system.
From the way I understand it, you want to find all files that could match the directory structure:
path/to/file
So it should return something like "/full/path/to/file" and "/another/full/path/to/file"
Using a simple find command you can get a list of all files that match this criteria.
Using find you can search your hard disk in a single go with something of the form:
$ find -regex pattern
The idea is now to build pattern, which we can do from the file missing_files.txt. The pattern should look something like .*/\(file1\|file2\|...\|filen\). We can use the following sed to do so:
$ sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt
So now we can do exactly what you did, but a bit quicker, in the following way:
pattern="$(sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt)"
pattern=".*/\($pattern\)"
find -regex "$pattern" > file_list.txt
In order to find the files, you can now do something like:
grep -F -f missing_files.txt file_list.txt
This will return all the matching cases. If you just want the first match for each missing file, you can do:
awk '(NR==FNR){a[$0]++;next}{for(i in a) if (!(i in b)) if ($0 ~ i) {print; b[i]}}' missing_files.txt file_list.txt
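If you prefer something more readable than the awk, at the cost of running one grep per missing file, grep's -m option (available in GNU and BSD grep) stops at the first match; a rough sketch:
while IFS= read -r p; do
    # -F: fixed string, -m 1: stop after the first matching line
    grep -F -m 1 -- "$p" file_list.txt
done < missing_files.txt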
Is this the most effecient way to do this?
I/O is usually the biggest bottleneck. You are running fd once per file to locate it. Instead, run it once to find all the files in a single pass over the disk. In shell you would do:
find . -type f '(' -name "first name" -o -name "other name" -o .... ')'
How can I iterate over a list of source files and locate those files on my disk drive?
Use -path to match the full path. First build the arguments then call find.
findargs=()
# Read bashfaq/001
while IFS= read -r patt; do
# I think */ should match anything in front.
findargs+=(-o -path "*/$patt")
done < <(
# TODO: escape glob better, not tested
# see https://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13
sed 's/[?*[]/\\&/g' missing-files.txt
)
# remove leading -o
unset 'findargs[0]'
find / -type f '(' "${findargs[@]}" ')'
Topics to research: var=() - bash arrays, < <(...) shell redirection with process substitution and when to use it (bashfaq/024), glob (and see man 7 glob) and man find.

Using find within a for loop to extract portion of file names as a variable (bash)

I have a number of files with a piece of useful information in their names that I want to extract as a variable and use in a subsequent step. The structure of the file names is samplename_usefulbit_junk. I'm attempting to loop through these files using a predictable portion of the file name (samplename), store the whole name in a variable, and use sed to extract the useful bit. It does not work.
samples="sample1 sample2 sample3"
for i in $samples; do
filename="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n')"
usefulbit="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n' | sed 's/.*samplename//g' | sed 's/junk.*//g')"
(More steps using $usefulbit or $(usefulbit) or ${usefulbit} or something)
done
find ./$FILE_DIR -maxdepth 1 -name 'sample1*' -printf '%f\n' and find ./$FILE_DIR -maxdepth 1 -name "sample1*" -printf '%f\n' both work, but no combination of parentheses, curly brackets, or single-, double-, or backquotes has got the loop to work. Where is this going wrong?
Try this:
for file in `ls *_*_*.*`
do
echo "Full file name is: $file"
predictable_portion_filename=${file%%_*}
echo "predictable portion in the filename is: ${predictable_portion_filename}"
echo "---"
done
PS: $variable, ${variable}, "${variable}" and "$variable" are different from $(variable): in the last case, $( ... ) creates a subshell and treats whatever is inside as a command, i.e. $(variable) makes the subshell try to execute a command named variable.
In place of ls *_*_*.*, you can also use (to recursively list all files with that standard file name pattern): ls -1R *_*_*.*
In place of ${file%%_*} you can also use echo ${file} | cut -d'_' -f1 to get the predictable value. There are various other ways as well (awk, sed, etc.).
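For reference, the loop from the question also works once the variable is expanded by the shell instead of being hidden inside single quotes as $(i); a sketch, assuming FILE_DIR is set and the names really follow the samplename_usefulbit_junk layout:
samples="sample1 sample2 sample3"
for i in $samples; do
    # double quotes let ${i} expand while the * glob is still passed to find
    filename="$(find "./$FILE_DIR" -maxdepth 1 -name "${i}*" -printf '%f\n')"
    # strip the leading samplename_ and everything from _junk onwards (assumed layout)
    usefulbit="$(printf '%s\n' "$filename" | sed "s/^${i}_//; s/_junk.*//")"
    echo "$usefulbit"
done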
Excuse me, I can't do it with bash; may I show you another approach? Here is a shell (lua-shell) I am developing, and a demo as a solution for your case:
wws$ `ls ./demo/2
sample1_red_xx.png sample2_green_xx.png sample3_blue_xx.png
wws$ source ./demo/2.lua
sample1_red_xx.png: red
sample2_green_xx.png: green
sample3_blue_xx.png: blue
wws$
I really want to know your whole plan, unless you need bash as the only tool...
Er, I forgot to paste the script:
samples={"sample1", "sample2", "sample3"}
files = lfs.collect("./demo/2")
function get_filename(prefix)
for i, file in pairs(files) do
if string.match(file.name, prefix) then return file.name end
end
end
for i = 1, #samples do
local filename = get_filename(samples[i])
vim:set(filename)
:f_lvf_hy
print(filename ..": ".. vim:clipboard())
end
The get_filename() function seems a little verbose... I haven't finished the lfs component.
I'm not sure whether answering my own question with my final solution is proper stackoverflow etiquette, but this is what ultimately worked for me:
for i in directory/*.ext; do
myfile="$i"
name="$(echo $i | sed 's!.*/!!g' | sed 's/_junk*.ext//g')"
# some other steps
done
This way I start with the file name already a variable (in a variable?) and don't have to struggle with find and its strong opinions. It also spares me from having to make a list of sample names.
The first sed removes the directory/ and the second removes the end of the file name and extension, leaving a variable $name that I use as a prefix when generating other files in subsequent steps. So much simpler!
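The same extraction can also be done with parameter expansion alone, with no sed at all; a sketch assuming the same directory/samplename_usefulbit_junk.ext layout:
for i in directory/*.ext; do
    myfile="$i"
    name="${i##*/}"        # strip everything up to the last /
    name="${name%_junk*}"  # strip from _junk to the end, including the extension
    # other steps using "$name"
done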

Shell script: Count number of files in a particular type extension in single folder

I am new to shell scripting.
I need to save the number of files with a particular extension (.properties) in a variable using a shell script.
I have used
ls |grep .properties$ |wc -l
but this command only prints the number of properties files in the folder. How can I assign this value to a variable?
I have tried
count=${ls |grep .properties$ |wc -l}
But it is showing error like:
./replicate.sh: line 57: ${ls |grep .properties$ |wc -l}: bad substitution
What does this type of error mean?
Can anyone help me save the number of matching files in a variable for later use?
You're using the wrong brackets, it should be $() (command output substitution) rather than ${} (variable substitution).
count=$(ls -1 | grep '\.properties$' | wc -l)
You'll also notice I've used ls -1 to force one file per line in case your ls doesn't do this automatically for pipelines, and changed the pattern to match the . correctly.
You can also bypass the grep totally if you use something like:
count=$(ls -1 *.properties 2>/dev/null | wc -l)
Just watch out for "evil" filenames like those with embedded newlines for example, though my ls seems to handle these fine by replacing the newline with a ? character - that's not necessarily a good idea for doing things with files but it works okay for counting them.
There are better tools to use if you have such beasts and you need the actual file name, but they're rare enough that you generally don't have to worry about it.
You could use a loop with globbing:
count=0
for i in *.properties; do
count=$((count+1))
done
If you are using a shell that supports arrays, you can simply capture all such file names
files=( *.properties )
and then determine the number of array elements
count=${#files[@]}
(The above assumes bash; other shells may require slightly different syntax.)
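One caveat for both the loop and the array versions: if no .properties files exist, the unexpanded pattern itself is counted and you get 1 instead of 0. In bash you can guard against that with nullglob; a sketch:
shopt -s nullglob          # non-matching globs expand to nothing
files=( *.properties )
count=${#files[@]}
shopt -u nullglob
echo "$count"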
You'd better use find instead of parsing ls. Then, use the var=$(command) syntax to store the value.
var=$(find . -maxdepth 1 -name "*\.properties" | wc -l)
Reference: Why you shouldn't parse the output of ls.
To avoid the problem that appears if any file name contains newlines, you can use what chepner suggests in the comments:
var=$(find . -maxdepth 1 -name "*.properties" -exec echo 1 \; | wc -l)
so that for every match it prints not the name but a fixed character (in this case, 1), and the number of those lines is then counted to produce the correct result.
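With GNU find you can also skip -exec and have find itself print one character per match (note that -printf is not available in BSD/macOS find); a sketch:
var=$(find . -maxdepth 1 -name '*.properties' -printf '.' | wc -c)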
Use:
count=`ls|grep .properties$ | wc -l`
echo $count
You could write your assignment like this:
count=$(ls -q | grep -c '\.properties$')
or
count=$(ls -qA | grep -c '\.properties$')
if you want to include hidden files.
This works with all kinds of filenames because we're using ls with -q.
Sure, it's easier to link to some webpage that tells you to "never parse ls" than to read the ls manual and see that there is a -q option (and that most implementations default to -q when the output is a terminal device, which explains why some people here state that their ls seems to handle filenames with newlines just fine by replacing the newline with a ? character).

Bash Script - Copy latest version of a file in a directory recursively

Below, I am trying to find the latest version of a file that could be in multiple directories.
Example Directory:
~inventory/emails/2012/06/InventoryFeed-Activev2.csv 2012/06/05
~inventory/emails/2012/06/InventoryFeed-Activev1.csv 2012/06/03
~inventory/emails/2012/06/InventoryFeed-Activev.csv 2012/06/01
Here's the bash script:
#!/bin/bash
FILE = $(find ~/inventory/emails/ -name INVENTORYFEED-Active\*.csv | sort -n | tail -1)
#echo $FILE #For Testing
cp $FILE ~/inventory/Feed-active.csv;
The error I am getting is:
./inventory.sh: line 5: FILE: command not found
The script should copy the newest file as attempted above.
Two questions:
First, is this the best method to achieve what I want?
Secondly, what's wrong above?
It looks good, but you have spaces around the = sign. This won't work. Try:
#!/bin/bash
FILE=$(find ~/inventory/emails/ -name INVENTORYFEED-Active\*.csv | sort -n | tail -1)
#echo $FILE #For Testing
cp $FILE ~/inventory/Feed-active.csv;
... What's wrong above?
Variable assignment. You are not supposed to put extra spaces around = sign. The following should work:
FILE=$(find ~/inventory/emails/ -name INVENTORYFEED-Active\*.csv | sort -n | tail -1)
... is this the best method to achieve what I want?
Probably not. But the best way depends on many factors. Perhaps whoever writes those files can put them in the right location in the first place. You could also check the file modification time, but that could fail too... So as long as it works for you, I'd say go for it :)
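If you do want the newest file by modification time rather than by the version number in its name, a sketch using GNU find (the -printf %T@ timestamp is GNU-specific):
# print "epoch-seconds path" per file, sort numerically, keep the newest, drop the timestamp
FILE=$(find ~/inventory/emails/ -name 'INVENTORYFEED-Active*.csv' -printf '%T@ %p\n' \
    | sort -n | tail -1 | cut -d' ' -f2-)
cp "$FILE" ~/inventory/Feed-active.csv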

Using ssh remote plus grep

I'm running a shell script like this:
vQtde=`ssh user@server 'ls -lrt /mnta2/gvt/Interfaces/output/BI/sent/*.?${vDiaAnterior}* | grep "${vMDAtual}0[345678]:" |wc -l'`
And it returns an error: ksh: /usr/bin/sh: arg list too long
I know that the same command on the local server returns 9; how can I escape the quotes in the remote grep?
The variables are:
vDiaAtual=`date +%d`
vMesAtual=`date +%b`
vMDAtual=" $vMesAtual $vDiaAtual ";
vDiaAnterior=120614
The problem here is not with grep. The problem is the following: the argument /mnta2/gvt/Interfaces/output/BI/sent/*.?${vDiaAnterior}* is expanded by the shell (by ksh in this case) and the resulting list is too big.
It would be better to simply do ls -lrt /mnta2/gvt/Interfaces/output/BI/sent/ and then add an additional grep after it.
Something like:
ls -lrt /mnta2/gvt/Interfaces/output/BI/sent/ | grep "\..${vDiaAnterior}" | grep ...
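Putting that together, the remote command could look something like this (a sketch; the local shell expands the variables inside the double quotes before ssh sends the command, and the single quotes survive to protect the expanded values, including the spaces in $vMDAtual, on the remote side):
vQtde=$(ssh user@server "ls -lrt /mnta2/gvt/Interfaces/output/BI/sent/ |
    grep '\..${vDiaAnterior}' |
    grep '${vMDAtual}0[345678]:' |
    wc -l")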
Based on information regarding that error message, I'm not sure if escaping the quotes is the real issue here.
What is it that you're ultimately trying to do? There's probably a slightly different way to approach it that avoids this problem. It appears that you're trying to count the number of files with a certain "last modified" date. Is this accurate? If so, I highly recommend against using the output of ls to do that. The output is inconsistent between platforms and can even change between versions. The find utility is much better suited for this sort of thing.
Try something like this instead:
dir=/mnta2/gvt/Interfaces/output/BI/sent/
pattern="*.?${vDiaAnterior}*"
time= # Fill this in based on the "last modified" time that you're looking for
find $dir -iname "$pattern" -mtime $time -exec printf '.' \; | wc -c
You can omit the extra variables; they're only there to make the code more readable on the webpage.
This will search the given directory for all files with names that match the specified wildcard pattern and with "last modified" times that match whatever you specify. For each match found, the code printf '.' (which prints one dot to stdout) will be run. wc then counts the number of dot characters, which will be equal to the number of matching files found. The benefit of this method is that it minimizes the amount of data that needs to be piped between programs (including between the shell and ls). find handles the wildcard matching internally instead of requiring the shell to expand the wildcard and pass the result to ls. You're also only sending one character per matching file to wc instead of one long line of ls output per match. That should reduce the chances that you encounter the "arg list too long" error.
I resolved the problem this way:
- Create a .sh file on the local server that receives the parameters:
#!/usr/local/bin/bash
vDiaAnterior="${1}";
vMDAtual="${2}";
ls -l /mnta2/gvt/Interfaces/output/BI/sent/*.?${vDiaAnterior}AMA | grep "${vMDAtual}[345678]:" | wc -l;
Call it remotely:
ssh user@server ". /mnta1/prod_med1/scriptsf/ver_jobs_3_horas.sh $vDiaAnterior '$vMDAtual'"
Result: 9 files.
Best Regards,
Cauca
