How can I use grep as cat - bash

I am creating a script that parses some files and greps out the necessary information.
I have set up many different variables in arrays that search for different aspects in the files.
e.g. dates, locations, types.
However, I wanted to make each of these variables optional, which is where I ran into an issue.
The command itself would have been simple:
grep ${dates} filename | grep ${locations} | grep ${types}
However, due to variables being optional, the above won't work if a variable is unset.
I was trying to find a way to get grep to match anything (i.e. like egrep '.*' filename),
so that I could set the variables to the proper regex and have the command still run.
Unfortunately, when I set the variable to '' it freezes; when I set it to '.' it greps everything from every file in the current directory; and when I leave the variable blank, grep takes the filename as the pattern and waits for input.
Is there any way I can set a variable so that grep $variable file outputs the same as cat would?
Many thanks in advance!

To get grep to act like cat, use an empty string as the search pattern, i.e. grep "". Therefore, to make some of those variables optional without making the piped greps fail, just quote the variables:
grep "${dates}" filename | grep "${locations}" | grep "${types}"
Demonstration. Search {250,255,260...280} for the digits 5, 2, and 7:
x=5 y=2 z=7 ; seq 250 5 280 | grep "$x" | grep "$y" | grep "$z"
275
Now unset two of the variables, and it still works:
unset x y ; seq 250 5 280 | grep "$x" | grep "$y" | grep "$z"
270
275
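A minimal bash sketch of why the quotes matter. count_args is a hypothetical helper that just prints how many arguments it receives: an unquoted empty variable vanishes during word splitting, so grep would see the filename as its pattern, while a quoted empty string survives as an empty pattern that matches every line.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

unset dates
count_args grep ${dates} filename     # 2: the unquoted empty word vanished
count_args grep "${dates}" filename   # 3: the quoted empty pattern survived

# An empty pattern matches every line, including blank ones, so grep "" acts like cat.
printf 'a\n\nb\n' | grep ""
```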

If there aren't any $dates in filename, then there is nothing to feed the rest of the pipeline.
I think the best way to do it is to grep for each string separately:
if you get a match, then grep for the next string.
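The grep-each-string-separately idea can be sketched in bash as a small function. filter is a hypothetical name; it skips empty (unset) patterns and stops with exit status 1 as soon as one pattern finds nothing, so later patterns never see an empty input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter stdin through one grep per non-empty pattern,
# stopping (exit status 1) as soon as a pattern matches nothing.
filter() {
  local text pat
  text=$(cat) || return 1
  for pat in "$@"; do
    [ -n "$pat" ] || continue                      # unset/empty pattern: skip it
    text=$(printf '%s\n' "$text" | grep -- "$pat") || return 1
  done
  printf '%s\n' "$text"
}

dates=5 locations= types=7
seq 250 5 280 | filter "$dates" "$locations" "$types"   # prints 275
```

With a real file, the call would look like filter "$dates" "$locations" "$types" < filename.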

Can you post the source file and a target output? From your question, it sounds like you just need to use grep -E with | as the alternation operator:
grep -E "${dates}|${locations}|${types}" fileName
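The above line should get you every line that matches any of the patterns. Note that this is an OR match: a line is printed if it contains the dates, the locations, or the types pattern, whereas the chained greps in the question only print lines that match all of them.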
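A small bash sketch contrasting alternation with chained greps, reusing the digits from the earlier demonstration (x, y and z stand in for dates, locations and types):

```shell
#!/usr/bin/env bash
x=5 y=2 z=7
# OR: -E with | (alternation) keeps lines matching at least one pattern.
seq 250 5 280 | grep -E "${x}|${y}|${z}"      # all seven numbers
# AND: chained greps keep only lines matching every pattern.
seq 250 5 280 | grep "$x" | grep "$y" | grep "$z"   # only 275
```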

Related

bash cat exclude multiple files based on grep results

I have the following cat command that I use in a bash script. It looks for the $SAMPLE.txt file in the 20* subfolders and combines them into one output.txt:
cat /$FOLDER/20*/$SAMPLE.txt > /$OUTPUTFOLDER/output.txt
I now want to exclude certain files conditionally.
I found the following here https://unix.stackexchange.com/questions/246048/cat-files-except-one
$ shopt -s extglob
$ cat -- !(DISCARD).txt > catKEPT
I want to do something like this:
look for $SAMPLE and a pattern $PAT1 in $SAMPLEFILE, which is comma-separated. If there is a match, I want to store the first field of that line and use it to exclude files from cat.
I would use this command to look for $SAMPLE and $PAT1, then cut to keep the first field, and assign the result to a variable EXCLUDE_FOLDER:
EXCLUDE_FOLDER=grep '$SAMPLE' $SAMPLEFILE | grep '$PAT1' | cut -d "," -f 1
And then use it like this:
cat /$FOLDER/20*/$SAMPLE.txt -- !($FOLDER/$EXCLUDE_FOLDER/$SAMPLE.txt) > /$OUTPUTFOLDER/output.txt
I'm stuck at putting this into an if statement, and at dealing with situations where grep returns multiple matches, so that multiple files should be excluded.
If SAMPLE and PAT are variables, you presumably want them expanded to their contents, which means you must put them in double quotes, not single quotes. Example:
SAMPLE=3
# Compare single quotes versus double
echo '$SAMPLE' # outputs $SAMPLE
echo "$SAMPLE" # outputs 3
If SAMPLEFILE is the name of a file, you must double-quote it too; otherwise the command will fail if the filename contains spaces. So you must use:
grep "$SAMPLE" "$SAMPLEFILE"
Now you can test whether your grep works, like this:
grep "$SAMPLE" "$SAMPLEFILE" | grep "$PAT1" | cut -d "," -f 1
If that works, the next step is to capture the output of the command, which requires command substitution, $(...). That means:
EXCLUDE_FOLDER=$(grep "$SAMPLE" "$SAMPLEFILE" | grep "$PAT1" | cut -d "," -f 1)
Then test whether that worked:
echo "$EXCLUDE_FOLDER"
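The question also asks how to exclude several folders when grep returns multiple matches. One way, sketched here in bash instead of extglob, is to read every matching first field into an array with mapfile and skip those folders while building the list of files to cat. The function name cat_except_excluded is hypothetical, and the sketch assumes FOLDER holds the directory that contains the 20* subfolders and that field 1 of SAMPLEFILE is a subfolder name. mapfile requires bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Assumes FOLDER, SAMPLE, PAT1 and SAMPLEFILE are already set,
# and that field 1 of SAMPLEFILE is the name of a 20* subfolder of $FOLDER.
cat_except_excluded() {
  local -a excluded=() keep=()
  local f dir ex skip
  # Zero, one or many folder names: one array element per matching line.
  mapfile -t excluded < <(grep -- "$SAMPLE" "$SAMPLEFILE" | grep -- "$PAT1" | cut -d "," -f 1)
  for f in "$FOLDER"/20*/"$SAMPLE".txt; do
    [ -e "$f" ] || continue          # the glob matched nothing
    dir=${f%/*}                      # strip "/$SAMPLE.txt"
    dir=${dir##*/}                   # keep just the subfolder name
    skip=0
    for ex in "${excluded[@]}"; do
      if [ "$dir" = "$ex" ]; then skip=1; break; fi
    done
    if [ "$skip" -eq 0 ]; then keep+=("$f"); fi
  done
  # Only call cat when something is left; a bare cat would read stdin instead.
  if [ "${#keep[@]}" -gt 0 ]; then cat -- "${keep[@]}"; fi
}
```

Usage would be cat_except_excluded > "$OUTPUTFOLDER/output.txt". No separate if statement is needed for the no-match case: an empty array simply excludes nothing.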

Storing multiple grep parameters in a bash variable

I want to store multiple grep parameters in a bash variable, so I can define the parameters in the configuration section at the top of my file, and use it in multiple locations.
How do I need to define the variable and write the grep command?
Details
My first attempt
# CONFIG: grep parameters to further filter ...
GREP_PARAM="-E .*"
# ...
grep "^Stop " $1 | grep $GREP_PARAM | sed "..." >>$TFILE
results in
grep: ..: Is a directory
Using .\* or .\\* instead makes grep match nothing, rather than everything.
Using grep "$GREP_PARAM" instead only works if GREP_PARAM contains a single parameter, but not otherwise; e.g. if it contains -v .*SAT.* or -v .\*SAT.\* or -v .\\*SAT.\\*, I get
grep: invalid option --
This is exactly what arrays were introduced to handle.
grep_options=(-E '.*')
grep "^Stop " "$1" | grep "${grep_options[@]}" | sed "..." >> "$TFILE"
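A quick bash check of the array approach, with made-up input lines standing in for the contents of "$1": each array element reaches grep as exactly one argument, so '.*SAT.*' is neither split nor glob-expanded.

```shell
#!/usr/bin/env bash
# One array element per argument; "${arr[@]}" expands to exactly those words,
# with no further word splitting or globbing (so '.*' never becomes . and ..).
grep_options=(-v '.*SAT.*')
printf '%s\n' 'Stop MON' 'Stop SAT' 'Start SUN' \
  | grep "^Stop " | grep "${grep_options[@]}"   # prints: Stop MON
```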

Bash Command using wildcard as an argument

x=rankings* && grep -A10 -P '^Ranked :$' $x | tail -n +2 > results$x
This is a command that I can't get to work no matter the approach, and I haven't been able to find anything in 10+ searches of Stack Overflow.
Is there a way to feed a wildcard through as an argument to a single line of commands? I want to make a list of files based on existing files in the directory.
The closest I have gotten is
x=rankingsX.Y.Z && grep -A10 -P '^Ranked :$' $x | tail -n +2 > results$x
where X, Y and Z are some numbers; but hard-coding each file individually is the opposite of my objective: a single-line command (not a script file) that searches for specific text and writes it into files named after the originals.
A redirection only has one destination at a time; thus, an attempt to redirect to an expression which, when string-split and glob-expanded, results in more than one filename causes a "bad redirection" error.
What you want is for x to have one value at a time, one for each filename the glob matches; that is to say, this is a job for a for loop.
for x in rankings*; do grep -A10 -P '^Ranked :$' "$x" | tail -n +2 >"results$x"; done
...which could also be written over multiple lines (even at an interactive shell), as in:
for x in rankings*; do
grep -A10 -P '^Ranked :$' "$x" | tail -n +2 >"results$x"
done
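One edge case worth guarding against: if no file matches, bash leaves rankings* as a literal word and the loop runs once with it. A bash sketch wrapping the same loop in a hypothetical function extract_rankings, using the nullglob shell option and restoring the caller's setting afterwards. It uses plain grep because '^Ranked :$' has no Perl-only syntax; keep -P if your real pattern needs it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the loop above. With nullglob set, an unmatched
# rankings* expands to nothing, so the loop does not run once on the literal
# string "rankings*" (which would create a file called "resultsrankings*").
extract_rankings() {
  local x restore
  restore=$(shopt -p nullglob)   # remember the caller's setting
  shopt -s nullglob
  for x in rankings*; do
    grep -A10 '^Ranked :$' "$x" | tail -n +2 > "results$x"
  done
  eval "$restore"
}
```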

unusual chaining of "grep" in the shell

I stumbled upon a shell instruction which looks odd:
ls -a | grep ".qmail-" | grep -v "mail" | grep ".mail" > t ; echo $?
I suspect that the returned value would represent an error. Could anyone confirm this, or explain in which circumstances this instruction would be used?
The first grep only allows through lines that contain qmail (preceded by any character and followed by a dash, but that is largely immaterial). The second grep strips out lines that contain mail, which means every line passed by the first grep is deleted by the second. There's nothing left for the third one to process, so the file t will always be empty. The value for $? should be 1, failure, since the third grep failed to find any lines that matched its pattern (because it got no lines to process).
It is a mistake.
It is hard to know how to fix it without knowing what it is trying to do.
The bash shell (and most other shells) lets users use the output of one command as the input of another. This is accomplished with the | operator, which is called a pipe. So the output of ls -a is fed to grep ".qmail-", and so on. The > operator sends the output of the command to a file, in this case t. So ls -a | grep ".qmail-" | grep -v "mail" | grep ".mail" > t lists the contents of a directory and passes the output through successive filters before finally saving it to the file t.
The semicolon signals the end of a command and allows multiple bash commands to be entered on a single line.
echo $? prints the return value of the last executed command, in this case ls -a | grep ".qmail-" | grep -v "mail" | grep ".mail" > t (for a pipeline, that is the exit status of its last command). By convention, any value other than 0 indicates that some sort of error occurred. The Linux Documentation Project gives a list of some common exit codes.
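The reasoning above can be checked in bash with a hypothetical helper, run_chain, that feeds made-up directory entries through the same three greps and prints PIPESTATUS, the bash array holding the exit status of every stage of the most recent pipeline. Unless set -o pipefail is on, $? for a pipeline is just the status of its last stage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: feed the given names through the same three greps and
# report the exit status of every pipeline stage.
run_chain() {
  printf '%s\n' "$@" | grep ".qmail-" | grep -v "mail" | grep ".mail"
  echo "exit statuses: ${PIPESTATUS[*]}"
}

run_chain . .. .qmail-default .qmail-list README   # exit statuses: 0 0 1 1
```

The second grep already returns 1 because every line it receives contains "mail", and the third grep then has nothing to match.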

Reading a file line by line in ksh

We use a package called Autosys, which has some specific commands of its own. I have a list of variables that I would like to pass, one by one, to one of the Autosys commands.
For example, one such variable is var1; using var1, I would like to launch a command like this:
autosys_showJobHistory.sh var1
When I launch the command below, it gives me the desired output.
echo "var1" | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
But if I put var1 in a file, say Test.txt, and launch the same command using cat, it gives me nothing. I have the impression that autosys_showJobHistory.sh does not work in that case.
cat Test.txt | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
What am I doing wrong in the second command?
I wrote all of the below, and then noticed your grep statement.
Recall that [1..6] is not a range: neither ksh nor grep expands .. inside square brackets (I assume a range is your intent). In a grep bracket expression, [1..6] matches a single '1', '.', or '6'; a range is written [1-6]. The pattern is also made ambiguous by your lack of quoting of the arguments to grep: if you were using syntax that the shell would expand, you wouldn't really know what regexp is being sent to grep. It's always better to quote arguments unless you know for sure that you need the unquoted values. Try rewriting as
grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012'
Also, are you deliberately using the 'match any character' operator '.', or do you want to match only a period? If you want to match only a period, you need to escape it as \. (for example '24\.12\.2012').
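To see the difference, here is a bash sketch that runs the original and the corrected patterns side by side. The function sample is hypothetical and its lines are made up, standing in for the autosys output.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines standing in for autosys_showJobHistory.sh output.
sample() { printf '%s\n' 13:45 1.:00 16:59 17:00 24x12y2012 24.12.2012; }

# [1..6] is a bracket expression matching one of '1', '.', '6', not a range.
sample | grep '1[1..6]:[0..9][0..9]'     # prints only 1.:00
# A range inside brackets uses '-'.
sample | grep '1[1-6]:[0-9][0-9]'        # prints 13:45 and 16:59
# Unescaped '.' matches any character; '\.' matches a literal period.
sample | grep '24.12.2012'               # prints 24x12y2012 and 24.12.2012
sample | grep '24\.12\.2012'             # prints only 24.12.2012
```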
Finally, if any of the files you're processing were created on a Windows machine and then transferred to Unix/Linux, it is very likely that the line endings (Ctrl-M Ctrl-J, i.e. \r\n) are causing you problems. Clean up your PC-based files (or anything that was sent via ftp) with dos2unix file [file2 ...].
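If converting the files is not an option, the carriage return can also be stripped inside the loop. A minimal sketch, written for bash, with a hypothetical helper named strip_cr; ${line%$'\r'} removes one trailing \r. tr -d '\r' < Test.txt would be a shorter alternative that deletes every carriage return.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to stdout, dropping one trailing carriage
# return from each line (a DOS-style "\r\n" ending).
strip_cr() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line%$'\r'}"
  done
}

printf 'var1\r\n' | while IFS= read -r line; do echo "raw length: ${#line}"; done              # 5
printf 'var1\r\n' | strip_cr | while IFS= read -r line; do echo "clean length: ${#line}"; done  # 4
```

The raw value is "var1" plus an invisible \r, which is enough to make autosys_showJobHistory.sh receive a job name that does not exist.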
If the above doesn't help, you'll have to "divide and conquer" to debug your problem.
When I did the following tests, I got the expected output
$ echo "var1" | while read line ; do print "line=${line}" ; done
line=var1
$ vi Test.txt
$ cat Test.txt
var1
$ cat Test.txt | while read line ; do print "line=${line}" ; done
line=var1
Unrelated to your question, but certain to draw comments, is your use of the cat command in this context, which will earn you the UUOC (Useless Use of Cat) award. That can be rewritten as
while read line ; do print "line=${line}" ; done < Test.txt
But to solve your problem, now turn on the shell debugging/trace options, either by changing the top line of the script (the shebang line) like
#!/bin/ksh -vx
Or by using a matched pair of set -vx and set +vx to trace just these lines, i.e.
set -vx
while read line; do
print -u2 -- "#dbg: Line=${line}XX"
autosys_showJobHistory.sh $line \
| grep 1[1..6]:[0..9][0..9] \
| grep 24.12.2012 \
| tail -1
done < Test.txt
set +vx
I've added an extra debug step, the print -u2 -- .... (-u2 writes to stderr; -- ends option processing for print.)
Now you can make sure no extra space or tab chars are creeping in, by looking at that output.
They shouldn't matter, as you have left your $line unquoted. As part of your testing, I'd recommend quoting it like "${line}".
Then I'd comment out the tail and the grep lines. You want to see which step is causing this to break, right? Does the autosys script by itself still produce the intermediate output you're expecting? Then does autosys + 1 grep produce output as expected, then + 2 greps, then + tail? You should be able to easily see where you're losing your output.
IHTH
