Selecting Update queries alone from list of files using shell script - shell

I am trying to extract UPDATE queries from a list of files using the script below. I need the lines that contain "Update" alone, not "Updated" or "UpdateSQL". Since all update queries contain SET, I am matching on that as well, but I still need to exclude cases like Updated and UpdateSQL. Can anyone help?
nawk -v file="$TEST" 'BEGIN{RS=";"}
/[Uu][Pp][Dd][Aa][Tt][Ee] .*[sS][eE][tT]/{ gsub(/.*UPDATE/,"UPDATE");gsub(/.*Update/,"Update");gsub(/.*update/,"update");gsub(/\n+/,"");print file,"#",$0;}
' "$TEST" >> $OUT

This seems more readable to me without all the [Uu] and it doesn't require grep:
{ line=tolower($0); if (line ~ /update .*set/ && line !~ /updated|updatesql/) { gsub(/\n+/,""); print file, "#", $0 } }

You can try using grep first (and I assume you are on Solaris):
grep -i "update.*set" "$TEST" | egrep -vi "updatesql|updated" | nawk .....

Related

How can I generate multiple counts from a file without re-reading it multiple times?

I have large files of HTTP access logs and I'm trying to generate hourly counts for a specific query string. Obviously, the correct solution is to dump everything into splunk or graylog or something, but I can't set all that up at the moment for this one-time deal.
The quick-and-dirty is:
for hour in 0{0..9} {10..23}
do
grep $QUERY $FILE | egrep -c "^\S* $hour:"
# or, alternately
# egrep -c "^\S* $hour:.*$QUERY" $FILE
# not sure which one's better
done
But these files average 15-20M lines, and I really don't want to parse through each file 24 times. It would be far more efficient to parse the file and count each instance of $hour in one go. Is there any way to accomplish this?
You can ask grep to output the matching part of each line with -o and then use uniq -c to count the results:
grep "$QUERY" "$FILE" | grep -o "^\S* [0-2][0-9]:" | sed 's/^\S* //' | uniq -c
The sed command is there to keep only the two digit hour and the colon, which you can also remove with another sed expression if you want.
Caveats: this solution works with GNU grep and GNU sed, and will produce no output, rather than "0", for hours with no log entries. Kudos to @EdMorton for pointing out these issues in the comments, as well as other issues that were fixed in the answer above.
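If GNU grep and GNU sed are not available, a rough portable sketch of the same idea (assuming, as the ^\S* $hour: patterns above do, that the timestamp is the second whitespace-separated field) could be:
# pull the two-digit hour out of field 2, then count how often each hour occurs;
# sorting first guarantees uniq -c sees identical hours next to each other
grep "$QUERY" "$FILE" | awk '{ print substr($2, 1, 2) }' | sort | uniq -c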
Assuming the timestamp appears with a space before the 2-digit hour and a colon after it:
gawk -v patt="$QUERY" '
$0 ~ patt && match($0, / ([0-9][0-9]):/, m) {
print > (m[1] "." FILENAME)
}
' "$FILE"
This will create 24 files.
Requires GNU awk for the 3-arg form of match()
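If what you ultimately want is the per-hour counts rather than the split-out files, a quick follow-up on the files generated above (their names are assumed here to look like 13.access.log when $FILE is access.log) might be:
# count the lines in each per-hour file produced by the gawk command above
wc -l [0-9][0-9]."$FILE"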
This is probably what you really need, using GNU awk for the 3rd arg to match() and making assumptions about what your input might look like, what your QUERY variable might contain, and what the output should look like:
awk -v query="$QUERY" '
match($0, " ([0-9][0-9]):.*"query, a) { cnt[a[1]+0]++ }
END {
for (hr=0; hr<=23; hr++) {
printf "%02d = %d\n", hr, cnt[hr]
}
}
' "$FILE"
As an aside, don't use all upper case for non-exported shell variables - see Correct Bash and shell script variable capitalization.

How to find files containing a string N times or more often using egrep

I have a folder with about 400-500 SQL files and need the names of only those that contain the string CREATE TABLE 3 times or more.
While the command
$ egrep -rl "(CREATE TABLE)" ./*.sql
prints all the file names as expected, the command
$ egrep -rl "(CREATE TABLE.*){3}" ./*.sql
does not print any at all ...
Flags:
-r – recursive
-l – files-with-matches | print only names of FILEs containing matches
Your command
egrep -rl "(CREATE TABLE.*){3}" ./*.sql
looks for three CREATE TABLEs on one line.
When they are on different lines, you need to do something different,
and when you have GNU grep, you are lucky: It has the option -z.
# minimal change of your command
egrep -zrl "(CREATE TABLE.*){3}" ./*.sql
# moving the E into the options, as suggested by @anubhava
grep -zErl "(CREATE TABLE.*){3}" ./*.sql
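A quick way to see why -z helps (assuming you can drop a throwaway file into the directory): with -z, records are NUL-terminated, and since a .sql file normally contains no NUL bytes the whole file becomes one record, so the pattern is free to match across line breaks.
# three CREATE TABLE statements on separate lines should now be reported
printf 'CREATE TABLE a (x int);\nCREATE TABLE b (y int);\nCREATE TABLE c (z int);\n' > zz_test.sql
grep -zErl "(CREATE TABLE.*){3}" ./*.sql    # zz_test.sql appears in the output
rm zz_test.sql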
This awk will do the job:
awk 'FNR==1{n=0} /CREATE TABLE/{++n} n>2{print FILENAME; nextfile}' *.sql
You could also try the following; it takes care of the number of open files as well, by moving on to the next file as soon as a file qualifies.
awk 'prev!=FILENAME{n=0; prev=FILENAME} /CREATE TABLE/{++n} n>2{print FILENAME; nextfile}' *.sql
Assuming the possibility of having multiple strings per line (only covered by Walter A's answer), here is its awk version (for an awk that supports nextfile):
awk '(FNR==1){n=0}
{n+=split($0,a,/CREATE TABLE/)-1}
(n>2) {print FILENAME; nextfile}' *.sql
If you don't have GNU grep (needed for Walter A's solution) and your awk does not support nextfile, the following POSIX solution can be used:
awk '(FNR==1){n=0; p=1}
p {n+=split($0,a,/CREATE TABLE/)-1}
(n>2) && p {print FILENAME; p=0}' *.sql
The difference between the two solutions is:
Solution 1 will not process the full file, as it terminates early per file once the condition is met.
Solution 2 cannot do that, but we can still reduce the computational time by skipping the split once the condition is satisfied.
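For completeness, a rough POSIX shell sketch of the same idea (with the caveat that grep -c counts matching lines, not matches, so it undercounts when several CREATE TABLE statements share one line):
for f in ./*.sql; do
  # print the file name if at least 3 of its lines contain CREATE TABLE
  [ "$(grep -c 'CREATE TABLE' "$f")" -ge 3 ] && printf '%s\n' "$f"
done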
Try this Perl solution
perl -le ' BEGIN { for(glob("*.sql")) { $x=qx(cat $_); $r++ for($x=~m/CREATE TABLE/g); print $_ if $r > 2 ; $r=0 } } '

Prevent awk from interpreting variable contents?

I'm attempting to parse a make -n output to make sure only the programs I want to call are being called. However, awk tries to interpret the contents of the output and run (?) it. Errors are something like awk: fatal: Cannot find file 'make'. I have gotten around this by saving the output as a temporary file and then reading that into awk. However, I'm sure there's a better way; any suggestions?
EDIT: I'm using the output later in my script and would like to avoid saving a file to increase speed if possible.
Here's what isn't working:
my_input=$(make -n file)
my_lines=$(echo $my_input | awk '/bin/ { print $1 }') #also tried printf and cat
Here's what works but obviously takes longer than it has to because of writing the file:
make -n file > temp
my_lines=$(awk '/bin/ { print $1 }' temp)
Many thanks for your help!
You can parse the output directly as it is generated with the following command and save the result in a file.
make -n file | grep bin > result.out
If you really want to go for an overkill awk solution, change your second line in the following way:
my_lines="$(awk '/bin/ { print }' temp)"

GREP: exclude file extensions in specific directory

My code takes the added, modified, deleted, renamed and copied files from git status -s and compares them with a list of file paths from a file.
git status -s |
grep -E "^M|^D|^A|^R|^C" |
awk '{if ($1~/M+/ || $1~/D+/ || $1~/A+/ || $1~/R+/ || $1~/C+/) print $2}' |
grep --file=$list_of_files --fixed-strings |
grep -r --exclude="*.jar" "SVCS/bus/projects/Resources/"
1. Prints out the git status, like M foo.txt
2. Does some "filtering" operations
3. More filtering operations
4. Takes the paths of the files to compare from the text file
In the last step I am trying to exclude .jar files from a specific directory.
How can I do that last step? Or do I need to add something to the 4th step?
The simple fix is to change the last line to
grep -v 'SVCS/bus/projects/Resources/.*\.jar$'
but that really is some horrible code you have there.
Keeping in mind that grep | awk and awk | grep is an antipattern, how about this refactoring?
git status -s |
grep -E "^M|^D|^A|^R|^C" |
awk '{if ($1~/M+/ || $1~/D+/ || $1~/A+/ || $1~/R+/ || $1~/C+/)
... Hang on, what's the point of that? The grep already made sure that $1 contains one or more of those letters. The + quantifier is completely redundant here.
print $2}'
This will break on file names with whitespace in them. This is a very common error, which is aggravating because a lot of the time the programmer knew it would break, but just figured "can't happen here".
git status -s | awk 'NR==FNR { files[$0] = 1; next }
/^[MDARC]/ { gsub(/^[MDARC]+ /, "");
if ($0 ~ /SVCS\/bus\/projects\/Resources\/.*\.jar$/)
next;
if ($0 in files) print }' "$list_of_files" -
The NR==FNR thing is a common idiom to read the first file into an array, then fall through to the next input file. So we read $list_of_files into the keys of the associative array files; then if the file name we read from git status is present in the keys, we print it. The condition to skip .jar files in a particular path is then a simple addition to this Awk script.
This assumes $list_of_files really is a list of actual files, as suggested by the file name. Your code will look for a match anywhere in that file, so a partial file name would also match (for example, if the file contains path/to/ick, a file named somepath/to/icktys/mackerel would match, and thus be printed). If that is the intended functionality, the above script will require some rather drastic modifications.
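A minimal illustration of the NR==FNR idiom on its own (with hypothetical file names):
# print the lines of data.txt whose first field appears in keys.txt
awk 'NR==FNR { keys[$1]; next } $1 in keys' keys.txt data.txt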

Save changes to a file AWK/SED

I have a huge text file delimited with comma.
19429,(Starbucks),390 Provan Walk,Glasgow,G34 9DL,-4.136909,55.872982
The first field is a unique id. I want the user to enter the id and a new value for one of the following 6 fields, so that it can be replaced. I also ask for a number from 2 to 7 to identify which field should be replaced.
This is what I've done so far. I check every line for the id the user entered and then replace the value.
awk -F ',' -v elem=$element -v id=$code -v value=$value '{if($1==id) {if(elem==2) { $2=value } etc }}' $path
Where $path = /root/clients.txt
Let's say the user enters "2" in order to replace the second field, and also enters "Whatever". Then I want "(Starbucks)" to be replaced with "Whatever". What I've done works fine but does not save the change back into the file. I know that awk is not really meant to do that, but I don't know how else to do it. I've searched a lot on Google but still no luck.
Can you tell me how I'm supposed to do this? I know I could do it with sed, but I don't know how.
Newer versions of GNU awk support inplace editing:
awk -i inplace -v elem="$element" -v id="$code" -v value="$value" '
BEGIN{ FS=OFS="," } $1==id{ $elem=value } 1
' "$path"
With other awks:
awk -v elem="$element" -v id="$code" -v value="$value" '
BEGIN{ FS=OFS="," } $1==id{ $elem=value } 1
' "$path" > /usr/tmp/tmp$$ &&
mv /usr/tmp/tmp$$ "$path"
NOTES:
Always quote your shell variables unless you have an explicit reason not to and fully understand all of the implications and caveats.
If you're creating a tmp file, use "&&" before replacing your original with it so you don't zap your original file if the tmp file creation fails for any reason.
I fully support replacing Starbucks with Whatever in Glasgow - I'd like to think they wouldn't have let it open in the first place back in my day (1986 Glasgow Uni Comp Sci alum) :-).
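As a quick illustration (using the sample record from the question; the values for the three shell variables below are made up, and $path is assumed to point at the CSV file):
# replace field 2 of the record whose id is 19429 with "Whatever"
code=19429
element=2
value="Whatever"
awk -i inplace -v elem="$element" -v id="$code" -v value="$value" '
BEGIN{ FS=OFS="," } $1==id{ $elem=value } 1
' "$path"
# the record becomes: 19429,Whatever,390 Provan Walk,Glasgow,G34 9DL,-4.136909,55.872982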
awk is much easier than sed for processing specific variable fields, but (apart from GNU awk's -i inplace shown above) it does not do in-place editing. Thus you might do the following:
#!/bin/bash
code=$1
element=$2
value=$3
echo "code is $code"
awk -F ',' -v elem=$element -v id=$code -v value=$value 'BEGIN{OFS=",";} /^'$code',/{$elem=value}1' mydb > /tmp/mydb.txt
mv /tmp/mydb.txt ./mydb
This finds a match for a line starting with code followed by a comma (you could also use ($1==code)), then sets the elem-th field to value; finally it prints the line, using the comma as the output field separator. Lines that don't match are printed unchanged.
Everything is written to a temporary file, then overwrites the original.
Not very nice but it gets the job done.
