bash - multiple operations without temp files (counting lines of code with custom exclusions)

I want to keep each operation on its own line with interspersed comments.
Is there any way to do this without the kludgy temp files?
#!/bin/sh
git diff --stat `git hash-object -t tree /dev/null` > tmp.txt
# not my code
grep -v "^ kazmath" tmp.txt > tmp2.txt
grep -v "\.obj " tmp2.txt > tmp.txt
grep -v "\.png " tmp.txt > tmp2.txt
grep -v "\.gbo " tmp2.txt > tmp.txt
# not my code
grep -v "obj2opengl\.pl " tmp.txt > tmp2.txt
grep -v "\.txt " tmp2.txt > tmp.txt
grep -v "\.md " tmp.txt > tmp2.txt
grep -v "\.blend " tmp2.txt > tmp.txt
# +'s at end of line
sed 's/+*$//' tmp.txt > tmp2.txt
# ditch last line
sed '$d' < tmp2.txt > tmp.txt
echo -n "lines of code "
cut -d '|' -f 2 tmp.txt | awk '{ sum+=$1} END {print sum}'
rm tmp.txt
rm tmp2.txt

Use pipes and more powerful regular expressions with grep -E (aka egrep):
git diff --stat `git hash-object -t tree /dev/null` |
grep -v "^ kazmath" |
grep -E -v "\.(png|gbo|obj|txt|md|blend) " |
grep -v "obj2opengl\.pl " |
sed -e 's/+*$//' -e '$d' |
cut -d '|' -f 2 |
awk '{sum += $1 } END { print "lines of code " sum }'
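The pipeline above can also keep the one-step-per-line layout with interspersed comments that the question asked for. A line that ends in `|` continues onto the next line, and a `#` comment may sit after the pipe. This is a minimal sketch: `count_loc` is a hypothetical name, and it reads `git diff --stat` output on stdin so it can be tried without a repository.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-file line counts from `git diff --stat`
# output on stdin, skipping files that are not "my code".
count_loc() {
  grep -v '^ kazmath' |                          # third-party library
  grep -E -v '\.(png|gbo|obj|txt|md|blend) ' |   # assets and docs
  grep -v 'obj2opengl\.pl ' |                    # third-party script
  sed -e 's/+*$//' -e '$d' |                     # strip +'s, drop the summary line
  cut -d '|' -f 2 |                              # keep the count column
  awk '{ sum += $1 } END { print "lines of code " sum }'
}

# Usage against the empty tree, as in the answer:
# git diff --stat "$(git hash-object -t tree /dev/null)" | count_loc
```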

Related

Bash Grep Takes 3 Days To Run. Anyway to Enhance it?

I have a script like this and would like some suggestions for improving it.
cd /home/output/
cat R*op.txt > R.total.op.txt
awk '{if( (length($8)>9) || ($8 ~ /^AAA/) ) {print $0}}' R.total.op.txt > temp && mv temp R.total.op.txt
cat S*op.txt > S.total.op.txt
awk '{if( (length($8)>9) || ($8 ~ /^AAA/) ) {print $0}}' S.total.op.txt > temp && mv temp S.total.op.txt
cat R.total.op.txt S.total.op.txt | awk '{print $4}' | sort -k1,1 | awk '!x[$1]++' > genes.txt
rm *total.op.txt
head genes.txt
cd /home/output/
for j in R1_with-genename R2_with-genename S1_with-genename S2_with-genename
do
**for i in `cat genes.txt`; do cat $j'.op.txt' | grep -w $i >> $j'_'$i'_gene.txt'**;done
done
ls -m1 *gene.txt | wc -l
find . -size 0 -delete
ls -m1 *gene.txt | wc -l
rm genes.txt
cd /home/output/
for i in `ls *gene.txt`
do
paste <(awk '{print $4"\t"$8"\t"$9"\t"$13}' $i | awk '!x[$1]++' | awk '{print $1}') <(awk '{print $4"\t"$8"\t"$9}' $i | awk '{if( (length($2)>9) || ($2 ~ /^AAA/) ) {print $0}}' | sort -k2,2 | awk '{ sum += $3 } END { if (NR > 0) print sum / NR }') <(awk '{print $4"\t"$8"\t"$9}' $i| awk '{if( (length($2)>9) || ($2 ~ /^AAA/) ) {print $0}}' | sort -k2,2 | wc -l) <(awk '{print $4"\t"$8"\t"$9"\t"$13}' $i | awk '{if( (length($2)>9) || ($2 ~ /^AAA/) ) {print $0}}' | sort -k2,2 | grep -v ":::" | wc -l) > $i'_stats.txt'
done
rm *gene.txt
cd /home/output/
for j in R1_with-genename R2_with-genename S1_with-genename S2_with-genename
do
cat $j*stats.txt > $j'.final.txt'
done
rm *stats.txt
cd /home/output/
for i in `ls *final.txt`
do
sed "1iGene_Name\tMean1\tCalculated\tbases" $i > temp && mv temp $i
done
head *final.txt
The for loop marked with asterisks, the one that iterates over genes.txt, is the grep loop that takes 3 days to finish. Can someone please advise on improvements to that command, and on whether this entire script can be made into a single command? Thanks in advance.
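Try replacing the nested loops with a single awk (GNU awk here: `\y` is its word-boundary operator, whereas `\b` means backspace in awk):
gawk 'FNR == NR { words[$0] = "\\y" $0 "\\y"; next }
{
    base = FILENAME
    sub(/\.op\.txt$/, "", base)
    for (i in words)
        if ($0 ~ words[i]) {
            fn = base "_" i "_gene.txt"
            print >> fn
            close(fn)
        }
}' genes.txt {R,S}{1,2}_with-genename.op.txt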
I suggest creating a sed script (GNU sed, for the \< and \> word boundaries that mimic grep -w):
# name script
SEDSCRIPT=split.sed
# Make sure it is empty
: > "${SEDSCRIPT}"
# Loop through all the words in genes.txt and
# create a sed command that will write each matching line to a file
while read -r word; do
printf '/\\<%s\\>/w %s.txt\n' "${word}" "${word}" >> "${SEDSCRIPT}"
done < genes.txt
basenames="R1_with-genename R2_with-genename S1_with-genename S2_with-genename"
# Loop over input files (${basenames} must stay unquoted so it splits into four names)
for name in ${basenames}; do
# Run sed script against file
sed -n -f "${SEDSCRIPT}" "${name}.op.txt"
# Move the temporary files created by sed to their permanent names
while read -r word; do
mv "${word}.txt" "${name}_${word}_gene.txt"
done < genes.txt
done
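Both approaches still compare every gene against every line. If a gene name always appears as a whole whitespace-separated field (plausible, since genes.txt was built from column 4 of these files, but an assumption), each line can instead be split into fields and every field looked up in a hash of gene names: one pass over the data, however many genes there are. This sketch is stricter than grep -w, which also matches a gene embedded in punctuation. `split_by_gene` is a hypothetical name; it keeps the original `<base>_<gene>_gene.txt` naming and append mode, so remove old outputs before rerunning.

```shell
#!/bin/sh
# Hypothetical helper: split each *.op.txt file into <base>_<gene>_gene.txt
# files in ONE pass, using a hash lookup instead of one grep per gene.
# Assumption: gene names appear as whole whitespace-separated fields.
split_by_gene() {
  genes=$1; shift
  awk '
    FNR == NR { want[$1] = 1; next }        # first file: load the gene list
    {
      base = FILENAME
      sub(/\.op\.txt$/, "", base)           # R1_with-genename.op.txt -> R1_with-genename
      split("", seen)                       # reset the per-line duplicate guard
      for (k = 1; k <= NF; k++)
        if (($k in want) && !($k in seen)) {
          seen[$k] = 1
          fn = base "_" $k "_gene.txt"
          print >> fn                       # append, like the original >>
          close(fn)                         # avoid running out of file descriptors
        }
    }' "$genes" "$@"
}

# Usage, in /home/output:
# split_by_gene genes.txt R1_with-genename.op.txt R2_with-genename.op.txt \
#   S1_with-genename.op.txt S2_with-genename.op.txt
```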

Why does my awk redirection not work?

I'm trying to redirect my output to replace the contents of my file, but when I do this the file doesn't change at all:
#!/bin/bash
ssh_config_path="$HOME/.ssh/config"
temp_ssh_config_path="$HOME/.ssh/config_temporary"
new_primary_username=$1
curr_primary_username=`awk '/^Host github\.com$/,/#Username/{print $2}' $ssh_config_path | tail -1`
new_user_name=`awk "/^Host github-$new_primary_username$/,/#Name/{print $2}" $ssh_config_path | tail -1 | sed 's/#Name //' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'`
new_user_email=`awk "/^Host github-$new_primary_username$/,/#Email/{print $2}" $ssh_config_path | tail -1 | sed 's/#Email //' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'`
echo "Switching from $curr_primary_username to $new_primary_username"
echo "Setting name to $new_user_name"
echo "Setting email to $new_user_email"
awk "
!x{x=sub(/github-$new_primary_username/,\"github.com\")}
!y{y=sub(/github\.com/,\"github-$curr_primary_username\")}
1" $ssh_config_path > temp_ssh_config_path && mv temp_ssh_config_path ssh_config_path
But if I do this, I get the correct output on my terminal:
#!/bin/bash
ssh_config_path="$HOME/.ssh/config"
temp_ssh_config_path="$HOME/.ssh/config_temporary"
new_primary_username=$1
curr_primary_username=`awk '/^Host github\.com$/,/#Username/{print $2}' $ssh_config_path | tail -1`
new_user_name=`awk "/^Host github-$new_primary_username$/,/#Name/{print $2}" $ssh_config_path | tail -1 | sed 's/#Name //' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'`
new_user_email=`awk "/^Host github-$new_primary_username$/,/#Email/{print $2}" $ssh_config_path | tail -1 | sed 's/#Email //' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'`
echo "Switching from $curr_primary_username to $new_primary_username"
echo "Setting name to $new_user_name"
echo "Setting email to $new_user_email"
awk "
!x{x=sub(/github-$new_primary_username/,\"github.com\")}
!y{y=sub(/github\.com/,\"github-$curr_primary_username\")}
1" $ssh_config_path
It's disappointing how far you've veered from the answers you were given, but in any case, here's the correct syntax for your script (untested, since you didn't provide any sample input/output):
#!/bin/bash
ssh_config_path="$HOME/.ssh/config"
temp_ssh_config_path="$HOME/.ssh/config_temporary"
new_primary_username="$1"
curr_primary_username=$(awk 'f&&/#Username/{print $2; exit} /^Host github\.com$/{f=1}' "$ssh_config_path")
new_user_name=$(awk -v npu="$new_primary_username" 'f&&/#Name/{sub(/.*#Name[[:space:]]*/,""); sub(/[[:space:]]+$/,""); print; exit} $0~"^Host github-"npu"$"{f=1}' "$ssh_config_path")
new_user_email=$(awk -v npu="$new_primary_username" 'f&&/#Email/{print $2; exit} $0~"^Host github-"npu"$"{f=1}' "$ssh_config_path")
echo "Switching from $curr_primary_username to $new_primary_username"
echo "Setting name to $new_user_name"
echo "Setting email to $new_user_email"
awk -v npu="$new_primary_username" -v cpu="$curr_primary_username" '
!x && sub("^Host github-"npu"$", "Host github.com") {x=1; print; next}
!y && sub(/^Host github\.com$/, "Host github-"cpu) {y=1}
1' "$ssh_config_path" > "$temp_ssh_config_path" && mv "$temp_ssh_config_path" "$ssh_config_path"
While rewriting it, I noticed that your last statement was:
mv temp_ssh_config_path ssh_config_path
when you probably meant:
mv "$temp_ssh_config_path" "$ssh_config_path"
As written, both names are literal file names in the current directory, so the edited copy ends up in a file called ssh_config_path and $HOME/.ssh/config is never changed.
The whole thing should, of course, have been written as a single, simple awk script.
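A sketch of the single-awk idea, assuming the config layout implied by the question (not confirmed by it): a `Host github.com` block containing a `#Username <name>` comment, and `Host github-<user>` blocks containing `#Name ...` and `#Email ...` comments. awk reads the file twice: the first pass collects the current username and the new account's name and email, and the second pass rewrites the two Host lines. `switch_github_user` is a hypothetical name; the status messages go to stderr because stdout becomes the new config.

```shell
#!/bin/sh
# Hypothetical single-awk version of the account switch.
# Assumed layout: "Host github.com" block with "#Username <name>",
# "Host github-<user>" blocks with "#Name ..." and "#Email ..." comments.
switch_github_user() {
  cfg=$1 new=$2
  awk -v npu="$new" '
    NR == FNR {                                 # pass 1: collect values only
      if ($1 == "Host") blk = $2                # track the current Host block
      else if (blk == "github.com" && $1 == "#Username") cpu = $2
      else if (blk == "github-" npu && $1 == "#Name") {
        name = $0; sub(/^[[:space:]]*#Name[[:space:]]*/, "", name)
      }
      else if (blk == "github-" npu && $1 == "#Email") email = $2
      next
    }
    FNR == 1 {                                  # start of pass 2: report
      print "Switching from " cpu " to " npu > "/dev/stderr"
      print "Setting name to " name > "/dev/stderr"
      print "Setting email to " email > "/dev/stderr"
    }
    !x && $0 == "Host github-" npu { print "Host github.com"; x = 1; next }
    !y && $0 == "Host github.com"  { print "Host github-" cpu; y = 1; next }
    1                                           # copy every other line unchanged
  ' "$cfg" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Usage (the question's script takes the new username as $1):
# switch_github_user "$HOME/.ssh/config" "$1"
```

Handling the `Host github-<user>` line with `next` means the freshly written `Host github.com` is never renamed again, whichever block comes first in the file.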

AWK variable input

I've got the following bash code:
md5sum -c checksum.md5 2>&1 | grep FAILED | awk '{print $1}' | sed 's/:$//' > /tmp/check.tmp
awk '{system("wget http://example.com/"$1"")}' /tmp/check.tmp
How can I use awk without a temp file?
Something like:
files=`md5sum -c checksum.md5 2>&1 | grep FAILED | awk '{print $1}' | sed 's/:$//'`
awk '{system("wget http://example.com/"$1"")}' $files
You can simplify the whole command to this (splitting on both : and / keeps only the last path component of each failed file):
md5sum -c checksum.md5 2>&1 |\
awk -F'[:/]' '/FAILED/{system("wget http://example.com/"$(NF-1))}'
wget has an -i switch that reads URLs from a file, and -i - reads them from standard input:
md5sum -c checksum.md5 2>&1 | \
sed -n '/FAILED$/ { s/: FAILED$//; s!^!http://example.com/!; p; }' | \
wget -i -
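Or keep the variable and feed it to awk on standard input with a bash here-string (quote it, because bash versions before 4.4 word-split an unquoted here-string and join the pieces onto one line):
awk '{system("wget http://example.com/"$1"")}' <<< "$files"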
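A minimal sketch that drops grep and sed entirely: one awk keeps the full relative path of each failed file and prints one URL per line, ready for `wget -i -`. `failed_urls` is a hypothetical name, http://example.com/ is the placeholder from the question, and also matching the `FAILED open or read` form that md5sum prints for missing files is an assumption about what should be re-downloaded.

```shell
#!/bin/sh
# Hypothetical helper: turn `md5sum -c` output on stdin into download URLs,
# one per line, keeping the full relative path of each failed file.
failed_urls() {
  awk '/: FAILED( open or read)?$/ {
         sub(/: FAILED( open or read)?$/, "")   # leave just the file name
         print "http://example.com/" $0
       }'
}

# Usage, with no temp file:
# md5sum -c checksum.md5 2>&1 | failed_urls | wget -i -
```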

bash pipe and printing with multiple filter

I was wondering if something like this exists:
tail -f file1 | grep "hello" > fileHello | grep "bye" > fileBye | grep "etc" > fileEtc
echo b1bla >> file1
echo b2hello >> file1
echo b3bye >> file1
echo b4hellobye >> file1
echo b5etc >> file1
echo b6byeetc >> file1
That would produce this result:
file1:
b1bla
b2hello
b3bye
b4hellobye
b5etc
b6byeetc
fileHello:
b2hello
b4hellobye
fileBye:
b3bye
b4hellobye
b6byeetc
fileEtc:
b5etc
b6byeetc
Thanks!
Use tee with process substitution:
tail -f file1 | tee >(exec grep "hello" > fileHello) >(exec grep "bye" > fileBye) | grep "etc" > fileEtc
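This works, but be aware that piping tail -f can cause unexpected buffering: grep block-buffers its output when writing to a file, so lines may show up in fileHello and the other files only in chunks (GNU grep's --line-buffered option avoids this).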
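Or let a single awk process do the routing (fflush keeps each file current while tail -f runs):
tail -f file1 |
awk '/hello/ { print > "fileHello"; fflush("fileHello") }
/bye/ { print > "fileBye"; fflush("fileBye") }
/etc/ { print > "fileEtc"; fflush("fileEtc") }'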
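A sketch that generalizes the awk answer so the keyword-to-file pairs are arguments rather than hard-coded. `route_lines` and its `keyword:file` argument format are hypothetical; keywords are matched as literal substrings with `index()`, like `grep -F`, and each write is flushed so the files stay current under `tail -f`.

```shell
#!/bin/sh
# Hypothetical helper: route_lines key:file [key:file ...]
# Writes each stdin line to every file whose key it contains (literal match),
# so one line can land in several files, and flushes after each write.
route_lines() {
  awk -v spec="$*" '
    BEGIN {
      n = split(spec, pairs, " ")          # "hello:fileHello bye:fileBye ..."
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, ":")
        key[i] = kv[1]; out[i] = kv[2]
      }
    }
    {
      for (i = 1; i <= n; i++)
        if (index($0, key[i])) {           # literal substring, like grep -F
          f = out[i]
          print > f                        # first write truncates, later ones append
          fflush(f)                        # keep the file current under tail -f
        }
    }'
}

# Usage:
# tail -f file1 | route_lines hello:fileHello bye:fileBye etc:fileEtc
```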

Bash: "xargs cat", adding newlines after each file

I'm using a few commands to cat a few files, like this:
cat somefile | grep example | awk -F '"' '{ print $2 }' | xargs cat
It nearly works, but my issue is that I'd like to add a newline after each file.
Can this be done in a one liner?
(Surely I could create a new script or a function that does cat and then echo, but I was wondering if this could be solved in another way.)
cat somefile | grep example | awk -F '"' '{ print $2 }' | while IFS= read -r file; do cat "$file"; echo; done
Using GNU Parallel (http://www.gnu.org/software/parallel/) it may be even faster, depending on your system; -k keeps the output in input order:
cat somefile | grep example | awk -F '"' '{ print $2 }' | parallel -k "cat {}; echo"
awk -F '"' '/example/ { system("cat \"" $2 "\"; echo") }' somefile
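One more option keeps xargs from the original command. With `-I`, xargs runs one command per input line, and a small `sh -c` wrapper does the `cat` plus `echo`. Passing `{}` as the positional argument `$1`, rather than splicing it into the script text, keeps names with spaces intact. `cat_each_with_newline` is a hypothetical name; note that xargs still interprets quotes and backslashes in its input.

```shell
#!/bin/sh
# Hypothetical helper: read file names (one per line) on stdin and print each
# file followed by an extra newline.
cat_each_with_newline() {
  # -I{}: one command per input line; "sh" fills $0, the file name becomes $1
  xargs -I{} sh -c 'cat "$1"; echo' sh {}
}

# Usage with the question's extraction step (grep folded into awk):
# awk -F '"' '/example/ { print $2 }' somefile | cat_each_with_newline
```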
