Bash: capture a specific instance of a pattern and exclude others

I am trying to capture and read into $line the line or lines in the file that contain only del (line 2 is an example). Line 3 has del in it, but it also has ins, and the script as currently written captures both. I am not sure how to exclude everything but del and capture only those lines. Thank you :).
file
NM_003924.3:c.765_779dupGGCAGCGGCGGCAGC
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC
NM_003924.3:c.765_779delGGCAGCinsGGCGGCAGC
NM_003924.3:c.765_779insGGCAGCGGCGGCAGC
desired output
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC
bash w/ current output
while read line; do
if [[ $line =~ del ]] ; then echo $line; fi
done < file
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC
NM_003924.3:c.765_779delGGCAGCinsGGCGGCAGC

Could you please try the following (if awk is okay with you)?
awk '/del/ && !/ins/' Input_file
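For reference, against the sample file shown in the question this should print only the del-only line:
$ awk '/del/ && !/ins/' Input_file
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC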

Try:
while read -r line; do
[[ $line =~ del && ! $line =~ ins ]] && printf '%s\n' "$line"
done < file
The revised code is also ShellCheck-clean and avoids Bash Pitfall #14.
This solution may fail if the last line in the file does not end with a newline. If that is a concern, see the accepted answer to "Read last line of file in bash script when reading file line by line" for a fix.
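One common sketch of that fix is the read ... || [[ -n $line ]] idiom, which also processes a final unterminated line:
while read -r line || [[ -n $line ]]; do
[[ $line =~ del && ! $line =~ ins ]] && printf '%s\n' "$line"
done < file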

Here is a sed solution. It skips lines where del is followed by ins and prints every other line that has del in it; -n suppresses all other output.
$ sed -n -e '/del.*ins/!{/.*del.*/p}' inputFile
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC

Here is another answer using PCRE-enabled grep: the negative lookahead (?!.*ins) rejects any line where ins appears after del. This works with the -P option of GNU grep.
$ grep -P 'del(?!.*ins)' inputFile
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC

Split it into 2 steps. You do not need a loop:
grep "del" file | grep -v "ins"

Related

Updating a config file based on the presence of a specific string

I want to be able to comment and uncomment "managed" lines using a bash script.
I am trying to write a script that will update all of the config lines that have the word #managed after them and remove the preceding # if it exists.
The rest of the config file needs to be left unchanged. The config file looks like this:
configFile.txt
#config1=abc #managed
#config2=abc #managed
config3=abc #managed
config3=abc
This is the script I have created so far. It iterates over the file, finds lines that contain "#managed", and detects whether they are currently commented.
I then need to write this back to the file; how do I do that?
manage.sh
#!/bin/bash
while read line; do
STR='#managed'
if grep -q "$STR" <<< "$line"; then
echo "debug - this is managed"
firstLetter=${line:0:1}
if [ "$firstLetter" = "#" ]; then
echo "Remove the initial # from this line"
fi
fi
echo "$line"
done < configFile.txt
With your approach, using grep and sed:
str='#managed$'
file=ConfigFile.txt
grep -q "^#.*$str" "$file" && sed "/^#.*$str/s/^#//" "$file"
Looping through files ending in *.txt
#!/usr/bin/env bash
str='#managed$'
for file in *.txt; do
grep -q "^#.*$str" "$file" &&
sed "/^#.*$str/s/^#//" "$file"
done
In-place editing with sed requires the -i option, but its syntax varies between sed versions: GNU sed accepts -i without an argument, while BSD sed requires one (an empty string, or a backup suffix such as .bak).
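For example, a sketch using the same pattern and variables as above:
sed -i "/^#.*$str/s/^#//" "$file"       # GNU sed, no backup
sed -i '' "/^#.*$str/s/^#//" "$file"    # BSD/macOS sed, no backup
sed -i.bak "/^#.*$str/s/^#//" "$file"   # both, keeping a .bak backup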
On a Mac, ed should be installed by default, so you can just replace the sed part with:
printf '%s\n' "g/^#.*$str/s/^#//" ,p Q | ed -s "$file"
Replace the Q with w to actually write back the changes to the file.
Remove the ,p if no output to stdout is needed/required.
On a side note, embedding grep and sed in a shell loop that reads a text file line by line is considered bad practice among shell users/developers. If the file has 100k lines, grep and sed each have to run 100k times too!
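A single-pass awk sketch of the same uncommenting (assuming, as in the sample file, that the #managed marker sits at the end of the line):
awk '/#managed$/ { sub(/^#/, "") } 1' configFile.txt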
This sed one-liner should do the trick:
sed -i.orig '/#managed/s/^#//' configFile.txt
It deletes the # character at the beginning of the line if the line contains the string #managed.
I wouldn't do it in bash (because that would be slower than sed or awk, for instance), but if you want to stick with bash:
#! /bin/bash
while IFS= read -r line; do
if [[ $line = *'#managed'* && ${line:0:1} = '#' ]]; then
line=${line:1}
fi
printf '%s\n' "$line"
done < configFile.txt > configFile.tmp
mv configFile.txt configFile.txt.orig && mv configFile.tmp configFile.txt

'sed: no input files' when using sed -i in a loop

I checked some solutions for this in other questions, but they do not work in my case and I'm stuck, so here we go.
I have a csv file that I want to convert entirely to uppercase. It has to be done with a loop and take up at least 7 lines of code. I have to run the script with this command:
./c_bash.sh student-mat.csv
So I tried this Script:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -e 's/\(.*\)/\U\1/'
else
echo "$line"
fi
((c++))
done < student-mat.csv
I know there may be a couple of unnecessary things in it, but I want to focus on the sed command because it looks like the problem here.
That script shows this output (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
GP,F,17,U,GT3,T,1,1,AT_HOME,OTHER,COURSE,FATHER,1,2,0,NO,YES,NO,NO,NO,YES,YES,NO,5,3,3,1,1,3,4,5,5,6
GP,F,15,U,LE3,T,1,1,AT_HOME,OTHER,OTHER,MOTHER,1,2,3,YES,NO,YES,NO,YES,YES,YES,NO,4,3,2,2,3,3,10,7,8,10
GP,F,15,U,GT3,T,4,2,HEALTH,SERVICES,HOME,MOTHER,1,3,0,NO,YES,YES,YES,YES,YES,YES,YES,3,2,2,1,1,5,2,15,14,15
GP,F,16,U,GT3,T,3,3,OTHER,OTHER,HOME,FATHER,1,2,0,NO,YES,YES,NO,YES,YES,NO,NO,4,3,2,1,2,5,4,6,10,10
GP,M,16,U,LE3,T,4,3,SERVICES,OTHER,REPUTATION,MOTHER,1,2,0,NO,YES,YES,YES,YES,YES,YES,NO,5,4,2,1,2,5,10,15,15,15
Now that I see that it works, I want to apply that sed command permanently to the csv file, so I put -i after it:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -i -e 's/\(.*\)/\U\1/'
else
echo "$line"
fi
((c++))
done < student-mat.csv
But instead of applying the changes, the output shows this (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
sed: no input files
sed: no input files
sed: no input files
sed: no input files
sed: no input files
After checking a lot of different solutions on the internet, I also tried changing the single quotes to double quotes:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -i -e "s/\(.*\)/\U\1/"
else
echo "$line"
fi
((c++))
done < student-mat.csv
But in this case, instead of applying the changes, it generates a 0-byte file, so there is no output when I do this:
cat student-mat.csv
What I expect is that, when I apply this script, it permanently changes all the data to uppercase, so that cat student-mat.csv then shows this (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
GP,F,17,U,GT3,T,1,1,AT_HOME,OTHER,COURSE,FATHER,1,2,0,NO,YES,NO,NO,NO,YES,YES,NO,5,3,3,1,1,3,4,5,5,6
GP,F,15,U,LE3,T,1,1,AT_HOME,OTHER,OTHER,MOTHER,1,2,3,YES,NO,YES,NO,YES,YES,YES,NO,4,3,2,2,3,3,10,7,8,10
GP,F,15,U,GT3,T,4,2,HEALTH,SERVICES,HOME,MOTHER,1,3,0,NO,YES,YES,YES,YES,YES,YES,YES,3,2,2,1,1,5,2,15,14,15
GP,F,16,U,GT3,T,3,3,OTHER,OTHER,HOME,FATHER,1,2,0,NO,YES,YES,NO,YES,YES,NO,NO,4,3,2,1,2,5,4,6,10,10
GP,M,16,U,LE3,T,4,3,SERVICES,OTHER,REPUTATION,MOTHER,1,2,0,NO,YES,YES,YES,YES,YES,YES,NO,5,4,2,1,2,5,10,15,15,15
sed works on files, not on lines. Do not read lines; use sed on the file. sed can skip the first line by itself; see the sed manual.
You want:
sed -i -e '2,$s/\(.*\)/\U\1/' student-mat.csv
You can make it shorter with s/.*/\U&/.
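Spelled out, that shorter form of the same in-place command (GNU sed, since \U is a GNU extension) would presumably be:
sed -i '2,$ s/.*/\U&/' student-mat.csv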
Your code does not work the way you think it does. Note that it also removes the second line from the output. Your code:
- reads the first line with read -r line
- echo "$line" prints that first line
- c is incremented
- read -r line reads the second line (which is never printed)
- sed then reads and processes the rest of the file (from line 3 to the end) and prints it in upper case
- c is incremented again
- the next read -r line fails, and the loop exits
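If the exercise really does require a loop, a bash-only sketch (assuming bash 4+ for the ${line^^} uppercase expansion; the temporary file name is arbitrary) could replace sed entirely:
#!/bin/bash
c=0
while IFS= read -r line; do
    if (( c > 0 )); then
        printf '%s\n' "${line^^}"   # uppercase the data lines
    else
        printf '%s\n' "$line"       # leave the header line as-is
    fi
    ((c++))
done < student-mat.csv > student-mat.tmp
mv student-mat.tmp student-mat.csv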

How to read specific lines in a file in BASH?

while read -r line will run through each line in a file. How can I have it run through specific lines in a file, for example, lines "1-20", then "30-100"?
One option would be to use sed to get the desired lines:
while read -r line; do
echo "$line"
done < <(sed -n '1,20p; 30,100p' inputfile)
This feeds lines 1-20 and 30-100 of the inputfile to read.
devnull's sed command does the job. Another alternative is to use awk, since it avoids the read loop and lets you do the processing in awk itself:
awk '(NR>=1 && NR<=20) || (NR>=30 && NR<=100) { print "processing " $0 }' file
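If you want to keep the while read loop itself, a sketch that counts lines in the shell (no sed or awk) might look like this:
n=0
while IFS= read -r line; do
    ((n++))
    # process only lines 1-20 and 30-100
    if (( (n >= 1 && n <= 20) || (n >= 30 && n <= 100) )); then
        echo "$line"
    fi
done < inputfile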

fast way to replace characters in file ignoring comment lines

How can I replace/delete characters in a file while leaving comment lines unchanged? I'm looking for something to the effect of the following (where 'X' is replaced with 'Y' in file.txt), just substantially faster:
while read line
do
if [[ ${line:0:1} = "#" ]]
then
echo "$line"
else
echo "$line" | tr "X" "Y"
fi
done < file.txt
Thank you!
An equivalent, more accurate (and faster) sed command, compared to your script, is:
sed '/^ *#/!{s/X/Y/g;}' file.txt
This matches any line that does not start with zero or more spaces followed by #, and replaces X with Y globally on those lines.
I am willing to bet Perl will be faster than all of the above:
perl -i -pe 's/X/Y/g unless /^#/' file.txt
For fast replacement, use sed, and only replace on lines not starting with "#":
sed -e '/^#/! s/X/Y/g' file.txt
or, editing in place:
sed -i '/^#/! s/{what_to_replace}/{to_what_to_replace}/g' file.txt
awk version:
awk '!/^ *#/{gsub(/X/,"Y")}1' file.txt
Do look out for word boundaries to prevent substrings from getting replaced. For example, with gawk you can use \< and \>.
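A sketch of that (the \< and \> word-boundary operators are GNU awk extensions; X again stands in for the text being replaced):
gawk '!/^ *#/{gsub(/\<X\>/,"Y")}1' file.txt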

Help with Bash script

I'm trying to get this script to read input from a file given on the command line, match the user id in the file using grep, and output those lines, numbered 1)...n, to a new file.
So far my script looks like this:
#!/bin/bash
linenum=1
grep $USER $1 |
while [ read LINE ]
do
echo $linenum ")" $LINE >> usrout
$linenum+=1
done
When I run it as ./username file
I get:
line 4: [: read: unary operator expected
Could anyone explain the problem to me?
Thanks.
Just remove the [ ] around read LINE; the square brackets are for performing tests (file exists, string is empty, etc.).
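A corrected sketch of the original script, with the increment also fixed (see the note about $foo+=1 further down):
#!/bin/bash
linenum=1
grep "$USER" "$1" |
while read -r LINE
do
    echo "$linenum) $LINE" >> usrout
    linenum=$((linenum + 1))
done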
How about the following?
$ grep $USER file | cat -n >usrout
Leave off the square brackets.
while read -r LINE; do
echo "$linenum) $LINE"
linenum=$((linenum + 1))
done >> usrout
Just use awk:
awk -vu="$USER" '$0~u{print ++d") "$0}' file
or
grep $USER file |nl
or with the shell (no need to use grep):
i=1
while read -r line
do
case "$line" in
*"$USER"*) echo $((i++)) $line >> newfile;;
esac
done <"file"
Why not just use grep with the -n (or --line-number) switch?
$ grep -n ${USERNAME} ${FILE}
The -n switch gives the line number that the match was found on in the file. From grep's man page:
-n, --line-number
Prefix each line of output with the 1-based line number
within its input file.
So, running this against the /etc/passwd file on Linux for the user test_user gives:
31:test_user:x:5000:5000:Test User,,,:/home/test_user:/bin/bash
This shows that the test_user account appears on line 31 of the /etc/passwd file.
Also, instead of $foo+=1, you should write foo=$(($foo+1)).
