Bash: capture a specific instance of a pattern and exclude others

I am trying to capture and read into $line the line or lines in the file that contain only del (line 2 is an example). Line 3 has del in it, but it also has ins, and the script as currently written captures both. I am not sure how to exclude everything but del and capture only those lines. Thank you :).
file
NM_003924.3:c.765_779dupGGCAGCGGCGGCAGC
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC
NM_003924.3:c.765_779delGGCAGCinsGGCGGCAGC
NM_003924.3:c.765_779insGGCAGCGGCGGCAGC
desired output
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC
bash w/ current output
while read line; do
if [[ $line =~ del ]] ; then echo $line; fi
done < file
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC
NM_003924.3:c.765_779delGGCAGCinsGGCGGCAGC

Could you please try the following (if awk is okay with you)?
awk '/del/ && !/ins/' Input_file
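For reference, against the sample file shown in the question this should print only the del-only line:
$ awk '/del/ && !/ins/' Input_file
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC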

Try:
while read -r line; do
[[ $line =~ del && ! $line =~ ins ]] && printf '%s\n' "$line"
done < file
The revised code is also ShellCheck-clean and avoids Bash Pitfall #14.
This solution may fail if the last line in the file does not end with a newline. If that is a concern, see the accepted answer to "Read last line of file in bash script when reading file line by line" for a fix.
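One common sketch of that fix is the read ... || [[ -n $line ]] idiom, which also processes a final unterminated line:
while read -r line || [[ -n $line ]]; do
[[ $line =~ del && ! $line =~ ins ]] && printf '%s\n' "$line"
done < file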

Here is a sed solution. It skips lines where del is followed by ins and prints every other line that has del in it; -n suppresses all other output.
$ sed -n -e '/del.*ins/!{/.*del.*/p}' inputFile
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC

Here is another answer using PCRE-enabled grep: the negative lookahead (?!.*ins) rejects any line where ins appears after del. This works with the -P option of GNU grep.
$ grep -P 'del(?!.*ins)' inputFile
NM_003924.3:c.765_779delGGCAGCGGCGGCAGC

Split it into 2 steps. You do not need a loop:
grep "del" file | grep -v "ins"

Related

Updating a config file based on the presence of a specific string

I want to be able to comment and uncomment "managed" lines using a bash script.
I am trying to write a script that will update all of the config lines that have the word #managed after them and remove the preceding # if it exists.
The rest of the config file needs to be left unchanged. The config file looks like this:
configFile.txt
#config1=abc #managed
#config2=abc #managed
config3=abc #managed
config3=abc
This is the script I have created so far. It iterates over the file, finds lines that contain "#managed", and detects whether they are currently commented.
I then need to write this back to the file; how do I do that?
manage.sh
#!/bin/bash
while read line; do
STR='#managed'
if grep -q "$STR" <<< "$line"; then
echo "debug - this is managed"
firstLetter=${line:0:1}
if [ "$firstLetter" = "#" ]; then
echo "Remove the initial # from this line"
fi
fi
echo "$line"
done < configFile.txt
With your approach, using grep and sed:
str='#managed$'
file=ConfigFile.txt
grep -q "^#.*$str" "$file" && sed "/^#.*$str/s/^#//" "$file"
Looping through files ending in *.txt
#!/usr/bin/env bash
str='#managed$'
for file in *.txt; do
grep -q "^#.*$str" "$file" &&
sed "/^#.*$str/s/^#//" "$file"
done
In-place editing with sed requires the -i option, but its syntax varies between sed versions: GNU sed accepts -i without an argument, while BSD sed requires one (an empty string, or a backup suffix such as .bak).
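For example, a sketch using the same pattern and variables as above:
sed -i "/^#.*$str/s/^#//" "$file"       # GNU sed, no backup
sed -i '' "/^#.*$str/s/^#//" "$file"    # BSD/macOS sed, no backup
sed -i.bak "/^#.*$str/s/^#//" "$file"   # both, keeping a .bak backup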
On a Mac, ed should be installed by default, so you can just replace the sed part with:
printf '%s\n' "g/^#.*$str/s/^#//" ,p Q | ed -s "$file"
Replace the Q with w to actually write back the changes to the file.
Remove the ,p if no output to stdout is needed/required.
On a side note, embedding grep and sed in a shell loop that reads a text file line by line is considered bad practice among shell users/developers. If the file has 100k lines, grep and sed each have to run 100k times too!
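A single-pass awk sketch of the same uncommenting (assuming, as in the sample file, that the #managed marker sits at the end of the line):
awk '/#managed$/ { sub(/^#/, "") } 1' configFile.txt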
This sed one-liner should do the trick:
sed -i.orig '/#managed/s/^#//' configFile.txt
It deletes the # character at the beginning of the line if the line contains the string #managed.
I wouldn't do it in bash (because that would be slower than sed or awk, for instance), but if you want to stick with bash:
#! /bin/bash
while IFS= read -r line; do
if [[ $line = *'#managed'* && ${line:0:1} = '#' ]]; then
line=${line:1}
fi
printf '%s\n' "$line"
done < configFile.txt > configFile.tmp
mv configFile.txt configFile.txt.orig && mv configFile.tmp configFile.txt

'sed: no input files' when using sed -i in a loop

I checked some solutions for this in other questions, but they do not work in my case and I'm stuck, so here we go.
I have a csv file that I want to convert entirely to uppercase. It has to be done with a loop and take up at least 7 lines of code. I have to run the script with this command:
./c_bash.sh student-mat.csv
So I tried this Script:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -e 's/\(.*\)/\U\1/'
else
echo "$line"
fi
((c++))
done < student-mat.csv
I know there may be a couple of unnecessary things in it, but I want to focus on the sed command because it looks like the problem here.
That script shows this output (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
GP,F,17,U,GT3,T,1,1,AT_HOME,OTHER,COURSE,FATHER,1,2,0,NO,YES,NO,NO,NO,YES,YES,NO,5,3,3,1,1,3,4,5,5,6
GP,F,15,U,LE3,T,1,1,AT_HOME,OTHER,OTHER,MOTHER,1,2,3,YES,NO,YES,NO,YES,YES,YES,NO,4,3,2,2,3,3,10,7,8,10
GP,F,15,U,GT3,T,4,2,HEALTH,SERVICES,HOME,MOTHER,1,3,0,NO,YES,YES,YES,YES,YES,YES,YES,3,2,2,1,1,5,2,15,14,15
GP,F,16,U,GT3,T,3,3,OTHER,OTHER,HOME,FATHER,1,2,0,NO,YES,YES,NO,YES,YES,NO,NO,4,3,2,1,2,5,4,6,10,10
GP,M,16,U,LE3,T,4,3,SERVICES,OTHER,REPUTATION,MOTHER,1,2,0,NO,YES,YES,YES,YES,YES,YES,NO,5,4,2,1,2,5,10,15,15,15
Now that I see that it works, I want to apply that sed command permanently to the csv file, so I put -i after it:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -i -e 's/\(.*\)/\U\1/'
else
echo "$line"
fi
((c++))
done < student-mat.csv
But instead of applying the changes, the output shows this (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
sed: no input files
sed: no input files
sed: no input files
sed: no input files
sed: no input files
After checking a lot of different solutions on the internet, I also tried changing the single quotes to double quotes:
#!/bin/bash
declare -i c=0
while read -r line; do
if [ "$c" -gt '0' ]; then
sed -i -e "s/\(.*\)/\U\1/"
else
echo "$line"
fi
((c++))
done < student-mat.csv
But in this case, instead of applying the changes, it generates a 0-byte file, so there is no output when I do this:
cat student-mat.csv
What I expect is that, when I apply this script, it permanently changes all the data to uppercase, so that cat student-mat.csv then shows this (first 5 lines):
school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
GP,F,17,U,GT3,T,1,1,AT_HOME,OTHER,COURSE,FATHER,1,2,0,NO,YES,NO,NO,NO,YES,YES,NO,5,3,3,1,1,3,4,5,5,6
GP,F,15,U,LE3,T,1,1,AT_HOME,OTHER,OTHER,MOTHER,1,2,3,YES,NO,YES,NO,YES,YES,YES,NO,4,3,2,2,3,3,10,7,8,10
GP,F,15,U,GT3,T,4,2,HEALTH,SERVICES,HOME,MOTHER,1,3,0,NO,YES,YES,YES,YES,YES,YES,YES,3,2,2,1,1,5,2,15,14,15
GP,F,16,U,GT3,T,3,3,OTHER,OTHER,HOME,FATHER,1,2,0,NO,YES,YES,NO,YES,YES,NO,NO,4,3,2,1,2,5,4,6,10,10
GP,M,16,U,LE3,T,4,3,SERVICES,OTHER,REPUTATION,MOTHER,1,2,0,NO,YES,YES,YES,YES,YES,YES,NO,5,4,2,1,2,5,10,15,15,15
sed works on files, not on lines. Do not read lines; use sed on the file. sed can skip the first line by itself; see the sed manual.
You want:
sed -i -e '2,$s/\(.*\)/\U\1/' student-mat.csv
You can make it shorter with s/.*/\U&/.
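Spelled out, that shorter form of the same in-place command (GNU sed, since \U is a GNU extension) would presumably be:
sed -i '2,$ s/.*/\U&/' student-mat.csv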
Your code does not work the way you think it does. Note that it also removes the second line from the output. Your code:
- reads the first line with read -r line
- echo "$line" prints that first line
- c is incremented
- read -r line reads the second line (which is never printed)
- sed then reads and processes the rest of the file (from line 3 to the end) and prints it in upper case
- c is incremented again
- the next read -r line fails, and the loop exits
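If the exercise really does require a loop, a bash-only sketch (assuming bash 4+ for the ${line^^} uppercase expansion; the temporary file name is arbitrary) could replace sed entirely:
#!/bin/bash
c=0
while IFS= read -r line; do
    if (( c > 0 )); then
        printf '%s\n' "${line^^}"   # uppercase the data lines
    else
        printf '%s\n' "$line"       # leave the header line as-is
    fi
    ((c++))
done < student-mat.csv > student-mat.tmp
mv student-mat.tmp student-mat.csv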

How to read specific lines in a file in BASH?

while read -r line will run through each line in a file. How can I have it run through specific lines in a file, for example, lines "1-20", then "30-100"?
One option would be to use sed to get the desired lines:
while read -r line; do
echo "$line"
done < <(sed -n '1,20p; 30,100p' inputfile)
This feeds lines 1-20 and 30-100 of the inputfile to read.
devnull's sed command does the job. Another alternative is to use awk, since it avoids the read loop and lets you do the processing in awk itself:
awk '(NR>=1 && NR<=20) || (NR>=30 && NR<=100) { print "processing " $0 }' file
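If you want to keep the while read loop itself, a sketch that counts lines in the shell (no sed or awk) might look like this:
n=0
while IFS= read -r line; do
    ((n++))
    # process only lines 1-20 and 30-100
    if (( (n >= 1 && n <= 20) || (n >= 30 && n <= 100) )); then
        echo "$line"
    fi
done < inputfile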

fast way to replace characters in file ignoring comment lines

How can I replace/delete characters in a file while leaving comment lines unchanged? I'm looking for something to the effect of the following (where 'X' is replaced with 'Y' in file.txt), just substantially faster:
while read line
do
if [[ ${line:0:1} = "#" ]]
then
echo "$line"
else
echo "$line" | tr "X" "Y"
fi
done < file.txt
Thank you!
An equivalent, more accurate (and faster) sed command, compared to your script, is:
sed '/^ *#/!{s/X/Y/g;}' file.txt
This matches any line that does not start with zero or more spaces followed by #, and replaces X with Y globally on those lines.
I am willing to bet Perl will be faster than all of the above:
perl -i -pe 's/X/Y/g unless /^#/' file.txt
For fast replacement, use sed, and only replace on lines not starting with "#":
sed -e '/^#/! s/X/Y/g' file.txt
or, editing in place:
sed -i '/^#/! s/{what_to_replace}/{to_what_to_replace}/g' file.txt
awk version:
awk '!/^ *#/{gsub(/X/,"Y")}1' file.txt
Do look out for word boundaries to prevent substrings from getting replaced. For example, with gawk you can use \< and \>.
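A sketch of that (the \< and \> word-boundary operators are GNU awk extensions; X again stands in for the text being replaced):
gawk '!/^ *#/{gsub(/\<X\>/,"Y")}1' file.txt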

Help with Bash script

I'm trying to get this script to read input from a file given on the command line, match the user id in the file using grep, and output those lines, numbered 1)...n, to a new file.
So far my script looks like this:
#!/bin/bash
linenum=1
grep $USER $1 |
while [ read LINE ]
do
echo $linenum ")" $LINE >> usrout
$linenum+=1
done
When I run it as ./username file
I get:
line 4: [: read: unary operator expected
Could anyone explain the problem to me?
Thanks.
Just remove the [ ] around read LINE; the square brackets are for performing tests (file exists, string is empty, etc.).
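A corrected sketch of the original script, with the increment also fixed (see the note about $foo+=1 further down):
#!/bin/bash
linenum=1
grep "$USER" "$1" |
while read -r LINE
do
    echo "$linenum) $LINE" >> usrout
    linenum=$((linenum + 1))
done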
How about the following?
$ grep $USER file | cat -n >usrout
Leave off the square brackets.
while read -r LINE; do
echo "$linenum) $LINE"
linenum=$((linenum + 1))
done >> usrout
Just use awk:
awk -vu="$USER" '$0~u{print ++d") "$0}' file
or
grep $USER file |nl
or with the shell (no need to use grep):
i=1
while read -r line
do
case "$line" in
*"$USER"*) echo $((i++)) $line >> newfile;;
esac
done <"file"
Why not just use grep with the -n (or --line-number) switch?
$ grep -n ${USERNAME} ${FILE}
The -n switch gives the line number that the match was found on in the file. From grep's man page:
-n, --line-number
Prefix each line of output with the 1-based line number
within its input file.
So, running this against the /etc/passwd file on Linux for the user test_user gives:
31:test_user:x:5000:5000:Test User,,,:/home/test_user:/bin/bash
This shows that the test_user account appears on line 31 of the /etc/passwd file.
Also, instead of $foo+=1, you should write foo=$(($foo+1)).
