How do you use grep to find a pattern in a file, EDIT IT with awk (or some other thing), and then save it? - bash

I need to edit specific lines in a text file. I have a pattern here, pattern.txt:
1
3
6
17
etc...
and a file with text, file.txt:
1 text
2 text
3 text
4 text
5 text
etc...
I want to add the text _PUT FLAG HERE to the end of each line of file.txt whose leading number matches one listed in pattern.txt.
I have
grep -F -f pattern.txt file.txt | awk '{print $0 "_PUT FLAG HERE" }'
But I can't seem to figure out a way to shove those changes back into the original file so it looks like this:
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
6 text_PUT FLAG HERE
etc...
It's a lot like trying to use tr, but much more convoluted. There should be a logical way to string awk and grep together; I just can't seem to conceive of a way to put the pieces into one pipe that does this, and I can't find the answer anywhere. (If you explain a sed way to do this, please explain the regex.)

Assume your awk has been taken hostage.
A GNU sed/grep solution! To generate a sed script that does what you want, we get the lines to change from the input file:
$ grep -wFf pattern.txt file.txt
1 text
3 text
6 text
17 text
This matches complete words (-w) so 1 text is matched, but 11 text is not; -F is for fixed strings (no regex, should be faster) and -f pattern.txt reads the patterns to look for from a file.
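A quick way to see the effect of -w on a made-up two-line input:
$ printf '1 text\n11 text\n' | grep -wF 1
1 text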
Now we pipe this to sed to generate a script:
$ grep -wFf pattern.txt file.txt | sed 's#.*#/^&$/s/$/_PUT FLAG HERE/#'
/^1 text$/s/$/_PUT FLAG HERE/
/^3 text$/s/$/_PUT FLAG HERE/
/^6 text$/s/$/_PUT FLAG HERE/
/^17 text$/s/$/_PUT FLAG HERE/
The sed command in the pipe matches the complete line (.*) and assembles an address plus substitution command (& stands for the whole previously matched line).
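To see the & substitution in isolation, feed it a single sample line:
$ echo '3 text' | sed 's#.*#/^&$/s/$/_PUT FLAG HERE/#'
/^3 text$/s/$/_PUT FLAG HERE/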
Now we take all that and use it as input for sed by means of process substitution (requires Bash):
$ sed -f <(grep -wFf pattern.txt file.txt | sed 's#.*#/^&$/s/$/_PUT FLAG HERE/#') file.txt
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
6 text_PUT FLAG HERE
7 text
8 text
9 text
10 text
11 text
12 text
13 text
14 text
15 text
16 text
17 text_PUT FLAG HERE
Done!
Yes, yes, awk is shorter[1], faster and more beautiful.
[1] Actually not, but still.
Another remark: the grep step is not actually required, see answers by potong and Walter A.

Try this:
pattern.txt:
1
3
6
17
file.txt:
1 text
2 text
3 text
4 text
5 text
Use awk:
$ awk 'NR == FNR{seen[$1];next} $1 in seen{printf("%s_PUT FLAG HERE\n",$0);next}1' pattern.txt file.txt
Output:
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
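A note on the NR == FNR idiom used above: FNR is the line number within the current input file, while NR counts lines across all files, so NR == FNR is only true while awk reads the first file (pattern.txt). seen[$1] creates the array key without assigning a value, and $1 in seen later tests membership without creating new keys. A minimal sketch of the same idiom with made-up files:
$ printf 'a\nb\n' > keys; printf 'a 1\nc 2\n' > data
$ awk 'NR==FNR{k[$1]; next} $1 in k' keys data
a 1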

awk to the rescue!
You don't need other tools when you have the full power of awk at your disposal:
$ awk -v tag='_PUT FLAG HERE' 'NR==FNR{a[$1];next}
{print $0 ($1 in a?tag:"")}' pattern file
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
Just as an exercise, here is the same done with join/sort:
$ sort <(join pattern file --nocheck-order |
sed 's/$/_PUT_FLAG_HERE/') <(join -v2 pattern file --nocheck-order)
1 text_PUT_FLAG_HERE
2 text
3 text_PUT_FLAG_HERE
4 text
5 text
Perhaps defining a function for DRY:
$ f() { join $1 pattern file --nocheck-order; }; sort <(f "" |
sed 's/$/_PUT_FLAG_HERE/') <(f -v2)

The solution of @Benjamin can be simplified to
sed -f <(sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt) file.txt
Explanation
# Read sed commands from a file
sed -f sedcommands.txt file.txt
# Read sed commands from the output of another command
sed -f <(other_command) file.txt
# Append a string to every line by substituting at the end-of-line anchor $
sed 's/$/_PUT FLAG HERE/'
# Only append the string on lines matching something
sed '/something/s/$/_PUT FLAG HERE/'
# Only append the string on lines starting with something followed by a space
sed '/^something /s/$/_PUT FLAG HERE/'
# Generate the above command for each line of pattern.txt: .* selects the whole
# line and & pastes it into the generated command. Slashes already delimit the
# inner s command, so the outer s command uses # as its delimiter.
sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt
# Now all together:
sed -f <(sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt) file.txt
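If you want the changes written back to file.txt rather than printed to stdout, GNU sed's -i option can be combined with the same construct (consider keeping a backup, e.g. -i.bak):
sed -i -f <(sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt) file.txt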

This might work for you (GNU sed):
sed 's#.*#/&/s/$/_PUT FLAG HERE/#' pattern.txt | sed -f - file
This turns the pattern file into a sed script which is then invoked against the text file.
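For reference, with the pattern.txt from the question the generated script looks like this; note that without anchoring, /1/ would also match lines such as 11 text or 21 text, so the anchored variants in the other answers are safer:
$ sed 's#.*#/&/s/$/_PUT FLAG HERE/#' pattern.txt
/1/s/$/_PUT FLAG HERE/
/3/s/$/_PUT FLAG HERE/
/6/s/$/_PUT FLAG HERE/
/17/s/$/_PUT FLAG HERE/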

This solution uses only Bash (4.0+) features:
# Set up associative array 'patterns' whose keys are patterns
declare -A patterns
for pat in $(< pattern.txt) ; do patterns[$pat]=1 ; done
# Slurp all the lines of 'file.txt' into the 'lines' array
readarray -t lines < file.txt
# Write each old line in the file, possibly with a suffix, back to the file
for line in "${lines[@]}" ; do
read -r label text <<< "$line"
printf '%s%s\n' "$line" "${patterns[$label]+_PUT FLAG HERE}"
done > file.txt
NOTES:
The changes are written back to 'file.txt', as the question seems to specify.
Bash 4.0 or later is required for associative arrays and readarray.
Bash is very slow, so this solution may not be practical if either of the files is large (more than 10 thousand lines).
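The only non-obvious expansion above is ${patterns[$label]+_PUT FLAG HERE}: the POSIX ${parameter+word} form expands to word when parameter is set (even to an empty value) and to nothing otherwise. A standalone demonstration:
$ declare -A m; m[x]=1
$ echo "x:${m[x]+FLAG} y:${m[y]+FLAG}"
x:FLAG y: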


read values of txt file from bash [duplicate]

I'm trying to read values from a text file.
I have test1.txt which looks like
sub1 1 2 3
sub8 4 5 6
I want to obtain values '1 2 3' when I specify 'sub1'.
The closest I get is:
subj="sub1"
grep "$subj" test1.txt
But the answer is:
sub8 4 5 6
I've read that grep gives you the next line to the match, so I've tried to change the text file to the following:
test2.txt looks like:
sub1
1 2 3
sub8
4 5 6
However, when I type
grep "$subj" test2.txt
The answer is:
sub1
It should be something super simple, but I've tried awk, sed, grep, egrep, cat, and none of them is working... I've also read some related posts, but none was really helpful.
Awk works: awk '$1 == "'"$subj"'" { print $2, $3, $4 }' test1.txt
The command outputs fields two, three, and four for all lines in test1.txt where the first field is $subj (i.e.: the contents of the variable named subj).
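A variant that avoids the tricky nested quoting by passing the shell variable with awk's -v option (note that -v interprets backslash escapes in the value, which is harmless for plain labels like sub1):
awk -v subj="$subj" '$1 == subj { print $2, $3, $4 }' test1.txt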
With your original text file format:
target=sub1
while IFS=$' \t\n' read -r key values; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
This requires no external tools, using only functionality built into bash itself. See BashFAQ #1.
As has come up during debugging in comments, if you have a traditional Apple-format text file (CR newlines only), then you might want something more like:
target=sub1
while IFS=$' \t\n' read -r -d $'\r' key values || [[ $key ]]; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
Alternately, using awk (for a standard UNIX text file):
target="sub1"
awk -v target="$target" '$1 == target { $1 = ""; print; }' <test1.txt
...or, for a file with CR-only newlines:
target="sub1"
tr '\r' '\n' <test1.txt | awk -v target="$target" '$1 == target { $1 = ""; print; }'
This version will be slower if the text file being read is small (since awk, like any other external tool, takes time to start up); but faster if it's large (since awk's operation is much faster than that of bash's built-ins once it's done starting up).
grep "sub1" test1.txt | cut -c6-
or
grep -A 1 "sub1" test2.txt | tail -n 1
You're doing it right, but it seems test1.txt has a wrong value in it.
With grep foo you get all lines with foo in them. Use grep -m1 foo to find only the first line containing foo.
Then you can use cut -d" " -f2- to get all the values after foo, separated by spaces.
In the end the command would look like this...
$ subj="sub1"
$ grep -m1 "$subj" test1.txt | cut -d" " -f2-
But this doesn't explain why you could not find sub1 in the first place.
Did you read the proper file?
There's a bunch of ways to do this (and shorter/more efficient answers than what I'm giving you), but I'm assuming you're a beginner at bash, and therefore I'll give you something that's easy to understand:
egrep "^$subj\>" file.txt | sed "s/^\S*\>\s*//"
or
egrep "^$subj\>" file.txt | sed "s/^[^[:blank:]]*\>[[:blank:]]*//"
The first part, egrep, searches for your subject at the beginning of the line in file.txt (that's what the ^ symbol does in the grep string). It also looks for a whole word (the \> matches an end-of-word boundary, so sub1 doesn't match sub12 in the file). Notice you have to use egrep to get the \>, as grep by default doesn't recognize that escape sequence. Once it has found the lines, egrep passes its output to sed, which strips the first word and the trailing whitespace off each line. Again, the ^ symbol in the sed command says it should only match at the beginning of the line. The \S* tells it to read as many non-whitespace characters as it can, and the \s* tells sed to gobble up as much whitespace as it can. sed then replaces everything it matched with nothing, leaving the rest behind.
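A quick demonstration of the word boundary on a made-up two-line input (with GNU grep):
$ printf 'sub1 1 2 3\nsub12 4 5 6\n' | egrep '^sub1\>'
sub1 1 2 3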
BTW, there's a help page on Stack Overflow that tells you how to format your questions (I'm guessing that was the reason you got a downvote).
-------------- EDIT ---------
As pointed out, if you are on a Mac or something like that, you have to use [^[:blank:]] instead of \S and [[:blank:]] instead of \s in your sed expression (the POSIX character classes are portable to all platforms).
awk '/sub1/{ print $2,$3,$4 }' file
1 2 3
What happens? On lines matching the regexp /sub1/, the three following fields are printed.
Any drawbacks? It affects the spacing: the three fields are reprinted joined by single spaces, so any original runs of whitespace between them are collapsed.
Sed also works: sed -n -e 's/^'"$subj"' *//p' file1.txt
It outputs all lines matching $subj at the beginning of a line after having removed the matching word and the spaces following. If TABs are used the spaces should be replaced by something like [[:space:]].
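For example, a whitespace-tolerant version of the same command could look like this (a sketch; it assumes $subj contains no regex metacharacters):
sed -n 's/^'"$subj"'[[:space:]]*//p' file1.txt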

How to add 100 spaces at end of each line of a file in Unix

I have a file which is supposed to contain 200 characters in each line. I received a source file with only 100 characters in each line, so I need to add 100 extra spaces to each line now. If it were a few spaces, we could have used sed like:
sed 's/$/ /' filename > newfilename
Since it's 100 spaces, can anyone tell me whether it is possible to add them in Unix?
If you want a fixed n characters per line (i.e., you don't trust the input file to have exactly m characters per line), follow this. For an input file with a varying number of characters per line:
$ cat file
1
12
123
1234
12345
extend it to 10 characters per line:
$ awk '{printf "%-10s\n", $0}' file | cat -e
1 $
12 $
123 $
1234 $
12345 $
Obviously, change 10 to 200 for your case. Here $ marks the end of each line; it is not present in the file as a character. You don't need cat -e; it is used here only to show that the lines really were extended.
With awk
awk '{printf "%s%100s\n", $0, ""}' file.dat
$0 refers to the entire line.
Updated after Glenn's suggestion
As Glenn suggests in the comments, the substitution is unnecessary; you can just add the spaces. Taking that logic further, you don't even need the addition: you can simply say them after the original line.
perl -nlE 'say $_," "x100' file
Original Answer
With Perl:
perl -pe 's/$/" " x 100/e' file
That says... "Substitute (s) the end of each line ($) with the calculated expression (e) of 100 repetitions of a space".
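A tiny standalone illustration of the /e modifier evaluating the replacement as Perl code:
$ echo 'abc' | perl -pe 's/$/"!" x 3/e'
abc!!!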
If you wanted to pad all lines to, say, 200 characters even if the input file was ragged (all lines of differing length), you could use something like this:
perl -pe '$pad=200-length;s/$/" " x $pad/e'
which would make up lines of 83, 102 and 197 characters to 200 each.
If you use Bash, you can still use sed, but use some readline functionality to keep you from manually typing 100 spaces (see manual for "Readline arguments").
You start typing normally:
sed 's/$/
Now, you want to insert 100 spaces. You can do this by prefixing the space-bar keystroke with a readline numeric argument saying it should be repeated 100 times, i.e., you manually enter what would look like this as a readline key sequence:
M-1 0 0 \040
Or, if your meta key is the alt key: Alt+1 0 0 Space
This inserts 100 spaces, and you get
sed 's/$/ /' filename
after typing the rest of the command.
This is useful for working in an interactive shell, but not very pretty for scripts – use any of the other solutions for that.
Just in case you are looking for a bash solution,
while IFS= read -r line
do
printf "%s%100s\n" "$line" ""
done < file > newfile
Test
Say I have a file with 3 lines in it:
$ wc -c file
16 file
$ wc -c newfile
316 newfile
Original Answer
spaces=$(echo {1..101} | tr -d 0-9)
while IFS= read -r line
do
echo "${line}${spaces}" >> newfile
done < file
You can use printf in awk:
awk '{printf "%s%*.s\n", $0, 100, " "}' filename > newfile
This printf will append 100 spaces at the end of each line.
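To verify, you can print the length of each line of the result; every value should be the original length plus 100:
awk '{ print length($0) }' newfile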
Another way in GNU awk using string-manipulation function sprintf.
awk 'BEGIN{s=sprintf("%-100s", "");}{print $0 s}' input-file > file-with-spaces
A proof with an example:
$ cat input-file
1234jjj hdhyvb 1234jjj
6789mmm mddyss skjhude
khora77 koemm sado666
nn1004 nn1004 457fffy
$ wc -c input-file
92 input-file
$ awk 'BEGIN{s=sprintf("%-100s", "");}{print $0 s}' input-file > file-with-spaces
$ wc -c file-with-spaces
492 file-with-spaces

remove lines based on file input pattern using sed

I have been trying to solve a simple sed line deletion problem.
Looked here and there. It didn't solve my problem.
My problem could simply be solved by using sed -i '{/^1\|^2\|^3/d;}' infile.txt, which deletes lines beginning with 1, 2 and 3 from infile.txt.
But what I want instead is to take the starting match patterns from a file rather than feeding them into the stream editor manually.
E.g: deletePattern
1
3
2
infile.txt
1 Line here
2 Line here
3 Line here
4 Line here
Desired output
4 Line here
Thank you in advance,
This grep should work:
grep -Fvf deletePattern infile.txt
4 Line here
But this will delete a line if a pattern from deletePattern is found anywhere in the line, not only at the beginning.
More accurate results can be achieved by using this awk command:
awk 'FILENAME == ARGV[1] && FNR==NR{a[$1];next} !($1 in a)' deletePattern infile.txt
4 Line here
Putting together a quick command substitution combined with a character class will allow a relatively short oneliner:
$ sed -e "/^[$( while read -r ch; do a+=$ch; done <pattern.txt; echo "$a" )]/d" infile.txt
4 Line here
Of course, change the -e to -i for actual in-place substitution.
With GNU sed (for -f -):
sed 's!^[0-9][0-9]*$!/^&[^0-9]/d!' deletePattern | sed -f - infile.txt
The first sed transforms deletePattern into a sed script, then the second sed applies this script.
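With the deletePattern from the question, the generated script looks like this; the [^0-9] after & stops the pattern 1 from also deleting lines that start with 10, 11, and so on:
$ sed 's!^[0-9][0-9]*$!/^&[^0-9]/d!' deletePattern
/^1[^0-9]/d
/^3[^0-9]/d
/^2[^0-9]/d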
Try this:
sed '/^[123]/ d' infile.txt

grep (awk) a file from A to first empty line

I need to grep a file from a line containing Pattern A to the first empty line.
I used awk, but I don't know how to express the empty-line condition.
cat ${file} | awk '/Pattern A/,/Pattern B/'
sed might be best:
sed -n '/PATTERN/,/^$/p' file
To avoid printing the empty line:
sed -n '/PATTERN/,/^$/{/^$/d; p}' file
or even better - thanks jthill!:
sed -n '/PATTERN/,/^$/{/./p}' file
The solutions above will print more output than needed if PATTERN appears more than once. For that case it is best to quit once the empty line is found, as jaypal's answer suggests:
sed -n '/PATTERN/,/^$/{/^$/q; p}' file
Explanation
^$ matches empty lines, because ^ stands for the beginning of the line and $ for the end. So ^$ means: lines with nothing between the beginning and the end of the line.
/PATTERN/,/^$/{/^$/d; p}
/PATTERN/,/^$/ match lines from PATTERN to empty line.
{/^$/d; p} remove (d) the lines being on ^$ format, print (p) the rest.
{/./p} just prints those lines having at least one character.
With awk you can use:
awk '!NF{f=0} /PATTERN/ {f=1} f' file
As with sed, if the file contains PATTERN on many lines, this would print too much. So again, let's exit once an empty line is found after the match:
awk 'f && !NF{exit} /PATTERN/ {f=1} f' file
Explanation
!NF{f=0} if there are no fields (that is, the line is empty), unset the flag f.
/PATTERN/ {f=1} if PATTERN is found, set the flag f.
f if flag f is set, this is True, so it performs the default awk behaviour: print the line.
Test
$ cat a
aa
bb
hello
aaaaaa
bbb
ttt
$ awk '!NF{f=0} /hello/ {f=1} f' a
hello
aaaaaa
bbb
$ sed -n '/hello/,/^$/{/./p}' a
hello
aaaaaa
bbb
Using sed:
sed -n '/PATTERN/,/^$/{/^$/q;p;}' file
Using a regexp range, you define your range from PATTERN to a blank line (/^$/). When you encounter a blank line, you quit; otherwise you keep printing.
Using awk:
awk '/PATTERN/{p=1}/^$/&&p{exit}p' file
You enable a flag when you encounter your PATTERN. When you reach a blank line and flag is enabled, you exit. If not, you keep printing.
Another alternative, suggested by devnull in the comments, is to use pcregrep:
pcregrep -M 'PATTERN(.|\n)*?(?=\n\n)' file
I think this is a nice, readable Perl one-liner:
perl -wne '$f=1 if /Pattern A/; exit if /^\s*$/; print if $f' file
Set the flag $f when the pattern is matched
Exit if a blank line (only whitespace between start and end of line) is found
Print the line if the flag is set
Testing it out:
$ cat file
1
2
Pattern A
3
4
5
6
7
8
9
$ perl -wne '$f=1 if /Pattern A/; exit if /^$/; print if $f' file
Pattern A
3
4
5
6
Alternatively, based on the suggestion by #jaypal, you could do this:
perl -lne '/Pattern A/ .. 1 and !/^$/ ? print : exit' file
Rather than using a flag $f, the range operator .. takes care of this for you. It evaluates to true when "Pattern A" is found on a line, and it remains true indefinitely: a constant operand such as 1 in a flip-flop is compared against the current input line number $., which can never equal 1 again once input is past the first line. While the range is true, the other part is evaluated and prints until a blank line is found.
Never use
/foo/,/bar/
in awk unless all you will ever need is exactly the lines from an occurrence of "foo" through the next occurrence of "bar": it makes trivial jobs marginally briefer, but even slightly more interesting requirements require a complete rewrite.
Just use:
/foo/{f=1} f{print; if (/bar/) f=0}
or similar instead.
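For instance, printing only the lines strictly between foo and bar is a one-change tweak of the flag version, whereas the range form would need restructuring (foo and bar are placeholder markers here):
$ printf 'x\nfoo\na\nbar\ny\n' | awk '/foo/{f=1; next} /bar/{f=0} f'
a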
In this case, the awk solution is:
awk '/pattern/{f=1} f{print; if (!NF) exit}' file

How can I delete every Xth line in a text file?

Consider a text file with scientific data, e.g.:
5.787037037037037063e-02 2.048402977658663748e-01
1.157407407407407413e-01 4.021264347118673754e-01
1.736111111111111049e-01 5.782032163406526371e-01
How can I easily delete, for instance, every second line, or every 9 out of 10 lines in the file? Is it for example possible with a bash script?
Background: the file is very large but I need much less data to plot. Note that I am using Ubuntu/Linux.
This is easy to accomplish with awk.
Remove every other line (keeping the even-numbered lines):
awk 'NR % 2 == 0' file > newfile
Remove every 10th line:
awk 'NR % 10 != 0' file > newfile
The NR variable in awk is the current line number. A bare expression outside { } in awk acts as a condition, and the default action for lines where the condition is true is to print them.
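A quick check with seq makes the behaviour visible:
$ seq 6 | awk 'NR % 2 == 0'
2
4
6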
How about perl?
perl -n -e '$.%10==0&&print' # print every 10th line
You could possibly do it with sed, e.g.
sed -n -e 'p;N;d;' file # print every other line, starting with line 1
If you have GNU sed it's pretty easy
sed -n -e '0~10p' file # print every 10th line
sed -n -e '1~2p' file # print every other line starting with line 1
sed -n -e '0~2p' file # print every other line starting with line 2
Try something like:
awk 'NR%3==0{print $0}' file
This will print one line in three. Or:
awk 'NR%10<9{print $0}' file
will print 9 lines out of ten.
This might work for you (GNU sed):
seq 10 | sed '0~2d' # delete every 2nd line
1
3
5
7
9
seq 100 | sed '0~10!d' # delete 9 out of 10 lines
10
20
30
40
50
60
70
80
90
100
You can use awk and a shell script. Awk can be difficult, but...
This will delete the specific lines you tell it to:
nawk -f awkfile.awk [filename]
awkfile.awk contents
BEGIN {
# default list of line numbers to delete; can be overridden with -v lines="..."
if (!lines) lines="3 4 7 8"
# split the list on whitespace and record each number as a key of linesA
n=split(lines, lA, FS)
for(i=1;i<=n;i++)
linesA[lA[i]]
}
# print only the lines whose number within the current file is not in the set
!(FNR in linesA)
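Since the BEGIN block only sets lines when it is empty, you can override the list from the command line with -v, e.g.:
nawk -v lines="2 5" -f awkfile.awk filename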
Also, I can't remember whether Vim comes with standard Ubuntu or not; if not, install it.
Then open the file with vim
vim [filename]
Then type
:%!awk NR\%2
This will delete every other line. Just change the 2 to another integer for a different frequency.
