sed to grep lines after specific line for further processing - shell

I am working on a script that looks for the lines of a file that come after a specific line and processes them to extract data.
Let me illustrate with an example.
If the file "sample.log" has lines like
qwerty asdf foo bar
foo
time: 1:00 PM
foo1 bar1
foo foo fooo copying file abc/def/ghi/foo.txt
bar bar1 bar2 copying file efg/qwe/bar.txt
foo
My script should search for contents after time: 1:00 PM. After finding those lines, it must look for lines matching the pattern "copying" and get the path specified in the line.
In this case, output written to another file should be
abc/def/ghi/foo.txt
efg/qwe/bar.txt
I tried the following command, but I get an empty string as output. Please guide me with this.
sed -n '/^time: 1:00 PM/{/^(.*)copying file/s/^(.*)copying file //p}' ../../sample.log

If you're already in Tcl, you could code it in Tcl:
set fid [open "FILE" r]
set have_time false
while {[gets $fid line] != -1} {
if {$have_time && [regexp {copying file (.*)} $line -> filename]} {
puts $filename
} elseif {[string first "time:" $line] > -1} {
set have_time true
}
}
close $fid
If your file is quite huge, exec sed may be faster, but you'll have to see for yourself.
Note, if you're going to exec sed, keep in mind that inside Tcl, single quotes have no special meaning: use braces to quote the sed program.
exec sed -e {do stuff here} FILE

sed '/1:00 PM/,$ {/copying/s:.*file \(.*\):\1:p};d' FILE

This might work for you (GNU sed):
sed -ne '/1:00 PM/,$!b' -e 's/.*copying.* //w copy' file
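As a self-contained check of the range-based idea used in the sed answers above, the sketch below recreates the question's sample.log in a temporary directory and prints the paths from "copying file" lines that appear from the time marker onward. The function name extract_paths, the temporary directory, and the extra "early" line before the marker are illustrative assumptions, not part of the original posts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every line from "time: 1:00 PM" to the end of
# the file, strip everything up to "copying file " and print what is left.
extract_paths() {
    sed -n '/^time: 1:00 PM/,$ s/.*copying file //p' "$1"
}

tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
qwerty asdf foo bar
early copying file should/not/appear.txt
foo
time: 1:00 PM
foo1 bar1
foo foo fooo copying file abc/def/ghi/foo.txt
bar bar1 bar2 copying file efg/qwe/bar.txt
foo
EOF

# The "early" line sits before the marker, so the range never sees it.
paths=$(extract_paths "$tmpdir/sample.log")
printf '%s\n' "$paths"
rm -rf "$tmpdir"
```

The address range `/^time: 1:00 PM/,$` restricts the substitution to the marker line and everything after it; `-n` with the `p` flag prints only lines where the substitution succeeded.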

Related

search through regex pattern in a file

I have a file with data in below format
abc {u_bit_top/connect_down/u_FDIO[6]/u_latch}
ghi {u_bit_top/seq_connect/p_REDEIO[9]/ff_latch
def {u_bit_top/connect_up/shift_reg[7]
I want to search each line of the file for the patterns *bit_top*FDIO* and *bit_top*REDEIO* and delete the complete line if either pattern is found.
I want output as
def {u_bit_top/connect_up/shift_reg[7]
I tried sed "/bit_top/d;/FDIO/d;/REDEIO/d;", but this deletes any line that contains bit_top, FDIO, or REDEIO on its own.
How can I search for the patterns above and delete only the lines containing them?
A shell or Tcl solution would be useful.
Since you tagged tcl
set fh [open "filename"]
set contents [split [read -nonewline $fh] \n]
close $fh
set filtered [lsearch -inline -not -regexp $contents {bit_top.*(FDIO|REDEIO)}]
results in
def {u_bit_top/connect_up/shift_reg[7]
lsearch documentation.
But really all you need for this is grep
grep -Ev 'bit_top.*(FDIO|REDEIO)' filename
You've been close! ;)
sed '/bit_top.*FDIO/d' input
Just input a regex to sed that matches what you want...
Using sed
$ sed -E '/bit_top.*(REDE|FD)IO/d' input_file
def {u_bit_top/connect_up/shift_reg[7]
You might use GNU AWK for this task in the following way. Let the content of file.txt be
abc {u_bit_top/connect_down/u_FDIO[6]/u_latch}
ghi {u_bit_top/seq_connect/p_REDEIO[9]/ff_latch
def {u_bit_top/connect_up/shift_reg[7]
then
awk '/bit_top/&&(/FDIO/||/REDEIO/){next}{print}' file.txt
gives output
def {u_bit_top/connect_up/shift_reg[7]
Explanation: if a line contains bit_top AND (FDIO OR REDEIO), go to the next line, i.e. skip it. Otherwise the line is simply printed.
(tested in GNU Awk 5.0.1)
With a small change you can implement the compound pattern (eg, *bit_top*FDIO*) in sed.
A couple variations on OP's current sed:
# daisy-chain the 2 requirements:
$ sed "/bit_top.*FDIO/d;/bit_top.*REDEIO/d" file
def {u_bit_top/connect_up/shift_reg[7]
# enable "-E"xtended regex support:
$ sed -E "/bit_top.*(FDIO|REDEIO)/d" file
def {u_bit_top/connect_up/shift_reg[7]
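To see why the question's chained `d` commands delete too much while the compound pattern does not, the sketch below runs both on the sample data. The fourth input line (an FDIO entry without bit_top) and the variable names are assumptions added purely for illustration.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
cat > "$sample" <<'EOF'
abc {u_bit_top/connect_down/u_FDIO[6]/u_latch}
ghi {u_bit_top/seq_connect/p_REDEIO[9]/ff_latch
def {u_bit_top/connect_up/shift_reg[7]
xyz {u_FDIO[1]/no_top_here}
EOF

# The question's attempt: each /.../d fires independently, so a line is
# deleted as soon as it contains ANY of the three words.
naive=$(sed '/bit_top/d;/FDIO/d;/REDEIO/d' "$sample")

# Compound pattern: delete only when bit_top is followed, later on the
# same line, by FDIO or REDEIO.
compound=$(grep -Ev 'bit_top.*(FDIO|REDEIO)' "$sample")

printf 'naive keeps %d line(s)\n' "$(printf '%s' "$naive" | grep -c .)"
printf 'compound keeps:\n%s\n' "$compound"
rm -f "$sample"
```

With this input the naive version removes every line, whereas the compound pattern keeps both the shift_reg line and the line that mentions FDIO without bit_top.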
You can read the file line by line and print only the lines that do not match your regex pattern.
set fp [open "input.txt" r]
while { [gets $fp data] >= 0 } {
    # regexp returns 0 when the line does not match; print those lines
    if {![regexp {bit_top.*(FDIO|REDEIO)} $data]} {
        puts $data
    }
}
close $fp

Speed up bash for loop which contains multiple sed commands

my bash for loop looks like:
for i in read_* ; do
cut -f1 $i | sponge $i
sed -i '1 s/^/>/g' $i
sed -i '3 s/^/>ref\n/g' $i
sed -i '4d' $i
sed -i '1h;2H;1,2d;4G' $i
mv $i $i.fasta
done
Are there any methods of speeding up this process, perhaps using GNU parallel?
EDIT: Added input and expected output.
Input:
sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
Hopeful output:
>ref
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
I used the sed -i '1h;2H;1,2d;4G' $i command to swap lines 2 and 4.
If I read it right, this should create the same result, though it would probably help a LOT if I could see what your input and expected output look like...
awk '{$0=$1}
FNR==1{hd=">"$0; next}
FNR==2{hd=hd"\n"$0;next}
FNR==3{print ">ref\n"$0 > FILENAME".fasta"}
FNR==4{next}
FNR==5{print hd"\n"$0 > FILENAME".fasta"}
' read_*
My input files:
$: cat read_x
foo x
bar x
baz x
last x
curiosity x
$: cat read_y
FOO y
BAR y
BAZ y
LAST y
CURIOSITY y
and the resulting output files:
$: cat read_x.fasta
>ref
baz
>foo
bar
curiosity
$: cat read_y.fasta
>ref
BAZ
>FOO
BAR
CURIOSITY
This runs in one pass with no loop aside from awk's usual internals, and leaves the originals in place so you can check it first. If all is good, all that's left is to remove the originals. For that, I would use extended globbing.
$: shopt -s extglob; rm read_!(*.fasta)
That will clean up the original inputs but not the new outputs.
Same results, three commands, no loops.
I am, of course, making some assumptions about what you are meaning to do that might not be accurate. To get this format in a single sed call -
$: sed -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_x
>ref
baz
>foo
bar
curiosity
but that's not the same commands you used, so maybe I'm misreading it.
To use this to in-place edit multiple files at a time (instead of calling it in a loop on each file), use -si so that the line numbers apply to each file rather than the stream of records they collectively produce.
DON'T use -is, though (sed would treat the s as a backup suffix for -i); you could use -i -s.
$: sed -s -i -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_*
This still leaves you with the issue of renaming each, but xargs makes that pretty easy in the given example.
printf "%s\n" read_* | xargs -I# mv # #.fasta
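The effect of -s on line-number addresses can be checked without touching any real data. The sketch below assumes GNU sed (where -s / --separate is available) and uses two throwaway files whose names and contents are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Assumes GNU sed: -s (--separate) is a GNU extension.
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/read_a"
printf 'b1\nb2\n' > "$tmpdir/read_b"

# Without -s the files are one continuous stream: only one "line 1" exists.
joined=$(sed -n '1p' "$tmpdir/read_a" "$tmpdir/read_b")

# With -s the line counter restarts for each file.
separate=$(sed -s -n '1p' "$tmpdir/read_a" "$tmpdir/read_b")

printf 'without -s:\n%s\n' "$joined"
printf 'with -s:\n%s\n' "$separate"
rm -rf "$tmpdir"
```

Running the same comparison with -n and p instead of -i is a cheap way to preview what an in-place edit across many files would address.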
addendum
Using the file you gave in the OP, assuming every file is the same general structure and exactly 4 lines -
$: cat file_0 # I made files 0 through 7, but with same data
sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
$: sed -Esi '1{s/^([^[:space:]]+).*/>\1/;h;s/.*/>ref/}; 3x;' file_?
$: cat file_0 # used a diff on each, worked on all at once
>ref
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
Breakout:
-Esi Extended pattern matching, separate file linecounts, in-place edits
1{...}; Collectively do these commands, in order, only on every line 1
s/^([^[:space:]]+).*/>\1/ add leading > but strip everything after any whitespace
h store the resulting >\1 line in the hold buffer
s/.*/>ref/ then replace the whole line with a literal >ref
3x swap line 3 with the value in the hold buffer from line 1
file_? I used a glob to supply the appropriate list of files all at once.
Doing same with awk:
$: awk 'FNR==1{id=">"$1; print ">ref" >FILENAME".fasta"; next} FNR==3{print id > FILENAME".fasta"; next} {print $0 > FILENAME".fasta"}' file_?
Then you can do file management as above with the xargs/mv for the sed or the shopt/rm for the awk - or we could add a little organizational work in awk if you like. Consider this:
awk 'BEGIN { system(" mkdir -p done ") }
FNR==1 { id=">"$1; print ">ref" > FILENAME".fasta"; next } # skip printing original
FNR==3 { print id > FILENAME".fasta"; next } # skip printing original
{ print $0 > FILENAME".fasta" } # every line NOT skipped
FNR==4 { close(FILENAME); close(FILENAME".fasta");
system("mv " FILENAME " done/")
}' file_?
Then if there are any problems, it's easy to delete the fasta's, move the originals back, adjust the code, and try again. If everything is ok, it's fast and easy to rm -fr done, yes?
Note that I really only added the mkdir inside a system call in the awk to show that you can, and to keep from having to manually do it separately if you have to run a few iterations or move it all into a wrapper script, etc.
The code in the question runs multiple subprocesses (cut, sponge, sed four times, and mv) for each file that is processed. Running subprocesses is relatively slow, so you can speed up the code significantly by reducing the number of them.
This Shellcheck-clean code is one way to do it:
#! /bin/bash -p
old_files=()
for f in read_* ; do
readarray -t lines <"$f"
printf '>ref\n%s\n>%s\n%s\n' \
"${lines[3]}" "${lines[0]%%[[:space:]]*}" "${lines[1]}" >"$f.fasta"
old_files+=( "$f" )
done
rm -- "${old_files[@]}"
This runs no subprocesses when processing individual files. It just reads the lines of the old file into an array using the built-in readarray command and writes to the new file using the built-in printf.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of the %% in ${lines[0]%%[[:space:]]*}.
To avoid running rm for each file, the code keeps a list of files to be deleted and removes all of them at the end. If you try the code, consider commenting the rm line until you are very confident that the rest of the code is doing what you want.
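One way to gain that confidence is a dry run that prints the converted text instead of writing or deleting anything. The sketch below wraps the same readarray/printf idea in a function; the name to_fasta and the shortened stand-in input are assumptions for illustration, and readarray requires bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the FASTA form of one 4-line file on stdout.
# Line 4 becomes the reference sequence, the first field of line 1 becomes
# the sample header, and line 2 becomes the sample sequence.
to_fasta() {
    local -a lines
    readarray -t lines < "$1"
    printf '>ref\n%s\n>%s\n%s\n' \
        "${lines[3]}" "${lines[0]%%[[:space:]]*}" "${lines[1]}"
}

# Shortened stand-in for a read_* file (tab-separated header, tiny sequences).
demo=$(mktemp)
printf 'sampleid\t97\tstuff\nTATT\n||||\nTTTT\n' > "$demo"

fasta=$(to_fasta "$demo")
printf '%s\n' "$fasta"
rm -f "$demo"
```

Because the function only reads its argument and writes to stdout, it can be run over every input and compared against expectations before any file is replaced.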

Use `sed` to replace text in code block with output of command at the top of the code block

I have a markdown file that has snippets of code resembling the following example:
```
$ cat docs/code_sample.sh
#!/usr/bin/env bash
echo "Hello, world"
```
This means that there's a file at the location docs/code_sample.sh, whose contents are:
#!/usr/bin/env bash
echo "Hello, world"
I'd like to parse the markdown file with sed (awk or perl works too) and replace the bottom section of the code snippet with whatever the above bash command evaluates to, for example whatever cat docs/code_sample.sh evaluates to.
Perl to the rescue!
perl -0777 -pe 's/(?<=```\n)^(\$ (.*)\n\n)(?^s:.*?)(?=```)/"$1".qx($2)/meg' < input > output
-0777 slurps the whole file into memory
-p prints the input after processing
s/PATTERN/REPLACEMENT/ works similarly to a substitution in sed
/g replaces globally, i.e. as many times as it can
/m makes ^ match start of each line instead of start of the whole input string
/e evaluates the replacement as code
(?<=```\n) means "preceded by three backquotes and a newline"
(?^s:.*?) changes the behaviour of . to match newlines as well, so it matches (frugally because of the *?) the rest of the preformatted block
(?=```) means "followed by three backquotes"
qx runs the parameter in a shell and returns its output
A sed-only solution is easier if you have the GNU version with an e command.
That said, here's a quick, simplistic, and kinda clumsy version I knocked out that doesn't bother to check the values of previous or following lines - it just assumes your format is good, and bulls through without any looping or anything else. Still, for my example code, it worked.
I started by making an a, a b, and an x that is the markup file.
$: cat a
#! /bin/bash
echo "Hello, World!"
$: cat b
#! /bin/bash
echo "SCREW YOU!!!!"
$: cat x
```
$ cat a
foo
bar
" b a z ! "
```
```
$ cat b
foo
bar
" b a z ! "
```
Then I wrote s which is the sed script.
$: cat s
#! /bin/env bash
sed -En '
/^```$/,/^```$/ {
# for the lines starting with the $ prompt
/^[$] / {
# save the command to the hold space
x
# write the ``` header to the pattern space
s/.*/```/
# print the fabricated header
p
# swap the command back in
x
# the next line should be blank - add it to the current pattern space
N
# first print the line of code as-is with the (assumed) following blank line
p
# scrub the $ (prompt) off the command
s/^[$] //
# execute the command - store the output into the pattern space
e
# print the output
p
# put the markdown footer back
s/.*/```/
# and print that
p
}
# for the (to be discarded) existing lines of "content"
/^[^`$]/d
}
' $*
It does the job and might get you started.
$: s x
```
$ cat a
#! /bin/bash
echo "Hello, World!"
```
```
$ cat b
#! /bin/bash
echo "SCREW YOU!!!!"
```
Lots of caveats - better to actually check that the $ follows a line of backticks and is followed by a blank line, maybe make sure nothing bogus could be in the file to get executed... but this does what you asked, with (GNU) sed.
Good luck.
A rare case when use of getline would be appropriate:
$ cat tst.awk
state == "importing" {
while ( (getline line < $NF) > 0 ) {
print line
}
close($NF)
state = "imported"
}
$0 == "```" { state = (state ? "" : "importing") }
state != "imported" { print }
$ awk -f tst.awk file
See http://awk.freeshell.org/AllAboutGetline for getline uses and caveats.
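For comparison, the same idea can be sketched in plain bash. This version is an assumption-laden illustration rather than any answerer's method: the name refresh_blocks is made up, only lines of the exact form "$ cat PATH" are honoured (so nothing else from the markdown is ever executed), and no blank line is kept between the command and the file contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refresh_blocks MARKDOWN_FILE
# Inside a fenced block, keep the "$ cat PATH" line, print the current
# contents of PATH, and drop the stale lines up to the closing fence.
refresh_blocks() {
    local fence prefix line in_block=0 skipping=0
    fence=$(printf '\140\140\140')   # three backticks, spelled in octal
    prefix='$ cat '
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$line" = "$fence" ]; then
            in_block=$(( 1 - in_block ))   # toggle on every fence line
            skipping=0
            printf '%s\n' "$line"
        elif [ "$in_block" -eq 1 ] && [[ $line == "$prefix"* ]]; then
            printf '%s\n' "$line"
            cat -- "${line#"$prefix"}"     # fresh contents of PATH
            skipping=1
        elif [ "$skipping" -eq 0 ]; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}
```

A typical call would be `refresh_blocks notes.md > notes.new.md` (both file names hypothetical), run from the directory that the paths in the markdown are relative to.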

sed: Replacing a range of text with contents of a file

There are many examples here and elsewhere on the interwebs for using sed's 'r' to replace a pattern, but it does not seem to work on a range, but maybe I'm just not holding it right.
The following works as expected, deleting BEGIN PATTERN and replacing it with the contents of /tmp/somefile.
sed -n "/BEGIN PATTERN/{ r /tmp/somefile d }" TARGET_FILE
This, however, only replaces END PATTERN with the contents of /tmp/somefile.
sed -n "/BEGIN PATTERN/,/END PATTERN/ { r /tmp/somefile d }" TARGET_FILE
I suppose I could try perl or awk to do this as well, but it seems like sed should be able to do this.
I believe that this does what you want:
sed $'/BEGIN PATTERN/r somefile\n /BEGIN PATTERN/,/END PATTERN/d' file
Or:
sed -e '/BEGIN PATTERN/r somefile' -e '/BEGIN PATTERN/,/END PATTERN/d' file
How it works
/BEGIN PATTERN/r somefile
Whenever BEGIN PATTERN is found, this inserts the contents of somefile.
/BEGIN PATTERN/,/END PATTERN/d
Whenever we are in the range from a line with /BEGIN PATTERN/ to a line with /END PATTERN/, we delete (d) the contents of the pattern space.
Example
Let's consider these two test files:
$ cat file
prelude
BEGIN PATTERN
middle
END PATTERN
afterthought
and:
$ cat somefile
This is
New.
Our command produces:
$ sed $'/BEGIN PATTERN/r somefile\n /BEGIN PATTERN/,/END PATTERN/d' file
prelude
This is
New.
afterthought
This might work for you (GNU sed):
sed -e '/BEGIN PATTERN/,/END PATTERN/{/END PATTERN/!d;r somefile' -e 'd}' file
John1024's answer works if BEGIN PATTERN and END PATTERN are different. If this is not the case, the following works:
sed $'/PATTERN/,/PATTERN/d; 1,/PATTERN/ { /PATTERN/r somefile\n }' file
By preserving the pattern:
sed $'/PATTERN/,/PATTERN/ { /PATTERN/!d; }; 1,/PATTERN/ { /PATTERN/r somefile\n }' file
This solution can yield false positives if the pattern is not paired as potong pointed out.
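Since the question mentions awk as a fallback, here is a minimal sketch of the same replacement with an explicit state flag, which also copes with identical begin and end patterns. The function name replace_range and its argument order are assumptions; note that awk -v interprets backslash escapes in the values it receives, so patterns containing backslashes would need extra care.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace_range BEGIN_RE END_RE NEW_FILE TARGET_FILE
# Prints TARGET_FILE with each BEGIN..END range replaced by NEW_FILE.
replace_range() {
    awk -v begin="$1" -v end="$2" -v new="$3" '
        !inside && $0 ~ begin {      # first line of a range
            while ((getline line < new) > 0) print line
            close(new)               # allow re-reading for a later range
            inside = 1
            next
        }
        inside {                     # drop lines up to and including END
            if ($0 ~ end) inside = 0
            next
        }
        { print }
    ' "$4"
}
```

For example, `replace_range 'BEGIN PATTERN' 'END PATTERN' somefile file` applied to the two test files shown earlier prints prelude, the two lines of somefile, and afterthought. As with the sed range, an unpaired begin line removes everything to the end of the file.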

How to concatenate stdin and a string?

How do I concatenate stdin to a string, like this?
echo "input" | COMMAND "string"
and get
inputstring
A bit hacky, but this might be the shortest way to do what you asked in the question (use a pipe to accept stdout from echo "input" as stdin to another process / command):
echo "input" | awk '{print $1"string"}'
Output:
inputstring
What task are you exactly trying to accomplish? More context can get you more direction on a better solution.
Update - responding to comment:
@NoamRoss
The more idiomatic way of doing what you want is then:
echo 'http://dx.doi.org/'"$(pbpaste)"
The $(...) syntax is called command substitution. In short, it executes the enclosed commands in a new subshell and substitutes their stdout output at the point where the $(...) appeared in the parent shell. So you would get, in effect:
echo 'http://dx.doi.org/'"rsif.2012.0125"
Use cat - to read from stdin, and put it in $() to strip the trailing newline:
echo input | COMMAND "$(cat -)string"
However, why not drop the pipe and capture the output of the left side in a command substitution:
COMMAND "$(echo input)string"
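If COMMAND is under your control, the cat-inside-$() trick can be wrapped in a small function so the pipe form from the question works as written. This is a minimal sketch; the name append_suffix is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read all of stdin, then print it followed
# directly by the first argument.
append_suffix() {
    local input
    input=$(cat)                 # $(...) strips the trailing newline(s)
    printf '%s%s\n' "$input" "$1"
}

result=$(echo "input" | append_suffix "string")
printf '%s\n' "$result"          # prints: inputstring
```

Because command substitution removes only trailing newlines, multi-line input keeps its internal line breaks and the suffix lands after the final line.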
I'm often using pipes, so this tends to be an easy way to prefix and suffix stdin:
echo -n "my standard in" | cat <(echo -n "prefix... ") - <(echo " ...suffix")
prefix... my standard in ...suffix
There are several ways to accomplish this; I personally think the best is:
echo input | while read line; do echo $line string; done
Another is to replace "$" (the end-of-line anchor, which matches the empty string at the end of each line) with " string" in a sed command:
echo input | sed "s/$/ string/g"
Why do I prefer the former? Because it appends the string to each line of stdin as soon as that line arrives, for example with the following command:
(echo input_one ;sleep 5; echo input_two ) | while read line; do echo $line string; done
you immediately get the first output:
input_one string
and then after 5 seconds you get the other echo:
input_two string
With sed, on the other hand, I saw no output until the whole parenthesized group had finished (this may depend on the sed implementation and on output buffering), so the command
(echo input_one ;sleep 5; echo input_two ) | sed "s/$/ string/g"
will output both the lines
input_one string
input_two string
after 5 seconds.
This is useful when you call functions that take a long time to complete and you want to see their output as it is produced.
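The read loop above can be hardened a little: a plain `read line` trims leading and trailing whitespace and interprets backslashes, and an unquoted `$line` is subject to word splitting and globbing. The sketch below keeps the same line-at-a-time behaviour while preserving each line exactly; the function name suffix_lines is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append " SUFFIX" to every line of stdin,
# emitting each line as soon as it has been read.
suffix_lines() {
    local line
    # IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
    # and the || test handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$line" "$1"
    done
}

out=$(printf 'input_one\n  two\\words\n' | suffix_lines string)
printf '%s\n' "$out"
```

Using printf instead of echo also avoids surprises when a line happens to start with a dash or contains escape sequences.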
You can do it with sed:
seq 5 | sed '$a\6'
seq 5 | sed '$ s/.*/\0 6/'
In your example:
echo input | sed 's/.*/\0string/'
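The `\0` back-reference used above is a GNU sed extension; `&` is the POSIX way to refer to the whole match. A small sketch of the same two substitutions in the portable spelling (the variable names and the three-line input are illustrative only):

```shell
#!/usr/bin/env bash
# Append directly to every line: & stands for the text matched by .*
joined=$(echo input | sed 's/.*/&string/')
printf '%s\n' "$joined"          # prints: inputstring

# Append only to the last line of a multi-line stream ($ address).
last=$(printf '1\n2\n3\n' | sed '$s/.*/& 4/')
printf '%s\n' "$last"
```

The second command leaves lines 1 and 2 untouched and turns the final line into "3 4".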
I know this is a few years late, but you can accomplish this with the xargs -J option (available in BSD/macOS xargs, not GNU xargs):
echo "input" | xargs -J "%" echo "%" "string"
And since it is xargs, you can do this on multiple lines of a file at once. If the file 'names' has three lines, like:
Adam
Bob
Charlie
You could do:
cat names | xargs -n 1 -J "%" echo "I like" "%" "because he is nice"
Also works:
seq -w 0 100 | xargs -I {} echo "string "{}
Will generate strings like:
string 000
string 001
string 002
string 003
string 004
...
The command you posted would take the string "input" and use it as COMMAND's stdin stream, which would not produce the results you are looking for unless COMMAND first printed out the contents of its stdin and then printed out its command line arguments.
It seems like what you want to do is closer to command substitution.
http://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html#Command-Substitution
With command substitution you can have a commandline like this:
echo input `COMMAND "string"`
This will first run COMMAND with "string" as its argument, and then expand the output of that command onto the line, replacing what's between the ‘`’ characters.
cat will be my choice: ls | cat - <(echo new line)
With perl
echo "input" | perl -ne 'print "prefix $_"'
Output:
prefix input
A solution using sd (basically a modern sed; much easier to use IMO):
# replace '$' (end of string marker) with 'Ipsum'
# the `e` flag disables multi-line matching (treats all lines as one)
$ echo "Lorem" | sd --flags e '$' 'Ipsum'
Lorem
Ipsum#no new line here
You might observe that Ipsum appears on a new line, and the output is missing a \n. The reason is echo's output ends in a \n, and you didn't tell sd to add a new \n. sd is technically correct because it's doing exactly what you are asking it to do and nothing else.
However this may not be what you want, so instead you can do this:
# replace '\n$' (new line, immediately followed by end of string) by 'Ipsum\n'
# don't forget to re-add the `\n` that you removed (if you want it)
$ echo "Lorem" | sd --flags e '\n$' 'Ipsum\n'
LoremIpsum
If you have a multi-line string, but you want to append to the end of each individual line:
$ ls
foo bar baz
$ ls | sd '\n' '/file\n'
bar/file
baz/file
foo/file
I want to prepend my sql script with "set" statement before running it.
So I echo the "set" instruction, then pipe it to cat. Command cat takes two parameters : STDIN marked as "-" and my sql file, cat joins both of them to one output. Next I pass the result to mysql command to run it as a script.
echo "set @ZERO_PRODUCTS_DISPLAY='$ZERO_PRODUCTS_DISPLAY';" | cat - sql/test_parameter.sql | mysql
p.s. mysql login and password stored in .my.cnf file
