grep to sed, append after string match but instead on end of line - bash

I have the following text file with the following lines:
<test="123">
<test="456">
<test="789">
My aim is to append the keyword "HELLO" after each of the numbers above, like this:
<test="123.HELLO">
<test="456.HELLO">
<test="789.HELLO">
With grep and cut, I managed to extract the value between the quotation marks:
grep -o "test=".* test.txt | cut -d \" -f2
I tried to use sed on top of it, with this line:
grep -o "test=".* test.txt | cut -d \" -f2 | sed -i -- 's/$/.HELLO/' test.txt
However, the closest I managed to get appends ".HELLO" at the end of the line instead (not after the numbers between the quotes):
<test="123">.HELLO
<test="456">.HELLO
<test="789">.HELLO
How can I fix my sed statement to produce the requested lines?

You can do it with groups in sed. To create new output, you can do this:
sed 's/\(test="[^"]*\)"/\1.HELLO"/g' test.txt
To modify it in-place, you can use the -i switch:
sed -i 's/\(test="[^"]*\)"/\1.HELLO"/g' test.txt
Explanation:
() is a group. You can refer to it with \1. In sed we have to escape the parentheses: \(\)
[^"]* matches everything that's not a quote. So the match will stop before the quote
In the replacement, you have to add the quote manually, since it's outside of the group. So you can put stuff before the quote.
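As a quick check, recreating the sample file inline and applying the grouped substitution:

```shell
# Recreate the sample file from the question.
printf '<test="123">\n<test="456">\n<test="789">\n' > test.txt
# \1 re-inserts the captured text (test=" plus the digits);
# the closing quote is re-added after .HELLO.
sed 's/\(test="[^"]*\)"/\1.HELLO"/g' test.txt
# <test="123.HELLO">
# <test="456.HELLO">
# <test="789.HELLO">
```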

Try this:
This is what your file looks like:
bash > cat a.txt
<test="123">
<test="456">
<test="789">
Your text piped to sed:
bash > cat a.txt | sed 's/">/.HELLO">/g'
<test="123.HELLO">
<test="456.HELLO">
<test="789.HELLO">
bash >
Let me know if this worked out for you.

awk 'sub("[0-9]+","&.HELLO")' file
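This works because sub() returns 1 when it makes a substitution, so the bare pattern (with no action) prints the modified line; & in the replacement stands for the matched digits. A self-contained check with the sample data inlined:

```shell
# sub() replaces the first run of digits on each line with the match
# itself (&) followed by .HELLO, and returns 1, so the line is printed.
printf '<test="123">\n<test="456">\n<test="789">\n' |
  awk 'sub("[0-9]+","&.HELLO")'
# <test="123.HELLO">
# <test="456.HELLO">
# <test="789.HELLO">
```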

You can accomplish this with sed directly. Cut should not be necessary:
grep "test=" test.txt | sed 's/"\(.*\)"/"\1.HELLO"/'

Related

Find&Replace a quoted string with grep | sed

I need to remove, in a text file, all quotation marks that enclose strings which always begin the same way but end differently:
'something.with.quotation.123' must be something.with.quotation.123
'something.with.quotation.456' must be something.with.quotation.456
but quoted strings that don't begin with this prefix should not be changed.
I've been working with a grep that finds & prints the quoted strings:
grep -o "something\.with\.quotation\.[^']*" file.txt
Now I need to pass the results to sed through a pipeline, but it doesn't work:
grep -o "something\.with\.quotation\.[^']*" file.txt | sed -i "s/'$'/$/g" file.txt
I've been trying with other options ("s/\'$\'/$/g", "s/'\\$'/\\$/g",...) and googling a lot, but no way.
Can you point me to the correct way to get a result from a pipeline in sed?
You don't need grep here. Just use sed like this:
cat file
'something.with.quotation.123'
'something.with.quotation.456'
'foo bar'
sed -E "s/'(something\.with\.quotation\.[^']*)'/\1/g" file
something.with.quotation.123
something.with.quotation.456
'foo bar'
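The same can be checked inline, feeding the sample lines via printf rather than from a file:

```shell
# Only the matching prefix loses its quotes; 'foo bar' is untouched.
printf "%s\n" "'something.with.quotation.123'" "'foo bar'" |
  sed -E "s/'(something\.with\.quotation\.[^']*)'/\1/g"
# something.with.quotation.123
# 'foo bar'
```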

Shell script for string search between particular lines, timestamps

I have a file with more than 10000 lines. I am trying to search for a string in between particular set of lines, between 2 timestamps.
I am using sed command to achieve this.
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' filename | grep -i "string"
With the above command I am not getting the desired result: it considers the entire file, not just the lines I have specified.
Is there a way to achieve this?
Please help
I think the problem is here:
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' filename |
^^^
You want to pipe the output of the first sed command into the second, but because you gave the second sed a filename argument, it ignores the piped input and re-reads the file instead.
Try this:
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' | grep -i "string"
Any time you find yourself chaining together pipes of seds and greps stop and just use 1 awk command instead:
awk -v IGNORECASE=1 '/2015-08-12/{f=1} f&&/string/; /2015-09-12/||(NR==4133){exit}' file
The above uses GNU awk for IGNORECASE, with other awks you'd just change /string/ to tolower($0)~/string/.
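As a sketch, here is the portable tolower() variant run on a small hypothetical sample (the NR==4133 limit from the question is omitted since the sample is short):

```shell
# Hypothetical sample log; the window of interest lies between the timestamps.
printf '%s\n' 'noise' '2015-08-12 start' 'found STRING here' \
              'other line' '2015-09-12 end' 'STRING after window' |
awk '/2015-08-12/{f=1} f && tolower($0)~/string/; /2015-09-12/{exit}'
# found STRING here
```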

sed emulate "tr | grep"

Given the following file
$ cat a.txt
FOO='hhh';BAR='eee';BAZ='ooo'
I can easily parse out one item with tr and grep
$ tr ';' '\n' < a.txt | grep BAR
BAR='eee'
However if I try this using sed it just prints everything
$ sed 's/;/\n/g; /BAR/!d' a.txt
FOO='hhh'
BAR='eee'
BAZ='ooo'
With awk you could do this:
awk '/BAR/' RS=\; file
But in the case of BAZ this would produce an extra newline, because there is no ; after the last word. If you want to remove that newline as well, you would need to do something like:
awk '/BAZ/{sub(/\n/,x); print}' RS=\; file
or with GNU awk or mawk you could use:
awk '/BAZ/' RS='[;\n]'
If your grep has the -o option then you could also try this:
grep -o '[^;]*BAZ[^;]*' file
sed can do it just as you want:
sed -n 's/.*\(BAR[^;]*\).*/\1/gp' <<< "FOO='hhh';BAR='eee';BAZ='ooo'"
The point here is that you must suppress sed's default output -- the whole line -- and print only the substitutions you want performed.
Noteworthy points:
sed -n suppresses the default output;
s/.../.../g operates in the entire line, even if already matched -- greedy;
s/.1./.2./p prints out the substituted part (.2.);
the tr part is given as the delimiter in the expression \(BAR[^;]*\);
the grep job is represented by the matching of the line itself.
awk 'BEGIN {RS=";"} /BAR/' a.txt
The following grep solution might work for you:
grep -o 'BAR=[^;]*' a.txt
$ sed 's/;/\n/g;/^BAR/!D;P;d' a.txt
BAR='eee'
replace all ; with \n
delete until BAR line is at the top
print BAR line
delete pattern space
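Putting the steps together on the sample line (GNU sed, since \n in the replacement is a GNU extension):

```shell
# D deletes up to the first newline and restarts the cycle, so the
# pattern space is trimmed until BAR is at the front; P then prints it.
printf "FOO='hhh';BAR='eee';BAZ='ooo'\n" | sed 's/;/\n/g;/^BAR/!D;P;d'
# BAR='eee'
```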

Display all fields except the last

I have a file as show below
1.2.3.4.ask
sanma.nam.sam
c.d.b.test
I want to remove the last field from each line; the delimiter is . and the number of fields is not constant.
Can anybody help me with an awk or sed solution? I can't use perl here.
Both these sed and awk solutions work independent of the number of fields.
Using sed:
$ sed -r 's/(.*)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
Note: -r is the flag for extended regexp; it could be -E instead, so check with man sed. If your version of sed doesn't have such a flag, just escape the brackets:
sed 's/\(.*\)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
The sed solution is doing a greedy match up to the last . and capturing everything before it, it replaces the whole line with only the matched part (n-1 fields). Use the -i option if you want the changes to be stored back to the files.
Using awk:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file
1.2.3.4
sanma.nam
c.d.b
The awk solution just simply prints n-1 fields, to store the changes back to the file use redirection:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file > tmp && mv tmp file
Reverse, cut, reverse back.
rev file | cut -d. -f2- | rev >newfile
Or, replace from last dot to end with nothing:
sed 's/\.[^.]*$//' file >newfile
The regex [^.] matches one character which is not dot (or newline). You need to exclude the dot because the repetition operator * is "greedy"; it will select the leftmost, longest possible match.
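Both approaches can be checked inline on the sample lines:

```shell
# Reverse, drop the (now first) field, reverse back.
printf '1.2.3.4.ask\nc.d.b.test\n' | rev | cut -d. -f2- | rev
# 1.2.3.4
# c.d.b

# Delete from the last dot to end of line.
printf '1.2.3.4.ask\nc.d.b.test\n' | sed 's/\.[^.]*$//'
# 1.2.3.4
# c.d.b
```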
With cut on the reversed string
rev yourFile | cut -d "." -f 2- | rev
If you want to keep the trailing ".", use:
awk '{gsub(/[^\.]*$/,"");print}' your_file
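For instance (note the trailing dot is retained):

```shell
# gsub removes the final run of non-dot characters, leaving the last dot.
printf '1.2.3.4.ask\nsanma.nam.sam\n' | awk '{gsub(/[^\.]*$/,"");print}'
# 1.2.3.4.
# sanma.nam.
```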

Concise and portable "join" on the Unix command-line

How can I join multiple lines into one line, with a separator where the new-line characters were, and avoiding a trailing separator and, optionally, ignoring empty lines?
Example. Consider a text file, foo.txt, with three lines:
foo
bar
baz
The desired output is:
foo,bar,baz
The command I'm using now:
tr '\n' ',' <foo.txt |sed 's/,$//g'
Ideally it would be something like this:
cat foo.txt |join ,
What's:
the most portable, concise, readable way?
the most concise way using non-standard unix tools?
Of course I could write something, or just use an alias. But I'm interested to know the options.
Perhaps a little surprisingly, paste is a good way to do this:
paste -s -d"," foo.txt
This won't deal with the empty lines you mentioned. For that, pipe your text through grep, first:
grep -v '^$' | paste -s -d"," -
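A runnable sketch using the sample foo.txt from the question, with an empty line mixed in (the trailing - tells paste to read standard input):

```shell
# Build the sample file, drop blank lines, then join with commas.
printf 'foo\n\nbar\nbaz\n' > foo.txt
grep -v '^$' foo.txt | paste -s -d"," -
# foo,bar,baz
```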
This sed one-liner should work:
sed -e :a -e 'N;s/\n/,/;ba' file
Test:
[jaypal:~/Temp] cat file
foo
bar
baz
[jaypal:~/Temp] sed -e :a -e 'N;s/\n/,/;ba' file
foo,bar,baz
To handle empty lines, remove them first and pipe the result to the above one-liner:
sed -e '/^$/d' file | sed -e :a -e 'N;s/\n/,/;ba'
How about using xargs?
For your case:
$ sed 's/$/, /' foo.txt | xargs
Note that this leaves a trailing separator. Also be careful about xargs' input-length limit: very long input files cannot be handled this way.
Perl:
cat data.txt | perl -pe 'if(!eof){chomp;$_.=","}'
or yet shorter and faster, surprisingly:
cat data.txt | perl -pe 'if(!eof){s/\n/,/}'
or, if you want:
cat data.txt | perl -pe 's/\n/,/ unless eof'
Just for fun, here's an all-builtins solution
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
You can use printf instead of echo if the trailing newline is a problem.
This works by setting IFS, the delimiters that read splits on, to just newline (and not other whitespace), then telling read not to stop until it reaches a nul instead of the usual newline, adding each item read into the array (-a) data. Then, in a subshell (so as not to clobber the IFS of the interactive shell), we set IFS to , and expand the array with *, which joins the items with the first character of IFS.
I needed to accomplish something similar, printing a comma-separated list of fields from a file, and was happy with piping STDOUT to xargs and ruby, like so:
cat data.txt | cut -f 16 -d ' ' | grep -o "\d\+" | xargs ruby -e "puts ARGV.join(', ')"
I had a log file where some data was broken into multiple lines. When this occurred, the last character of the first line was the semi-colon (;). I joined these lines by using the following commands:
for LINE in $(cat $FILE | tr -s " " "|")
do
if [ $(echo $LINE | egrep ";$") ]
then
echo "$LINE\c" | tr -s "|" " " >> $MYFILE
else
echo "$LINE" | tr -s "|" " " >> $MYFILE
fi
done
The result is a file where lines that were split in the log file were one line in my new file.
Simple way to join the lines with space in-place using ex (also ignoring blank lines), use:
ex +%j -cwq foo.txt
If you want to print the results to the standard output, try:
ex +%j +%p -scq! foo.txt
To join lines without spaces, use +%j! instead of +%j.
To use a different delimiter, it's a bit more tricky:
ex +"g/^$/d" +"%s/\n/_/e" +%p -scq! foo.txt
where g/^$/d (or v/\S/d) removes blank lines and s/\n/_/ is a substitution that works basically like sed's, but applied to all lines (%). When parsing is done, print the buffer (%p). Finally, -cq! executes the vi command q!, which quits without saving (-s silences the output).
Please note that ex is equivalent to vi -e.
This method is quite portable, as most Linux/Unix systems ship with ex/vi by default. It's also more compatible than sed, whose in-place parameter (-i) is a non-standard extension; sed itself is more stream-oriented, and therefore not as portable for this job.
POSIX shell:
( set -- $(cat foo.txt) ; IFS=+ ; printf '%s\n' "$*" )
My answer is:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' foo.txt
printf is enough. We don't need -F"\n" to change the field separator.
