sed emulate "tr | grep" - bash

Given the following file
$ cat a.txt
FOO='hhh';BAR='eee';BAZ='ooo'
I can easily parse out one item with tr and grep
$ tr ';' '\n' < a.txt | grep BAR
BAR='eee'
However if I try this using sed it just prints everything
$ sed 's/;/\n/g; /BAR/!d' a.txt
FOO='hhh'
BAR='eee'
BAZ='ooo'

With awk you could do this:
awk '/BAR/' RS=\; file
But if in the case of BAZ this would produce an extra newline, because the is no ; after the last word. If you want to remove that newline as well you would need to do something like:
awk '/BAZ/{sub(/\n/,x); print}' RS=\; file
or with GNU awk or mawk you could use:
awk '/BAZ/' RS='[;\n]'
If your grep has the -o option then you could also try this:
grep -o '[^;]*BAZ[^;]*' file

sed can do it just as you want:
sed -n 's/.*\(BAR[^;]*\).*/\1/gp' <<< "FOO='hhh';BAR='eee';BAZ='ooo'"
The point here is that you must suppress sed's default output -- the whole line --, and print only the substitutions you want to performed.
Noteworthy points:
sed -n suppresses the default output;
s/.../.../g operates in the entire line, even if already matched -- greedy;
s/.1./.2./p prints out the substituted part (.2.);
the tr part is given as the delimiter in the expression \(BAR[^;]*\);
the grep job is represented by the matching of the line itself.

awk 'BEGIN {RS=";"} /BAR/' a.txt

The following grep solution might work for you:
grep -o 'BAR=[^;]*' a.txt

$ sed 's/;/\n/g;/^BAR/!D;P;d' a.txt
BAR='eee'
replace all ; with \n
delete until BAR line is at the top
print BAR line
delete pattern space

Related

grep to sed, append after string match but instead on end of line

I have the following text file with the following lines:
<test="123">
<test="456">
<test="789">
My aim is to have the above text file to be appended with a keyword "HELLO" after the above numbers, as following:
<test="123.HELLO">
<test="456.HELLO">
<test="789.HELLO">
with the grep command and cut, I manage to get the value between the quotation mark.
grep -o "test=".* test.txt | cut -d \" -f2
I tried to use sed on top of it, with this line
grep -o "test=".* test.txt | cut -d \" -f2 | sed -i -- 's/$/.HELLO/' test.txt
however the closest I manage to get is instead a ".HELLO" which directly appended on the end of the line (and not after the numbers in between the quotes)
<test="123">.HELLO
<test="456">.HELLO
<test="789">.HELLO
How can I fix my sed statement to provide me with the requested line?
You can do it with groups in sed. To create new output, you can do this:
sed 's/\(test="[^"]*\)"/\1.HELLO"/g' test.txt
To modify it in-place, you can use the -i switch:
sed -i 's/\(test="[^"]*\)"/\1.HELLO"/g' test.txt
Explanation:
() is a group. You can refer to it with \1. In sed we have to escape the parentheses: \(\)
[^"]* matches everything that's not a quote. So the match will stop before the quote
In the replacement, you have to add the quote manually, since it's outside of the group. So you can put stuff before the quote.
Try this:
This is how your file looks like.
bash > cat a.txt
<test="123">
<test="456">
<test="789">
Your text piped to SED
bash > cat a.txt |sed 's/">/.HELLO">/g'
<test="123.HELLO">
<test="456.HELLO">
<test="789.HELLO">
bash >
Let me know if this worked out for you.
awk 'sub("[0-9]+","&.HELLO")' file
You can accomplish this with sed directly. Cut should not be necessary:
grep "test=" test.txt | sed 's/"\(.*\)"/"\1.HELLO"/'

How to grep with a pattern that includes a " quotation mark?

I want to grep a line that includes a quotation mark, more specifically I want to grep lines that include a " mark.
more specifically I want to grep lines like:
#include "something.h"
then pipe into sed to just return something.h
A single grep will do this job.
grep -oP '(?<=")[^"]*(?=")' file
Example:
$ echo '#include "something.h"' | grep -oP '(?<=")[^"]*(?=")'
something.h
sed '#n
/"/ s/.*"\([^"]*\)" *$/\1/p' YourFile
No need of grep (unless performance on huge file is wanted) with a sed. Sed could filter and adapt directly the content
In your case, /"/ is certainly modified by /#include *"/
in case of several string between quote
sed '#n
/"/ {s/"[^"]*$/"/;s/[^"]*"\([^"]*\)" */\1/gp;}' YourFile
You can use awk to get included filename:
awk -F'"' '{print $2}' file.c
something.h

Bash - remove all lines beginning with 'P'

I have a text file that's about 300KB in size. I want to remove all lines from this file that begin with the letter "P". This is what I've been using:
> cat file.txt | egrep -v P*
That isn't outputting to console. I can use cat on the file without another other commands and it prints out fine. My final intention being to:
> cat file.txt | egrep -v P* > new.txt
No error appears, it just doesn't print anything out and if I run the 2nd command, new.txt is empty.
I should say I'm running Windows 7 with Cygwin installed.
Explanation
use ^ to anchor your pattern to the beginning of the line ;
delete lines matching the pattern using sed and the d flag.
Solution #1
cat file.txt | sed '/^P/d'
Better solution
Use sed-only:
sed '/^P/d' file.txt > new.txt
With awk:
awk '!/^P/' file.txt
Explanation
The condition starts with an ! (negation), that negates the following pattern ;
/^P/ means "match all lines starting with a capital P",
So, the pattern is negated to "ignore lines starting with a capital P".
Finally, it leverage awk's behavior when { … } (action block) is missing, that is to print the record validating the condition.
So, to rephrase, it ignores lines starting with a capital P and print everything else.
Note
sed is line oriented and awk column oriented. For your case you should use the first one, see Edouard Lopez's reponse.
Use sed with inplace substitution (for GNU sed, will also for your cygwin)
sed -i '/^P/d' file.txt
BSD (Mac) sed
sed -i '' '/^P/d' file.txt
Use start of line mark and quotes:
cat file.txt | egrep -v '^P.*'
P* means P zero or more times so together with -v gives you no lines
^P.* means start of line, then P, and any char zero or more times
Quoting is needed to prevent shell expansion.
This can be shortened to
egrep -v ^P file.txt
because .* is not needed, therefore quoting is not needed and egrep can read data from file.
As we don't use extended regular expressions grep will also work fine
grep -v ^P file.txt
Finally
grep -v ^P file.txt > new.txt
This works:
cat file.txt | egrep -v -e '^P'
-e indicates expression.

Concise and portable "join" on the Unix command-line

How can I join multiple lines into one line, with a separator where the new-line characters were, and avoiding a trailing separator and, optionally, ignoring empty lines?
Example. Consider a text file, foo.txt, with three lines:
foo
bar
baz
The desired output is:
foo,bar,baz
The command I'm using now:
tr '\n' ',' <foo.txt |sed 's/,$//g'
Ideally it would be something like this:
cat foo.txt |join ,
What's:
the most portable, concise, readable way.
the most concise way using non-standard unix tools.
Of course I could write something, or just use an alias. But I'm interested to know the options.
Perhaps a little surprisingly, paste is a good way to do this:
paste -s -d","
This won't deal with the empty lines you mentioned. For that, pipe your text through grep, first:
grep -v '^$' | paste -s -d"," -
This sed one-line should work -
sed -e :a -e 'N;s/\n/,/;ba' file
Test:
[jaypal:~/Temp] cat file
foo
bar
baz
[jaypal:~/Temp] sed -e :a -e 'N;s/\n/,/;ba' file
foo,bar,baz
To handle empty lines, you can remove the empty lines and pipe it to the above one-liner.
sed -e '/^$/d' file | sed -e :a -e 'N;s/\n/,/;ba'
How about to use xargs?
for your case
$ cat foo.txt | sed 's/$/, /' | xargs
Be careful about the limit length of input of xargs command. (This means very long input file cannot be handled by this.)
Perl:
cat data.txt | perl -pe 'if(!eof){chomp;$_.=","}'
or yet shorter and faster, surprisingly:
cat data.txt | perl -pe 'if(!eof){s/\n/,/}'
or, if you want:
cat data.txt | perl -pe 's/\n/,/ unless eof'
Just for fun, here's an all-builtins solution
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
You can use printf instead of echo if the trailing newline is a problem.
This works by setting IFS, the delimiters that read will split on, to just newline and not other whitespace, then telling read to not stop reading until it reaches a nul, instead of the newline it usually uses, and to add each item read into the array (-a) data. Then, in a subshell so as not to clobber the IFS of the interactive shell, we set IFS to , and expand the array with *, which delimits each item in the array with the first character in IFS
I needed to accomplish something similar, printing a comma-separated list of fields from a file, and was happy with piping STDOUT to xargs and ruby, like so:
cat data.txt | cut -f 16 -d ' ' | grep -o "\d\+" | xargs ruby -e "puts ARGV.join(', ')"
I had a log file where some data was broken into multiple lines. When this occurred, the last character of the first line was the semi-colon (;). I joined these lines by using the following commands:
for LINE in 'cat $FILE | tr -s " " "|"'
do
if [ $(echo $LINE | egrep ";$") ]
then
echo "$LINE\c" | tr -s "|" " " >> $MYFILE
else
echo "$LINE" | tr -s "|" " " >> $MYFILE
fi
done
The result is a file where lines that were split in the log file were one line in my new file.
Simple way to join the lines with space in-place using ex (also ignoring blank lines), use:
ex +%j -cwq foo.txt
If you want to print the results to the standard output, try:
ex +%j +%p -scq! foo.txt
To join lines without spaces, use +%j! instead of +%j.
To use different delimiter, it's a bit more tricky:
ex +"g/^$/d" +"%s/\n/_/e" +%p -scq! foo.txt
where g/^$/d (or v/\S/d) removes blank lines and s/\n/_/ is substitution which basically works the same as using sed, but for all lines (%). When parsing is done, print the buffer (%p). And finally -cq! executing vi q! command, which basically quits without saving (-s is to silence the output).
Please note that ex is equivalent to vi -e.
This method is quite portable as most of the Linux/Unix are shipped with ex/vi by default. And it's more compatible than using sed where in-place parameter (-i) is not standard extension and utility it-self is more stream oriented, therefore it's not so portable.
POSIX shell:
( set -- $(cat foo.txt) ; IFS=+ ; printf '%s\n' "$*" )
My answer is:
awk '{printf "%s", ","$0}' foo.txt
printf is enough. We don't need -F"\n" to change field separator.

shell replace cr\lf by comma

I have input.txt
1
2
3
4
5
I need to get such output.txt
1,2,3,4,5
How to do it?
Try this:
tr '\n' ',' < input.txt > output.txt
With sed, you could use:
sed -e 'H;${x;s/\n/,/g;s/^,//;p;};d'
The H appends the pattern space to the hold space (saving the current line in the hold space). The ${...} surrounds actions that apply to the last line only. Those actions are: x swap hold and pattern space; s/\n/,/g substitute embedded newlines with commas; s/^,// delete the leading comma (there's a newline at the start of the hold space); and p print. The d deletes the pattern space - no printing.
You could also use, therefore:
sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}'
The -n suppresses default printing so the final d is no longer needed.
This solution assumes that the CRLF line endings are the local native line ending (so you are working on DOS) and that sed will therefore generate the local native line ending in the print operation. If you have DOS-format input but want Unix-format (LF only) output, then you have to work a bit harder - but you also need to stipulate this explicitly in the question.
It worked OK for me on MacOS X 10.6.5 with the numbers 1..5, and 1..50, and 1..5000 (23,893 characters in the single line of output); I'm not sure that I'd want to push it any harder than that.
In response to #Jonathan's comment to #eumiro's answer:
tr -s '\r\n' ',' < input.txt | sed -e 's/,$/\n/' > output.txt
tr and sed used be very good but when it comes to file parsing and regex you can't beat perl
(Not sure why people think that sed and tr are closer to shell than perl... )
perl -pe 's/\n/$1,/' your_file
if you want pure shell to do it then look at string matching
${string/#substring/replacement}
Use paste command. Here is using pipes:
echo "1\n2\n3\n4\n5" | paste -s -d, /dev/stdin
Here is using a file:
echo "1\n2\n3\n4\n5" > /tmp/input.txt
paste -s -d, /tmp/input.txt
Per man pages the s concatenates all lines and d allows to define the delimiter character.
Awk versions:
awk '{printf("%s,",$0)}' input.txt
awk 'BEGIN{ORS=","} {print $0}' input.txt
Output - 1,2,3,4,5,
Since you asked for 1,2,3,4,5, as compared to 1,2,3,4,5, (note the comma after 5, most of the solutions above also include the trailing comma), here are two more versions with Awk (with wc and sed) to get rid of the last comma:
i='input.txt'; awk -v c=$(wc -l $i | cut -d' ' -f1) '{printf("%s",$0);if(NR<c){printf(",")}}' $i
awk '{printf("%s,",$0)}' input.txt | sed 's/,\s*$//'
printf "1\n2\n3" | tr '\n' ','
if you want to output that to a file just do
printf "1\n2\n3" | tr '\n' ',' > myFile
if you have the content in a file do
cat myInput.txt | tr '\n' ',' > myOutput.txt
python version:
python -c 'import sys; print(",".join(sys.stdin.read().splitlines()))'
Doesn't have the trailing comma problem (because join works that way), and splitlines splits data on native line endings (and removes them).
cat input.txt | sed -e 's|$|,|' | xargs -i echo "{}"

Resources