Multiple "sed" actions on previous results - bash

I have this input:
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
On the above I need to do 4 (sequential) actions:
select only lines containing "foo" (lowercase)
on the selected lines, remove everything but UPPERCASE letters
delete empty lines (if any are created by the previous action)
and on the lines remaining from the above, enclose every char in [ ] (so X becomes [X])
I'm able to solve the above, but I need two sed invocations piped together. Script:
#!/bin/bash
data() {
cat <<EOF
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
EOF
}
echo "Result OK"
data | sed -n '/foo/s/[^A-Z]//gp' | sed '/^\s*$/d;s/./[&]/g'
# in the above it is solved using 2 sed invocations
# trying to solve it using only one invocation,
# but the following doesn't do what I need :(
echo "Variant 2 - trying to use only ONE invocation of sed"
data | sed -n '/foo/s/[^A-Z]//g;/^\s*$/d;s/./[&]/gp'
output from the above:
Result OK
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Variant 2 - trying to use only ONE invocation of sed
[A][B][C][D][E][F]
[B][A][R][ ][A][B][C]
[A][B][C][D][E][F]
Variant 2 should also output only
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Is it possible to solve the above using only one sed invocation?

sed -n '/foo/{s/[^A-Z]//g;/^$/d;s/./[&]/g;p;}' inputfile
Output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
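Written out with one command per line and comments (a sketch of the same logic as above; GNU sed accepts # comments on their own lines inside a script), the single invocation is:
sed -n '
# act only on lines that contain "foo"
/foo/{
# remove everything except uppercase letters
s/[^A-Z]//g
# drop the line if the previous step left nothing
/^$/d
# wrap every remaining character in [ ]
s/./[&]/g
p
}' inputfile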

Alternative sed approach:
sed '/foo/!d;s/[^A-Z]//g;/./!d;s/./[&]/g' file
The output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
/foo/!d - deletes all lines that don't contain foo
/./!d - deletes all empty lines
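As a worked trace on the question's sample input, each stage of that chain leaves the following behind:
# /foo/!d      keeps:  "bar foo"  "foo ABC/DEF"  "ABC foo DEF"  "foo bar"
# s/[^A-Z]//g  gives:  ""         "ABCDEF"       "ABCDEF"       ""
# /./!d        keeps:  "ABCDEF"   "ABCDEF"
# s/./[&]/g    gives:  "[A][B][C][D][E][F]"  "[A][B][C][D][E][F]"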

Related

Sed insert file contents rather than file name

I have two files and would like to insert the contents of one file into the other, replacing a specified line.
File 1:
abc
def
ghi
jkl
File 2:
123
The following code is what I have.
file1=numbers.txt
file2=letters.txt
linenumber=3s
echo $file1
echo $file2
sed "$linenumber/.*/r $file1/" $file2
Which results in the output:
abc
def
r numbers.txt
jkl
The output I am hoping for is:
abc
def
123
jkl
I thought it could be an issue with bash variables but I still get the same output when I manually enter the information.
How am I misunderstanding sed and/or the read command?
Your script replaces the line with the string "r $file1". The replacement part of the sed s command is not re-interpreted as a command; it is taken literally.
You can:
linenumber=3
sed "$linenumber"' {
r '"$file1"'
d
}' "$file2"
At line number 3, read (and print) file1, then delete the original line.
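With the question's values substituted (linenumber=3, file1=numbers.txt holding 123, file2=letters.txt), the command above expands to roughly:
sed '3 {
r numbers.txt
d
}' letters.txt
# abc
# def
# 123
# jkl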
See here for a good explanation and reference.
Surely we can make that a one-liner:
sed -e "$linenumber"' { r '"$file1"$'\n''d; }' "$file2"
Live example at TutorialsPoint.
I would use the c command as follows:
linenumber=3
sed "${linenumber}c $(< $file1)" "$file2"
This replaces the current line with the text that comes after c.
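With the same values, this expands to something like the following (the text after c is the one-line content of numbers.txt):
sed '3c 123' letters.txt
# abc
# def
# 123
# jkl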
Your command didn't work because it expands to this:
sed "3s/.*/r numbers.txt/" letters.txt
and you can't use r like that. r has to be the command that is being run.

Split text file based on date tag / timestamp

I have a big log file containing date tags. It looks like this:
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
[04/11/2015, 12:21]
foo
bar
[08/11/2015, 14:12]
bar
foo
[09/11/2015, 11:25]
...
[15/11/2015, 19:22]
...
[15/11/2015, 21:55]
...
and so on. I need to split this data into per-day files, like:
01.txt:
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
04.txt:
[04/11/2015, 12:21]
foo
bar
etc. How can I do that using any of the Unix tools?
I don't think there's a tool that will do it without a little programming, but with Awk the little programming really isn't all that hard.
script.awk
/^\[[0-3][0-9]\/[01][0-9]\/[12][0-9]{3},/ {
    if ($1 != old_date)
    {
        if (outfile != "") close(outfile);
        outfile = sprintf("%.2d.txt", ++filenum);
        old_date = $1
    }
}
{ print > outfile }
The first (bigger) block of code recognizes the date string, which is also in $1 (so the condition could be made more precise by referring to $1, but the benefit is minimal to non-existent). Inside the actions, it checks to see if the date is different from the last date it remembered. If so, it checks whether it has a file open and closes it if necessary (close is part of POSIX awk). Then it generates a new file name, and remembers the current date it is processing.
The second smaller block simply writes the current line to the current file.
Invocation
awk -f script.awk data
This assumes you have a file script.awk; you could provide the script on the command line if you prefer. If the whole thing is encapsulated in a shell script, I'd embed the program rather than use a second file, but I find it convenient for development to use a file. (The shell script would contain awk '…the script…' "$@" with no separate file.)
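Such a wrapper could look like this (a sketch that just embeds the same program shown above):
#!/bin/sh
# Wrapper sketch: the same awk program as script.awk, embedded inline.
awk '
/^\[[0-3][0-9]\/[01][0-9]\/[12][0-9]{3},/ {
    if ($1 != old_date)
    {
        if (outfile != "") close(outfile);
        outfile = sprintf("%.2d.txt", ++filenum);
        old_date = $1
    }
}
{ print > outfile }
' "$@"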
Example output files
Given the sample data from the question, the output is in five files, 01.txt .. 05.txt.
$ for file in 0?.txt; do boxecho $file; cat $file; done
************
** 01.txt **
************
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
************
** 02.txt **
************
[04/11/2015, 12:21]
foo
bar
************
** 03.txt **
************
[08/11/2015, 14:12]
bar
foo
************
** 04.txt **
************
[09/11/2015, 11:25]
...
************
** 05.txt **
************
[15/11/2015, 19:22]
...
[15/11/2015, 21:55]
...
$
The boxecho command is a simple script that echoes its arguments in a box of stars:
echo "** $* **" | sed -e h -e s/./*/g -e p -e x -e p -e x
Revised file name format
I wish to have the output as [day].txt or [day].[month].[year].txt, based on the date in the file. Is that possible?
Yes; it is possible and not particularly hard. The split function is one way of dealing with breaking up the value in $1. The regex specifies that square brackets, slashes and commas are the field separators. There are 5 sub-fields in the value in $1: an empty field before the [, the three numeric components separated by slashes and an empty field after the ,. The array name, dmy, is mnemonic for the sequence in which the components are stored.
/^\[[0-3][0-9]\/[01][0-9]\/[12][0-9]{3},/ {
    if ($1 != old_date)
    {
        if (outfile != "") close(outfile)
        n = split($1, dmy, "[/\[,]")
        outfile = sprintf("%s.%s.%s.txt", dmy[4], dmy[3], dmy[2])
        old_date = $1
    }
}
{ print > outfile }
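To see what that split() call actually produces for one of the tag lines, here is a quick check (a sketch, using the same separator string as above):
echo '[01/11/2015, 02:19]' | awk '{
    n = split($1, dmy, "[/\[,]")
    for (i = 1; i <= n; i++) printf "dmy[%d]=\"%s\"\n", i, dmy[i]
}'
# dmy[1]=""      (empty field before the "[")
# dmy[2]="01"    (day)
# dmy[3]="11"    (month)
# dmy[4]="2015"  (year)
# dmy[5]=""      (empty field after the ",")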
Permute the numbers 4, 3, 2 in the sprintf() statement to suit yourself. The given order is year, month, day, which has many merits, including that it follows the ISO 8601 standard and that the files sort automatically into date order. I strongly counsel its use, but you may do as you wish. For the sample input shown in the question, the files it generates are:
2015.11.01.txt
2015.11.04.txt
2015.11.08.txt
2015.11.09.txt
2015.11.15.txt
This is my idea: I use a sed command and an awk script.
$ cat biglog
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
[04/11/2015, 12:21]
foo
bar
aaa
bbb
[08/11/2015, 14:12]
bar
foo
$ cat sample.awk
#!/bin/awk -f
BEGIN {
    FS = "\n"      # each line of a record becomes a field
    RS = "\n\n"    # blank lines separate the records
}
{
    date = substr($1, 2, 2)          # day of month from the "[DD/..." tag line
    filename = date ".txt"
    for (i = 2; i <= NF; i++) {      # append every line after the tag line
        print $i >> filename
    }
}
How to use
sed -e 's/^\(\[[0-9][0-9]\)/\n\1/' biglog | sed -e 1d | ./sample.awk
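To see why the pipeline works: the first sed inserts a blank line before every date tag, which is what lets sample.awk read blank-line-separated records (RS = "\n\n"); the 1d in the second sed drops the blank line created at the very top. A quick look at the first stage:
sed -e 's/^\(\[[0-9][0-9]\)/\n\1/' biglog | head -n 6
#
# [01/11/2015, 02:19]
# foo
#
# [01/11/2015, 08:40]
# bar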
Confirmation
ls *.txt
01.txt 04.txt 08.txt
$ cat 01.txt
foo
bar
$ cat 04.txt
foo
bar
aaa
bbb
$ cat 08.txt
bar
foo
yet another awk
$ awk -F"[[/,]" -v d="." '/^[\[0-9\/, :\]]*$/{f=$4 d $3 d $2 d"txt"}
{print $0>f}' file
$ ls 20*
2015.11.01.txt 2015.11.04.txt 2015.11.08.txt 2015.11.09.txt 2015.11.15.txt
$ cat 2015.11.01.txt
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar

Insert lines in a file starting from a specific line

I would like to insert lines into a file in bash starting from a specific line.
Each line is a string which is an element of an array
line[0]="foo"
line[1]="bar"
...
and the specific line is 'fields'
file="$(cat $myfile)"
for p in $file; do
if [ "$p" = 'fields' ]
then insertlines() #<- here
fi
done
This can be done with sed: sed 's/fields/fields\nNew Inserted Line/'
$ cat file.txt
line 1
line 2
fields
line 3
another line
fields
dkhs
$ sed 's/fields/fields\nNew Inserted Line/' file.txt
line 1
line 2
fields
New Inserted Line
line 3
another line
fields
New Inserted Line
dkhs
Use -i to save in-place instead of printing to stdout
sed -i 's/fields/fields\nNew Inserted Line/'
As a bash script:
#!/bin/bash
match='fields'
insert='New Inserted Line'
file='file.txt'
sed -i "s/$match/$match\n$insert/" $file
Or another example with sed:
Prepare a test.txt file:
echo -e "line 1\nline 2\nline 3\nline 4" > /tmp/test.txt
cat /tmp/test.txt
line 1
line 2
line 3
line 4
Add a new line into the test.txt file:
sed -i '2 a line 2.5' /tmp/test.txt
# sed for in-place editing (-i) of the file: 'LINE_NUMBER a-ppend TEXT_TO_ADD'
cat /tmp/test.txt
line 1
line 2
line 2.5
line 3
line 4
This is definitely a case where you want to use something like sed (or awk or perl) rather than reading one line at a time in a shell loop. This is not the sort of thing the shell does well or efficiently.
You might find it handy to write a reusable function. Here's a simple one, though it won't work on fully-arbitrary text (slashes or regular expression metacharacters will confuse things):
function insertAfter # file line newText
{
    local file="$1" line="$2" newText="$3"
    # a\<newline> appends $newText after every line matching ^$line$
    sed -i -e "/^$line$/a"$'\\\n'"$newText"$'\n' "$file"
}
Example:
$ cat foo.txt
Now is the time for all good men to come to the aid of their party.
The quick brown fox jumps over a lazy dog.
$ insertAfter foo.txt \
"Now is the time for all good men to come to the aid of their party." \
"The previous line is missing 'bjkquvxz.'"
$ cat foo.txt
Now is the time for all good men to come to the aid of their party.
The previous line is missing 'bjkquvxz.'
The quick brown fox jumps over a lazy dog.
$
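If the caveat about slashes and metacharacters matters, one workaround (a sketch, not from the answer above; insertAfterLiteral is a made-up name) is to compare lines literally with awk instead of building a sed regex:
function insertAfterLiteral # file line newText
{
    local file="$1" line="$2" newText="$3"
    local tmp
    tmp=$(mktemp) || return 1
    # $0 == match_line is a literal string comparison, so slashes and regex
    # metacharacters in the line are harmless. (awk -v still interprets
    # backslash escapes in the values, so backslashes remain a caveat.)
    awk -v match_line="$line" -v new_text="$newText" '
        { print }
        $0 == match_line { print new_text }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}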
sed is your friend:
:~$ cat text.txt
foo
bar
baz
~$
~$ sed '/^bar/a this is the new line' text.txt > new_text.txt
~$ cat new_text.txt
foo
bar
this is the new line
baz
~$

How to use sed command to add a string before a pattern string?

I want to use sed to modify my file named "baz".
When I search for a pattern foo, and foo is not at the beginning or end of the line, I want to insert bar before foo. How can I do it using sed?
Input file named baz:
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
Output file
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
You can just use something like:
sed 's/foo/barfoo/g' baz
(the g at the end means global, every occurrence on each line rather than just the first).
For an arbitrary (rather than fixed) pattern such as foo[0-9], you could use capture groups as follows:
pax$ echo 'xyz fooA abc
xyz foo5 abc
xyz fooB abc' | sed 's/\(foo[0-9]\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc
xyz fooB abc
The parentheses capture the actual text that matched the pattern and the \1 uses it in the substitution.
You can use arbitrarily complex patterns with this one, including ensuring you match only complete words. For example, only changing the pattern if it's immediately surrounded by a word boundary:
pax$ echo 'xyz fooA abc
xyz foo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc' | sed 's/\(\bfoo[0-9]\b\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc
In terms of how the capture groups work, you can use parentheses to store the text that matches a pattern for later use in the replacement. The captured identifiers are based on the ( characters reading from left to right, so the regex (I've left off the \ escape characters and padded it a bit for clarity):
( ( \S* ) ( \S* ) )
^ ^     ^ ^     ^ ^
| |     | |     | |
| +--2--+ +--3--+ |
+--------1--------+
when applied to the text Pax Diablo would give you three groups:
\1 = Pax Diablo
\2 = Pax
\3 = Diablo
as shown below:
pax$ echo 'Pax Diablo' | sed 's/\(\(\S*\) \(\S*\)\)/[\1] [\2] [\3]/'
[Pax Diablo] [Pax] [Diablo]
Just substitute the start of the line with something different.
sed '/^foo/s/^/bar/'
To replace or modify every "foo" except at the beginning or end of the line, I would suggest temporarily replacing those at the beginning and end of the line with a unique sentinel value.
sed 's/^foo/____veryunlikelytoken_bol____/
s/foo$/____veryunlikelytoken_eol____/
s/foo/bar&/g
s/^____veryunlikelytoken_bol____/foo/
s/____veryunlikelytoken_eol____$/foo/'
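A quick check on a made-up line with foo at the start, in the middle, and at the end:
printf '%s\n' 'foo_blah_foo_blah_foo' | sed '
s/^foo/____veryunlikelytoken_bol____/
s/foo$/____veryunlikelytoken_eol____/
s/foo/bar&/g
s/^____veryunlikelytoken_bol____/foo/
s/____veryunlikelytoken_eol____$/foo/'
# foo_blah_barfoo_blah_foo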
In sed there is no way to specify "cannot match here". In Perl regex and derivatives (meaning languages which borrowed from Perl's regex, not necessarily languages derived from Perl) you have various negative assertions so you can do something like
perl -pe 's/(?!^)foo(?!$)/barfoo/g'
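For example, on the same kind of made-up line:
printf '%s\n' 'foo_a_foo_b_foo' | perl -pe 's/(?!^)foo(?!$)/barfoo/g'
# foo_a_barfoo_b_foo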

How do I write a one-liner script that inserts the contents of one file into another file?

Say I have file A, in the middle of which there is a tag string "#INSERT_HERE#". I want to put the whole content of file B at that position in file A. I tried using pipes to concatenate the contents, but I wonder if there is a more elegant one-line script to handle it.
$ cat file
one
two
#INSERT_HERE#
three
four
$ cat file_to_insert
foo bar
bar foo
$ awk '/#INSERT_HERE#/{while((getline line<"file_to_insert")>0){ print line };next }1 ' file
one
two
foo bar
bar foo
three
four
cat file | while read line; do if [ "$line" = "#INSERT_HERE#" ]; then cat file_to_insert; else echo "$line"; fi; done
Use sed's r command:
$ cat foo
one
two
#INSERT_HERE#
three
four
$ cat bar
foo bar
bar foo
$ sed '/#INSERT_HERE#/{ r bar
> d
> }' foo
one
two
foo bar
bar foo
three
four
