merge lines based on pattern - bash

I have been struggling to figure out how to 'unparse' lines in a log file (with 2 new-line delimiters, '#' and '|') so that all lines related to one time stamp end up on one line.
Example:
2016-03-22 blah blah blah
|blah blah
|blah blah blah
#blah
|blah blah blah
2016-03-22 blah blah blah
|blah blah blah
#blah blah
#blah blah blah
|blah
Required Output
2016-03-22 blah blah blah |blah blah |blah blah blah #blah |blah blah blah
2016-03-22 blah blah blah |blah blah blah #blah blah #blah blah blah |blah
I thought I had this sussed simply by using xargs to put everything on one line and then using sed to add new lines at 2016, but I discovered there is a limit on the number of characters on one line, and the log file is so big that xargs was creating multiple lines.
Removing the carriage returns from lines starting with | and # would solve this but can't fathom how to do this either.
I've searched on here and found a few people posting similar questions but I can't interpret some of the solutions to fit in with my issue as I'm not familiar enough with sed/awk/xargs.
Would appreciate if anyone can offer some suggestions.
Thanks

You can use this awk command:
awk '/^[0-9]{4}(-[0-9]{2}){2}/ {
    if (p != "")
        print p
    p = $0
    next
}
{
    p = p OFS $0
}
END {
    print p
}' file
2016-03-22 blah blah blah |blah blah |blah blah blah #blah |blah blah blah
2016-03-22 blah blah blah |blah blah blah #blah blah #blah blah blah |blah

anubhava's answer works, but it buffers each complete record (a timestamp line plus all of its continuation lines) before printing it.
This prints as it reads each input line.
awk '{printf "%s%s", /^[|#]/?OFS:(NR>1)?"\n":"", $0} END{print ""}'
/^[|#]/ match lines starting with # or |
?OFS if matched lead with OFS (output field separator, space by default)
: otherwise
(NR>1) if we aren't on the first line
?"\n" output a newline
:"" otherwise output a blank (to avoid a blank line at the top of the output)
END{print ""} make sure we end the last line with a newline
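A variant of the same streaming idea, sketched here under the assumption that every record starts with a YYYY-MM-DD date followed by a space (as in the sample log). It keys on the timestamp rather than on the leading | or #, so continuation lines starting with any other character are joined as well. The function name join_records is made up for illustration.

```shell
# Join continuation lines onto the preceding timestamped line, streaming.
join_records() {
    awk '{
        # A new record starts at a line that begins with YYYY-MM-DD and a space.
        if (NR > 1)
            printf "%s", (/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / ? "\n" : OFS)
        printf "%s", $0
    }
    END { if (NR) print "" }' "$@"
}

printf '%s\n' '2016-03-22 a' '|b' '#c' '2016-03-22 d' '|e' | join_records
```

The bracket expressions are spelled out instead of using {4} so that the sketch does not depend on interval-expression support in older awks.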

This might work for you (GNU sed):
sed ':a;N;/\n....-..-.. /!s/\n/ /;ta;P;D' file
Read two lines into the pattern space and if the newline is not the start of a new record, replace it by a space and repeat i.e. append another line to the existing one etc.
If the line appended is the start of a new record, print the first line, delete it and repeat.

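Remove all the newlines, append a newline at the end, and insert a newline before each 2016- (this needs GNU sed, which understands \n in the replacement). Note that this leaves an empty first line, joins the lines without a separating space, and also splits at any 2016- that occurs in the middle of a line: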
echo '2016-03-22 blah blah blah
|blah blah
|blah blah blah
#blah
|blah blah blah
2016-03-22 blah blah blah
|blah blah blah
#blah blah
#blah blah blah
|blah ' | tr -d '\n' | sed -e 's/$/\n/' -e 's/2016-/\n2016-/g'

But how do I merge lines (only the matching words from the lines) when the same word exists in both files?
All the words change automatically, and the files 1.txt and 2.txt change automatically too, as part of a package manager's script in a Gnome 2 environment. Here "link" means http://link
example INPUT:
1.txt contains the detected http links and versions of the packages:
link1/autotools-dev_20100122.1
link4/debhelper_8.0.0
link5/dreamchess_0.2.0
link5/dreamchess_0.2.0-2
link7/quilt_0.48
link7/quilt_0.48-7
link34/quilt-el_0.46.2
link34/quilt-el_0.46.2-1
2.txt contains needed extensions of packages:
autotools-dev_*.diff.gz
debhelper_*.diff.gz
debhelper_*.orig.tar.gz
libmxml-dev_*.diff.gz
libmxml-dev_*.dsc
libmxml-dev_*.orig.tar.gz
libsdl1.2-dev_*.diff.gz
libsdl1.2-dev_*.dsc
libsdl1.2-dev_*.orig.tar.gz
libsdl-image1.2-dev_*.diff.gz
libsdl-image1.2-dev_*.dsc
libsdl-image1.2-dev_*.orig.tar.gz
quilt_*.diff.gz
DESIRED OUTPUT to file 3.txt:
link1/autotools-dev_20100122.1.diff.gz
link4/debhelper_8.0.0.diff.gz
link4/debhelper_8.0.0.orig.tar.gz
libmxml-dev_*.diff.gz
libmxml-dev_*.dsc
libmxml-dev_*.orig.tar.gz
libsdl1.2-dev_*.diff.gz
libsdl1.2-dev_*.dsc
libsdl1.2-dev_*.orig.tar.gz
libsdl-image1.2-dev_*.diff.gz
libsdl-image1.2-dev_*.dsc
libsdl-image1.2-dev_*.orig.tar.gz
link7/quilt_0.48.diff.gz
link7/quilt_0.48-7.diff.gz
So I need a script that automatically detects the package names common to 1.txt and 2.txt and writes to 3.txt, on the line where each package name occurs:
the http link and version from file 1.txt
the extension from file 2.txt
and, unchanged, the lines from file 2.txt whose package name does not appear in file 1.txt

Related

Awk multiple file manipulation

Ok, let's try this again.
How can I open multiple files within AWK, and then just print them all to standard output? The following prints only the first line of each file.
BEGIN {
}
{
$file = $1;
(getline < $file)
print $0;
}
awk -f program.awk myindex
myindex is a list of files
file1
file2
file3
file4
an example of file1
rigrg
gdfgbt
rfghrth
thfg
bhtd
ht
hthrtjhrth
rtg
rthhrthrt
It sounds like you need something like this:
awk '
NR == FNR { ARGV[ARGC++]=$0; next }
FNR == 1 { found=0 }
$2 == "motd" { found=1 }
found
$1 == "customer" { nextfile }
' myindex
Untested, of course, since you didn't provide testable sample input/output. The above uses GNU awk for nextfile; with other awks, replace nextfile with found=0; next.
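For the literal question (print every file named in myindex), a minimal getline sketch could look like the following. Two things go wrong in the question's code: in awk, $file is a field reference rather than a variable, and a single getline reads only one line. The function name print_listed_files is made up for illustration.

```shell
print_listed_files() {   # usage: print_listed_files myindex
    awk '{
        file = $0
        # getline returns 1 per line read, 0 at end of file, -1 on error
        while ((getline line < file) > 0)
            print line
        close(file)
    }' "$1"
}
```

Testing the return value with > 0 matters: a bare while (getline line < file) would loop forever on an unreadable file, because -1 is also true.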
I'll propose a different approach since getline use needs to be very precise...
$ awk '/motd/{p=1} /Customer/{p=0} p' $(awk '{print $0".info"}' index)
motd
good stuff 1
good stuff 1
motd
good stuff 2
good stuff 2
motd
good stuff 3
good stuff 3
The command substitution prepares the file names as arguments to the main script. I added a 1/2/3 suffix to show that the data is coming from the corresponding file.
The files are:
==> index <==
one
two
three
==> one.info <==
blah
blah
blah
motd
good stuff 1
good stuff 1
Customer
blah
blah
end
==> three.info <==
blah
blah
blah
motd
good stuff 3
good stuff 3
Customer
blah
blah
end
==> two.info <==
blah
blah
blah
motd
good stuff 2
good stuff 2
Customer
blah
blah
To print the lines between motd and Customer from all files listed in the index file, with a cat + xargs + sed pipeline:
cat index | xargs -I {} sed -n '/^motd$/,/^Customer$/{/^motd$/d; /^Customer$/d;p}' {}".information"
The above will output the needed lines excluding pattern lines

Comment multiple lines between two markers in a file

Let's say I have a file like this
blah
blah
MARKER 1
blah
blah
blah
MARKER 2
blah
I want to find a single-line command (awk? sed?) in bash to change it into
blah
blah
# MARKER 1
# blah
# blah
# blah
# MARKER 2
blah
The same in sed:
$ sed '/^MARKER 1/,/^MARKER 2/s/^/#/' file
blah
blah
#MARKER 1
#blah
#blah
#blah
#MARKER 2
blah
Updated as anubhava suggested:
awk '/^MARKER 1/,/^MARKER 2/{$0 = "#" $0} 1' testt
blah
blah
#MARKER 1
#blah
#blah
#blah
#MARKER 2
blah
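Both commands above prefix the lines with a bare #, while the requested output has a space after it. A small sketch of the sed variant with "# " as the prefix (the function name comment_block is hypothetical):

```shell
comment_block() {   # usage: comment_block file
    # Within the range MARKER 1 .. MARKER 2, insert "# " at the start of each line.
    sed '/^MARKER 1/,/^MARKER 2/s/^/# /' "$@"
}

printf '%s\n' blah 'MARKER 1' blah 'MARKER 2' blah | comment_block
```

With GNU sed, adding -i would edit the file in place instead of writing to standard output.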

Sed match on multiple file, displaying the match together with filename and line number

This is a continuation of "Multiple line, repeated occurence matching".
I have many test*.txt files with contents as per previous thread.
test1.txt
blah blah..
blah blah..
blah abc blah1
blah blah..
blah blah..
blah abc blah2
blah blah..
blah efg1 blah blah
blah efg2 blah blah
blah blah..
blah blah..
blah abc blah3
blah blah..
blah blah..
blah abc blah4
blah blah..
blah blah blah
blah abc blah5
blah blah..
blah blah..
blah abc blah6
blah blah..
blah efg3 blah blah
blah efg4 blah blah
blah abc blah7
blah blah..
blah blah..
blah abc blah8
blah blah..
Now I want to run the sed command on all the files, while also displaying the file name and line number (if possible) alongside the output of the sed command...
I run below command
ls test*.txt | xargs sed -n -f findMatch.txt
findMatch.txt content
/abc/h;/efg/!b;x;/abc/p;z;x
output is
blah abc blah2
blah abc blah6
blah abc blah2
blah abc blah6
blah abc blah2
blah abc blah6
I need a bit more detailed output as per below
test1.txt ln6 blah abc blah2
test1.txt ln23 blah abc blah6
test2.txt ln6 blah abc blah2
test2.txt ln23 blah abc blah6
test3.txt ln6 blah abc blah2
test3.txt ln23 blah abc blah6
The grep command can be used to search all the files for particular patterns:
grep -f pattern_file -Rn *.txt
-f Read the patterns from a file
-R Recursive
-n Line number
pattern_file
hai
hello
this
Output:
1.txt:1:hai
1.txt:2:hello
2.txt:1:hai
2.txt:2:this
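grep alone cannot reproduce the "last abc line before an efg line" logic of findMatch.txt. A rough awk equivalent, sketched here as an assumption about what the sed script is meant to do, gets the file name and line number from the built-in FILENAME and FNR variables. The function name find_matches is made up for illustration.

```shell
find_matches() {   # usage: find_matches test*.txt
    awk '
    FNR == 1 { held = "" }                    # forget state at the start of each file
    /abc/    { held = $0; line = FNR }        # remember the latest abc line
    /efg/ && held != "" {                     # first efg after it: report and reset
        print FILENAME, "ln" line, held
        held = ""
    }' "$@"
}
```

Passing the glob directly avoids the ls | xargs step, and FNR restarts at 1 for every file, so the line numbers are per file.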

grep/awk each line from a file from another file from first column

I have a file (file_1) that I would like to read through and grep each line from another file (file_2), but it should only match from the first column from this file.
file_1
1
2
78
GL.1234
22
file_2
#blahblah hello this is some file
1 this is still some file 345
1 also still a 12 file
78 blah blah blah
22 oh my gosh, still a file!
GL.1234 hey guys, it's me. just being a file
2 i think that's it.
output
1 this is still some file 345
1 also still a 12 file
2 i think that's it.
22 oh my gosh, still a file!
78 blah blah blah
GL.1234 hey guys, it's me. just being a file
I have tried:
cat file_1.txt | while read line; do awk -v line = $line '{if ($1 == line) print $0;}' < file_2.txt > output.txt; done
and
cat file_1.txt | while read line; do grep -E '$line\b' < file_2.txt > output.txt; done
Looking at your script it seems it can all be done in a single awk:
awk 'NR==FNR{seen[$1]; next} $1 in seen' file1 file2
Output:
1 this is still some file 345
1 also still a 12 file
78 blah blah blah
22 oh my gosh, still a file!
GL.1234 hey guys, it's me. just being a file
2 i think that's it.
Basically, we first read through file1 and store its first column as keys in the associative array seen. Then, for each line of file2, we check whether its first column exists in this array and, if so, print the record.
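For comparison, the loop from the question can also be made to work. This is only a sketch: the original attempts fail because -v does not accept spaces around =, because > output.txt inside the loop truncates the file on every iteration, and because '$line\b' in single quotes is never expanded by the shell. The function name filter_by_keys is hypothetical.

```shell
filter_by_keys() {   # usage: filter_by_keys file_1.txt file_2.txt > output.txt
    while IFS= read -r key; do
        # -v takes name=value with no spaces around "="; quote the shell variable.
        # Appending "" to $1 forces a string comparison rather than a numeric one.
        awk -v key="$key" '($1 "") == key' "$2"
    done < "$1"
}
```

This version prints matches in the order of the keys in the first file, but it rereads the second file once per key, so the single awk command is the better choice for large inputs.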

Bash & Printf: How can I both right pad and truncate?

In Bash ...
I know how to right pad with printf
printf "%-10s" "potato"
I know how to truncate with printf
printf "%.10s" "potatos are my best friends"
How can I do both at the same time?
LIST="aaa bbbbb ccc ddddd"
for ITEM in $LIST; do
printf "%-.4s blah" $ITEM
done
This prints
aaa blah
bbbbb blah
ccc blah
ddddd blah
I want it to print
aaa  blah
bbbb blah
ccc  blah
dddd blah
I'd rather not do something like this (unless there's no other option):
LIST="aaa bbbbb ccc ddddd"
for ITEM in $LIST; do
printf "%-4s blah" $(printf "%.4s" "$ITEM")
done
though, obviously, that works (it feels ugly and hackish).
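You can use printf "%-4.4s" to get both kinds of formatting at once: the 4 before the dot is the minimum field width (padding), and the 4 after it is the precision (truncation):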
for ITEM in $LIST; do printf "%-4.4s blah\n" "$ITEM"; done
aaa  blah
bbbb blah
ccc  blah
dddd blah
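If the width is not known in advance, Bash's printf also accepts * for the field width and the precision, taking both from the argument list. A small sketch (the helper name pad_trunc is made up for illustration):

```shell
pad_trunc() {   # usage: pad_trunc WIDTH STRING
    # The first * is the field width (padding), the second the precision (truncation).
    printf '%-*.*s' "$1" "$1" "$2"
}

for ITEM in aaa bbbbb ccc ddddd; do
    printf '%s blah\n' "$(pad_trunc 4 "$ITEM")"
done
```

Command substitution strips only trailing newlines, so the padding spaces survive.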
