Split text file based on date tag / timestamp - bash

I have a big log file containing date tags. It looks like this:
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
[04/11/2015, 12:21]
foo
bar
[08/11/2015, 14:12]
bar
foo
[09/11/2015, 11:25]
...
[15/11/2015, 19:22]
...
[15/11/2015, 21:55]
...
and so on. I need to split this data into one file per day, like:
01.txt:
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
04.txt:
[04/11/2015, 12:21]
foo
bar
etc. How can I do that using any of the Unix tools?

I don't think there's a tool that will do it without a little programming, but with Awk the little programming really isn't all that hard.
script.awk
/^\[[0-3][0-9]\/[01][0-9]\/[12][0-9]{3},/ {
if ($1 != old_date)
{
if (outfile != "") close(outfile);
outfile = sprintf("%.2d.txt", ++filenum);
old_date = $1
}
}
{ print > outfile }
The first (bigger) block of code recognizes the date string, which is also in $1 (so the condition could be made more precise by referring to $1, but the benefit is minimal to non-existent). Inside the action, it checks whether the date differs from the last date it remembered. If so, it closes the current output file if one is open (close is part of POSIX awk), generates a new file name, and remembers the date it is now processing.
The second smaller block simply writes the current line to the current file.
Invocation
awk -f script.awk data
This assumes you have a file script.awk; you could instead pass the program text directly on the command line if you prefer. If the whole thing is encapsulated in a shell script, I'd embed the program as a string argument rather than use a second file, but I find a separate file convenient during development. (The shell script would contain awk '…the script…' "$@" with no separate file.)
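As a sketch of that inline form, assuming bash, here is the same awk program from script.awk wrapped in a shell function (split_by_day is a hypothetical name). Two small changes are my own assumptions: [12][0-9]{3} is spelled out as [12][0-9][0-9][0-9] for awks without interval expressions, and lines before the first date tag are skipped instead of being printed to an empty file name.

```shell
#!/usr/bin/env bash
# Sketch: the script.awk program embedded in a shell function.
# split_by_day is a hypothetical name; it writes 01.txt, 02.txt, ...
# (one file per run of lines sharing a date) into the current directory.
split_by_day() {
    awk '
    # A date tag such as "[01/11/2015, 02:19]"; $1 is "[01/11/2015,".
    /^\[[0-3][0-9]\/[01][0-9]\/[12][0-9][0-9][0-9],/ {
        if ($1 != old_date) {
            if (outfile != "") close(outfile)
            outfile = sprintf("%.2d.txt", ++filenum)
            old_date = $1
        }
    }
    # Skip anything before the first date tag (no output file yet).
    outfile != "" { print > outfile }
    ' "$@"
}

# Usage: split_by_day data
```

The "$@" passes every argument straight through to awk, so several input files (or none, for standard input) work the same way as with awk -f script.awk.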
Example output files
Given the sample data from the question, the output is in five files, 01.txt .. 05.txt.
$ for file in 0?.txt; do boxecho $file; cat $file; done
************
** 01.txt **
************
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
************
** 02.txt **
************
[04/11/2015, 12:21]
foo
bar
************
** 03.txt **
************
[08/11/2015, 14:12]
bar
foo
************
** 04.txt **
************
[09/11/2015, 11:25]
...
************
** 05.txt **
************
[15/11/2015, 19:22]
...
[15/11/2015, 21:55]
...
$
The boxecho command is a simple script that echoes its arguments in a box of stars:
echo "** $* **" | sed -e h -e s/./*/g -e p -e x -e p -e x
Revised file name format
I wish to have the output as [day].txt or [day].[month].[year].txt, based on the date in the file. Is that possible?
Yes; it is possible and not particularly hard. The split function is one way of breaking up the value in $1. The regex specifies that opening square brackets, slashes and commas are the field separators. There are 5 sub-fields in the value in $1: an empty field before the [, the three numeric components separated by slashes, and an empty field after the ,. The array name, dmy, is a mnemonic for the order in which the components are stored (day, month and year in dmy[2], dmy[3] and dmy[4]).
/^\[[0-3][0-9]\/[01][0-9]\/[12][0-9]{3},/ {
if ($1 != old_date)
{
if (outfile != "") close(outfile)
n = split($1, dmy, "[/\[,]")
outfile = sprintf("%s.%s.%s.txt", dmy[4], dmy[3], dmy[2])
old_date = $1
}
}
{ print > outfile }
Permute the numbers 4, 3, 2 in the sprintf() statement to suit yourself. The given order is year, month, day, which has many merits: it follows the ISO 8601 ordering, and the files sort automatically into date order. I strongly counsel its use, but you may do as you wish. For the sample data shown in the question, the files it generates are:
2015.11.01.txt
2015.11.04.txt
2015.11.08.txt
2015.11.09.txt
2015.11.15.txt
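To see concretely what split() hands back, here is a small sketch (assuming bash and a POSIX-style awk; show_dmy is a hypothetical helper) that feeds one date-tag line through the same kind of split and prints every element. The separator is written as "[/\\[,]" so that awk's string parser passes \[ through to the regex unchanged; it is intended to be the same character class as in the script above.

```shell
#!/usr/bin/env bash
# Sketch: print every element that split() produces for the $1 of a date tag.
# show_dmy is a hypothetical helper used only for illustration.
show_dmy() {
    printf '%s\n' "$1" | awk '{
        # Separators: slash, opening square bracket, comma.
        n = split($1, dmy, "[/\\[,]")
        for (i = 1; i <= n; i++)
            printf "dmy[%d]=\"%s\"\n", i, dmy[i]
    }'
}

# Usage: show_dmy '[01/11/2015, 02:19]'
```

For that input it prints five elements: dmy[1] and dmy[5] are empty, with 01, 11 and 2015 in between, which is why the file name is built from elements 4, 3 and 2.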

This is my idea: a sed command feeding an awk script.
$ cat biglog
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
[04/11/2015, 12:21]
foo
bar
aaa
bbb
[08/11/2015, 14:12]
bar
foo
$ cat sample.awk
#!/bin/awk -f
BEGIN {
FS = "\n"
RS = "\n\n"
}
{
date = substr($1, 2, 2)
filename = date ".txt"
for (i = 2; i <= NF; i++) {
print $i >> filename
}
}
How to use
sed -e 's/^\(\[[0-9][0-9]\)/\n\1/' biglog | sed -e 1d | ./sample.awk
Confirmation
$ ls *.txt
01.txt 04.txt 08.txt
$ cat 01.txt
foo
bar
$ cat 04.txt
foo
bar
aaa
bbb
$ cat 08.txt
bar
foo
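The pipeline above leans on GNU extensions: \n in the replacement part of sed's s command, and a multi-character RS, which gawk treats as a regex but POSIX awk does not guarantee. As a sketch, assuming the goal is unchanged (drop the date tags, append each entry's body to DD.txt), a single awk pass can do the same without either extension; split_bodies_by_day is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: one awk pass instead of sed | sed | sample.awk, using only POSIX awk.
# split_bodies_by_day is a hypothetical name. Like sample.awk it drops the date
# tags and appends each entry body to DD.txt (day of month only).
split_bodies_by_day() {
    awk '
    # Date tag such as "[01/11/2015, 02:19]": characters 2-3 are the day.
    /^\[[0-9][0-9]\// {
        if (file != "") close(file)
        file = substr($0, 2, 2) ".txt"
        next
    }
    # Body lines go to the current day file; >> so a reopened file is appended to.
    file != "" { print >> file }
    ' "$@"
}

# Usage: split_bodies_by_day biglog
```

Like sample.awk, it appends with >>, so running it twice duplicates the content unless the old DD.txt files are removed first. Closing the file at each tag keeps only one file open at a time, and reopening with >> is what lets two entries for the same day land in the same file.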

Yet another awk:
$ awk -F'[[/,]' -v d='.' '/^[\[0-9\/, :\]]+$/{f=$4 d $3 d $2 d "txt"}
{print $0 > f}' file
$ ls 20*
2015.11.01.txt 2015.11.04.txt 2015.11.08.txt 2015.11.09.txt 2015.11.15.txt
$ cat 2015.11.01.txt
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar

Related

How to find the next occurrence of a string in a file starting from $x line number.

In the following example, I have a variable $line set to "14" which relates to the 14th line of the file which is "four".
I need to find a way in Bash to start at whichever line number is set in $line (14) and find the line number of the next occurrence of a string (foo) in the same file. In this case, the result would be line 16.
one
foo
bar
foo
two
foo
three
foo
foo
bar
foo
foo
foo
four
bar
foo
bar
five
foo
six
foo
foo
bar
foo
line=14
search="foo"
Lots of ways to do this. But if you want to do it in bash alone (i.e. no awk, perl, etc), here's one way:
$ mapfile -t -O 1 a < ttt
$ for ((i=$line; i<=${#a[@]}; i++)); do [[ "${a[$i]}" = "$search" ]] && echo "$i" && break; done
The mapfile command reads your file into an array (which performs better than read in a loop), and -O 1 makes it start numbering array indices at 1 instead of 0, so each index matches its line number. The for loop steps through the array starting at $line, with a [[ test that compares the current array value with $search.
I'd still love to see what solution you came up with, and help you understand why it didn't work.
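If you want this as a reusable piece (bash 4 or later, since mapfile was added there), here is a sketch that wraps the same loop in a function; next_match_line is a hypothetical name. It makes the array local, which matters because mapfile does not clear an existing array first when -O is given, and it returns status 1 when nothing matches so callers can test for that.

```shell
#!/usr/bin/env bash
# Sketch (bash 4+): first line at or after line $1 whose whole text equals $2.
# next_match_line is a hypothetical name. Prints the line number and returns 0,
# or prints nothing and returns 1 when there is no such line.
next_match_line() {
    local start=$1 search=$2 file=$3 i
    local -a a                    # fresh array: mapfile -O does not clear it first
    mapfile -t -O 1 a < "$file"   # a[1] holds line 1, a[2] line 2, ...
    for ((i = start; i <= ${#a[@]}; i++)); do
        if [[ ${a[i]} == "$search" ]]; then
            printf '%s\n' "$i"
            return 0
        fi
    done
    return 1
}

# Usage: next_match_line "$line" "$search" ttt
```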
Try awk:
line=14
search=foo
awk '(NR > n && $0 == s){ print NR; exit }' n=$line s="$search" file
Output:
16
You can assign the result to a variable this way:
declare -i var=$(awk '(NR > n && $0 == s){ print NR; exit }' n=$line s="$search" file)
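One thing the one-liner leaves open is the not-found case: awk then simply prints nothing. A sketch (next_match_after is a hypothetical name) that keeps the same test, NR > n && $0 == s, but also sets awk's exit status so the caller can branch on it:

```shell
#!/usr/bin/env bash
# Sketch: the awk test from the answer, plus an exit status for "not found".
# next_match_after is a hypothetical name. Usage: next_match_after LINE STRING FILE
next_match_after() {
    awk -v n="$1" -v s="$2" '
    NR > n && $0 == s { print NR; found = 1; exit }
    # END still runs after exit; report 0 if found, 1 otherwise.
    END { exit !found }
    ' "$3"
}

# Example:
# if pos=$(next_match_after "$line" "$search" file); then
#     echo "next $search is on line $pos"
# else
#     echo "no $search after line $line"
# fi
```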
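A sed one-liner:
Using your data, naming it "infile", I get
$: line=14; search=foo; sed -n $line,\${/$search/\{=\;q}} infile
16
This sets the shell variables first, as separate commands (with an environment prefix such as line=14 search=foo sed ..., the variables would only reach sed's environment, while $line and $search on that same command line would be expanded from whatever values the shell already had), and tells sed not to output anything unasked with -n.
The command argument $line,\${/$search/\{=\;q}} expands to 14,${/foo/{=;q}}
That means: from line 14 to the end, find the next line that has "foo" in it, and then
print the line number (=) and
quit (q) without processing the rest of the file.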
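Here is a sketch of the same sed command wrapped in a function, using double quotes instead of backslash escapes (next_match_sed is a hypothetical name; GNU sed assumed). Two behaviours are worth knowing. First, the range starts at $line itself, so a match on that very line is reported, whereas the awk answer starts strictly after it. Second, $search is used as a regex that can match anywhere in the line, so foo also matches fooo, and a / in the search string would end the address early.

```shell
#!/usr/bin/env bash
# Sketch (GNU sed): same command as the one-liner, quoted with double quotes.
# next_match_sed is a hypothetical name. Usage: next_match_sed LINE REGEX FILE
next_match_sed() {
    local line=$1 search=$2 file=$3
    # For line=14, search=foo this passes sed the script 14,${/foo/{=;q;}}
    sed -n "${line},\${/${search}/{=;q;}}" "$file"
}
```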

Use `sed` to replace text in code block with output of command at the top of the code block

I have a markdown file that has snippets of code resembling the following example:
```
$ cat docs/code_sample.sh
#!/usr/bin/env bash
echo "Hello, world"
```
This means there's a file at the location docs/code_sample.sh, whose contents are:
#!/usr/bin/env bash
echo "Hello, world"
I'd like to parse the markdown file with sed (awk or perl works too) and replace the bottom section of the code snippet with whatever the above bash command evaluates to, for example whatever cat docs/code_sample.sh evaluates to.
Perl to the rescue!
perl -0777 -pe 's/(?<=```\n)^(\$ (.*)\n\n)(?^s:.*?)(?=```)/"$1".qx($2)/meg' < input > output
-0777 slurps the whole file into memory
-p prints the input after processing
s/PATTERN/REPLACEMENT/ works similarly to a substitution in sed
/g replaces globally, i.e. as many times as it can
/m makes ^ match start of each line instead of start of the whole input string
/e evaluates the replacement as code
(?<=```\n) means "preceded by three backquotes and a newline"
(?^s:.*?) changes the behaviour of . to match newlines as well, so it matches (frugally because of the *?) the rest of the preformatted block
(?=```) means "followed by three backquotes"
qx runs the parameter in a shell and returns its output
A sed-only solution is easier if you have the GNU version with an e command.
That said, here's a quick, simplistic, and kinda clumsy version I knocked out that doesn't bother to check the values of previous or following lines - it just assumes your format is good, and bulls through without any looping or anything else. Still, for my example code, it worked.
I started by making two files, a and b, and x, which is the markdown file.
$: cat a
#! /bin/bash
echo "Hello, World!"
$: cat b
#! /bin/bash
echo "SCREW YOU!!!!"
$: cat x
```
$ cat a
foo
bar
" b a z ! "
```
```
$ cat b
foo
bar
" b a z ! "
```
Then I wrote s which is the sed script.
$: cat s
#!/usr/bin/env bash
sed -En '
/^```$/,/^```$/ {
# for the lines starting with the $ prompt
/^[$] / {
# save the command to the hold space
x
# write the ``` header to the pattern space
s/.*/```/
# print the fabricated header
p
# swap the command back in
x
# the next line should be blank - add it to the current pattern space
N
# first print the line of code as-is with the (assumed) following blank line
p
# scrub the $ (prompt) off the command
s/^[$] //
# execute the command - store the output into the pattern space
e
# print the output
p
# put the markdown footer back
s/.*/```/
# and print that
p
}
# for the (to be discarded) existing lines of "content"
/^[^`$]/d
}
' "$@"
It does the job and might get you started.
$: s x
```
$ cat a
#! /bin/bash
echo "Hello, World!"
```
```
$ cat b
#! /bin/bash
echo "SCREW YOU!!!!"
```
Lots of caveats - better to actually check that the $ follows a line of backticks and is followed by a blank line, maybe make sure nothing bogus could be in the file to get executed... but this does what you asked, with (GNU) sed.
Good luck.
A rare case when use of getline would be appropriate:
$ cat tst.awk
state == "importing" {
print    # also keep the "$ cat FILE" line itself, before the imported content
while ( (getline line < $NF) > 0 ) {
print line
}
close($NF)
state = "imported"
}
$0 == "```" { state = (state ? "" : "importing") }
state != "imported" { print }
$ awk -f tst.awk file
See http://awk.freeshell.org/AllAboutGetline for getline uses and caveats.

Multiple "sed" actions on previous results

Have this input:
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
On the above I need to do 4 (sequential) actions:
select only lines containing "foo" (lowercase)
on the selected lines, remove everything but UPPERCASE letters
delete empty lines (if any are created by the previous action)
and on the remaining from the above - enclose every char with [x]
I'm able to solve the above, but need two sed invocations piped together. Script:
#!/bin/bash
data() {
cat <<EOF
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
EOF
}
echo "Result OK"
data | sed -n '/foo/s/[^A-Z]//gp' | sed '/^\s*$/d;s/./[&]/g'
# in the above it is solved using 2 sed invocations
# trying to solve it using only one invocation,
# but the following doesn't do what i need.. :( :(
echo "Variant 2 - trying to use only ONE invocation of sed"
data | sed -n '/foo/s/[^A-Z]//g;/^\s*$/d;s/./[&]/gp'
output from the above:
Result OK
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Variant 2 - trying to use only ONE invocation of sed
[A][B][C][D][E][F]
[B][A][R][ ][A][B][C]
[A][B][C][D][E][F]
Variant 2 should also output only
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Is it possible to solve the above using only one sed invocation?
sed -n '/foo/{s/[^A-Z]//g;/^$/d;s/./[&]/g;p;}' inputfile
Output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Alternative sed approach:
sed '/foo/!d;s/[^A-Z]//g;/./!d;s/./[&]/g' file
The output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
/foo/!d - deletes all lines that don't contain foo
/./!d - deletes all empty lines
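Why Variant 2 misbehaves: the /foo/ address only guards the first s command, so /^\s*$/d and s/./[&]/gp still run on every line, and BAR ABC gets bracketed and printed by the p flag. Both answers make the foo test govern the whole sequence, either with braces or with /foo/!d up front. Below is a sketch of the braces version as a reusable function, together with the question's data (foo_caps is a hypothetical name; running sed under LC_ALL=C is my own addition so that A-Z means exactly the 26 ASCII capitals regardless of locale).

```shell
#!/usr/bin/env bash
# The sample input from the question, one line per argument.
data() {
    printf '%s\n' 'bar foo' 'foo ABC/DEF' 'BAR ABC' 'ABC foo DEF' 'foo bar'
}

# Sketch: foo_caps is a hypothetical name. Everything inside the braces runs
# only on lines matching /foo/; LC_ALL=C pins A-Z to the ASCII capitals.
foo_caps() {
    LC_ALL=C sed -n '/foo/{s/[^A-Z]//g;/^$/d;s/./[&]/g;p;}'
}

# Usage: data | foo_caps
```

With the question's data, data | foo_caps prints [A][B][C][D][E][F] twice, and BAR ABC is never touched because it fails the /foo/ test before any command runs.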

sed to grep lines after specific line for further processing

I am working with a script which looks for file lines after a specific line and processes them to get data from them.
Let me illustrate with an example,
if file "sample.log" has lines like
qwerty asdf foo bar
foo
time: 1:00 PM
foo1 bar1
foo foo fooo copying file abc/def/ghi/foo.txt
bar bar1 bar2 copying file efg/qwe/bar.txt
foo
My script should search for contents after time: 1:00 PM. After finding those lines, it must look for lines matching the pattern "copying" and get the path specified in the line.
In this case, output written to another file should be
abc/def/ghi/foo.txt
efg/qwe/bar.txt
I tried this using the following command but I get an empty string as output. Please guide me with this.
sed -n '/^time: 1:00 PM/{/^(.*)copying file/s/^(.*)copying file //p}' ../../sample.log
If you're already in Tcl, you could code it in Tcl:
set fid [open "FILE" r]
set have_time false
while {[gets $fid line] != -1} {
if {$have_time && [regexp {copying file (.*)} $line -> filename]} {
puts $filename
} elseif {[string first "time:" $line] > -1} {
set have_time true
}
}
close $fid
If your file is quite huge, exec sed may be faster, but you'll have to see for yourself.
Note, if you're going to exec sed, keep in mind that inside Tcl, single quotes have no special meaning: use braces to quote the sed program.
exec sed -e {do stuff here} FILE
sed '/1:00 PM/,$ {/copying/s:.*file \(.*\):\1:p};d' FILE
This might work for you (GNU sed):
sed -ne '/1:00 PM/,$!b' -e 's/.*copying.* //w copy' file
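For the record, the attempt in the question prints nothing because the inner /^(.*)copying file/ address sits inside the block for /^time: 1:00 PM/, so it is only tested against the time line itself; in a basic regex the parentheses are also literal characters. A flag-based awk sketch of the same extraction, assuming the path is everything after the last "copying file " on the line (extract_copied_paths is a hypothetical name):

```shell
#!/usr/bin/env bash
# Sketch: print the path after "copying file " on lines that come after the
# first line starting with MARKER. extract_copied_paths is a hypothetical name.
# Usage: extract_copied_paths 'time: 1:00 PM' sample.log > paths.txt
extract_copied_paths() {
    awk -v marker="$1" '
    # Literal prefix test, so characters in the marker are not treated as regex.
    index($0, marker) == 1 { after = 1; next }
    # Greedy .* removes everything up to the last "copying file " on the line.
    after && /copying file / { sub(/.*copying file /, ""); print }
    ' "$2"
}
```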

How do I write one-liner script that inserts the contents of one file to another file?

Say I have file A, in the middle of which is a tag string "#INSERT_HERE#". I want to put the whole content of file B at that position in file A. I tried using a pipe to concatenate the contents, but I wonder if there is a more advanced one-line script to handle it.
$ cat file
one
two
#INSERT_HERE#
three
four
$ cat file_to_insert
foo bar
bar foo
$ awk '/#INSERT_HERE#/{while((getline line<"file_to_insert")>0){ print line }; close("file_to_insert"); next }1' file
one
two
foo bar
bar foo
three
four
while IFS= read -r line; do if [ "$line" = "#INSERT_HERE#" ]; then cat file_to_insert; else printf '%s\n' "$line"; fi; done < file
Use sed's r command:
$ cat foo
one
two
#INSERT_HERE#
three
four
$ cat bar
foo bar
bar foo
$ sed '/#INSERT_HERE#/{ r bar
> d
> }' foo
one
two
foo bar
bar foo
three
four
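Below is a sketch of the same sed r approach as a one-line function (insert_file_at_tag is a hypothetical name; GNU sed assumed). The script is split across several -e options because r takes the rest of its line as the file name, so it cannot be followed by ; d on the same line. Each -e starts a new line of the script.

```shell
#!/usr/bin/env bash
# Sketch (GNU sed): replace the line matching TAG with the contents of FILE.
# insert_file_at_tag is a hypothetical name. Usage: insert_file_at_tag TAG FILE TARGET
insert_file_at_tag() {
    local tag=$1 insert=$2 target=$3
    # r queues FILE for output at the end of the cycle; d drops the tag line.
    # r reads its file name up to the end of the line, hence the separate -e parts.
    sed -e "/$tag/{r $insert" -e 'd' -e '}' "$target"
}
```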
