How to speed up this log parser? - bash

I have a gigabytes-large log file in this format:
2016-02-26 08:06:45 Blah blah blah
I have a log parser which splits up the single file log into separate files according to date while trimming the date from the original line.
I do want some form of tee so that I can see how far along the process is.
The problem is that this method is mind-numbingly slow. Is there no way to do this quickly in bash? Or will I have to whip up a little C program to do it?
log_file=server.log
log_folder=logs
mkdir $log_folder 2> /dev/null
while read a; do
date=${a:0:10}
echo "${a:11}" | tee -a $log_folder/$date
done < <(cat $log_file)

read in bash is absurdly slow. You can make it faster, but you can probably get more speed up with awk:
#!/bin/bash
log_file=input
log_directory=${1-logs}
mkdir -p "$log_directory"
awk 'NF>1{d=l"/"$1; $1=""; print > d}' l="$log_directory" "$log_file"
If you really want to print to stdout as well, you can, but if that's going to a tty it is going to slow things down a lot. Just use:
awk '{d=l"/"$1; $1=""; print > d}1' l="$log_directory" "$log_file"
(Note the "1" after the closing brace.)
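As for making the bash loop itself faster: a minimal sketch, assuming every line really does start with a 10-character date. It drops the cat and the per-line tee and uses only builtins. The function name split_log and the demo file names are made up for illustration. This will likely still be slower than awk on a multi-gigabyte file.

```bash
#!/usr/bin/env bash
# Sketch only: split_log is a hypothetical helper, not from the answer above.
# Assumes every input line starts with a date in YYYY-MM-DD form.
split_log() {
    local log_file=$1 log_dir=$2 line
    mkdir -p "$log_dir"
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        # Characters 0-9 are the date; the rest of the line starts at offset 11.
        printf '%s\n' "${line:11}" >> "$log_dir/${line:0:10}"
    done < "$log_file"
}

# Demo on a throwaway two-line log.
demo_dir=$(mktemp -d)
printf '%s\n' '2016-02-26 08:06:45 Blah' '2016-02-27 09:00:00 Blah again' > "$demo_dir/server.log"
split_log "$demo_dir/server.log" "$demo_dir/logs"
cat "$demo_dir/logs/2016-02-26"
```

Most of the saving comes from not forking a tee process for every line; the redirection still reopens the output file each time, which is one reason awk tends to win.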

Try this awk solution. It should be pretty fast and it shows progress. Only one output file is kept open at a time. Lines that don't start with a date are written to the current date's file, so no lines are lost; a default initial date of "0000-00-00" is used in case the log starts with lines without dates.
Any timing comparison would be much appreciated.
dir=$1
if [[ -z $dir ]]; then
echo >&2 "Usage: $0 outdir <logfile"
echo >&2 "outdir: directory where output files are created"
echo >&2 "logfile: input on stdin to split into output files"
exit 1
fi
mkdir -p "$dir"
echo "output directory \"$dir\""
awk -v dir="$dir" '
BEGIN {
datepat="[0-9]{4}-[0-9]{2}-[0-9]{2}"
date="0000-00-00"
file=dir"/"date
}
date != $1 && $1 ~ datepat {
if(file) {
close(file)
print ""
}
print $1 ":"
date=$1
file=dir"/"date
}
{
if($1 ~ datepat)
line=substr($0,12)
else
line=$0
print line
print line >file
}
'
head -6 $dir/*
Sample input log:
first line without date
2016-02-26 08:06:45 0 Blah blah blah
2016-02-26 09:06:45 1 Blah blah blah
2016-02-27 07:06:45 2 Blah blah blah
2016-02-27 08:06:45 3 Blah blah blah
no date line
blank lines
another no date line
2016-02-28 07:06:45 4 Blah blah blah
2016-02-28 08:06:45 5 Blah blah blah
Output:
first line without date
2016-02-26:
08:06:45 0 Blah blah blah
09:06:45 1 Blah blah blah
2016-02-27:
07:06:45 2 Blah blah blah
08:06:45 3 Blah blah blah
no date line
blank lines
another no date line
2016-02-28:
07:06:45 4 Blah blah blah
08:06:45 5 Blah blah blah
==> tmpd/0000-00-00 <==
first line without date
==> tmpd/2016-02-26 <==
08:06:45 0 Blah blah blah
09:06:45 1 Blah blah blah
==> tmpd/2016-02-27 <==
07:06:45 2 Blah blah blah
08:06:45 3 Blah blah blah
no date line
blank lines
another no date line
==> tmpd/2016-02-28 <==
07:06:45 4 Blah blah blah
08:06:45 5 Blah blah blah

Related

Multiple lines added to vim line by line

Can you please help add multiple lines of txt to the file via bash script through vim?
I tried this:
vim -c "3 s/^/
add-this-line1
add-this-line2
add-this-line3/" -c "wq" /var/www/html/webserver/output_file.txt
But, the output of the file looks like this:
3 add-this-line1 add-this-line2 add-this-line3
What I want is to add the lines one per line, starting at line 3 of output_file.txt, not all next to each other on line 3.
This is more of a job for ed, IMO
seq 10 > file
ed file <<END_ED
3a
first
second
third
.
wq
END_ED
For those new to ed, the line with the dot signals the end of "insert mode".
file now contains:
1
2
3
first
second
third
4
5
6
7
8
9
10
If you really want to do it via vim, I believe you need to insert newlines in your substitution:
vim -c "3 s/^/add-this-line1\radd-this-line2\radd-this-line3\r/" -c "wq" /var/www/html/webserver/output_file.txt
With ex or ed if available/acceptable.
printf '%s\n' '3a' 'foo' 'bar' 'baz' 'more' . 'w output_file.txt' | ex -s input_file.txt
Replace ex with ed and it should be the same output.
Using a bash array to store the data that needs to be inserted.
to_be_inserted=(foo bar baz more)
printf '%s\n' '3a' "${to_be_inserted[@]}" . 'w output_file.txt' | ex -s input_file.txt
Again, changing ex to ed should do the same.
If the input file needs to be edited in place, remove output_file.txt and leave just the w.
Though it seems you want to insert at the beginning of each line, starting from line 3, as in "3 s/^/".
Given the file.txt that was created by running
printf '%s\n' {1..10} > file.txt
A bit of shell scripting would do the trick.
#!/usr/bin/env bash
start=3
to_be_inserted=(
foo
bar
baz
more
)
for i in "${to_be_inserted[@]}"; do
printf -v output '%ds/^/%s/' "$start" "$i"
ed_array+=("$output")
((start++))
done
printf '%s\n' "${ed_array[@]}" ,p Q | ed -s file.txt
Output
1
2
foo3
bar4
baz5
more6
7
8
9
10
Change Q to w if in-place editing is needed.
Remove the ,p if you don't want to see the output.

running a script to read three values then output the first value to a txt file if the 2nd and 3rd add up to equal above a set number

I have a file with values within it that I need to sift through for specific reference numbers that are over a certain value. The trouble is that this file is also full of a lot of junk info that I don't need.
The file looks something like this:
file 657657/78687686
blah blah blah
blah
blah 5 blah 8 value1 456456 value2 678678 blah 7
blah 2 blah 5 value1 9878787 value2 4544454 blah 2
blah 1 blah 8 value1 4584 value2 21231232 blah 5
blah blah
blah
file 657657/78687686
blah blah blah
blah
blah 5 blah 0 value1 871245 value2 555558 blah 7
blah 6 blah 7 value1 6666 value2 777877 blah 1
I want to feed that into a script and have it add the values and work out if the total value is above, say, 500000. If it is, then it sends the file number to a separate txt file and then moves on to the next file number, and so on.
I have no idea where to start with this, any help would be appreciated.
This is being run on an AIX box and in a .ksh
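One possible starting point is a small awk filter wrapped in a shell function. This is only a sketch under stated assumptions: the "total" is taken to be value1 + value2 on a single line, each file number is reported once per block, and the awk on the box accepts -v (POSIX awk does). The name over_limit and the demo file numbers are made up.

```bash
#!/bin/sh
# Sketch only: over_limit is a hypothetical helper. It reads the report on
# stdin and prints each file number that has at least one line where
# value1 + value2 exceeds the limit given as the first argument.
over_limit() {
    awk -v limit="$1" '
    $1 == "file" { current = $2; reported = 0; next }   # start of a new block
    {
        v1 = ""; v2 = ""
        # The number following the word "value1"/"value2" is the value.
        for (i = 1; i < NF; i++) {
            if ($i == "value1") v1 = $(i + 1)
            if ($i == "value2") v2 = $(i + 1)
        }
        if (v1 != "" && v2 != "" && !reported && v1 + v2 > limit) {
            print current
            reported = 1
        }
    }'
}

# Demo with made-up data: only the first block has a line above 500000.
sample='file 111/1
blah blah blah
blah 5 blah 8 value1 456456 value2 678678 blah 7
file 222/2
blah 6 blah 7 value1 6666 value2 777 blah 1'
printf '%s\n' "$sample" | over_limit 500000
```

To collect the results in a separate file, redirect the output, for example over_limit 500000 < input.txt > over_limit.txt (both file names are placeholders).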

Indesign CC script to apply paragraph styles to multiple paragraphs

I have an Indesign document with the following structure:
paragraph 1 blah blah blah blah blah blah blah blah
paragraph 2 blah blah blah blah blah blah blah blah
paragraph 3 blah blah blah blah blah blah blah blah
paragraph 4 blah blah blah blah blah blah blah blah
paragraph 5 blah blah blah blah blah blah blah blah
. . . and so on...
Now I need to leave the first paragraph as is but apply paragraph styles to all the subsequent paragraphs in the following pattern:
paragraph 2: style A
paragraph 3: style B
paragraph 4: style A
paragraph 5: style B
. . . and so on (alternating pattern)...
I know this can be automated using scripts and I also know a bit of programming in general (JavaScript) but I have no idea how to go about doing this in Indesign. Any suggestion?
Try this script, provided you have a text frame referenced by a variable myFrame:
// Start at index 1 so that the first paragraph is left as is.
for (var i = 1; i < myFrame.paragraphs.length; i++)
{
    if ( i%2 == 0 )
    {
        // Even index: paragraphs 3, 5, ...
        myFrame.parentStory.paragraphs[i].appliedParagraphStyle = app.activeDocument.paragraphStyles.item('Style B');
    }
    else
    {
        // Odd index: paragraphs 2, 4, ...
        myFrame.parentStory.paragraphs[i].appliedParagraphStyle = app.activeDocument.paragraphStyles.item('Style A');
    }
}
Save it as a script in scripts folder and run from the scripts panel. You will need to add frame referencing.

Bash command output changes when stored in a variable

When I run the command:
git lg --since="24 hours ago" | tail -1
I get the expected result:
* f71da17 - blah blah blah (12 hours ago)
However, when I store this output in a variable and echo it to the console:
last_commit=$(git lg --since="24 hours ago" | tail -1); echo $last_commit
I get the unexpected result of:
dir1/ dir2/ dir3/ file1 file2 file3 * f71da17 - blah blah blah (12 hours ago)
It prepends every file in the current directory to the output. Any insight as to what's going on would be much appreciated!
The * in the variable's value is being glob expanded because you didn't quote the expansion.
Use echo "$last_commit"
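A small self-contained demonstration of the difference, as a sketch: the scratch directory and the file names file1 and file2 are made up so that the glob has something known to match.

```bash
#!/usr/bin/env bash
# Scratch directory with two known files for the * to match.
scratch=$(mktemp -d)
touch "$scratch/file1" "$scratch/file2"

last_commit='* f71da17 - blah blah blah (12 hours ago)'

# Unquoted: the value is word-split and the bare * is replaced by file names.
unquoted=$(cd "$scratch" && echo $last_commit)
# Quoted: the value is passed to echo unchanged.
quoted=$(cd "$scratch" && echo "$last_commit")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The unquoted line starts with file1 file2 instead of the asterisk, which is the same effect as in the question.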

How to check if a line has the right syntax like this?

If I have the following line syntax:
FirstName, FamilyName, Address, PhoneNo
and I am reading a data file that contains such information, how can I check that each line I read has the right syntax?
UPDATE:
I mean a function that I pass each line to (from a while loop), and that returns 0 if the line is correct and 1 if it is not.
UPDATE 2:
The correct form is
first name(string), last name(string), address(string), phone no.(string)
so if a field is missing or there are more than 4 fields, it should return 1.
Using Bash.
Good Input is ::
Rami, Jarrar, Jenin - Wadi berqen, 111 111
# Some Cases To Deal With
, Jarrar, Jenin - Wadi berqen, 111 111
- Extra Spaces::
Rami, Jarrar, Jenin - Wadi berqen, 111 111
Rami, Jarrar, Jenin - Wadi berqen, 111 111, 213 3123
ALSO ANOTHER UPDATE :)
check(){
x=$(echo "$@" | grep -q '^[^,]\+,[^,]\+,[^,]\+,[^,]\+$')
return $x
}
len=#number of lines in the file
i=1
while [ $i -le $len ]; do
line=$(cat $file)
#------this is where i call the func-----
check $line
if [ $? -eq 1 ];then
echo "ERROR"
else
echo "Good Line"
fi
BASH 2.3.39
*GREP 2.5.3*
UPDATE
Now, if I change the correct format to this:
string, value, value, value
where value is a positive integer, what should this line be replaced with?
x=$(echo "$@" | grep -q '^[^,]\+,[^,]\+,[^,]\+,[^,]\+$')
Allows empty fields:
check () { echo "$@" | grep -q '^[^,]*,[^,]*,[^,]*,[^,]*$'; }
Does not allow any field to be empty:
check () { echo "$@" | grep -q '^[^,]\+,[^,]\+,[^,]\+,[^,]\+$'; }
Bourne shell without using external utilities (allows empty fields):
check () { local IFS=,; set -- $@; return $(test -n "$4" -a -z "$5"); }
Bash 3.2 or greater (allows empty fields):
check () { [[ $@ =~ ^[^,]*,[^,]*,[^,]*,[^,]*$ ]]; }
Bash 3.2 or greater (does not allow empty fields):
check () { [[ $@ =~ ^[^,]+,[^,]+,[^,]+,[^,]+$ ]]; }
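For the last update in the question (string, value, value, value with positive integers), a possible variant, as a sketch only. It assumes "positive integer" means digits with no sign and no leading zero, and that spaces after a comma are optional. The name check_ints is made up.

```bash
#!/usr/bin/env bash
# Sketch only: check_ints is a hypothetical name.
# Accepts: a non-empty string field followed by three positive integers.
check_ints() {
    local int='[1-9][0-9]*'                        # no sign, no leading zero
    local re="^[^,]+, *$int, *$int, *$int\$"
    [[ $1 =~ $re ]]
}

for line in 'abc, 1, 22, 333' 'abc, 1, x, 3' 'abc, 0, 1, 2'; do
    if check_ints "$line"; then
        echo "ok:  $line"
    else
        echo "bad: $line"
    fi
done
```

Keeping the pattern in a variable and expanding it unquoted on the right of =~ avoids the quoting differences between bash versions.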
is_correct () {
grep -q '^[^ ][^,]\+, [^ ][^,]\+, [^ ][^,]\+, [^ ][^,]\+$' <<< "$@"
}
l=0
while read line ; do
is_correct "$line" && echo line $l ok || echo Invalid syntax on line $l
((l+=1))
done <<<"Rami, Jarrar, Jenin - Wadi berqen, 111 111
, Jarrar, Jenin - Wadi berqen, 111 111
- Extra Spaces::
Rami, Jarrar, Jenin - Wadi berqen, 111 111
Rami, Jarrar, Jenin - Wadi berqen, 111 111, 213 3123
A line, containg fields with, many spaces, but otherwise valid
a, b, c, d
aa, bb, cc, dd"
Yields:
line 0 ok
Invalid syntax on line 1
Invalid syntax on line 2
Invalid syntax on line 3
Invalid syntax on line 4
line 5 ok
Invalid syntax on line 6
line 7 ok
This correctly throws out the malformed sample lines, including the "too many spaces" case. The only place where it fails is when a field has only one character in it (line 6 above).
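The single-character case could probably be handled by requiring one non-space, non-comma character followed by zero or more non-comma characters per field. A sketch using bash's own regex matching instead of grep; the name is_correct_v2 is made up, and it assumes exactly one space after each comma is the wanted format.

```bash
#!/usr/bin/env bash
# Sketch only: is_correct_v2 is a hypothetical name.
# Each field: first character is neither space nor comma, then any non-commas.
is_correct_v2() {
    local field='[^ ,][^,]*'
    local re="^$field, $field, $field, $field\$"
    [[ $1 =~ $re ]]
}

for line in 'a, b, c, d' ', Jarrar, Jenin - Wadi berqen, 111 111'; do
    if is_correct_v2 "$line"; then
        echo "ok:      $line"
    else
        echo "invalid: $line"
    fi
done
```

Using [^ ,] for the first character also stops a comma from being accepted as the start of a field, which [^ ] alone would allow.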
