I'm running xinput test and trying to timestamp the data.
From another question, I'm using:
xinput test $KEYBOARD_ID | (echo -n $(date +"$date_format") && cat) > $LOGFILE_NAME
However, that dates the first line, not every line.
If I do a while loop:
while IFS= read -r line
do
echo -n $(date +"date_format") &&cat)
done < $(xinput test $KEYBOARD_ID)
The loop exits right away, since xinput test has yet to generate any text.
Process substitution fails as well, only dating the first line of the file.
while IFS= read -r line
do
(echo -n $(date +"$date_format") && cat) > $LOGFILE_NAME
done < <(xinput test $KEYBOARD_ID)
Writing to file and post-processing won't work, because I need the timestamp when each line was processed.
I feel like I'm making a small error, but I can't find it, any input?
The following GNU awk command is equivalent to @karakfa's answer, but launches fewer processes, so it could be faster if the device is generating a lot of events:
xinput test "$KEYBOARD_ID" | gawk '{print strftime(), $0}' > "$LOGFILE_NAME"
perhaps this will help...
$ seq 10 | xargs -n1 -I {} echo $(date) {}
Wed May 10 14:43:09 EDT 2017 1
Wed May 10 14:43:09 EDT 2017 2
Wed May 10 14:43:09 EDT 2017 3
Wed May 10 14:43:09 EDT 2017 4
Wed May 10 14:43:09 EDT 2017 5
Wed May 10 14:43:09 EDT 2017 6
Wed May 10 14:43:09 EDT 2017 7
Wed May 10 14:43:09 EDT 2017 8
Wed May 10 14:43:09 EDT 2017 9
Wed May 10 14:43:09 EDT 2017 10
Note that, as commented below, this timestamp won't be updated for each line, because $(date) is expanded once by the shell before xargs even starts. If you want to timestamp each new line, use the gawk solution by @user000001.
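If you do want a fresh timestamp per line with xargs, the date call has to run inside the child shell so it is re-evaluated for every line. A sketch (note that substituting {} into a shell command string like this is fragile if the input can contain shell metacharacters):
$ seq 10 | xargs -n1 -I {} sh -c 'echo "$(date)" {}'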
I feel like I'm making a small error, but I can't find it
Yep. It's the cat: it consumes the rest of the input and writes it all out after the single timestamp. Instead, you should just write the current line, and append it to the file:
while IFS= read -r line
do
(echo "$(date +"$date_format") $line") >> $LOGFILE_NAME
done < <(xinput test $KEYBOARD_ID)
Which can more canonically be written as
while IFS= read -r line
do
echo "$(date +"$date_format") $line"
done < <(xinput test $KEYBOARD_ID) > "$LOGFILE_NAME"
I would go for @user000001's shorter and more efficient solution though.
Related
I am trying to get the date "+%a %b %d %R:%S %Y" in bash.
here's the sample command and output
$ xscreensaver-command --time
XScreenSaver 5.32: screen non-blanked since Thu Oct 29 12:15:05 2015 (hacks: #184, #60)
I am trying to get the value Thu Oct 29 12:15:05 2015 from that string.
How can I achieve this?
Try appending this with GNU grep (the 2>&1 is there in case the message goes to stderr):
$ xscreensaver-command --time 2>&1 | grep -Po 'since \K.*(?= \()'
Output:
Thu Oct 29 12:15:05 2015
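If you then need to re-emit that string in the "+%a %b %d %R:%S %Y" format explicitly, GNU date can parse it back with -d (a sketch assuming GNU date; BSD date has no -d option like this):
$ date -d 'Thu Oct 29 12:15:05 2015' '+%a %b %d %R:%S %Y'
Thu Oct 29 12:15:05 2015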
I'm looking for a good way to tail on a live log file, and display number of lines with the same date/time.
Currently this is working:
tail -F /var/logs/request.log | [cut the date-time] | uniq -c
BUT the performance is not good enough: there is a delay of more than one minute, and the output arrives in bulks of a few lines at a time.
Any idea?
Your problem is most likely related to buffering in your system, not anything intrinsically wrong with your line of code. I was able to create a test scenario where I could reproduce it - then make it go away. I hope it will work for you too.
Here is my test scenario. First I write a short script that writes the time to a file every 100 ms (approx) - this is my "log file" that generates enough data that uniq -c should give me an interesting output every second:
#!/bin/ksh
while :
do
echo The time is `date` >> a.txt
sleep 0.1
done
(Note - I had to use ksh which has the ability to do a sub-second sleep)
In another window, I type
tail -f a.txt | uniq -c
Sure enough, you get the following output appearing every second:
9 The time is Thu Dec 12 21:01:05 EST 2013
10 The time is Thu Dec 12 21:01:06 EST 2013
10 The time is Thu Dec 12 21:01:07 EST 2013
9 The time is Thu Dec 12 21:01:08 EST 2013
10 The time is Thu Dec 12 21:01:09 EST 2013
9 The time is Thu Dec 12 21:01:10 EST 2013
10 The time is Thu Dec 12 21:01:11 EST 2013
10 The time is Thu Dec 12 21:01:12 EST 2013
etc. No delays. Important to note - I did not attempt to cut out the time. Next, I did
tail -f a.txt | cut -f7 -d' ' | uniq -c
And your problem reproduced - it would "hang" for quite a while (until there was 4k of characters in the buffer, and then it would vomit it all out at once).
A bit of searching online ( https://stackoverflow.com/a/16823549/1967396 ) told me about a utility called stdbuf. At that reference, it specifically mentions almost exactly your scenario, and they provide the following workaround (paraphrasing to match my scenario above):
tail -f a.txt | stdbuf -oL cut -f7 -d' ' | uniq -c
And that would be great… except that this utility doesn't exist on my machine (Mac OS) - it is specific to GNU coreutils. This left me unable to test - although it may be a good solution for you.
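If stdbuf is available on your system, applying it to the command from the question would look like this (keeping the question's placeholder for whatever extracts the date-time):
tail -F /var/logs/request.log | stdbuf -oL [cut the date-time] | uniq -c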
Never fear - I found the following workaround, based on the socat command (which I honestly barely understand, but I adapted from the answer given at https://unix.stackexchange.com/a/25377 ).
Make a small file called tailcut.sh (this is the "long_running_command" from the link above):
#!/bin/ksh
tail -f a.txt | cut -f7 -d' '
Give it execute permissions with chmod 755 tailcut.sh . Then issue the following command:
socat EXEC:./tailcut.sh,pty,ctty STDIO | uniq -c
And hey presto - your lumpy output is lumpy no more. The socat sends the output from the script straight to the next pipe, and uniq can do its thing.
You may try logtop (apt-get install logtop):
Usage:
tail -F /var/logs/request.log | [cut the date-time] | logtop
Example:
$ tail -f /var/log/varnish/varnishncsa.log | awk '{print $4}' | logtop
5585 elements in 10 seconds (558.50 elements/s)
1 690 69.00/s [28/Mar/2015:23:13:48
2 676 67.60/s [28/Mar/2015:23:13:47
3 620 62.00/s [28/Mar/2015:23:13:49
4 576 57.60/s [28/Mar/2015:23:13:53
5 541 54.10/s [28/Mar/2015:23:13:54
6 540 54.00/s [28/Mar/2015:23:13:55
7 511 51.10/s [28/Mar/2015:23:13:51
8 484 48.40/s [28/Mar/2015:23:13:52
9 468 46.80/s [28/Mar/2015:23:13:50
Columns are, from left to right:
row number
count (how many times the line was seen)
hits per second
the actual line
Consider how uniq -c works.
In order to print the count, it has to read all of the identical lines first; only once it reads a line that differs from the previous one can it print that previous line and its number of occurrences.
That's just how the algorithm fundamentally works, and there is no way around it.
You can test this by running
touch a
tail -F a | uniq -c
And then one after another
echo 1 >> a
echo 1 >> a
echo 1 >> a
nothing happens. Only after you run
echo 2 >> a
does uniq print that there were 3 "1\n" occurrences.
I get the following from spamdb, where the third field represents the time in seconds since the Epoch.
Cns# spamdb | fgrep TRAPPED
TRAPPED|113.163.117.129|1360836903
TRAPPED|113.171.216.201|1360837481
TRAPPED|122.177.159.61|1360844596
TRAPPED|36.231.9.231|1360865649
TRAPPED|37.146.207.209|1360832096
TRAPPED|212.156.98.210|1360837015
TRAPPED|59.99.160.62|1360839785
TRAPPED|86.127.116.162|1360840492
TRAPPED|92.83.139.194|1360843056
TRAPPED|219.71.12.150|1360844704
I want to sort this table by the time, and print the time field with date -r, such that it's presentable and clear when the event has occurred.
How do I do this in tcsh on OpenBSD?
Sorting with sort is easy, and so is editing with sed; but how do I make sed execute date -r or equivalent?
There are indeed a few obstacles here: first, you basically have to separate the data, and then one part of it is presented as-is, whereas another part has to be passed down to date -r for date formatting, prior to being presented to the user.
Another obstacle is making sure the output is aligned: apparently, it's quite difficult to handle the tab character in the shell, possibly only on the BSDs:
sed replace literal TAB
Replacing / with TAB using sed
Also, as we end up piping this to sh for execution, we have to use a different separator for the fields other than the pipe character, |.
So far, this is the best snippet I could come up with; it seems to work great in my tcsh:
Cns# spamdb | fgrep TRAPPED | sort -n -t '|' -k 3 | sed -E -e 's#\|#,#g' \
-e 's#^([A-Z]+),([0-9.]+),([0-9]+)$#"echo -n \2_"; "date -r \3"#g' | \
xargs -n1 sh -c | awk '{gsub("_","\t",$0); print;}'
37.146.207.209 Thu Feb 14 00:54:56 PST 2013
113.163.117.129 Thu Feb 14 02:15:03 PST 2013
212.156.98.210 Thu Feb 14 02:16:55 PST 2013
113.171.216.201 Thu Feb 14 02:24:41 PST 2013
59.99.160.62 Thu Feb 14 03:03:05 PST 2013
86.127.116.162 Thu Feb 14 03:14:52 PST 2013
92.83.139.194 Thu Feb 14 03:57:36 PST 2013
122.177.159.61 Thu Feb 14 04:23:16 PST 2013
219.71.12.150 Thu Feb 14 04:25:04 PST 2013
36.231.9.231 Thu Feb 14 10:14:09 PST 2013
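For comparison, here is a shorter sketch that skips the sh round-trip by letting awk invoke date -r itself (the command-into-getline idiom is standard awk; this assumes the same OpenBSD date -r behaviour as above):
spamdb | fgrep TRAPPED | sort -n -t '|' -k 3 | \
awk -F'|' '{cmd = "date -r " $3; cmd | getline d; close(cmd); print $2 "\t" d}'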
I have a following simple script for parsing out dates from irc logs (created by irssi)
#!/bin/bash
query=$1
grep -n $query logfile > matches.log
grep -n "Day changed" logfile >> matches.log
cat matches.log | sort -n
It produces output like:
--- Day changed Tue Jul 03 2012
--- Day changed Wed Jul 04 2012
--- Day changed Thu Jul 05 2012
16:54 <#Hamatti> who let the dogs out
--- Day changed Fri Jul 06 2012
--- Day changed Sat Jul 07 2012
--- Day changed Sun Jul 08 2012
12:11 <#Hamatti> dogs are fun
But since I'm only interested in finding out dates for actual matches, I'd like to filter out all those
--- Day changed XXX XXX dd dddd
lines that are not followed by a timestamp on the next line. So the example should output
--- Day changed Thu Jul 05 2012
16:54 <#Hamatti> who let the dogs out
--- Day changed Sun Jul 08 2012
12:11 <#Hamatti> dogs are fun
to get rid of all the noise that's not useful.
edit.
After the answer by T. Zelieke, I realised that I could make this more of a one-liner, so I now use the following to avoid iterating over logfile twice.
query=$1
egrep "$query|Day changed" logfile |grep -B1 "^[^-]" |sed '/^--$/d'
grep -B1 "^[^-]" data |sed '/^--$/d'
This uses grep to filter lines that do NOT start with a dash ("^[^-]"). -B1 asks to print the immediate line before a match.
Unfortunately, grep then separates each match (each pair of lines) with a -- line. Therefore I pipe the output through sed to get rid of those superfluous lines.
Here's one using awk.
awk -v query="$1" '/^--- Day changed/{day=$0;next} $0 ~ query {if (day!=p) {print day;p=day}; print}'
Every time it finds a "Day changed" line, it stores it in the variable day. Then when it finds a match to the query, it outputs the currently stored day line first. In case there are multiple matches in the same day, the variable p is used to determine if the day-line has been printed already.
I want to remove duplicate entries from a text file, e.g:
kavitha= Tue Feb 20 14:00 19 IST 2012 (duplicate entry)
sree=Tue Jan 20 14:05 19 IST 2012
divya = Tue Jan 20 14:20 19 IST 2012
anusha=Tue Jan 20 14:45 19 IST 2012
kavitha= Tue Feb 20 14:00 19 IST 2012 (duplicate entry)
Is there any possible way to remove the duplicate entries using a Bash script?
Desired output
kavitha= Tue Feb 20 14:00 19 IST 2012
sree=Tue Jan 20 14:05 19 IST 2012
divya = Tue Jan 20 14:20 19 IST 2012
anusha=Tue Jan 20 14:45 19 IST 2012
You can sort with the unique flag in one step:
$ sort -u input.txt
Or use awk:
$ awk '!a[$0]++' input.txt
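Spelled out, that awk idiom is equivalent to the following: it prints a line only the first time it is seen, keeping the original order.
awk '{ if (!seen[$0]) print; seen[$0] = 1 }' input.txt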
This sed one-liner deletes duplicate, consecutive lines from a file (it emulates "uniq", so for the sample input above, where the duplicates are not adjacent, you would have to sort first).
The first line in a set of duplicate lines is kept, the rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
Perl one-liner similar to @kev's awk solution:
perl -ne 'print if ! $a{$_}++' input
This variation removes trailing whitespace before comparing:
perl -lne 's/\s*$//; print if ! $a{$_}++' input
This variation edits the file in-place:
perl -i -ne 'print if ! $a{$_}++' input
This variation edits the file in-place, and makes a backup input.bak
perl -i.bak -ne 'print if ! $a{$_}++' input
This might work for you (it numbers the lines, unique-sorts on the content fields so only one copy of each entry survives, restores the original order numerically, then strips the line numbers and anything after the year):
cat -n file.txt |
sort -u -k2,7 |
sort -n |
sed 's/.*\t/ /;s/\([0-9]\{4\}\).*/\1/'
or this:
awk '{line=substr($0,1,match($0,/[0-9][0-9][0-9][0-9]/)+3);sub(/^/," ",line);if(!dup[line]++)print line}' file.txt