sed: mass converting epochs amongst random other text - bash

Centos / Linux
Bash
I have a log file with lots of text in it and epoch numbers all over the place. I want to convert every epoch, wherever it is, into a readable date/time.
I've been wanting to do this via sed, as that seems the tool for the job. I can't seem to get the replacement part of sed to actually pass the matched value (the epoch) to the command for conversion.
Sample of what I'm working with...
echo "Some stuff 1346474454 And not working" \
| sed 's/1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/'"`bpdbm -ctime \&`"'/g'
Some stuff 0 = Thu Jan 1 01:00:00 1970 And not working
The bpdbm part converts a supplied epoch into a readable date, like this:
bpdbm -ctime 1346474454
1346474454 = Sat Sep 1 05:40:54 2012
So how do I get the "found" item passed into a command? I don't seem to be able to get it to work.
Any help would be lovely. If there is another way, that would be cool... but I suspect sed will be quickest.
Thanks for your time!

that seems the tool for the job
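No, it is not. In standard sed, & can only be inserted into the replacement as text; there is no way to pass it as an argument to a command. You need something more powerful, e.g. Perl, which can evaluate code in the replacement (/e) and convert every epoch on a line (/g):
perl -pe 's/\b(1[0-9]{9})\b/localtime($1)/ge'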
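If neither Perl nor GNU sed is at hand, a plain shell loop can do the same job. This is a minimal sketch, assuming GNU date (for -d @SECONDS) and a grep that supports -o; convert_epochs is a hypothetical name. It starts one date process per epoch, so it will be noticeably slower than the Perl one-liner on a large log.

```sh
#!/bin/sh
# convert_epochs (hypothetical helper): copy stdin to stdout, replacing every
# 10-digit number that starts with 1 (an epoch between 2001 and 2033)
# with a readable date. Assumes GNU date (-d @SECONDS) and grep -o.
convert_epochs() {
    while IFS= read -r line || [ -n "$line" ]; do
        while :; do
            # Find the first remaining epoch on this line, if any.
            epoch=$(printf '%s\n' "$line" | grep -o '1[0-9]\{9\}' | head -n 1)
            [ -n "$epoch" ] || break
            readable=$(date -d "@$epoch" '+%a %b %e %H:%M:%S %Y')
            # Rebuild the line: text before the epoch + date + text after it.
            line=${line%%"$epoch"*}$readable${line#*"$epoch"}
        done
        printf '%s\n' "$line"
    done
}

# Usage: convert_epochs < logfile
```

The replacement text must not itself contain a 10-digit number, otherwise the inner loop would never finish; that is why the bpdbm output (which echoes the epoch back) is not a drop-in substitute here.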

You can do it with GNU sed. Given the input file:
infile
Some stuff 1346474454 And not working
GNU sed supports the e flag of the s command: when a substitution is made, the resulting pattern space is executed as a shell command and replaced by that command's output. One way to take advantage of this with bpdbm:
sed -E 's/(.*)(1[0-9]{9})(.*)/echo \1 $(bpdbm -ctime \2) \3/e' infile
Or with coreutils date:
sed -E 's/(.*)(1[0-9]{9})(.*)/echo \1 $(date -d @\2) \3/e' infile
output with date
Some stuff Sat Sep 1 06:40:54 CEST 2012 And not working
To get the same output as with bpdbm:
sed -E 's/(.*)(1[0-9]{9})(.*)/echo "\1$(date -d @\2 +\"%a %b %_d %T %Y\")\3"/e' infile
output
Some stuff Sat Sep 1 06:40:54 2012 And not working
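Note that this only replaces the last epoch found on a line, because the leading (.*) is greedy; re-run the command if there are more. Also keep in mind that the whole line is executed by the shell, so quotes, $, ; or backticks in the log text will be interpreted.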

Related

sed regex capturing group outputting whole input

I'm trying to use sed to get the output 09 Aug 2017 14:15:11 from a string that looks like this: 09/Aug/2017:14:15:11
When I use the following code
sed 's/^\(\d+\)\/\(\w+\)\/\(\d+\)\:\(.*\)$/\1 /p' <(echo "09/Aug/2017:14:15:11")
I get the whole input string as output:
09/Aug/2017:14:15:11
I'm doing this in order to run date -d on the result, since date -d 09/Aug/2017:14:15:11 +%s gives me this error: date: invalid date ‘09/Aug/2017:14:15:11’.
If you have a suggestion other than using sed, don't hesitate to make it.
Thanks!
With sed:
$ echo "09/Aug/2017:14:15:11" | sed -e 's#/# #g' -e 's/:/ /'
09 Aug 2017 14:15:11
We use two search-and-replace commands here, one running after the other. The first replaces all slashes with spaces (notice the global flag, g), and the second replaces only the first colon with a space (notice the absence of the g flag). Both are s commands, but the first one uses # as the separator instead of the standard /, so the slash we are searching for does not need to be escaped.
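Since the end goal in the question is date -d ... +%s, the same two substitutions can feed date directly. A minimal sketch, assuming GNU date (which parses "09 Aug 2017 14:15:11"); log_to_epoch is a hypothetical name, and the time is interpreted in the local time zone unless TZ is set:

```sh
#!/bin/sh
# log_to_epoch (hypothetical helper): turn "09/Aug/2017:14:15:11" into
# seconds since the epoch. Assumes GNU date.
log_to_epoch() {
    # Every slash -> space, then the first colon -> space.
    readable=$(printf '%s\n' "$1" | sed -e 's#/# #g' -e 's/:/ /')
    date -d "$readable" +%s
}

# Usage: log_to_epoch "09/Aug/2017:14:15:11"
```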
Alternatively, date can produce that format itself (as written this prints the current time; add -d to format another date):
date "+%d %h %Y %H:%M:%S"

What is the Exact Use and Meaning of "IFS=!"

I was trying to understand the usage of IFS but there is something I couldn't find any information about.
My example code:
#!/bin/sh
# (C) 2016 Ergin Bilgin
IFS=!
for LINE in $(last -a | sed '$ d')
do
echo $LINE | awk '{print $1}'
done
unset IFS
I use this code to print the last users line by line. I understand the usage of IFS: in this example, when I use the default IFS, it reads word by word inside my loop, and when I use IFS=! it reads line by line as I wish. The problem is that I couldn't find anything about that "!" anywhere. I don't remember where I learned it. When I google for ways to achieve the same behaviour, I see other values, which are usually strings.
So, what is the meaning of that "!" and how it gives me the result I wish?
Thanks
IFS=! merely sets IFS to a character that does not occur in the input, so the output of last is not split into words at all. Having said that, using a for loop here is not recommended; it is better to use read in a while loop like this to print the first column, i.e. the username:
last | sed '$ d' | while read -r u _; do
echo "$u"
done
As you are aware, if the output of last had a !, the script would split the input lines on that character.
The output format of last is not standardized (not in POSIX for instance), but you are unlikely to find a system where the first column contains anything but the name of whatever initiated an action. For instance, I see this:
tom pts/8 Wed Apr 27 04:25 still logged in michener.jexium-island.net
tom pts/0 Wed Apr 27 04:15 still logged in michener.jexium-island.net
reboot system boot Wed Apr 27 04:02 - 04:35 (00:33) 3.2.0-4-amd64
tom pts/0 Tue Apr 26 16:23 - down (04:56) michener.jexium-island.net
continuing to
reboot system boot Fri Apr 1 15:54 - 19:03 (03:09) 3.2.0-4-amd64
tom pts/0 Fri Apr 1 04:34 - down (00:54) michener.jexium-island.net
wtmp begins Fri Apr 1 04:34:26 2016
That is with Linux; other machines show different date formats, origination fields, etc.
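By setting IFS=!, the script sets the field separator to a character that is unlikely to occur in the output of last, so the output is not split at all. Note that newline is then no longer a separator either: the whole output is read into LINE in a single loop iteration, and the result only looks line-by-line because awk prints the first field of every line it receives. With the default IFS (space, tab and newline), the output would be split into individual words.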
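To see what the different IFS values do, the sketch below counts how many items an unquoted expansion produces. The two-line sample text and the count_fields function are made up for illustration; the point is that with IFS=! nothing is split (one item holding both lines), and only an IFS containing just a newline gives one item per line.

```sh
#!/bin/sh
# Hypothetical two-line sample standing in for the output of `last`.
sample='tom pts/8 Wed Apr 27
reboot system boot Wed'

# count_fields (hypothetical helper): print how many items an unquoted
# $sample expands to under the current value of IFS.
count_fields() {
    n=0
    for item in $sample; do
        n=$((n + 1))
    done
    echo "$n"
}

# Default IFS (space, tab, newline): split into individual words.
count_fields                  # prints 9

# IFS=! : '!' never occurs, so nothing is split; one item holds both lines.
( IFS='!'; count_fields )     # prints 1

# IFS set to just a newline: one item per line.
( IFS='
'; count_fields )             # prints 2
```

The subshells ( ... ) keep each IFS change local, so the script's own IFS is never modified.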
However, as you can see, the output of last normally uses spaces to separate columns, and it is fed into awk, which splits the line on spaces anyway. The script could be simplified in various ways, e.g.:
#!/bin/sh
for LINE in $(last -a | sed -e '$ d' -e 's/ .*//')
do
echo $LINE
done
which is (starting from the example in the question) adequate if the number of logins is not large enough to exceed your command-line. While checking for variations in last output, I noticed one machine with about 9800 lines from several years. (The other usual motivations given for not using for-loops are implausible in this instance). As a pipe:
#!/bin/sh
last -a | sed -e 's/ .*//' -e '/^$/d' | while IFS= read -r LINE
do
echo "$LINE"
done
I changed the sed expression (which OP likely copied from some place such as Bash - remove the last line from a file) because it does not work.
Finally, using the -a option of last is unnecessary, since all of the additional information it provides is discarded.

bash unit tests with dynamic content

I had to implement some new features in a very old awk script and now want to add some unit tests to check whether my script breaks things. I used diff to check whether the script output differs from the expected output:
awk -f mygenerator.awk test.1.gen | diff - test.1.out -q
if [ $? -ne 0 ]; then
echo "test failed"
fi
But now I have some files that generate dynamic content, such as a timestamp of the generation date, which causes diff to fail because the timestamp will obviously be different.
My first thought was to remove the corresponding lines with grep and compare the two "clean" files, then check with egrep whether each removed line is a timestamp.
Is there a better way to do this? It should all be done with common Unix tools in a bash script, for compatibility reasons.
You could use sed with regular expressions.
If your output is like Fri Feb 21 22:53:54 UTC 2014 from the date command, use:
regex_timestamp="s/[A-Z][a-z]{2} [A-Z][a-z]{2} [ 0-9]?[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Z]{3} [0-9]{4}//g"
awk -f mygenerator.awk test.1.gen | diff <(sed -r "$regex_timestamp" -) <(sed -r "$regex_timestamp" test.1.out) -q
If you're trying to filter a unix timestamp, simply use this as regex:
s/([0-9]{10})//g
Please note that the latter replaces any group of numbers the same size as a unix timestamp. What format is your timestamp?
I usually use sed to replace the timestamp with XXXXXX, so I can still compare the other information on the same line.
date | \
sed 's/\(Sun\|Mon\|Tue\|Wed\|Thu\|Fri\|Sat\) \(Jan\|Feb\|Mar\|Apr\|May\|Jun\|Jul\|Aug\|Sep\|Oct\|Nov\|Dec\) \?[0-9]\+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [A-Z]\+ [0-9]\{4\}/XXXXXX/'
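Combining the two ideas above (mask rather than delete, and compare both sides after the same filter), a small helper can be reused across all tests. A minimal sketch: mask_timestamps is a hypothetical name, the pattern assumes date-style stamps with a 3- or 4-letter zone name, and sed -E is assumed (it is available in GNU and BSD sed).

```sh
#!/bin/sh
# mask_timestamps (hypothetical helper): replace date-style timestamps such as
# "Fri Feb 21 22:53:54 UTC 2014" (or "Sat Sep  1 ..." with a space-padded day)
# by XXXXXX, so the rest of each line can still be compared.
mask_timestamps() {
    sed -E 's/[A-Z][a-z]{2} [A-Z][a-z]{2} [ 0-9]?[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Z]{3,4} [0-9]{4}/XXXXXX/g' "$@"
}

# Usage with the files from the question (bash, because of <(...)):
# awk -f mygenerator.awk test.1.gen | mask_timestamps | diff -q - <(mask_timestamps test.1.out)
```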

Convert a date in Shell

This is a hard one: how do I convert a date like
12-23-11 13:37
into something like this (the seconds should always be 00):
Fri Dec 23 13:18:58 CET 2011
?
With GNU date 5.97, you can do:
$ date -d '11-12-23 13:37'
to get what you want, so all you need to do is massage your input.
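The massaging can be done with the shell's own parameter expansion. A minimal sketch, assuming GNU date and input always in the MM-DD-YY HH:MM shape from the question; convert_date and its optional output-format argument are hypothetical:

```sh
#!/bin/sh
# convert_date (hypothetical helper): turn "MM-DD-YY HH:MM" into a full date.
# Assumes GNU date, which reads "YY-MM-DD HH:MM" (two-digit years 00-68 -> 20xx)
# and sets the seconds to 00 when they are not given.
# An optional second argument is passed to date as the output format (+FORMAT).
convert_date() {
    d=${1%% *}        # date part, e.g. "12-23-11"
    t=${1#* }         # time part, e.g. "13:37"
    mm=${d%%-*}       # "12"
    rest=${d#*-}      # "23-11"
    dd=${rest%%-*}    # "23"
    yy=${rest#*-}     # "11"
    date -d "$yy-$mm-$dd $t" ${2+"$2"}
}

# Usage: convert_date '12-23-11 13:37'
```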
Since GNU date is not ubiquitous, here's a quick Perl script that does what you want:
$ echo 12-23-11 13:37 |
perl -MTime::Local -wnE '
y/-:/ /;
@F = split;
say scalar localtime timelocal( 0, $F[4], $F[3], $F[1], $F[0] - 1,$F[2]);
'
Fri Dec 23 13:37:00 2011
(Requires perl 5.10 for -E and say, but should work in older perl using -e and print.)
If this is a script for yourself and not something that will have to run in a million different environments, then depending on what version of date you have available, you should be able to use it.
Read the man page for your particular version of date. For example, if it's the version documented at http://ss64.com/bash/date.html, you can use --date for the input string, etc.
On Mac OS X, use the -f option to specify the input format, the -j option so that it doesn't try to set the date, and a +FORMAT argument on the command line to specify the output format.

Trim text and add timestamp?

Basically, my output is the following:
<span id="PlayerCount">134,015 people currently online</span>
What I want is a way to trim it to show:
134,015 - 3:24:20AM - Oct 24
Can anyone help? Also note that the number may change, so is it possible to output everything between ">" and the "c" in "currently"? And add a timestamp somehow?
I'm using commands from the terminal in Linux, so that's called bash, right?
Do you perhaps mean something like:
$ echo '<span id="PlayerCount">134,015 people currently online</span>' | sed \
-e 's/^[^>]*>//' \
-e "s/currently.*$/$(date '+%r %b %d %Y')/"
which generates:
134,015 people 03:36:30 PM Oct 24 2011
The echo is just for the test data. The first sed command changes everything up to the first > character into nothing (i.e., deletes it).
The second one will change everything from the currently to the end of the line with the current date in your desired format (although I have added the year since I'm a bit of a stickler for detail).
The relevant arguments for date here are:
%r locale's 12-hour clock time (e.g., 11:11:04 PM)
%b locale's abbreviated month name (e.g., Jan)
%d day of month (e.g., 01)
%Y year
A full list of format specifiers can be obtained from the date man page (execute man date from a shell).
A small script which will give you the desired information from the page you mentioned in the comments is:
#!/usr/bin/bash
wget --output-document=- http://runescape.com/title.ws 2>/dev/null \
| grep PlayerCount \
| head -n 1 \
| sed 's/^[^>]*>//' \
| sed "s/currently.*$/$(date '+%r %b %d %Y')/"
Running this gives me:
pax$ ./online.sh
132,682 people 04:09:17 PM Oct 24 2011
In detail:
The wget bit pulls down the web page and writes it on standard output. The standard error (progress bar) is thrown away.
The grep extracts only lines with the word PlayerCount in them.
The head throws away all but the first of those.
The first sed strips up to the first > character.
The second sed changes the trailing text to the current date and time.
Quickhack(tm):
$ people=$(echo '<span id="PlayerCount">134,015 people currently online</span>' | \
sed -e 's/^.*>\(.*\) people.*$/\1/')
$ echo $people - $(date)
134,015 - Mon Oct 24 09:36:23 CEST 2011
produce_OUTPUT | grep -o '[0-9,]\+' | while read -r count; do
printf "%s - %s\n" "$count" "$(date +'%l:%M:%S %p - %b %e')"
done
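The question asked for exactly "134,015 - 3:24:20AM - Oct 24". A sketch that produces that shape, assuming GNU date (for the %-l and %-d no-padding flags and -d @SECONDS); player_count and its optional second argument (an epoch to format instead of the current time, handy for testing) are hypothetical:

```sh
#!/bin/sh
# player_count (hypothetical helper): print "COUNT - H:MM:SSAM - Mon D" for a
# line such as <span id="PlayerCount">134,015 people currently online</span>.
# $1: the HTML line; $2 (optional): epoch to format instead of "now".
player_count() {
    # Keep only the digits and commas between ">" and " people".
    count=$(printf '%s\n' "$1" | sed -n 's/.*>\([0-9,]*\) people.*/\1/p')
    # %-l and %-d drop the padding, so 3 AM prints as "3:24:20AM".
    if [ -n "$2" ]; then
        stamp=$(date -d "@$2" '+%-l:%M:%S%p - %b %-d')
    else
        stamp=$(date '+%-l:%M:%S%p - %b %-d')
    fi
    printf '%s - %s\n' "$count" "$stamp"
}

# Usage (page.html is a placeholder for the downloaded page):
# player_count "$(grep PlayerCount page.html | head -n 1)"
```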
