I had to implement some new features in a very old awk script, and now I want to add some unit tests to check whether my changes break anything. I used diff to check whether the script output differs from the desired output:
awk -f mygenerator.awk test.1.gen | diff - test.1.out -q
if [ $? -ne 0 ]; then
echo "test failed"
fi
But now some files generate dynamic content, such as a timestamp of the generation date, which causes diff to fail because the timestamp will obviously be different.
My first thought was to remove the corresponding lines with grep and compare the two "cleaned" files, checking with egrep whether a line is a timestamp.
Is there a better way to do this? It should all be done with common Unix tools in a bash script, for compatibility reasons.
You could use sed with regular expressions.
If your output is like Fri Feb 21 22:53:54 UTC 2014 from the date command, use:
regex_timestamp='s/[A-Z][a-z]{2} [A-Z][a-z]{2} [ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Z]+ [0-9]{4}//g'
awk -f mygenerator.awk test.1.gen | diff <(sed -r "$regex_timestamp" -) <(sed -r "$regex_timestamp" test.1.out) -q
If you're trying to filter a unix timestamp, simply use this as regex:
s/([0-9]{10})//g
Please note that the latter replaces any group of numbers the same size as a unix timestamp. What format is your timestamp?
I usually use sed to replace the timestamp with XXXXXX, so I can still compare the other information on the same line.
date | \
sed 's/\(Sun\|Mon\|Tue\|Wed\|Thu\|Fri\|Sat\) \(Jan\|Feb\|Mar\|Apr\|May\|Jun\|Jul\|Aug\|Sep\|Oct\|Nov\|Dec\) \?[0-9]\+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [A-Z]\+ [0-9]\{4\}/XXXXXX/'
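Applied to the original test setup, a minimal sketch could look like this (the file names mygenerator.awk, test.1.gen and test.1.out come from the question; the exact regex is just an assumption for date(1)-style timestamps):

```shell
# Hypothetical helper: replace date(1)-style timestamps with a fixed
# placeholder so diff can still compare everything else on the line.
normalize() {
    sed 's/[A-Z][a-z]\{2\} [A-Z][a-z]\{2\} [ 0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [A-Z]\+ [0-9]\{4\}/XXXXXX/'
}

# In the test from the question this would become:
#   diff <(awk -f mygenerator.awk test.1.gen | normalize) <(normalize < test.1.out) -q

echo "generated on Fri Feb 21 22:53:54 UTC 2014 by awk" | normalize
```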
I feel like there are probably good ways to do this in bash, but I'm struggling to find direct explanations of the best tools to do something like the following:
Given an input of string data from git log
filter commits down to only those between a set of tags
then format each commit, pulling convention based snippets of data
output the formatted snippets
So far, I've found that:
set $(git tag -l | sort -V | tail -2)
currentVersion=$2
previousVersion=$1
will give me variables for relevant tags. I can then do this:
$(git log v9.5.3..) | {what now?}
to pipe all commits from the previous tag to current. But I'm not sure on the next step?
Will the commits coming from the pipe be considered an array?
If not, how do I differentiate each commit distinctly?
Should I run a function against the piped input data?
Am I thinking about this completely wrong?
If this were Javascript, I'd run a loop over what would assuredly be an array input, regex the snippets I want from the commit, then output a formatted string with the snippets, probably in a map method or something. But I'm not sure if this is how I should be thinking in Bash with pipes?
Expecting data for each commit like:
commit xxxxxxxxxx
Author: xxxx xxxx <xxx#xxx.xxx>
Date: Thu Jul 29 xx:xx:xx 2021 +0000
Subject of the commit
A multiline description of the commit, the contents of which are not
really relevant for what I need, but still useful for consideration.
{an issue id}
And right now I'd be looking to grab:
the commit hash
the author
the date
the subject
the issue id
Would appreciate any insight as to the normal way to do this sort of thing with bash, with pipes, etc. I'd love to get my head right with Bash and do it here, rather than retreat back to my comfort zone of JS. Thanks!
Alright, I spent some time and found a solution that works for me. I'm (again) very much not a bash script'er, so I'm still curious if there's better ways to do this, but this works:
PREVIOUS_VERSION=${1:-$(git tag | tail -n 2 | head -n 1)}
CURRENT_VERSION=$(git tag | tail -n 1)
URL="https://your-hosting-domain-here/q/"
echo "-----RELEASE: $CURRENT_VERSION-----"
echo ""
parse_commits() {
while read LINE
do
if grep -q "Author:" <<< "$LINE"; then
echo "$LINE"
read DATE_LINE; echo "$DATE_LINE"
read SUBJECT_LINE; echo "Subject: $SUBJECT_LINE"
fi
if grep -q "Change-Id:" <<< "$LINE"; then
CHANGE_ID=$(echo "$LINE" | awk '{print $NF}')
echo "$URL$CHANGE_ID"
echo ""
fi
done
}
git log $PREVIOUS_VERSION.. | strings | parse_commits
I'll explain for anyone curious as I was as to how this could be done:
PREVIOUS_VERSION=${1:-$(git tag | tail -n 2 | head -n 1)}
This is simply a means within Bash to assign a variable to the incoming argument, with a fallback if it's not defined.
git log $PREVIOUS_VERSION.. | strings | parse_commits
This uses git log to output all commits since the given version. We then pipe those commits through the strings command (a standalone utility, not a Bash builtin), which filters the stream down to clean printable lines, and then pipe that into our custom function.
while read LINE
This starts a while loop, using the Bash command "read" which is super useful for what I needed to do. Essentially, it reads one line from input, and assigns it to the given arg as a variable. So this reads a line, and assigns it to variable: LINE.
if grep -q "Author:" <<< "$LINE"; then
This is a conditional that uses grep, which searches its input for the given string. We don't have a file, we have a string in the variable $LINE, but Bash's here-string operator <<< feeds that string to the command's standard input. So this line runs the inner block if the given LINE contains the substring "Author:".
read DATE_LINE; echo "$DATE_LINE"
Once we've found our desired position after the Author: line (and echo'ed it), we simply read the next line, assign it to variable DATE_LINE and immediately echo that as well. We do the same for the subject line.
Up until now, we probably could have used simpler commands to achieve a similar result, but now we get to the tricky part (for me at least, not knowing much about Bash).
CHANGE_ID=$(echo "$LINE" | awk '{print $NF}')
After a similar conditional grep'ing for the substring Change-Id:, we snag the last word of the LINE by echo'ing it and piping that to awk (a standalone text-processing utility, not part of Bash), which was the best way I could find for grabbing a substring. awk has a special variable NF that equates to the number of fields (words) in the line. By using $NF we reference the last field, since for example the last of 5 fields would be $5. We set that to a variable, then echo it out with a given format (a URL in my case).
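A quick way to see $NF in action (the Change-Id value here is made up):

```shell
# NF is the number of whitespace-separated fields; $NF is the last one.
echo "Change-Id: I8f3a2b9c" | awk '{print $NF}'
# prints: I8f3a2b9c
```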
The ultimate output looks like this:
-----RELEASE: v9.6.0-----
Author: xxxxxx xxxx <xxxx#xxxx.com>
Date: Fri Jul 30 xx:xx:xx 2021 +0000
Subject: The latest commit subject since last tag
https://your-hosting-domain-here/q/xxxxxxxxxxxxxxx
Author: xxxxxx xxxx <xxxx#xxxx.com>
Date: Thu Jul 29 xx:xx:xx 2021 +0000
Subject: The second latest commit subject
https://your-hosting-domain-here/q/xxxxxxxxxxxxxxx
... (and so on)
Hope that was helpful to someone, and if not, to future me :)
I want to change a unix epoch timestamp into a normal date.
I'm trying:
sed < file.json -e 's/\([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/`date -r \1`/g'
Any hint?
Given the lack of information in your post I cannot give you a better answer than this, but it is possible to execute commands using sed!
You have different ways to do it:
- use sed's e instruction directly, followed by the command to be executed; if you do not pass a command to e, it will treat the content of the pattern buffer as an external command
- use a simple substitute command with sed and pipe the output to sh
Example 1:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/;e"
Example 2:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/" | sh
Test 1 and Test 2 (run with the Japanese locale LC_TIME=ja_JP.UTF-8) print the resolved date in the localized format.
Remarks:
I will let you adapt the date command accordingly to your system specifications
Since current epoch timestamps are 10 digits long, the sed command uses an open-ended length specifier of at least 8, rather than exactly 8.
Allan has a nice way to tackle dynamic arguments: write a script dynamically and pipe it to a shell! It works, but it tends to be a bit insecure, because you could pipe unintended shell commands to sh: if, for example, rm -f some-important-file appeared in the file along with the numbers, the sed pipeline wouldn't change that line, and it would be passed to sh along with the date commands. Obviously this is only a concern if you don't control the input, but mistakes can happen.
A similar method I much prefer is with xargs. It's a bit of a head trip for new users, but very powerful. The idea behind xargs is that it takes its input from its standard in, then adds it to the command comprised of its own non-option arguments and runs the command(s). For instance,
$ echo -e "/tmp\n/usr/lib" | xargs ls -d
/tmp /usr/lib
It's a trivial example of course, but you can see more exactly how this works by adding an echo:
echo -e "/tmp\n/usr/lib" | xargs echo ls -d
ls -d /tmp /usr/lib
The input to xargs becomes the additional arguments to the command specified in xargs's own arguments. Read that twice if necessary, or better yet, fiddle with this powerful tool, and the light bulb should come on.
Here's how I would approach what you're doing. Of course I'm not sure if this is actually a logical thing to do in your case, but given the detail you went into in your question, it's the best I can do.
$ cat dates.txt
Dates:
1517363346
I can run a command like this:
$ sed -ne '/^[0-9]\{8,\}$/ p' < dates.txt | xargs -I % -n 1 date -d @%
Tue Jan 30 19:49:06 CST 2018
Makes sense, because I used the command echo -e "Dates:\n$(date +%s)" > dates.txt to make the file a few minutes before I wrote this post! Let's go through it together and I'll break down what I'm doing here.
For one thing, I'm running sed with -n. This tells it not to print lines by default, which makes the script work even if not every line has an 8+ digit "date" in it. I also added anchors to the start (^) and end ($) of the regex so that a line must consist only of the appropriate digits (I realize this may not be perfect for you, but without understanding its input, I can't do better). These are important changes if your file is not entirely comprised of date strings. Additionally, I am matching at least 8 characters, as modern date strings are going to be more like 10 characters long. Finally, I added the p command to sed. This tells it to print the matching lines, which is necessary because I specifically said not to print the nonmatching ones.
The next bit is the xargs itself. The sed writes a date string out to xargs's standard input. I set only a few options for xargs. By default it will add the standard input to the end of the command, separated by a space. I didn't want a space, so I used -I to specify a replacement string. % doesn't have a special meaning; it's just a placeholder that gets replaced with the input. I used % because it's not a special character but is rarely used in commands. Finally, I added -n 1 to make sure only one input is used per execution of date. (xargs can also add many inputs together, as in my ls example above.)
The end result? Sed matches lines that consist, exclusively, of 8 or more numeric values, outputting the matching lines. The pipe then sends this output to xargs, which takes each line separately (-n 1) and, replacing the placeholder (-I %) with each match, then executes the date command.
This is a shell pattern I really like, and use every day, and with some clever tweaks, can be very powerful. I encourage anyone who uses linux shell to get to know xargs right away.
There is another option for GNU sed users. While the BSD land folks were pretty true to their old BSD unix roots, the GNU folks, who wrote their userspace from scratch, added many wonderful enhancements to the standards. GNU Sed can apparently run a subshell command for you and then do the replacement for you, which would be dramatically easier. Since you are using the bsd style date invocation, I'm going to assume you don't have gnu sed at your disposal.
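For readers who do have GNU sed, a minimal sketch of that enhancement is the e flag on the s command, which executes the resulting pattern space as a shell command (the epoch value is taken from the example above):

```shell
# GNU sed only: the trailing 'e' flag runs the substituted line as a command,
# so the matched epoch is handed straight to date. '&' is the whole match.
echo 1517363346 | sed 's/^[0-9]\{8,\}$/date -u -d @&/e'
```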
Using sed: tested with macOS only
There is a slight difference with the date command, which on macOS uses the flag -r (exclusive to BSD/macOS) instead of -d to read an epoch. Note that a command substitution such as $(date -r \1) is expanded by the shell before sed ever runs, so the capture group cannot be passed that way; generate the command with sed and pipe it to sh instead:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -r \1/" | sh
Results:
Thu May 28 05:14:38 JST 1970
I've enabled timestamps for my .bash_history by setting HISTTIMEFORMAT="%d.%m.%y %T " in my .bashrc. However, sometimes the order of the entries in .bash_history gets messed up, and I want to sort the file by timestamp. Unfortunately, the timestamp is not on the same line as the entry, but one line above it, like this:
#1512649029
a command
#1512649032
another command
#1512649039
a third command
So how can I sort the file by these "pairs" of lines? Furthermore, there are entries without timestamps, i.e. lines that have no #... line above them. I want those lines to gather at the top of the file. Thanks!
We can use a simple sed program to join lines:
/^$/d # skip blank lines
/^#/N # append next line to timestamp
/^#/!s/^/#0\n/ # command without timestamp - prefix with #0
s/#// # remove initial #
y/\n/ / # convert newline to space
and another to restore the timestamp comments:
s/(\S+) /#\1\n/
Putting that all together, we get
sort_history() {
    sed -e '/^$/d' -e '/^#/N' -e '/^#/!s/^/#0\n/' \
        -e 's/#//' -e 'y/\n/ /' \
    | sort -n | sed -e 's/\(\S\+\) /#\1\n/'
}
The function reads the history data on its standard input.
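A self-contained run of the same pipeline over sample data (GNU sed assumed; the here-doc content is made up, including one command with no timestamp):

```shell
# Join each '#<epoch>' line with the command that follows it, sort numerically,
# then restore the '#' and the newline; commands with no timestamp get #0.
sed -e '/^$/d' -e '/^#/N' -e '/^#/!s/^/#0\n/' \
    -e 's/#//' -e 'y/\n/ /' <<'EOF' | sort -n | sed -e 's/\(\S\+\) /#\1\n/'
orphan command
#1512649039
a third command
#1512649029
a command
EOF
```

The untimestamped command sorts first because it is assigned the dummy epoch 0.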
Disclaimer: This might not be the most elegant and simplest solution.
However the following bash shell script snippet worked for me:
#!/bin/bash
function BashHistoryJoinTimestampLines() {
COMMAND_WITHOUT_TIMESTAMP=TRUE
while read line; do
if [ "${line:0:1}" = "#" ] # This should be a timestamp line
then echo -ne "$line\t" # the -n option suppresses the line feed
COMMAND_WITHOUT_TIMESTAMP=FALSE
else if [ ${COMMAND_WITHOUT_TIMESTAMP} = TRUE ]
then echo -ne "#0\t"
fi
echo "$line"
COMMAND_WITHOUT_TIMESTAMP=TRUE
fi
done
}
#
# Example:
BashHistoryJoinTimestampLines < $HISTFILE | sort
In Unix/Linux pipeline text processing, the sort utility by default operates on records separated by line endings.
In order to use "sort" for this application, the timestamp lines first have to be joined with the history lines containing the commands. Lines not preceded by a timestamp get a dummy timestamp of #0 (January 1st, 1970) in this script. I've used the TAB character as the separator between timestamp and command in this script.
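The same join-then-sort idea can also be sketched in a single awk pass (the sample input is made up; a plain numeric prefix is used instead of #0 so that sort -n works directly):

```shell
# Remember the last '#<epoch>' line, glue its number (TAB-separated) onto the
# command that follows, default to 0 when there is none, then sort numerically.
awk '/^#[0-9]+$/ { ts = substr($0, 2); next }
     { print (ts == "" ? 0 : ts) "\t" $0; ts = "" }' <<'EOF' | sort -n
no timestamp here
#1512649039
a third command
#1512649029
a command
EOF
```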
For a long time I looked for a way to merge bash history (with timestamps), and nothing seemed acceptable.
That is... Merge the on-disk ".bash_history" with the in-memory shell 'history'. Preserving timestamp ordering, and command order within those timestamps.
Optionally removing unique commands (even if multi-line), and/or removing (cleaning out) simple and/or sensitive commands, according to defined perl RE's. Adjust to suit!
This is the result... https://antofthy.gitlab.io/software/history_merge.bash.txt
Enjoy.
I am trying to get rid of the dates - all of them, from 2015 to the present (2017).
I want to rename each foo_data_$date to just foo_data_*. I just need the file name, not all the individual dates.
I do not understand the regex for sed - I can do it in perl with perl -nle 'print /(foo_data_)\d+txt) but can't figure out how to do it with sed.
I want to do it in sed because I have been using sed -i flag and changing the file in place.
cat /tmp/foo | head | sed -e 's/foo_data_20*txt/foo_data_\*/g'
foo_data_20150901.txt
foo_data_20150902.txt
foo_data_20150906.txt
foo_data_20150907.txt
foo_data_20150908.txt
foo_data_20150909.txt
foo_data_20150912.txt
You can just run sed like this:
sed -e 's/foo_data_[0-9]*/foo_data_/g'
Now, to capture only the dates between 2015 and 2017, this will do it:
sed -e 's/foo_data_201\(5\|6\|7\)[0-9]*/foo_data_/g'
Then you will remove the dates from the file names in your file.
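For example, with one of the file names from the question's listing:

```shell
echo "foo_data_20150901.txt" | sed -e 's/foo_data_201\(5\|6\|7\)[0-9]*/foo_data_/g'
# prints: foo_data_.txt
```

Note that \| alternation inside a basic regular expression is a GNU sed extension.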
You don't need to mention foo_data:
sed -i 's/201[567][01][0-9][0-3][0-9]//'
Your command was wrong: /foo_data_20*txt/ matches a '2', then '0' zero or more times, then 'txt' (something like foo_data_2000000000000txt).
If you just want to rename the files, most Linux distros (assuming you're on Linux) have a rename utility that handles Perl regular expressions just fine:
pax> touch pax_100.txt ; touch pax_200.txt
pax> rename -n 's/_(\d)/_diablo_$1/' pax*
rename(pax_100.txt, pax_diablo_100.txt)
rename(pax_200.txt, pax_diablo_200.txt)
The -n option shows what would happen rather than doing the rename. Once you're satisfied, simply remove it.
Oh, and one final note. If you remove the dates from all those file names, they'll all have the same file name. Unless your file names are just test data, that's probably going to need some further thought on your part.
I've got a reasonably complicated string of piped shell commands (let's assume it's bunch | of | commands), which together produces several rows of output, in this format:
some_path/some_file.csv 1439934121
...where 1439934121 is the file's last-modified timestamp.
What I need to do is see if it's a timestamp on the current day, i.e. on or after last midnight, and then include just the lines where that is true.
I assume this means that some string (e.g. the word true) should either replace or be appended to the timestamps of those lines for grep to distinguish them from ones where the timestamps are those of an earlier date.
To put it in shell command terms:
bunch | of | commands | ????
...should produce:
some_path/some_file.csv true or some_path/some_file.csv 1439934121 true
...for which I could easily grep (obviously assuming that last midnight <= 1439934121 <= current time).
What kind of ???? would do this? I'm almost certain that awk can do what I need it to, so I've looked at it and date, but I'm basically doing awk-by-google with no skills and getting nowhere.
Don't feel constrained by my tool assumptions; if you can achieve this with alternate means, given the output of bunch | of | commands but still using shell tools and piping, I'm all ears. I'd like to avoid temp files or Perl, if possible :-)
I'm using gawk + bash 4.3 on Ubuntu Linux, specifically, and have no portability concerns.
Since date -d 'today 00:00:00' with the +%s format returns the unix timestamp of last midnight:
$ date -d'today 00:00:00'
Thu Sep 3 00:00:00 CEST 2015
$ date -d 'today 00:00:00' "+%s"
1441231200
You can probably pipe to an awk doing something like:
... | awk -v midnight="$(date -d 'today 00:00:00' '+%s')" '{$2= ($2>midnight) ? "true" : "false"}1'
That is, use the ternary operator to check the value of $2 and replace with either of the values true/false depending on the result:
awk -v midnight="$(date ...)" '{$2= ($2>midnight) ? "true" : "false"}1'
Test
$ cat a
hello 1441231201
bye 23
$ awk -v midnight="$(date -d 'today 00:00:00' '+%s')" '{$2= ($2>midnight) ? "true" : "false"}1' a
hello true
bye false
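Since awk can compare and filter in one step, the extra grep stage can be skipped entirely; a sketch with made-up input (GNU date assumed), printing only the file names from "today":

```shell
# Keep only lines whose second field is on or after last midnight,
# and print just the first field (the file name). The epochs are made up;
# 9999999999 is far in the future, so it always passes the test.
awk -v midnight="$(date -d 'today 00:00:00' '+%s')" \
    '$2 >= midnight { print $1 }' <<'EOF'
some_path/new_file.csv 9999999999
some_path/old_file.csv 23
EOF
# prints: some_path/new_file.csv
```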