Sort `.bash_history` by timestamp - bash

I've enabled timestamps in my .bash_history by setting HISTTIMEFORMAT="%d.%m.%y %T " in .bashrc. However, sometimes the order of the entries in .bash_history is messed up, and I want to sort that file by timestamp. Unfortunately, the timestamp is not on the same line as the entry, but one line above, like this:
#1512649029
a command
#1512649032
another command
#1512649039
a third command
So how can I sort the file by these "pairs" of lines? Furthermore, there are entries that have no timestamp, i.e. lines that have no #... line above them. I want these lines gathered at the top of the file. Thanks!

We can use a simple sed program to join lines:
/^$/d # skip blank lines
/^#/N # append next line to timestamp
/^#/!s/^/#0\n/ # command without timestamp - prefix with #0
s/#// # remove initial #
y/\n/ / # convert newline to space
and another to restore the timestamp comments:
s/\(\S\+\) /#\1\n/
Putting that all together, we get
# Reads history text on standard input, writes the sorted pairs to standard output
sort_history() {
    sed -e '/^$/d' -e '/^#/N' -e '/^#/!s/^/#0\n/' \
        -e 's/#//' -e 'y/\n/ /' \
    | sort -n | sed -e 's/\(\S\+\) /#\1\n/'
}
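A quick way to check the pipeline is to run it on a few unsorted pairs. The sketch below restates it as a function that reads standard input so that the snippet runs on its own; the name sort_history_stdin and the sample entries are made up for illustration, and the use of \S, \+ and \n in the replacement assumes GNU sed.

```bash
# Hypothetical stand-alone restatement of the pipeline; reads history text on stdin.
sort_history_stdin() {
    sed -e '/^$/d' -e '/^#/N' -e '/^#/!s/^/#0\n/' \
        -e 's/#//' -e 'y/\n/ /' \
    | sort -n | sed -e 's/\(\S\+\) /#\1\n/'
}

# Sample input: one entry without a timestamp, then two pairs out of order.
sorted=$(printf '%s\n' \
    'orphan command' \
    '#1512649039' 'a third command' \
    '#1512649029' 'a command' | sort_history_stdin)

printf '%s\n' "$sorted"
```

Note that the entry without a timestamp comes back with a dummy #0 line above it, so it sorts to the top but is no longer timestamp-free.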

Disclaimer: This might not be the most elegant and simplest solution.
However the following bash shell script snippet worked for me:
#!/bin/bash
function BashHistoryJoinTimestampLines() {
COMMAND_WITHOUT_TIMESTAMP=TRUE
while IFS= read -r line; do
if [ "${line:0:1}" = "#" ] # This should be a timestamp line
then printf '%s\t' "$line" # printf adds no line feed of its own
COMMAND_WITHOUT_TIMESTAMP=FALSE
else if [ ${COMMAND_WITHOUT_TIMESTAMP} = TRUE ]
then printf '#0\t'
fi
printf '%s\n' "$line"
COMMAND_WITHOUT_TIMESTAMP=TRUE
fi
done
}
#
# Example:
BashHistoryJoinTimestampLines < "$HISTFILE" | sort
By default, the sort utility operates on records separated by line endings, so it cannot sort two-line entries directly.
To use sort here, each timestamp line first has to be joined with the history line containing its command. Lines not preceded by a timestamp get a dummy timestamp of #0 (January 1st 1970) in this script. I've used the TAB character as the separator between timestamp and command.
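The join step alone leaves each entry on one TAB-separated line, so the two-line layout still has to be restored after sorting. Here is a minimal sketch of the full round trip using awk instead of a shell loop. The function name sort_history_pairs is hypothetical, sort -s (stable sort) is assumed to be available as it is in GNU and BSD sort, and entries without a timestamp are written back without a #... line, which is a choice made here rather than something the script above does.

```bash
# Hypothetical helper: sort timestamp/command pairs read from stdin.
# Entries without a timestamp end up at the top, without a "#..." line.
sort_history_pairs() {
    tab=$(printf '\t')
    awk '
        /^#[0-9]+$/ { ts = substr($0, 2); next }   # remember the timestamp
        {
            key = (ts == "" ? "0" : ts)            # dummy key 0 when none was seen
            print key "\t" $0                      # join key and command with a TAB
            ts = ""
        }
    ' |
    sort -s -n -t "$tab" -k1,1 |                   # stable numeric sort on the key
    awk '{
        i = index($0, "\t")                        # split at the first TAB only
        ts = substr($0, 1, i - 1)
        if (ts != "0") print "#" ts                # restore the timestamp line
        print substr($0, i + 1)
    }'
}

sorted=$(printf '%s\n' \
    'orphan command' \
    '#1512649039' 'a third command' \
    '#1512649029' 'a command' | sort_history_pairs)

printf '%s\n' "$sorted"
```

Because the sort is stable, commands that share a timestamp keep their original relative order. A command that itself consists only of # followed by digits would be mistaken for a timestamp, and multi-line commands are not handled.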

For a long time I looked for a way to merge bash history (with timestamps), and nothing seemed acceptable.
That is: merge the on-disk ".bash_history" with the in-memory shell 'history', preserving timestamp ordering and the command order within those timestamps.
Optionally, it removes duplicate commands (even multi-line ones) and/or cleans out simple and/or sensitive commands, according to defined Perl REs. Adjust to suit!
This is the result... https://antofthy.gitlab.io/software/history_merge.bash.txt
Enjoy.

Related

Looping and grep writes output for the last line only

I am looping through the lines in a text file and running grep on each line through directories, like below:
while IFS="" read -r p || [ -n "$p" ]
do
echo "This is the field: $p"
grep -ilr $p * >> Result.txt
done < fields.txt
But the above writes the results only for the last line in the file, not for the other lines.
If I manually execute the command with the other lines, it works (which means the matches were found). Is there anything I am missing here? Thanks
The fields.txt looks like this
annual_of_measure__c
attached_lobs__c
apple
When the file fields.txt uses the DOS/Windows line-ending convention of two characters (carriage return AND line feed) and is processed by Unix tools that expect Unix line endings of only one character (line feed), the value that read stores in the variable $p for the first line is annual_of_measure__c\r (note the additional \r for the carriage return). grep then searches for a string that includes the carriage return and finds no match.
From your description in the question and the confirmation in the comments, it seems that the last line in fields.txt has no line ending at all, so the variable $p is the plain string apple and grep can find a match for the last line of the file.
There are tools for converting line endings, e.g. see this answer or even more options in this answer.
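If converting the file is not an option, the carriage return can also be stripped inside the loop. The following is only a sketch under that assumption: the sample file is generated on the spot to mimic the one in the question, the variable names cr and cleaned are made up, and the grep call is left as a comment.

```bash
# Sample list with DOS line endings on every line but the last (as in the question).
fields=$(mktemp)
printf 'annual_of_measure__c\r\nattached_lobs__c\r\napple' > "$fields"

cr=$(printf '\r')   # a literal carriage return
cleaned=''
while IFS= read -r p || [ -n "$p" ]; do
    p=${p%"$cr"}    # drop one trailing carriage return, if present
    cleaned="${cleaned}[${p}]"
    # grep -ilr -- "$p" . >> Result.txt   # the search from the question would go here
done < "$fields"
rm -f "$fields"

printf '%s\n' "$cleaned"
```

The brackets around each value make stray invisible characters easy to spot: with the carriage returns removed, every closing bracket follows directly after the field name.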

echo last character of text file in Unix/Bash

I need to see the last characters of a bunch of text files (or alternatively test whether they are "}" and give a list of files that test negative). Is there an easy way to do this from the command line?
(Ideally the solution works without reading the whole file from the start, because in addition to there being many files, they can also be quite large.)
P.S.: Any answer would be great but I would really appreciate if the function and syntax of everything in the answer can be fully explained.
It can be done fairly easily with tail and then string indexing in bash. For example, you obtain the last line of a file with tail -n1 file. You will need to store the line in a variable using command substitution, e.g.
lastln=$(tail -n1 file)
Then it is simply a matter of indexing the last characters, e.g.
echo ${lastln:(-1)}
(Note: when indexing from the end of the string, you must either put the offset in parentheses, (-1), or leave a space before the -1; echo ${lastln: -1} is also valid.)
You can try this:
for file in file1 file2; do tail -n 1 "$file" | grep -q '}$' || echo "$file"; done
where you should replace file1 file2 with the list of files you want to analyze, e.g. * or the like. Now what happens here? The outer part
for file in file1 file2; do ...; done
is a simple loop over the files, where inside the loop, you can refer to the current file as $file. Then,
tail -n 1 "$file"
prints the last line of the given file and
| grep -q '}$'
pipes the output to grep (turned into silent mode with -q), which looks for '}' immediately followed by the end of the line ($). The return value of this command can be used to chain another action: when grep returns non-zero (indicating failure, i.e., the pattern is not matched), the last part
|| echo "$file"
is executed, resulting in the list of files you need.
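When the last line itself may be very long, reading only the final bytes avoids loading that line. Below is a rough sketch of this variant; the helper name ends_with_brace and the two sample files are invented for the example. It asks tail for the last two bytes, relies on command substitution to strip a trailing newline, and then tests the final character with a case pattern.

```bash
# Hypothetical helper: succeed if the last non-newline character of file $1 is "}".
ends_with_brace() {
    last=$(tail -c 2 -- "$1")   # command substitution strips a trailing newline
    case $last in
        *'}') return 0 ;;
        *)    return 1 ;;
    esac
}

# Two sample files: one ends in "}", the other does not.
dir=$(mktemp -d)
printf '{ "a": 1 }\n' > "$dir/good.json"
printf '{ "a": 1 \n'  > "$dir/bad.json"

failing=''
for file in "$dir"/*.json; do
    ends_with_brace "$file" || failing="$failing${file##*/} "
done
rm -rf "$dir"

printf '%s\n' "$failing"
```

A file that ends in "}" followed by a DOS line ending or by several blank lines is reported as failing by this check, which may or may not be what you want.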

How to get line WITH tab character using tail and head

I have made a script to practice my Bash, only to realize that this script does not take tabulation into account, which is a problem since it is designed to find and replace a pattern in a Python script (which obviously needs tabulation to work).
Here is my code. Is there a simple way to get around this problem ?
pressure=1
nline=$(cat /myfile.py | wc -l) # find the line length of the file
echo $nline
for ((c=0;c<=${nline};c++))
do
res=$( tail -n $(($(($nline+1))-$c)) myfile.py | head -n 1 | awk 'gsub("="," ",$1){print $1}' | awk '{print$1}')
#echo $res
if [ $res == 'pressure_run' ]
then
echo "pressure_run='${pressure}'" >> myfile_mod.py
else
echo $( tail -n $(($nline-$c)) myfile.py | head -n 1) >> myfile_mod.py
fi
done
Basically, it finds the line that has pressure_run=something and replaces it by pressure_run=$pressure. The rest of the file should be untouched. But in this case, all tabulation is deleted.
If you want to just do the replacement as quickly as possible, sed is the way to go as pointed out in shellter's comment:
sed "s/\(pressure_run=\).*/\1$pressure/" myfile.py
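To see that this substitution leaves leading tabs alone, it can be tried on a small generated file. This is just an illustrative sketch: the temporary file and its contents are made up, and the result is captured in a variable instead of being written to myfile_mod.py.

```bash
pressure=1
src=$(mktemp)
# Sample file: the second assignment is indented with a TAB.
printf 'blah\npressure_run=no_tab\n\tpressure_run=one_tab\n' > "$src"

# Only the text from "pressure_run=" onwards is rewritten; whatever precedes it stays.
modified=$(sed "s/\(pressure_run=\).*/\1$pressure/" "$src")
rm -f "$src"

printf '%s\n' "$modified"
```

The tab in front of the second assignment is not part of the match, so sed passes it through unchanged.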
For Bash training, as you say, you may want to loop manually over your file. A few remarks for your current version:
Is /myfile.py really in the root directory? Later, you don't refer to it at that location.
cat ... | wc -l is a useless use of cat and better written as wc -l < myfile.py.
Your for loop is executed one more time than you have lines.
To get the next line, you do "show me all lines, but counting from the back, don't show me c lines, and then show me the first line of these". There must be a simpler way, right?
To get the left-hand side of an assignment, you say "in the first space-separated field, replace = with a space, then show me the first space-separated field of the result". There must be a simpler way, right? This is, by the way, where you strip out the leading tabs (your first awk command does it).
To print the unchanged line, you do the same complicated thing as before.
A band-aid solution
A minimal change that would get you the result you want would be to modify the awk command: instead of
awk 'gsub("="," ",$1){print $1}' | awk '{print$1}'
you could use
awk -F '=' '{ print $1 }'
"Fields are separated by =; give me the first one". This preserves leading tabs.
The replacements have to be adjusted a little bit as well; you now want to match something that ends in pressure_run:
if [[ $res == *pressure_run ]]
I've used the more flexible [[ ]] instead of [ ] and added a * to pressure_run (which must not be quoted): "if $res ends in pressure_run, then..."
The replacement has to use $res, which has the proper amount of tabs:
echo "$res='${pressure}'" >> myfile_mod.py
Instead of appending each line each loop (and opening the file each time), you could just redirect output of your whole loop with done > myfile_mod.py.
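The echo command above keeps the single quotes around the value, as in your version: inside double quotes they are ordinary characters, so ${pressure} is still expanded and the line ends in pressure_run='1'. If you want just the value of $pressure without the quotes, you have to remove the single quotes (and the braces aren't needed here, but don't hurt):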
echo "$res=$pressure" >> myfile_mod.py
This fixes your example, but it should be pointed out that enumerating lines and then getting one at a time with tail | head is a really bad idea. You traverse the file for every single line twice, it's very error prone and hard to read. (Thanks to tripleee for suggesting to mention this more clearly.)
A proper solution
This all being said, there are preferred ways of doing what you did. You essentially loop over a file, and if a line matches pressure_run=, you want to replace what's on the right-hand side with $pressure (or the value of that variable). Here is how I would do it:
#!/bin/bash
pressure=1
# Regular expression to match lines we want to change
re='^[[:space:]]*pressure_run='
# Read lines from myfile.py
while IFS= read -r line; do
# If the line matches the regular expression
if [[ $line =~ $re ]]; then
# Print what we matched (with whitespace!), then the value of $pressure
line="${BASH_REMATCH[0]}"$pressure
fi
# Print the (potentially modified) line
echo "$line"
# Read from myfile.py, write to myfile_mod.py
done < myfile.py > myfile_mod.py
For a test file that looks like
blah
test
pressure_run=no_tab
blah
something
	pressure_run=one_tab
		pressure_run=two_tabs
the result is
blah
test
pressure_run=1
blah
something
	pressure_run=1
		pressure_run=1
Recommended reading
How to read a file line-by-line (explains the IFS= and -r business, which is quite essential to preserve whitespace)
BashGuide

Delete lines in file over an hour old using timestamps bash

Having a bit of bother trying to get the following to work.
I have a file containing hostname:timestamp as below:
hostname1:1445072150
hostname2:1445076364
I am trying to create a bash script that will query this file (using a cron job) to check if the timestamp is over 1 hour old and if so, remove the line.
Below is what I have so far but it doesn't appear to be removing the line in the file.
#!/bin/bash
hosts=/tmp/hosts
current_timestamp=$(date +%s)
while read line; do
hostname=`echo $line | sed -e 's/:.*//g'`
timestamp=`echo $line | cut -d ":" -f 2`
diff=$(($current_timestamp-$timestamp))
if [ $diff -ge 3600 ]; then
echo "$hostname - Timestamp over an hour old. Deleting line."
sed -i '/$hostname/d' $hosts
fi
done <$hosts
I have managed to get the timestamp part working correctly in identifying hosts that are over an hour old, but I'm having trouble removing the line from the file.
I suspect it may be due to the while loop keeping the file open but not 100% sure how to work around it. Also tried making a copy of the file and editing that but still nothing.
ALTERNATIVELY: If there is a better way to get this to work and produce the same result, I am open to suggestions :)
Any help would be much appreciated.
Cheers
The problem in your script was just this line:
sed -i '/$hostname/d' $hosts
Variables inside single quotes are not expanded to their values,
so the command tries to delete lines matching the literal text "$hostname" instead of its value. If you replace the single quotes with double quotes,
the variable gets expanded to its value, which is what you need here:
sed -i "/$hostname/d" $hosts
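The difference is easy to inspect by printing the two sed scripts instead of running them. A tiny sketch, with the variable names single and double chosen only for this illustration:

```bash
hostname=hostname1
single='/$hostname/d'   # single quotes: the text stays exactly as typed
double="/$hostname/d"   # double quotes: $hostname is replaced by its value

printf '%s\n' "$single" "$double"
```

The first line printed still contains the text $hostname, while the second contains hostname1, which is the pattern sed actually needs.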
There are improvements possible:
#!/bin/bash
hosts=/tmp/hosts
current_timestamp=$(date +%s)
while read line; do
set -- ${line/:/ }
hostname=$1
timestamp=$2
((diff = current_timestamp - timestamp))
if ((diff >= 3600)); then
echo "$hostname - Timestamp over an hour old. Deleting line."
sed -i "/^$hostname:/d" $hosts
fi
done <$hosts
The improvements:
More strict pattern in the sed command, to make it more robust and to avoid some potential errors
Simpler way to extract hostname part and timestamp part without any sub-shells
Simpler arithmetic operations by enclosing within ((...))
You ask for alternatives — use awk:
awk -F: -v ts=$(date +%s) '$2 <= ts-3600 { next } { print }' $hosts > $hosts.$$
mv $hosts.$$ $hosts
The ts=$(date +%s) sets the awk variable ts to the value from date. The script skips any lines where the value in the second column (after the first colon) is smaller than the threshold. You could do the subtraction once in a BEGIN block if you wanted to. Decide whether <= or < is correct for your purposes.
If you need to know which lines are deleted, you can add
printf "Deleting %s - timestamp %d older than %d\n", $1, $2, (ts-3600) >/dev/stderr;
before the next to print the information on standard error. If you must write that to standard output, then you need to arrange for retained lines to be written to a file with print > file as an alternative action after the filter condition (passing -v file="$hosts.$$" as another pair of arguments to awk). The tweaks that can be made are endless.
If the file is of any significant size, it will be quicker to copy the relevant subsection of the file once to a temporary file and then to the final file than to edit the file in place multiple times as in the original code. If the file is small enough, there isn't a problem.
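Putting these pieces together, a self-contained trial run might look like the sketch below. It is written under a few assumptions: the hosts file is a temporary file with two made-up entries, the threshold is computed once in a BEGIN block, and dropped hosts are reported on standard error with a shortened message.

```bash
hosts=$(mktemp)
now=$(date +%s)
# Sample data: one entry two hours old, one entry a minute old.
printf 'hostname1:%s\nhostname2:%s\n' "$((now - 7200))" "$((now - 60))" > "$hosts"

# Keep only entries newer than one hour; report dropped ones on standard error.
awk -F: -v ts="$now" '
    BEGIN { limit = ts - 3600 }
    $2 <= limit { printf "Deleting %s\n", $1 > "/dev/stderr"; next }
    { print }
' "$hosts" > "$hosts.$$" && mv "$hosts.$$" "$hosts"

remaining=$(cat "$hosts")
rm -f "$hosts"

printf '%s\n' "$remaining"
```

Only the hostname2 line survives. The mv runs only if awk succeeded, so a failed run does not replace the original file with a partial one.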

how to make a winmerge equivalent in linux

My friend recently asked how to compare two folders in Linux and then run meld against any text files that are different. I'm slowly catching on to the Linux philosophy of piping many granular utilities together, and I put together the following solution. My question is, how could I improve this script? There seems to be quite a bit of redundancy and I'd appreciate learning better ways to script Unix.
#!/bin/bash
dir1=$1
dir2=$2
# show files that are different only
cmd="diff -rq $dir1 $dir2"
eval $cmd # print this out to the user too
filenames_str=`$cmd`
# remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different
tmp1=`echo "$filenames_str" | sed -n '/ differ$/p'`
# grab just the first filename for the lines of output
tmp2=`echo "$tmp1" | awk '{ print $2 }'`
# convert newlines sep to space
fs=$(echo "$tmp2")
# convert string to array
fa=($fs)
for file in "${fa[@]}"
do
# drop first directory in path to get relative filename
rel=`echo $file | sed "s#${dir1}/##"`
# determine the type of file
file_type=`file -i $file | awk '{print $2}' | awk -F"/" '{print $1}'`
# if it's a text file send it to meld
if [ $file_type == "text" ]
then
# throw out error messages with &> /dev/null
meld $dir1/$rel $dir2/$rel &> /dev/null
fi
done
Please preserve/promote readability in your answers. An answer that is shorter but harder to understand won't qualify as an answer.
It's an old question, but let's work on it a bit just for fun, without thinking about the final goal (maybe SCM) or about tools that already do this better. Let's just focus on the script itself.
In the OP's script there is a lot of string processing inside bash using tools like sed and awk, sometimes more than once in the same command line or inside a loop that executes n times (once per file).
That's OK, but it's necessary to remember that:
Each time the script calls one of those programs, a new process is created in the OS, and that is expensive in time and resources. So the fewer programs are called, the better the script performs. The original script executes:
diff 2 times (1 just to print to user)
sed 1 time processing diff result and 1 time for each file
awk 1 time processing sed result and 2 times for each file (processing file result)
file 1 time for each file
That doesn't apply to echo, read, test and other commands that are bash builtins, since no external program is executed.
meld is the final command that displays the files to the user, so it doesn't count.
Even with builtin commands, a pipeline (|) has a cost too, because the shell has to create pipes, duplicate handles, and maybe even fork the shell (which is a process itself). So again: less is better.
The messages of the diff command are locale dependent, so if the system is not in English, the whole script won't work.
With that in mind, let's clean up the original script a bit, maintaining the OP's logic:
#!/bin/bash
dir1=$1
dir2=$2
# Set english as current language
LANG=en_US.UTF-8
# (1) show files that are different only
diff -rq $dir1 $dir2 |
# (2) remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different, delete all but left filename
sed '/ differ$/!d; s/^Files //; s/ and .*//' |
# (3) determine the type of file
file -i -f - |
# (4) for each file
while IFS=":" read file file_type
do
# (5) drop first directory in path to get relative filename
rel=${file#$dir1}
# (6) if it's a text file send it to meld
if [[ "$file_type" =~ "text/" ]]
then
# throw out error messages with &> /dev/null
meld ${dir1}${rel} ${dir2}${rel} &> /dev/null
fi
done
A little explaining:
(1) A single chain of commands cmd1 | cmd2 | ... where the output (stdout) of each command is the input (stdin) of the next one.
(2) Execute sed just once to perform 3 operations (separated by ;) on the diff output:
Delete lines that do not end with " differ"
Delete "Files " at the beginning of the remaining lines
Delete from " and " to the end of the remaining lines
(3) Execute the file command once to process the file list on stdin (option -f -)
(4) Use the bash while statement to read two values separated by : from each line of stdin.
(5) Use bash parameter expansion to strip the first directory from the file name
(6) Use the bash [[ ]] test to match the file type against a regular expression
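The diff and sed stages can be tried in isolation on two throwaway directories. In this sketch the directory and file names are invented, and LC_ALL=C is used (instead of setting LANG) as one way to pin the message language for a single command.

```bash
d1=$(mktemp -d)
d2=$(mktemp -d)
printf 'one\n' > "$d1/x.txt"
printf 'two\n' > "$d2/x.txt"   # same name, different content
printf 'only\n' > "$d1/y.txt"  # exists on one side only

# LC_ALL=C pins the message language for this one command.
changed=$(LC_ALL=C diff -rq "$d1" "$d2" |
    sed '/ differ$/!d; s/^Files //; s/ and .*//')
rm -rf "$d1" "$d2"

printf '%s\n' "$changed"
```

Only the path of x.txt inside the first directory is left: the "Only in ..." line for y.txt is dropped by the first sed command, and the other two commands strip everything around the left-hand file name.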
For clarity, I didn't consider that file and directory names may contain spaces. In such cases, both scripts will fail. To avoid that, it is necessary to enclose every reference to a file/dir name variable in double quotes.
I didn't use awk, because it is powerful enough that can replace almost the entire script ;-)
