Delete lines in file over an hour old using timestamps - bash

Having a bit of bother trying to get the following to work.
I have a file containing hostname:timestamp as below:
hostname1:1445072150
hostname2:1445076364
I am trying to create a bash script that will query this file (using a cron job) to check if the timestamp is over 1 hour old and if so, remove the line.
Below is what I have so far but it doesn't appear to be removing the line in the file.
#!/bin/bash
hosts=/tmp/hosts
current_timestamp=$(date +%s)
while read line; do
hostname=`echo $line | sed -e 's/:.*//g'`
timestamp=`echo $line | cut -d ":" -f 2`
diff=$(($current_timestamp-$timestamp))
if [ $diff -ge 3600 ]; then
echo "$hostname - Timestamp over an hour old. Deleting line."
sed -i '/$hostname/d' $hosts
fi
done <$hosts
I have managed to get the timestamp part working correctly in identifying hosts that are over an hour old, but I'm having trouble removing the line from the file.
I suspect it may be due to the while loop keeping the file open but not 100% sure how to work around it. Also tried making a copy of the file and editing that but still nothing.
ALTERNATIVELY: If there is a better way to get this to work and produce the same result, I am open to suggestions :)
Any help would be much appreciated.
Cheers

The problem in your script was just this line:
sed -i '/$hostname/d' $hosts
Variables inside single quotes are not expanded to their values,
so the command tries to match the literal string "$hostname" instead of its value. If you replace the single quotes with double quotes,
the variable will get expanded to its value, which is what you need here:
sed -i "/$hostname/d" $hosts
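A quick demonstration of the difference (pure shell, no file involved):

```shell
hostname="hostname1"
single='/$hostname/d'   # single quotes: no expansion, the text stays literal
double="/$hostname/d"   # double quotes: the shell expands the variable first
echo "$single"          # -> /$hostname/d
echo "$double"          # -> /hostname1/d
```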
There are improvements possible:
#!/bin/bash
hosts=/tmp/hosts
current_timestamp=$(date +%s)
while read line; do
set -- ${line/:/ }
hostname=$1
timestamp=$2
((diff = current_timestamp - timestamp))
if ((diff >= 3600)); then
echo "$hostname - Timestamp over an hour old. Deleting line."
sed -i "/^$hostname:/d" $hosts
fi
done <$hosts
The improvements:
A stricter pattern in the sed command, making it more robust and avoiding some potential errors
A simpler way to extract the hostname and timestamp parts without any sub-shells
Simpler arithmetic by enclosing the operations within ((...))
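For example, the `set -- ${line/:/ }` trick replaces the first `:` with a space and lets word splitting populate the positional parameters (this relies on the line containing no other whitespace):

```shell
line="hostname1:1445072150"
set -- ${line/:/ }   # becomes "hostname1 1445072150", split into $1 and $2
hostname=$1
timestamp=$2
echo "$hostname $timestamp"   # -> hostname1 1445072150
```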

You ask for alternatives — use awk:
awk -F: -v ts="$(date +%s)" '$2 <= ts-3600 { next } { print }' "$hosts" > "$hosts.$$"
mv "$hosts.$$" "$hosts"
The ts=$(date +%s) sets the awk variable ts to the value from date. The script skips any lines where the value in the second column (after the first colon) is smaller than the threshold. You could do the subtraction once in a BEGIN block if you wanted to. Decide whether <= or < is correct for your purposes.
If you need to know which lines are deleted, you can add
printf "Deleting %s - timestamp %d older than %d\n", $1, $2, (ts-3600) > "/dev/stderr"
before the next to print the information on standard error. If you must write that to standard output, then you need to arrange for retained lines to be written to a file with print > file as an alternative action after the filter condition (passing -v file="$hosts.$$" as another pair of arguments to awk). The tweaks that can be made are endless.
If the file is of any significant size, it will be quicker to copy the relevant subsection of the file once to a temporary file and then to the final file than to edit the file in place multiple times as in the original code. If the file is small enough, there isn't a problem.
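Putting the awk variant together as a self-contained sketch (the sample data and temp file here are only for demonstration; adjust the path and the keep/skip condition to taste):

```shell
# Build a throwaway hosts file with one stale and one fresh entry.
hosts=$(mktemp)
now=$(date +%s)
printf 'oldhost:%s\nnewhost:%s\n' "$((now - 7200))" "$now" > "$hosts"

# Keep only entries younger than an hour, then replace the original file.
awk -F: -v ts="$now" '$2 > ts - 3600' "$hosts" > "$hosts.$$" && mv "$hosts.$$" "$hosts"

cat "$hosts"   # only the fresh entry survives
rm -f "$hosts"
```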

Related

Iterating with awk over some thousand files and writing to the same files in one or two runs

I have a lot of files in their own directory. All have the same name structure:
2019-10-18-42-IV-Friday.md
2019-10-18-42-IV-Saturday.md
2019-10-18-42-IV-Sunday.md
2019-10-18-43-43-IV-Monday.md
2019-10-18-42-IV Tuesday.md
and so on.
This is in detail:
yyyy-mm-dd-dd-week of year-actual quarter-day of week.md
I want to write one line to each file as a second line:
With awk I want to extract and expand the dates from the file name and then write them to the appropriate file.
This is the point where I fail.
%!awk -F "-" '{print "Today is " $6 ", the " $3 "." $2 "." $1 ", Kw " $4 ", in the " $5 ". Quarter."}'
That works well, I get the sentence I want to write into the files.
So put the whole thing in a loop:
ze.sh
#!/bin/bash
for i in *.md;
j = awk -F " " '{ print "** Today is " $6 ", the" $3"." $2"." $1", Kw " $4 ", in the " $5 ". Quarter. **"}' $i
Something with CAT, I suppose.
end
What do I have to do to make variable i iterate over all files, extract the values for j from $i, and then write $j to the second line of each file?
Thanks a lot for your help.
[Using manjaro linux and bash]
GNU bash, Version 5.0.11(1)-release (x86_64-pc-linux-gnu)
Linux version 5.2.21-1-MANJARO
Could you please try the following (I haven't tested it; GNU awk is needed for this). For writing the date on the 2nd line, I have kept the same format in which your Input_file names carry the date.
awk -i inplace '
FNR==2{
split(FILENAME,array,"-")
print array[1]"-"array[2]"-"array[3]
}
1
' *.md
If possible, try without the -i inplace option first so that changes are not saved into the Input_file; once you are happy with the results, add it back as shown above to make the in-place changes.
For awk versions that support in-place updates, see the link James posted:
Save modifications in place with awk
For updating a file in-place, sed is better suited than awk, because:
You don't need a recent version, older versions can do it too
Can work in both GNU and BSD flavors -> more portable
But first, to split a filename to its parts, you don't need an extra process, the read builtin can do it too. From your examples, we need to extract year, month, day, week numbers, a quarter string, and a weekday name string:
2019-10-18-42-IV-Friday.md
2019-10-18-42-IV-Saturday.md
2019-10-18-42-IV-Sunday.md
2019-10-18-43-43-IV-Monday.md
2019-10-18-42-IV Tuesday.md
For the first 3 lines, this simple expression would work:
IFS=-. read year month day week q dayname rest <<< "$filename"
The last line has a space before the weekday name instead of a -, but that's easy to fix:
IFS='-. ' read year month day week q dayname rest <<< "$filename"
Line 4 is harder to fix, because it has a different number of fields. To handle the extra field, we should add an extra variable term:
IFS='-. ' read year month day week q dayname ext rest <<< "$filename"
Then, if we can assume that the second 43 on that line can be ignored and we can simply shift the arguments, we use a conditional on the value of $ext.
That is, for most lines the value of ext will be md (the file extension).
If the value is different, that means we have an extra field, and we should shift the values:
if [[ $ext != "md" ]]; then
q=$dayname
dayname=$ext
fi
Now, we can use the variables to format the line you want to insert into the file:
line="Today is $dayname, the $day.$month.$year, Kw $week, in the $q. Quarter."
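As a sanity check, the parsing and formatting steps can be exercised end-to-end on the tricky fourth filename (a bash sketch; no files are touched):

```shell
filename="2019-10-18-43-43-IV-Monday.md"
IFS='-. ' read -r year month day week q dayname ext rest <<< "$filename"
# The extra "43" pushed everything one field to the right:
# ext now holds "Monday" rather than "md", so we shift.
if [[ $ext != "md" ]]; then
    q=$dayname
    dayname=$ext
fi
echo "Today is $dayname, the $day.$month.$year, Kw $week, in the $q. Quarter."
# -> Today is Monday, the 18.10.2019, Kw 43, in the IV. Quarter.
```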
Finally, we can formulate a sed statement, for example to append our custom formatted line after the first one, ideally in a way that will work with both GNU and BSD flavors of sed.
This will work equivalently with both GNU and BSD versions:
sed -i.bak -e "1 a\\"$'\n'"$line"$'\n' "$filename" && rm "$filename.bak"
Notice that a .bak backup file is created, which is then removed explicitly.
If you don't want backup files to be created, then I'm afraid you need to use slightly different format for GNU and BSD flavors:
# GNU
sed -i'' -e "1 a\\"$'\n'"$line"$'\n' "$filename"
# BSD
sed -i '' -e "1 a\\"$'\n'"$line"$'\n' "$filename"
In fact if you only need to support GNU flavor, then a simpler form will work too:
sed -i'' "1 a$line" "$filename"
You can put all of that together in a for filename in *.md; do ...; done loop.
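Assembled as a runnable sketch (sample files are created in a temp directory here; the .md bodies are placeholders, and GNU or BSD sed is assumed as discussed above):

```shell
# Demo setup: two sample files matching the naming schemes discussed.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '# Heading\nbody\n' > 2019-10-18-42-IV-Friday.md
printf '# Heading\nbody\n' > 2019-10-18-43-43-IV-Monday.md

for filename in *.md; do
    IFS='-. ' read -r year month day week q dayname ext rest <<< "$filename"
    # Extra-field case: shift when ext is not the "md" extension.
    if [[ $ext != "md" ]]; then
        q=$dayname
        dayname=$ext
    fi
    line="Today is $dayname, the $day.$month.$year, Kw $week, in the $q. Quarter."
    sed -i.bak -e "1 a\\"$'\n'"$line"$'\n' "$filename" && rm -f "$filename.bak"
done

head -2 *.md   # each file now carries its date sentence as line 2
```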
You probably want to feed the file name into the AWK script, using the '-' to separate the components.
This script assumes appending the awk output to the end of each file is acceptable:
for i in *.md ; do
echo $i | awk -F- 'AWK COMMAND HERE' >> $i
done
If the new text has to be inserted (as the second line) into the file, the sed program can be used to update the file in place (using '-i'). Something like
for i in *.md ; do
mark=$(echo $i | awk -F- 'AWK COMMAND HERE')
sed -i -e "2i$mark" $i
done
This is the best solution for me, especially because it copes with the different delimiters.
Many thanks to everyone who was interested in this question and especially to those who posted solutions.
I wish I hadn't made it so hard because I mistyped the example data.
This is now "my" variant of the solution:
for filename in *.md; do
IFS='-. ' read year month day week q dayname rest <<< "$filename"
line="Today is $dayname, the $day.$month.$year, Kw $week, in the $q. Quarter."
sed -i.bak -e "1 a\\"$'\n'"$line"$'\n' "$filename" && rm "$filename.bak"
done
Because it copes with the multiple field separators, this result works best for me.
But perhaps I am wrong, and the other solutions also offer the possibility of using different separators: at least '-' and '.' are required.
I am very surprised and pleased how quickly I received very good answers as a newcomer. Hopefully I can give something back.
And I'm also amazed how many different solutions are possible for the problems that arise.
If anyone is interested in what I've done, read on here:
I've had a fatal autoimmune disease for two years. Little by little, my brain is destroyed, intermittently.
Especially my memory has suffered a lot; I often don't remember what I did or learned yesterday, or what still has to be done.
That's why I created day files until 31.12.2030, with a markdown template for each day. There I then record what I have done and learned on those days and what still has to be done.
It was important to me to have the correct date within the individual file. Why no database, why markdown?
I want to have a format that I can use anywhere, on any device and with any OS. A format that doesn't belong to a company that could change it, make it more expensive, take it off the market, or limit it with licenses.
It's fast enough. The changes to 4,097 files as described above took less than 2 seconds on my i5 laptop (12 GB Ram, SSD).
Searching with fzf over all files is also very fast. I can simply have the files converted and output as what I just need.
My memory won't come back from this, but I have a chance to log what I forgot.
Thank you very much for your help and attention.

Delete lines in file that have a date older than x

I can read an entire file into memory like so:
#!/bin/bash
filename='peptides.txt'
filelines=`cat $filename`
ten_days_ago="$(date)"
for line in $filelines ; do
date_of="$(echo "$line" | jq -r '.time')"
if [[ "$ten_days_ago" > "$date_of" ]]; then
# delete this line
fi
done
the problem is:
I may not want to read the whole file into memory
If I stream it line by line with bash, how can I store which line to delete from? I would delete lines 0 to x, where line x has a date equal to 10 days ago.
A binary search would be appropriate here - so maybe bash is not a good solution to this? I would need to find the number of lines in the file, divide by two and go to that line.
You can use binary search only if the file is sorted.
You do not need to read the whole file into memory; you can process it line by line:
while read line
do
....
done <$filename
And: Yes, I personally would not use shell scripting for this kind of problems, but this is of course a matter of taste.
You didn't show what the input file looks like, but judging by your jq, it's JSON data.
That said, this is how I would do it:
today=$(date +%j)
tenDaysAgo=$(date --date="10 day ago" +%j)
#This is where you would create the data for peptides.txt
#20 spaces away there is a date stamp so it doesn't distract you
echo "Peptides stuff $today" >> peptides.txt
while read pepStuff; do
if [ "$pepStuff" == "$tenDaysAgo" ]; then
sed -i "/.*$pepStuff/d" peptides.txt
fi
done < <(awk '{print $3}' peptides.txt)
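Since the asker's exact line format isn't shown, here is a hedged alternative sketch: it assumes each line ends with an epoch timestamp in a known column (a hypothetical two-column format, not the asker's real one) and filters out everything older than ten days in a single pass, with no per-line sed calls:

```shell
# Hypothetical format: "<label> <epoch-seconds>" per line.
data=$(mktemp)
now=$(date +%s)
printf 'old %s\nnew %s\n' "$((now - 11*86400))" "$now" > "$data"

# Keep only lines whose timestamp (column 2) is within the last ten days.
awk -v cutoff="$((now - 10*86400))" '$2 >= cutoff' "$data" > "$data.tmp" \
  && mv "$data.tmp" "$data"

cat "$data"   # only the recent line remains
rm -f "$data"
```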

Incrementing Numbers & Counting with sed syntax

I am trying to wrap my head around sed and thought it would be best to try using something simple yet useful. At work I want to keep count on a small LCD display each time a specific script is run by users. I am currently doing this with a total count using the following syntax:
oldnum=`cut -d ':' -f2 TotalCount.txt`
newnum=`expr $oldnum + 1`
sed -i "s/$oldnum\$/$newnum/g" TotalCount.txt
This modifies the file that has this one line in it:
Total Recordings:0
Now I want to elaborate a little and increment the numbers starting at midnight and resetting to zero at 23:59:59 each day. I created a secondary .txt file for the display to read from with only one single line in it:
Total Recordings Today:0
But the syntax is not going to be the same. How must the above sed syntax be changed to change the number in the dialog of the second file?
I can change and reset the files using sed/bash in conjunction with a simple cron job on a schedule. The problem is that I can't figure out the syntax of sed to replicate the same effect as I originally got to work. Can anyone help please, I have been reading for hours on this, finally decided to post this and just make a pot of coffee. I have a 4 line LCD and would love to track counts across schedules if it is easy enough to learn the syntax.
sed should work fine for doing increments on both Total Recordings:, or Total Recordings Today: in your file since it's looking for the same pattern. To reset it each day at a certain time I would recommend a cronjob.
0 0 * * * echo "Total Recordings Today:0" > /path/to/TotalCount.txt 2>/dev/null
The other things I would encourage is to use the newer style syntax $( ... ) for the shell expansion, and create a variable for your TotalCount.txt file.
#!/bin/bash
totals=/path/to/TotalCount.txt
oldnum=$(cut -d ':' -f2 "$totals")
newnum=$((oldnum + 1))
sed -i "s/$oldnum\$/$newnum/g" "$totals"
This way you can easily reuse it for whatever else you want to do with it, quote it properly, and simplify your code. Note: on OS X the sed in-place flag would need to be sed -i ''.
Whenever in doubt, http://shellcheck.net is a really nice tool to help find mistakes in your code.
Although you're looking for a sed solution, I cannot resist posting how it can be done in awk:
$ awk -F: -v OFS=: '{$2++}1' file > temp && mv temp file
-F: set the input field delimiter and -v OFS=: output field delimiter to :, awk parses the second field and increments by one, 1 is a shorthand for print (can be replaced with any "true" value); output will be written to a temp file and if successful will overwrite the original input file (to mimic in-place edit).
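A quick run on a throwaway copy shows the increment (the temp file here is only for demonstration):

```shell
file=$(mktemp)
echo 'Total Recordings:41' > "$file"

# Increment the number after the colon, then replace the file.
awk -F: -v OFS=: '{$2++}1' "$file" > "$file.tmp" && mv "$file.tmp" "$file"

cat "$file"   # -> Total Recordings:42
rm -f "$file"
```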
Sed is a fine tool, but notoriously not the best for arithmetic. You could make what you already have work by initializing the counter to zero prior to incrementing it, if the file was not last modified today (or does not exist):
[ `date +%Y-%m-%d` != "`stat --printf %z TotalCount.txt 2> /dev/null|cut -d ' ' -f 1`" ] && echo "Total Recordings Today:0" > TotalCount.txt
To do same with shifts, you would likely calculate shift "ordinal number" by subtracting first shift start since midnight (say 7 * 3600) from seconds since epoch (which is a midnight) and dividing by length of shift (8 * 3600) and initialize the counter if that changes. Something like:
[ $(((`date +%s` - 7 * 3600) / (8 * 3600))) -gt $(((`stat --printf %Z TotalCount.txt 2> /dev/null` - 7 * 3600) / (8 * 3600))) ] && echo "Total Recordings This Shift:0" > TotalCount.txt

How to get line WITH tab character using tail and head

I have made a script to practice my Bash, only to realize that this script does not take tabulation into account, which is a problem since it is designed to find and replace a pattern in a Python script (which obviously needs tabulation to work).
Here is my code. Is there a simple way to get around this problem ?
pressure=1
nline=$(cat /myfile.py | wc -l) # find the line length of the file
echo $nline
for ((c=0;c<=${nline};c++))
do
res=$( tail -n $(($(($nline+1))-$c)) myfile.py | head -n 1 | awk 'gsub("="," ",$1){print $1}' | awk '{print$1}')
#echo $res
if [ $res == 'pressure_run' ]
then
echo "pressure_run='${pressure}'" >> myfile_mod.py
else
echo $( tail -n $(($nline-$c)) myfile.py | head -n 1) >> myfile_mod.py
fi
done
Basically, it finds the line that has pressure_run=something and replaces it by pressure_run=$pressure. The rest of the file should be untouched. But in this case, all tabulation is deleted.
If you want to just do the replacement as quickly as possible, sed is the way to go as pointed out in shellter's comment:
sed "s/\(pressure_run=\).*/\1$pressure/" myfile.py
For Bash training, as you say, you may want to loop manually over your file. A few remarks for your current version:
Is /myfile.py really in the root directory? Later, you don't refer to it at that location.
cat ... | wc -l is a useless use of cat and better written as wc -l < myfile.py.
Your for loop is executed one more time than you have lines.
To get the next line, you do "show me all lines, but counting from the back, don't show me c lines, and then show me the first line of these". There must be a simpler way, right?
To get what's the left-hand side of an assignment, you say "in the first space-separated field, replace = with a space, then show me the first space-separated field of the result". There must be a simpler way, right? This is, by the way, where you strip out the leading tabs (your first awk command does it).
To print the unchanged line, you do the same complicated thing as before.
A band-aid solution
A minimal change that would get you the result you want would be to modify the awk command: instead of
awk 'gsub("="," ",$1){print $1}' | awk '{print$1}'
you could use
awk -F '=' '{ print $1 }'
"Fields are separated by =; give me the first one". This preserves leading tabs.
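You can see that the tabs survive by piping a sample line through the command (cat -A is GNU-specific and renders each tab as ^I):

```shell
# A sample line with two leading tabs, as an indented Python file might contain.
line=$'\t\tpressure_run=something'
lhs=$(printf '%s\n' "$line" | awk -F '=' '{ print $1 }')
printf '%s\n' "$lhs" | cat -A   # the two leading ^I show the tabs are intact
```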
The replacements have to be adjusted a little bit as well; you now want to match something that ends in pressure_run:
if [[ $res == *pressure_run ]]
I've used the more flexible [[ ]] instead of [ ] and added a * to pressure_run (which must not be quoted): "if $res ends in pressure_run, then..."
The replacement has to use $res, which has the proper amount of tabs:
echo "$res='${pressure}'" >> myfile_mod.py
Instead of appending each line each loop (and opening the file each time), you could just redirect output of your whole loop with done > myfile_mod.py.
This prints literally ${pressure} as in your version, because it's single quoted. If you want to replace that by the value of $pressure, you have to remove the single quotes (and the braces aren't needed here, but don't hurt):
echo "$res=$pressure" >> myfile_mod.py
This fixes your example, but it should be pointed out that enumerating lines and then getting one at a time with tail | head is a really bad idea. You traverse the file for every single line twice, it's very error prone and hard to read. (Thanks to tripleee for suggesting to mention this more clearly.)
A proper solution
This all being said, there are preferred ways of doing what you did. You essentially loop over a file, and if a line matches pressure_run=, you want to replace what's on the right-hand side with $pressure (or the value of that variable). Here is how I would do it:
#!/bin/bash
pressure=1
# Regular expression to match lines we want to change
re='^[[:space:]]*pressure_run='
# Read lines from myfile.py
while IFS= read -r line; do
# If the line matches the regular expression
if [[ $line =~ $re ]]; then
# Print what we matched (with whitespace!), then the value of $pressure
line="${BASH_REMATCH[0]}"$pressure
fi
# Print the (potentially modified) line
echo "$line"
# Read from myfile.py, write to myfile_mod.py
done < myfile.py > myfile_mod.py
For a test file that looks like
blah
test
pressure_run=no_tab
blah
something
pressure_run=one_tab
pressure_run=two_tabs
the result is
blah
test
pressure_run=1
blah
something
pressure_run=1
pressure_run=1
Recommended reading
How to read a file line-by-line (explains the IFS= and -r business, which is quite essential to preserve whitespace)
BashGuide

Grep outputs multiple lines, need while loop

I have a script which uses grep to find lines in a text file (ics calendar to be specific)
My script finds a date match, then goes up and down a few lines to copy the summary and start time of the appointment into a separate variable. The problem I have is that I'm going to have multiple appointments at the same time, and I need to run through the whole process for each result in grep.
Example:
LINE=`grep -F -n 20130304T232200 /path/to/calendar.ics | cut -f1 -d:`
And it outputs only the lines, such as
86 89
Then it goes on to capture my other variables, as such:
SUMMARYLINE=$(( $LINE + 5 ))
SUMMARY=`sed -n "$SUMMARYLINE"p /path/to/calendar.ics`
My script runs fine with one result, but it obviously won't work with more than one, and I need it to. Should I send the grep results into an array? A separate text file to read from? I'm sure I'll need a while loop in here somehow. Need some help please.
You can call grep from a loop quite easily:
while IFS=':' read -r LINE notused # avoids the use of cut
do
# First field is now in $LINE
# Further processing
done < <(grep -F -n 20130304T232200 /path/to/calendar.ics)
However, if the file is not too large then it might be easier to read the whole file into an array and work on that.
With your proposed solution, you are reading through the file several times. Using awk, you can do it in one pass:
awk -F: -v time=20130304T232200 '
$1 == "SUMMARY" {summary = substr($0,9)}
/^DTSTART/ {start = $2}
/^END:VEVENT/ && start == time {print summary}
' calendar.ics
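A demonstration against a minimal hand-made .ics fragment (real calendars may add parameters such as DTSTART;TZID=..., which this simple colon split would not handle):

```shell
cal=$(mktemp)
cat > "$cal" <<'EOF'
BEGIN:VEVENT
SUMMARY:Dentist
DTSTART:20130304T232200
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lunch
DTSTART:20130305T120000
END:VEVENT
EOF

awk -F: -v time=20130304T232200 '
$1 == "SUMMARY" {summary = substr($0,9)}
/^DTSTART/ {start = $2}
/^END:VEVENT/ && start == time {print summary}
' "$cal"   # prints: Dentist

rm -f "$cal"
```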
