Using sed to find, convert and replace lines

I don't know much about bash scripting, and I'm trying to write a bash script that does the following:
I have a lot of .txt files in the same directory.
Every .txt file follows this structure:
file1.txt:
<name>first operation</name>
<operation>21</operation>
<StartTime>1292435633</StartTime>
<EndTime>1292435640</EndTime>
<name>second operation</name>
<operation>21</operation>
<StartTime>1292435646</StartTime>
<EndTime>1292435650</EndTime>
I want to find every <StartTime> line and convert its value from a Unix timestamp to a standard date/time format while preserving the tag structure, for example <StartTime>2010-12-15 22:52</StartTime>. Could this be done as a search/replace with sed? I think I could use this command that I found: date --utc --date "1970-01-01 $1 sec" "+%Y-%m-%d %T"
I want to do the same with the <EndTime> tag.
I need to do this for all *.txt files in a directory.
I tried using sed but did not get the results I wanted. As I said, I don't know much about bash scripting, so any help would be appreciated.
Thank you for your help!
Regards

sed cannot do date conversions on its own; I would recommend a more appropriate tool such as awk (strftime is a GNU awk extension):
echo '<StartTime>1292435633</StartTime>' | awk '{
match($0,/[0-9]+/);
t = strftime("%F %T",substr($0,RSTART,RLENGTH),1);
sub(/[0-9]+/,t)
}
{print}'
If your input files have one tag per line, as in your structure example, it should work flawlessly.
If you need to repeat the operation for every .txt file, wrap it in a shell for loop:
for file in *.txt; do
awk '/^<[^>]*Time>/{
match($0,/[0-9]+/);
t = strftime("%F %T",substr($0,RSTART,RLENGTH),1);
sub(/[0-9]+/,t)
} 1' "$file" >"$file.new"
# mv "$file.new" "$file"
done
Compared to the previous code, I have made two minor changes:
added the condition /^<[^>]*Time>/, which checks whether the current line starts with <StartTime> or <EndTime>
converted {print} to the shorter '1'
If the files ending with .new contain the result you were expecting, you can uncomment the line containing mv.

Using grep:
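while IFS= read -r line; do
    if [[ $line == *"<StartTime>"* || $line == *"<EndTime>"* ]]; then
        # extract the text between the tags (the epoch value)
        n=$(echo "$line" | grep -Po '(?<=>).*(?=<)')
        # date -d @N interprets N as seconds since the epoch (GNU date)
        line=${line/$n/$(date -d "@$n")}
    fi
    echo "$line" >> file1.new.txt
done < file1.txt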
$ cat file1.new.txt
<name>first operation</name>
<operation>21</operation>
<StartTime>Wed Dec 15 18:53:53 CET 2010</StartTime>
<EndTime>Wed Dec 15 18:54:00 CET 2010</EndTime>
<name>second operation</name>
<operation>21</operation>
<StartTime>Wed Dec 15 18:54:06 CET 2010</StartTime>
<EndTime>Wed Dec 15 18:54:10 CET 2010</EndTime>
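The loop above prints dates in the default format of the local time zone. Here is a minimal sketch of the same idea that produces the YYYY-MM-DD HH:MM:SS layout asked for in the question. It assumes bash and GNU date (for -d @N) and prints UTC; the function name convert_times is made up for illustration.

```shell
#!/bin/bash
# convert_times FILE (hypothetical helper)
# Print FILE with the epoch value inside <StartTime>/<EndTime> replaced by
# "YYYY-MM-DD HH:MM:SS" in UTC. Other lines pass through unchanged.
convert_times() {
    local line
    # Kept in a variable so that < and > are not parsed by [[ ]] itself.
    local re='^(.*<(StartTime|EndTime)>)([0-9]+)(</.*)$'
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            # group 1: text up to and including the opening tag
            # group 3: the epoch seconds, group 4: the closing tag onward
            line="${BASH_REMATCH[1]}$(date -u -d "@${BASH_REMATCH[3]}" '+%Y-%m-%d %T')${BASH_REMATCH[4]}"
        fi
        printf '%s\n' "$line"
    done < "$1"
}
```

It could be used as convert_times file1.txt > file1.new.txt, moving the result into place once it looks right.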

Related

Append number of days since the date in the line to each line in the file using Bash

I have a file that consists of the following...
false|aaa|user|aaa001|2014-12-11|
false|bbb|user|bbb||
false|ccc|user|ccc|2021-10-19|
false|ddd|user|ddd|2018-11-16|
false|eee|user|eee|2020-06-02|
I want to use the date in the 5th column to calculate the number of days from the current date and append it to each line in the file.
The end result would be a file that looks like the following, assuming the current date is 1/13/2022...
false|aaa|user|aaa001|2014-12-11|2590
false|bbb|user|bbb||
false|ccc|user|ccc|2021-10-19|86
false|ddd|user|ddd|2018-11-16|1154
false|eee|user|eee|2020-06-02|590
Some lines in the file will not contain a date value (which is expected). I need a solution for a Bash script on Linux.
I am able to submit a command using echo for a single line and then calculate the number of days from the current date by using cut on the 5th field (see below)...
echo "false|aaa|user|aaa001|2014-12-11" | echo $(( ($(date --date="$(date +%Y-%m-%d)" +%s) - $(date --date="$(cut -d'|' -f5)" +%s)) / (60*60*24) ))
2590
I don't know how to do this one line at a time, capture the 'number of days' value and then append it to each line in the file.

Here's an approach using
paste to append the outputs
sed to arrange the empty lines and
awk to calculate the desired days.
This works with GNU date. BSD date has to use something like date -jf x +%s.
EDIT: Updated the date to compare with to current day.
% current=$(date +%m/%d/%Y)
% paste -d"\0" file <(cut -d"|" -f5 file |
sed 's/^$/#/' |
xargs -Ix date -d x +%s 2>&1 |
awk -v cur="$(date -d "$current" +%s)" '/invalid/{print 0; next}
{print int((cur-$1)/3600/24)}')
false|aaa|user|aaa001|2014-12-11|2590
false|bbb|user|bbb||0
false|ccc|user|ccc|2021-10-19|86
false|ddd|user|ddd|2018-11-16|1154
false|eee|user|eee|2020-06-02|590
Also date returns date: invalid date ‘#’ in the empty case. If any other implementation behaves differently the awk regex has to be adjusted accordingly.
Data
% cat file
false|aaa|user|aaa001|2014-12-11|
false|bbb|user|bbb||
false|ccc|user|ccc|2021-10-19|
false|ddd|user|ddd|2018-11-16|
false|eee|user|eee|2020-06-02|
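If lines without a date should stay exactly as they are, as in the desired output of the question, a plain read loop is another option. This is a sketch that assumes GNU date and the five-field layout shown above; append_days and its optional second argument are illustrative names, and the dates are compared in UTC so that daylight-saving changes do not affect the day count.

```shell
#!/bin/bash
# append_days FILE [TODAY]   (hypothetical helper)
# Print FILE with the number of whole days between field 5 and TODAY appended
# to every line that has a date; lines with an empty field 5 are unchanged.
# TODAY is YYYY-MM-DD and defaults to the current date.
append_days() {
    local file=$1 today=${2:-$(date +%F)}
    local line d now_s then_s
    now_s=$(date -u -d "$today" +%s) || return 1
    while IFS= read -r line; do
        # split on | and keep only the 5th field
        IFS='|' read -r _ _ _ _ d _ <<< "$line"
        if [[ $d =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
            then_s=$(date -u -d "$d" +%s)
            line+=$(( (now_s - then_s) / 86400 ))
        fi
        printf '%s\n' "$line"
    done < "$file"
}
```

For example, append_days file 2022-01-13 > file.new reproduces the sample from the question, and omitting the second argument uses today's date.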

Make cat command to operate recursively looping through a directory

I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.
To begin, I have written the following code to separate the relevant data into easy chunks:
#!/bin/bash
destination='media/user/directory/'
for file1 in `ls $destination*.ascii`
do
echo $file1
file2="${file1}.end"
file3="${file1}.snip"
sed -e '16,$d' $file1 > $file2
sed -e '1,15d' $file1 > $file3
done
This worked perfectly, so the next step is the world's simplest cat command:
cat $file3 $file2 > outfile
However, what I need to do is to stitch file2 to the previous file3. Look at this screenshot of the directory for better understanding.
See how these files are all sequential over time:
*_20090412T235945_20090413T235944_* ### April 13
*_20090413T235945_20090414T235944_* ### April 14
So I need to take the 15 lines snipped off the April 14 example above and paste it to the end of the April 13 example.
This doesn't have to be part of the original code, in fact it would be probably best if it weren't. I was just hoping someone would be able to help me get this going.
Thanks in advance! If there is anything I have been unclear about and needs further explanation please let me know.

"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."
If I understand what you want correctly, it can be done with one line of code:
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.
Example
To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:
$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15
Here is the result of running our command:
$ awk 'NR==1 || FNR==3{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15
As you can see, the first two lines of each file have been transferred to the preceding file.
How it works
awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.
NR==1 || FNR==16{f=FILENAME ".new"}
When we are reading the first line of the first file, NR==1, or when we are reading the 16th line of whatever file we are on, FNR==16, we update f to be the name of the current file with .new added to the end.
For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.
print>f
This prints the current line to file f.
(If this was a shell script, we would use >>. This is not a shell script. This is awk.)
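The difference from the shell can be checked with a small experiment. In awk, print > f truncates f only when the file is first opened during the run; later prints to the same name append to it. A sketch, with demo_awk_redirect as an illustrative name:

```shell
#!/bin/bash
# demo_awk_redirect (hypothetical helper)
# Show that awk's "print > f" empties the file once, then keeps appending.
demo_awk_redirect() {
    local out
    out=$(mktemp) || return 1
    printf 'old contents\n' > "$out"                      # pre-existing data
    printf 'a\nb\nc\n' | awk -v f="$out" '{ print > f }'  # three prints to one name
    cat "$out"                                            # old data gone, all three lines kept
    rm -f "$out"
}
```

Running demo_awk_redirect prints the three lines a, b and c, which is why the one-liner does not need >>.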
Using a glob to specify the file names
destination='media/user/directory/'
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii

Your task is not that difficult. You want to gather a list of all _end files in the directory (using a for loop and globbing, NOT looping over the output of ls). Once you have the end files, parse the dates using parameter expansion with substring removal, say into d1 and d2 for date1 and date2 in:
stuff_20090413T235945_20090414T235944_end
      |  d1  |        |  d2  |
then you simply subtract 1 from d1 into say date0 or d0 and then construct a previous filename out of d0 and d1 using _snip instead of _end. Then just test for the existence of the previous _snip filename, and if it exists, paste your info from the current _end file to the previous _snip file. e.g.
#!/bin/bash
for i in *end; do ## find all _end files
d1="${i#*stuff_}" ## isolate first date in filename
d1="${d1%%T*}"
d2="${i%T*}" ## isolate second date
d2="${d2##*_}"
d0=$(date -d "$d1 1 day ago" +%Y%m%d) ## previous day via GNU date (plain d1 - 1 breaks on the 1st of a month)
prev="${i/$d1/$d0}" ## create previous 'snip' filename
prev="${prev/$d2/$d1}"
prev="${prev%end}snip"
if [ -f "$prev" ] ## test that prev snip file exists
then
printf "paste to : %s\n" "$prev"
printf " from : %s\n\n" "$i"
fi
done
Test Input Files
$ ls -1
stuff_20090413T235945_20090414T235944_end
stuff_20090413T235945_20090414T235944_snip
stuff_20090414T235945_20090415T235944_end
stuff_20090414T235945_20090415T235944_snip
stuff_20090415T235945_20090416T235944_end
stuff_20090415T235945_20090416T235944_snip
stuff_20090416T235945_20090417T235944_end
stuff_20090416T235945_20090417T235944_snip
stuff_20090417T235945_20090418T235944_end
stuff_20090417T235945_20090418T235944_snip
stuff_20090418T235945_20090419T235944_end
stuff_20090418T235945_20090419T235944_snip
Example Use/Output
$ bash endsnip.sh
paste to : stuff_20090413T235945_20090414T235944_snip
from : stuff_20090414T235945_20090415T235944_end
paste to : stuff_20090414T235945_20090415T235944_snip
from : stuff_20090415T235945_20090416T235944_end
paste to : stuff_20090415T235945_20090416T235944_snip
from : stuff_20090416T235945_20090417T235944_end
paste to : stuff_20090416T235945_20090417T235944_snip
from : stuff_20090417T235945_20090418T235944_end
paste to : stuff_20090417T235945_20090418T235944_snip
from : stuff_20090418T235945_20090419T235944_end
(of course replace stuff_ with your actual prefix)
Let me know if you have questions.
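The parameter expansions used in the script can be tried in isolation. This is a sketch only: file_dates is an illustrative name, and it assumes file names shaped like prefix_YYYYMMDDThhmmss_YYYYMMDDThhmmss_suffix, where the prefix contains no underscore and the suffix no capital T.

```shell
#!/bin/bash
# file_dates NAME (hypothetical helper)
# Print the two YYYYMMDD dates embedded in NAME, separated by a space.
file_dates() {
    local name=$1 d1 d2
    d1=${name#*_}     # drop everything up to the first underscore
    d1=${d1%%T*}      # drop from the first T onward  -> first date
    d2=${name%T*}     # drop from the last T onward
    d2=${d2##*_}      # drop everything up to the last underscore -> second date
    printf '%s %s\n' "$d1" "$d2"
}
```

Calling file_dates stuff_20090413T235945_20090414T235944_end prints 20090413 20090414.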

You could store the previous $file3 value in a variable and skip the first pass, when it is still empty, with a -n check:
#!/bin/bash
destination='media/user/directory/'
prev=""
for file1 in "$destination"*.ascii
do
    echo "$file1"
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' "$file1" > "$file2"   # first 15 lines
    sed -e '1,15d' "$file1" > "$file3"   # everything after line 15
    if [ -n "$prev" ]; then              # not the first file
        # outfile is overwritten on each pass; use a per-file name to keep every result
        cat "$prev" "$file2" > outfile
    fi
    prev=$file3
done

Bash script: using variables / parameter in sed

I am trying to write a little bash script, where you can specify a number of minutes and it will show the lines of a log file from those last X minutes.
To get the lines, I am using sed
sed -n '/time/,/time/p' LOGFILE
On CLI this works perfectly, in my script however, it does not.
# Get date
now=$(date "+%Y-%m-%d %T")
# Get date minus X number of minutes -- $1 first argument, minutes
then=$(date -d "-$1 minutes" +"%Y-%m-%d %T")
# Filter logs -- $2 second argument, filename
sed -n '/'$then'/,/'$now'/p' $2
I have tried different approaches and none of them seem to work:
result=$(sed -n '/"$then"/,/"$now"/p' $2)
sed -n "/'$then'/,/'$now'/p" "$2"
sed -n "/$then/,/$now/p" $2
sed -n "/$then/,/$now/p" "$2"
Any suggestions?
I am on Debian 5, echo $SHELL says /bin/sh
EDIT : The script produces no output, so there is no error showing up.
In the logfile every entry starts with a date like this 2013-05-15 14:21:42,794

I assume that the main problem is that you try to perform an arithmetic comparison by string matching. sed -n '/23/,/27/p' gives you the lines between the first line that contains 23 and the next line that contains 27 (and then again from the next line that contains 23 to the next line that contains 27, and so on). It does not give you all lines that contain a number between 23 and 27. If the input looks like
19
22
24
26
27
30
it does not output anything (since there is no 23). An awk solution that uses string matching has the same problem. So, unless your then date string occurs verbatim in the log file, your method will fail. You have to convert your date strings into numbers (drop the -, <space>, and :) and then check whether the resulting number is in the right range, using an arithmetical comparison rather than a string match. This goes beyond the capabilities of sed; awk and perl can do it rather easily. Here is a perl solution:
#!/bin/bash
NOW=$(date "+%Y%m%d%H%M%S")
THEN=$(date -d "-$1 minutes" "+%Y%m%d%H%M%S")
perl -wne '
if (m/^(....)-(..)-(..) (..):(..):(..)/) {
$date = "$1$2$3$4$5$6";
if ($date >= '"$THEN"' && $date <= '"$NOW"') {
print;
}
}' "$2"

Don't give yourself a headache with nested quotes. Use the -v option with awk to pass the value of a shell variable into the script:
#!/bin/bash
# Get date
now=$(date "+%Y-%m-%d %T")
# Get date minus X number of minutes -- $1 first argument, minutes
delta=$(date -d "-$1 minutes" +"%Y-%m-%d %T")
# Filter logs -- $2 second argument, filename
awk -v n="$now" -v d="$delta" '$0~d,$0~n' "$2"
Also, avoid naming variables after shell keywords such as then.
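Because timestamps of the form YYYY-MM-DD HH:MM:SS sort as text in the same order as in time, a plain string comparison on the first 19 characters is enough, and neither boundary has to occur verbatim in the log. A sketch, assuming every entry starts with such a timestamp; filter_log and its argument order are illustrative:

```shell
#!/bin/bash
# filter_log FROM TO FILE   (hypothetical helper)
# Print the lines of FILE whose leading timestamp lies between FROM and TO,
# both given as "YYYY-MM-DD HH:MM:SS" and both inclusive.
filter_log() {
    awk -v from="$1" -v to="$2" '
        { ts = substr($0, 1, 19) }   # e.g. 2013-05-15 14:21:42
        ts >= from && ts <= to       # string comparison; matching lines are printed
    ' "$3"
}
```

In the script from the question it could be called as filter_log "$(date -d "-$1 minutes" '+%Y-%m-%d %T')" "$(date '+%Y-%m-%d %T')" "$2". Continuation lines that do not begin with a timestamp are not handled by this sketch.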

UNIX shell-scripting: Split a textfile by its entries

I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:
20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090118025901 -2.400000 78.100000 1023.200000 0.000000
I don't even know how many lines there are, but I'm trying to split the file by date. The left number is a timestamp (these lines, for example, are from January 18th, 2009).
How can I split this file into pieces according to the date?
The number of entries per date differs, so using split with a constant number won't work.
All I can think of is grep '^20090118' file > data20090118.dat, but surely there is a way to do all the dates at once, right?
Thanks in advance,
Alex

Using awk:
awk '{print > ("data" substr($1,1,8) ".dat")}' myfile
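With many distinct dates, some awk implementations may run out of open files. If the input is sorted by timestamp, as the sample suggests, only one output file needs to be open at a time. A sketch under that assumption; split_by_day and its output-directory argument are illustrative:

```shell
#!/bin/bash
# split_by_day INPUT OUTDIR   (hypothetical helper)
# Write each line of INPUT to OUTDIR/dataYYYYMMDD.dat, chosen by the first
# eight digits of the first field. Assumes INPUT is sorted by timestamp.
split_by_day() {
    awk -v dir="$2" '
    {
        f = dir "/data" substr($1, 1, 8) ".dat"   # first 8 digits are YYYYMMDD
        if (f != prev) {                          # day changed: close the old file
            if (prev != "") close(prev)
            prev = f
        }
        print >> f                                # append, so reruns add to existing files
    }' "$1"
}
```

For example, split_by_day myfile . writes the per-day files into the current directory.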

This should work if the items are in date sequence:
while IFS= read -r line
do
    date=${line:0:8} # first 8 characters are the YYYYMMDD date
    echo "$line" >> "$date.dat"
done < log.dat

With the caveats that each day needs to have more than 1 record,
and that the output file will have blank lines:
uniq --all-repeated=separate -w8 file | csplit -s - '/^$/' '{*}'
uniq really should have an option to output unique records as well.
Also csplit should have an option to suppress the matched line.

Filtering Filenames with bash

I have a directory full of log files in the form
${name}.log.${year}${month}${day}
such that they look like this:
logs/
production.log.20100314
production.log.20100321
production.log.20100328
production.log.20100403
production.log.20100410
...
production.log.20100314
production.log.old
I'd like to use a bash script to filter out all the logs older than X months and dump them into *.log.old
X=6 #months
LIST=*.log.*;
for file in LIST; do
is_older = file_is_older_than_months( ${file}, ${X} );
if is_older; then
cat ${c} >> production.log.old;
rm ${c};
fi
done;
How can I get all the files older than X months? And how can I keep the *.log.old file from being included in LIST?

The following script expects GNU date to be installed. You can call it in the directory with your log files with the first parameter as the number of months.
#!/bin/sh
min_date=$(date -d "$1 months ago" "+%Y%m%d")
for log in *.log.*;do
[ "${log%.log.old}" "!=" "$log" ] && continue
[ "${log%.*}.$min_date" "<" "$log" ] && continue
cat "$log" >> "${log%.*}.old"
rm "$log"
done

Presumably as a log file, it won't have been modified since it was created?
Have you considered something like this...
find ./ -name "*.log.*" -mtime +60 -exec rm {} \;
to delete files that have not been modified for 60 days. If the files have been modified more recently then this is no good of course.

You'll have to compare the logfile date with the current date. Start with the year, multiply by 12 to get the difference in months. Do the same with months, and add them together. This gives you the age of the file in months (according to the file name).
For each filename, you can use an AWK filter to extract the year:
awk -F. '{ print substr($3,1,4) }'
You also need the current year:
date "+%Y"
To calculate the difference:
$(( current_year - file_year ))
Similarly for months.
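The arithmetic described above can also be done with bash parameter expansion alone. A sketch, assuming file names that end in .YYYYMMDD; age_in_months and its optional second argument are illustrative names, and the 10# prefix keeps months such as 08 from being read as octal:

```shell
#!/bin/bash
# age_in_months FILENAME [NOW]   (hypothetical helper)
# Print the age of FILENAME in calendar months, judged by the YYYYMMDD stamp
# after its last dot. NOW is YYYYMM and defaults to the current month.
# Returns non-zero for names without such a stamp (e.g. production.log.old).
age_in_months() {
    local stamp=${1##*.}             # text after the last dot, e.g. 20100314
    local now=${2:-$(date +%Y%m)}    # optional override, handy for testing
    [[ $stamp =~ ^[0-9]{8}$ ]] || return 1
    local fy=${stamp:0:4} fm=${stamp:4:2}
    local cy=${now:0:4} cm=${now:4:2}
    # year difference in months plus month difference
    echo $(( (10#$cy - 10#$fy) * 12 + 10#$cm - 10#$fm ))
}
```

A loop over *.log.* could then skip files for which the function fails and archive those whose result exceeds the chosen limit.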

Assuming the logs may have been modified since they were created, and that the timestamp in the filename is the more accurate one, here's a gawk script.
#!/bin/bash
awk 'BEGIN{
months=6
current=systime() #get current time in sec
sec=months*30*86400 #months in sec
output="old.production" #output file
}
FNR==1{ #run once per file, not once per line
m=split(FILENAME,fn,".")
yr=substr(fn[m],1,4)
mth=substr(fn[m],5,2)
day=substr(fn[m],7,2)
t=mktime(yr" "mth" "day" 00 00 00")
if ( (current-t) > sec){
print "file: "FILENAME" more than "months" month"
while( (getline line < FILENAME )>0 ){
print line > output
}
close(FILENAME)
cmd="rm \047"FILENAME"\047"
print cmd
#system(cmd) #uncomment to use
}
}' production*
