Replacing a parameter in a text file based on the latest filename in a directory - bash
I have the following text file (namelist.txt)
&share
wrf_core = 'ARW',
max_dom = 3,
start_date ='YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00',
end_date ='YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00',
interval_seconds = 21600,
io_form_geogrid = 2,
debug_level=0,
/
I want to replace the YYYY, MM, DD, and HH based on the latest filename of a directory.
For example:
An INPUT folder contains the following subdirectories:
2021021000
2021021006
2021021012
2021021018
2021021100
The latest directory from the above is 2021021100
I'm stuck here. The script should read the name of the latest subdirectory inside the INPUT folder and then do the following:
year=$(echo $line | cut -c1-4)
echo $year
month=$(echo $line | cut -c5-6)
echo $month
day=$(echo $line | cut -c7-8)
echo $day
hour=$(echo $line | cut -c9-10)
echo $hour
sed -i "s/YYYY/$year/g" namelist.txt
sed -i "s/MM/$month/g" namelist.txt
sed -i "s/DD/$day/g" namelist.txt
sed -i "s/HH/$hour/g" namelist.txt
The desired output should be like this:
&share
wrf_core = 'ARW',
max_dom = 3,
start_date ='2021-02-11_00:00:00','2021-02-11_00:00:00','2021-02-11_00:00:00',
end_date ='2021-02-11_00:00:00','2021-02-11_00:00:00','2021-02-11_00:00:00',
interval_seconds = 21600,
io_form_geogrid = 2,
debug_level=0,
/
How can I do this in bash?
I'll appreciate any help on this.
Get the directory with the latest date
Bash's globs (* and so on) expand in a sorted order. If the subdirectories in your current working directory are only named in the style YYYYMMDDHH then */ will expand to a list of dates where the last date is at the end of the list. To retrieve only the last entry from that list you can use either an array, a function (using shift), or a command (for instance printf | tail). Here we go with the array:
#! /bin/bash
cd INPUT
dirs=(*/)
last="${dirs[-1]}"
cd -
If there are also other directories you can change the glob so that only directories of the format YYYYMMDDHH are accepted:
dirs=([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9]/)
Replacing the placeholders
You don't need four cuts and four seds. A single sed using bash's substring expansion works as well:
sed -i "s/YYYY-MM-DD_HH/${last:0:4}-${last:4:2}-${last:6:2}_${last:8:2}/g" yourFile
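Putting the two steps together, a minimal end-to-end sketch might look like this. The INPUT layout and the one-line namelist.txt are throwaway stand-ins created so the script is self-contained; GNU sed is assumed for `-i`, and `${dirs[-1]}` needs bash 4.3+.

```shell
#!/bin/bash
set -euo pipefail

# Build a disposable INPUT layout so the sketch runs on its own.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p INPUT/2021021000 INPUT/2021021006 INPUT/2021021100
printf "start_date ='YYYY-MM-DD_HH:00:00',\n" > namelist.txt

# Globs expand in sorted order, so the last array entry is the latest date.
dirs=(INPUT/[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9]/)
last=$(basename "${dirs[-1]}")        # -> 2021021100

# One substitution covers YYYY, MM, DD and HH at once.
sed -i "s/YYYY-MM-DD_HH/${last:0:4}-${last:4:2}-${last:6:2}_${last:8:2}/g" namelist.txt
cat namelist.txt                      # start_date ='2021-02-11_00:00:00',
```

The same `${last:offset:length}` slices would slot into the four-sed version from the question if you prefer to keep the placeholders separate.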
GNU Awk is a possibility for this:
awk -v dat="2021021100" 'BEGIN { yr=substr(dat,1,4);mn=substr(dat,5,2);day=substr(dat,7,2);hr=substr(dat,9,2)} /YYYY-MM-DD_HH/ { gsub("YYYY-MM-DD_HH",yr"-"mn"-"day"_"hr,$0) }1' namelist.txt > namelist.tmp && mv -f namelist.tmp namelist.txt
Explanation:
awk -v dat="2021021100" # Pass the date as a variable dat to awk
'BEGIN {
yr=substr(dat,1,4); # Before processing the file, use substr to extract the time elements from dat
mn=substr(dat,5,2);
day=substr(dat,7,2);
hr=substr(dat,9,2)}
/YYYY-MM-DD_HH/ {
gsub("YYYY-MM-DD_HH",yr"-"mn"-"day"_"hr,$0) # When we find YYYY-MM-DD_HH in the line, use gsub to substitute the yr, mn, day and hr
}1' namelist.txt # Print all lines, amended or otherwise
If you have a more recent version of GNU awk (4.1+), you can use the -i inplace flag for in-place changes, as opposed to using a tmp file and actioning mv.
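For illustration, here is a hedged sketch of that in-place variant; it is guarded because gawk may not be installed, and the namelist.txt here is a one-line stand-in for the file from the question.

```shell
#!/bin/bash
# Stand-in file so the sketch is self-contained.
tmp=$(mktemp -d)
cd "$tmp"
printf "start_date ='YYYY-MM-DD_HH:00:00',\n" > namelist.txt

# -i inplace (GNU awk 4.1+) edits the file without a manual tmp/mv step.
if command -v gawk >/dev/null 2>&1; then
    gawk -i inplace -v dat="2021021100" '{
        gsub("YYYY-MM-DD_HH",
             substr(dat,1,4)"-"substr(dat,5,2)"-"substr(dat,7,2)"_"substr(dat,9,2))
    } 1' namelist.txt
fi
cat namelist.txt
```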
Related
How to send shell script output in a tabular form and send the mail
I have a shell script which will give a few lines as output. Below is the output I am getting from the shell script. The flow is: first it will check whether we have that file; if I have it, it should give me the file name and modified date. If I don't have it, it should give me the file name and "Not Found", in a tabular form, and send an email. Also it should add a header to the output.

CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11

Desired:

File Name Modified Date
CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11

UPDATE: sample of script

#!/bin/bash
if [ -e /saddwsgnas/radsfftor/coffe/COE_daily_File.xlsx ]; then
    cd /sasgnas/radstor/coe/
    ls -la COE_daily_File.xlsx | awk '{print $9, $6"_"$7}'
else
    echo "CMC_COE_daily_File.xlsx Not_Found"
fi

Output:

CMC_COE_daily_File.xlsx Jun_11
I thought I might offer you some options with a slightly modified script. I use the stat command to obtain the file modification time in a more expansive format, as well as specifying an arbitrary, pre-defined spacer character to divide the column data. That way, you can focus on displaying the content in its original, untampered form. This also allows the formatted reporting of filenames which contain spaces without affecting the logic for formatting/aligning columns. The column command is told about that spacer character and it will adjust the width of columns to the widest content in each column. (I only wish that it also allowed you to specify a column divider character to be printed, but that is not part of its features/functions.) I also added the extra AWK action, on the chance that you might be interested in making the results stand out more.

#!/bin/sh
#QUESTION: https://stackoverflow.com/questions/74571967/how-to-send-shell-script-output-in-a-tablular-form-and-send-the-mail

SPACER="|"
SOURCE_DIR="/saddwsgnas/radsfftor/coe"
SOURCE_DIR="."

{
printf "File Name${SPACER}Modified Date\n"
#for file in COE_daily_File.xlsx
for file in test_55.sh awkReportXmlTagMissingPropertyFieldAssignment.sh test_54.sh
do
    if [ -e "${SOURCE_DIR}/${file}" ]; then
        cd "${SOURCE_DIR}"
        #ls -la "${file}" | awk '{print $9, $6"_"$7}'
        echo "${file}${SPACER}"$(stat --format "%y" "${file}" | cut -f1 -d\. | awk '{ print $1, $2 }')
    else
        echo "${file}${SPACER}Not Found"
    fi
done
} | column -x -t -s "|" | awk '{
    ### Refer to:
    #   https://man7.org/linux/man-pages/man4/console_codes.4.html
    #   https://www.ecma-international.org/publications-and-standards/standards/ecma-48/
    if( NR == 1 ){
        printf("\033[93;3m%s\033[0m\n", $0) ;
    }else{
        print $0 ;
    } ;
}'

Without that last awk command, the output session for that script was as follows:

ericthered@OasisMega1:/0__WORK$ ./test_55.sh
File Name                                         Modified Date
test_55.sh                                        2022-11-27 14:07:15
awkReportXmlTagMissingPropertyFieldAssignment.sh  2022-11-05 21:28:00
test_54.sh                                        2022-11-27 00:11:34
ericthered@OasisMega1:/0__WORK$

With that last awk command, you get the same table but with the header line highlighted by the escape codes.
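The key trick above is the spacer character: column -t aligns each column to its widest entry, and -s names the input delimiter. A minimal, standalone illustration with made-up data (no stat involved):

```shell
# "|" is the agreed spacer; -t builds the table, -s '|' says what to split on.
printf 'File Name|Modified Date\nCMC_daily_File.xlsx|Not Found\nCareOneHMA.xlsx|Jun 11\n' \
    | column -t -s '|'
```

Because the spacer is a single unambiguous character, values containing spaces (like "Not Found" or "Jun 11") stay in one column.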
Using SED to substitute regex match with variable value
I have a file with the following lines:

2022-Nov-23
2021-Jul-14

I want to replace the month with its number. My script should accept the date as an argument, and I added these variables to it:

Jan=01
Feb=02
Mar=03
Apr=04
May=05
Jun=06
Jul=07
Aug=08
Sep=09
Oct=10
Nov=11
Dec=12

How can I match the month name in the string with regex and substitute it based on the variables? Here is what I have for now:

echo "$1" | sed 's/(\w{3})/${\1}/'

But it doesn't work.
With a file called months containing:

Jan=01
Feb=02
Mar=03
Apr=04
May=05
Jun=06
Jul=07
Aug=08
Sep=09
Oct=10
Nov=11
Dec=12

And a script:

#!/bin/sh
sub() (
    set -a
    . "${0%/*}/months"
    awk -F- -vOFS=- '{ $2 = ENVIRON[$2]; print }'
)

printf 2022-Nov-23 | sub
printf 2021-Jul-14 | sub

The output is:

2022-11-23
2021-07-14
You might convert your data into a sed script, that is, create say a file mon2num.sed with the following content:

s/Jan/01/
s/Feb/02/
s/Mar/03/
s/Apr/04/
s/May/05/
s/Jun/06/
s/Jul/07/
s/Aug/08/
s/Sep/09/
s/Oct/10/
s/Nov/11/
s/Dec/12/

and, having file.txt with content as follows:

2022-Nov-23
2021-Jul-14

you might do

sed -f mon2num.sed file.txt

which gives output

2022-11-23
2021-07-14
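If you'd rather keep the mapping inside one script, the twelve Var=Num assignments map naturally onto a bash associative array (bash 4+). This is an alternative sketch, not what the answers above use; convert() is a hypothetical helper name.

```shell
#!/bin/bash
# Month-name lookup table.
declare -A mon=([Jan]=01 [Feb]=02 [Mar]=03 [Apr]=04 [May]=05 [Jun]=06
                [Jul]=07 [Aug]=08 [Sep]=09 [Oct]=10 [Nov]=11 [Dec]=12)

# Rewrite one YYYY-Mon-DD date by splitting on "-" and looking up the month.
convert() {
    local y m d
    IFS=- read -r y m d <<< "$1"
    printf '%s-%s-%s\n' "$y" "${mon[$m]}" "$d"
}

convert 2022-Nov-23    # 2022-11-23
convert 2021-Jul-14    # 2021-07-14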
BASH - Replace text in 'ini' file using variables
I'm expecting this to be an easy one for someone (alas not me!). Using a bash script, I want to replace a value in a config file '/etc/app/app.cfg' (ini style), using variables for both search and replace. The value name and value I wish to update (note the space either side of the equals):

LOGDIR = /etc/app/logs

I have defined the following in the bash script:

# Get existing LogDir value
CURRENT_LOGDIR=$(grep 'LogDir =' /pathtofile | sed 's/LogDir *= *//g')
# Set New LogDir
LOGDIR=/mnt/eft/fs1/logs
# Update LogDir if different
if [[ -d $(echo $CURRENT_LOGDIR) != $LOGDIR ]] ; then
    # Update LogDir value: **bash command - I need help with !**
fi

I have tried many combinations with sed, to no avail, hence asking this question. Things I've tried:

echo "LogDir = $LOGDIR" | sed '#s/$CURRENT_DIR/$LOGDIR/#g' /etc/app/app.cfg
sed -i '/#/!s/\(LogDir[[:space:]]*=[[:space:]]*\)\(.*\)/\1$LOGDIR#/g' /etc/app/app.cfg
sed -i 's/LogDir[[:space:]]=.*/LogDir = {LOGDIR}/' /etc/app/app.cfg
sed -i "s/^LogDir[[:space:]]*=.*/LogDir=$LOGDIR/}" /etc/app/app.cfg
sed -i '/#/!s/\(LogDir[[:space:]]*=[[:space:]]*\)\(.*\)/\1"$LOGDIR"/' /etc/app/app.cfg

Desired output: update the LogDir value in /etc/app/app.cfg, for example:

LogDir = /mnt/eft/fs1/logs
What is that } doing on the end? Looks like a typo.

sed "s/^LogDir[[:space:]]*=.*/LogDir=$LOGDIR/" /etc/app/app.cfg

And sed can edit the file in place with -i.
Using sed:

$ LOGDIR=/mnt/eft/fs1/logs
$ sed -i.bak "s|\([^=]*=[[:space:]]\).*|\1$LOGDIR|" /etc/app/app.cfg
$ cat /etc/app/app.cfg
LOGDIR = /mnt/eft/fs1/logs
How about this:

awk '$1 == "LogDir" {print "LogDir = /mnt/eft/fs1/logs"; next} {print}' old_configuration_file > new_configuration_file

The first awk clause replaces the old LogDir entry with the new one, and the second clause passes the other lines through unchanged.
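A variation on the same awk idea that takes the new path from a shell variable instead of hard-coding it: assigning to $3 rebuilds the line with single-space separators, so the "LogDir = value" shape survives. The local app.cfg here is a stand-in for /etc/app/app.cfg.

```shell
#!/bin/bash
# Stand-in config so the sketch is self-contained.
tmp=$(mktemp -d)
cd "$tmp"
printf 'LogDir = /etc/app/logs\nLogLevel = 3\n' > app.cfg

LOGDIR=/mnt/eft/fs1/logs

# Only the LogDir line has a field reassigned; other lines print untouched.
awk -v dir="$LOGDIR" '$1 == "LogDir" { $3 = dir } { print }' app.cfg > app.cfg.new
cat app.cfg.new
```

Passing the value with -v avoids the quoting headaches the question ran into when embedding $LOGDIR inside single-quoted sed programs.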
How to process tr across all files in a directory and output to a different name in another directory?
mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"

which outputs:

#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt

is what I have so far. What is missing is the output; I get none. What I really want is to get a list of txt files, use their name up to the extension, process out the "|" and replace it with a LF/CR, and put the new file in another directory as [old-name].ics. HALP. THX in advance. - Idiot me.
You can loop over the files and use sed to process each one:

for i in *.txt; do
    sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done

No need to use xargs, especially with echo, which would risk the filenames getting word-split and having globbing applied to them, so it could well do the wrong thing. We use sed's s command to substitute | with \n; the g flag makes it a global replace. We redirect that to the other directory you want, and use bash's parameter expansion to strip off the .txt from the end.
Here's an awk solution:

$ awk '
FNR==1 {                          # for first record of every file
    close(f)                      # close previous file f
    f="path_to_dir/" FILENAME     # new filename with path
    sub(/txt$/,"ics",f)           # replace txt with ics
}
{
    gsub(/\|/,"\n")               # replace | with \n
    print > f                     # print to new file
}' *.txt
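Since the question reached for tr first, the same loop structure works with tr itself; tr reads stdin, so each file is fed by redirection. The other_directory name and the sample file below are made up for this sketch.

```shell
#!/bin/bash
# Disposable working area with one sample "|"-delimited txt file.
tmp=$(mktemp -d)
cd "$tmp"
mkdir other_directory
printf 'event1|event2|event3' > sample.txt

# tr swaps every "|" for a newline; ${i%.txt} strips the extension.
for i in *.txt; do
    tr '|' '\n' < "$i" > other_directory/"${i%.txt}".ics
done

cat other_directory/sample.ics
```

Note the replacement character is a real newline '\n', not the "/n" typo from the question's attempt.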
How to convert HHMMSS to HH:MM:SS Unix?
I tried to convert the HHMMSS to HH:MM:SS and I am able to convert it successfully, but my script takes 2 hours to complete because of the file size. Is there any better way (fastest way) to complete this task?

Data file, data.txt:

10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,072600,072600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073200,073200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073500,073500,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,073700,073700,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,073900,073900,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,074400,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,090200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,090900,090900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,091500,091500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,091900,091900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092500,092500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092900,092900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,093200,093200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,093500,093500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,094500,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,170100,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,170400,170400,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,170700,170700,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171000,171000,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171500,171500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,171900,171900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172500,172500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172900,172900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,173500,173500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,174100,,

My code, script.sh:

#!/bin/bash
awk -F"," '{print $5}' Data.txt > tmp.txt    # print field 5 of every line to tmp.txt, i.e. all numbers go into tmp.txt
sort tmp.txt | uniq -d > Uniqe_number.txt    # duplicated numbers are stored in Uniqe_number.txt
rm tmp.txt                                   # removes tmp file
while read line; do
    echo $line
    cat Data.txt | grep ",$line," > Numbers/All/$line.txt    # grep the number and create files individually
    awk -F"," '{print $5","$4","$7","$8","$9","$10","$11}' Numbers/All/$line.txt > Numbers/All/tmp_$line.txt
    mv Numbers/All/tmp_$line.txt Numbers/Final/Final_$line.txt
done < Uniqe_number.txt
ls Numbers/Final > files.txt
dos2unix files.txt
bash time_replace.sh

When you execute the above script it will call the time_replace.sh script.

My code for time_replace.sh:

#!/bin/bash
for i in `cat files.txt`
do
    while read aline
    do
        TimeDep=`echo $aline | awk -F"," '{print $6}'`
        #echo $TimeDep
        finalTimeDep=`echo $TimeDep | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
        #echo $finalTimeDep
        ##########
        TimeAri=`echo $aline | awk -F"," '{print $7}'`
        #echo $TimeAri
        finalTimeAri=`echo $TimeAri | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
        #echo $finalTimeAri
        sed -i 's/',$TimeDep'/',$finalTimeDep'/g' Numbers/Final/$i
        sed -i 's/',$TimeAri'/',$finalTimeAri'/g' Numbers/Final/$i
        ############################
    done < Numbers/Final/$i
done

Any better solution? Appreciate any help. Thanks, Sri
If there's a large quantity of files, then the pipelines are probably what will impact performance more than anything else. Although processes can be cheap, if you're doing a huge amount of processing then cutting down the number of times you pass data through a pipeline can reap dividends. So you're probably going to be better off writing the entire script in awk (or perl). For example, awk can send output to an arbitrary file, so the while loop in your first script could be replaced with an awk script that does this. You also don't need to use a temporary file.

I assume the sorting is just for tracking progress easily, as you know how many numbers there are. But if you don't care about the sorting, you can simply do this:

#!/bin/sh
awk -F ',' '{
    print $5","$4","$7","$8","$9","$10","$11 > ("Numbers/Final/Final_" $5 ".txt")
}' datafile.txt
ls Numbers/Final > files.txt

Alternatively, if you need to sort, you can do sort -t, -k5,4,10 (or whichever fields your sort keys actually need to be).

As for formatting the datetime, awk also does functions, so you could actually have an awk script that looks like this. This would replace both of your scripts above whilst retaining the same functionality (at least, as far as I can make out with a quick analysis). (Note: untested, so it may contain vague syntax errors.)

#!/usr/bin/awk -f
BEGIN { FS="," }
function formattime (t) {
    return substr(t,1,2)":"substr(t,3,2)":"substr(t,5,2)
}
{
    print $5","$4","$7","$8","$9","formattime($10)","formattime($11) > ("Numbers/Final/Final_" $5 ".txt")
}

which you can save, chmod 700, and call directly as:

dostuff.awk filename

Other awk options include changing fields in situ, so if you want to maintain the entire original file but with formatted datetimes, you can do a modification of the above. Change the print block to:

{
    $10=formattime($10)
    $11=formattime($11)
    print $0
}

If this doesn't do everything you need it to, hopefully it gives some ideas that will help the code.
It's not clear what all your sorting and uniq-ing is for. I'm assuming your data file has only one entry per line, and you need to change the 10th and 11th comma-separated fields from HHMMSS to HH:MM:SS.

while IFS=, read -a line ; do
    echo -n ${line[0]},${line[1]},${line[2]},${line[3]},
    echo -n ${line[4]},${line[5]},${line[6]},${line[7]},
    echo -n ${line[8]},${line[9]},
    if [ -n "${line[10]}" ]; then
        echo -n ${line[10]:0:2}:${line[10]:2:2}:${line[10]:4:2}
    fi
    echo -n ,
    if [ -n "${line[11]}" ]; then
        echo -n ${line[11]:0:2}:${line[11]:2:2}:${line[11]:4:2}
    fi
    echo ""
done < data.txt

The operative part is the ${variable:offset:length} construct that lets you extract substrings out of a variable.
In Perl, that's close to child's play:

#!/usr/bin/env perl
use strict;
use warnings;
use English( -no_match_vars );

local($OFS) = ",";
while (<>)
{
    my(@F) = split /,/;
    $F[9] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[9];
    $F[10] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[10];
    print @F;
}

If you don't want to use English, you can write local($,) = ","; instead; it controls the output field separator, choosing to use comma. The code reads each line in the file, splits it up on the commas, takes the last two fields, counting from zero, and (if they're not empty) inserts colons in between the pairs of digits.

I'm sure a 'Code Golf' solution would be made a lot shorter, but this is semi-legible if you know any Perl.

This will be quicker by far than the script, not least because it doesn't have to sort anything, but also because all the processing is done in a single process in a single pass through the file. Running multiple processes per line of input, as in your code, is a performance disaster when the files are big.
The output on the sample data you gave is:

10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,07:16:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,07:26:00,07:26:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:32:00,07:32:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:35:00,07:35:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,07:37:00,07:37:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,07:39:00,07:39:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:44:00,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,09:02:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:09:00,09:09:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:15:00,09:15:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,09:19:00,09:19:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:25:00,09:25:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:29:00,09:29:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,09:32:00,09:32:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,09:35:00,09:35:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:45:00,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,17:01:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,17:04:00,17:04:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,17:07:00,17:07:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:10:00,17:10:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:15:00,17:15:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,17:19:00,17:19:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:25:00,17:25:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:29:00,17:29:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:35:00,17:35:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:41:00,,