Using SED to substitute regex match with variable value - bash

I have a file with the following lines:
2022-Nov-23
2021-Jul-14
I want to replace the month with its number. My script should accept the date as an argument, and I added these variables to it:
Jan=01
Feb=02
Mar=03
Apr=04
May=05
Jun=06
Jul=07
Aug=08
Sep=09
Oct=10
Nov=11
Dec=12
How can I match the month name in the string with a regex and substitute it based on the variables? Here is what I have so far:
echo "$1" | sed 's/(\w{3})/${\1}/'
But it doesn't work.

With a file called months containing:
Jan=01
Feb=02
Mar=03
Apr=04
May=05
Jun=06
Jul=07
Aug=08
Sep=09
Oct=10
Nov=11
Dec=12
And a script:
#!/bin/sh
sub() (
set -a
. "${0%/*}/months"
awk -F- -vOFS=- '{ $2 = ENVIRON[$2]; print }'
)
printf 2022-Nov-23 | sub
printf 2021-Jul-14 | sub
The output is:
2022-11-23
2021-07-14
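For context: set -a marks every variable assigned while sourcing months for export, which is why awk can read them through ENVIRON. If you would rather keep the month variables inside the script, as in the question, a minimal pure-bash sketch using indirect expansion (assuming the fixed YYYY-MMM-DD layout shown) could look like this:
#!/bin/bash
# Assumes the month variables from the question are defined:
Jan=01 Feb=02 Mar=03 Apr=04 May=05 Jun=06
Jul=07 Aug=08 Sep=09 Oct=10 Nov=11 Dec=12
d=$1                          # e.g. 2022-Nov-23
mon=${d:5:3}                  # extract the month name (characters 6-8)
echo "${d:0:5}${!mon}${d:8}"  # ${!mon} expands to the value of, e.g., $Nov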

You might convert your data into a sed script; that is, create a file, say mon2num.sed, with the following content:
s/Jan/01/
s/Feb/02/
s/Mar/03/
s/Apr/04/
s/May/05/
s/Jun/06/
s/Jul/07/
s/Aug/08/
s/Sep/09/
s/Oct/10/
s/Nov/11/
s/Dec/12/
and, given file.txt with the following content:
2022-Nov-23
2021-Jul-14
you might do
sed -f mon2num.sed file.txt
which gives the output:
2022-11-23
2021-07-14
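If you already keep the Jan=01 ... Dec=12 pairs in a file (like the months file in the first answer), the sed script does not have to be written by hand. A small sketch to generate mon2num.sed, assuming that file is named months:
while IFS='=' read -r name num; do
    printf 's/%s/%s/\n' "$name" "$num"
done < months > mon2num.sed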

Related

How to get values in a line while looping line by line in a file (shell script)

I have a file which looks like this (file.txt)
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
I want to loop through this line by line and extract the key values.
So the result should be like:
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
So I wrote code like this to loop line by line and extract the value between {"key":" and ","rule": because the key value sits between these 2 patterns.
while read p; do
echo $p | sed -n "/{"key":"/,/","rule":,/p"
done < file.txt
But this is not working. Can someone help me figure this out? Thanks in advance.
Your sample input is almost valid json. You could tweak it to make it valid and then extract the values with jq with something like:
sed -e 's/squid/"squid/' -e 's/$/"}/' file.txt | jq -r .key
Or, if your actual input really is valid json, then just use jq:
jq -r .key file.txt
If the "random-txt" may include double quotes, making it difficult to massage the input to make it valid json, perhaps you want something like:
awk '{print $4}' FS='"' file.txt
or
sed -n '/{"key":"\([^"]*\).*/s//\1/p' file.txt
or
while IFS=\" read open_brace key colon val _; do echo "$val"; done < file.txt
For the shown data, you can try this awk:
awk -F '"[:,]"' '{print $2}' file
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
With the given example you can simply use cut, splitting on the double quote (the key is then the fourth field):
cut -d'"' -f4 file.txt
Assumptions:
there may be other lines in the file so we need to focus on just the lines with "key" and "rule"
the only text between "key" and "rule" is the desired string (eg, squid never shows up between the two patterns of interest)
Adding some additional lines:
$ cat file.txt
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
ignore this line}
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
ignore this line}
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
One sed idea:
$ sed -nE 's/^(.*"key":")([^"]*)(","rule".*)$/\2/p' file.txt
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
Where:
-E - enable extended regex support (and capture groups without needing to escape the parentheses)
-n - suppress printing of pattern space
^(.*"key":") - [1st capture group] everything from start of line up to and including "key":"
([^"]*) - [2nd capture group] everything that is not a double quote (")
(","rule".*)$ - [3rd capture group] everything from ",rule" to end of line
\2/p - replace the line with the contents of the 2nd capture group and print
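Another option, if GNU grep with PCRE support (-P) is available, is a lookbehind that prints only the key itself; this is a sketch alongside the answers above, not a replacement for them:
grep -oP '(?<="key":")[^"]*' file.txt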

Replacing a parameter in a text file based on the latest filename in a directory

I have the following text file (namelist.txt)
&share
wrf_core = 'ARW',
max_dom = 3,
start_date ='YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00',
end_date ='YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00','YYYY-MM-DD_HH:00:00',
interval_seconds = 21600,
io_form_geogrid = 2,
debug_level=0,
/
I want to replace the YYYY, MM, DD, and HH based on the latest filename of a directory.
For example:
An INPUT folder contains the following subdirectories:
2021021000
2021021006
2021021012
2021021018
2021021100
The latest directory from the above is 2021021100
I'm stuck here. The script should read the latest filename of the sub-directory inside the INPUT folder and do the following.
year=$(echo $line | cut -c1-4)
echo $year
month=$(echo $line | cut -c5-6)
echo $month
day=$(echo $line | cut -c7-8)
echo $day
hour=$(echo $line | cut -c9-10)
echo $hour
sed -i 's/'YYYY'/'$year'/g' namelist.txt
sed -i 's/'MM'/'$month'/g' namelist.txt
sed -i 's/'DD'/'$day'/g' namelist.txt
sed -i 's/'HH'/'$hour'/g' namelist.txt
The desired output should be like this:
&share
wrf_core = 'ARW',
max_dom = 3,
start_date ='2021-02-11_00:00:00','2021-02-11_00:00:00','2021-02-11_00:00:00',
end_date ='2021-02-11_00:00:00','2021-02-11_00:00:00','2021-02-11_00:00:00',
interval_seconds = 21600,
io_form_geogrid = 2,
debug_level=0,
/
How can I do this in bash?
I'd appreciate any help with this.
Get the directory with the latest date
Bash's globs (* and so on) expand in a sorted order. If the subdirectories in your current working directory are only named in the style YYYYMMDDHH then */ will expand to a list of dates where the last date is at the end of the list. To retrieve only the last entry from that list you can use either an array, a function (using shift), or a command (for instance printf | tail). Here we go with the array:
#! /bin/bash
cd INPUT
dirs=(*/)
last="${dirs[-1]}"
cd -
If there are also other directories you can change the glob so that only directories of the format YYYYMMDDHH are accepted:
dirs=([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9]/)
Replacing the placeholders
You don't need four cut and four sed. The following should work as well:
sed -i "s/YYYY-MM-DD_HH/${last:0:4}-${last:4:2}-${last:6:2}_${last:8:2}/g" yourFile
GNU Awk is a possibility for this:
awk -v dat="2021021100" 'BEGIN { yr=substr(dat,1,4);mn=substr(dat,5,2);day=substr(dat,7,2);hr=substr(dat,9,2)} /YYYY-MM-DD_HH/ { gsub("YYYY-MM-DD_HH",yr"-"mn"-"day"_"hr,$0) }1' namelist.txt > namelist.tmp && mv -f namelist.tmp namelist.txt
Explanation:
awk -v dat="2021021100" # Pass the date as a variable dat to awk
'BEGIN {
yr=substr(dat,1,4); # Before processing the file, use substr to extract the time elements from dat
mn=substr(dat,5,2);
day=substr(dat,7,2);
hr=substr(dat,9,2)}
/YYYY-MM-DD_HH/ {
gsub("YYYY-MM-DD_HH",yr"-"mn"-"day"_"hr,$0) # When we find YYYY-MM-DD_HH" in the line, use gsub to substitute this for the yr,mn,day and hr.
}1' namedfile # Print all lines amended or otherwise
If you have a more recent version of GNU awk, you can use the -i inplace option for in-place changes instead of using a tmp file and an mv.
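A sketch of that in-place form (assuming GNU awk 4.1 or later, where -i inplace was introduced):
gawk -i inplace -v dat="2021021100" '
BEGIN { yr=substr(dat,1,4); mn=substr(dat,5,2); day=substr(dat,7,2); hr=substr(dat,9,2) }
{ gsub("YYYY-MM-DD_HH", yr"-"mn"-"day"_"hr) } 1' namelist.txt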

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
which outputs:
#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt
is what I have so far. What is missing is the output; I get none. What I really want is to get a list of txt files, use their name up to the extension, process out the "|" and replace it with a LF/CR and put the new file in another directory as [old-name].ics. HALP. THX in advance. - Idiot me.
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
No need to use xargs, especially with echo, which risks the filenames being word-split and glob-expanded, so it could well do the wrong thing.
Then we use sed's s command to substitute | with \n; the g flag makes it a global replace. We redirect that to the other directory you want and use bash's parameter expansion to strip the .txt off the end.
Here's an awk solution:
$ awk '
FNR==1 { # for first record of every file
close(f) # close previous file f
f="path_to_dir/" FILENAME # new filename with path
sub(/txt$/,"ics",f) } # replace txt with ics
{
gsub(/\|/,"\n") # replace | with \n
print > f }' *.txt # print to new file
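Since the question started out with tr, a portable variation of the for loop that keeps tr for the character replacement (GNU sed understands \n in the replacement text, but BSD/macOS sed does not):
for i in *.txt; do
    tr '|' '\n' < "$i" > other_directory/"${i%.txt}".ics
done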

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note that the last 6 characters of the hash, d8 40 32, appear as the directory components of the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can do it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to be "", so that every char will be a different field. OFS="/" sets the output field separator as /, for print matters.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
With GNU sed you can even simplify the pattern using the -r option: you no longer need to escape {} and (). Using ~ as the regex delimiter lets you use the path separator / without escaping it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Put simply, the pattern matches:
(whole-string (chars n-5..n-4) (chars n-3..n-2) (chars n-1..n))
and replaces it with
\2/\3/\4/\1
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
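For the example hash this yields:
$ echo "$new_path"
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032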
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
This assumes the input is a single, well-formed MD5 string.
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=$(echo "${md5:0-2:2}")
p2=$(echo "${md5:0-4:2}")
p1=$(echo "${md5:0-6:2}")
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.

How to split a string in bash delimited by tab

I'm trying to split a tab-delimited field in bash.
I am aware of this answer: how to split a string in shell and get the last field
But that does not answer for a tab character.
I want to get the part of the string before the tab character, so I'm doing this:
x=`head -1 my-file.txt`
echo ${x%\t*}
But the \t is matching on the letter 't' and not on a tab. What is the best way to do this?
Thanks
If your file looks something like this (with tab as separator):
1st-field 2nd-field
you can use cut to extract the first field (operates on tab by default):
$ cut -f1 input
1st-field
If you're using awk, there is no need to use tail to get the last line. Changing the input to:
1:1st-field 2nd-field
2:1st-field 2nd-field
3:1st-field 2nd-field
4:1st-field 2nd-field
5:1st-field 2nd-field
6:1st-field 2nd-field
7:1st-field 2nd-field
8:1st-field 2nd-field
9:1st-field 2nd-field
10:1st-field 2nd-field
Solution using awk:
$ awk 'END {print $1}' input
10:1st-field
Pure bash-solution:
#!/bin/bash
while read a b;do last=$a; done < input
echo $last
outputs:
$ ./tab.sh
10:1st-field
Lastly, a solution using sed
$ sed '$s/\(^[^\t]*\).*$/\1/' input
10:1st-field
Here, $ is the address for the last line; i.e., the command operates on the last line only.
For your original question, use a literal tab, i.e.
x="1st-field 2nd-field"
echo ${x% *}
outputs:
1st-field
Use $'ANSI-C' strings in the parameter expansion:
$ x=$'abc\tdef\tghi'
$ echo "$s"
abc def ghi
$ echo ">>${x%%$'\t'*}<<"
>>abc<<
read field1 field2 <<< ${tabDelimitedField}
or
read field1 field2 <<< $(command_producing_tab_delimited_output)
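If the fields may themselves contain spaces, it can be safer to restrict splitting to the tab for the read; a small sketch, assuming bash:
IFS=$'\t' read -r field1 field2 <<< "$tabDelimitedField"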
Use awk.
echo $yourfield | awk '{print $1}'
or, in your case, for the first field of the last line of a file:
tail yourfile | awk '{x=$1}END{print x}'
There is an easy way for a tab-separated string: convert it to an array.
Create a string with tabs (the $ prefix makes bash interpret \t):
AAA=$'ABC\tDEF\tGHI'
Split the string into an array using parentheses:
BBB=($AAA)
Access any element:
echo ${BBB[0]}
ABC
echo ${BBB[1]}
DEF
echo ${BBB[2]}
GHI
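Note that the unquoted $AAA in BBB=($AAA) also splits on spaces and is subject to globbing; a slightly safer sketch reads the array with a tab-only IFS:
IFS=$'\t' read -r -a BBB <<< "$AAA"
echo "${BBB[1]}"
DEF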
x=first$'\t'second
echo "${x%$'\t'*}"
See QUOTING in man bash
The answer from https://stackoverflow.com/users/1815797/gniourf-gniourf hints at the use of built-in field parsing in bash, but does not really complete the picture. Using the IFS shell parameter to set the input field separator completes it and makes it possible to parse tab-delimited files with a fixed number of fields in pure bash.
echo -e "a\tb\tc\nd\te\tf" > myfile
while IFS='<literaltab>' read f1 f2 f3;do echo "$f1 = $f2 + $f3"; done < myfile
a = b + c
d = e + f
Where <literaltab>, of course, is replaced by a real tab, not \t. Often, Ctrl-V Tab does this in a terminal.
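To avoid embedding a literal tab at all, the same loop can use an ANSI-C quoted tab instead (a minor variation on the answer above):
while IFS=$'\t' read -r f1 f2 f3; do echo "$f1 = $f2 + $f3"; done < myfile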
