replace string with exact match in bash script - bash

I have a many repeated content as give below in a file . These are only uniq content.
CHECKSUM="Y"
CHECKSUM="N"
CHECKSUM="U"
CHECKSUM="
I want to replace empty field with "Null" and need output as :
CHECKSUM="Y"
CHECKSUM="N"
CHECKSUM="U"
CHECKSUM="Null"
What I can think of as :
#First find the matching content
cat file.txt | egrep 'CHECKSUM="Y"|CHECKSUM="N"|CHECKSUM="U"' > file_contain.txt
# Find the content where given string are not there
cat file.txt | egrep -v 'CHECKSUM="Y"|CHECKSUM="N"|CHECKSUM="U"' > file_donot_contain.txt
# Replace the string in content not found file
sed -i 's/CHECKSUM="/CHECKSUM="Null"/g' file_donot_contain.txt
# Merge the files
cat file_contain.txt file_donot_contain.txt > output.txt
But I find this is not efficient way of doing. Any other suggestion ?

To achieve this you need to mark that this is the end of the line, not just part of it, using $ (And optionally ^ to mark the start of the line too):
sed -i s'/^CHECKSUM="$/CHECKSUM="Null"/' file.txt

Related

Unix sed command - global replacement is not working

I have scenario where we want to replace multiple double quotes to single quotes between the data, but as the input data is separated with "comma" delimiter and all column data is enclosed with double quotes "" got an issue and the same explained below:
The sample data looks like this:
"int","","123","abd"""sf123","top"
So, the output would be:
"int","","123","abd"sf123","top"
tried below approach to get the resolution, but only first occurrence is working, not sure what is the issue??
sed -ie 's/,"",/,"NULL",/g;s/""/"/g;s/,"NULL",/,"",/g' inputfile.txt
replacing all ---> from ,"", to ,"NULL",
replacing all multiple occurrences of ---> from """ or "" or """" to " (single occurrence)
replacing 1 step changes back to original ---> from ,"NULL", to ,"",
But, only first occurrence is getting changed and remaining looks same as below:
If input is :
"int","","","123","abd"""sf123","top"
the output is coming as:
"int","","NULL","123","abd"sf123","top"
But, the output should be:
"int","","","123","abd"sf123","top"
You may try this perl with a lookahead:
perl -pe 's/("")+(?=")//g' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
Where input is:
cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
Breakup:
("")+: Match 1+ pairs of double quotes
(?="): If those pairs are followed by a single "
Using sed
$ sed -E 's/(,"",)?"+(",)?/\1"\2/g' input_file
"int","","123","abd"sf123","top"
"int","","NULL","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
In awk with your shown samples please try following awk code. Written and tested in GNU awk, should work in any version of awk.
awk '
BEGIN{ FS=OFS="," }
{
for(i=1;i<=NF;i++){
if($i!~/^""$/){
gsub(/"+/,"\"",$i)
}
}
}
1
' Input_file
Explanation: Simple explanation would be, setting field separator and output field separator as , for all the lines of Input_file. Then traversing through each field of line, if a field is NOT NULL then Globally replacing all 1 or more occurrences of " with single occurrence of ". Then printing the line.
With sed you could repeat 1 or more times sets of "" using a group followed by matching a single "
Then in the replacement use a single "
sed -E 's/("")+"/"/g' file
For this content
$ cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
The output is
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
sed s'#"""#"#' file
That works. I will demonstrate another method though, which you may also find useful in other situations.
#!/bin/sh -x
cat > ed1 <<EOF
3s/"""/"/
wq
EOF
cp file stack
cat stack | tr ',' '\n' > f2
ed -s f2 < ed1
cat f2 | tr '\n' ',' > stack
rm -v ./f2
rm -v ./ed1
The point of this is that if you have a big csv record all on one line, and you want to edit a specific field, then if you know the field number, you can convert all the commas to carriage returns, and use the field number as a line number to either substitute, append after it, or insert before it with Ed; and then re-convert back to csv.

How to get values in a line while looping line by line in a file (shell script)

I have a file which looks like this (file.txt)
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
I want to loop trough this line by line an extract the key values.
so the result should be like ,
AJGUIGIDH568
AJGUIGIDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
So I wrote a code like this to loop line by line and extract the value between {"key":" and ","rule": because key values is in between these 2 patterns.
while read p; do
echo $p | sed -n "/{"key":"/,/","rule":,/p"
done < file.txt
But this is not working. can someone help me to figure out me this. Thanks in advance.
Your sample input is almost valid json. You could tweak it to make it valid and then extract the values with jq with something like:
sed -e 's/squid/"squid/' -e 's/$/"}/' file.txt | jq -r .key
Or, if your actual input really is valid json, then just use jq:
jq -r .key file.txt
If the "random-txt" may include double quotes, making it difficult to massage the input to make it valid json, perhaps you want something like:
awk '{print $4}' FS='"' file.txt
or
sed -n '/{"key":"\([^"]*\).*/s//\1/p' file.txt
or
while IFS=\" read open_brace key colon val _; do echo "$val"; done < file.txt
For the shown data, you can try this awk:
awk -F '"[:,]"' '{print $2}' file
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
With the give example you can simple use
cut -d'"' -f4 file.txt
Assumptions:
there may be other lines in the file so we need to focus on just the lines with "key" and "rule"
the only text between "key" and "rule" is the desired string (eg, squid never shows up between the two patterns of interest)
Adding some additional lines:
$ cat file.txt
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
ignore this line}
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
ignore this line}
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
One sed idea:
$ sed -nE 's/^(.*"key":")([^"]*)(","rule".*)$/\2/p' file.txt
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
Where:
-E - enable extended regex support (and capture groups without need to escape sequences)
-n - suppress printing of pattern space
^(.*"key":") - [1st capture group] everything from start of line up to and including "key":"
([^"]*) - [2nd capture group] everything that is not a double quote (")
(","rule".*)$ - [3rd capture group] everything from ",rule" to end of line
\2/p - replace the line with the contents of the 2nd capture group and print

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
which outputs:
#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt
is what I have so far. What is missing is the output; I get none. What I really want is to get a list of txt files, use their name up to the extension, process out the "|" and replace it with a LF/CR and put the new file in another directory as [old-name].ics. HALP. THX in advance. - Idiot me.
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
No need to use xargs, especially with echo which would risk the filenames getting word split and having globbing apply to them, so could well do the wrong thing.
Then we use sed and use s to substitute | with \n g makes it a global replace. We redirect that to the other director you want and use bash's parameter expansion to strip off the .txt from the end
Here's an awk solution:
$ awk '
FNR==1 { # for first record of every file
close(f) # close previous file f
f="path_to_dir/" FILENAME # new filename with path
sub(/txt$/,"ics",f) } # replace txt with ics
{
gsub(/\|/,"\n") # replace | with \n
print > f }' *.txt # print to new file

Multi-line grep with positive and negative filtering

I need to grep for a multi-line string that doesn't include one string, but does include others. This is what I'm searching for in some HTML files:
<not-this>
<this> . . . </this>
</not-this>
In other words, I want to find files that contain <this> and </this> on the same line, but should not be surrounded by html tags <not-this> on the lines before and/or after. Here is some shorthand logic for what I want to do:
grep 'this' && '/this' && !('not-this')
I've seen answers with the following...
grep -Er -C 2 '.*this.*this.*' . | grep -Ev 'not-this'
...but this just erases the line(s) containing the "not" portion, and displays the other lines. What I'd like is for it to not pull those results at all if "not-this" is found within a line or two of "this".
Is there a way to accomplish this?
P.S. I'm using Ubuntu and gnome-terminal.
It sounds like an awk script might work better here:
$ cat input.txt
<not-this>
<this>BAD! DO NOT PRINT!</this>
</not-this>
<yes-this>
<this>YES! PRINT ME!</this>
</yes-this>
$ cat not-this.awk
BEGIN {
notThis=0
}
/<not-this>/ {notThis=1}
/<\/not-this>/ {notThis=0}
/<this>.*<\/this>/ {if (notThis==0) print}
$ awk -f not-this.awk input.txt
<this>YES! PRINT ME!</this>
Or, if you'd prefer, you can squeeze this awk script onto one long line:
$ awk 'BEGIN {notThis=0} /<not-this>/ {notThis=1} /<\/not-this>/ {notThis=0} /<this>.*<\/this>/ {if (notThis==0) print}' input.txt

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note the last 6 characters: d8 40 32, are represented in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can make it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to be "", so that every char will be a different field. OFS="/" sets the output field separator as /, for print matters.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
Having GNU sed you can even simplify the pattern using the -r option. Now you won't need to escape {} and () any more. Using ~ as the regex delimiter allows to use the path separator / without need to escape it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Explained simple the pattern does the following: It matches:
(all (n-5 - n-4) (n-3 - n-2) (n-1 - n-0))
and replaces it by
/$1/$2/$3/$0
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed sed 's|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|'
Assuming info is a correct MD5 (and only) string
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=$(echo "${md5:0-2:2}")
p2=$(echo "${md5:0-4:2}")
p1=$(echo "${md5:0-6:2}")
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.

Resources