sed & awk, second column modifications - bash

I've got a file that needs some simple modifications. Normally I wouldn't have an issue, but the columns are nearly identical, which throws me off.
Some examples:
net_192.168.0.64_26 192.168.0.64_26
net_192.168.0.128-26 192.168.0.128-26
etc
Now, normally in a stream I'd just modify the second column, but here I need to write the result back to a file, which is what confuses me.
The following pipeline does what I need it to do, but I lose the first column and can't pipe the result anywhere useful:
cat file.txt | awk '{print $2}' | sed 's/1_//g;s/2_//g;s/1-//g;s/2-//g;s/_/\ /g;s/-/\ /g' | egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}'
Output needs to look like (subnet becomes the 3rd column):
net_192.168.0.64_26 192.168.0.64 26
net_192.168.0.128-26 192.168.0.128 26
How do I do what the above line does while keeping both columns visible, so I can pipe them to a new file, modify the old one, etc.?
Thanks!

Try this, if it is OK for you:
awk '{gsub(/[_-]/," ",$2)}1' file
test with your example text:
kent$ echo "net_192.168.0.64_26 192.168.0.64_26
net_192.168.0.128-26 192.168.0.128-26"|awk '{gsub(/[_-]/," ",$2)}1'
net_192.168.0.64_26 192.168.0.64 26
net_192.168.0.128-26 192.168.0.128 26

If you just want to replace the characters _ and - in the second field with a single space, then:
$ awk '{gsub(/[-_]/," ",$2)}1' file
net_192.168.0.64_26 192.168.0.64 26
net_192.168.0.128-26 192.168.0.128 26

And a sed version:
sed 's/\(.*\)[-_]/\1 /' file
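Either of these keeps all the columns, so for the write-it-to-a-file part of the question: redirect to a temporary file and move it over the original. A sketch (file names are illustrative, and the sample data is recreated here so the snippet is self-contained):

```shell
# Recreate the sample input so the snippet stands alone.
printf 'net_192.168.0.64_26 192.168.0.64_26\nnet_192.168.0.128-26 192.168.0.128-26\n' > file.txt

# Transform field 2, write to a temp file, and replace the original only if
# awk succeeded. Never redirect a command's output back onto its own input.
awk '{gsub(/[_-]/," ",$2)}1' file.txt > file.txt.tmp && mv file.txt.tmp file.txt
cat file.txt
```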

Bash simple pattern extractor on text

I'm stuck on a simple problem of finding a pattern in a string. I've never been comfortable with sed or regex in general.
I'm trying to get the number in the second column in one variable, and the number in the third column in another variable. The numbers are separated by tabs:
Here's what I have now:
while read line
do
middle="$(echo "$line" | sed 's/([0-9]+)\t\([0-9]+\)\t([0-9]+\.[0-9]+)/\1/')"
last="$(echo "$line" | sed 's/([0-9]+)\t([0-9]+)\t\([0-9]+\.[0-9]+\)/\1/')"
done
Here is the text:
11 1545 0.026666
12 1633 0.025444
13 1597 0.026424
14 1459 0.025634
I know there are simpler tools than sed, so feel free to suggest them in your response.
Thanks.
This functionality is built into read.
while read -r first second third more; do
…
done
By default, read splits its input into whitespace-separated fields. Each variable receives one field, except the last one which receives whatever remains on the line. This matches your requirement provided there aren't any empty columns.
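A minimal sketch of that loop over the sample data (fed from a here-document here so it runs as-is; in practice you would redirect from your file):

```shell
# read splits each line on whitespace (spaces or tabs) into the named variables;
# -r keeps backslashes in the input literal.
while read -r first middle last; do
    echo "middle=$middle last=$last"
done <<'EOF'
11	1545	0.026666
12	1633	0.025444
EOF
```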
Use AWK to save yourself:
while read -r line
do
    middle="$(awk '{print $2}' <<< "$line")"
    last="$(awk '{print $3}' <<< "$line")"
done

Transforming a field in a csv file and resaving to another file with bash [duplicate]

This question already has answers here:
How can I change a certain field of a file into upper-case using awk?
(2 answers)
I apologize in advance if this seems like a simple question. However, I am a beginner in bash commands and scripting, so I hope you all understand why I am not able to solve this on my own.
What I want to achieve is to change the values in one field of a csv file to uppercase, and then resave the csv file with the transformed field and all the other fields included, each retaining their index.
For instance, I have this csv:
1,Jun 4 2021,car,4856
2,Jul 31 2021,car,4154
3,Aug 14 2021,bus,4070
4,Aug 2 2021,car,4095
I want to transform the third field that holds the vehicle type into uppercase - CAR, BUS, etc. and then resave the csv file with the transformed field.
I have tried using the tr command thus:
cut -d"," -f3 data.csv | tr '[:lower:]' '[:upper:]'
This extracts the field and does the transformation, but how do I paste the transformed column back into the csv file? tr has no field argument, so on its own it can only transform the extracted column, not the file in place.
With GNU awk:
awk -i inplace 'BEGIN{FS=","; OFS=","} {$3=toupper($3)} {print}' file
Output to file:
1,Jun 4 2021,CAR,4856
2,Jul 31 2021,CAR,4154
3,Aug 14 2021,BUS,4070
4,Aug 2 2021,CAR,4095
See: How can I change a certain field of a file into upper-case using awk?, Save modifications in place with awk and 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
A GNU sed solution:
sed -i -E 's/^(([^,]+,){2})([^,]+)/\1\U\3/' file.csv
cat file.csv
1,Jun 4 2021,CAR,4856
2,Jul 31 2021,CAR,4154
3,Aug 14 2021,BUS,4070
4,Aug 2 2021,CAR,4095
Explanation:
^: Start of line
(([^,]+,){2}): Match the first 2 fields (with their trailing commas) and capture them as group #1 (the inner ([^,]+,) is group #2)
([^,]+): Match the 3rd field and capture it as group #3
\1: Put the captured value of group #1 back in the replacement
\U\3: Put the uppercased captured value of group #3 back in the replacement
Or a GNU awk solution:
awk -i inplace 'BEGIN {FS=OFS=","} {$3 = toupper($3)} 1' file.csv
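Note that `-i inplace` requires GNU awk 4.1 or later. With any POSIX awk you can get the same effect through a temporary file (a sketch; the sample csv is recreated so it runs standalone):

```shell
# Recreate the sample input.
printf '1,Jun 4 2021,car,4856\n3,Aug 14 2021,bus,4070\n' > file.csv
# Same transform, but portable: write to a temp file, then replace the original.
awk 'BEGIN{FS=OFS=","} {$3=toupper($3)} 1' file.csv > file.csv.tmp && mv file.csv.tmp file.csv
cat file.csv
```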
Sticking with cut and tr, you need to add paste to the mix.
SEP=","
IN="data.csv"
paste -d$SEP \
<( <$IN cut -d$SEP -f1,2 ) \
<( <$IN cut -d$SEP -f3 | tr '[:lower:]' '[:upper:]' ) \
<( <$IN cut -d$SEP -f4 )
I factored out the repeated pieces (the separator and the input file) into the variables SEP and IN respectively.
How it all works:
get the untransformed columns before #3
get col #3 and transform it with tr
get the remaining columns
paste it all together, line by line
the need for intermediate files is avoided by using process substitution
Downsides:
the data seems to be read 3 times, but disk cache will help a lot
the data is parsed 3 times, for sure (by cut)
but unless your input is a few gigabytes, this does not matter
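If the three passes bother you, the same tr call can run once per line inside a plain read loop instead, giving a single pass over the data (a sketch; the sample csv is recreated so it runs standalone, and the trailing rest variable soaks up every field after the third, commas included):

```shell
# Recreate the sample input.
printf '1,Jun 4 2021,car,4856\n3,Aug 14 2021,bus,4070\n' > data.csv
# Split each line on commas; the last variable receives any remaining fields.
while IFS=, read -r id date type rest; do
    printf '%s,%s,%s,%s\n' "$id" "$date" "$(printf '%s' "$type" | tr '[:lower:]' '[:upper:]')" "$rest"
done < data.csv
```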

extract words matching a pattern and print character length

I have a test file which looks like this
file.txt
this is a smart boy "abc.smartxyz" is the name
what you in life doesn;t matter
abc.smartabc is here to help you.
where is the joy of life
life is joyous at "https://abc.smart/strings"
grep 'abc.smart' file.txt
this is a smart boy "abc.smartxyz" is the name
abc.smartabc is here to help you.
life is joyous at "https://abc.smart/strings"
Now I want to extract all words that contain the string abc.smart from this grepped output and also print how many characters each one is. The output I am after is something like:
"abc.smartxyz" 14
abc.smartabc 12
"https://abc.smart/strings" 27
Please can someone help with this.
With awk
awk '{for (i=1;i<=NF;i++) if ($i~/abc\.smart/) print $i,length($i)}' file
You can run it directly on the first file. Output:
"abc.smartxyz" 14
abc.smartabc 12
"https://abc.smart/strings" 27
This might work for you (GNU grep and sed):
grep -o '\S*abc\.smart\S*' file | sed 's/"/\\"/g;s/.*/echo "& $(expr length &)"/e'
Use grep to output the words containing abc.smart, and sed to build and evaluate an echo command that appends each word's length via the expr utility.
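The same grep can also feed awk directly, which avoids the sed/expr evaluation entirely (a sketch; two of the sample lines are recreated so it runs standalone):

```shell
# Recreate part of the sample input.
printf 'this is a smart boy "abc.smartxyz" is the name\nabc.smartabc is here to help you.\n' > file.txt
# grep -o prints each matching word on its own line; awk appends its length.
grep -o '\S*abc\.smart\S*' file.txt | awk '{print $0, length($0)}'
```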

How to extract multiple fields with specific character lengths in Bash?

I have a file (test.csv) with a few fields, and what I want is the Title and Path: 10 characters for the title, and a few levels removed from the path. What I have done is use the awk command to pick the two fields:
$ awk -F "," '{print substr($4, 1, 10)","$6}' test.csv [1]
The three levels in the path that need to be removed are not always the same: it can be /article/17/1/ or /open-organization/17/1, so I can't use substr for field $6.
Here is the result I have:
Title,Path
Be the ope,/article/17/1/be-open-source-supply-chain
Developing,/open-organization/17/1/developing-open-leaders
Wanted result would be:
Title,Path
Be the ope,be-open-source-supply-chain
Developing,developing-open-leaders
The title is ok with 10 characters but I still need to remove 3 levels off the path.
I could use the cut command:
cut -d'/' -f5- to remove the "/.../17/1/"
But I am not sure how to combine this with the awk command in [1].
I tried to use a for loop to get the title and the path one by one, but I had difficulty getting the awk command to run one line at a time.
I have spent hours on this with no luck. Any help would be appreciated.
Dummy Data for testing:
test.csv
Post date,Content type,Author,Title,Comment count,Path,Tags,Word count
31 Jan 2017,Article,Scott Nesbitt,Book review: Ours to Hack and to Own,0,/article/17/1/review-book-ours-to-hack-and-own,Books,660
31 Jan 2017,Article,Jason Baker,5 new guides for working with OpenStack,2,/article/17/1/openstack-tutorials,"OpenStack, How-tos and tutorials",419
You can strip the prefix with a regex:
stringZ="Be the ope,/article/17/1/be-open-source-supply-chain"
sed -E 's/((\/\w+){3}\/)//' <<< "$stringZ"
Note that you need to add -i if you give sed a file as input and want it modified in place.

Unix Shell scripting in AIX (sed command)

I have a text file which consists of job name, business name and time in minutes, separated with '-' (SfdcDataGovSeq-IntegraterJob-43). There are many jobs in this text file. I want to search by job name and change the time from 43 to 0 only for that particular row, updating the same text file. Kindly advise what needs to be done.
The command I am using is (cat test.txt | grep "SfdcDataGovSeq" | sed -e 's/43/0/' > test.txt), but the whole file is getting replaced with only one line.
sed -e '/SfdcDataGovSeq/ s/43/0/' test.txt
This performs the substitution only on lines that match the search pattern.
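The reason your original pipeline mangled the file is the `> test.txt` at the end: the shell truncates test.txt before grep ever reads it. Redirect to a temporary file and move it back instead. A sketch with made-up job lines:

```shell
# Recreate sample input with illustrative job names.
printf 'SfdcDataGovSeq-IntegraterJob-43\nOtherJob-SomeName-43\n' > test.txt
# Substitute only on matching lines, write elsewhere, then replace the original.
sed -e '/SfdcDataGovSeq/ s/43/0/' test.txt > test.txt.tmp && mv test.txt.tmp test.txt
cat test.txt
```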
Agreed with Ed; here is a variant that adds word boundaries so only the standalone 43 is replaced, although the awk equality test is more robust:
sed -e '/SfdcDataGovSeq/ s/\<43\>/0/g' test.txt
You should be using awk instead of sed:
awk 'BEGIN{FS=OFS="-"} $1=="SfdcDataGovSeq" && $3==43{$3=0} 1' file
Since it does full string or numeric (not regexp) matches on specific fields, the above is far more robust than the currently accepted sed answer which would wreak havoc on your input file given various possible input values.