extract words matching a pattern and print character length - bash

I have a test file which looks like this
file.txt
this is a smart boy "abc.smartxyz" is the name
what you in life doesn;t matter
abc.smartabc is here to help you.
where is the joy of life
life is joyous at "https://abc.smart/strings"
grep 'abc.smart' file.txt
this is a smart boy "abc.smartxyz" is the name
abc.smartabc is here to help you.
life is joyous at "https://abc.smart/strings"
Now I want to extract every word containing the string abc.smart from this grepped output and also print how many characters long each one is. The output I am after is something like:
"abc.smartxyz" 14
abc.smartabc 12
"https://abc.smart/strings" 27
Please can someone help with this.

With awk
awk '{for (i=1;i<=NF;i++) if ($i~/abc.smart/) print $i,length($i)}' file
You can run it directly on the first file. Output:
"abc.smartxyz" 14
abc.smartabc 12
"https://abc.smart/strings" 27

This might work for you (GNU grep and sed):
grep -o '\S*abc\.smart\S*' file | sed 's/"/\\"/g;s/.*/echo "& $(expr length &)"/e'
Use grep to output the words containing abc.smart, then use GNU sed's e flag to evaluate an echo command that prints each word alongside its length as computed by the expr command.
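If GNU sed's e flag isn't available, a portable sketch (assuming the sample file.txt above) lets grep -o isolate the matching words and awk print each one with its length:

```shell
# Isolate each space-delimited word containing abc.smart, then
# print the word followed by its character count.
grep -o '[^ ]*abc\.smart[^ ]*' file.txt | awk '{ print $0, length($0) }'
```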

Related

How to extract multiple fields with specific character lengths in Bash?

I have a file (test.csv) with a few fields. What I want is the Title and Path: the title truncated to 10 characters, and the path with a few leading levels removed. What I have done so far is use awk to pick the two fields:
$ awk -F "," '{print substr($4, 1, 10)","$6}' test.csv [1]
The three leading levels of the path that need to be removed are not always the same: they can be /article/17/1/ or /open-organization/17/1, so I can't use substr on field $6.
Here is the result I have:
Title,Path
Be the ope,/article/17/1/be-open-source-supply-chain
Developing,/open-organization/17/1/developing-open-leaders
Wanted result would be:
Title,Path
Be the ope,be-open-source-supply-chain
Developing,developing-open-leaders
The title is ok with 10 characters but I still need to remove 3 levels off the path.
I could use the cut command:
cut -d'/' -f5- to remove the "/.../17/1/"
But I am not sure how this can be combined with the awk command [1].
I tried using a for loop to get the title and the path one at a time, but I have difficulty getting the awk command to run one line at a time.
I have spent hours on this with no luck. Any help would be appreciated.
Dummy Data for testing:
test.csv
Post date,Content type,Author,Title,Comment count,Path,Tags,Word count
31 Jan 2017,Article,Scott Nesbitt,Book review: Ours to Hack and to Own,0,/article/17/1/review-book-ours-to-hack-and-own,Books,660
31 Jan 2017,Article,Jason Baker,5 new guides for working with OpenStack,2,/article/17/1/openstack-tutorials,"OpenStack, How-tos and tutorials",419
You can strip the leading path levels with a regex replacement:
stringZ="Be the ope,/article/17/1/be-open-source-supply-chain"
sed -E 's/(\/\w+){3}\///' <<< "$stringZ"
Note that you need the -i option if you want sed to edit a file in place instead of reading from a here-string.
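Both steps can also be done in a single awk pass; a sketch, assuming the sample test.csv layout (Title in field 4, Path in field 6):

```shell
# Print a Title,Path header, then for each data row emit the
# 10-character title and the last /-separated path component.
awk -F, 'NR == 1 { print "Title,Path"; next }
         { n = split($6, parts, "/")        # last element is the slug
           print substr($4, 1, 10) "," parts[n] }' test.csv
```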

shell: how to cut first n fields from second column

Please help me delete the first 9 characters from column number 2 (delimiter: space) in a file like the one below. I tried using cut and awk but didn't get the required output.
my input file
appu 11062017-10:00
ammu 11062017-11:00
anna 11062017-12:00
Required output:
appu 10:00
ammu 11:00
anna 12:00
Please note that the 11062017- (date) part will not be the same, but its length (9 characters, including the hyphen) will be.
Please help me with the command.
Using awk and sub to replace 9 first chars of $2 with an empty string:
$ awk '{sub(/.{9}/,"",$2)}1' file
appu 10:00
ammu 11:00
anna 12:00
As per @anubhava's comments, it can also be implemented via awk with substr:
awk -F " " '{print $1,substr($2,10)}' my_input_file.txt
An alternate awk solution
awk '{ split($2,arry,"-");print $1" "arry[2] }' filename
Here we make use of the "-" delimiter to attain the data we need
Try to use
sed -i 's/\ .*-/\ /g' file
sed removes everything between the first space and the last hyphen (the .* is greedy), editing the file in place because of -i.
Another way with sed
sed 's/[[:digit:]]\{8\}-//' filename
The regex [[:digit:]]\{8\}- matches 8 digits followed by a hyphen and deletes them.
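If you'd rather avoid awk and sed entirely, a pure-shell sketch uses parameter expansion, where ${stamp#*-} strips the shortest leading match up to and including the first hyphen:

```shell
# Read each "name datestamp-time" line and print the name together
# with the part of the second column after the first hyphen.
while read -r name stamp; do
    printf '%s %s\n' "$name" "${stamp#*-}"
done < file
```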

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines, it might be more efficient to first select the lines containing .us. and then do the string replacement. awk is another good choice, or you can pipe grep into sed:
grep '\.us\.' INPUT_FILE | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do this with the address syntax (technically the second sed's substitution can be folded into the first command as well):
sed -n '/\(prod\|test\).us.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param we could look at forcing this only to occur on the text PRIOR to the equal sign.
sed -n 's/^\([^.]*\)\.us\.\([^=]*\)=/\1.\2=/p' FILE
This will only substitute on lines that start with some characters, then '.', then 'us', then '.', and then any number of characters before the '=' sign. This way we won't accidentally modify a '.us.' that appears inside a value.
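To tie this to the REGIONID mentioned in the question, a sketch (the variable name REGIONID and the file name config.file are assumptions) lowercases the region and keeps only that region's lines with the tag removed:

```shell
# Lowercase the region tag, then print only the lines whose second
# dotted component matches it, dropping the region from the key.
region=$(printf '%s' "$REGIONID" | tr '[:upper:]' '[:lower:]')
sed -n "s/^\([^.]*\)\.${region}\.\([^=]*=\)/\1.\2/p" config.file
```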

sed & awk, second column modifications

I've got a file that I need to make some simple modifications to. Normally, I wouldn't have an issue; however, the columns are nearly identical, which throws me off.
Some examples:
net_192.168.0.64_26 192.168.0.64_26
net_192.168.0.128-26 192.168.0.128-26
etc
Now, normally in a stream I'd just modify the second column; however, I need to write this to a file, which confuses me.
The following string does what I need it do to but then I lose visibility to the first column, and can't pipe it somewhere useful:
cat file.txt | awk '{print $2}' | sed 's/1_//g;s/2_//g;s/1-//g;s/2-//g;s/_/\ /g;s/-/\ /g' | egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}'
Output needs to look like (subnet becomes the 3rd column):
net_192.168.0.64_26 192.168.0.64 26
net_192.168.0.128-26 192.168.0.128 26
How do I do what the above line does, while keeping both columns visible so I can pipe them to a new file/modify the old etc.
Thanks!
try this, if it is ok for you:
awk '{gsub(/[_-]/," ",$2)}1' file
test with your example text:
kent$ echo "net_192.168.0.64_26 192.168.0.64_26
net_192.168.0.128-26 192.168.0.128-26"|awk '{gsub(/[_-]/," ",$2)}1'
net_192.168.0.64_26 192.168.0.64 26
net_192.168.0.128-26 192.168.0.128 26
If you just want to replace the characters _ and - in the second field with a single space, then:
$ awk '{gsub(/[-_]/," ",$2)}1' file
net_192.168.0.64_26 192.168.0.64 26
net_192.168.0.128-26 192.168.0.128 26
And a sed version:
sed 's/\(.*\)[-_]/\1 /' file
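As for writing the result back to a file, the original concern, one sketch redirects the awk output to a new file and then swaps it in once it looks right:

```shell
# Rewrite the second column, writing to a temporary file, then
# replace the original only if awk succeeded.
awk '{ gsub(/[_-]/, " ", $2) } 1' file.txt > file.txt.new &&
    mv file.txt.new file.txt
```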

Remove nth character from middle of string using Shell

I've been searching google for ever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for SED, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but I can't seem to get it installed in Cygwin. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape the parentheses, YMMV.
I also do not grasp the concept of how to construct a regular
expression for SED, so I was hoping someone could explain this to me.
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are in brackets. Those parts can then be referred to in the REPLACEMENT part of the command. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the brackets in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed (which you'd find in Linux), you can get the same results with the -r options instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
This should work:
sed -e 's~,~~4g' file.txt
It removes the 4th comma and every comma after it.
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
sed -E 's/(..),(..),(....)$/\1\2\3/' myfile.txt
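Since this is running from a bash script anyway, plain bash substring expansion can drop the two commas without any external tool; a sketch for one line (the commas sit at 0-based offsets 15 and 18):

```shell
# Keep chars 0-14, skip the comma at offset 15, keep the next two
# chars, skip the comma at offset 18, keep the rest.
line='2222,H,73.82,04,07,2012'
echo "${line:0:15}${line:16:2}${line:19}"
```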
