How do I get the value inside the first double quotes? - bash

I'm currently writing a bash script to get the first value among the many comma separated strings.
I have a file that looks like this -
things: "water bottle","40","new phone cover",10
I just need to return the value in first double quotes.
water bottle
The value in the first double quotes can be one or more words; for example, water bottle may sometimes be replaced with pen.
I tried -
awk '/:/ {print $2}'
But this just gives
"water
I wanted to split on commas, but there is a colon (:) after things, so I'm not sure how to separate the fields.
I used the below code since I particularly wanted to use awk -
awk '/:/' test.txt | cut -d\" -f2

A solution using the cut utility could be
cut -d\" -f2 infile > outfile

Using GNU awk, you can use a capture group together with negated character classes, so that the match does not cross a , (the field delimiter).
awk 'match($0, /^[^",:]*:[^",]*"([^"]*)"/, a) {print a[1]}' file
water bottle
The pattern matches
^ Start of string
[^",:]*: Optionally match any value except " and , and :, then match :
[^",]* Optionally match any value except " and ,
"([^"]*)" Capture in group 1 the value between double quotes
If the value is always between double quotes, a shorter option is to set the field separator to " and check that field 1 contains a colon. Note that this also prints water bottle when there is only a leading double quote and no closing one.
awk -F'"' '$1 ~ /:/ {print $2}' file
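To make the caveat about a missing closing quote concrete, here is a small sketch. The malformed line is a made-up example (an assumption, not taken from the question), and the sed pattern is simply one that insists on both quotes.

```shell
# Hypothetical malformed input: the closing quote is missing.
line='things: "water bottle,40'

# Splitting on the quote character still prints whatever follows the first quote.
printf '%s\n' "$line" | awk -F'"' '$1 ~ /:/ {print $2}'
# water bottle,40

# A pattern that requires both quotes prints nothing for this line.
printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p'
```

If malformed lines should be rejected rather than partially printed, the stricter pattern is probably the safer choice.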

With your shown samples, please try following awk code.
awk '/^things:/ && match($0,/"[^"]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Explanation: the awk program checks whether the line starts with things: and, if so, uses match() to find the first " together with the characters up to the next ". substr() then prints that match without the leading quote.

Solution 1: awk
You can use a single awk command:
awk -F\" 'index($1, ":"){print $2}' test.txt > outfile
The -F\" sets the field separator to a " char, index($1, ":") condition makes sure Field 1 contains a : char (no regex needed) and then {print $2} prints the second field value.
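As a quick check of that behaviour, the sketch below wraps the command in a function and feeds it a few lines. The name first_quoted and the second and third input lines are assumptions made up for this illustration.

```shell
# first_quoted is a hypothetical helper name, not part of the original answer.
first_quoted() {
    # Field 1 is everything before the first double quote; keep the line
    # only if that part contains a colon, then print field 2.
    awk -F'"' 'index($1, ":") { print $2 }' "$@"
}

printf '%s\n' \
    'things: "water bottle","40","new phone cover",10' \
    'no colon before the quote "skipped"' \
    'things: "pen","5"' | first_quoted
# water bottle
# pen
```

The second line is skipped because the text before its first quote has no colon.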
Solution 2: awk + cut
You can use awk + cut:
awk '/:/' test.txt | cut -d\" -f2 > outfile
With awk '/:/' test.txt, you will extract line(s) containing : char, and then the piped cut -d\" -f2 command will split the string with " as a separator and return the second item.
Solution 3: sed
Alternatively, you can use sed:
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' file > outfile
For example:
s='things: "water bottle","40","new phone cover",10'
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' <<< "$s"
# => water bottle
The command means
-n - the option suppresses the default line output
^[^"]*"\([^"]*\)".* - a POSIX BRE regex pattern that matches
^ - start of string
[^"]* - zero or more chars other than "
" - a " char
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
".* - a " char and the rest of the string.
\1 replaces the match with Group 1 value
p - only prints the result of a successful substitution.


how to iterate over awk result

I have the following string, from which I want to retrieve the ID for eu-central-1 only:
a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd
So what I want as output is: ami-bbbb
The way I am doing it right now is:
echo a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd |
awk -F',' '{ print $2 }' |
awk -F':' '{print $2}'
The problem with this approach is that I am explicitly specifying that eu-central-1 is the second ($2) result in the first awk call, but sometimes the entries might be in a different order, so I might need to iterate over this result. Is it possible to achieve this in one line, without knowing beforehand where in the string eu-central-1:ami-bbbb will land?
Use grep like so:
echo your_string | grep -Po '\beu-central-1:\K[^,]+'
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only, 1 match/line, not the entire lines.
\b : Word boundary.
\K : Pretend that the match starts at this point. Specifically, ignore the preceding part of the regex when printing the match.
[^,]+ : Any characters that are not a comma, one or more occurrences.
I'd prefer grep as in Timur Shtatland's answer. But for completeness here is an alternative:
You can set awk's record separator (a line break by default) to a comma and then print only the record whose first field is eu-central-1.
awk -F: -v RS=, '$1 == "eu-central-1" { print $2 }'
With GNU sed or OSX/BSD sed for -E:
$ sed -E 's/(^|.*,)eu-central-1:([^,]*).*/\2/' file
One sed idea:
id='eu-central-1'
# desired id in middle of input string:
echo 'a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd' | \
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p"
-En - -E enables extended regex support, -n suppresses the default output of every line
^(.*,)* - [capture group #1] - matches start of line plus zero or more instances of characters ending with a comma (,)
^(.*,)*${id}: - capture group #1 followed by ${id} + :
([^,]*) - [capture group #2] - matches everything up to, but not including, the next comma (,)
(,.*)*$ - [capture group #3] - matches zero or more instances of comma followed by other characters to end of line
\2/p - print capture group #2
Alternatively, using a here-string to eliminate the pipe/sub-process call:
# desired id at start of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'eu-central-1:ami-bbbb,a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
# desired id at end of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd,eu-central-1:ami-bbbb'
All three generate:
ami-bbbb
Defining , as line (record) separator and : as field separator, a simple condition over $1 prints the result.
echo -n a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd |
awk 'BEGIN{RS=","; FS=":"}$1=="eu-central-1"{print $2}'
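A minimal sketch of the same record-separator idea, wrapped so the region is a parameter and the position in the string does not matter. The function name ami_for and the awk variable region are assumptions introduced for this illustration. The extra sub() is there because, with plain echo, the last record ends in a newline.

```shell
# ami_for is a hypothetical helper: $1 is the region, stdin is the
# comma-separated list of region:id pairs.
ami_for() {
    awk -v region="$1" 'BEGIN { RS = ","; FS = ":" }
        { sub(/\n$/, "", $2) }        # drop the newline that ends the last record
        $1 == region { print $2 }'
}

echo 'a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-central-1:ami-bbbb' | ami_for eu-central-1
# ami-bbbb
```

Comparing $1 with == rather than a regex means a region such as a-central-1 cannot match eu-central-1 by accident.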

Bash + sed/awk/cut to delete nth character

I'm trying to delete the 6th, 7th and 8th characters of each line.
Below is the file containing the text.
Actual output..
#cat test
Expecting below, after formatting.
#cat test
I even tried the following, with no luck:
#awk -F ":" '{print $1":"$2","$3}' test
#sed 's/^\(.\{7\}\).\(.*\)/\1\2/' test    (here I can remove only one character)
cut also failed:
#cut -d ":" -f1,2,3 test
I need to delete the 6th, 7th and 8th characters in each line.
Any suggestions, please?
With GNU cut you can use the --complement switch to remove characters 6 to 8:
cut --complement -c6-8 file
Otherwise, you can just select the rest of the characters yourself:
cut -c1-5,9- file
i.e. characters 1 to 5, then 9 to the end of each line.
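A short sketch of the portable form on a sample line. The function name drop_6_to_8 and the second input line are made up for this illustration; the first line has the hh:mm:ss,,STATUS shape used elsewhere in this thread.

```shell
# drop_6_to_8 is a hypothetical helper name.
drop_6_to_8() {
    # Keep characters 1-5 and everything from character 9 onwards.
    cut -c1-5,9- "$@"
}

printf '%s\n' '18:40:12,,UP' 'short' | drop_6_to_8
# 18:40,,UP
# short
```

Lines with five characters or fewer pass through unchanged, since there is nothing in positions 6 to 8 to drop.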
With awk you could use substrings:
awk '{ print substr($0, 1, 5) substr($0, 9) }' file
Or you could write a regular expression, but the result will be more complex.
For example, to remove the last three characters from the first comma-separated field:
awk -F, -v OFS=, '{ sub(/...$/, "", $1) } 1' file
Or, using sed with a capture group:
sed -E 's/(.{5}).{3}/\1/' file
Capture the first 5 characters and use them in the replacement, dropping the next 3.
It's structured text, so why count the characters if you can describe them?
$ awk '{sub(":..,",",")}1' file
This removes the seconds.
The solutions below are generic and assume no knowledge of any format. They just delete characters 6, 7 and 8 of every line.
sed 's/.//8;s/.//7;s/.//6' <file> # from high to low
sed 's/.//6;s/.//6;s/.//6' <file> # from low to high (subtract 1)
sed 's/\(.....\).../\1/' <file>
sed 's/\(.\{5\}\).../\1/' <file>
s/BRE/replacement/n :: substitute nth occurrence of BRE with replacement
awk 'BEGIN{OFS=FS=""}{$6=$7=$8="";print $0}' <file>
awk -F "" '{OFS=$6=$7=$8="";print}' <file>
awk -F "" '{OFS=$6=$7=$8=""}1' <file>
These three commands are equivalent. With an empty field separator FS, awk treats every character as a field (this works in gawk and mawk, but POSIX leaves the behavior unspecified). We empty fields 6, 7 and 8, and reprint the line with an empty output field separator OFS.
cut -c -5,9- <file>
cut --complement -c 6-8 <file>
Just for fun, perl, where you can assign to a substring
perl -pe 'substr($_,5,3)=""' file
With awk :
echo "18:40:12,,UP" | awk '{ $0 = ( substr($0,1,5) substr($0,9) ) ; print $0}'
If you are running bash, you can use its string manipulation functionality instead of calling awk, sed, cut or another binary:
while IFS= read -r STRING
do
    echo "${STRING:0:5}${STRING:8}"
done < myfile.txt
${STRING:0:5} is the first five characters of the string; ${STRING:8} is everything from the 9th character to the end of the line (offsets are zero-based). Together they cut out characters 6, 7 and 8.

print 1st string of a line if last 5 strings match input

I have a requirement to print the first string of a line if the last 5 strings match a specific input.
Example: Specified input is 2
Expected Output:
As you can see, China is excluded as it doesn't meet the requirement (last 5 digits have to be matched with the input).
grep ';2;2;2;2;2$' file | cut -d';' -f1
$ in a regex stands for "end of line", so grep will print all the lines that end in the given string
-d';' tells cut to delimit columns by semicolons
-f1 outputs the first column
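Since the sample file is not shown here, the sketch below uses made-up data (the country names and values are assumptions) to show which lines the anchored pattern keeps. The function name first_if_tail_is_twos is also hypothetical.

```shell
# Hypothetical helper: print field 1 of lines whose last five fields are all 2.
first_if_tail_is_twos() {
    grep ';2;2;2;2;2$' "$@" | cut -d';' -f1
}

# Hypothetical sample data, semicolon separated.
printf '%s\n' \
    'Japan;1;2;2;2;2;2' \
    'China;2;2;2;1;2;2' \
    'India;2;2;2;2;2' | first_if_tail_is_twos
# Japan
# India
```

China is dropped because a 1 interrupts its last five values, even though the line contains five 2s overall.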
You could use awk:
awk -F';' -v v="2" -v count=5 '
{
  c = 0                          # matches found on this line
  for (i = 1; i <= NF; i++) {
    if ($i == v) c++
    if (c >= count) { print $1; next }
  }
}' file
v is the value to match
count is the minimum number of matching values required to print the wanted string
the for loop is parsing all fields delimited with a ; in order to find a match
This script doesn't require the five matching values to be consecutive or to sit at the end of the line.
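To illustrate that last point, here is a minimal sketch of a counting approach run on made-up data. The function name first_if_count, its two parameters, and the sample lines are all assumptions for this example. It counts matches from field 2 onwards, so the name in field 1 is never compared.

```shell
# Hypothetical helper: $1 is the value to look for, $2 the minimum number
# of fields (after the first) that must equal it.
first_if_count() {
    awk -F';' -v v="$1" -v count="$2" '{
        c = 0
        for (i = 2; i <= NF; i++) if ($i == v) c++
        if (c >= count) print $1
    }'
}

printf '%s\n' \
    'Japan;1;2;2;2;2;2' \
    'China;2;2;2;1;2;2' \
    'Peru;2;1;2;1;2' | first_if_count 2 5
# Japan
# China
```

China is printed here because it contains five 2s in total, so counting is only appropriate when the position of the values does not matter.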
With sed:
sed -n 's/^\([^;]*\).*;2;2;2;2;2$/\1/p' file
It captures and outputs the leading non-; characters of lines ending with ;2;2;2;2;2
It can be shortened with GNU sed to:
sed -nE 's/^([^;]*).*(;2){5}$/\1/p' file
awk -F\; '/;2;2;2;2;2$/{print $1}' file

grep - how to display another word instead of the matching of grep

Given input like:
technique lol
technology case
london knife
ocean sky
I'm currently using
grep -Eo '^[^ ]+' FILE | grep "tech"
to match every word that contains "tech" in the ID column.
In this case, it displays:
However, can anyone tell me how to display the word from the second column that corresponds to the match in the first column?
For example, how to display the word:
(display the value instead of the key)
Also, how can I display the key (as above) and the value separated by "=", without any spaces?
You can grep for lines starting with "tech" and then just display the second column. The exact format depends on how your input file columns are separated. If they are tab separated:
grep '^tech' FILE | cut -f 2
If they are space separated:
grep '^tech' FILE | tr -s ' ' $'\t' | cut -f 2
This "squeezes" repeated spaces and replaces them with a single tab character.
For your second question, you can use
sed -n '/^tech/ s/[[:space:]]\+/=/p' FILE
This means "don't print (-n); on lines matching ^tech, make the substitution and print".
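A small sketch of that substitution on the sample input from the question. The \+ operator in a basic regex is a GNU extension, so this version spells "one or more" as [[:space:]][[:space:]]*, which should also work with non-GNU sed. The function name tech_pairs is made up for this illustration.

```shell
# tech_pairs is a hypothetical helper name.
tech_pairs() {
    # On lines starting with tech, replace the first run of whitespace
    # with = and print only those lines.
    sed -n '/^tech/ s/[[:space:]][[:space:]]*/=/p' "$@"
}

printf '%s\n' 'technique lol' 'technology case' 'london knife' 'ocean sky' | tech_pairs
# technique=lol
# technology=case
```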
Using awk:
awk '$1 ~ "tech" {print $2}' < inputfile
or with key=value
awk '$1 ~ "tech" {print $1"="$2}' < inputfile

Trim leading and trailing spaces from a string in awk

I'm trying to remove leading and trailing space in 2nd column of the below input.txt:
Name, Order  
Trim, working
I have used the below awk to remove leading and trailing space in 2nd column but it is not working. What am I missing?
awk -F, '{$2=$2};1' input.txt
This gives the output as:
Name, Order  
Trim, working
Leading and trailing spaces are not removed.
If you want to trim all spaces, only in lines that have a comma, and use awk, then the following will work for you:
awk -F, '/,/{gsub(/ /, "", $0); print} ' input.txt
If you only want to remove spaces in the second column, change the expression to
awk -F, '/,/{gsub(/ /, "", $2); print$1","$2} ' input.txt
Note that gsub substitutes the character in // with the second expression, in the variable that is the third parameter - and does so in-place - in other words, when it's done, the $0 (or $2) has been modified.
Full explanation:
-F, use comma as field separator
(so the thing before the first comma is $1, etc)
/,/ operate only on lines with a comma
(this means empty lines are skipped)
gsub(a,b,c) match the regular expression a, replace it with b,
and do all this with the contents of c
print$1","$2 print the contents of field 1, a comma, then field 2
input.txt use input.txt as the source of lines to process
EDIT: I want to point out that @BMW's solution is better, as it actually trims only leading and trailing spaces with two successive gsub commands. Whilst giving credit, I will give an explanation of how it works.
gsub(/^[ \t]+/,"",$2); - starting at the beginning (^) replace all (+ = one or more, greedy)
consecutive tabs and spaces with an empty string
gsub(/[ \t]+$/,"",$2)} - do the same, but now for all space up to the end of string ($)
1 - ="true". Shorthand for "use default action", which is print $0
- that is, print the entire (modified) line
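The two-gsub approach can be tried on the sample input as follows. This is only a sketch: the function name trim_col2 is an assumption, and the square brackets in the output are added purely to make any leftover spaces visible.

```shell
# trim_col2 is a hypothetical helper name used only for this illustration.
trim_col2() {
    awk 'BEGIN { FS = OFS = "," }
         { gsub(/^[ \t]+/, "", $2)    # strip leading blanks from field 2
           gsub(/[ \t]+$/, "", $2)    # strip trailing blanks from field 2
           print "[" $0 "]" }' "$@"
}

printf '%s\n' 'Name, Order  ' 'Trim, working' | trim_col2
# [Name,Order]
# [Trim,working]
```

Assigning to $2 makes awk rebuild the line with OFS, which is why OFS is set to a comma as well.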
remove leading and trailing white space in 2nd column
awk 'BEGIN{FS=OFS=","}{gsub(/^[ \t]+/,"",$2);gsub(/[ \t]+$/,"",$2)}1' input.txt
another way by one gsub:
awk 'BEGIN{FS=OFS=","} {gsub(/^[ \t]+|[ \t]+$/, "", $2)}1' infile
Warning by @Geoff: see my note below; only one of the suggestions in this answer works (though on both columns).
I would use sed:
sed 's/, /,/' input.txt
This will remove one leading space after the , .
More general might be the following; it removes an optional space or tab after every , (\? matches at most one character, so use * in its place to strip a run of spaces and/or tabs):
sed 's/,[ \t]\?/,/g' input.txt
It will also work with more than two columns because of the global modifier /g
@Floris asked in discussion for a solution that removes leading and trailing whitespace in each column (even the first and last) while not removing whitespace in the middle of a column:
sed 's/[ \t]\?,[ \t]\?/,/g; s/^[ \t]\+//g; s/[ \t]\+$//g' input.txt
EDIT by @Geoff: I've appended the input file name to this one, and now it only removes all leading & trailing spaces (though from both columns). The other suggestions within this answer don't work. But try: " Multiple spaces , and 2 spaces before here "
IMO sed is the optimal tool for this job. However, here comes a solution with awk because you've asked for that:
awk -F', ' '{printf "%s,%s\n", $1, $2}' input.txt
Another simple solution that comes to mind, if you want to remove all spaces, is tr -d:
cat input.txt | tr -d ' '
I just came across this. The correct answer is:
awk 'BEGIN{FS=OFS=","} {gsub(/^[[:space:]]+|[[:space:]]+$/,"",$2)} 1'
just use a regex as a separator:
', *' - for leading spaces
' *,' - for trailing spaces
for both leading and trailing:
awk -F' *,? *' '{print $1","$2}' input.txt
Simplest solution is probably to use tr
$ cat -A input
^I Name, ^IOrder $
Trim, working $
$ tr -d '[:blank:]' < input | cat -A
The following seems to work:
awk -F',[[:blank:]]*' '{$2=$2}1' OFS="," input.txt
If it is safe to assume only one set of spaces in column two (which is the original example):
awk '{print $1$2}' /tmp/input.txt
Adding another field, e.g. awk '{print $1$2$3}' /tmp/input.txt will catch two sets of spaces (up to three words in column two), and won't break if there are fewer.
If you have an indeterminate (large) number of space delimited words, I'd use one of the previous suggestions, otherwise this solution is the easiest you'll find using awk.
