grep - how to display another word instead of the matching of grep - shell

Given input like:
ID VALUE
technique lol
technology case
london knife
ocean sky
I'm currently using
grep -Eo '^[^ ]+' FILE | grep "tech"
to match every word that contains "tech" in the ID column.
In this case, it displays:
technique
technology
However, can anyone tell me how to display the word from the second column that corresponds to the match in the first column?
For example, how to display the words:
lol
case
(display the value instead of the key)
Also, how can I display the key (as above) and the value separated by "=" (without any spaces), like this:
key=value
Thanks

You can grep for lines starting with "tech" and then just display the second column. The exact format depends on how your input file columns are separated. If they are tab separated:
grep '^tech' FILE | cut -f 2
If they are space separated:
grep '^tech' FILE | tr -s ' ' $'\t' | cut -f 2
This "squeezes" repeated spaces and replaces them with a single tab character.
For your second question, you can use
sed -n '/^tech/ s/[[:space:]]\+/=/p' FILE
This means "don't print (-n); on lines matching ^tech, make the substitution and print".

Using awk:
awk '$1 ~ "tech" {print $2}' < inputfile
or with key=value
awk '$1 ~ "tech" {print $1"="$2}' < inputfile
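Note that $1 ~ "tech" matches tech anywhere in the first column, not only at its start. If only keys beginning with the prefix are wanted, a minimal sketch along these lines might help; the function name kv_for_prefix and the sample data are made up for illustration.

```shell
# Hypothetical helper: print key=value for rows (read from stdin)
# whose first column starts with the given prefix.
kv_for_prefix() {
    # index($1, p) == 1 is true only when p occurs at the very start of $1;
    # it is a plain string test, so no regex is involved.
    awk -v p="$1" 'index($1, p) == 1 { print $1 "=" $2 }'
}

printf '%s\n' 'ID VALUE' 'technique lol' 'technology case' 'london knife' 'ocean sky' |
    kv_for_prefix tech
# technique=lol
# technology=case
```

A key such as biotech would be matched by $1 ~ "tech" but is skipped here.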

Related

Remove starting substring http from strings using AWK?

I'm wondering whether there is a better and cleaner way to remove strings at the beginning and end of each line in a file using AWK only.
Here's what I got so far
cat results.txt | awk '{gsub("https://", "") ;print}' | tr -d ":443"
File: results.txt
https://www.google.com:443
https://www.tiktok.com:443
https://www.instagram.com:443
To get the result
www.google.com
www.tiktok.com
www.instagram.com
With GNU awk.
Use / and : as field separators and print fourth column:
awk -F '[/:]' '{print $4}' results.txt
Or use https:// and : as field separators and print second column:
awk -F 'https://|:' '{print $2}' results.txt
Output:
www.google.com
www.tiktok.com
www.instagram.com
If it's a list of URLs like that, you could take advantage of the fact that the field separator in awk can be a regular expression:
awk -F':(//)?' '{print $2}'
This says that your field separator is ": optionally followed by //", which would split each line into:
[$1] https
[$2] www.google.com
[$3] 443
And then we print out only field $2.
cat results.txt | awk '{gsub("https://", "") ;print}' | tr -d ":443"
I think you are misunderstanding what tr -d does: it deletes the enumerated characters (not a substring). It only seems to do what you want because your test input
https://www.google.com:443
https://www.tiktok.com:443
https://www.instagram.com:443
does not contain any :, 4 or 3 that should be kept. If you need a test case that shows the malfunction, try
https://www.normandy1944.info:443
Also, the code above features an anti-pattern known as useless use of cat, as GNU AWK can read the file on its own, that is
cat results.txt | awk '{gsub("https://", "") ;print}'
can be written more succinctly as
awk '{gsub("https://", "") ;print}' results.txt
I would rewrite your whole pipeline (cat, awk, tr) as a single awk command, as follows
awk '{gsub("^https://|:443$","");print}' results.txt
Explanation: replace https:// at the start of the line (^) or (|) :443 at the end of the line ($) with an empty string (i.e. delete these parts), then print. Note that ^ and $ prevent deleting https:// and :443 in the middle of a string, though feel free to remove ^ and $ if you find such cases unlikely.
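To see the difference on the problematic test case, here is a small sketch comparing the original pipeline with the anchored substitution. The function name strip_scheme_and_port is only illustrative.

```shell
# Hypothetical helper: strip a leading https:// and a trailing :443 from each line.
strip_scheme_and_port() {
    awk '{ gsub(/^https:\/\/|:443$/, ""); print }'
}

url='https://www.normandy1944.info:443'

# Original pipeline: tr -d deletes every ':', '4' and '3' character.
printf '%s\n' "$url" | awk '{ gsub("https://", ""); print }' | tr -d ':443'
# www.normandy19.info

# Anchored awk substitution: only the prefix and the suffix are removed.
printf '%s\n' "$url" | strip_scheme_and_port
# www.normandy1944.info
```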

how to use cut command -f flag as reverse

This is a text file called a.txt
ok.google.com
abc.google.com
I want to select every subdomain separately
cat a.txt | cut -d "." -f1 (it select ok From left side)
cat a.txt | cut -d "." -f2 (it select google from left side)
Is there any way, so I can get result from right side
cat a.txt | cut (so it can select com From right side)
There could be a few ways to do this. One that comes to mind is a rev + cut + rev solution: rev reverses each input line, cut then splits on . and prints fields counted from the left (which are really counted from the right, because the line is reversed), and a second rev restores the selected text to its original order.
rev Input_file | cut -d'.' -f 1 | rev
You can use awk to print the last field:
awk -F. '{print $NF}' a.txt
-F. sets the field separator to "."
$NF is the last field
And you can give your file directly as an argument, so you can avoid the famous "Useless use of cat"
For other fields counted from the end, you can use expressions, as suggested in the comment by @sundeep or described in the user's guide under
4.3 Nonconstant Field Numbers. For example, to get the domain before the TLD, you can subtract 1 from the number of fields NF:
awk -F. '{ print $(NF-1) }' a.txt
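The same idea can be generalized to "the n-th field from the right". The following is a rough sketch, with field_from_right being a made-up name and the choice to skip lines that have too few fields being an assumption.

```shell
# Hypothetical helper: print the n-th dot-separated field counted from the right.
field_from_right() {
    # Lines with fewer than n fields are skipped instead of printing something misleading.
    awk -F. -v n="$1" 'NF >= n { print $(NF - n + 1) }'
}

printf '%s\n' ok.google.com abc.google.com | field_from_right 1
# com
# com
printf '%s\n' ok.google.com abc.google.com | field_from_right 3
# ok
# abc
```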
You might use sed with a quantifier for the grouped value repeated till the end of the string.
( Start group
\.[^[:space:].]+ Match 1 dot and 1+ occurrences of any char except a space or dot
){1} Close the group followed by a quantifier
$ End of string
Example
sed -E 's/(\.[^[:space:].]+){1}$//' file
Output
ok.google
abc.google
If the quantifier is {2} the output will be
ok
abc
Depending on what you want to do after getting the values then you could use bash for splitting your domain into an array of its components:
#!/bin/bash
IFS=. read -ra comps <<< "ok.google.com"
echo "${comps[-2]}"
# or for bash < 4.2
echo "${comps[${#comps[@]}-2]}"
google

How do i get the value present in first double quotes?

I'm currently writing a bash script to get the first value among the many comma separated strings.
I have a file that looks like this -
name
things: "water bottle","40","new phone cover",10
place
I just need to return the value in first double quotes.
water bottle
The value in first double quotes can be one word/two words. That is, water bottle can be sometimes replaced with pen.
I tried -
awk '/:/ {print $2}'
But this just gives
water
I wanted to comma separate it, but there's colon(:) after things. So, I'm not sure how to separate it.
How do i get the value present in first double quotes?
EDIT:
SOLUTION:
I used the below code since I particularly wanted to use awk -
awk '/:/' test.txt | cut -d\" -f2
A solution using the cut utility could be
cut -d\" -f2 infile > outfile
Using gnu awk you could make use of a capture group, and use a negated character class to not cross the , as that is the field delimiter.
awk 'match($0, /^[^",:]*:[^",]*"([^"]*)"/, a) {print a[1]}' file
Output
water bottle
The pattern matches
^ Start of string
[^",:]*: Optionally match any value except " and , and :, then match :
[^",]* Optionally match any value except " and ,
"([^"]*)" Capture in group 1 the value between double quotes
If the value is always between double quotes, a shorter option is to set the field separator to " and check whether field 1 contains a colon, although technically you would also get water bottle if there is only a leading double quote and no closing one.
awk -F'"' '$1 ~ /:/ {print $2}' file
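If the missing closing quote is a concern, one possible guard is to require at least three fields, since a properly closed first pair of quotes always produces text before, between and after the quotes. This is only a sketch; first_quoted is an illustrative name.

```shell
# Hypothetical helper: print the first double-quoted value on lines whose
# text before the first quote contains a colon.
first_quoted() {
    # With " as the separator, a closed first pair of quotes yields at least 3 fields.
    awk -F'"' 'NF >= 3 && index($1, ":") { print $2 }'
}

printf '%s\n' 'name' 'things: "water bottle","40","new phone cover",10' 'place' |
    first_quoted
# water bottle
```

A line such as things: "water (no closing quote) has only two fields and prints nothing.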
With your shown samples, please try following awk code.
awk '/^things:/ && match($0,/"[^"]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Explanation: the awk program checks whether the line starts with things: and uses the match function to find the text between the first and second ", which substr then prints.
Solution 1: awk
You can use a single awk command:
awk -F\" 'index($1, ":"){print $2}' test.txt > outfile
The -F\" sets the field separator to a " char, index($1, ":") condition makes sure Field 1 contains a : char (no regex needed) and then {print $2} prints the second field value.
Solution 2: awk + cut
You can use awk + cut:
awk '/:/' test.txt | cut -d\" -f2 > outfile
With awk '/:/' test.txt, you will extract line(s) containing a : char, and then the piped cut -d\" -f2 command will split the string with " as a separator and return the second item.
Solution 3: sed
Alternatively, you can use sed:
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' file > outfile
See the online demo:
#!/bin/bash
s='name
things: "water bottle","40","new phone cover",10
place'
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' <<< "$s"
# => water bottle
The command means
-n - the option suppresses the default line output
^[^"]*"\([^"]*\)".* - a POSIX BRE regex pattern that matches
^ - start of string
[^"]* - zero or more chars other than "
" - a " char
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
".* - a " char and the rest of the string.
\1 replaces the match with Group 1 value
p - only prints the result of a successful substitution.

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
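Explanation: this sets the field separator (-F) to | and prints the last character of the third field, which happens to be 3 for the sample input. Note that it does not actually count the delimiters, so it only matches the expected output by coincidence; OFS (the output field separator) has no effect here.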
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
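Combining this with the empty-line guard from the earlier answer gives something like the sketch below. The function name count_delims and the test data are illustrative only.

```shell
# Hypothetical helper: count 0x1e (octal 036) delimiters on each line of stdin.
count_delims() {
    # NF is 0 on an empty line, so guard against printing -1.
    awk -F '\036' '{ print (NF ? NF - 1 : 0) }'
}

printf 'Example\036Running\036123\036\n\nplain\n' | count_delims
# 3
# 0
# 0
```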
Awk may not be the best tool for this. Gnu grep has a cool -o option that prints each matching pattern on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e)
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it works. You'll get "1" printed twice because there are two matching patterns on the first line. Or try it with some regular ASCII characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.
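As a variation, awk itself can print the line number via NR, which avoids the extra cat -n. A minimal sketch, assuming the delimiter is written as the octal escape \036 (which tr understands) and using a made-up function name:

```shell
# Hypothetical helper: print "line_number delimiter_count" for each line of stdin.
delims_per_line() {
    # Keep only 0x1e characters and newlines, then report the length of each line.
    tr -cd '\036\n' | awk '{ print NR, length($0) }'
}

printf 'a\036b\036c\nplain\nd\036e\n' | delims_per_line
# 1 2
# 2 0
# 3 1
```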

Greping asterisk through bash

I am validating a few columns in a pipe-delimited file. My second column defaults to '*'.
E.g. data of file to be validated:
abc|* |123
def|** |456
ghi|* |789
The 2nd record has 2 stars due to erroneous data.
I tried it as:
Value_to_match="*"
unmatch_count=$(cat <filename> | cut -d'|' -f2 | awk '{$1=$1};1' | grep -vw "$Value_to_match" | sort -n | uniq | wc -l)
echo "$unmatch_count"
This gives me a count of 0, whereas I am expecting 1 (for **), since I used -w with grep for an exact match and -v to invert the match.
How can I grep **?
The problem here is that grep treats ** as a regular expression. To prevent this, use -F to use fixed strings:
grep -F '**' file
However, you have an unnecessarily big set of piped operations, while awk alone can handle it quite well.
If you want to check lines containing ** in the second column, say:
$ awk -F"|" '$2 ~ /\*\*/' file
def|** |456
If you want to count how many of such lines you have, say:
$ awk -F"|" '$2 ~ /\*\*/ {sum++} END {print sum}' file
1
Note the usage of awk:
-F"|" to set the field separator to |.
$2 ~ /\*\*/ to say: hey, in every line check if the second field contains two asterisks (remember we sliced lines by |). We are escaping the * because it has a special meaning as a regular expression.
If you want to output those lines that have just one asterisk in the second field (optionally followed by whitespace), say:
$ awk -F"|" '$2 ~ /^\*[[:space:]]*$/' file
abc|* |123
ghi|* |789
Or check for those not matching this regex with !~:
$ awk -F"|" '$2 !~ /^\*[[:space:]]*$/' file
def|** |456
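For the count the question asks about, a plain string comparison sidesteps the regex escaping altogether. The following is a minimal sketch, assuming the expected value is a single * with optional surrounding blanks; count_unmatched is a made-up name.

```shell
# Hypothetical helper: count rows whose second |-separated field,
# once surrounding blanks are trimmed, is not exactly "*".
count_unmatched() {
    awk -F'|' '
        {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim leading and trailing blanks
            if (v != "*") n++                # string comparison, no regex involved
        }
        END { print n + 0 }                  # n + 0 prints 0 when nothing was counted
    '
}

printf '%s\n' 'abc|* |123' 'def|** |456' 'ghi|* |789' | count_unmatched
# 1
```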

Resources