Printing multiple parts of the same line matching a pattern using bash

I am writing a Unix command to get lines matching abcd at positions 87-90; for the lines matching this criterion, it should give me positions 10-15, 124-128, and 250-265. I tried something like this:
grep -h abcd sample.txt |cut -c 10-15,cut -c 124-128,cut -c 250-260
Though this is syntactically wrong, I hope it conveys what I am trying to achieve. Could you help me concatenate the results of the multiple cuts?

cut -c accepts a list of characters. As described in the man page, "each list is made up of one range, or many ranges separated by commas."
grep -h abcd sample.txt | cut -c 10-15,124-128,250-260
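Note that grep abcd matches abcd anywhere on the line. If the match has to sit exactly at positions 87-90, as the question states, awk's substr can test that position directly. A minimal sketch, assuming the 250-260 range from the command above; the function name extract_fields is made up for illustration:

```shell
# Hypothetical helper: print columns 10-15, 124-128 and 250-260 of every
# line whose columns 87-90 are exactly "abcd".
# Reads the named file(s), or standard input when no file is given.
extract_fields() {
    awk 'substr($0, 87, 4) == "abcd" {
        # substr(string, start, length): the three ranges are concatenated
        print substr($0, 10, 6) substr($0, 124, 5) substr($0, 250, 11)
    }' "$@"
}
```

It would be called as extract_fields sample.txt, or at the end of a pipeline.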

Related

extract and display matching strings with similar pattern

I have an extremely long line. It contains a lot of strings with a similar pattern, like below:
t[0-4]_vmdk_[a-z]_anything
t followed by a single digit in the range 0-4, then '_vmdk_', then a string of any length made of [a-z], and finally "_" at the end. The rest could be anything.
Ex:
asdfasfsa/_asdf**t2_vmdk_abc_**badfad**t3_vmdk_xyz_**asdfasdf**t1_vmdk_efg_**asbafdfb....
Please help me to display all such strings. Thank you!
The simplest is probably with grep. If the input string comes from the standard output of another command (e.g. echo):
$ str='asdfasfsa/_asdf**t2_vmdk_abc_**badfad**t3_vmdk_xyz_**asdfasdf**t1_vmdk_efg_**asbafdfb....'
$ echo "$str" | grep -o 't[0-4]_vmdk_[a-z]*'
t2_vmdk_abc
t3_vmdk_xyz
t1_vmdk_efg
Explanation: the -o option prints "only the matched (non-empty) parts of a matching line".
If the input strings are stored in a file:
$ grep -o 't[0-4]_vmdk_[a-z]*' file.txt
t2_vmdk_abc
t3_vmdk_xyz
t1_vmdk_efg

How to use grep/awk/sed to print until a certain character?

I am a complete beginner at shell scripting and I am trying to iterate through a set of JSON files and extract a certain field from each. Each JSON file has a "country":"xxx" field. In each JSON file there are 10k occurrences of the same field with the same country name, so I need only the first occurrence, which I can get using "-m 1".
I tried to use grep for this but could not figure out how to extract the whole field, including the country name, at the first occurrence in each file.
for FILE in *.json;
do
grep -o -a -m 1 -h -r '"country":"' $FILE;
done
I tried to add another pipe with the pattern below, but it did not work:
| egrep -o '^[^"]+'
Actual Output:
"country":"
"country":"
"country":"
Desired Output:
"country":"romania"
"country":"united kingdom"
"country":"tajikistan"
but I need the whole thing. Any help would be great. Thanks
There is one general answer to the question "I only want the first occurrence", and that answer is:
... | head -n 1
This means: whatever you do, take the head (the first lines); the -n switch lets you say how many lines you want (one in this case).
The same can be done for the last occurrence(s), using tail instead of head (tail also accepts the -n switch).
After trying many things, I found the pattern I was looking for:
grep -Po '"country":.*?[^\\]",' "$FILE" | head -n 1
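An alternative sketch that avoids -P, assuming the country value never contains an escaped double quote. The helper name first_country is hypothetical; -m is a GNU/BSD grep extension, and head -n 1 is kept because -o can print several matches from the one matching line:

```shell
# Hypothetical helper: print the first "country":"<value>" pair of one file.
# Assumption: the value itself contains no escaped double quotes.
first_country() {
    # [^"]* consumes everything up to the closing quote of the value;
    # -m 1 stops reading after the first matching line.
    grep -o -m 1 '"country":"[^"]*"' "$1" | head -n 1
}
```

Inside the loop from the question this would be called as first_country "$FILE".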

How to capture digits in front of specific keyword in bash

Imagine following string:
<tr><td>12,3</td><td>deg</td><td>23,4</td><td>humi</td><td>34,5</td><td>press</td></tr>
In bash, how do I extract 23.4, based on the condition that it is followed by humi?
grep -o works well for this sort of thing. I'm sure performance would be better with a single sed command than two greps but that's rarely a serious concern.
X='<tr><td>12,3</td><td>deg</td><td>23,4</td><td>humi</td><td>34,5</td><td>press</td></tr>'
echo "$X" | grep -o '[0-9,.]*</td><td>humi' | grep -o '[0-9,.]*'
# Result: 23,4
You can additionally pipe through tr , . to get English number format.
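For the single sed command mentioned above, a possible sketch. It assumes the value always sits in the cell directly before the humi cell; the variable name humi is only illustrative:

```shell
X='<tr><td>12,3</td><td>deg</td><td>23,4</td><td>humi</td><td>34,5</td><td>press</td></tr>'

# Capture the run of digits, commas and dots in the cell directly before the
# "humi" cell (| is used as the s-command delimiter so / needs no escaping),
# then turn the decimal comma into a dot.
humi=$(printf '%s\n' "$X" \
    | sed -n 's|.*<td>\([0-9,.]*\)</td><td>humi</td>.*|\1|p' \
    | tr ',' '.')
echo "$humi"   # prints 23.4
```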

Bash grep keyword plus trailing numbers upto first whitespace

I'm looking to filter tcpdump output and extract only two constant element names and their strings of changing numbers, each of which is followed by whitespace and more unwanted data. Is there a way to extract only up to the first whitespace using grep or sed? I've been using bash for about a month and this is the first time my Google-fu has failed me.
Example output: red23:34:23 black23:43 purple00:55:22 yellow32:43 green10:10 (color names are constant)
Looking to extract: black23:43 yellow32:43
The -o option in grep prints only the matching part, so to get just black and the numbers you might do this:
output='red23:34:23 black23:43 purple00:55:22 yellow32:43 green10:10'
echo "$output" | grep -Eo 'black[0-9]+:[0-9]+'
and you could parameterize it like so:
color='green'
echo "$output" | grep -Eo "${color}[0-9]+:[0-9]+"
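Since the question asks for both black and yellow, and for everything up to the first whitespace, one possible variant uses alternation and a non-whitespace class. This is a sketch; the variable name wanted is illustrative:

```shell
output='red23:34:23 black23:43 purple00:55:22 yellow32:43 green10:10'

# Match either color name, then everything up to the next whitespace.
wanted=$(printf '%s\n' "$output" | grep -Eo '(black|yellow)[^[:space:]]*')
printf '%s\n' "$wanted"
# prints:
# black23:43
# yellow32:43
```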

Grep characters before and after match?

Using this:
grep -A1 -B1 "test_pattern" file
will produce one line before and after the matched pattern in the file. Is there a way to display not lines but a specified number of characters?
The lines in my file are pretty big so I am not interested in printing the entire line but rather only observe the match in context. Any suggestions on how to do this?
3 characters before and 4 characters after
$> echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}'
23_string_and
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
This will match up to 5 characters before and after your pattern. The -o switch tells grep to only show the match and -E to use an extended regular expression. Make sure to put the quotes around your expression, else it might be interpreted by the shell.
You could use
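awk 'match($0, /test_pattern/) {
    # Clamp the start so a match near the beginning of the line keeps its trailing context.
    start = RSTART > 10 ? RSTART - 10 : 1
    print substr($0, start, RSTART - start + RLENGTH + 10)
}' file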
You mean, like this:
grep -o '.\{0,20\}test_pattern.\{0,20\}' file
?
That will print up to twenty characters on either side of test_pattern. The \{0,20\} notation is like *, but specifies zero to twenty repetitions instead of zero or more. The -o says to show only the match itself, rather than the entire line.
I'll never easily remember these cryptic command modifiers so I took the top answer and turned it into a function in my ~/.bashrc file:
cgrep() {
# For files whose lines are tens of thousands of characters long:
# use cgrep to print 30 characters before and after the search pattern.
if [ $# -eq 2 ] ; then
# Format was 'cgrep "search string" /path/to/filename'
grep -o -P ".{0,30}$1.{0,30}" "$2"
else
# Format was 'cat /path/to/filename | cgrep "search string"'
grep -o -P ".{0,30}$1.{0,30}"
fi
} # cgrep()
Here's what it looks like in action:
$ ll /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
-rw-r--r-- 1 rick rick 25780 Jul 3 19:05 /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
$ cat /tmp/rick/scp.Mf7UdS/Mf7UdS.Source | cgrep "Link to iconic"
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
$ cgrep "Link to iconic" /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
The file in question is one continuous 25K line and it is hopeless to find what you are looking for using regular grep.
Notice the two different ways you can call cgrep, which parallel how grep itself is called.
There is a "niftier" way of creating the function where "$2" is only passed when set which would save 4 lines of code. I don't have it handy though. Something like ${parm2} $parm2. If I find it I'll revise the function and this answer.
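One way the shorter form hinted at above might look, as a sketch: the ${2:+"$2"} parameter expansion passes the file name only when a second argument is supplied. It assumes a grep with -P support, as in the original function.

```shell
cgrep() {
    # ${2:+"$2"} expands to the quoted file name when a second argument is
    # given and to nothing otherwise, so grep falls back to standard input.
    grep -o -P ".{0,30}$1.{0,30}" ${2:+"$2"}
}
```

Both call styles shown earlier (file name as second argument, or input through a pipe) should keep working with this version.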
With gawk, you can use the match function:
x="hey there how are you"
echo "$x" |awk --re-interval '{match($0,/(.{4})how(.{4})/,a);print a[1],a[2]}'
ere are
If you are OK with perl, a more flexible solution follows. This prints three characters before the pattern, then the pattern itself, and then five characters after it:
echo hey there how are you |perl -lne 'print "$1$2$3" if /(.{3})(there)(.{5})/'
ey there how
This can also be applied to words instead of just characters. The following prints one word before the matching string:
echo hey there how are you |perl -lne 'print $1 if /(\w+) there/'
hey
The following prints one word after the pattern:
echo hey there how are you |perl -lne 'print $2 if /(\w+) there (\w+)/'
how
The following prints one word before the pattern, then the pattern itself, and then one word after it:
echo hey there how are you |perl -lne 'print "$1$2$3" if /(\w+)( there )(\w+)/'
hey there how
If using ripgrep, this is how you would do it:
rg -o ".{0,5}test_pattern.{0,5}" test.txt
You can use one regexp grep to find the match and a second grep to highlight it:
echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' | grep string
23_string_and
With ugrep you can specify -ABC context with option -o (--only-matching) to show the match with extra characters of context before and/or after the match, fitting the match plus the context within the specified -ABC width. For example:
ugrep -o -C30 pattern testfile.txt
gives:
1: ... long line with an example pattern to match. The line could...
2: ...nother example line with a pattern.
On a terminal the same output is shown with color highlighting. Multiple matches on a line are either shown with [+nnn more], or with option -k (--column-number) to show each individually with context and the column number.
The context width is the number of Unicode characters displayed (UTF-8/16/32), not just ASCII.
I personally do something similar to the posted answers. But the dot key, like any keyboard key, can be tapped or held down, and I often don't need much context (if I needed more I might use lines, as with grep -C, but like you I often don't want the lines before and after). So I find it much quicker, when entering the command, to just type one dot per character of context: tap the key a few times for a few characters, or hold it down for more.
e.g. echo zzzabczzzz | grep -o '.abc..'
This matches the abc pattern with one character before and two after (in regex language, a dot matches any character). Others used the dot too, but with curly braces to specify repetition.
If I wanted to be strict about between 0 and x characters, or exactly y characters, then I'd use the curlies and -P, as others have done.
There is a setting for whether the dot matches a newline, but you can look into that if it's a concern.
