I need to check for the user ID in a weird looking string. I only want the lines that have it. How do I check for 4 integers in a row in the following sample strings?
"111/S/H0110//Jake, Greenfield ServiceRequest/bin/ksh"
"740/S/H5155//Jake, Greenfield/bin/ksh"
"90/S/Customer /usr/bin/ksh"
"740/S///Jake, Greenfield/bin/ksh"
In these examples I would want these lines to pass:
111/S/H0110//Jake, Greenfield ServiceRequest/bin/ksh
740/S/H5155//Jake, Greenfield/bin/ksh
and NOT these to pass:
90/S/Customer /usr/bin/ksh
740/S///Jake, Greenfield/bin/ksh
BONUS QUESTION
The ID can be anything from,
[A-Z][A-Z][0-9][0-9][0-9][0-9]
[0-9][0-9][0-9][0-9][0-9][0-9]
[A-Z]-[0-9][0-9][0-9][0-9]
meaning, for example:
7A7777
AA7777
A77777
A-7777
(though I would settle for "just" finding "7777" in the string)
The solutions below assume each line is an entry, and each entry is made up of fields delimited by a forward slash (/) character.
awk -F/ '$3~/[[:digit:]]{4}$/' filename
Awk is pretty efficient at it.
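A quick way to try the field-based approach is to run it against the four sample lines from the question. The sketch below assumes a scratch file created with mktemp, and it spells the four digits out as [0-9][0-9][0-9][0-9] so that it does not depend on interval-expression support in the local awk. The variable names are only illustrative.

```shell
# Write the question's sample lines to a scratch file (path chosen by mktemp).
samples=$(mktemp)
cat > "$samples" <<'EOF'
111/S/H0110//Jake, Greenfield ServiceRequest/bin/ksh
740/S/H5155//Jake, Greenfield/bin/ksh
90/S/Customer /usr/bin/ksh
740/S///Jake, Greenfield/bin/ksh
EOF

# Split on "/" and keep only lines whose third field ends in four digits.
kept=$(awk -F/ '$3 ~ /[0-9][0-9][0-9][0-9]$/' "$samples")
printf '%s\n' "$kept"
rm -f "$samples"
```

Only the two lines carrying H0110 and H5155 should be printed; the other two have a third field that is either empty or contains no digits.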
As indicated in the comments, this should do it:
grep -E '[A-Z]{2}[0-9]{4}|[0-9]{6}|[A-Z]-[0-9]{4}'
         ^^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^^^^^
               (1)          (2)         (3)
This matches the requirements:
[A-Z][A-Z][0-9][0-9][0-9][0-9] --> [A-Z]{2}[0-9]{4} (1)
[0-9][0-9][0-9][0-9][0-9][0-9] --> [0-9]{6} (2)
[A-Z]-[0-9][0-9][0-9][0-9] --> [A-Z]-[0-9]{4} (3)
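As a rough check of the three alternatives, the sketch below counts how many of four made-up lines match. The lines are hypothetical test data, not taken from the question: one ID per format, plus one line with no ID at all.

```shell
# Hypothetical test lines: one ID per format, plus one line without an ID.
lines=$(printf '%s\n' \
  'x/S/AA7777//name' \
  'x/S/123456//name' \
  'x/S/A-7777//name' \
  'x/S/Customer /usr/bin/ksh')

# Count the lines that match any of the three ID formats.
matches=$(printf '%s\n' "$lines" | grep -cE '[A-Z]{2}[0-9]{4}|[0-9]{6}|[A-Z]-[0-9]{4}')
echo "$matches"
```

The pattern is unanchored, so it also matches when one of these shapes appears inside a longer string. If that matters, matching against the third field only (as in the awk answers) is one way to tighten it.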
grep is the tool you are looking for:
grep '[0-9]\{4\}'
This awk command checks whether the ID field contains a letter-and-number combination. If it does, it prints the corresponding line.
$ awk -F/ '$3~/[A-Z-]*[0-9][A-Z0-9]*/ {print}' file
"111/S/H0110//Jake, Greenfield ServiceRequest/bin/ksh"
"740/S/H5155//Jake, Greenfield/bin/ksh"
If you want only the numbers in the ID field then try this command,
$ awk -F/ '$3~/[A-Z-]*[0-9][A-Z0-9]*/ { gsub (/[A-Z-]/,"",$3); print $3}' file
0110
5155
Related
I want to extract a value using "awk substring", counting the spaces as characters rather than treating them as separators.
For example, below is the input, and I want to extract the "29611", including space,
201903011232101029 2961104E3021 223 0 12113 5 15 8288 298233 0 45 0 39 4
I used this method, but it used space as a separator:
more abbas.dat | awk '{print substr($1,1,16),substr($1,17,25)}'
Expected output should be :
201903011232101029 2961
But it prints only
201903011232101029
My question is: how can we print using "substr" so that it counts spaces?
I know, I can use this command to get the desired output but it is not helpful for my objective
more abbas.dat | awk '{print substr($1,1,16),substr($2,1,5)}'
1st solution: With your shown samples, please try the following awk code. Written and tested in GNU awk. It uses awk's match function to get the required output.
To print the 1st field, followed by a varying number of spaces, followed by 5 digits from the 2nd field, use the following:
awk 'match($0,/^[0-9]+[[:space:]]+[0-9]{5}/){print substr($0,RSTART,RLENGTH)}' Input_file
OR, to print the first 16 digits of the 1st field and 5 digits from the 2nd field, including the variable-length run of spaces between the two fields:
awk 'match($0,/^([0-9]{16})[^[:space:]]+([[:space:]]+)([0-9]{5})/,arr){print arr[1] arr[2] arr[3]}' Input_file
2nd solution: Using GNU grep, please try the following, considering that the first 5 needed characters of your 2nd column can be anything (e.g. digits, letters, etc.).
grep -oP '^\S+\s+.{5}' Input_file
OR, to match only 5 digits in the 2nd field, make a minor change to the grep above.
grep -oP '^\S+\s+\d{5}' Input_file
If there is always one space, you can use the following command, which will print the first group plus the first 5 characters of the second group.
N.B. It's not clear in the question whether you want 4 or 5 characters but that can be adjusted easily.
more abbas.dat | awk '{print $1" "substr($2,1,5) }'
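I think the simplest way is to include "-Fs" in your command. This sets the field separator to the letter s, which does not occur in the data, so $1 holds the whole line, spaces included.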
more abbas.dat | awk -Fs '{print substr($1,1,16),substr($1,17,25)}'
$ awk '{print substr($0,1,24)}' file
201903011232101029 29611
If that's not all you need then edit your question to clarify your requirements.
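The difference between slicing the whole record and slicing a field can be seen side by side. This is a small sketch using the sample line from the question; the variable names are illustrative only.

```shell
line='201903011232101029 2961104E3021 223 0 12113 5 15 8288 298233 0 45 0 39 4'

# substr on $0 counts every character of the record, spaces included.
whole=$(printf '%s\n' "$line" | awk '{ print substr($0, 1, 24) }')

# substr on $1 only sees the first whitespace-delimited field (18 characters here).
field=$(printf '%s\n' "$line" | awk '{ print substr($1, 1, 24) }')

printf '[%s]\n[%s]\n' "$whole" "$field"
```

The first result includes the space and the following five digits, while the second stops at the end of the first field because $1 never contains the separator.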
I am trying to find a way to extract a word that sits between special characters and other words.
Example of the text:
description "CST 500M TEST/VPNGW/11040 X {} // test"
description "test2-VPNGW-110642 -VPNGW"
I am trying to get a result containing only the word that includes VPNGW:
TEST/VPNGW/11040
test2-VPNGW-110642
I tried with grep and awk, but it looks like my knowledge is not sufficient.
Printing with awk '{$1=""; $2=""; ... does not work because the whole word is not always in the same position.
Thanks for the help!
With grep you can output only the part of the string that matches the regex:
grep -o '[^ "]\+VPNGW[^ "]\+' file.name
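For a quick check against the two sample lines, the sketch below uses the same idea with extended regular expressions (-E, so + needs no backslash), which is an assumption made here for portability rather than part of the original answer. The scratch file comes from mktemp.

```shell
# The two sample lines from the question, written to a scratch file.
input=$(mktemp)
cat > "$input" <<'EOF'
description "CST 500M TEST/VPNGW/11040 X {} // test"
description "test2-VPNGW-110642 -VPNGW"
EOF

# Print only the matched part: VPNGW with at least one non-space,
# non-quote character on each side.
words=$(grep -Eo '[^ "]+VPNGW[^ "]+' "$input")
printf '%s\n' "$words"
rm -f "$input"
```

The trailing -VPNGW" on the second line is not reported, because nothing other than a quote follows VPNGW there.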
You could try something like:
grep -Eoi 'test.*[0-9]'
Of course this would be greedy and if there is another number after the ones in the required string it will grab up to there. Normally I would suggest an inverted test to stop at the thing you don't want:
grep -Eoi 'test[^ ]+'
The problem with this is like in your first example, there is more than one occurrence of the string 'test' and so the output for the first example is:
TEST/VPNGW/11040
test"
Of course, knowing what your real data looks like, you can make your own decision on what might suit best.
You could use the Perl regex engine in grep (-P instead of -E) with a look-ahead:
grep -Poi 'test[^ ]+(?= )'
Again though, if you have the string 'test' somewhere else on the line followed by a single space, this will still not work as desired.
Lastly, awk can do the job but you would need to cycle through each item or set RS to white space:
Option 1:
awk '{for(i=1;i<=NF;i++)if(tolower($i) ~ /test.*[0-9]/)print $i}'
Option 2:
awk 'tolower($0) ~ /test.*[0-9]/' RS="[[:space:]]+"
awk '/test2/{sub(/"/,"")}$0{print $4}/test2/{print $2}' file
TEST/VPNGW/11040
test2-VPNGW-110642
I am new to bash programming and I hit a roadblock.
I need to be able to calculate the largest record number within a txt file and store that into a variable within a function.
Here is the text file:
student_records.txt
12345,fName lName,Grade,email
64674,fName lName,Grade,email
86345,fName lName,Grade,email
I need to be able to get the largest record number ($1 or first field) in order for me to increment this unique record and add more records to the file. I seem to not be able to figure this one out.
First, I sort the file by the first field in descending order and then, perform this operation:
largest_record=$(awk-F,'NR==1{print $1}' student_records.txt)
echo $largest_record
This gives me the following error on the console:
awk-F,NR==1{print $1}: command not found
Any ideas? Also, any suggestions on how to accomplish this in the best way?
Thank you in advance.
largest=$(sort -t, -k1,1nr file | head -n 1 | cut -d, -f1)
You need spaces, and quotes
awk -F, 'NR==1{print $1}'
The command is awk; you need a space after it so that bash parses your command line properly. Otherwise it thinks the whole thing is the name of the command, which is what the error message is telling you.
Learn how to use the man command so you can learn how to invoke other commands:
man awk
This will tell you what the -F option does:
The -F fs option defines the input field separator to be the regular expression fs.
So in your case the field separator is a comma -F,
What follows in quotes is the program you want awk to interpret. It says to match a line with the pattern NR==1. NR is special: it is the record number, so this matches the first record. Following the pattern is the action you want awk to take when it matches, {print $1}, which prints the first (comma-separated) field of the line.
A better way to accomplish this would be to have awk find the largest record for you rather than sorting the file first. This gives you a solution that is linear in the number of records: you just want the max, so there is no need to do the extra work of sorting the whole file:
awk -F, 'BEGIN {max = 0} {if ($1>max) max=$1} END {print max}' student_records.txt
For this and other awk "one liners" look here.
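Putting the pieces together, a minimal sketch of capturing the maximum in a variable and deriving the next record number might look like this. It assumes the record numbers are non-negative integers and uses a scratch file from mktemp in place of student_records.txt; next_record is a hypothetical name.

```shell
# Scratch copy of the sample records from the question.
records=$(mktemp)
cat > "$records" <<'EOF'
12345,fName lName,Grade,email
64674,fName lName,Grade,email
86345,fName lName,Grade,email
EOF

# Single pass: remember the largest first field, compared numerically.
largest_record=$(awk -F, '$1 + 0 > max { max = $1 + 0 } END { print max + 0 }' "$records")

# Next unused record number, via shell arithmetic.
next_record=$((largest_record + 1))
echo "$largest_record $next_record"
rm -f "$records"
```

Adding 0 to $1 forces a numeric comparison, so the result does not depend on how awk would otherwise classify the field.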
I have following data file in bash. I want to search if the user entered webserver is present in the data file, if present it should return the Phase and Managed server name.
1 K1 tvtw1 tvtm1
1 K1 tvtw2 tvtw2
2 K2 tvtw26 tvtw26
3 k5 tvtw29 tvtm29
I tried grep "$webserver" serverList.lst | awk '{print $1}' but it returns multiple values for tvtw2. Is there any way to find exact server name from the list ?
If I understand correctly, if column 3 matches exactly, then you want to get the value of column 1:
awk -v serv=tvtw2 '$3 == serv {print $1}' serverList.lst
That is, we put the string you want to match in variable serv, and then use that as a filter expression in awk to match column 3 exactly.
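The question asks for both the phase and the managed server name. Assuming those are columns 1 and 4 of the file (the question does not label the columns, so this is a guess), the same exact-match filter can print both. The webserver variable and the scratch file are illustrative.

```shell
# Scratch copy of the server list from the question.
list=$(mktemp)
cat > "$list" <<'EOF'
1 K1 tvtw1 tvtm1
1 K1 tvtw2 tvtw2
2 K2 tvtw26 tvtw26
3 k5 tvtw29 tvtm29
EOF

webserver=tvtw2   # stands in for the user-entered value

# Exact string comparison on column 3, so tvtw26 and tvtw29 are not selected.
result=$(awk -v serv="$webserver" '$3 == serv { print $1, $4 }' "$list")
echo "$result"
rm -f "$list"
```

Passing the value with -v keeps it out of the awk program text, so it is compared as a plain string and not interpreted as a regular expression.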
You need to add a word boundary to your grep regex, so that tvtw26 won't be selected.
e.g. grep '\btvtw2\b' file ...
However, since you are already using awk, you can consider using awk for everything. @janos's answer shows how it can be done.
Try grep -w (-w stands for word regex)
However, grep | awk is a useless use of grep. See janos's answer for a more optimal solution.
I have a list of email addresses. I want to remove the ones that start with numbers and capital letters only. For example if the file contains:
0035EA7C#xxxx.com
A7C0035E#zzzz.com
email#yyy.com
I need to delete the first 2 lines in SSH.
Thanks!
You can use grep to get the desired result:
grep -v '^[0-9[:upper:]]\+#'
^ matches the beginning of a line. [...] is a character class; here it contains digits and uppercase letters. \+ requires it to occur one or more times. # stands for itself.
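To see the filter in action, the sketch below feeds it the three addresses from the question plus one extra, made-up address (Bob1#yyy.com) that mixes cases and should therefore survive. It uses -E so that + needs no backslash, which is a portability choice made here and not part of the answer above.

```shell
# Three addresses from the question plus one hypothetical mixed-case address.
kept=$(printf '%s\n' \
  '0035EA7C#xxxx.com' \
  'A7C0035E#zzzz.com' \
  'email#yyy.com' \
  'Bob1#yyy.com' |
  grep -vE '^[0-9[:upper:]]+#')
printf '%s\n' "$kept"
```

Because the class must cover everything between the start of the line and the #, an address is dropped only when its whole local part consists of digits and capital letters.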
With an awk solution:
awk '!/^[[:upper:]0-9]+#/' file.txt
This might work for you:
sed '/^[A-Z0-9]\{1,\}#/d' file