grep from first part of a line before delimiter - shell

I have to grep from this data- test1.txt:
1 - Billing_Type
604 - Customer_Name
2 - Contact_Name
3 - Customer_Phone_Number
4 - Contact_Phone_Number
5 - Customer_Type
6 - Reason_Code
7 - CALLE 1
8 - CALLE 2
9 - NUMERO
10 - ID
11 - Service Address
1700001031 - Serial_Number
1700001008 - STB_REF_AP_ID
1700001027 - Smart_Card_ID
I am comparing the first part of each line (e.g. 1700001031, 1, 8) against values read in a loop from another file, and then copying the second part (e.g. Serial_Number, Billing_Type, CALLE 2) into a variable.
This is the statement I have used:
sample statement
grep -w 1 test1.txt | cut -d'-' -f2 |tr -d ' '
The problem with this statement is that for the values 1 and 2 it outputs two lines.
For 1 as the ID, it prints:
Billing_Type
CALLE 1
because the ATTR_NAME 'CALLE 1' also contains the word 1.
How do I search in the first part only and get the second part, without creating any extra files?

You really want to use awk not grep for this:
$ awk -F' - ' '$1==1{print $2}' file
Billing_Type
$ awk -F' - ' '$1==7{print $2}' file
CALLE 1
$ awk -F' - ' '$1==1700001031{print $2}' file
Serial_Number
This does a numeric equality test against the first field $1 and, if the line matches, it prints the second field $2, using ' - ' (space, dash, space) as the field separator.
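Since the question looks the IDs up in a loop, the awk test can be wrapped in a small function that takes the ID as a parameter. This is only a sketch: the function name lookup_attr, the temporary sample file and the list of IDs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the name stored against a numeric ID.
lookup_attr() {
    # $1 = ID to look up, $2 = data file in "ID - Name" format
    awk -F' - ' -v id="$1" '$1 == id { print $2 }' "$2"
}

# Small sample in the same format as test1.txt (temporary path is illustrative).
sample=$(mktemp)
printf '%s\n' '1 - Billing_Type' '7 - CALLE 1' '10 - ID' \
    '1700001031 - Serial_Number' > "$sample"

# Look up several IDs in a loop and keep each result in a variable.
for id in 1 7 1700001031; do
    name=$(lookup_attr "$id" "$sample")
    printf '%s -> %s\n' "$id" "$name"
done
```

Passing the ID with -v keeps the awk program itself constant, so nothing from the input is ever pasted into the awk source.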
With GNU Grep you could do the following but the awk approach is definitely the way to go:
$ grep -Po '^1\s+-\s+\K.*' file
Billing_Type
$ grep -Po '^7\s+-\s+\K.*' file
CALLE 1
$ grep -Po '^1700001031\s+-\s+\K.*' file
Serial_Number
This matches the start of the line ^, then the given number followed by one or more whitespace characters, a dash, and more whitespace \s+-\s+. \K is part of Perl-compatible regular expressions (the -P option), so don't count on it being widely available; it makes the previously matched part of the string be forgotten. Finally we match the rest of the line .*, and only this is printed thanks to the -o option and the \K.
The approach with sed would be to match the line and then substitute the start of the line with an empty string:
$ sed -rn '/^1\s/{s/^[0-9]+\s+-\s+//p}' file
Billing_Type
$ sed -rn '/^7\s/{s/^[0-9]+\s+-\s+//p}' file
CALLE 1
$ sed -rn '/^1700001031\s/{s/^[0-9]+\s+-\s+//p}' file
Serial_Number

You just need to add ^ before the search ID in your command. Note that tr -d ' ' also removes the space inside values such as CALLE 1:
grep -w '^1' test1.txt | cut -d'-' -f2 |tr -d ' '
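If the spaces inside a name need to survive, one possible variant keeps the anchored grep but trims only the leading space after the dash. This is a sketch, not part of the original answer; the function name lookup_with_grep and the sample file are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: anchored whole-word grep, then drop the ID column.
lookup_with_grep() {
    # $1 = ID, $2 = data file in "ID - Name" format
    # -f2- keeps any later dashes in the name; sed trims leading spaces only.
    grep -w "^$1" "$2" | cut -d'-' -f2- | sed 's/^ *//'
}

# Illustrative sample data in the question's format.
data=$(mktemp)
printf '%s\n' '1 - Billing_Type' '7 - CALLE 1' '10 - ID' > "$data"

lookup_with_grep 7 "$data"
```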

Related

Adjusting column padding in bash

Any idea how I can format the output as follows?
Input:
1 GATTT
2 ATCGT
Desired output:
1 GATTT
2 ATCGT
I tried the following and it did not work
cut -c7,1-6,8-
$ awk -v OFS='\t' '{print $1,$2}' input
1 GATTT
2 ATCGT
or
$ awk '{print $1 "\t" $2}' input
sed can also be used:
sed "s/[[:digit:]]* .*/ &/g" input
1 GATTT
2 ATCGT
I'm assuming that the original whitespace was 6 spaces, based on your cut command. The easiest way to knock this out with simple bash commands is to use a tab as the separator in the output.
echo "      1 GATTT" | cut -d ' ' -f 7- | tr ' ' '\t'
The cut command uses a space character as the delimiter and takes everything from field 7 on. Then the tr (translate) command converts the remaining space to a tab.
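If the amount of leading whitespace is not known in advance, the shell's own read can do the splitting, because it discards leading blanks with the default IFS. A minimal sketch, assuming two-column input; the function name reformat_cols is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: re-emit two whitespace-separated columns, tab-separated.
reformat_cols() {
    # read strips leading blanks; the rest of the line lands in $seq
    while read -r num seq; do
        printf '%s\t%s\n' "$num" "$seq"
    done
}

printf '      1 GATTT\n      2 ATCGT\n' | reformat_cols
```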

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
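Explanation: -F"|" sets the field separator to |, and substr($3,length($3)) prints the last character of the third field. OFS="|" sets the output field separator, which has no effect here because only one value is printed. Note that this prints 3 for the sample line only because the third field, 123, ends in 3; it does not count the delimiters.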
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
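To try this without a real data file, test input containing the actual \036 byte can be produced with printf. The sketch below combines the octal separator with the empty-line guard from the earlier answer; the function name count_delims and the sample lines are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the number of \036 delimiters on each input line.
count_delims() {
    # NF is 0 on an empty line, so guard against printing -1
    awk -F '\036' '{ print (NF ? NF - 1 : 0) }' "$@"
}

# printf turns \036 into the real record-separator byte.
printf 'Example\036Running\036123\036\n\nno delimiters here\n' | count_delims
```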
Awk may not be the best tool for this. GNU grep has a useful -o option that prints each match on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e):
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it works. You'll get "1" printed twice because there are two matches on the first line. Or try it with some regular ASCII characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the delimiter count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.
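The tr approach can also be written without bash's $'...' quoting, since tr itself understands octal escapes, and awk's NR can supply the line number. A sketch under those assumptions; the function name count_per_line is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "line-number count" for every input line.
count_per_line() {
    # Delete everything except \036 and newline, then measure what is left.
    tr -cd '\036\n' | awk '{ print NR, length($0) }'
}

printf 'a\036b\036c\nplain\nd\036e\036f\036g\n' | count_per_line
```

Lines without any delimiter are reported with a count of 0 instead of being skipped.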

getting a column of a specific line in bash

I have this command :
id=$(xl list|egrep $Name| tr -s ' ' | cut -d ' ' -f 2)
where xl list outputs something like this:
Name ID Mem VCPUs State Time(s)
Domain-0 0 5923 8 r----- 4266.0
new_redhat9-clone 3 1027 1 r----- 1019.6
new_redhat9 4 1027 1 -b---- 40.1
Actually, I want to get the ID of a given Name. This works when Name=new_redhat9-clone (it returns 3) but doesn't work when Name=new_redhat9 (it returns both 3 and 4).
What is wrong?
grep prints every line that matches the pattern. egrep new_redhat9 matches both "new_redhat9" and "new_redhat9-clone". Try adding whitespace (or \t) after the pattern, like this:
id=$(xl list|egrep 'new_redhat9 '| tr -s ' ' | cut -d ' ' -f 2)
You could use awk instead of the egrep, tr and cut commands:
id=$(xl list | awk '$1=="new_redhat9" {print $2}')
The awk command searches for the exact string new_redhat9 in the first column of the xl list output. If it finds a match, the value of column 2 of that record is stored in the variable id.
You can check the result with the echo $id command.
If the name is stored in a variable, try the command below:
id=$(xl list | awk -v var="$Name" '$1==var {print $2}')
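The exact-match behaviour can be checked without a Xen host by feeding awk a copy of the sample table. In this sketch xl_list_sample is a made-up stand-in for xl list, and get_domain_id is a hypothetical wrapper name; skipping the header with NR > 1 is an extra precaution not present in the answer above.

```shell
#!/bin/sh
# Hypothetical stand-in for `xl list`, reproducing the sample output.
xl_list_sample() {
    cat <<'EOF'
Name                 ID   Mem VCPUs State   Time(s)
Domain-0              0  5923     8 r-----  4266.0
new_redhat9-clone     3  1027     1 r-----  1019.6
new_redhat9           4  1027     1 -b----    40.1
EOF
}

# get_domain_id NAME: print the ID whose Name column equals NAME exactly.
get_domain_id() {
    awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

Name=new_redhat9
id=$(xl_list_sample | get_domain_id "$Name")
echo "$id"
```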

how can I get the index of a given occurrence of a character which is repeated several times in a TEXT line using SHELL (BASH) script

I have a Text string like below
"/path/to/log/file/LOG_FILE.log.2013-10-02-15:2013-10-02 15:46:57.809 INFO - TTT005|Receive|0000293|N~0000284~YOS~TTT005~ ~000~YC~|YOS TYOS-YCUPDT1-H 20131002154657669284YCARR TTT005 Y0TD04 |1|0150520106050|001|051052020603|003|015030010101502702060510520101|000||000|| "
Here "|" is repeated several times within the string and I need to get the index of 4th occurrence of "|" character using shell-script (BASH) command. I tried to find a way using grep command's options.
Thanks.
Using awk you can do:
awk -F '|' '{print index($0, $5)-1}' file
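This prints the character position of the fourth pipe on each line of the file. It relies on the text of the fifth field not appearing earlier in the line, since index() returns the first match.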
grep can print the byte-offset; when used with -o it prints the byte-offset of the matching part.
$ string="/path/to/log/file/LOG_FILE.log.2013-10-02-15:2013-10-02 15:46:57.809 INFO - TTT005|Receive|0000293|N~0000284~YOS~TTT005~ ~000~YC~|YOS TYOS-YCUPDT1-H 20131002154657669284YCARR TTT005 Y0TD04 |1|0150520106050|001|051052020603|003|015030010101502702060510520101|000||000||"
$ grep -ob "[^|]*" <<< "${string}" | sed '5!d' | cut -d: -f1
132
Alternatively, without using grep:
$ newstring=$(echo "${string}" | cut -d\| -f5-)
$ echo $(( ${#string} - ${#newstring} ))
132
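For an arbitrary occurrence number, the position can also be found by stepping through the line with awk's index() and substr(). This is a sketch rather than one of the original answers; the function name nth_index is hypothetical, it expects a single literal character, and it reports 1-based positions (0 when there are fewer occurrences than requested).

```shell
#!/bin/sh
# Hypothetical helper: 1-based position of the n-th occurrence of a character.
nth_index() {
    # $1 = single literal character, $2 = occurrence number; lines come from stdin
    awk -v ch="$1" -v n="$2" '{
        pos = 0; rest = $0
        for (i = 1; i <= n; i++) {
            k = index(rest, ch)          # next occurrence within the remainder
            if (k == 0) { pos = 0; break }
            pos += k                     # accumulate the absolute position
            rest = substr(rest, k + 1)   # continue after that occurrence
        }
        print pos
    }'
}

printf 'a|b|a|b|c\n' | nth_index '|' 4
```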

How to get the second column from command output?

My command's output is something like:
1540 "A B"
6 "C"
119 "D"
The first column is always a number, followed by a space, then a double-quoted string.
My purpose is to get the second column only, like:
"A B"
"C"
"D"
I intended to use <some_command> | awk '{print $2}' to accomplish this. But the question is, some values in the second column contain space(s), which happens to be the default delimiter for awk to separate the fields. Therefore, the output is messed up:
"A
"C"
"D"
How do I get the second column's value (with paired quotes) cleanly?
Use -F [field separator] to split the lines on "s:
awk -F '"' '{print $2}' your_input_file
or for input from pipe
<some_command> | awk -F '"' '{print $2}'
output:
A B
C
D
If you can use something other than awk, then try this instead:
echo '1540 "A B"' | cut -d' ' -f2-
-d is a delimiter, -f is the field to cut and with -f2- we intend to cut the 2nd field until end.
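The same idea, dropping everything up to and including the first space, can be sketched for the whole sample and compared with an awk equivalent. The function names second_col_cut and second_col_awk are made up for illustration, and both assume exactly one space after the number, as stated in the question.

```shell
#!/bin/sh
# Hypothetical helpers: keep everything after the first space of each line.
second_col_cut() {
    cut -d' ' -f2-
}

second_col_awk() {
    # Remove the leading non-space run plus one space, then print the rest.
    awk '{ sub(/^[^ ]+ /, ""); print }'
}

printf '%s\n' '1540 "A B"' '6 "C"' '119 "D"' | second_col_cut
printf '%s\n' '1540 "A B"' '6 "C"' '119 "D"' | second_col_awk
```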
This should work to get a specific column out of the command output "docker images":
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 16.04 12543ced0f6f 10 months ago 122 MB
ubuntu latest 12543ced0f6f 10 months ago 122 MB
selenium/standalone-firefox-debug 2.53.0 9f3bab6e046f 12 months ago 613 MB
selenium/node-firefox-debug 2.53.0 d82f2ab74db7 12 months ago 613 MB
docker images | awk '{print $3}'
IMAGE
12543ced0f6f
12543ced0f6f
9f3bab6e046f
d82f2ab74db7
This is going to print the third column
Or use sed & regex.
<some_command> | sed 's/^.* \(".*"$\)/\1/'
You don't need awk for that. Using read in the Bash shell should be enough, e.g.
some_command | while read -r c1 c2; do echo "$c2"; done
or:
while read -r c1 c2; do echo "$c2"; done < in.txt
If you have GNU awk this is the solution you want:
$ awk '{print $1}' FPAT='"[^"]+"' file
"A B"
"C"
"D"
awk -F"|" '{gsub(/\"/,"|");print "\""$2"\""}' your_file
#!/usr/bin/python
import sys
col = int(sys.argv[1]) - 1
for line in sys.stdin:
    columns = line.split()
    try:
        print(columns[col])
    except IndexError:
        # ignore
        pass
Then, supposing you name the script co, do something like this to get the sizes of files (the example assumes you're using Linux, but the script itself is OS-independent):
ls -lh | co 5
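A rough shell-only counterpart of that script can be written as a function around awk. This is a sketch, not the author's script: it reuses the name co only for symmetry, and like the Python version it silently skips lines that have too few columns.

```shell
#!/bin/sh
# Hypothetical shell version of the co script: print column N of each line.
co() {
    # $1 = 1-based column number; lines with fewer columns are skipped
    awk -v col="$1" 'NF >= col { print $col }'
}

printf 'a b c\nd\ne f\n' | co 2
```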

Resources