Not getting the values of columns in shell script - shell

I have a text file abc.txt with n columns. I want to extract only column 5, whose heading is senddata. I am using the following command for that:
awk -F "(|~|)" '{ print $5 }' /opt/var/acb.txt
With this command I get all the values of column 5 from abc.txt, but not the heading of the column.
The file abc.txt contains the following data:
prodid|~|prodtype|~|creationtime|~|affirmcode|~|senddata|~|city|~|country
334|~|T|~|4:09|~|BC334|~|Y|~|KG|~|ABC
443|~|F|~|4:44|~|RT548|~|Y|~|FR|~|FR
How can I achieve that?

The pipe character is special for regular expressions. A multi-character FS is treated as a regular expression. Try
awk -F '[|]~[|]' ...

Because | has a special meaning in regular expressions, you have to escape it if you want a literal |. Let the content of file.txt be
prodid|~|prodtype|~|creationtime|~|affirmcode|~|senddata|~|city|~|country
334|~|T|~|4:09|~|BC334|~|Y|~|KG|~|ABC
443|~|F|~|4:44|~|RT548|~|Y|~|FR|~|FR
then
awk 'BEGIN{FS="\\|~\\|"}{print $5}' file.txt
output
senddata
Y
Y
(tested in gawk 4.2.1)
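If the position of the column is not known in advance, the field can also be looked up by its heading. This is only a sketch: the temporary file and the names name, col and result are made up for illustration, and it assumes the heading appears exactly once on the first line.

```shell
# Sample data copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
prodid|~|prodtype|~|creationtime|~|affirmcode|~|senddata|~|city|~|country
334|~|T|~|4:09|~|BC334|~|Y|~|KG|~|ABC
443|~|F|~|4:44|~|RT548|~|Y|~|FR|~|FR
EOF

# On the header line, remember which field is called "senddata";
# then print that field for every line, header included.
result=$(awk -F '[|]~[|]' -v name=senddata '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i }
  col     { print $col }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

If the heading is not found, col stays unset and nothing is printed.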

Related

How can I parse CSV files with quoted fields containing commas, in awk?

I have a big CSV file, and I use awk with the field separator set to a comma. However, some fields are quoted and contain a comma, and I'm facing this issue:
Original file:
Downloads $ cat testfile.csv
"aaa","bbb","ccc","dddd"
"aaa","bbb","ccc","d,dd,d"
"aaa","bbb","ccc","dd,d,d"
I am trying this way:
Downloads $ cat testfile.csv | awk -F "," '{ print $2","$3","$4 }'
"bbb","ccc","dddd"
"bbb","ccc","d
"bbb","ccc","dd
Expecting result:
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
I would use a tool that is able to properly parse CSV, such as xsv. With it, the command would look like
$ xsv select 2-4 testfile.csv
bbb,ccc,dddd
bbb,ccc,"d,dd,d"
bbb,ccc,"dd,d,d"
or, if you really want every value quoted, with a second step:
$ xsv select 2-4 testfile.csv | xsv fmt --quote-always
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
Include (escaped) quotes in your field separator flag, and add them to your output print fields:
cat testfile.csv | awk -F "\",\"" '{print "\""$2"\",\""$3"\",\""$4}'
output:
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
If gawk or GNU awk is available, you can make use of FPAT, which matches the fields, instead of splitting on field separators.
awk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, '{print $2, $3, $4}' testfile.csv
Result:
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
The string ([^,]+)|(\"[^\"]+\") is a regex pattern which matches either of:
([^,]+) ... matches a sequence of any characters other than a comma.
(\"[^\"]+\") ... matches a string enclosed by double quotes (which may include commas in between).
The parentheses around the patterns are there only for visual clarity; the regex works without them, as in FPAT='[^,]+|\"[^\"]+\"', because the alternation operator | has the lowest precedence.
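If gawk is not available, a rough fallback is to peel the fields off the line one at a time with match(), which POSIX awk provides. This swaps FPAT for a hand-written loop and is a sketch, not a full CSV parser: it assumes quotes are never escaped inside a field, fields never span lines, and an empty field after a trailing comma can be ignored. The names rest, field and result, and the temporary file, are illustrative.

```shell
# Sample data copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"aaa","bbb","ccc","dddd"
"aaa","bbb","ccc","d,dd,d"
"aaa","bbb","ccc","dd,d,d"
EOF

result=$(awk '
{
  split("", field)   # clear fields left over from the previous line
  n = 0
  rest = $0
  while (rest != "") {
    # Prefer a quoted field; otherwise take everything up to the next comma.
    if (!match(rest, /^"[^"]*"/))
      match(rest, /^[^,]*/)
    field[++n] = substr(rest, 1, RLENGTH)
    rest = substr(rest, RLENGTH + 2)   # drop the field and the comma after it
  }
  print field[2] "," field[3] "," field[4]
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

For anything beyond simple input like this, a real CSV tool such as the xsv command shown earlier is the safer choice.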

Extract first 5 fields from semicolon-separated file

I have a semicolon-separated file with 10 fields on each line. I need to extract only the first 5 fields.
Input:
A.txt
1;abc ;xyz ;0.0000;3.0; ; ;0.00; ; xyz;
Output file:
B.txt
1;abc ;xyz ;0.0000;3.0;
You can cut from field1-5:
cut -d';' -f1-5 file
If the trailing ; is needed, you can append it with another tool, or use grep (assuming your grep has the -P option):
kent$ grep -oP '^(.*?;){5}' file
1;abc ;xyz ;0.0000;3.0;
In sed you can match the pattern [^;]*; (any text followed by a semicolon) 5 times:
sed 's/\(\([^;]*;\)\{5\}\).*/\1/' A.txt
or, when your sed supports -r:
sed -r 's/(([^;]*;){5}).*/\1/' A.txt
cut -f-5 -d";" A.txt > B.txt
Where:
- -f selects the fields (-5 from start to 5)
- -d provides a delimiter, (here the semicolon)
Given that the input is field-based, using awk is another option:
awk 'BEGIN { FS=OFS=";"; ORS=OFS"\n" } { NF=5; print }' A.txt > B.txt
If you're using BSD/macOS, insert $1=$1; after NF=5; to make this work.
FS=OFS=";" sets both the input field separator, FS, and the output field separator, OFS, to a semicolon.
The input field separator is used to break each input record (line) into fields.
The output field separator is used to rebuild the record when an individual field or the number of fields is modified.
ORS=OFS"\n" sets the output record separator to a semicolon followed by a newline, given that a trailing ; should be output.
Simply omit this statement if the trailing ; is undesired.
{ NF=5; print } truncates the input record to 5 fields, by setting NF, the number (count) of fields to 5 and then prints the modified record.
It is at this point that OFS comes into play: the first 5 fields are concatenated to form the output record, using OFS as the separator.
Note: BSD/macOS Awk doesn't modify the record just by setting NF; you must additionally modify a field explicitly for the changed field count to take effect: a dummy operation such as $1=$1 (assigning field 1 to itself) is sufficient.
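To sidestep the NF portability question altogether, the output line can be built in a loop instead of by truncating the record. The following is a minimal sketch, not the method used above: the variable n (number of fields to keep) and the names line, out and result are assumptions made for illustration.

```shell
# One sample line copied from the question.
line='1;abc ;xyz ;0.0000;3.0; ; ;0.00; ; xyz;'

# Concatenate the first n fields, each followed by a semicolon.
# NF is only read here, never assigned, so no record rebuild is involved.
result=$(printf '%s\n' "$line" | awk -F';' -v n=5 '{
  out = ""
  for (i = 1; i <= n && i <= NF; i++)
    out = out $i ";"
  print out
}')

printf '%s\n' "$result"
```

Lines with fewer than n fields are printed with however many fields they have.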
awk '{print $1,$2,$3}' A.txt >B.txt
1;abc ;xyz ;0.0000;3.0;

How can I replace a character in a specific column? [duplicate]

I have a text file and I'm trying to replace a specific character (.) in the first column to another character (-). Every field is delimited by comma. Some of the lines have the last 3 columns empty, so they have 3 commas at the end.
Example of text file:
abc.def.ghi,123.4561.789,ABC,DEF,GHI
abc.def.ghq,124.4562.789,ABC,DEF,GHI
abc.def.ghw,125.4563.789,ABC,DEF,GHI
abc.def.ghe,126.4564.789,,,
abc.def.ghr,127.4565.789,,,
What I tried was using awk to replace '.' in the first column with '-', then print out the contents.
ETA: Tried out sarnold's suggestion and got the output I want.
ETA2: I could have a longer first column. Is there a way to change ONLY the first 3 '.' in the first column to '-', so I get the output
abc-def-ghi-qqq.www,123.4561.789,ABC,DEF,GHI
abc-def-ghq-qqq.www,124.4562.789,ABC,DEF,GHI
abc-def-ghw-qqq.www,125.4563.789,ABC,DEF,GHI
abc-def-ghe-qqq.www,126.4564.789,,,
abc-def-ghr-qqq.www,127.4565.789,,,
. is regexp notation for "any character". Escape it with \ and it means a literal dot:
$ awk -F, '{gsub(/\./,"-",$1); print}' textfile.csv
abc-def-ghi 123.4561.789 ABC DEF GHI
abc-def-ghq 124.4562.789 ABC DEF GHI
abc-def-ghw 125.4563.789 ABC DEF GHI
abc-def-ghe 126.4564.789
abc-def-ghr 127.4565.789
$
The output field separator is a space by default. Set OFS="," to change that:
$ awk -F, 'BEGIN {OFS=","} {gsub(/\./,"-",$1); print}' textfile.csv
abc-def-ghi,123.4561.789,ABC,DEF,GHI
abc-def-ghq,124.4562.789,ABC,DEF,GHI
abc-def-ghw,125.4563.789,ABC,DEF,GHI
abc-def-ghe,126.4564.789,,,
abc-def-ghr,127.4565.789,,,
This still allows changing multiple fields:
$ awk -F, 'BEGIN {OFS=","} {gsub(/\./,"-",$1); gsub("1", "#",$2); print}' textfile.csv
abc-def-ghi,#23.456#.789,ABC,DEF,GHI
abc-def-ghq,#24.4562.789,ABC,DEF,GHI
abc-def-ghw,#25.4563.789,ABC,DEF,GHI
abc-def-ghe,#26.4564.789,,,
abc-def-ghr,#27.4565.789,,,
I don't know what -OFS, does, but it isn't a supported command line option; using it to set the output field separator was a mistake on my part. Setting OFS within the awk program works well.
This might work for you:
awk -F, -vOFS=, '{for(n=1;n<=3;n++)sub(/\./,"-",$1)}1' file
abc-def-ghi-qqq.www,123.4561.789,ABC,DEF,GHI
abc-def-ghq-qqq.www,124.4562.789,ABC,DEF,GHI
abc-def-ghw-qqq.www,125.4563.789,ABC,DEF,GHI
abc-def-ghe-qqq.www,126.4564.789,,,
abc-def-ghr-qqq.www,127.4565.789,,,
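The difference between the two approaches comes down to sub (first match only, so it is called once per replacement wanted) versus gsub (every match). A small side-by-side sketch, where the sample line and the variable names n, first_three and all_dots are made up for illustration:

```shell
# A sample line with a longer first column, as in the second edit of the question.
line='abc.def.ghi.qqq.www,123.4561.789,ABC,DEF,GHI'

# sub() replaces only the first match, so calling it n times changes the first n dots.
first_three=$(printf '%s\n' "$line" |
  awk -F, -v OFS=, -v n=3 '{ for (i = 0; i < n; i++) sub(/\./, "-", $1) } 1')

# gsub() replaces every dot in the first column.
all_dots=$(printf '%s\n' "$line" |
  awk -F, -v OFS=, '{ gsub(/\./, "-", $1) } 1')

printf '%s\n' "$first_three" "$all_dots"
```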

interactive shell or bash script to manipulate a text file

I have a text file that contains 2 columns (example below):
Account_name Device_name
12345 1a3T567890f2
Values of the Device_name column then need to be changed to:
Uppercase letters if letters exist (example 1A3T567890F2)
awk '{ print toupper($0) }' file.txt > file2.txt
A colon needs to be inserted to separate the value into 2-character chunks (example 1A:3T:56:78:90:F2)
sed 's/\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)/\1:\2:\3:\4:\5:\6/g' file2.txt > file3.txt
I would like to create a script that does those two functions at once.
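You can just add \U at the start of your sed replacement expression to switch everything that follows to uppercase (GNU sed; note the -r, which is needed because the groups are written without backslashes):
sed -r 's/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\U\1:\2:\3:\4:\5:\6/g' file2.txt > file3.txt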
Test run :
$ echo "1a3T567890f2" | sed -r 's/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\U\1:\2:\3:\4:\5:\6/g'
1A:3T:56:78:90:F2
You can do everything in awk:
awk '{$2=toupper($2);gsub(/[[:alnum:]]{2}/,"&:", $2);sub(/:[[:space:]]*$/,"",$2)}1' file
That's a bit more intuitive, and it works for values of any length.
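One thing to watch with the sample file is the header line, which would be uppercased and chunked as well. A hedged variant that leaves the first line alone might look like the following. It assumes the device names are plain ASCII letters and digits, and it uses an explicit two-character bracket pattern rather than {2} so that it does not depend on interval support in the awk at hand. The temporary file and the name result are illustrative.

```shell
# Sample data copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Account_name Device_name
12345 1a3T567890f2
EOF

# Skip the header (NR > 1), uppercase column 2, put a colon after
# every pair of characters, then strip the colon left at the very end.
result=$(awk 'NR > 1 {
  $2 = toupper($2)
  gsub(/[0-9A-Z][0-9A-Z]/, "&:", $2)
  sub(/:$/, "", $2)
} 1' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```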

Cut and replace bash

I have to process a file with data organized like this
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
etc
Columns can have different length but lines always have the same number of columns.
I want to be able to cut a specific column of a given line and change it to the value I want.
For example I'd apply my command and change the file to
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
I know how to select a specific line with sed and then cut the field but I have no idea on how to replace the field with the value I have.
Thanks
Here's a way to do it with awk:
Going with your example, if you wanted to replace the 3rd field of the 1st line:
awk 'BEGIN{FS=OFS=":"} {if (NR==1) {$3 = "XXXX"}; print}' input_file
Input:
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Output:
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Explanation:
awk: invoke the awk command
'...': everything enclosed in single quotes is an instruction to awk
BEGIN{FS=OFS=":"}: Use : as delimiters for both input and output. FS stands for Field Separator. OFS stands for Output Field Separator.
if (NR==1) {$3 = "XXXX"};: If Number of Records (NR) read so far is 1, then set the 3rd field ($3) to "XXXX".
print: print the current line
input_file: name of your input file.
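The same command can be generalised so that the line, the column and the new value are passed in from the shell rather than hard-coded. This is a sketch under assumptions: the variable names row, col and val, the replacement value ZZ and the temporary file are invented for illustration.

```shell
# Sample data copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
EOF

# Replace field "col" of line "row" with "val"; every line is printed.
result=$(awk -v row=2 -v col=4 -v val=ZZ '
  BEGIN     { FS = OFS = ":" }
  NR == row { $col = val }
            { print }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Note that awk interprets backslash escapes in values passed with -v, so a replacement value containing backslashes would need extra care.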
If instead what you are trying to accomplish is simply replace all occurrences of CCC with XXXX in your file, simply do:
sed -i 's/CCC/XXXX/g' input_file
Note that this will also replace partial matches, such as ABCCCDD -> ABXXXXDD
This might work for you (GNU sed):
sed -r 's/^(([^:]*:?){2})CCC/\1XXXX/' file
or
awk -F: -vOFS=: '$3=="CCC"{$3="XXXX"};1' file
