Shell - How to remove a string "EES" from a record after 7th occurrence of colon(:) - shell

How do we remove EES from the input file below?
{"last_name":"Kiran","first_name":"kumar","sno":"1234","effective_date":"11/01/2011","cancel_date":"12/31/9999","alt_ein_id_indicator":"Y","alt_ein_id_employer_number":"V3EES"}
The file should look like this after the transformation:
{"last_name":"Kiran","first_name":"kumar","sno":"1234","effective_date":"11/01/2011","cancel_date":"12/31/9999","alt_ein_id_indicator":"Y","alt_ein_id_employer_number":"V3"}
TIA

Use jq for parsing JSON data
jq -c '.alt_ein_id_employer_number |= sub("EES";"")' file.json
{"last_name":"Kiran","first_name":"kumar","sno":"1234","effective_date":"11/01/2011","cancel_date":"12/31/9999","alt_ein_id_indicator":"Y","alt_ein_id_employer_number":"V3"}

The following awk command removes the string EES from the 8th field, that is, the text after the 7th colon.
awk -F':' '{sub("EES","",$8)} 1' OFS=":" Input_file
Explanation:
awk -F':' sets the field separator to a colon. By default awk splits fields on whitespace, so this makes awk split each line on colons only.
{sub("EES","",$8)} uses awk's sub function, which has the form sub(regex_to_be_substituted, new_value, target). Here the first match of EES in $8, the 8th field of the line (the part after the 7th colon), is replaced with the empty string.
1 is a condition with no action. awk works as condition then action; 1 is always true, and when no action is given awk performs the default action, which is to print the line.
OFS=":" sets the output field separator. By default OFS is a space, so it is set to a colon to match Input_file.
Input_file is the name of the input file.
To save the output back into Input_file, the following may help.
awk -F':' '{sub("EES","",$8)} 1' OFS=":" Input_file > temp_file && mv temp_file Input_file
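To see why the answer targets $8, it can help to list the colon-separated fields of the sample record. The following is only a sketch: show_fields and strip_ees are hypothetical helper names, and the sample line is the record from the question.

```shell
# Sample record from the question (none of its values contains a colon).
line='{"last_name":"Kiran","first_name":"kumar","sno":"1234","effective_date":"11/01/2011","cancel_date":"12/31/9999","alt_ein_id_indicator":"Y","alt_ein_id_employer_number":"V3EES"}'

# Hypothetical helper: print every colon-separated field with its number.
show_fields() {
  awk -F':' '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
}

# Hypothetical helper: the awk command from the answer, reading stdin.
strip_ees() {
  awk -F':' -v OFS=':' '{ sub(/EES/, "", $8) } 1'
}

printf '%s\n' "$line" | show_fields   # field 8 is "V3EES"}
printf '%s\n' "$line" | strip_ees     # same record with "V3" as the last value
```

This field counting only holds while no value contains a colon. A value such as a time of day earlier in the record would shift the field numbers, which is one reason the jq answer is the safer choice for JSON.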

Related

How to remove string between two characters and before the first occurrence using sed

I would like to remove the string between ":" and the first "|" using sed.
input:
|abc:1.2.3|def|
output from sed:
|abc|def|
I managed to come up with sed 's|\(:\)[^|]*|\1|', but this sed command does not remove the first character (":"). How can I modify this command to also remove the colon?
You don't need to group : in your pattern and use it in substitution.
You should keep it simple:
s='|abc:1.2.3|def|'
sed 's/:[^|]*//' <<< "$s"
|abc|def|
: matches a colon and [^|]* matches 0 or more non-pipe characters
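Without the g flag, sed replaces only the first match on each line, which is what the question asks for. A small sketch of the difference, assuming a made-up second input with two colon parts; strip_first and strip_all are hypothetical helper names:

```shell
# Hypothetical helpers: remove the first, or every, ":..." run up to the next pipe.
strip_first() { sed 's/:[^|]*//'; }
strip_all()   { sed 's/:[^|]*//g'; }

printf '%s\n' '|abc:1.2.3|def|'   | strip_first   # |abc|def|
printf '%s\n' '|abc:1.2.3|def:4|' | strip_first   # |abc|def:4|
printf '%s\n' '|abc:1.2.3|def:4|' | strip_all     # |abc|def|
```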
1st solution: With awk you could try following awk program.
awk 'match($0,/:[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)}' Input_file
Explanation: match looks for the text from the colon up to, but not including, the first |. Whenever the regex matches, awk sets the built-in variables RSTART (the start position of the match) and RLENGTH (its length). The two substr calls then print everything before and everything after the match, which drops the matched part.
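As a rough illustration of what match reports for the sample line, the sketch below prints RSTART and RLENGTH before printing the trimmed line. show_match is a hypothetical helper name.

```shell
# Hypothetical helper: show where the first ":..." run starts and how long it is,
# then print the line with that run cut out.
show_match() {
  awk 'match($0, /:[^|]*/) {
         print RSTART, RLENGTH
         print substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)
       }'
}

printf '%s\n' '|abc:1.2.3|def|' | show_match
# 5 6
# |abc|def|
```

The colon is the 5th character and ":1.2.3" is 6 characters long, so the output keeps characters 1 to 4 and everything from character 11 onward.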
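2nd solution: Using the FPAT option in GNU awk, intended for the shown sample only. Each field is defined as a pipe followed by the text up to the next colon or pipe, so every colon-prefixed part falls outside the fields and is dropped when the record is rebuilt with an empty OFS.
awk -v FPAT='[|][^:|]*' -v OFS= '{$1=$1} 1' Input_file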

Delete first column of csv file [duplicate]

I would like to know how I can delete the first column of a CSV file with awk or sed.
From something like this:
FIRST,SECOND,THIRD
to something like this:
SECOND,THIRD
Thanks in advance
The following awk command should help.
awk '{sub(/[^,]*/,"");sub(/,/,"")} 1' Input_file
The following sed command may also help.
sed 's/\([^,]*\),\(.*\)/\2/' Input_file
Explanation:
awk ' ##Starting awk code here.
{
sub(/[^,]*/,"") ##Using sub to replace everything before the 1st comma (,) with an empty string.
sub(/,/,"") ##Using sub to remove that first comma from the current line.
}
1 ##Mentioning 1 will print edited/non-edited lines here.
' Input_file ##Mentioning Input_file name here.
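Using awk. This shifts the fields left by one, so it fits only the three-column sample; decrementing NF drops the last field in GNU awk, but not every awk rebuilds the record that way.
$ awk -F, -v OFS=, '{$1=$2; $2=$3; NF--;}1' file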
SECOND,THIRD
With sed:
sed -i -r 's#^\w+,##' test.csv
This matches, from the start of the line (^), one or more word characters (\w, that is [A-Za-z0-9_]) followed by a comma, and replaces the match with nothing.
The g flag is not needed here: because the pattern is anchored with ^, it can match at most once per line.
Using sed: the regex ^[^,]+, represents the first column including the first comma. ^ means the start of the line, and [^,]+, means one or more characters that are not a comma, followed by a comma.
You can use -i with sed to change the file in place if needed.
sed -r 's/^[^,]+,//' input
SECOND,THIRD
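One detail worth noting: ^[^,]+, requires at least one character in the first column, so a row whose first column is empty would be left unchanged. A minimal sketch that uses * instead of + and plain POSIX syntax (no -r needed); drop_first_col is a hypothetical helper name and the row ,B,C is made up for illustration:

```shell
# Hypothetical helper: delete everything up to and including the first comma.
# Using * (zero or more) also handles an empty first column.
drop_first_col() { sed 's/^[^,]*,//' "$@"; }

printf '%s\n' 'FIRST,SECOND,THIRD' ',B,C' | drop_first_col
# SECOND,THIRD
# B,C
```

Like the answers above, this treats every comma as a separator, so it is not suitable for CSV files with quoted fields that contain commas.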

Add quotes to strings between commas shell

I have a string like
1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Now, I am trying to achieve below string
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
So, I am trying to put quotes around everything after the 8th comma (,) from the start and before the 12th comma (,) from the end.
I tried some awk options but was unable to achieve it. Is there any way to get this done?
Thanks in advance .
try:
awk -v s1="\"" -F, '{$9=s1 $9;$(NF-12)=$(NF-12) s1} 1' OFS=, Input_file
Here s1 is a variable holding " and the field separator is set to a comma. The 9th field is rebuilt as s1 followed by $9. Then the 13th field from the end, $(NF-12), is rebuilt with s1 appended to its current value; that field number is not hardcoded, so it is counted from the end of the line whatever the total number of fields. The trailing 1 prints the line, and OFS (the output field separator) is set to a comma as well.
x='1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P'
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' <<< "$x"
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Explanation: Here FS and OFS are set to a comma because the input is CSV. A double quote is stored in a variable named q. Then the desired columns are modified to produce the result. You can change the column numbers to quote a different range.
For files:
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' inputfile
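If the range to quote changes, the two field numbers can be passed in as awk variables instead of being written into the program. This is only a sketch; quote_range and the variable names from and to are hypothetical.

```shell
# Hypothetical helper: quote_range FROM TO
# Reads stdin and puts a double quote before field FROM and after field TO.
quote_range() {
  awk -F, -v OFS=, -v q='"' -v from="$1" -v to="$2" \
      '{ $from = q $from; $to = $to q } 1'
}

x='1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P'
printf '%s\n' "$x" | quote_range 9 11
# 1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
```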
$ awk -v FS=',' -v OFS=',' '{$9="\"" $9;$11=$11"\""; print}' your_file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
This might work for you (GNU sed):
sed 's/,/&"/8;s/,/"&/11' file
This inserts " after the eighth comma and before the eleventh comma.
awk '{sub(/\^A,-C,-T/,"\42^A,-C,-T\42")}1' file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
The fine point here is to escape the caret.

Using a multi-character field separator in awk on Solaris

I wish to use a string (Birch) as a field delimiter in awk to print the second field. I am trying the following command:
cat tmp.log|awk -FBirch '{ print $2}'
The following output is printed:
irch2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Desired output:
2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Contents of tmp.log file.
-bash-3.2# cat tmp.log
Dec 05 13:49:23 [x.x.x.x.180.100] business-log-dev/int [TEST][0x80000001][business-log][info] mpgw(Test): trans(8497187)[request][10.x.x.x]:
Birch2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Am I doing something wrong?
OS: Solaris10
Shell: Bash
I tried the command below, suggested in one of the answers. I am getting the desired output, but with an extra empty line at the top. How can this be eliminated from the output?
-bash-3.2# /usr/xpg4/bin/awk -FBirch '{print $2}' tmp.log
2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Originally, I suggested putting quotes around "Birch" (-F'Birch') but actually, I don't think that should make any difference.
I'm not at all experienced working with Solaris but you may want to also try using nawk ("new awk") instead of awk.
nawk -FBirch '{print $2}' file
If this works, you may want to consider creating an alias so that you always use the newer version of awk with more features.
You may also want to try using the version of awk in the /usr/xpg4/bin directory, which is a POSIX compliant implementation so should support multi-character FS:
/usr/xpg4/bin/awk -FBirch '{print $2}' file
If you only want to print lines which have more than one field, you can add a condition:
/usr/xpg4/bin/awk -FBirch 'NF>1{print $2}' file
This only prints the second field when there is more than one field.
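The extra empty line comes from the first log line, which does not contain Birch and therefore has only one field, so its $2 is empty. A short sketch of this, assuming an awk that accepts a multi-character FS (nawk, /usr/xpg4/bin/awk, or GNU awk); count_fields and second_field are hypothetical helper names, and the two lines are the contents of tmp.log from the question:

```shell
log='Dec 05 13:49:23 [x.x.x.x.180.100] business-log-dev/int [TEST][0x80000001][business-log][info] mpgw(Test): trans(8497187)[request][10.x.x.x]:
Birch2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success'

# Hypothetical helper: number of fields per line when splitting on "Birch".
count_fields() { awk -FBirch '{ print NF }'; }

# Hypothetical helper: print $2 only for lines that actually contain "Birch".
second_field() { awk -FBirch 'NF > 1 { print $2 }'; }

printf '%s\n' "$log" | count_fields    # 1 for the first line, 2 for the second
printf '%s\n' "$log" | second_field    # only the line that followed "Birch"
```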
From the man page of the default awk on Solaris, /usr/bin/awk:
-Fc Uses the character c as the field separator (FS) character. See the discussion of FS below.
As you can see, the default Solaris awk only takes a single character as the field separator.
The man page also describes split:
split(s, a, fs)
Split the string s into array elements a[1], a[2], ... a[n], and returns n. The separation is done with the regular expression fs or with the field separator FS if fs is not given.
Since split takes a regular expression as the separator, we can use it to print the second field split by Birch. The > 1 test skips lines that do not contain Birch, which would otherwise print an empty line:
awk 'split($0,a,"Birch") > 1 {print a[2]}' file

Remove first columns then leave remaining line untouched in awk

I am trying to use awk to remove first three fields in a text file. Removing the first three fields is easy. But the rest of the line gets messed up by awk: the delimiters are changed from tab to space
Here is what I have tried:
head pivot.threeb.tsv | awk 'BEGIN {IFS="\t"} {$1=$2=$3=""; print }'
The first three columns are properly removed. The Problem is the output ends up with the tabs between columns $4 $5 $6 etc converted to spaces.
Update: The other question for which this was marked as duplicate was created later than this one : look at the dates.
First, as ED commented, you have to use FS as the field separator in awk; IFS is not an awk variable.
Tabs become spaces in your output because you didn't define OFS.
awk 'BEGIN{FS=OFS="\t"}{$1=$2=$3="";print}' file
This empties the first 3 fields and leaves the rest of the text "untouched" (you will see 3 leading tabs). The tabs between the remaining fields are kept in the output.
awk 'BEGIN{FS=OFS="\t"}{print $4,$5,$6}' file
This outputs the fields without leading spaces or tabs. But if you have 500 columns, you have to do it in a loop, use the sub function, or consider other tools, cut for example.
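The sub route mentioned above could look roughly like the sketch below. It deletes the leading "field plus tab" three times and never assigns to a field, so awk does not rebuild the record and the rest of the line stays byte for byte as it was. drop_first3 is a hypothetical helper name and the sample row is made up.

```shell
# Hypothetical helper: strip the first three tab-terminated fields from each line.
# The record is edited as a plain string, so the remaining tabs and spaces are untouched.
drop_first3() {
  awk '{ for (i = 0; i < 3; i++) sub(/^[^\t]*\t/, "") } 1' "$@"
}

printf 'a\tb\tc\td e\tf\n' | drop_first3   # prints "d e", a tab, then "f"
```

A line with fewer than three tabs simply loses as many leading fields as it has tabs.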
Actually this can be done in a very simple cut command like this:
cut -f4- inFile
If you don't want the field separation altered then use sed to remove the first 3 columns instead:
sed -r 's/(\S+\s+){3}//' file
To store the changes back to the file you can use the -i option:
sed -ri 's/(\S+\s+){3}//' file
awk -F'\t' '{for (i=4; i<NF; i++) printf "%s\t", $i; print $NF}' file
