Delete first column of csv file [duplicate] - bash

This question already has answers here:
awk - how to delete first column with field separator
Closed 5 years ago.
I would like to know how I can delete the first column of a CSV file with awk or sed.
Something like this:
FIRST,SECOND,THIRD
To something like this:
SECOND,THIRD
Thanks in advance

The following awk command will do this:
awk '{sub(/[^,]*/,"");sub(/,/,"")} 1' Input_file
The following sed command may also help:
sed 's/\([^,]*\),\(.*\)/\2/' Input_file
Explanation:
awk '                 ##Starting awk code here.
{
  sub(/[^,]*/,"")     ##Use sub to replace everything before the 1st comma (the first field) with an empty string.
  sub(/,/,"")         ##Use sub to remove the first comma, which is now at the start of the line.
}
1                     ##1 is an always-true condition with no action, so awk prints every line, edited or not.
' Input_file          ##Mentioning Input_file name here.
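Wrapped as a shell function, the awk answer can be checked against the sample line. The name drop_first_field is a hypothetical helper, not part of the answer. One side effect worth knowing: a line with no comma at all comes out empty, because the first sub() deletes the whole line and the second finds nothing left to remove.

```shell
#!/bin/sh
# drop_first_field: hypothetical helper wrapping the awk answer above.
# Reads lines on stdin and deletes everything up to and including the first comma.
drop_first_field() {
    awk '{
        sub(/[^,]*/, "")   # delete the leading run of non-comma characters (field 1)
        sub(/,/, "")       # delete the comma that followed it
    } 1'                   # "1" is an always-true pattern, so every line is printed
}

printf '%s\n' 'FIRST,SECOND,THIRD' | drop_first_field   # prints SECOND,THIRD
printf '%s\n' 'ONLY' | drop_first_field                 # prints an empty line
```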

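Using awk (this shifts every field one position to the left and then drops the last one, so it works for any number of columns; shortening a record by decrementing NF works in GNU awk and mawk):
$ awk -F, -v OFS=, '{for (i = 1; i < NF; i++) $i = $(i+1); NF--} 1' file
SECOND,THIRD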

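With sed
sed -i -r 's#^\w+,##g' test.csv
This matches the start of the line (^), then one or more word characters (\w, a GNU sed shorthand for letters, digits and underscore) up to the first comma, and replaces the match with nothing. -i edits test.csv in place.
The g flag after the last delimiter requests a global substitution, but it makes no difference here: the ^ anchor can only match once per line.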
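Because \w only covers letters, digits and underscore, a first field containing a space, hyphen or dot is left untouched. A quick comparison with a [^,]* class, assuming GNU sed (\w is a GNU extension); both function names are hypothetical:

```shell
#!/bin/sh
# Assumes GNU sed: \w is a GNU extension (letters, digits, underscore).
strip_word_field() { sed -E 's#^\w+,##'; }     # the answer's pattern, without the redundant g
strip_any_field()  { sed -E 's#^[^,]*,##'; }   # hypothetical variant: anything except a comma

printf '%s\n' 'first-name,SECOND' | strip_word_field   # unchanged: \w does not match "-"
printf '%s\n' 'first-name,SECOND' | strip_any_field    # prints SECOND
```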

Using sed: the regex ^[^,]+, represents the first column including the first comma. ^ anchors the match at the start of the line, and [^,]+, means one or more characters other than a comma, followed by a comma.
You can add -i to make sed change the file in place if needed.
sed -r 's/^[^,]+,//' input
SECOND,THIRD
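Since [^,]+ needs at least one character, a line whose first field is empty (it starts with a comma) is left unchanged; [^,]* covers that case too. A quick comparison on a made-up input line, using -E, which both GNU and BSD sed accept in place of -r:

```shell
#!/bin/sh
# Compare "one or more" (+) with "zero or more" (*) when the first field is empty.
printf '%s\n' ',SECOND,THIRD' | sed -E 's/^[^,]+,//'   # no match: line printed unchanged
printf '%s\n' ',SECOND,THIRD' | sed -E 's/^[^,]*,//'   # prints SECOND,THIRD
```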

Related

How to remove string between two characters and before the first occurrence using sed

I would like to remove the string between ":" and the first "|" using sed.
input:
|abc:1.2.3|def|
output from sed:
|abc|def|
I managed to come up with sed 's|\(:\)[^|]*|\1|', but this sed command does not remove the first character (":"). How can I modify this command to also remove the colon?
You don't need to capture : in a group and put it back in the substitution; match it along with the rest and replace the whole match with nothing.
Keep it simple:
s='|abc:1.2.3|def|'
sed 's/:[^|]*//' <<< "$s"
|abc|def|
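: matches a colon and [^|]* matches zero or more characters other than a pipe, so together they cover everything from the colon up to, but not including, the first |.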
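The command has no g flag, so only the first colon-to-pipe segment on each line is removed. The second input below is a made-up line with two such segments:

```shell
#!/bin/sh
# Without a g flag, s/// replaces only the first match on each line.
printf '%s\n' '|abc:1.2.3|def|' | sed 's/:[^|]*//'       # prints |abc|def|
printf '%s\n' '|abc:1.2.3|def:4.5|' | sed 's/:[^|]*//'   # prints |abc|def:4.5|
```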
First solution: with awk, you could try the following program.
awk 'match($0,/:[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)}' Input_file
Explanation: match() searches the line for the regex :[^|]*, i.e. from the colon up to the first |. When it matches, it sets awk's built-in variables RSTART (the start position of the match) and RLENGTH (its length). The program then prints the substring before the match followed by the substring after it, which leaves out the matched part and gives the required output.
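With only a pattern and no rule for other lines, a line that contains no colon is dropped from the output. If such lines should pass through unchanged, an if inside the action handles it; remove_colon_segment is a hypothetical helper name and this is a sketch, not part of the answer:

```shell
#!/bin/sh
# remove_colon_segment: hypothetical helper. Same match()/substr() idea as above,
# but lines without a colon are printed as they are instead of being dropped.
remove_colon_segment() {
    awk '{
        if (match($0, /:[^|]*/))   # on success, sets RSTART and RLENGTH
            $0 = substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)
        print
    }'
}

printf '%s\n' '|abc:1.2.3|def|' '|xyz|def|' | remove_colon_segment
```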
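Second solution: use the regex :[^|]* as the field separator and print the pieces on either side of it joined together. (With FPAT, GNU awk would treat :[^|]* as the contents of a field rather than as the separator, so $1 would be the very part you want to remove.)
awk -F':[^|]*' '{print $1 $2}' Input_file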

Extract version using shell commands

I am trying to extract the version from the URLs below using shell commands such as grep, awk, sed or cut, whichever is most suitable:
https://abcd/efgh/1.1.3/hijkl/mnop
https://abcd/efgh/hijkl/2.3.4.5/mnop
https://abcd/3.4/efgh/hijkl/mnop
I am looking to extract only the version (numbers separated by dots) from each URL; its position in the path varies, as in the examples above. Looking for suggestions.
Expected output:
1.1.3
2.3.4.5
3.4
You may use this grep:
grep -Eo '[0-9]+(\.[0-9]+)+' file
1.1.3
2.3.4.5
3.4
With awk (written and tested with the shown samples in GNU awk):
awk 'match($0,/([0-9]+\.){1,}[0-9]+/){print substr($0,RSTART,RLENGTH)}' Input_file
Explanation:
awk ' ##Starting awk program from here.
match($0,/([0-9]+\.){1,}[0-9]+/){ ##using match function to match regex of ([0-9]+\.){1,}[0-9]+ in current line.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched regex above, starting index is RSTART till value of RLENGTH here.
}
' Input_file ##Mentioning Input_file name here.
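match() only reports the first match on a line. If a URL could contain more than one dotted number, a loop that keeps matching on the rest of the line prints them all. This is a sketch; all_versions and the second sample URL are hypothetical:

```shell
#!/bin/sh
# all_versions: hypothetical helper. Repeats match() on the remainder of the line
# so that every dotted number is printed, not just the first one.
all_versions() {
    awk '{
        s = $0
        while (match(s, /[0-9]+(\.[0-9]+)+/)) {
            print substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)   # continue after the current match
        }
    }'
}

printf '%s\n' 'https://abcd/efgh/1.1.3/hijkl/mnop' 'https://abcd/1.2/x/3.4.5' | all_versions
```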
I would use GNU AWK the following way. Let file.txt content be
https://abcd/efgh/1.1.3/hijkl/mnop
https://abcd/efgh/hijkl/2.3.4.5/mnop
https://abcd/3.4/efgh/hijkl/mnop
then
awk 'BEGIN{RS="[/\n]"}/^[.[:digit:]]+$/' file.txt
output
1.1.3
2.3.4.5
3.4
Explanation: I set the record separator (RS) to a regex matching either / or a newline (\n), so every path segment becomes a separate record. I then print only the records that consist entirely of dots and digits; ^ and $ anchor the pattern to the start and end of the record. A regex RS is supported by GNU AWK but is not guaranteed by POSIX.
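A sketch of the same idea that does not rely on a regex RS splits each line on / with split() instead; versions_by_segment is a hypothetical helper name. Like the RS version, it would also accept a segment made only of dots.

```shell
#!/bin/sh
# versions_by_segment: hypothetical helper. Splits each line on "/" and
# prints every path segment that consists only of digits and dots.
versions_by_segment() {
    awk '{
        n = split($0, seg, "/")
        for (i = 1; i <= n; i++)
            if (seg[i] ~ /^[.0-9]+$/)   # whole segment is digits and dots
                print seg[i]
    }'
}

printf '%s\n' 'https://abcd/efgh/1.1.3/hijkl/mnop' \
              'https://abcd/efgh/hijkl/2.3.4.5/mnop' \
              'https://abcd/3.4/efgh/hijkl/mnop' | versions_by_segment
```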

Shell - How to remove a string "EES" from a record after 7th occurrence of colon(:)

How do we remove EES from the input file below?
{"last_name":"Kiran","first_name":"kumar","sno":"1234","effective_date":"11/01/2011","cancel_date":"12/31/9999","alt_ein_id_indicator":"Y","alt_ein_id_employer_number":"V3EES"}
I expect the file to look like this after the transformation:
{"last_name":"Kiran","first_name":"kumar","sno":"1234","effective_date":"11/01/2011","cancel_date":"12/31/9999","alt_ein_id_indicator":"Y","alt_ein_id_employer_number":"V3"}
TIA
Use jq for parsing JSON data:
jq -c '.alt_ein_id_employer_number |= sub("EES";"")' file.json
{"last_name":"Kiran","first_name":"kumar","sno":"1234","effective_date":"11/01/2011","cancel_date":"12/31/9999","alt_ein_id_indicator":"Y","alt_ein_id_employer_number":"V3"}
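If jq is not available, a line-oriented sketch can anchor on the key name instead of on a field position, so a colon inside an earlier value (such as a time) cannot shift the count. This assumes each record sits on one line with no spaces around the colon, as in the sample; strip_ees_by_key is a hypothetical helper, and a real JSON parser such as jq remains the more robust choice.

```shell
#!/bin/sh
# strip_ees_by_key: hypothetical helper. Captures the key and the value up to a
# trailing EES, then writes the capture back followed by the closing quote.
strip_ees_by_key() {
    sed 's/\("alt_ein_id_employer_number":"[^"]*\)EES"/\1"/'
}

printf '%s\n' '{"time":"12:30","alt_ein_id_employer_number":"V3EES"}' | strip_ees_by_key
```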
The following awk command should remove the EES string from the 8th field, i.e. after the 7th colon:
awk -F':' '{sub("EES","",$8)} 1' OFS=":" Input_file
Explanation:
awk -F':' sets the field separator. By default awk splits fields on runs of whitespace, so setting it to a colon breaks each line into parts at every colon.
{sub("EES","",$8)} uses awk's sub() function, whose form is sub(regex, replacement, target), where target defaults to the whole line ($0). Here it replaces the string EES with an empty string ("") in $8, the 8th field of the line (the text after the 7th colon, as you described).
1: awk programs are built from condition/action pairs. Writing 1 makes the condition true without giving an action, so awk performs the default action, which is to print the line.
OFS=":" sets the output field separator. It defaults to a single space, so I set it to : to match your Input_file; this matters because changing $8 makes awk rebuild the line with OFS between the fields.
Input_file is simply the name of the input file.
If you want to save the output back into the same Input_file, the following may help:
awk -F':' '{sub("EES","",$8)} 1' OFS=":" Input_file > temp_file && mv temp_file Input_file
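The temp_file && mv pattern can be wrapped once and reused with any filter command. A sketch with a hypothetical rewrite_in_place helper; it uses mktemp so that a fixed temp_file name cannot clash, and it removes the temporary file if the command fails. The demo line is made up.

```shell
#!/bin/sh
# rewrite_in_place: hypothetical helper generalising "cmd Input_file > temp_file && mv temp_file Input_file".
# Usage: rewrite_in_place FILE COMMAND [ARG...]  (COMMAND reads stdin and writes stdout)
rewrite_in_place() {
    target=$1
    shift
    # Create the temporary file next to the target so that mv is a simple rename.
    tmpfile=$(mktemp "$target.XXXXXX") || return 1
    if "$@" < "$target" > "$tmpfile"; then
        mv "$tmpfile" "$target"
    else
        rm -f "$tmpfile"    # leave the original file untouched when the command fails
        return 1
    fi
}

demo=$(mktemp)
printf '%s\n' 'a:b:c:d:e:f:g:"V3EES"}' > "$demo"
rewrite_in_place "$demo" awk -F':' -v OFS=':' '{sub("EES","",$8)} 1'
cat "$demo"    # prints a:b:c:d:e:f:g:"V3"}
rm -f "$demo"
```

One trade-off: mktemp creates the file with mode 0600, so after the mv the rewritten file has those permissions rather than the original ones.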

Add quotes to strings between commas shell

I have a string like
1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Now, I am trying to achieve below string
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
So I am trying to put in quotes everything after the 8th comma from the start and before the 12th comma from the end.
I tried some awk options but was unable to achieve it. Is there any way to get this done?
Thanks in advance.
try:
awk -v s1="\"" -F, '{$9=s1 $9;$(NF-12)=$(NF-12) s1} 1' OFS=, Input_file
Here I create a variable s1 holding a double quote and set the field separator to a comma. I then rebuild the 9th field by prefixing it with s1, and rebuild the 13th field from the end, $(NF-12), by appending s1. Note that the closing position is not a hardcoded field number, so the line may have any number of fields as long as 12 fields follow the quoted group. The 1 prints the line, and OFS (the output field separator) is also set to a comma.
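Because the closing quote is placed by counting from the end, the quoted group may hold any number of fields, provided 8 fields precede it and 12 follow it. A sketch wrapping the answer in a hypothetical quote_fields function, checked against the sample and a made-up line with a four-field group:

```shell
#!/bin/sh
# quote_fields: hypothetical helper around the answer above. Opens the quote on
# field 9 and closes it on field NF-12 (the 13th field counted from the end).
quote_fields() {
    awk -F',' -v OFS=',' -v q='"' '{ $9 = q $9; $(NF-12) = $(NF-12) q } 1'
}

printf '%s\n' '1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P' | quote_fields
printf '%s\n' '1,2,A,N,53,3,R,R,^A,-C,-T,-G,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P' | quote_fields
```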
x='1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P'
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' <<< "$x"
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Explanation: FS and OFS are set to a comma because the input is CSV, and the double quote is stored in a variable named q. The 9th and 11th fields are then altered to open and close the quotes. You can change the field numbers to get other results.
For files:
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' inputfile
$ awk -v FS=',' -v OFS=',' '{$9="\"" $9;$11=$11"\""; print}' your_file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
This might work for you (GNU sed):
sed 's/,/&"/8;s/,/"&/11' file
This inserts a " after the 8th comma and before the 11th comma.
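The numeric flag on s/// selects which match to replace, so the comma positions can be passed in as parameters. Inserting a quote does not change the number of commas, which is why the second substitution can still count to 11. quote_between_commas is a hypothetical helper:

```shell
#!/bin/sh
# quote_between_commas: hypothetical helper generalising the sed answer.
# $1 = comma after which the opening quote goes, $2 = comma before which it closes.
quote_between_commas() {
    sed "s/,/&\"/$1;s/,/\"&/$2"
}

printf '%s\n' '1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P' | quote_between_commas 8 11
```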
awk '{sub(/\^A,-C,-T/,"\42^A,-C,-T\42")}1' file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
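The fine point here is to escape the caret: unescaped, ^ would anchor the regex to the start of the line, so it would never match in the middle.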
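Another way to sidestep regex escaping is awk's index(), which searches for a plain substring. A sketch with a hypothetical quote_literal helper; note that awk -v still interprets backslash escapes in the value, so a literal containing backslashes would need extra care.

```shell
#!/bin/sh
# quote_literal: hypothetical helper. index() does a plain substring search,
# so regex metacharacters such as ^ need no escaping.
quote_literal() {
    awk -v lit="$1" '{
        i = index($0, lit)   # 0 when the substring is absent
        if (i > 0)
            $0 = substr($0, 1, i - 1) "\"" lit "\"" substr($0, i + length(lit))
        print
    }'
}

printf '%s\n' '1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P' | quote_literal '^A,-C,-T'
```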

replace a string before the semi colon

I have several files, which begin like this:
unit,s_adj,partner,stk_flow,indic,geo\time;aaaa;2222;
time,s_adj,partner,stk_flow,lolo,geo\time;bbb;2222;
I want to replace everything before the first semicolon with the new string YEAR.
The desired output would be:
YEAR;aaaa;2222;
YEAR;bbb;2222;
I tried the following command, but it does not seem to do what I want:
awk -F ";" 'NR==1 {$1=""; print "year"}' input_file
Your suggestions are welcomed.
Best.
Try this:
sed 's/[^;]*/YEAR/' file
If you only want the substitution to happen on the first line:
sed '1s/[^;]*/YEAR/' file
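Both sed commands can be checked on the two sample lines from the question. [^;]* is greedy, and the leftmost match starts at the beginning of the line, so it covers everything before the first semicolon; the address 1 limits the change to the first line. sample_lines is a hypothetical helper that just prints the sample:

```shell
#!/bin/sh
# sample_lines: hypothetical helper printing the two sample lines from the question.
sample_lines() {
    printf '%s\n' 'unit,s_adj,partner,stk_flow,indic,geo\time;aaaa;2222;' \
                  'time,s_adj,partner,stk_flow,lolo,geo\time;bbb;2222;'
}

sample_lines | sed 's/[^;]*/YEAR/'     # both lines now start with YEAR;
sample_lines | sed '1s/[^;]*/YEAR/'    # only the first line changes
```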
You can also do:
awk '{$1="YEAR"}1' OFS=\; FS=\; input-file
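The NR==1 attempt in the question printed only the literal year, and nothing for the other lines, because no rule printed them. A corrected sketch of that attempt assigns to $1 instead and ends with 1 so that every line is printed; sample_lines is again a hypothetical helper for the sample data.

```shell
#!/bin/sh
# sample_lines: hypothetical helper printing the two sample lines from the question.
sample_lines() {
    printf '%s\n' 'unit,s_adj,partner,stk_flow,indic,geo\time;aaaa;2222;' \
                  'time,s_adj,partner,stk_flow,lolo,geo\time;bbb;2222;'
}

# Only record 1 gets a new first field; OFS=";" keeps the semicolons when awk
# rebuilds that line, and the trailing 1 prints every line, changed or not.
sample_lines | awk -F';' -v OFS=';' 'NR == 1 { $1 = "YEAR" } 1'
```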
