Unable to remove last field from CSV file - shell

I have a CSV file containing data like the lines below; I need to keep all fields as they are except the last one.
"one","two","this has comment section1"
"one","two","this has comment section2 and ( anything ) can come here ( ok!!!"
gawk 'BEGIN {FS=",";OFS=","}{sub(FS $NF, x)}1'
which gives this error:
fatal: Unmatched ( or \(:
I know that removing the '(' from the second line avoids the error, but I cannot alter anything in the comment field.

With any awk you could try:
awk 'BEGIN{FS=",";OFS=","}{$NF="";sub(/,$/,"")}1' Input_file
Or with GNU awk try:
awk 'BEGIN{FS=",";OFS=","}NF{--NF};1' Input_file

Since you mention that anything can appear in the comment, you might also have a line that looks like:
"one","two","comment with a , comma"
So it is a bit hard to just use the <comma>-character as a field separator.
The following two posts come in very handy here:
What's the most robust way to efficiently parse CSV using awk?
[U&L] How to delete the last column of a file in Linux (Note: this is only for GNU awk)
Since you work with GNU awk, you can thus do any of the following:
$ awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF--}1'
$ awk 'BEGIN{FPAT="[^,]*|\"[^\"]+\"";OFS=","}NF{NF--}1'
$ awk 'BEGIN{FPAT="[^,]*|\042[^\042]+\042";OFS=","}NF{NF--}1'
Why is your command failing: The sub(ere,repl,in) function of awk assumes that the first part ere is an extended regular expression, so the parenthesis has a special meaning and an unmatched ( is a fatal regex error. If you want to replace fields which are known and unique, you should not use sub, but just redefine the field:
$ awk '{$NF=""}1'
If you want to replace a string matching a field, you should use index() and substr() (note the argument order: index(haystack, needle) and substr(string, start, length)):
s=$n; while (i = index($0, s)) { $0 = substr($0, 1, i-1) "repl" substr($0, i+length(s)) }
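Spelled out as a complete, runnable sketch (the replacement text "repl" is only a placeholder, and it must not itself contain the search string or the loop will never terminate):
$ echo '"one","two","three"' | awk -F, '{s=$NF; while (i = index($0, s)) $0 = substr($0, 1, i-1) "\"repl\"" substr($0, i+length(s)); print}'
"one","two","repl"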

Related

Adding a new line to a text file after 5 occurrences of a comma in Bash

I have a text file that is basically one giant Excel sheet flattened onto a single line. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change every third occurrence of a comma into a new line so that I get
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and as always thank you in advance for all the help!
With GNU sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, means "zero or more characters other than a , followed by a ,"; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, used by default on Mac OS X. (Thanks to @EdMorton for the note.)
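If the group size needs to vary, the count can come from a shell variable; a small sketch, again assuming GNU sed (the \n in the replacement is a GNU extension):
n=5   # fields per output line
sed -E "s/(([^,]*,){$((n-1))}[^,]*),/\1\n/g" file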
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
The [1-9][0-9]\{3\} at the end saves you from writing [0-9] three more times for the YYYY part.
PS: This solution prints only four digits for the year, so even if the data contains a typo such as 19811, you'll still get 1981.
You are looking for three fragments, each without a comma, separated by commas.
The last fields can give problems (not ending with a comma, and there may be only two fields).
The next command looks fine:
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
Replace the third , with a newline, print the pattern space up to the newline (P), delete up to and including the newline (D), and repeat.

AWK - Delete whole line when inside that line a piece matchs a string

I have a db.sql full of lines, some of which contain the string _wc_session_:
(26680, '_wc_session_expires_120f486fe21c9ae4ce247c04f3b009f9', '1445934089', 'no'),
(26682, '_wc_session_expires_73516b532380c28690a4437d20967e03', '1445934114', 'no'),
(26683, '_wc_session_1a71c566970b07ac2b48c5da4e0d43bf', 'a:21:{s:4:"cart";s:305:"a:1:{s:32:"7fe1f8abaad094e0b5cb1b01d712f708";a:9:{s:10:"product_id";i:459;s:12:"variation_id";s:0:"";s:9:"variation";a:0:{}s:8:"quantity";i:1;s:10:"line_total";d:6;s:8:"line_tax";i:0;s:13:"line_subtotal";i:6;s:17:"line_subtotal_tax";i:0;s:13:"line_tax_data";a:2:{s:5:"total";a:0:{}s:8:"subtotal";a:0:{}}}}";s:15:"applied_coupons";s:6:"a:0:{}";s:23:"coupon_discount_amounts";s:6:"a:0:{}";s:27:"coupon_discount_tax_amounts";s:6:"a:0:{}";s:21:"removed_cart_contents";s:6:"a:0:{}";s:19:"cart_contents_total";d:6;s:20:"cart_contents_weight";i:0;s:19:"cart_contents_count";i:1;s:5:"total";i:0;s:8:"subtotal";i:6;s:15:"subtotal_ex_tax";i:6;s:9:"tax_total";i:0;s:5:"taxes";s:6:"a:0:{}";s:14:"shipping_taxes";s:6:"a:0:{}";s:13:"discount_cart";i:0;s:17:"discount_cart_tax";i:0;s:14:"shipping_total";i:0;s:18:"shipping_tax_total";i:0;s:9:"fee_total";i:0;s:4:"fees";s:6:"a:0:{}";s:10:"wc_notices";s:205:"a:1:{s:7:"success";a:1:{i:0;s:166:"Ver carrito Se ha añadido "Incienso Gaudí Lavanda" con éxito a tu carrito.";}}";}', 'no'),
I'd like to use AWK to remove each whole line in which _wc_session_ appears, i.e. a line like:
(26682, '_wc_session_expires_73516b532380c28690a4437d20967e03', '1445934114', 'no'),
So far I've found a regex that selects the whole line when "_wc_session_" is found:
(^\(.*_wc_session_.*\)\,)
but when I try to run
awk '!(^\(.*_wc_session_.*\)\,)' db.sql > temp.sql
I get
awk: line 1: syntax error at or near ^
Am I missing something?
If you're set on awk:
awk '!/_wc_session/' db.sql
You may also use sed -i to write the output "in place" (in the input file):
sed -i '/_wc_session/d' db.sql
Edit:
A more precise approach with awk would be to use the comma already present in your file as the delimiter and only check column 2 for the respective pattern. This approach is useful in case the pattern appears in a different column and that line should not be removed.
awk -F',' '$2 !~ /_wc_session/' db.sql
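If you want awk to edit the file in place as well, GNU awk 4.1+ ships an inplace extension (a sketch; keep a backup, as with any in-place edit):
gawk -i inplace '!/_wc_session_/' db.sql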
With simple grep, the following removes that one exact line and should do the trick:
grep -v "(26682, '_wc_session_expires_73516b532380c28690a4437d20967e03', '1445934114', 'no')" Input_file
EDIT: If you want to remove all lines that contain the string _wc_session_expires_ anywhere, then the following may help:
grep -v "_wc_session_expires_" Input_file
The mistake was in the regex syntax: an awk pattern regex must be enclosed in slashes. The corrected command is:
awk '!/^\(.*wc_session.*\),/' db.sql > temp.sql

How to replace multiple " in between two, with help of sed or awk

In a CSV file, if there are more than two " characters between two commas, I want to replace them so that only two " remain, using a shell script.
Example
If the csv contains ,"""any word"", it should get replaced with ,"any word",; or if it contains ,[any number of "], it should get replaced with ,"",.
FYI: " is a double quote, not two single quotes.
Also, the square brackets are not actually present in the data; I added them only for illustration.
awk solution:
sample testfile contents:
sdsdf,"""hello"",sdsdf
asdasd,[asdasd asdasd]",sdfsdf
sdf,"[asdasd]",asdasd
The job:
awk -F, '{ for(i=1;i<=NF;i++) if($i~/"{2,}/) gsub(/"+/,"\"",$i);
else if($i~/^[^"]*"{1,}[^"]*$/) $i="\"\""; }1' OFS=',' testfile
The output:
sdsdf,"hello",sdsdf
asdasd,"",sdfsdf
sdf,"[asdasd]",asdasd
Trying it on Roman's test file:
awk -F, '{gsub(/"""hello""/,"\42hello\42",$2)gsub(/\[asdasd asdasd\]/,"\42")}1' OFS=, file
sdsdf,"hello",sdsdf
asdasd,"",sdfsdf
sdf,"[asdasd]",asdasd
Here's a sed solution which, as the OP asked, works between commas, but which doesn't work if there are commas in between the quotation marks:
sed ':a;s/\(,"[^,"]*\|^"[^,"]*\)"\([^,]\)/\1\2/;ta' testfile
Using Roman's test file my output is:
sdsdf,"hello",sdsdf
asdasd,[asdasd asdasd]",sdfsdf
sdf,"[asdasd]",asdasd
Note that the second field of the second line is different in my version, as I'm not sure what behavior OP wants in that case or if fields like that even exist.
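To watch the :a ... ta loop at work on a single field (GNU sed syntax; each pass deletes one redundant quote until the substitution no longer matches):
$ echo 'x,"""hello"",y' | sed ':a;s/\(,"[^,"]*\|^"[^,"]*\)"\([^,]\)/\1\2/;ta'
x,"hello",y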

I need to be able to print the largest record value from txt file using bash

I am new to bash programming and I hit a roadblock.
I need to be able to calculate the largest record number within a txt file and store that into a variable within a function.
Here is the text file:
student_records.txt
12345,fName lName,Grade,email
64674,fName lName,Grade,email
86345,fName lName,Grade,email
I need to be able to get the largest record number ($1 or first field) in order for me to increment this unique record and add more records to the file. I seem to not be able to figure this one out.
First, I sort the file by the first field in descending order and then perform this operation:
largest_record=$(awk-F,'NR==1{print $1}' student_records.txt)
echo $largest_record
This gives me the following error on the console:
awk-F,NR==1{print $1}: command not found
Any ideas? Also, any suggestions on how to accomplish this in the best way?
Thank you in advance.
largest=$(sort -t, -k1,1 -rn file | cut -d"," -f1 | head -1)
You need spaces, and quotes
awk -F, 'NR==1{print $1}'
The command is awk; you need a space after it so bash parses your command line properly, otherwise it thinks the whole thing is the name of the command, which is what the error message is telling you.
Learn how to use the man command so you can learn how to invoke other commands:
man awk
This will tell you what the -F option does:
The -F fs option defines the input field separator to be the regular expression fs.
So in your case the field separator is a comma -F,
What follows in quotes is the program you want awk to interpret. It says to match a line with the pattern NR==1; NR is special, it is the record number, so you want it to match the first record. Following that is the action you want awk to take when that pattern matches, {print $1}, which says to print the first field (comma separated) of the line.
A better way to accomplish this would be to use awk to find the largest record for you rather than sorting it first, this gives you a solution that is linear in the number for records - you just want the max, no need to do extra work of sorting the whole file:
awk -F, 'BEGIN {max = 0} {if ($1>max) max=$1} END {print max}' student_records.txt
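To put the result to the OP's stated use, here is a sketch of incrementing the largest ID and appending a record (the field layout is copied from the sample file):
largest=$(awk -F, 'BEGIN {max = 0} {if ($1>max) max=$1} END {print max}' student_records.txt)
next_id=$((largest + 1))
echo "${next_id},fName lName,Grade,email" >> student_records.txt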

How not to get expanded variables in AWK

Good day,
I was wondering how not to get expanded variables in AWK.
The variable to pass is: achi
But, when I try with:
awk -F, -v var1="achi" '$(NF-1)~var1' file
It just does not work: it prints every line that contains achi, not only the exact match.
I'll appreciate some insights to do it properly.
Input
achi, francia
nachi, peru
universidad achi, japon
achito, suecia
Expected Output
achi, francia
You seem to be trying to test equivalence with the pattern matching operator ~. The proper operator to test equivalence is ==.
awk -F, -v var1="achi" '$(NF-1)==var1' file
If you are expecting more fields, you should take into account that your values are separated by a comma and a space; this can be handled by using ", " as the field separator.
awk -F", " -v var1="achi" '$(NF-1)==var1'
