Convert file content using a shell script - bash
Hello everyone, I'm a beginner at shell scripting. On a daily basis I need to convert a file's data to another format; I usually do it manually with a text editor, but I often make mistakes, so I decided to write a simple script that can do the work for me.
The file's content looks like this:
/release201209
a1,a2,"a3",a4,a5
b1,b2,"b3",b4,b5
c1,c2,"c3",c4,c5
to this:
a2>a3
b2>b3
c2>c3
The script should ignore the first line and print the second and third values separated by '>'
I'm halfway there, and here is my code:
#!/bin/bash
#while Loops
i=1
while IFS=\" read t1 t2 t3
do
test $i -eq 1 && ((i=i+1)) && continue
echo $t1|cut -d\, -f2 | { tr -d '\n'; echo \>$t2; }
done < $1
The problem with my code is that the last line isn't printed unless the file ends with an empty line (\n).
I also want the echoed output written to a new CSV file (I tried redirecting standard output to my new file, but only the last echo is printed there).
Can someone please help me out? Thanks in advance.
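For reference, both symptoms have well-known idioms: testing read's result keeps a final line that lacks a trailing newline, and redirecting once after done sends every echo into the same file. The following is only a rough sketch of those idioms, not the accepted fix; output.csv is a placeholder name and the field handling is simplified:
#!/bin/bash
# Sketch only: the input file is still the script's first argument ($1),
# output.csv is a placeholder for the new CSV file.
i=1
while IFS=, read -r f1 f2 f3 rest || [ -n "$f1" ]   # '|| [ -n ... ]' keeps the last line
do
    test $i -eq 1 && ((i=i+1)) && continue           # skip the header line
    echo "$f2>${f3//\"/}"                            # 2nd and 3rd fields, quotes stripped, joined by '>'
done < "$1" > output.csv                             # redirect the whole loop once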
Rather than treating the double quotes as a field separator, it seems cleaner to just delete them (assuming that is valid). Eg:
$ < input tr -d '"' | awk 'NR>1{print $2,$3}' FS=, OFS=\>
a2>a3
b2>b3
c2>c3
If you cannot simply strip the quotes as in your sample input because they are protecting embedded commas, you could hack together a solution, but you would be better off using a proper CSV parsing tool (e.g. Perl's Text::CSV).
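For example, a minimal sketch with Text::CSV (assuming the module is installed and that, as above, the first line should be skipped); it prints the second and third columns joined by '>', handling quoted commas properly:
perl -MText::CSV -nle '
    BEGIN { $csv = Text::CSV->new({ binary => 1 }) }   # real CSV parser
    next if $. == 1;                                    # skip the first line
    $csv->parse($_) or next;                            # parse the record
    my @f = $csv->fields;                               # unquoted field values
    print "$f[1]>$f[2]";                                # 2nd and 3rd columns
' input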
Here's a simple pipeline that will do the trick:
sed '1d' data.txt | cut -d, -f2-3 | tr -d '"' | tr ',' '>'
Here, we're just removing the first line (as desired), selecting fields 2 & 3 (based on a comma field separator), removing the double quotes and mapping the remaining , to >.
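Since the goal is to end up with a new CSV file, the same pipeline can simply be redirected; new_file.csv here is just a placeholder name:
sed '1d' data.txt | cut -d, -f2-3 | tr -d '"' | tr ',' '>' > new_file.csv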
Use this Perl one-liner:
perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in the -F option.
-F',' : Split into @F on comma, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
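Applied to the sample input from the question (with in_file standing in for your data), the one-liner should produce the requested format, and you can redirect it to a new CSV file if needed:
$ perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file > out.csv
$ cat out.csv
a2>a3
b2>b3
c2>c3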
Related
How to get values in a line while looping line by line in a file (shell script)
I have a file which looks like this (file.txt):
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
I want to loop through this line by line and extract the key values, so the result should be:
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
So I wrote code like this to loop line by line and extract the value between {"key":" and ","rule": because the key value sits between those two patterns:
while read p; do
  echo $p | sed -n "/{"key":"/,/","rule":,/p"
done < file.txt
But this is not working. Can someone help me figure this out? Thanks in advance.
Your sample input is almost valid json. You could tweak it to make it valid and then extract the values with jq with something like:
sed -e 's/squid/"squid/' -e 's/$/"}/' file.txt | jq -r .key
Or, if your actual input really is valid json, then just use jq:
jq -r .key file.txt
If the "random-txt" may include double quotes, making it difficult to massage the input to make it valid json, perhaps you want something like:
awk '{print $4}' FS='"' file.txt
or
sed -n '/{"key":"\([^"]*\).*/s//\1/p' file.txt
or
while IFS=\" read open_brace key colon val _; do echo "$val"; done < file.txt
For the shown data, you can try this awk:
awk -F '"[:,]"' '{print $2}' file
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
With the given example you can simply use:
cut -d'"' -f4 file.txt
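To see why it is field 4, split one of the sample lines on the double quotes (a quick illustration; for this layout the key always lands in the fourth "-delimited field):
# fields when splitting on '"':  1={   2=key   3=:   4=AJGUIGIDH568   ...
printf '%s\n' '{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here' | cut -d'"' -f4
# -> AJGUIGIDH568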
Assumptions:
there may be other lines in the file so we need to focus on just the lines with "key" and "rule"
the only text between "key" and "rule" is the desired string (eg, squid never shows up between the two patterns of interest)
Adding some additional lines:
$ cat file.txt
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
ignore this line}
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
ignore this line}
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
One sed idea:
$ sed -nE 's/^(.*"key":")([^"]*)(","rule".*)$/\2/p' file.txt
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
Where:
-E - enable extended regex support (and capture groups without the need to escape sequences)
-n - suppress printing of pattern space
^(.*"key":") - [1st capture group] everything from start of line up to and including "key":"
([^"]*) - [2nd capture group] everything that is not a double quote (")
(","rule".*)$ - [3rd capture group] everything from ","rule" to end of line
\2/p - replace the line with the contents of the 2nd capture group and print
Unix bash - using cut to regex lines in a file, match regex result with another similar line
I have a text file, file.txt, with several thousand lines. It contains a lot of junk lines which I am not interested in, so I use the cut command to regex for the lines I am interested in first. For each entry I am interested in, it will be listed twice in the text file: once in a "definition" section, and once in a "value" section. I want to retrieve the first value from the "definition" section, and then for each entry found there, find its corresponding "value" section entry. The first entry starts with 'gl_', while the 2nd entry looks like '"gl_', starting with a '"'.
This is the code I have so far for looping through the text document, which then retrieves the values I am interested in and appends them to a .csv file:
while read -r line
do
if [[ $line == gl_* ]] ; then
(param=$(cut -d'\' -f 1 $line) | def=$(cut -d'\' -f 2 $line) | type=$(cut -d'\' -f 4 $line) | prompt=$(cut -d'\' -f 8 $line))
while read -r glline
do
if [[ $glline == '"'$param* ]] ; then
val=$(cut -d'\' -f 3 $glline) | "$project";"$param";"$val";"$def";"$type";"$prompt" >> /filepath/file.csv
done < file.txt
done < file.txt
This seems to throw some syntax errors related to unexpected tokens near the first 'done' statement.
Example of text that needs to be parsed, and paired:
gl_one\User Defined\1\String\1\\1\Some Text
gl_two\User Defined\1\String\1\\1\Some Text also
gl_three\User Defined\1\Time\1\\1\Datetime now
some\junk
"gl_one\1\Value1
some\junk
"gl_two\1\Value2
"gl_three\1\Value3
So effectively, the while loop reads each line until it hits the first line that starts with 'gl_', and stores that value (i.e. gl_one) in the variable 'param'. It then starts the nested while loop that looks for the line that starts with a '"' in front of the gl_, and is equivalent to the 'param' value. In other words, the script should couple the lines gl_one and "gl_one, gl_two and "gl_two, gl_three and "gl_three.
The text file is large, and these are settings that have been defined this way. I need to collect the values for each gl_ parameter, to save them together in a .csv file with their corresponding "gl_ values.
Wanted regex output stored in variables would be something like this:
first while loop: $param = gl_one, $def = User Defined, $type = String, $prompt = Some Text
second while loop: $val = Value1
Then it stores these variables in file.csv, with semicolon separators.
Currently, I have an error for the first 'done' statement, which seems to indicate an issue with the quotation marks. Apart from this, I am looking for general ideas and comments on the script. I.e. I am not entirely sure I am matching the quotation-mark parameters "gl_ correctly, or whether the semicolons as .csv separators are added correctly.
Edit: Overall, the script runs now, but extremely slowly due to the inner while loop. Is there any faster way to match the two lines together and add them to the .csv file? Any ideas and comments?
This will generate a file containing the data you want:
cat file.txt | grep gl_ | sed -E "s/\"//" | sort | sed '$!N;s/\n/\\/' | awk -F'\' '{print $1"; "$5"; "$7"; "$NF}' > /filepath/file.csv
It uses grep to extract all lines containing 'gl_'
then sed to remove the leading '"' from the lines that contain one [I have assumed there are no further '"' in the line]
The lines are sorted
sed removes the return from each pair of lines
awk then prints the required columns according to your requirements
Output routed to the file.
LANG=C sort -t\\ -sd -k1,1 <file.txt |\
sed '
  /^gl_/{             # if definition
    N;                # append next line to buffer
    s/\n"gl_[^\\]*//; # if value, strip first column
    t;                # and start next loop
  }
  D;                  # otherwise, delete the line
' |\
awk -F\\ -v p="$project" -v OFS=\; '{print p,$1,$10,$2,$4,$8 }' \
>>/filepath/file.csv
sort lines so gl_... appears immediately before "gl_... (LANG fixes LC_TYPE) - assumes definition appears before value
sed to help ensure matching definition and value (may still fail if duplicate/missing value), and tidy for awk
awk to pull out relevant fields
Convert multi-line csv to single line using Linux tools
I have a .csv file that contains double quoted multi-line fields. I need to convert the multi-line cells to single lines. It doesn't show in the sample data, but I do not know which fields might be multi-line, so any solution will need to check every field. I do know how many columns I'll have. The first line will also need to be skipped. I don't know how much data there will be, so performance isn't a consideration. I need something that I can run from a bash script on Linux, preferably using tools such as awk or sed and not actual programming languages. The data will be processed further with Logstash, but it doesn't handle double quoted multi-line fields, hence the need for some pre-processing.
I tried something like this and it kind of works on one row but fails on multiple rows.
sed -e :0 -e '/,.*,.*,.*,.*,/b' -e N -e '1n;N;N;N;s/\n/ /g' -e b0 file.csv
CSV example
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
The output I want is
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
Jane,Doe,Country City Street,67890
etc.
etc.
First my apologies for getting here 7 months late... I came across a problem similar to yours today, with multiple fields with multi-line types. I was glad to find your question, but at least for my case I have the added complexity that, as more than one field is conflicting, quotes might open, close and open again on the same line... anyway, reading a lot and combining answers from different posts I came up with something like this:
First I count the quotes in a line; to do that, I take out everything but quotes and then use wc:
quotes=`echo $line | tr -cd '"' | wc -c` # Counts the quotes
If you think of a single multi-line field, knowing whether the quotes are 1 or 2 is enough. In a more generic scenario like mine I have to know if the number of quotes is odd or even to know if the line completes the record or expects more information.
To check for even or odd you can use the mod operator (%); in general:
even % 2 = 0
odd % 2 = 1
For the first line:
Odd means that the line expects more information on the next line.
Even means the line is complete.
For the subsequent lines, I have to know the status of the previous one. For instance, in your sample text:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
You can say line 1 (John,Doe,"Country) has 1 quote (odd), which means the status of the record is incomplete or open. When you go to line 2, there is no quote (even). Nevertheless this does not mean the record is complete; you have to consider the previous status... so for the lines following the first one it will be:
Odd means that the record status toggles (incomplete to complete).
Even means that the record status remains as on the previous line.
What I did was loop line by line while carrying the status of the last line to the next one:
incomplete=0
cat file.csv | while read line; do
    quotes=`echo $line | tr -cd '"' | wc -c` # Counts the quotes
    incomplete=$((($quotes+$incomplete)%2)) # Check if Odd or Even to decide status
    if [ $incomplete -eq 1 ]; then
        echo -n "$line " >> new.csv # If line is incomplete join with next
    else
        echo "$line" >> new.csv # If line completes the record finish
    fi
done
Once this was executed, a file in your format generates a new.csv like this:
First name,Last name,Address,ZIP
John,Doe,"Country City Street",12345
I like one-liners as much as everyone; I wrote that script just for the sake of clarity, but you can - arguably - write it in one line like:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
I would appreciate it if you could go back to your example and see if this works for your case (which you most likely already solved). Hopefully this can still help someone else down the road...
Recovering the multi-line fields
Every need is different; in my case I wanted the records on one line to further process the csv and add some bash-extracted data, but I wanted to keep the csv as it was. To accomplish that, instead of joining the lines with a space I used a code - likely unique - that I could then search and replace:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l ~newline~ " || echo "$l";done >new.csv
The code is ~newline~; this is totally arbitrary of course.
Then, after doing my processing, I took the csv text file and replaced the coded newlines with real newlines:
sed -i 's/ ~newline~ /\n/g' new.csv
References:
Ternary operator: https://stackoverflow.com/a/3953666/6316852
Count char occurrences: https://stackoverflow.com/a/41119233/6316852
Other peculiar cases: https://www.linuxquestions.org/questions/programming-9/complex-bash-string-substitution-of-csv-file-with-multiline-data-937179/
TL;DR
Run this:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
... and collect results in new.csv
I hope it helps!
If Perl is your option, please try the following:
perl -e '
while (<>) {
    $str .= $_;
}
while ($str =~ /("(("")|[^"])*")|((^|(?<=,))[^,]*((?=,)|$))/g) {
    if (($el = $&) =~ /^".*"$/s) {
        $el =~ s/^"//s;
        $el =~ s/"$//s;
        $el =~ s/""/"/g;
        $el =~ s/\s+(?!$)/ /g;
    }
    push(@ary, $el);
}
foreach (@ary) {
    print /\n$/ ? "$_" : "$_,";
}' sample.csv
sample.csv:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
John,Doe,"Country
City
Street",67890
Result:
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
John,Doe,Country City Street,67890
This might work for you (GNU sed):
sed ':a;s/[^,]\+/&/4;tb;N;ba;:b;s/\n\+/ /g;s/"//g' file
Test each line to see that it contains the correct number of fields (in the example that was 4). If there are not enough fields, append the next line and repeat the test. Otherwise, replace the newline(s) by spaces and finally remove the "'s.
N.B. This may be fraught with problems such as ,'s between "'s and quoted "'s.
Try cat -v file.csv. When the file was made with Excel, you might have some luck: when the newlines in a field are a simple \n and the newline at the end is a \r\n (which will look like ^M), parsing is simple.
# delete all newlines and replace the ^M with a new newline.
tr -d "\n" < file.csv | tr "\r" "\n"
# Above two steps with one command
tr "\n\r" " \n" < file.csv
When you want a space between the joined lines, you need an additional step.
tr "\n\r" " \n" < file.csv | sed '2,$ s/^ //'
EDIT: @sjaak commented this didn't work in his case. When your broken lines also have ^M you can still be a lucky (wo-)man. When your broken field is always the first field in double quotes and you have GNU sed 4.2.2, you can join 2 lines when the first line has exactly one double quote.
sed -rz ':a;s/(\n|^)([^"]*)"([^"]*)\n/\1\2"\3 /;ta' file.csv
Explanation:
-z don't use \n as line endings
:a label for repeating the step after successful replacement
(\n|^) Search after a newline or the very first line
([^"]*) Substring without a "
ta Go back to label a and repeat
awk pattern matching works here. Answer in one line:
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile
If you'd like to drop the quotes, you could use:
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile | sed 's/"//gw NewFile'
but I prefer to keep them.
To explain the code:
/Pattern/ : find pattern in the current line.
ORS : the output record separator.
$0 : the whole of the current line.
's/OldPattern/NewPattern/' : substitute the first OldPattern with NewPattern
/g : does the previous action for all OldPattern
/w : write the result to NewFile
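As a quick check (assuming the sample from the question is saved as file.csv), the ORS toggling joins the quoted multi-line Address back onto one line while leaving the header untouched:
$ awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' file.csv
First name,Last name,Address,ZIP
John,Doe,"Country City Street",12345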
Multiline CSV: output on a single line, with double-quoted input lines, using a different separator
I'm trying to get multi-line output from a CSV onto one line in Bash. My CSV file looks like this:
hi,bye
hello,goodbye
The end goal is for it to look like this:
"hi/bye", "hello/goodbye"
This is currently where I'm at:
INPUT=mycsvfile.csv
while IFS=, read col1 col2 || [ -n "$col1" ]
do
    source=$(awk '{print;}' | sed -e 's/,/\//g' )
    echo "$source";
done < $INPUT
The output is on every line and I'm able to change the , to a /, but I'm not sure how to put the output on one line with quotes around it.
I've tried BEGIN:
source=$(awk 'BEGIN { ORS=", " }; {print;}'| sed -e 's/,/\//g' )
But this only outputs the last line, and omits the first hi/bye:
hello/goodbye
Would anyone be able to help me?
Just do the whole thing (mostly) in awk. The final sed is just there to trim some trailing cruft and inject a newline at the end:
< mycsvfile.csv awk '{print "\""$1, $2"\""}' FS=, OFS=/ ORS=", " | sed 's/, $//'
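Run against the sample mycsvfile.csv from the question, this should produce the requested single line:
$ < mycsvfile.csv awk '{print "\""$1, $2"\""}' FS=, OFS=/ ORS=", " | sed 's/, $//'
"hi/bye", "hello/goodbye"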
If you're willing to install trl, a utility of mine, the command can be simplified as follows:
input=mycsvfile.csv
trl -R '| ' < "$input" | tr ',|' '/,'
trl transforms multiline input into double-quoted single-line output separated by ,<space> by default.
-R '| ' (temporarily) uses |<space> as the separator instead; this assumes that your data doesn't contain | instances, but you can choose any char. that you know not to be part of your data.
tr ',|' '/,' then translates all , instances (field-internal to the input lines) into / instances, and all | instances (the temporary separator) into , instances, yielding the overall result as desired.
Installation of trl from the npm registry (Linux and macOS)
Note: Even if you don't use Node.js, npm, its package manager, works across platforms and is easy to install; try curl -L https://git.io/n-install | bash
With Node.js installed, install as follows:
[sudo] npm install trl -g
Note: Whether you need sudo depends on how you installed Node.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
The -g ensures global installation and is needed to put trl in your system's $PATH.
Manual installation (any Unix platform with bash)
Download this bash script as trl.
Make it executable with chmod +x trl.
Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).
$ awk -F, -v OFS='/' -v ORS='"' '{$1=s ORS $1; s=", "; print} END{printf RS}' file
"hi/bye", "hello/goodbye"
There is no need for a bash loop, which is invariably slow. sed and tr can do this more efficiently:
input=mycsvfile.csv
sed 's/,/\//g; s/.*/"&", /; $s/, $//' "$input" | tr -d '\n'
s/,/\//g replaces all (g) , instances with / instances (escaped as \/ here).
s/.*/"&", / encloses the resulting line in "...", followed by ,<space>:
regex .* matches the entire pattern space (the potentially modified input line)
& in the replacement string represents that match.
$s/, $// removes the undesired trailing ,<space> from the final line ($).
tr -d '\n' then simply removes the newlines (\n) from the result, because sed invariably outputs each line with a trailing newline.
Note that the above command's single-line output will not have a trailing newline; simply append ; printf '\n' if it is needed.
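If that trailing newline matters (for example when the result feeds another tool), the complete command might look like this sketch, where oneline.csv is just a placeholder name:
input=mycsvfile.csv
{ sed 's/,/\//g; s/.*/"&", /; $s/, $//' "$input" | tr -d '\n'; printf '\n'; } > oneline.csv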
In awk:
$ awk '{sub(/,/,"/");gsub(/^|$/,"\"");b=b (NR==1?"":", ")$0}END{print b}' file
"hi/bye", "hello/goodbye"
Explained:
$ awk '
{
    sub(/,/,"/")            # replace comma
    gsub(/^|$/,"\"")        # add quotes
    b=b (NR==1?"":", ") $0  # buffer to add delimiters
}
END {
    print b                 # output
}' file
I'm assuming you just have 2 lines in your file? If you have alternating 2 line pairs, let me know in comments and I will expand for that general case. Here is a one-line awk conversion for you:
# NOTE: I am using the octal ascii code for the
# double quote char (\42=") in my printf statement
$ awk '{gsub(/,/,"/")}NR==1{printf("\42%s\42, ",$0)}NR==2{printf("\42%s\42\n",$0)}' file
output:
"hi/bye", "hello/goodbye"
Here is my attempt in awk:
awk 'BEGIN{ ORS = " " }{ a++; gsub(/,/, "/"); gsub(/[a-z]+\/[a-z]+/, "\"&\""); print $0; if (a == 1){ print "," }}{ if (a==2){ printf "\n"; a = 0 } }'
It also works if your input has more than two lines. If you need some explanation feel free to ask :)
bash remove/change values from one field with a loop
I have a file where the 10th column in excel contains prices.
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5500",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50000",19.50,bieber,20160506,0,,N,E,,,,,,
When it goes to csv the quotes and the commas stay. I need to pick out the column that is surrounded by quotes - I use grep -o and then, after clearing the commas, I get rid of the quotes. I can't use quotes or commas to delimit in awk because the prices get broken up into different fields.
cat /tmp/wowmom | awk -F ',' '{print $10}'
"5000"
"75
"100
"5500"
"50
"350
"50000"
while read line
do
    clean_price=$(grep -o '".*"' $line)
    echo "$clean_price" | tr -d',' > cleanprice1
    echo "cleanprice1" | tr -d'"' > clearnprice2
done </tmp/wowmom
I get errors though - "No such file or directory" on the grep:
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
I want to somehow isolate the value within quotes with grep -o and take the commas out of the number, then use awk to take the quotes out of field 10. I am doing this manually right now. It is a surprisingly long job - there are thousands of lines in this file.
You can use FPAT with gnu-awk for this:
awk -v FPAT='"[^"]+",|[^,]*' '{gsub(/[",]+/, "", $10)} 1' OFS=, file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,
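The key point is that FPAT describes what a field looks like, so the quoted price (embedded comma and all) stays together as field 10 before the gsub strips the quotes and commas. Since the cleaned rows go to standard output, you would normally capture them in a new file; newfile.csv is just a placeholder name:
awk -v FPAT='"[^"]+",|[^,]*' '{gsub(/[",]+/, "", $10)} 1' OFS=, file > newfile.csv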
You are using the wrong tool here.
sed -r 's/^(([^,]+,){9})"([^,]+),?([^,]+)"/\1\3\4/' file.csv > newfile.csv
The regular expression captures the first nine fields into the first back reference (and also populates the second with the last of the nine fields), the number before the separator comma in the third, and the rest of the number in the fourth; the substitution then glues them back together without the skipped elements. If you have numbers with more than one thousands separator (i.e. above one million), you will need a slightly more complex script.
In terms of what's wrong with your original script, the second argument to grep is the name of the file to grep, not the string to grep. You can use a here string (in Bash) or pipe the string to grep, but again, this is not how you do it properly.
grep -o '"[^"]*"' <<<"$line"
or
printf '%s' "$line" | grep -o '"[^"]*"'
Notice also the quotes - omitting quotes is a common newbie error; you can get away with it for a while, and then it bites you.
A pure Bash solution:
while IFS=\" read -r l n r; do
    printf '%s\n' "$l${n//,/}$r"
done < input_file.txt
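To see how the IFS=\" split behaves, here is a quick sketch on a single hard-coded sample row: the middle chunk is exactly the quoted price, so deleting its commas and re-joining the three parts rebuilds the row without the quotes:
line='CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,'
IFS=\" read -r l n r <<< "$line"    # l=text before the quotes, n=75,000, r=text after
printf '%s\n' "$l${n//,/}$r"        # -> ...,B,75000,19.50,...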
If you're looking for perl:
#!perl
use strict;
use warnings;
use Text::CSV;
use autodie;
my $csv = Text::CSV->new({binary=>1, eol=>"\n"});
my $filename = shift @ARGV;
open my $fh, "<", $filename;
while (my $row = $csv->getline($fh)) {
    $row->[9] =~ s/,//g;
    $csv->print(*STDOUT, $row);
}
close $fh;
demo:
$ perl csv.pl file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,