Bash 4.4.0
Ubuntu 16.04
I have several columns in a CSV file; some are all capital letters and some are lowercase. Some columns have only one word while others may have 50 words. At the moment I convert column by column with 2 commands each, and it is quite taxing on the server when the file has 50k lines.
Example:
#-- Place the header line in a temp file
head -n 1 "$tmp_input1" > "$tmp_input3"
#-- Remove the header line from the original file
tail -n +2 "$tmp_input1" > "$tmp_input1-temp" && mv "$tmp_input1-temp" "$tmp_input1"
#-- Change the words in the 11th column to lower case, then change the first letter to upper case
awk -F"," 'BEGIN{OFS=","} {$11 = tolower($11); print}' "$tmp_input4" > "$tmp_input5"
sed -i "s/\b\(.\)/\u\1/g" "$tmp_input5"
#-- Change the words in the 12th column to lower case, then change the first letter to upper case
awk -F"," 'BEGIN{OFS=","} {$12 = tolower($12); print}' "$tmp_input5" > "$tmp_input6"
sed -i "s/\b\(.\)/\u\1/g" "$tmp_input6"
#-- Change the words in the 13th column to lower case, then change the first letter to upper case
awk -F"," 'BEGIN{OFS=","} {$13 = tolower($13); print}' "$tmp_input6" > "$tmp_input7"
sed -i "s/\b\(.\)/\u\1/g" "$tmp_input7"
cat "$tmp_input7" >> "$tmp_input3"
Is it possible to do multiple columns in a single command?
Here is an example of the csv file:
"dealer_id","vin","conditon","stocknumber","make","model","year","broken","trim","bodystyle","color","interiorcolor","interiorfabric","engine","enginedisplacement","engineaspiration","engineText","transmission","drivetrain","mpgcity","mpghighway","mileage","cylinders","fuelconditon","optiontext","description","titlestatus","warranty","price","specialprice","window_sticker_price","mirrorhangerprice","images","ModelCode","PackageCodes"
"JOHNVANC04A","2C4RC1N73JR290946","N","JR290946","Chrysler","Pacifica","2018","","Hybrid Limited FWD","Mini-van, Passenger","Brilliant BLACK Crystal PEARL Coat","","..LEATHER SEATS..","V6 Cylinder Engine","3.6L","","","AUTOMATIC","FWD","0","0","553","6","H","..1-SPEED A/T..,..AUTO-OFF HEADLIGHTS..,..BACK-UP CAMERA..,..COOLED DRIVER SEAT..,..CRUISE CONTROL..","======KEY FEATURES INCLUDE: . LEATHER SEATS. THIRD ROW SEAT. QUAD BUCKET SEATS. REAR AIR. HEATED DRIVER SEAT.","","0","41680","","48830","","http://i.autoupktech.com/c640/9c40231cbcfa4ef89425d108e4e3a410.jpg",http://i.autoupnktech.com/c640/9c40231cbcfa4ef89425d108e4e3a410.jpg","RUES53","AAX,AT2,DFQ,EH3,GWM,WPU"
Here's how some of the above columns should look after conversion:
Column 11 should be - "Brilliant Black Crystal Pearl Coat"
Column 13 should be - "Leather Seats"
Column 16 should be - "Automatic"
Column 23 should be - "1-Speed A/T,Auto-Off Headlights,Back-up Camera"
Column 24 should be - "Key Features Include: Leather Seats,Third Row Seat"
Keep in mind, the double-quotes surrounding the columns can't be stripped. I only need to convert certain columns and not the entire file. Here's an example of the columns 11, 13, 16, 23 and 24 converted.
"Brilliant Black Crystal Pearl Coat","Leather Seats","Automatic","1-Speed A/T,Auto-Off Headlights,Back-up Camera","Key Features Include: Leather Seats,Third Row Seat"
Just to add another option, here is a one liner using just sed:
sed -i -e 's/.*/\L&/' -e 's/[a-z]*/\u&/g' filename
And here is a proof of concept:
$ cat testfile
jUSt,a,LONG,list of SOME,RAnDoM WoRDs
ANother LIne
OneMore,LiNe
$ sed -e 's/.*/\L&/' -e 's/[a-z]*/\u&/g' testfile
Just,A,Long,List Of Some,Random Words
Another Line
Onemore,Line
$
If you want to convert just the headers of the CSV file (first line), just replace s with 1s on both search patterns.
You can find an excellent article explaining the magic here: sed – Convert to Title Case.
Here is another alternative (off-topic here, I know) in Python 3:
import csv
from pathlib import Path
infile = Path('infile.csv')
outfile = Path('outfile.csv')
titled_cols = [10, 12, 15, 22, 23]
titled_data = []
with infile.open() as fin, outfile.open('w', newline='') as fout:
    for row in csv.reader(fin, quoting=csv.QUOTE_ALL):
        for i, col in enumerate(row):
            if i in titled_cols:
                row[i] = col.title()
        titled_data.append(row)
    csv.writer(fout, quoting=csv.QUOTE_ALL).writerows(titled_data)
Just define the columns you want title-cased in titled_cols (column indexes are zero-based) and it will do what you want.
I guess infile and outfile are self-explanatory and outfile will contain the modified version of your original file.
I hope it helps.
You could create a user-defined function and apply it to the columns you need to modify.
awk -F, 'function toproper(s) { return toupper(substr(s, 1, 1)) tolower(substr(s, 2, length(s))) } {printf("%s,%s,%s,%s\n", toproper($1), toproper($2), toproper($3), toproper($4));}'
Input:
FOO,BAR,BAZ,ETC
Output:
Foo,Bar,Baz,Etc
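If only certain columns need converting, the same function can be applied selectively instead of to every field. A sketch (the column numbers in the split call are placeholders; substitute the ones you need):

```shell
# Apply toproper only to the columns listed in cols (here 2 and 4).
awk -F, '
function toproper(s) { return toupper(substr(s, 1, 1)) tolower(substr(s, 2)) }
BEGIN { OFS = ","; split("2 4", cols, " ") }   # hypothetical target columns
{ for (k in cols) $(cols[k]) = toproper($(cols[k])); print }
' file
```

With the input FOO,BAR,BAZ,ETC this prints FOO,Bar,BAZ,Etc, leaving the unlisted columns alone. Note this capitalizes only the first letter of the whole field, not of every word, and it does not account for quoted fields containing commas.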
Assuming the fields of the csv file are not quoted by double quotes,
meaning that we can simply split a record on commas and whitespaces, how
about a Perl solution:
perl -pe 's/(^|(?<=[,\s]))([^,\s])([^,\s]*)((?=[,\s])|$)/\U$2\L$3/g' input.csv
input.csv:
Bash,4.4.0,Ubuntu,16.04
I have several columns in a CSV file,that, are, all capital letters
and some are lowercase.
Some columns have only,one,word,while others may have 50 words.
output:
Bash,4.4.0,Ubuntu,16.04
I Have Several Columns In A Csv File,That, Are, All Capital Letters
And Some Are Lowercase.
Some Columns Have Only,One,Word,While Others May Have 50 Words.
This version uses AWK to do the job:
This is the command (change file to your filename)
awk -F"," 'BEGIN{OFS=","}{ for (i=1; i<=NF; i++) { $i=toupper(substr($i,1,1))""tolower(substr($i,2,length($i)))}print $0}' file | awk -F" " 'BEGIN{OFS=" "} { for (i=1; i<=NF; i++) { $i=toupper(substr($i,1,1))""substr($i,2,length($i))}print $0}'
The test:
cat file
pepe is cool,ASDASD ASDAS,and no podpoiaops
awk -F"," 'BEGIN{OFS=","}{ for (i=1; i<=NF; i++) { $i=toupper(substr($i,1,1))""tolower(substr($i,2,length($i)))}print $0}' file | awk -F" " 'BEGIN{OFS=" "} { for (i=1; i<=NF; i++) { $i=toupper(substr($i,1,1))""substr($i,2,length($i))}print $0}'
Pepe Is Cool,Asdasd Asdas,And No Podpoiaops
Explanation
BEGIN{OFS=","} tells awk how to output the line.
The for statement uses NF, the built-in variable holding the number of fields on each line.
The substr calls take the field apart and change its first letter, and the result is assigned back to the field.
The whole row is then printed with print $0.
Finally, the second awk splits the lines produced by the first command, this time using spaces as the separator. That way it sees every individual word in the file and upper-cases its first character.
Hope it helps
I have a requirement to print the first field of a line if its last 5 fields match a specified input.
Example: the specified input is 2
India;1;2;3;4;5;6
Japan;1;2;2;2;2;2
China;2;2;2;2
England;2;2;2;2;2
Expected Output:
Japan
England
As you can see, China is excluded as it doesn't meet the requirement (the last 5 fields have to match the input).
grep ';2;2;2;2;2$' file | cut -d';' -f1
$ in a regex stands for "end of line", so grep will print all the lines that end in the given string
-d';' tells cut to delimit columns by semicolons
-f1 outputs the first column
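If the value to match comes from a shell variable rather than being hard-coded, the same pipeline can be parameterized (a sketch; v and file are placeholder names):

```shell
v=2
# Build the ";v;v;v;v;v" tail pattern from the variable; \$ keeps the
# end-of-line anchor literal inside the double quotes.
grep ";$v;$v;$v;$v;$v\$" file | cut -d';' -f1
```

With the sample input above this prints Japan and England, exactly as the hard-coded version does.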
You could use awk:
awk -F';' -v v="2" -v count=5 '
{
    c = 0
    for (i = 2; i <= NF; i++) {
        if ($i == v) c++
        if (c >= count) { print $1; next }
    }
}' file
where
v is the value to match
count is the number of matches required before the first field is printed
the for loop walks through all the fields delimited by ; looking for matches
This script doesn't need the 5 values 2 to be consecutive.
With sed:
sed -n 's/^\([^;]*\).*;2;2;2;2;2$/\1/p' file
It captures the leading non-; characters of lines ending with ;2;2;2;2;2 and prints them.
It can be shortened with GNU sed to:
sed -nE 's/^([^;]*).*(;2){5}$/\1/p' file
awk -F\; '/;2;2;2;2;2$/{print $1}' file
Japan
England