I have a CSV file with a first column that reads:
/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/BER5_OSSD_F008071.csv.0.01.out.csv
Followed by additional columns listing counts pulled from other CSV files.
What I want is to remove "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" from each line without affecting any other part of the file.
I've tried using sed, grep, and cut, but those only print the output to the terminal, or write a new file containing just that part of the line rather than the rest of the columns. Can I remove the "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" and keep everything else the same?
You can use awk to get this job done.
The code below replaces /Users/swilki/Desktop/Africa_OSSD/OSSD_Output/ with the empty string and updates the same file in place, using GNU awk's inplace extension (available in gawk 4.1+).
yourfile.csv is the input file.
awk -i inplace '{sub(/\/Users\/swilki\/Desktop\/Africa_OSSD\/OSSD_Output\//,"")}1' yourfile.csv
The above will remove the "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" and keep everything else the same.
Output of yourfile.csv:
BER5_OSSD_F008071.csv.0.01.out.csv
Option 2: if you want to print to a new file instead, the code below writes the replaced contents to a new file, your_newfile.csv:
awk '{sub(/\/Users\/swilki\/Desktop\/Africa_OSSD\/OSSD_Output\//,"")}1' yourfile.csv >your_newfile.csv
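If GNU awk's inplace extension isn't available, sed can make the same in-place edit. A sketch using GNU sed (on macOS/BSD sed, write `-i ''` instead of `-i`); the `,12,34` count columns are made up for the demo:

```shell
# Sample input: path from the question, count columns assumed for the demo
printf '%s\n' '/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/BER5_OSSD_F008071.csv.0.01.out.csv,12,34' > yourfile.csv

# Delete the path prefix in place; "|" as the s-command delimiter
# avoids having to escape every "/" in the path
sed -i 's|/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/||' yourfile.csv

cat yourfile.csv
```

Using `|` as the substitution delimiter keeps the pattern readable compared with backslash-escaping each slash.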
Below is sample data. Please note this operation needs to be done on files with millions of records, hence I need the optimal method. Essentially, we are looking to update the 2nd column with the concatenation of the first two characters of the 4th column and the 2nd column minus its first 3 ('_'-delimited) fields.
I have been trying with cut, reading the file line by line, which is very time-consuming. I need something with awk, something like:
awk -F, '{print $1","substr($4,1,2)"_"cut -f4-6 -d'_'($2)","$3","$4","$5","$6}'
Input Data:
234234234,123_33_3_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,123_11_2_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,123_33_3_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,123_33_3_11111_qewf_mkhsdf,01,09_68645,43234532,2
Output is required as:
234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2
You can use awk and printf for line re-formatting:
awk -F"[,_]" '{
  # with "," and "_" both as delimiters, $9 holds the 2-char prefix from
  # column 4, and $5-$7 are the parts of column 2 that are kept
  printf "%s,%s_%s_%s_%s,%s,%s_%s,%s,%s\n", $1,$9,$5,$6,$7,$8,$9,$10,$11,$12
}' file
You get:
234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2
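The fixed printf layout above relies on every record splitting into exactly twelve comma/underscore fields. As a sketch, assuming only the column layout shown in the sample data, a split()-based variant tolerates a variable number of '_' parts in the 2nd column:

```shell
# One sample record from the question
printf '%s\n' '234234234,123_33_3_11111_asdf_asadfas,01,06_1234,4325325432,2' > file

awk -F, -v OFS=, '{
    n = split($2, p, "_")              # break column 2 on "_"
    rest = p[4]                        # keep parts 4..n of column 2
    for (i = 5; i <= n; i++) rest = rest "_" p[i]
    $2 = substr($4, 1, 2) "_" rest     # prefix with first 2 chars of column 4
    print                              # $0 is rebuilt with OFS=","
}' file
```

Because only $2 is reassigned, the other columns pass through untouched even if their own contents contain underscores.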
I have a document with over a million of the following strings, and I'd like to create some new structures by extracting some parts and building a CSV file from them. What's the quickest way to do this?
document/0006-291X(85)91157-X
I would like to have a file with, on each line, the original string followed by the extracted parts:
document/0006-291X(85)91157-X;0006-291X;85
You can try this awk one-liner:
awk -F "[/()]" -v OFS=';' '{print $0,$(NF-2),$(NF-1)}' your-file
It splits each line into fields using /, (, and ) as delimiters. It then prints the whole line, the third-from-last field, and the second-from-last field. The option -v OFS=';' sets a semicolon as the output field separator.
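For example, run against a one-line file built from the sample string above:

```shell
# Build a one-line input file from the sample string
printf '%s\n' 'document/0006-291X(85)91157-X' > your-file

# Split on "/", "(", ")"; print the whole line plus the two extracted parts
awk -F "[/()]" -v OFS=';' '{print $0, $(NF-2), $(NF-1)}' your-file
# → document/0006-291X(85)91157-X;0006-291X;85
```

Counting from the end of the line (NF-2, NF-1) keeps the extraction stable even if the leading path contains extra "/" characters.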
I have a CSV file that I need to split by date. I've tried using the AWK code listed below (found elsewhere).
awk -F"," 'NR>1 {print $0 >> ($1 ".csv"); close($1 ".csv")}' file.csv
I've tried running this within terminal in both OS X and Debian. In both cases there's no error message (so the code seems to run properly), but there's also no output. No output files, and no response at the command line.
My input file has ~6k rows of data that looks like this:
date,source,count,cost
2013-01-01,by,36,0
2013-01-01,by,42,1.37
2013-01-02,by,7,0.12
2013-01-03,by,11,4.62
What I'd like is for a new CSV file to be created containing all of the rows for a particular date. What am I overlooking?
I've resolved this. Following the logic of this thread, I checked my line endings with the file command and learned that the file had the old-style Mac line terminators. I opened my input CSV file with Text Wrangler and saved it again with Unix style line endings. Once I did that, the awk command listed above worked as expected. It took ~5 seconds to create 63 new CSV files broken out by date.
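Classic Mac files use a bare carriage return (\r) as the line terminator, which awk reads as one long record. As a command-line alternative to re-saving the file in a text editor, tr can convert the endings (file names here are illustrative):

```shell
# Build a file with classic Mac (CR-only) line endings, then convert
printf 'date,source,count,cost\r2013-01-01,by,36,0\r' > file.csv

# Replace every carriage return with a newline
tr '\r' '\n' < file.csv > file_unix.csv

cat file_unix.csv
```

For Windows-style CRLF endings you would instead delete the \r characters, e.g. with `tr -d '\r'`.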
To retrieve information from a log file with ";" as the separator, I use:
grep "END SESSION" filename.log | cut -d";" -f2
where
-d, --delimiter=DELIM use DELIM instead of TAB for field delimiter
-f, --fields=LIST select only these fields; also print any line
that contains no delimiter character, unless
the -s option is specified
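The grep-plus-cut pipeline can also be collapsed into a single awk call; the log line layout below is assumed for the demo:

```shell
# Assumed sample log layout: timestamp;message;user
printf '%s\n' '2021-03-01 10:00:00;END SESSION;user42' > filename.log

# Equivalent to: grep "END SESSION" filename.log | cut -d";" -f2
awk -F';' '/END SESSION/ {print $2}' filename.log
```

The /END SESSION/ pattern does the filtering grep did, and -F';' plus $2 does the field extraction cut did, in one process instead of two.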
I want to parse a CSV file in a shell script, passing the name of the file at the prompt, like:
somescript.sh filename
Can it be done?
Also, I will read user input to display a particular value from the CSV.
For example, say the csv file has 10 values in each line:
1,2,3,4,5,6,7,8,9,10
And I want to read the 5th value. How can I do it?
And multiple lines are involved.
Thanks.
If your file is really in such a simple format (just commas, no spaces), then cut -d, -f5 would do the trick.
#!/bin/sh
awk -F, "NR==$2{print \$$3}" "$1"
Usage:
./test.sh FILENAME LINE_NUMBER FIELD_NUMBER
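The script above expands $2 and $3 inside the double-quoted awk program, which breaks if those arguments contain characters awk interprets. An alternative sketch passes them in with -v instead (shown with hard-coded values standing in for the script arguments):

```shell
# Sample data matching the question's layout
printf '%s\n' '1,2,3,4,5,6,7,8,9,10' > data.csv

# line and field would come from the script arguments ($2 and $3);
# -v keeps them out of the awk program text entirely
awk -F, -v line=1 -v field=5 'NR == line { print $field }' data.csv
# → 5
```

With -v the awk program can be single-quoted, so no backslash-escaping of $ is needed and argument contents are never parsed as awk code.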