Delimit output created using a grep command - shell

I want to run a line of code to create a Text to columns output on a file which is comma delimited.
I have created a file which results in the output been created in the format comma delimited. So all the data is in column A separated by a ,.
The file is apilist.xls
I want to add a line of code that will take the data here and separate out the data in to the columns, similar to the excel command text to columns.
I have tried:
$ cat apilist.xls | tr "\\t" "," > apilist.csv
This gives the message:
bash: $: command not found
It creates the apilist.csv file but there is no data in this.
Any suggestions on how to progress this?
$ cat apilist.xls | tr "\\t" "," > apilist.csv
Delimited output - move the data from one column to be separated into multiple columns.

You would need to convert your .xls file to .csv before you're able to work on it. Here are a number of methods to convert the file: https://linoxide.com/linux-how-to/methods-convert-xlsx-format-files-csv-linux-cli/
You could then use awk to create columns between the data where it is separated by a comma. This is assuming that you have three columns separated by a comma - you would need to edit the below command to meet your needs:
awk -F ',' '{printf("%s %s %s\n", $1, $2, $3)}' input_file

Related

Cut string of text after a character from a column each line of a csv, keeping the other columns, and printing to a new file

I have a CSV file with a first column that reads:
/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/BER5_OSSD_F008071.csv.0.01.out.csv
Followed by additional columns listing counts pulled from other CSV files.
What I want is to remove "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" from each line without affecting any other part of the file.
I've tried using sed, grep, and cut, but that only seems to print the output in the terminal or a new file only containing that part of the line, and not the rest of the columns. Can I remove the "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" and keep everything else the same?
You can use awk to get this job done.
Please see below code which will replace the contents/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/ with "" empty and will update the operation in the same file with the option inplace
yourfile.csv is the input file.
awk -i inplace '{sub(/\/Users\/swilki\/Desktop\/Africa_OSSD\/OSSD_Output\//,"")}1' yourfile.csv
The above will remove the "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" and keep everything else same.
Output of yourfile.csv:
BER5_OSSD_F008071.csv.0.01.out.csv
Option 2, If you want to print in a new file:
Below code will be give the replaced contents in the new file your_newfile.csv
awk '{sub(/\/Users\/swilki\/Desktop\/Africa_OSSD\/OSSD_Output\//,"")}1' yourfile.csv >your_newfile.csv

How to transform a csv file having multiple delimiters using awk

Below is a sample data. Please note this operation is required to be done on files with millions of records hence I need the optimal method. Essentially we are looking to update 2nd column with concatenation of first two characters from 4th column and excluding first 3 fields ('_' delimited) of 2nd column.
I have been trying using cut and reading the file line by line which is very time consuming. I need something with awk something like
awk -F, '{print $1","substr($4,1,2)"_"cut -f4-6 -d'_'($2)","$3","$4","$5","$6}'
Input Data:
234234234,123_33_3_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,123_11_2_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,123_33_3_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,123_33_3_11111_qewf_mkhsdf,01,09_68645,43234532,2
Output is required as:
234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2
You can use awk and printf for line re-formating
awk -F"[,_]" '{
printf "%s,%s_%s_%s_%s,%s,%s_%s,%s,%s\n", $1,$9,$5,$6,$7,$8,$9,$10,$11,$12
}' file
you get,
234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2

Bash extract parts from string and create csv

I have a document with 1+ million of the following strings and I like to create some new structures byextract some parts and create a csv file for it, what's the quickest way to do this?
document/0006-291X(85)91157-X
I would like to have a file with on each line the original string and the extracted parts
document/0006-291X(85)91157-X;0006-291X;85
You can try this one-liner awk:
awk -F "[/()]" -v OFS=';' '{print $0,$(NF-2),$(NF-1)}' your-file
It parses the fields of each line with taking /,(,) as delimiters. Then it prints out the whole line, the 3rd field and the second field starting from the end of the line. The option -v OFS=';' prints semicolumns as output field separator.

Using awk to split CSV file by column

I have a CSV file that I need to split by date. I've tried using the AWK code listed below (found elsewhere).
awk -F"," 'NR>1 {print $0 >> ($1 ".csv"); close($1 ".csv")}' file.csv
I've tried running this within terminal in both OS X and Debian. In both cases there's no error message (so the code seems to run properly), but there's also no output. No output files, and no response at the command line.
My input file has ~6k rows of data that looks like this:
date,source,count,cost
2013-01-01,by,36,0
2013-01-01,by,42,1.37
2013-01-02,by,7,0.12
2013-01-03,by,11,4.62
What I'd like is for a new CSV file to be created containing all of the rows for a particular date. What am I overlooking?
I've resolved this. Following the logic of this thread, I checked my line endings with the file command and learned that the file had the old-style Mac line terminators. I opened my input CSV file with Text Wrangler and saved it again with Unix style line endings. Once I did that, the awk command listed above worked as expected. It took ~5 seconds to create 63 new CSV files broken out by date.
For retrieve information in a log file with ";" separator I use:
grep "END SESSION" filename.log | cut -d";" -f2
where
-d, --delimiter=DELIM use DELIM instead of TAB for field delimiter
-f, --fields=LIST select only these fields; also print any line
that contains no delimiter character, unless
the -s option is specified

code to identify values in comma separated value file

i want to parse a csv file in a shell script. I want to input the name of the file at the prompt. like :
somescript.sh filename
Can it be done?
Also, I will read user input to display a particular a particular data number in the csv.
For example, say the csv file has 10 values in each line:
1,2,3,4,5,6,7,8,9,10
And I want to read the 5th value. how can I do it?
And multiple lines are involved.
Thanks.
If your file is really in such a simple format (just commas, no spaces), then cut -d, -f5 would do the trick.
#!/bin/sh
awk -F, "NR==$2{print \$$3}" "$1"
Usage:
./test.sh FILENAME LINE_NUMBER FIELD_NUMBER

Resources