I have downloaded this CSV file: https://www.nhsbsa.nhs.uk/sites/default/files/2020-05/Dispensing%20Data%20Jan%2020%20-%20CSV.csv
and am trying to add a column onto the end with a value of "null" for every row.
I have tried using this awk command:
awk 'BEGIN{FS=OFS=","}{print $0 OFS "null"}' ogfile.csv > newfile.csv
but it appears to be adding a new row after every row, with "null" in the second field of each new row: new rows instead of a new column.
Can anyone help me understand why this is happening?
Your source file has DOS/Windows line endings. This is a good first thing to check whenever you see anomalous output like this: the trailing \r stays inside $0, so awk writes data\r,null, and any tool that treats the stray \r as a line break displays ,null as a row of its own. Two solutions:
Use a utility such as dos2unix to remove the unwanted \r character from your input file. dos2unix is available on most distributions.
or,
Modify your awk command to recognize and remove the offending characters:
awk 'BEGIN{RS="\r\n"; FS=OFS=","}{print $0 OFS "null"}' ogfile.csv
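Note that a multi-character RS is a GNU awk feature. If that's a concern, a portable sketch of the same fix is to strip the trailing \r from each line before appending the new column:
awk 'BEGIN{FS=OFS=","} {sub(/\r$/,""); print $0 OFS "null"}' ogfile.csv > newfile.csv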
I have a CSV file with a first column that reads:
/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/BER5_OSSD_F008071.csv.0.01.out.csv
Followed by additional columns listing counts pulled from other CSV files.
What I want is to remove "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" from each line without affecting any other part of the file.
I've tried using sed, grep, and cut, but my attempts either print to the terminal or produce a new file containing only that part of the line, without the rest of the columns. Can I remove the "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" and keep everything else the same?
You can use awk to get this job done.
The code below replaces /Users/swilki/Desktop/Africa_OSSD/OSSD_Output/ with the empty string "" and updates the same file via GNU awk's -i inplace option.
yourfile.csv is the input file.
awk -i inplace '{sub(/\/Users\/swilki\/Desktop\/Africa_OSSD\/OSSD_Output\//,"")}1' yourfile.csv
The above will remove the "/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/" and keep everything else the same.
First column of yourfile.csv afterwards (the remaining columns are untouched):
BER5_OSSD_F008071.csv.0.01.out.csv
Option 2, if you want to print to a new file:
The code below writes the replaced contents to a new file, your_newfile.csv:
awk '{sub(/\/Users\/swilki\/Desktop\/Africa_OSSD\/OSSD_Output\//,"")}1' yourfile.csv >your_newfile.csv
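If GNU awk's inplace extension isn't available, a sed sketch does the same job; using | as the s-command delimiter avoids escaping every slash in the path (with GNU sed, add -i to edit the file in place):
sed 's|^/Users/swilki/Desktop/Africa_OSSD/OSSD_Output/||' yourfile.csv > your_newfile.csv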
I have a text file with 826,838 lines that looks like the sample below (sorry, couldn't get the image uploader to work).
I'm using sed (sed -n '2p;$p') to print the second and last line but can't figure out how to put the lines in range format.
Current output:
1 3008.00 7380.00 497724.00 3158482.00 497724.00 3158482.00
826838 4744.00 7409.00 480729.00 3207718.00 480729.00 3207718.00
Desired output:
1-826838 3008.00-4744.00 7380.00-7409.00 497724.00-480729.00 3158482.00-3207718.00 497724.00-480729.00 3158482.00-3207718.00
Thank you for your help!
This might work for you (GNU sed):
sed -r '2H;$!d;H;x;:a;s/\n\s*(\S+)\s*(.*\n)\s*(\S+\s*)/\1-\3\n\2/;ta;P;d' file
Store line 2 and the last line in the hold space (HS). Following the last line, swap to the HS and then repeatedly move the first fields of the second and third lines to the first line. Finally print the first line only.
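For readability, here is the same script spelled out with one command per line and comments (GNU sed accepts comments on their own lines; the behaviour is unchanged):
sed -r '
  # save line 2 in the hold space
  2H
  # discard everything else until the last line
  $!d
  # append the last line, then swap the saved pair into the pattern space
  H
  x
  :a
  # move the leading field of each saved line up, joining the pair with "-"
  s/\n\s*(\S+)\s*(.*\n)\s*(\S+\s*)/\1-\3\n\2/
  ta
  # print only the first (combined) line
  P
  d
' file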
With a single awk expression (it grabs the needed lines and builds the ranges):
awk 'NR==2{ split($0,a) }END{ for(i=1;i<=NF;i++) printf("%s\t",a[i]"-"$i); print "" }' file
The output:
1-826838 3008.00-4744.00 7380.00-7409.00 497724.00-480729.00 3158482.00-3207718.00 497724.00-480729.00 3158482.00-3207718.00
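Alternatively, since sed -n '2p;$p' already extracts the two lines, a sketch is to pipe them into a small awk that pairs the fields up (assuming both lines have the same number of fields):
sed -n '2p;$p' file | awk 'NR==1{split($0,a)} NR==2{for(i=1;i<=NF;i++) printf("%s-%s%s", a[i], $i, (i<NF ? "\t" : "\n"))}'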
I have a csv file where some rows have an empty first field, and some rows have content in the first field. The rows with content in the first field are header rows.
I would like to remove every unnecessary header row. The best way I can see of doing this is by deleting every row for which:
First field is not empty
First field in the following row is not empty
I do not necessarily need to keep the data in the same file, so I can see this being possible using grep, awk, or sed, but none of my attempts have come close to working.
Example input:
header1,value1,etc
,value2,etc
header2,value3,etc
header3,value4,etc
,value5,etc
Desired output:
header1,value1,etc
,value2,etc
header3,value4,etc
,value5,etc
Since the header2 line is not followed by a line with an empty field 1, it is an unnecessary header row.
awk -F, '$1{h=$0;next}h{print h;h=""}1' file
-F,: Use comma as a field separator
$1{h=$0;next}: If the first field has data (anything other than empty or 0; see the note below), save the line and go on to the next line.
h{print h;h=""}: If there is a saved header line, print it and forget it. (This can only execute when $1 is empty, because of the next above.)
1: Print the current line.
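One caveat from the parenthetical above: the bare $1 test treats a first field of 0 as empty. If a header could legitimately be 0, a sketch that tests for the empty string explicitly:
awk -F, '$1!=""{h=$0;next}h{print h;h=""}1' file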
These kinds of tasks are often conceptually easier if you reverse the file and check whether the previous line was a header:
tac file |
awk -F, '$1 && have_header {next} {print; have_header = length($1)}' |
tac
I have a document with over a million of the following strings, and I would like to create some new structures by extracting some parts of each one into a CSV file. What's the quickest way to do this?
document/0006-291X(85)91157-X
I would like to have a file with on each line the original string and the extracted parts
document/0006-291X(85)91157-X;0006-291X;85
You can try this one-liner awk:
awk -F "[/()]" -v OFS=';' '{print $0,$(NF-2),$(NF-1)}' your-file
It splits each line into fields, taking /, (, and ) as delimiters. It then prints the whole line, the third field from the end ($(NF-2), here 0006-291X), and the second field from the end ($(NF-1), here 85). The option -v OFS=';' sets a semicolon as the output field separator.
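If you prefer sed, a sketch of an equivalent, assuming every line has the shape prefix/part(part)rest, captures the two pieces and appends them after the original string:
sed -E 's|^.*/([^(]+)\(([^)]+)\).*$|&;\1;\2|' your-file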
I have a CSV file that I need to split by date. I've tried using the AWK code listed below (found elsewhere).
awk -F"," 'NR>1 {print $0 >> ($1 ".csv"); close($1 ".csv")}' file.csv
I've tried running this within terminal in both OS X and Debian. In both cases there's no error message (so the code seems to run properly), but there's also no output. No output files, and no response at the command line.
My input file has ~6k rows of data that looks like this:
date,source,count,cost
2013-01-01,by,36,0
2013-01-01,by,42,1.37
2013-01-02,by,7,0.12
2013-01-03,by,11,4.62
What I'd like is for a new CSV file to be created containing all of the rows for a particular date. What am I overlooking?
I've resolved this. Following the logic of this thread, I checked my line endings with the file command and learned that the file had the old-style Mac line terminators. I opened my input CSV file with Text Wrangler and saved it again with Unix style line endings. Once I did that, the awk command listed above worked as expected. It took ~5 seconds to create 63 new CSV files broken out by date.
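For reference, classic Mac line endings are a lone \r, so awk (whose default record separator is \n) saw the whole file as a single record: NR>1 was never true, hence no output and no error. A command-line fix, if you'd rather not round-trip through an editor, is to translate the \r characters to newlines first; a sketch:
tr '\r' '\n' < file.csv > file-unix.csv
awk -F"," 'NR>1 {print $0 >> ($1 ".csv"); close($1 ".csv")}' file-unix.csv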
To retrieve information from a log file that uses ";" as a separator, I use:
grep "END SESSION" filename.log | cut -d";" -f2
where
-d, --delimiter=DELIM use DELIM instead of TAB for field delimiter
-f, --fields=LIST select only these fields; also print any line
that contains no delimiter character, unless
the -s option is specified
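For what it's worth, the grep and cut pair can also be collapsed into a single awk process; a sketch, where the log line shown is hypothetical:
# given a hypothetical log line:  2020-01-31 10:23:01;user42;END SESSION
awk -F";" '/END SESSION/ {print $2}' filename.log
# prints: user42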