AWK remove blank lines and append empty columns to all csv files in the directory - bash
Hi, I am looking for a way to combine all of the steps below into one operation.
1. Remove blank lines in the csv file (comma delimited)
2. Add multiple empty columns to each line, up to the 100th column
3. Perform actions 1 and 2 on all the files in the folder
I am still learning and this is the best I could get:
awk '!/^[[:space:]]*$/' x.csv > tmp && mv tmp x.csv
awk -F"," '($100="")1' OFS="," x.csv > tmp && mv tmp x.csv
They work individually, but I don't know how to put them together, and I am looking for a way to have them run through all the files under the directory.
Looking for concrete AWK code or shell script calling AWK.
Thank you!
An example input would be:
a,b,c
x,y,z
Expected output would be:
a,b,c,,,,,,,,,,
x,y,z,,,,,,,,,,
You can combine them in one script, without any loops:
$ awk 'BEGIN{FS=OFS=","} FNR==1{close(f); f=FILENAME".updated"} NF{$100=""; print > f}' files...
It won't overwrite the original files; each result is written to FILENAME.updated instead.
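If, after checking the .updated copies, you do want them to replace the originals, a small rename loop (a sketch; it assumes the .updated suffix used above) would be:

```shell
# Replace each original CSV with its ".updated" counterpart,
# but only where an updated copy actually exists.
for f in *.csv; do
  [ -f "$f.updated" ] && mv -- "$f.updated" "$f"
done
```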
You can pipe the output of the first to the other:
awk '!/^[[:space:]]*$/' x.csv | awk -F"," '($100="")1' OFS="," > new_x.csv
If you wanted to run the above on all the files in your directory, you would do:
shopt -s nullglob
for f in yourdirectory/*.csv; do
awk '!/^[[:space:]]*$/' "$f" | awk -F"," '($100="")1' OFS="," > "$(dirname "$f")/new_$(basename "$f")"
done
The shopt -s nullglob is so that a directory with no matching files gives you nothing instead of the literal pattern yourdirectory/*.csv. (Advice borrowed from a good source on looping through files.)
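For what it's worth, the two filters from the question can also be merged into a single awk call per file, so each file is only read once; a sketch, using the same tmp-file dance as the question (note that $100="" pads short records but, unlike NF=100, does not truncate records that already have more than 100 fields):

```shell
# Drop blank/whitespace-only lines and pad each record out to
# 100 fields in one pass per file.
for f in yourdirectory/*.csv; do
  awk 'BEGIN{FS=OFS=","} !/^[[:space:]]*$/ {$100=""; print}' "$f" > tmp &&
    mv tmp "$f"
done
```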
With recent enough GNU awk you could:
$ gawk -i inplace 'BEGIN{FS=OFS=","}/\S/{NF=100;$1=$1;print}' *
Explained:
$ gawk -i inplace ' # using GNU awk and in-place file editing
BEGIN {
FS=OFS="," # set delimiters to a comma
}
/\S/ { # select lines containing at least one non-whitespace character (\S is a GNU awk regex escape)
NF=100 # set the field count to 100, padding short records with empty fields and truncating any extras
$1=$1 # edit the first field to rebuild the record to actually get the extra commas
print # output records
}' *
Some test data (between the visible rows there are two blank-looking records that the listing can't show: one is completely empty, the other contains only a space and a tab):
$ cat file
1,2,3
1,2,3,4,5,6,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
Output of cat file after the execution of the GNU awk program:
1,2,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
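If your awk is not GNU awk (so no -i inplace), a portable sketch of the same edit via a temporary file per input (here $100="" pads shorter records, but unlike NF=100 it does not truncate records that are already longer):

```shell
# Same transformation without gawk's in-place mode: write to a temp
# file and move it over the original only if awk succeeded.
for f in *.csv; do
  awk 'BEGIN{FS=OFS=","} /[^[:space:]]/ {$100=""; print}' "$f" > "$f.tmp" &&
    mv -- "$f.tmp" "$f"
done
```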
Related
Extracting lines from 2 files using AWK just return the last match
I'm a bit new to AWK, and I'm trying to print the lines of file2 whose first field appears in file1. I copied examples that I found here exactly, but I don't know why it's printing only the last match.
File1:
58000
72518
94850
File2:
58000;123;abc
69982;456;rty
94000;576;ryt
94850;234;wer
84850;576;cvb
72518;345;ert
Expected result:
58000;123;abc
94850;234;wer
72518;345;ert
What I'm getting:
94850;234;wer
My command:
awk -F';' 'NR==FNR{a[$1]++; next} $1 in a' file1 file2
What am I doing wrong?
awk (while usable here) isn't the correct tool for the job; grep with the -f option is. The -f file option reads patterns from file, one per line, and searches the input file for matches. So in your case you want:
$ grep -f file1 file2
58000;123;abc
94850;234;wer
72518;345;ert
(Note: I removed the trailing '\' from the data file; replace it if it wasn't a typo.)
Using awk
If you did want to rewrite what grep is doing using awk, that is fairly simple. Just read the contents of file1 into an array, and then, when processing records from the second file, check whether field 1 is in the array; if so, print the record (the default action), e.g.:
$ awk -F';' 'FNR==NR {a[$1]=1; next} $1 in a' file1 file2
58000;123;abc
94850;234;wer
72518;345;ert
(Same note about the trailing slash.)
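One caveat with the plain grep -f approach: each line of file1 is treated as an unanchored regular expression, so a key like 58000 would also match a line such as x58000y;9;zz. A sketch that anchors each key to the start of the line and its ; delimiter (it assumes a grep that accepts - for stdin, as GNU and BSD grep do):

```shell
# Turn each key K into the anchored regex ^K; before grep sees it,
# so keys can only match the first field.
sed 's/.*/^&;/' file1 | grep -f - file2
```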
Thanks @RavinderSingh13! file1 really did have some hidden characters, and I could see them using cat:
$ cat -v file1
58000^M
72518^M
94850^M
I removed them using sed -e "s/\r//g" file1 and the AWK worked perfectly.
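Those ^M characters are Windows carriage returns (CRLF line endings). For the record, awk itself can strip them too; a sketch:

```shell
# Delete a trailing carriage return from each line, if present;
# the bare 1 is awk shorthand for "print the (possibly edited) line".
awk '{sub(/\r$/, "")} 1' file1 > tmp && mv tmp file1
```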
Command to remove all but select columns for each file in unix directory
I have a directory with many files in it and want to edit each file so that it contains only a select few columns. I have the following code, which prints the first column:
for i in /directory_path/*.txt; do awk -F "\t" '{ print $1 }' "$i"; done
But if I try to edit each file by redirecting back to "$i" as below, I lose all the information in my files:
for i in /directory_path/*.txt; do awk -F "\t" '{ print $1 }' "$i" > "$i"; done
I want to be able to remove all but a select few columns in each file, for example 1 and 3.
Given:
$ cat file
1 2 3
4 5 6
You can do in-place editing with sed:
sed -i.bak -E 's/^([^[:space:]]*).*/\1/' file
$ cat file
1
4
If you want the freedom to work with multiple columns and have in-place editing, use GNU awk, which supports in-place editing:
gawk -i inplace '{print $1, $3}' file
$ cat file
1 3
4 6
If you only have POSIX awk, or want to use cut, you generally do this:
1. Modify the file with awk, cut, sed, etc.
2. Redirect the output to a temp file.
3. Rename the temp file back to the original file name.
Like so:
awk '{print $1, $3}' file >tmp_file; mv tmp_file file
Or with cut:
cut -d ' ' -f 1,3 file >tmp_file; mv tmp_file file
To do a loop on files in a directory, you would do:
for fn in /directory_path/*.txt; do
  awk -F '\t' '{ print $1 }' "$fn" >tmp_file
  mv tmp_file "$fn"
done
Just to add a little more to @dawg's perfectly well-working answer, for my use case: I was dealing with CSVs, and a standard CSV can have , inside a value as long as the value is double-quoted. For example, the second row below is a valid CSV row:
col1,col2,col3
1,abc,"abc, inc"
But the command above was treating the , between the double quotes as a delimiter too, and the output field delimiter wasn't specified. These are the modifications I had to make to handle those two problems (FPAT must be in effect before records are read, so it goes in a BEGIN block):
for fn in /home/ubuntu/dir/*.csv; do
  awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "," } { print $1, $2 }' "$fn" >tmp_file
  mv tmp_file "$fn"
done
The OFS delimiter will be the delimiter of the output/result file. FPAT handles the case of , between quotation marks; the regex and more information are in awk's official documentation, section 4.7 Defining Fields by Content. I was led to that solution through this answer.
Using awk to extract specific line from all text files in a directory
I have a folder with 50 text files and I want to extract the first line from each of them at the command line, outputting the result to a result.txt file. I'm using the following command within the directory that contains the files I'm working with:
for files in *; do awk '{if(NR==1) print NR, $0}' *.txt; done > result.txt
When I run the command, the result.txt file contains 50 lines, but they're all from a single file in the directory rather than one line per file. The command appears to be looping over a single file 50 times rather than over each of the 50 files. I'd be grateful if someone could help me understand where I'm going wrong with this.
Try this:
for i in *.txt; do head -1 "$i"; done > result.txt
OR
for files in *.txt; do awk 'NR==1 {print $0}' "$files"; done > result.txt
Your code has two problems:
1. You have an outer loop that iterates over *, but your loop body doesn't use $files. That is, you're invoking awk '...' *.txt 50 times. This is why any output from awk is repeated 50 times in result.txt.
2. Your awk code checks NR (the number of lines read so far), not FNR (the number of lines read within the current file). NR==1 is true only at the beginning of the very first file.
There's another problem: result.txt is created first, so it is included among *.txt. To avoid this, give it a different name (one that doesn't end in .txt) or put it in a different directory.
A possible fix:
awk 'FNR==1 {print NR, $0}' *.txt > result
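The NR/FNR distinction is easy to see with a quick experiment (hypothetical two-line files a and b):

```shell
# NR counts records across all inputs; FNR restarts at 1 per file.
printf '1\n2\n' > a
printf '3\n4\n' > b
awk '{print FILENAME, NR, FNR}' a b
# a 1 1
# a 2 2
# b 3 1
# b 4 2
```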
Why not use head? For example with find:
find midir/ -type f -exec head -1 {} \; >> result.txt
If you want to follow your approach, you need to pass the loop variable to awk instead of the wildcard:
for files in *; do awk '{if(NR==1) print NR, $0}' "$files"; done > result.txt
Shell script copying all columns of text file instead of specified ones
I'm trying to copy 3 columns from one text file and paste them into a new text file. However, whenever I execute this script, all of the columns in the original text file get copied. Here is the code I used:
cut -f 1,2,6 PROFILES.1.0.profile > compiledfile.txt
paste compiledfile.txt > myNewFile
Any suggestions as to what I'm doing wrong? Also, is there a simpler way to do this? Thanks!
Let's suppose that the input is comma-separated:
$ cat File
1,2,3,4,5,6,7
a,b,c,d,e,f,g
We can extract columns 1, 2, and 6 using cut:
$ cut -d, -f 1,2,6 File
1,2,6
a,b,f
Note the use of the option -d, to specify that the column separator is a comma. By default, cut uses a tab as the column separator; if the separator in your file is anything else, you must use the -d option.
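To see cut's default tab delimiter in action, a quick sketch:

```shell
# With no -d option, cut splits on tab characters.
printf '1\t2\t3\n' | cut -f 1,3   # prints fields 1 and 3, tab-separated
```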
Using awk:
awk -v FS=your_delimiter_here -v OFS=your_delimiter_here '{print $1,$2,$6}' PROFILES.1.0.profile > compiledfile.txt
should do it. For comma-separated fields the solution would be:
awk -v FS=, -v OFS=, '{print $1,$2,$6}' PROFILES.1.0.profile > compiledfile.txt
FS is an awk built-in variable which stands for field separator; similarly, OFS stands for output field separator. The handy -v option with awk helps you assign a value to a variable.
You could use awk to do this:
awk -F "delimiter" '{
    print $1, $2, $3   # where $1, $2, and so on are column numbers
}' filename > newfile
How to copy a .c file to a numbered listing
I simply want to copy my .c file into a line-numbered listing file. Basically generate a .prn file from my .c file. I'm having a hard time finding the right bash command to do so.
Do you mean nl? nl -ba filename.c The -ba means to number all lines, not just non-empty ones.
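The difference -ba makes shows up as soon as the file contains a blank line; a quick sketch with a hypothetical file:

```shell
# Without -ba, nl leaves blank lines unnumbered; with -ba every
# line gets a number.
printf 'a\n\nb\n' > t.c
nl t.c       # numbers lines 1 and 2; the blank line gets no number
nl -ba t.c   # numbers lines 1, 2, and 3
```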
awk '{print FNR ":" $0}' file1 file2 ... is one way. FNR is the current record (line) number within the current file; it resets to 1 for each input file. You can change the ":" per your needs. $0 means the whole input line. Or you can do cat -n file1 file2 .... IHTH
On my Linux system, I occasionally use pr -tn to prefix line numbers for listings. The -t option suppresses headers and footers; -n says to prefix line numbers, and allows optional format and digit specifiers (see the man page). Anyhow, to print file xyz.c to xyz.prn with line numbering, use:
pr -tn xyz.c > xyz.prn
Note, this is not as compact and handy as cat -n xyz.c > xyz.prn (using cat -n as suggested in a previous answer), but pr has numerous other options, and I most often use it when I want to both number the lines and put them into multiple columns, or print multiple files side by side. E.g., for a 2-column numbered listing use:
pr -2 -tn xyz.c > xyz.prn
I think shellter has the right idea. However, if you require output written to files with .prn extensions, here's one way:
awk '{ sub(/\.c$/, "", FILENAME); print FNR ":" $0 > (FILENAME ".prn") }' file1.c file2.c ...
To perform this on all files in the present working directory:
for i in *.c; do awk '{ sub(/\.c$/, "", FILENAME); print FNR ":" $0 > (FILENAME ".prn") }' "$i"; done