Awk only works on final file - bash
I am attempting to process many .csv files using the following loop:
for i in *.csv
do
dos2unix $i
awk '{print $0, FILENAME}' $i>new_edit.csv
done
This script should append the file name to the end of each line, and it works. However, looking at the output, new_edit.csv only contains data from one of the .csv files entered.
wc -l new_edit.csv
Indicates that my awk is only processing lines from one of my csv files. How can I make my awk process every file?
Instead of using > you should use >> as the appending redirection operator. You could also replace the whole loop with the one-liner below, where sub(/\r/,"",$NF) strips the trailing carriage return from each line and so replaces the separate dos2unix step:
$ awk '{sub(/\r/,"",$NF); print $0, FILENAME}' *.csv > new_edit.csv
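One thing to watch with that one-liner (a side note, not part of the original answer): the output file new_edit.csv itself ends in .csv, so running the command a second time would read the previous output back in as input. Writing to a name outside the glob avoids that; new_edit.out here is just an illustrative name:
awk '{sub(/\r/,"",$NF); print $0, FILENAME}' *.csv > new_edit.out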
Since you were using the redirect operator >, which overwrites the content of the file on every iteration, only the last file's output survived. If you replace it with the append redirect operator >>, it will process all the files and append their content to the new file. The following program should help you:
#!/bin/bash
for i in *.csv
do
awk '{print $0, FILENAME}' "$i" >> new_edit.csv
done
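One caveat worth adding (a sketch, not part of the original answer): because >> appends, re-running the script keeps adding duplicate lines to new_edit.csv. Truncating the output once up front avoids that:
#!/bin/bash
# empty the output once, then append inside the loop; the dos2unix step from
# the question is kept here
: > new_edit.csv
for i in *.csv
do
dos2unix "$i"
awk '{print $0, FILENAME}' "$i" >> new_edit.csv
done
# new_edit.csv itself matches *.csv, but it is empty when the glob expands,
# so it contributes nothing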
Related
AWK remove blank lines and append empty columns to all csv files in the directory
Hi, I am looking for a way to combine all the below commands together:
1. Remove blank lines in the csv file (comma delimited)
2. Add multiple empty columns to each line, up to the 100th column
3. Perform actions 1 & 2 on all the files in the folder
I am still learning and this is the best I could get:
awk '!/^[[:space:]]*$/' x.csv > tmp && mv tmp x.csv
awk -F"," '($100="")1' OFS="," x.csv > tmp && mv tmp x.csv
They work individually, but I don't know how to put them together, and I am looking for a way to run them on all the files under the directory. Looking for concrete AWK code or a shell script calling AWK. Thank you!
An example input would be:
a,b,c
x,y,z
Expected output would be:
a,b,c,,,,,,,,,,
x,y,z,,,,,,,,,,
You can combine it in one script without any loops:
$ awk 'BEGIN{FS=OFS=","} FNR==1{close(f); f=FILENAME".updated"} NF{$100=""; print > f}' files...
It won't overwrite the original files.
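A quick usage sketch (the file names are made up for illustration): run it over the directory's CSVs and each input gets an .updated sibling. The NF pattern skips empty lines, and close(f) keeps the number of simultaneously open files down:
awk 'BEGIN{FS=OFS=","} FNR==1{close(f); f=FILENAME".updated"} NF{$100=""; print > f}' *.csv
# jan.csv -> jan.csv.updated
# feb.csv -> feb.csv.updated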
You can pipe the output of the first to the other:
awk '!/^[[:space:]]*$/' x.csv | awk -F"," '($100="")1' OFS="," > new_x.csv
If you wanted to run the above on all the files in your directory, you would do:
shopt -s nullglob
for f in yourdirectory/*.csv; do
awk '!/^[[:space:]]*$/' "${f}" | awk -F"," '($100="")1' OFS="," > "${f%/*}/new_${f##*/}"
done
The shopt -s nullglob is so that an empty directory won't give you a literal *. Quoted from a good source about looping through files.
With recent enough GNU awk you could:
$ gawk -i inplace 'BEGIN{FS=OFS=","}/\S/{NF=100;$1=$1;print}' *
Explained:
$ gawk -i inplace '   # using GNU awk and in-place file editing
BEGIN {
    FS=OFS=","        # set delimiters to a comma
}
/\S/ {                # gawk specific regex operator that matches any character that is not a space
    NF=100            # set the field count to 100 which truncates fields above it
    $1=$1             # edit the first field to rebuild the record to actually get the extra commas
    print             # output records
}' *
Some test data (the first empty record is empty, the second empty record has a space and a tab, trust me bro):
$ cat file
1,2,3
1,2,3,4,5,6,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
Output of cat file after the execution of the GNU awk program:
1,2,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
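Since -i inplace rewrites the input files directly, you may want backups while testing. If I remember the gawk documentation correctly (worth verifying for your gawk version), the inplace extension honors an INPLACE_SUFFIX variable for exactly this:
$ gawk -i inplace -v INPLACE_SUFFIX=.bak 'BEGIN{FS=OFS=","}/\S/{NF=100;$1=$1;print}' *
# each edited file should then be accompanied by a .bak copy of its original contents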
How to combine awk and sed in while read line from text file to pull parts and rearrange the output
I have text files that have a source path + filename and the destination path. What I need is to pull the destination path, then add just the filename from the line, then add a system command to it. I am nesting a while loop within a for loop to crawl through a directory of text files to first stage files, then get the hash using digest, then write the results to a text file. Each line in the text file looks like this:
/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/
I can get the destination path or the file name, but it is giving me fits trying to get them together. I need them combined into /folder/folder/folder/xxxxx/file.jpg. Then I need to add a stage command: stage /folder/folder/folder/xxxxx/file.jpg
This gets the path:
for file in `ls 10*.txt`; do cat $file | awk '{print $2}'; done
And this gets the file name:
for file in `ls 10*.txt`; do TIF=`cat $file | awk '{print $6}' FS=/`; echo $TIF; done
But when I try to combine them using awk, sed, cut or anything else I can Google, it only pulls the first one in the statement.
Assuming that your input file has tab separated fields and there are no space chars in any of your file/path data, try this:
echo "/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/" \
| awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}'
output
stage /folder/folder/folder/xxxxx/file.jpg
This will then work with
awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file
Review the output to be sure all files will be processed correctly. If so, you can then pass the output to bash and all files will be processed (staged?), i.e.
awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file | bash
IHTH
You can use sed with the delimiter #. First match the last word (a string without a slash) before the whitespace; it will be stored in \1. Store the path (after the whitespace) in \2.
echo '/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/' | sed -r 's#.*/([^/]*)\s+(.*)#stage \2/\1#'
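To tie either answer back to the nested loop described in the question, here is a minimal sketch (assuming tab-separated columns, input files matching 10*.txt as in the question, and that stage is the command you want run on each rebuilt path):
for file in 10*.txt
do
    awk -F'\t' '{n=split($1,p,"/"); print $2 p[n]}' "$file" |
    while read -r target
    do
        stage "$target"   # rebuilt destination path + filename
    done
done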
Using awk to extract specific line from all text files in a directory
I have a folder with 50 text files and I want to extract the first line from each of them at the command line and output this to a result.txt file. I'm using the following command within the directory that contains the files I'm working with:
for files in *; do awk '{if(NR==1) print NR, $0}' *.txt; done > result.txt
When I run the command, the result.txt file contains 50 lines, but they're all from a single file in the directory rather than one line per file. The command appears to be looping over a single file 50 times rather than over each of the 50 files. I'd be grateful if someone could help me understand where I'm going wrong with this.
Try this:
for i in *.txt; do head -1 "$i"; done > result.txt
OR
for i in *.txt; do awk 'NR==1 {print $0}' "$i"; done > result.txt
Your code has two problems:
You have an outer loop that iterates over *, but your loop body doesn't use $files. That is, you're invoking awk '...' *.txt 50 times. This is why any output from awk is repeated 50 times in result.txt.
Your awk code checks NR (the number of lines read so far), not FNR (the number of lines read within the current file). NR==1 is true only at the beginning of the very first file.
There's another problem: result.txt is created first, so it is included among *.txt. To avoid this, give it a different name (one that doesn't end in .txt) or put it in a different directory.
A possible fix:
awk 'FNR==1 {print NR, $0}' *.txt > result
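A small illustration of the NR/FNR difference (a sketch with two hypothetical two-line files, a.txt and b.txt):
awk '{print FILENAME, "NR=" NR, "FNR=" FNR}' a.txt b.txt
# a.txt NR=1 FNR=1
# a.txt NR=2 FNR=2
# b.txt NR=3 FNR=1
# b.txt NR=4 FNR=2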
Why not use head? For example with find:
find midir/ -type f -exec head -1 {} \; >> result.txt
If you want to follow your approach, you need to specify the file and not use the wildcard with awk:
for files in *; do awk '{if(NR==1) print NR, $0}' "$files"; done > result.txt
awk overwriting files in a loop
I am trying to look through a set of files. There are 4-5 files for each month in a 2 year period with 1000+ stations in them. I am trying to separate them so that I have one file per station_no (station_no = $1). I thought this was easy and simply went with:
awk -F, '{ print > $1".txt" }' *.csv
which I've tested with one file and it works fine. However, when I run this it creates the .txt files, but there is nothing in the files. I've now tried to put it in a loop to see if that works:
#!/bin/bash
#program to extract stations from orig files
for file in $(ls *.csv)
do
awk -F, '{print > $1".txt" }' $file
done
It works as it loops through the files etc, but it keeps overwriting the files when it moves to the next month. How do I stop it overwriting and just add to the end of the .txt file with that name?
You are saying print > file, which truncates on every new call. Use >> instead, so that it appends to the previous content. Also, there is no need to loop through all the files and then call awk for each one. Instead, provide the set of files to awk like this:
awk -F, '{print >> ($1".txt")}' *.csv
Note, however, that we need to talk a little about how awk keeps files open for writing. If you say awk '{print > "hello.txt"}' file, awk will keep hello.txt open until it finishes processing. In your current approach, awk stops and restarts on every file; in my suggested approach, however, the output files stay open until the last input file is processed. Thus, in this case a single > suffices:
awk -F, '{print > $1".txt"}' *.csv
For the detail on ( file ), see the comments below by Ed Morton; I cannot explain it better than him :)
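A toy illustration (made-up data) of the open-files point: within a single awk run, > truncates a given output file only when it is first opened, and later prints to the same name keep going to the already-open file. (With 1000+ stations, note that GNU awk is documented to work around the per-process open-file limit; other awks may need an explicit close().)
printf '1,a\n2,b\n1,c\n' | awk -F, '{print > ($1".txt")}'
# 1.txt ends up with both "1,a" and "1,c"; 2.txt contains "2,b"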
How do I write an awk print command in a loop?
I would like to write a loop creating various output files with the first column of each input file, respectively. So I wrote
for i in $(\ls -d /home/*paired.isoforms.results)
do
awk -F"\t" {print $1}' $i > $i.transcript_ids.txt
done
As an example, if there were 5 files in the home directory named
A_paired.isoforms.results
B_paired.isoforms.results
C_paired.isoforms.results
D_paired.isoforms.results
E_paired.isoforms.results
I would like to print the first column of each of these files into a separate output file, i.e. I would like to have 5 output files called
A.transcript_ids.txt
B.transcript_ids.txt
C.transcript_ids.txt
D.transcript_ids.txt
E.transcript_ids.txt
or any other name, as long as it is 5 different names and I can still link them back to the original files. I understand that there is a problem with the double usage of $ in both the awk and the loop command, but I don't know how to change that. Is it possible to write a command like this in a loop?
This should do the job:
for file in /home/*paired.isoforms.results
do
base=${file##*/}
base=${base%%_*}
awk -F"\t" '{print $1}' $file > $base.transcript_ids.txt
done
I assume that there can be spaces in the first field since you set the delimiter explicitly to tab. This runs awk once per file. There are ways to do it running awk once for all files, but I'm not convinced the benefit is significant. You could consider using cut instead of awk '{print $1}', too. Note that using ls as you did is less satisfactory than using globbing directly; it runs foul of file names with oddball characters (spaces, tabs, etc) in the name.
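For reference, a sketch of the cut variant mentioned above: cut splits on tab by default, which matches the -F"\t" in the awk version, so the awk line in the loop body would become
cut -f1 "$file" > "$base".transcript_ids.txt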
You can do that entirely in awk:
awk -F"\t" '{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; print $1 > out}' *_paired.isoforms.results
If your input files don't have names as indicated in the question, you'd have to split on something else (as well as use a different pattern match for the input files).
My original answer is actually doing extra name resolution every time something is printed. Here's a version that only updates the output filename when FILENAME changes:
awk -F"\t" 'FILENAME!=lf{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results
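If there are very many input files, some awks can run into a limit on simultaneously open output files, since each out stays open. A hedged sketch of the same idea that closes the previous output file whenever FILENAME changes:
awk -F"\t" 'FILENAME!=lf{if(lf!="")close(out); split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results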