Creating individual files from awk output - bash

I am trying to use a script to create several files inside a directory but am struggling with creating separate files for the output from awk.
the awk command is as follows:
$ awk '{ if ($3>=20 && $3<==30 && $5=="Sequenced") print $6 }' ./foo.txt
This produces an example output of:
cat
dog
mouse
What I want to do is to create 3 files from this output that would be say cat.txt, dog.txt, mouse.txt inside a directory ./animals. I would appreciate any help I can get with this.

$ awk '{ if ($3>=20 && $3<==30 && $5=="Sequenced") print $6".txt" }' ./foo.txt|xargs touch
using xargs you can transfer your output to argument list to another command, for example touch. It will creates empty files (or updated modification time if file exists)

You have one extra = in $3<==30. Fixed below.
awk '$3>=20 && $3<=30 && $5=="Sequenced"{ #No need for if here
file="./animals/"$6".txt" #Build the file name
printf "">file #Creates the empty file
close(file)}' ./foo.txt #Close the stream
One-liner:
awk '$3>=20&&$3<=30&&$5=="Sequenced"{f="./animals/"$6".txt";printf "">f;close(f)}' ./foo.txt

Related

Prevent awk from interpeting variable contents?

I'm attempting to parse a make -n output to make sure only the programs I want to call are being called. However, awk tries to interpret the contents of the output and run (?) it. Errors are something like awk: fatal: Cannot find file 'make'. I have gotten around this by saving the output as a temporary file and then reading that into awk. However, I'm sure there's a better way; any suggestions?
EDIT: I'm using the output later in my script and would like to avoid saving a file to increase speed if possible.
Here's what isn't working:
my_input=$(make -n file)
my_lines=$(echo $my_input | awk '/bin/ { print $1 }') #also tried printf and cat
Here's what works but obviously takes longer than it has to because of writing the file:
make -n file > temp
my_lines=$(awk '/bin/ { print $1 }' temp)
Many thanks for your help!
You can directly parse the output when it is generated by the following command and save the result in a file.
make -n file | grep bin > result.out
If you really want to go for an overkill awk solution, change your second line in the following way:
my_lines="$(awk '/bin/ { print }' temp)"

Mac Terminal Bash awk change multiple file names to $NF output

I have been working on this script to retrieve files from all the folders in my directory and trying to change their names to my desired output.
Before filename:
Folder\actors\character\hair\haircurly1.dds
After filename:
haircurly1.dds
I am working with over 12,000 textures with different names that I extracted from an archive. My extractor included the path to the folder where it extracted the files in each file name. For example, a file that should have been named haircurly1.dds was named Folder\actors\character\hair\haircurly1.dds during extraction.
cd ~/Desktop/MainFolder/Folder
find . -name '*\\*.dds' | awk -F\\ '{ print; print $NF; }'
This code retrieves every texture file that I am looking at containing backslashes (as I have already changed some of the files with other codes, however I want one that will change all of the files at once rather than me having to write specific codes for every folder for 12,000+ texture files)
I use print; and it sends me the file path:
./Folder\actors\character\hair\haircurly1.dds
I use print $NF; and it sends me the text after the awk separator:
\
haircurly1.dds
I would like every file name that this script runs through to be changed to the $NF output of the awk command. Anyone know how I can make my script change the file names to their $NF output?
Thank you
Your question isn't clear but it SOUNDS like all you want to do is:
for file in *\\*; do
mv -- "$file" "${file##*\\}"
done
If that's not all you want then edit your question to clarify your requirements.
Have your awk command format and print a "mv" command, and pipe the result to bash. The extra single-quoting ensures bash treats backslash as a normal char.
find . -name '*\\*.dds' | awk -F\\ '{print "mv '\''" $0 "'\'' " $NF}' | bash -x
hth

How to store the output of AWK command in a separate directory?

I have splited a file into multiple text files using below command -
awk '{print $2 > $1"_npsc.txt"}' complete.txt
I want to store all the output generated text files to another directory. How I can achieve this ? Please help.
You could do something like:
awk '{print $2 > "path/to/directory"$1"_npsc.txt"}' complete.txt
Just make sure that you create the director first (and replace path/to/directory with a the path that you like)

How to use awk to split a file and store each filename in a Bash array

Input
A file called input_file.csv, which has 7 columns, and n rows.
Example header and row:
Date Location Team1 Team2 Time Prize_$ Sport
2016 NY Raptors Gators 12pm $500 Soccer
Output
n files, where the rows in each new file are grouped based on their values in column 7 of the original file. Each file is named after that shared value from column 7. Note: each file will have the same header. (The script currently does this.)
Example: if 2 rows in the original file had golf as their value for column 7, they would be grouped together in a file called golf.csv. If 3 other rows shared soccer as their value for column 7, they would be found in soccer.csv.
An array that has the name of each generated file in it. This array lives outside of the scope of awk. (This is what I need help with.)
Example: Array = [golf.csv, soccer.csv]
Situation
The following script produces the desired output. However, I want to run another script on each of the newly generated files and I don't know how.
Question:
My idea is to store the names of each new file in an array. That way, I can loop through the array and do what I want to each file. The code below passes a variable called array into awk, but I don't know how to add the name of each file to the array.
#!/bin/bash
ARRAY=()
awk -v myarray="$ARRAY" -F"\",\"" 'NR==1 {header=$0}; NF>1 && NR>1 {if(! files[$7]) {print header >> ("" $7 ".csv"); files[$7]=1}; print $0 >> ("" $7 ".csv"); close("" $7 ".csv");}' input_file.csv
for i in "${ARRAY[#]}"
do
:
echo $i
done
Rather than struggling to get awk to fill your shell array variable, why not:
make sure that the *.csv files are created in a clean directory
use globbing to loop over all *.csv files in that directory?
awk -F'","' ... # your original Awk command
for i in *.csv # use globbing to loop over resulting *.csv files
do
:
echo $i
done
Just off the top of my head, untested because you haven't supplied very much sample data, what about this?
#!/usr/bin/awk -f
FNR==1 {
header=$0
next
}
! $7 in files {
files[$7]=sprintf("sport-%s.csv", $7)
print header > file
}
{
files[$7]=sprintf("sport-%s.csv", $7)
}
{
print > files[$7]
}
END {
printf("declare -a sportlist=( ")
for (sport in files) {
printf("\"%s\"", sport)
}
printf(" )\n");
}
The idea here is that we store sport names in the array files[], and build filenames out of that array. (You can format the filename inside sprintf() as you see fit.) We step through the file, adding a header line whenever we get a new sport with no recorded filename. Then for non-headers, print to the file based on the sport name.
For your second issue, exporting the array back to something outside of awk, the END block here will output a declare line which can be interpreted by bash. IF you feel lucky, you can eval this awk script inside command expansion, and the declare command will effectively be interpreted by your shell:
eval $(/path/to/awkscript inputfile.csv)
Or, if you subscribe to the school of thought that consiers eval to be evil, you can redirect the awk script's standard output to a temporary file which you source:
/path/to/awkscript inputfile.csv > /tmp/yadda.$$
. /tmp/yadda.$$
(Don't use this temp file, make a real one with mktemp or the like.)
There's no way for any program to modify the environment of the parent shell. Just have the awk script output the names of the files as standard output, and use command substitution to put them in an array.
filesArray=($(awk ... ))
If the files might have spaces in them, you need a different solution; assuming you're on bash 4, you can just be sure to print each file on a separate line and use readarray:
readarray filesArray < <( awk ... )
if the files might have newlines in them, too, then things get tricky...
if your file is not large, you can run another script to get the unique $7 elements, for example
$ awk 'NR>1&&!a[$7]++{print $7}' sports
will print the values, you can change it to your file name format as well, such as
$ awk 'NR>1&&!a[$7]++{print tolower($7)".csv"}' sports
this then can be piped to your other process, here for example to wc
$ awk ... sports | xargs wc
This will do what I THINK you want:
oIFS="$IFS"; IFS=$'\n'
array=( $(awk '{out=$7".csv"; print > out} !seen[out]++{print out}' input_file.csv) )
IFS="$oIFS"
If your input file really is comma-separated instead of space-separated as you show in the sample input in your question then adjust the awk script to suit (You might want to look at GNU awk and FPAT).
If you don't have GNU awk then you'll need to add a bit more code to close the open output files as you go.
The above will fail if you have file names that contain newlines but will be fine for blank chars or other white space.

how to write finding output to same file using awk command

awk '/^nameserver/ && !modif { printf("nameserver 127.0.0.1\n"); modif=1 } {print}' testfile.txt
It is displaying output but I want to write the output to same file. In my example testfile.txt.
Not possible per se. You need a second temporary file because you can't read and overwrite the same file. Something like:
awk '(PROGRAM)' testfile.txt > testfile.tmp && mv testfile.tmp testfile.txt
The mktemp program is useful for generating unique temporary file names.
There are some hacks for avoiding a temporary file, but they rely mostly on caching and read buffers and quickly get unstable for larger files.
Since GNU Awk 4.1.0, there is the "inplace" extension, so you can do:
$ gawk -i inplace '{ gsub(/foo/, "bar") }; { print }' file1 file2 file3
To keep a backup copy of original files, try this:
$ gawk -i inplace -v INPLACE_SUFFIX=.bak '{ gsub(/foo/, "bar") }
> { print }' file1 file2 file3
This can be used to simulate the GNU sed -i feature.
See: Enabling In-Place File Editing
Despite the fact that using a temp file is correct, I don't like it because :
you have to be sure not to erase another temp file (yes you can use mktemp - it's a pretty usefull tool)
you have to take care of deleting it (or moving it like thiton said) INCLUDING when your script crash or stop before the end (so deleting temp files at the end of the script is not that wise)
it generate IO on disk (ok not that much but we can make it lighter)
So my method to avoid temp file is simple:
my_output="$(awk '(PROGRAM)' source_file)"
echo "$my_output" > source_file
Note the use of double quotes either when grabbing the output from the awk command AND when using echo (if you don't, you won't have newlines).
Had to make an account when seeing 'awk' and 'not possible' in one sentence. Here is an awk-only solution without creating a temporary file:
awk '{a[b++]=$0} END {for(c=1;c<=b;c++)print a[c]>ARGV[1]}' file
You can also use sponge from moreutils.
For example
awk '!a[$0]++' file|sponge file
removes duplicate lines and
awk '{$2=10*$2}1' file|sponge file
multiplies the second column by 10.
Try to include statement in your awk file so that you can find the output in a new file. Here total is a calculated value.
print $total, total >> "new_file"
This inline writing worked for me. Redirect the output from print back to the original file.
echo "1" > test.txt
awk '{$1++; print> "test.txt"}' test.txt
cat test.txt
#$> 2

Resources