Using awk to read and create files in all subdirectories - bash

I am trying to parse all files named "README" in all subdirectories (and sub-subdirectories) under my specified directory, and create a new file containing the parsed output in the same directory where each "README" file was found.
#!/bin/bash
FILES=$(find myDirectory -type f -name 'README')
for f in $FILES
do
#this is fine
echo "parsing $f"
#this is not fine
awk -F, 'BEGIN {print "header"};
{print $2;}
END {print "footer";}' $f > outputfile
done
The output file is only being created in my working directory. What I would like is to redirect the output files into the subdirectories where their corresponding READMEs were found. Is there a better way to do this?
If it helps, README format:
something,something2,something3
nothing1,nothing2,nothing3

Given that you want the output file created in the directory where the README was found, the simplest way is to use the POSIX standard dirname command:
#!/bin/bash
FILES=$(find myDirectory -type f -name 'README')
for f in $FILES
do
outputfile="$(dirname "$f")/outputfile"
echo "parsing $f into $outputfile"
awk -F, 'BEGIN {print "header"}
{print $2}
END {print "footer"}' "$f" > "$outputfile"
done
This code is not safe if there are spaces or newlines in the directory names, but as long as you stick to the portable filename character set (letters, digits, dot, dash, and underscore), there will be no major problems. (It wasn't safe before my changes and it still isn't: as long as you use FILES=$(find …), it is pretty much guaranteed to remain unsafe for names containing blanks, tabs, or newlines. There are ways to fix that, but they involve more major surgery.)
If you want, you can study the Bash parameter expansion mechanisms to see how to do it without using dirname.
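The blanks-and-newlines caveat above can be addressed with a null-delimited find loop. A sketch, assuming bash and a find that supports -print0 (GNU and BSD both do); the demo tree below is made up to stand in for myDirectory:

```shell
#!/usr/bin/env bash
# Demo setup (hypothetical): a tree with two README files, one in a
# directory whose name contains a space.
mkdir -p 'myDirectory/sub one' 'myDirectory/sub2'
printf 'a,b,c\n' > 'myDirectory/sub one/README'
printf 'x,y,z\n' > 'myDirectory/sub2/README'

# Null-delimited read loop: safe for spaces and newlines in paths,
# unlike iterating over $(find ...).
while IFS= read -r -d '' f; do
    dir=$(dirname "$f")
    awk -F, 'BEGIN {print "header"} {print $2} END {print "footer"}' \
        "$f" > "$dir/outputfile"
done < <(find myDirectory -type f -name 'README' -print0)
```

The process substitution (`< <(…)`) keeps the loop in the current shell, which matters if you ever want to set variables inside it.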

Related

How to rename a CSV file from a value in the CSV file

I have 100 1-line CSV files. The files are currently labeled AAA.txt, AAB.txt, ABB.txt (after I used split -l 1 on them). The first field in each of these files is what I want to rename the file as, so instead of AAA, AAB and ABB it would be the first value.
Input CSV (filename AAA.txt)
1234ABC, stuff, stuff
Desired Output (filename 1234ABC.csv)
1234ABC, stuff, stuff
I don't want to edit the content of the CSV itself, just change the filename
something like this should work:
for f in ./*; do new_name=$(head -1 "$f" | cut -d, -f1); cp "$f" "dir/$new_name"; done
Copy them into a new directory just in case something goes wrong, or in case you need the original file names.
starting with your original file before splitting
$ awk -F, '{print > ($1".csv")}' originalFile.csv
and do all in one shot.
This stores each line of the input file in a file named after that line's first column:
awk -F, '{print $0 > $1".csv" }' aaa.txt
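A quick check of the one-shot split; the sample data below is made up to match the question's format:

```shell
# Sample data (hypothetical) mirroring the question's CSV layout.
printf '1234ABC, stuff, stuff\n5678DEF, more, more\n' > originalFile.csv

# Each line is written to a file named after its own first field,
# so one pass over the original file creates all the per-key files.
awk -F, '{print > ($1".csv")}' originalFile.csv
```

Lines sharing the same first field are appended to the same output file, which is exactly what you want when re-splitting a multi-line original.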
In a terminal, changed directory, e.g. cd /path/to/directory that the files are in and then use the following compound command:
for f in *.txt; do echo mv -n "$f" "$(awk -F, '{print $1}' "$f").csv"; done
Note: the echo is intentional and is there for you to test with; it only prints out each mv command so you can see that the outcome is what you want. You can then run the command again, removing just the echo, to actually rename the files via mv.
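For reference, a slightly more defensive version of the copy-loop idea above, quoting every expansion and skipping files whose first field is empty. The renamed/ directory and the sample files are made up for the demo:

```shell
#!/usr/bin/env bash
# Demo setup (hypothetical 1-line CSV files like the question's).
mkdir -p renamed
printf '1234ABC, stuff, stuff\n' > AAA.txt
printf '5678DEF, stuff, stuff\n' > AAB.txt

for f in *.txt; do
    # The first comma-separated field of the first line becomes the new name.
    new_name=$(head -1 "$f" | cut -d, -f1)
    if [ -n "$new_name" ]; then
        cp "$f" "renamed/$new_name.csv"
    fi
done
```

Using cp into a separate directory keeps the originals intact, so a bad first field can't clobber anything you can't recover.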

Rename files to new naming convention in bash

I have a directory of files with names formatted like
01-Peterson#2x.png
15-Consolidated#2x.png
03-Brady#2x.png
And I would like to format them like
PETERSON.png
CONSOLIDATED.png
BRADY.png
But my bash scripting skills are pretty weak right now. What is the best way to go about this?
Edit: my bash version is 3.2.57(1)-release
This will work for files whose names contain spaces (including newlines), backslashes, or any other character, including globbing characters that could cause a false match on other files in the directory, and it won't remove your home file system given a particularly undesirable file name!
for old in *.png; do
new=$(
awk 'BEGIN {
base = sfx = ARGV[1]
sub(/^.*\./,"",sfx)
sub(/^[^-]+-/,"",base)
sub(/#[^#.]+\.[^.]+$/,"",base)
print toupper(base) "." sfx
exit
}' "$old"
) &&
mv -- "$old" "$new"
done
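If you'd rather not spawn awk per file, a pure parameter-expansion sketch also works on bash 3.2 (the question's version), using tr for the uppercasing since ${var^^} needs bash 4. The renamedemo directory is made up for the demo:

```shell
#!/usr/bin/env bash
# Demo setup (hypothetical file names taken from the question).
mkdir -p renamedemo
touch 'renamedemo/01-Peterson#2x.png' \
      'renamedemo/15-Consolidated#2x.png' \
      'renamedemo/03-Brady#2x.png'

for old in renamedemo/*.png; do
    name=${old##*/}      # file name without the directory
    base=${name#*-}      # strip the leading "01-"
    base=${base%%#*}     # strip "#2x" and everything after it
    new=$(printf '%s' "$base" | tr '[:lower:]' '[:upper:]').png
    mv -- "$old" "renamedemo/$new"
done
```

The `--` guard on mv matters here for the same reason as in the answer above: a name beginning with a dash would otherwise be parsed as an option.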
If the pattern for all your files are like the one you posted, I'd say you can do something as simple as running this on your directory:
for file in *.png; do new_file=$(echo "$file" | awk -F"-" '{print $2}' | awk -F"#" '{n=split($2,a,"."); print toupper($1) "." a[2]}'); mv "$file" "$new_file"; done
If you fancy learning other solutions, like regexes, you can also do:
for file in *.png; do base=$(echo "$file" | sed "s/.*-//;s/#.*\.png$//" | tr '[:lower:]' '[:upper:]'); mv "$file" "$base.png"; done
Testing it, it does for example:
mv 01-Peterson#2x.png PETERSON.png
mv 02-Bradley#2x.png BRADLEY.png
mv 03-Jacobs#2x.png JACOBS.png
mv 04-Matts#1x.png MATTS.png
mv 05-Jackson#4x.png JACKSON.png

Applying awk pattern to all files with same name, outputting each to a new file

I'm trying to recursively find all files with the same name in a directory, apply an awk pattern to them, and then output to the directory where each of those files lives a new updated version of the file.
I thought it was better to use a for loop than xargs, but I don't know exactly how to make this work...
for f in $(find . -name FILENAME.txt );
do awk -F"\(corr\)" '{print $1,$2,$3,$4}' ./FILENAME.txt > ./newFILENAME.txt $f;
done
Ultimately I would like to be able to remove multiple strings from the file at once using -F, but also not sure how to do that using awk.
Also, is there a way to remove "(cor*)" where the * represents a wildcard? I'm not sure how to do that while keeping the escape sequences for the parentheses.
Thanks!
To use (corr*) as a field separator where * is a glob-style wildcard, try:
awk -F'[(]corr[^)]*[)]' '{print $1,$2,$3,$4}'
For example:
$ echo '1(corr)2(corrTwo)3(corrThree)4' | awk -F'[(]corr[^)]*[)]' '{print $1,$2,$3,$4}'
1 2 3 4
To apply this command to every file under the current directory named FILENAME.txt, use:
find . -name FILENAME.txt -execdir sh -c 'awk -F'\''[(]corr[^)]*[)]'\'' '\''{print $1,$2,$3,$4}'\'' "$1" > ./newFILENAME.txt' Awk {} \;
Notes
Don't use:
for f in $(find . -name FILENAME.txt ); do
If any file or directory has whitespace or other shell-active characters in it, the results will be an unpleasant surprise.
Handling both parens and square brackets as field separators
Consider this test file:
$ cat file.txt
1(corr)2(corrTwo)3[some]4
To eliminate both types of separators and print the first four columns:
$ awk -F'[(]corr[^)]*[)]|[[][^]]*[]]' '{print $1,$2,$3,$4}' file.txt
1 2 3 4
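If you'd rather delete the separators than split on them, the same bracket expressions work with gsub, run here on the sample line from above:

```shell
# Replace every "(corr...)" or "[...]" run with a single space, then
# print the whole line; same bracket expressions as the -F version.
echo '1(corr)2(corrTwo)3[some]4' |
    awk '{gsub(/\(corr[^)]*\)|\[[^]]*\]/, " "); print}'
```

This prints `1 2 3 4`, and unlike the field-separator approach it doesn't require you to know how many fields the line has.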

How to apply the same awk action to all the files in a folder?

I have written an awk script for deleting all the lines ending in a colon from a file. But now I want to run this particular awk action on a whole folder of similar files.
awk '!/:$/' qs.txt > fin.txt
awk '{print $3 " " $4}' fin.txt > out.txt
You could wrap your awk command in a loop in your shell such as bash.
myfiles=mydirectory/*.txt
for file in $myfiles
do
b=$(basename "$file" .txt)
awk '!/:$/' "$file" > "mydirectory/$b.out"
done
EDIT: improved quoting as commenters suggested
If you like it better, you can use "${file%.txt}" instead of $(basename "$file" .txt).
Aside: My own preference runs to basename just because man basename is easier for me than man -P 'less -p "^ Param"' bash (when that is the relevant heading on the particular system). Please accept this quirk of mine and let's not discuss info and http://linux.die.net/man/ and whatever.
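One detail worth knowing before swapping them: the two forms are not identical when a directory is attached; the parameter expansion keeps the path, while basename strips it:

```shell
file=mydirectory/report.txt

# Parameter expansion keeps the leading directory component:
echo "${file%.txt}"       # -> mydirectory/report

# basename strips the directory as well as the suffix:
basename "$file" .txt     # -> report
```

So in the loop above, switching to "${file%.txt}" would also change where the output file lands unless you account for the directory.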
You could use sed. Just run the command below in the directory where the files you want to change are stored (note that -i edits the files in place rather than writing new ones).
sed -i '/:$/d' *.*
This will create new files in an empty directory, with the same name.
mkdir NEWFILES
for file in *name_pattern*
do
awk '!/:$/' "$file" > fin.txt
awk '{print $3 " " $4}' fin.txt > "NEWFILES/$file"
done
After that you just need to
cp -fr NEWFILES/* .
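The NEWFILES approach keeps the originals untouched, but it only handles files directly in the current directory; NEWFILES/$file points at subdirectories that don't exist. For matches in subdirectories, a find -execdir sketch (supported by GNU and BSD find) writes each filtered copy next to its source instead. The tree, file name pattern, and new_ prefix below are made up for the demo:

```shell
# Demo setup (hypothetical tree with a matching file in a subdirectory).
mkdir -p tree/sub
printf 'keep a b c d\nends in colon:\n' > tree/sub/name_pattern_1

# -execdir runs the command from each file's own directory, so the
# filtered copy lands next to its source even deep in the tree.
find tree -type f -name '*name_pattern*' -execdir sh -c '
    f=${1#./}                  # find passes "./name"; strip the prefix
    awk "!/:\$/ {print \$3, \$4}" "$f" > "new_$f"
' sh {} \;
```

The two awk passes from the loop are folded into one program here: lines ending in a colon are skipped, and fields 3 and 4 are printed for the rest.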

How to print all file names in a folder with awk or bash?

I would like to print all file names in a folder.How can I do this with awk or bash?
ls -l /usr/bin | awk '{ print $NF }'
:)
find . -maxdepth 1 -type f
Drop the -maxdepth option if you also want to recurse into subdirectories.
Following is a pure-AWK option:
gawk '
BEGINFILE {
print "Processing " FILENAME
}
' *
It may be helpful if you want to use it as part of a bigger AWK script processing multiple files and you want to log which file is currently being processed.
This command will print all the file names:
for f in *; do echo "$f"; done
or (even shorter)
printf '%s\n' *
Alternatively, if you like to print specific file types (e.g., .txt), you can try this:
for f in *.txt; do echo "$f"; done
or (even shorter)
printf '%s\n' *.txt
/bin/ls does this job for you and you may call it from bash.
$> /bin/ls
[.. List of files..]
Interpreting your question, you might be interested in iterating over every single file in the directory. This can be done in bash as well:
for f in *; do
echo "$f"
done
for f in *; do var=`find "$f" | wc -l`; echo "$f: $var"; done
This will print the name of each directory and the number of files in it. wc -l here returns the count of files plus one, because find also lists the directory itself.
Sample output:
aa: 4
bb: 4
cc: 1
test2.sh: 1
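If you want the count without the +1, GNU and BSD find support -mindepth 1, which excludes the starting directory itself from the output. The counts/ tree below is made up for the demo:

```shell
# Demo setup (hypothetical directories with a few files each).
mkdir -p counts/aa counts/bb
touch counts/aa/f1 counts/aa/f2 counts/aa/f3 counts/bb/g1

# -mindepth 1 skips the directory itself, so wc -l counts only the
# entries inside it; $((n)) strips any leading spaces wc may emit.
for f in counts/*; do
    n=$(find "$f" -mindepth 1 | wc -l)
    echo "${f#counts/}: $((n))"
done
```

Note the count is still recursive; add -maxdepth 1 as well if you only want the directory's immediate children.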
