Read file names from directory in Bash

I need to write a script that reads all the file names from a directory and then, depending on the file name (for example, whether it contains R1 or R2), concatenates all the files that contain that substring in their name.
Can anyone give me a tip on how to do this?
The only thing I was able to do is:
#!/bin/bash
FILES="path to the files"
for f in $FILES
do
cat $f
done
and this only shows me that the variable FILES is a directory, not the files it contains.

To make the smallest change that fixes the problem:
dir="path to the files"
for f in "$dir"/*; do
cat "$f"
done
To accomplish what you describe as your desired end goal:
shopt -s nullglob
dir="path to the files"
substrings=( R1 R2 )
for substring in "${substrings[@]}"; do
cat /dev/null "$dir"/*"$substring"* >"${substring}.out"
done
Note that cat can take multiple files in one invocation -- in fact, if you aren't doing that, you usually don't need to use cat at all.
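For instance (a small sketch with hypothetical file names), one invocation handles several inputs, and a lone input usually needs only a redirection:
cat header.txt body.txt footer.txt > combined.txt
grep pattern < single.txt    # rather than: cat single.txt | grep pattern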

Simple hack:
ls -al *R1* | awk '{print $9}' >outputfilenameR1
ls -al *R2* | awk '{print $9}' >outputfilenameR2

Your expectation that
for f in $FILES
would loop over all the file names in the directory stored in the variable FILES was not met: as you observed, the value of FILES itself was the only item processed by the for loop.
To turn a value naming a directory into a list of files, you need to give the shell a file-name pattern it can expand against the file system when it evaluates $FILES.
You can do this by appending /* to the directory path stored in FILES. When the shell expands $FILES, the * after the / matches every entry in the directory, so the resulting list contains not only files but also sub-directories, if there are any.
In other words, if you change the assignment to:
FILES="path to the files/*"
the script will then behave as you expected.
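Putting it together, a minimal corrected version of the script would be the following sketch (/some/path is a placeholder; $FILES is deliberately left unquoted so the glob expands, which assumes the real path contains no spaces, since word splitting happens before globbing):
#!/bin/bash
FILES="/some/path/*"
for f in $FILES
do
cat "$f"
done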

Related

Bash shell script: recursively cat TXT files in folders

I have a directory of files with a structure like below:
./DIR01/2019-01-01/Log.txt
./DIR01/2019-01-01/Log.txt.1
./DIR01/2019-01-02/Log.txt
./DIR01/2019-01-03/Log.txt
./DIR01/2019-01-03/Log.txt.1
...
./DIR02/2019-01-01/Log.txt
./DIR02/2019-01-01/Log.txt.1
...
./DIR03/2019-01-01/Log.txt
...and so on.
Each DIRxx directory has a number of subdirectories named by date, which themselves contain a number of log files that need to be concatenated. The number of text files to concatenate varies, but could theoretically be as many as 5. I would like the following command to be performed for each set of files within the dated directories (note that the files must be concatenated in reverse order):
cd ./DIR01/2019-01-01/
cat Log.txt.4 Log.txt.3 Log.txt.2 Log.txt.1 Log.txt > ../../Log.txt_2019-01-01_DIR01.txt
(I understand the above command will give an error that certain files do not exist, but cat will still do what I need of it anyway.)
Aside from cd-ing into each directory and running the above cat command, how can I turn this into a Bash shell script?
If you just want to concatenate, in each subdirectory, all files whose names start with Log.txt, you could do something like this:
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt* > Log.txt_"${date}"_"${dirname}".txt;
done
If you need the files in reverse numerical order, from 5 to 1 and then Log.txt, you can do this:
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt.{5..1} "$dir"/Log.txt > Log.txt_"${date}"_"${dirname}".txt;
done
That will, as you mention in your question, complain about files that don't exist, but that's just a warning. If you don't want to see it, you can redirect the error output (although that might cause you to miss legitimate error messages as well):
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt.{5..1} "$dir"/Log.txt > Log.txt_"${date}"_"${dirname}".txt;
done 2>/dev/null
Not as comprehensive as the other answer, but quick and easy: use find and sort the output however you like (-zrn is --zero-terminated --reverse --numeric-sort), then iterate over it with read.
find . -type f -print0 |
sort -zrn |
while read -rd ''; do
cat "$REPLY";
done >> log.txt
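A variant of the same idea restricted to the log files from this question might look like the following sketch (it assumes GNU find and sort; reverse lexicographic order puts Log.txt.5 before Log.txt within each directory):
find . -type f -name 'Log.txt*' -print0 |
sort -zr |
while read -rd ''; do
cat "$REPLY";
done >> log.txt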

Comparing two directories to produce output

I am writing a Bash script that will replace the files in folder A (source) with those in folder B (target). But before this happens, I want to record two files.
The first file will contain a list of files in folder B that are newer than those in folder A, along with files that differ from or exist only in folder B relative to folder A.
The second file will contain a list of files in folder A that are newer than those in folder B, along with files that differ from or exist only in folder A relative to folder B.
How do I accomplish this in Bash? I've tried using diff -qr but it yields the following output:
Files old/VERSION and new/VERSION differ
Files old/conf/mime.conf and new/conf/mime.conf differ
Only in new/data/pages: playground
Files old/doku.php and new/doku.php differ
Files old/inc/auth.php and new/inc/auth.php differ
Files old/inc/lang/no/lang.php and new/inc/lang/no/lang.php differ
Files old/lib/plugins/acl/remote.php and new/lib/plugins/acl/remote.php differ
Files old/lib/plugins/authplain/auth.php and new/lib/plugins/authplain/auth.php differ
Files old/lib/plugins/usermanager/admin.php and new/lib/plugins/usermanager/admin.php differ
I've also tried this
(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq
but it doesn't give me the scope of results I require. The struggle here is that the data isn't in the correct format; I just want files, not directories, to show up in the text files, e.g.:
conf/mime.conf
data/pages/playground/
data/pages/playground/playground.txt
doku.php
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php
List of files in directory B (new/) that are newer than directory A (old/):
find new -newermm old
This merely runs find over the content of new/, filtered by the -newerXY test with X and Y both set to m (modification time) and the reference being the old directory itself.
Files that are missing in directory B (new/) but are present in directory A (old/):
A=old B=new
diff -u <(find "$B" |sed "s:$B::") <(find "$A" |sed "s:$A::") \
|sed "/^+\//!d; s::$A/:"
This sets the variables $A and $B to your target directories, then runs a unified diff on their contents (using process substitution to list each tree with find and strip the directory name with sed, so diff isn't confused). The final sed command matches the additions (lines starting with +/), replaces that +/ with the directory name and a slash, and prints them; other lines are removed. For example, a file present only under old/ appears in the diff as +/inc/auth.php and is printed as old/inc/auth.php.
Here is a bash script that will create the file:
#!/bin/bash
# Usage: bash script.bash OLD_DIR NEW_DIR [OUTPUT_FILE]
# compare given directories
if [ -n "$3" ]; then # the optional 3rd argument is the output file
OUTPUT="$3"
else # if it isn't provided, escape path slashes to underscores
OUTPUT="${2////_}-newer-than-${1////_}"
fi
{
find "$2" -newermm "$1"
diff -u <(find "$2" |sed "s:$2::") <(find "$1" |sed "s:$1::") \
|sed "/^+\//!d; s::$1/:"
} |sort > "$OUTPUT"
First, this determines the output file, which either comes from the third argument or else is built from the other two inputs, converting slashes to underscores in case they are paths. For example, running bash script.bash /usr/local/bin /usr/bin would write its file list to _usr_local_bin-newer-than-_usr_bin in the current working directory.
This combines the two commands and then ensures they are sorted. There won't be any duplicates, so you don't need to worry about that (if there were, you'd use sort -u).
You can get your first and second files by changing the order of arguments as you invoke this script.
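Usage might then look like this (assuming the old/ and new/ directory names from the question; newer_in_old.txt is a hypothetical output name):
bash script.bash old new                    # writes new-newer-than-old
bash script.bash new old newer_in_old.txt   # explicit output file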

Script to copy directory of filenames to .txt file

This should be fairly easy and I understand the logic of it, but my shell scripting is rather beginner-level.
Basically, I have a directory with a hundred files or so, and I want to copy their filenames to a .txt file, one line per filename. I know I'd want a loop over all the files in the directory that copies each name to the text file until there are no more files, but I'm not sure how to write that out in a .sh file.
(Also, just out of pure curiosity, how would I omit the file extensions? In this case they're all the same extension, but they may not be in the future, and while I need the extensions right now, I may not later. I'm assuming there might be a flag for this, or would I use '.' as a delimiter to stop copying at that point?)
Thanks in advance!
It is very easy with ls:
ls -1 [directory] > filename.txt
Note the flag -1: it tells ls to output filenames one per line regardless of where the output goes. Usually ls acts like ls -C if stdout is a tty and like ls -1 otherwise; specifying the flag explicitly forces one-per-line output.
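You can observe the tty-dependent behaviour directly; in any directory:
ls          # columns, because stdout is a terminal
ls | cat    # one name per line, because stdout is now a pipe
ls -1       # one name per line regardless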
If you want to do it manually, this is an example:
#!/bin/sh
cd [directory]
for i in *
do
echo "$i"
done > filename.txt
To omit extensions, you can use parameter expansion to strip the suffix:
echo "${i%.*}"
For the first part, you can do
ls <dirname> > files.txt
I alias ls to ls -F, which appends indicator characters to some names, so to avoid any extraneous characters in the output you can use printf instead:
printf "%s\n" * > ../filename.txt
I put the output .txt file in a different directory so the list of files does not include filename.txt itself.
If you want to omit file extensions:
printf "%s\n" * | sed 's/\.[^.]*$//' > ../filename.txt

Sort files in directory then execute command on each one of them

I have a directory containing files numbered like this
1>chr1:2111-1111_mask.txt
1>chr1:2111-1111_mask2.txt
1>chr1:2111-1111_mask3.txt
2>chr2:345-678_mask.txt
2>chr2:345-678_mask2.txt
2>chr2:345-678_mask3.txt
100>chr19:444-555_mask.txt
100>chr19:444-555_mask2.txt
100>chr19:444-555_mask3.txt
Each file contains a name like >chr1:2111-1111 in the first line and a series of characters in the second line.
I need to sort the files in this directory numerically, using the number before the > as a guide, then execute a command on each of the files ending in _mask3.
I have this code
ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
It works, but when I check the list of the strings inside the output file they are like this
>chr19:444-555
>chr1:2111-1111
>chr2:345-678
why?
So... I'm not sure what "works" means here, given what your question states.
It seems like you have two problems:
1. Your files are not processed in sorted order.
2. The file names have their leading digits removed.
Addressing 1: your command ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt doesn't make a whole lot of sense. You get a list of files from ls and pipe it to sort, which probably gives you the ordering you are looking for, but then you pipe that to for, which doesn't mean anything: for iterates over its word list and never reads from stdin.
In fact, you can rewrite your entire script as:
for f in ./"$INPUT"_temp/*_mask3.txt
do
read FILE
# Do something with each file and list the results in the output file, including the name of the string
done
And you'll have the exact same output. To get this sorted you could do something like:
for f in `ls ./"$INPUT"_temp/*_mask3.txt | sort -n`
do
read FILE
# Do something with each file and list the results in the output file, including the name of the string
done
As for the unexpected truncation: the > character in your file names is significant to the shell, since it redirects the stdout of the preceding command to the named file. You'll need to ensure that when you use the variable $f from your loop, you put quotes around it, to keep bash from misinterpreting the file name as a command > file redirection.
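A minimal sketch of the quoting (process_file is a hypothetical stand-in for whatever you do with each file):
for f in `ls ./"$INPUT"_temp/*_mask3.txt | sort -n`
do
process_file "$f" >> output.txt    # the quotes keep the > in "$f" from being taken as a redirection
done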

How do I save the outputs to a folder using awk or bash?

This is the continuation of my previous question, "subtract the values of two columns using awk or bash".
I have 200 files. I would like to save the output from each file to a folder. The file names in this folder should be the names of the parent files. How can I do this with awk or Bash?
A simple for loop will do the job:
for i in *; do mkdir "$i"_dir; awk -f your_awk_script.awk "$i" > "$i"_dir/out; done
Explanation: this loops over all files in the current directory; for each file it creates a directory named after it with mkdir, then runs the awk script on the file and writes the output into that directory.
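If instead you want all outputs in a single folder, named after the parent files as the question describes, a sketch along the same lines would be (out is a hypothetical directory name):
mkdir -p out
for i in *; do
[ -f "$i" ] || continue    # skip the out directory itself
awk -f your_awk_script.awk "$i" > out/"$i"
done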
