Bash shell script: recursively cat TXT files in folders - bash

I have a directory of files with a structure like below:
./DIR01/2019-01-01/Log.txt
./DIR01/2019-01-01/Log.txt.1
./DIR01/2019-01-02/Log.txt
./DIR01/2019-01-03/Log.txt
./DIR01/2019-01-03/Log.txt.1
...
./DIR02/2019-01-01/Log.txt
./DIR02/2019-01-01/Log.txt.1
...
./DIR03/2019-01-01/Log.txt
...and so on.
Each DIRxx directory has a number of subdirectories named by date, which themselves contain a number of log files that need to be concatenated. The number of text files to concatenate varies, but could theoretically be as many as 5. I would like to see the following command performed for each set of files within the dated directories (note that the files must be concatenated in reverse order):
cd ./DIR01/2019-01-01/
cat Log.txt.4 Log.txt.3 Log.txt.2 Log.txt.1 Log.txt > ../../Log.txt_2019-01-01_DIR01.txt
(I understand the above command will give an error that certain files do not exist, but the cat will do what I need of it anyways)
Aside from cding into each directory and running the above cat command, how can I script this into a Bash shell script?

If you just want to concatenate all files in all subdirectories whose name starts with Log.txt, you could do something like this:
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt* > Log.txt_"${date}"_"${dirname}".txt;
done
If you need the files in reverse numerical order, from 5 to 1 and then Log.txt, you can do this:
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt.{5..1} "$dir"/Log.txt > Log.txt_"${date}"_"${dirname}".txt;
done
That will, as you mention in your question, complain about files that don't exist, but that's just a warning. If you don't want to see it, you can redirect error output (although that might cause you to miss legitimate error messages as well):
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt.{5..1} "$dir"/Log.txt > Log.txt_"${date}"_"${dirname}".txt;
done 2>/dev/null
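If the warnings are a concern, another option is to hand cat only the files that actually exist, so nothing has to be silenced. The following is a minimal sketch, not a drop-in solution: the function name concat_logs and its optional root-directory argument are made up for illustration, and it assumes at most five rotated files per dated directory, as described in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: concatenate Log.txt.5 .. Log.txt.1, then Log.txt,
# for every dated directory under ROOT/DIR*/, skipping files that are missing.
concat_logs() {
    local root=${1:-.} dir parent date dirname f
    local -a files
    for dir in "$root"/DIR*/*/; do
        dir=${dir%/}                 # drop the trailing slash left by the glob
        [ -d "$dir" ] || continue    # the glob matched nothing
        date=${dir##*/}              # e.g. 2019-01-01
        parent=${dir%/*}
        dirname=${parent##*/}        # e.g. DIR01
        files=()
        for f in "$dir"/Log.txt.{5..1} "$dir"/Log.txt; do
            if [ -f "$f" ]; then
                files+=("$f")        # keep only files that exist
            fi
        done
        [ "${#files[@]}" -gt 0 ] || continue
        cat -- "${files[@]}" > "$root/Log.txt_${date}_${dirname}.txt"
    done
}
```

Calling concat_logs with no argument works on the current directory and writes the combined files next to the DIRxx directories.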

Not as comprehensive as the other, but quick and easy. Use find and sort your output however you like (-zrn is --zero-terminated --reverse --numeric-sort) then iterate over it with read.
find . -type f -print0 |
sort -zrn |
while read -rd ''; do
cat "$REPLY";
done >> log.txt

Related

Wondering how to delete the files when its name increments?

I have files in a directory like this:
file3.proto
file2.proto
file1.proto
I want to delete file1 and file2; the highest number is the latest file, which I don't want to delete. How can I achieve this in a shell script?
The command below does the job, but I want something more dynamic. I don't want to change the shell script every time the number increments; for example, if the highest file is 4, I need to change the range to 1..3.
ls | grep '.proto' | rm file{1..2}.proto
ls *.proto | head -n -1 | xargs rm
which with these files
file1.proto
file2.proto
file3.proto
executes the command
rm file1.proto file2.proto
UPDATE: Be warned that ls command outputs files in alphabetical order, which is not numerical order... I mean, if you have also a file25.proto, you'll get this output from ls:
file1.proto
file25.proto
file2.proto
file3.proto
So it would be better (if possible) to rename the files with zero padding, like file001.proto, with the width depending on the maximum possible number of files in the folder. This is a common issue with file name ordering...
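When renaming is not an option, a version sort can put file25.proto after file3.proto. The sketch below is one way to do that under stated assumptions: it relies on GNU tools (sort -V, head -n -1, xargs -r -d), assumes file names contain no newlines, and the function name delete_all_but_latest is invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete every file*.proto in DIR except the one
# with the highest number. Requires GNU sort, head and xargs.
delete_all_but_latest() {
    local dir=${1:-.}
    (
        cd "$dir" || exit 1
        printf '%s\n' file*.proto |   # one name per line, without parsing ls
            sort -V |                 # version sort: file3 before file25
            head -n -1 |              # drop the last line, i.e. the newest file
            xargs -r -d '\n' rm --    # -r: do nothing if the list is empty
    )
}
```

If only one file matches, or none at all, the list reaching xargs is empty and nothing is removed.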

Script to copy directory of filenames to .txt file

This should be fairly easy and I understand the logic of it but my shell scripting is rather beginner.
Basically, I have a directory with a hundred files or so, and I want to copy their filenames to a .txt file. One line per filename. I know I'd want a loop for all the files in the directory, copy name to text file, repeat until there are no more files but not sure how to write that out in a .sh file.
(Also, just out of pure curiosity, how would I omit the file extensions? In this case, they're all the same extension but potentially in the future they may not be, and while I need the extensions right now I may not in the future. I'm assuming there might be a flag for this or would I use '.' as a delimiter to stop copying at that point?)
Thanks in advance!
It could be very easy with ls:
ls -1 [directory] > filename.txt
Note the flag -1: it tells ls to output one filename per line regardless of where the output goes. Usually ls acts like ls -C if stdout is a tty, and like ls -1 otherwise. Specifying the flag explicitly forces one name per line in both cases.
If you want to do it manually, this is an example:
#!/bin/sh
cd [directory]
for i in *
do
echo "$i"
done > filename.txt
To omit extensions, you can use string replacement:
echo "${i%.*}"
For the first part, you can do
ls <dirname> > files.txt
I alias ls to ls -F, so to avoid any extraneous characters in the output, you would do
printf "%s\n" * > ../filename.txt
I put the output txt file in a different directory so the list of files does not include "filename.txt"
If you want to omit file extensions:
printf "%s\n" * | sed 's/\.[^.]*$//' > ../filename.txt
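The pieces above can also be combined into a single function. This is only a sketch: list_names and its second argument (yes to strip extensions) are names chosen for illustration, and only the last extension is removed, so b.tar.gz becomes b.tar.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of the entries in DIR, one per line.
# Pass "yes" as the second argument to strip the last extension.
list_names() {
    local dir=${1:-.} strip=${2:-no} path name
    for path in "$dir"/*; do
        [ -e "$path" ] || continue      # empty directory: the glob stays literal
        name=${path##*/}                # remove the leading directory part
        if [ "$strip" = yes ]; then
            name=${name%.*}             # remove the shortest trailing .suffix
        fi
        printf '%s\n' "$name"
    done
}
```

For example, list_names . yes > ../filename.txt writes the extension-free names to a file outside the directory being listed.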

Sort files in directory then execute command on each one of them

I have a directory containing files numbered like this
1>chr1:2111-1111_mask.txt
1>chr1:2111-1111_mask2.txt
1>chr1:2111-1111_mask3.txt
2>chr2:345-678_mask.txt
2>chr2:345-678_mask2.txt
2>chr2:345-678_mask3.txt
100>chr19:444-555_mask.txt
100>chr19:444-555_mask2.txt
100>chr19:444-555_mask3.txt
each file contains a name like >chr1:2111-1111 in the first line and a series of characters in the second line.
I need to sort the files in this directory numerically, using the number before the > as a guide, then execute a command for each of the files ending in _mask3.
I have this code
ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
It works, but when I check the list of the strings inside the output file they are like this
>chr19:444-555
>chr1:2111-1111
>chr2:345-678
why?
So... I'm not sure what "works" means here, given what your question describes.
It seems like you have two problems.
1. Your files are not in sorted order
2. The file names have the leading digits removed
Addressing 1, your command ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt here doesn't make a whole lot of sense. You are getting a list of files from ls, and then piping that to sort. That probably gives you the output you are looking for, but then you pipe that to for, which doesn't make any sense.
In fact you can rewrite your entire script to
for f in ./"$INPUT"_temp/*_mask3.txt
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
And you'll have the exact same output. To get this sorted you could do something like:
for f in `ls ./"$INPUT"_temp/*_mask3.txt | sort -n`
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
As for the unexpected truncation, the > character in your file names is significant to bash: written literally on a command line, it redirects the stdout of the preceding command to the named file. Make sure you put quotes around $f wherever you use it in the loop, and quote or escape any such file name you type by hand, so that nothing is read as a "command > file" redirection.
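One detail worth keeping in mind is that sort -n compares from the start of each line, so paths that begin with ./ do not sort by the number before the >. A possible workaround is to sort the bare file names on the field before the >. The sketch below assumes file names without newlines; the function names mask3_in_order and print_headers_in_order are hypothetical, and printing the first line of each file stands in for whatever processing is actually needed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the *_mask3.txt names in DIR, ordered by the
# number that precedes the '>' in each name.
mask3_in_order() {
    local dir=$1
    (
        cd "$dir" || exit 1
        printf '%s\n' *_mask3.txt | sort -t '>' -k1,1n   # field 1, numeric
    )
}

# Hypothetical example of processing: print the first line of each file.
print_headers_in_order() {
    local dir=$1 name
    mask3_in_order "$dir" | while IFS= read -r name; do
        head -n 1 -- "$dir/$name"    # quoted, so '>' and ':' stay literal
    done
}
```

With this layout, print_headers_in_order ./"$INPUT"_temp would visit the file numbered 1 first and the file numbered 100 last.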

How do I write a bash script to copy files into a new folder based on name?

I have a folder filled with ~300 files. They are named in the form username#mail.com.pdf. I have a list of usernames (saved in a file called names.txt), one username per line. I need about 40 of these files, and would like to copy them into a new folder that contains only the ones I need.
Each line of names.txt holds the username only (e.g., eternalmothra), and the PDF file I want to copy over is named eternalmothra#mail.com.pdf.
while read p; do
ls | grep $p > file_names.txt
done <names.txt
This seems like it should read from the list, and for each line turns username into username#mail.com.pdf. Unfortunately, it seems like only the last one is saved to file_names.txt.
The second part of this is to copy all the files over:
while read p; do
mv $p foldername
done <file_names.txt
(I haven't tried that second part yet because the first part isn't working).
I'm doing all this with Cygwin, by the way.
1) What is wrong with the first script that it won't copy everything over?
2) If I get that to work, will the second script correctly copy them over? (Actually, I think it's preferable if they just get copied, not moved over).
Edit:
I would like to add that I figured out how to read lines from a txt file from here: Looping through content of a file in bash
Solution from comment: Your problem is simply that echo a > b overwrites the file on every iteration, while echo a >> b appends to it, so replace
ls | grep $p > file_names.txt
with
ls | grep $p >> file_names.txt
There might be more efficient solutions if the task runs every day, but for a one-shot of 300 files your script is good.
Assuming you don't have file names with newlines in them (in which case your original approach would not have a chance of working anyway), try this.
printf '%s\n' * | grep -f names.txt | xargs cp -t foldername
The printf is necessary to work around the various issues with ls; passing the list of all the file names to grep in one go produces a list of all the matches, one per line; and passing that to xargs cp performs the copying. (To move instead of copy, use mv instead of cp; the GNU versions of both support the -t option, which makes them convenient to run under xargs.) The function of xargs is to convert standard input into arguments to the program you pass to it.
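Since each wanted file name can be built directly from the username, the intermediate file_names.txt can also be skipped. The following is a sketch under that assumption: the function name copy_listed is invented here, the suffix #mail.com.pdf is taken from the question, and names that have no matching file are only reported, not treated as fatal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy USER#mail.com.pdf into DEST for every USER
# listed (one per line) in NAMES. Run it from the folder holding the PDFs.
copy_listed() {
    local names=$1 dest=$2 user file
    mkdir -p -- "$dest"
    while IFS= read -r user; do
        [ -n "$user" ] || continue          # skip blank lines
        file="${user}#mail.com.pdf"
        if [ -f "$file" ]; then
            cp -- "$file" "$dest"/
        else
            printf 'no file for %s\n' "$user" >&2
        fi
    done < "$names"
}
```

A call such as copy_listed names.txt foldername copies the files; replacing cp with mv would move them instead.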

Massive rename of files but keep the same sorting

I have a lot of files in a folder with the same extension (e.g. .vtk) and I am using a bash script to mass rename them with sequential numbers.
Here is the script I use:
n=0;
for file in *.vtk; do
mv "${file}" "100_${n}.vtk";
n=$((n+1));
done
After the script's execution, all the files are renamed like:
100_1.vtk
100_2.vtk
.
.
.
My problem is that I want to keep the sorting of the files exactly the same as it was before. For example, if I had two sequential files named something.vtk and something_else.vtk, I want them, after the renaming process, to correspond to 100_1.vtk and 100_2.vtk respectively.
Can you change your for loop from this:
for file in *.vtk; do
to this:
for file in $(ls -1 *.vtk | sort); do
If your filenames don't contain spaces, this should work.
You can also use sort -kX.Y, where X is the field (column) number and Y is the character position within that field at which the sort key starts.
So, something like following should be fine:
$ ls | sort -k1.5
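One possible reason the order appears to change after renaming is that 100_10.vtk sorts before 100_2.vtk in a plain alphabetical listing. If that is the case here (an assumption, since the question does not say), zero-padding the counter keeps alphabetical and numerical order in step. A minimal sketch follows; rename_sequential, its arguments and the default width of 4 are all illustrative, and it assumes no existing file already carries one of the target names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename the *.vtk files in DIR to PREFIX_0001.vtk,
# PREFIX_0002.vtk, ... following the order in which the glob expands.
rename_sequential() {
    local dir=$1 prefix=$2 width=${3:-4} n=1 file new
    (
        cd "$dir" || exit 1
        for file in *.vtk; do              # the glob is expanded once, sorted
            [ -e "$file" ] || continue     # no matches: the glob stays literal
            new=$(printf "%s_%0${width}d.vtk" "$prefix" "$n")
            mv -- "$file" "$new"
            n=$((n + 1))
        done
    )
}
```

For instance, rename_sequential . 100 3 would produce 100_001.vtk, 100_002.vtk, and so on.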

Resources