bash: extract a file list and group the results into separate files

I have a file, dynamically populated, containing some dates, one per line, like this:
20190807
20190806
20190805
20190804
I created a script that reads the file line by line and extracts a list of matching files from another directory:
FILEMASTER="lista_master"
while IFS= read -r line
do
ls -tr /var/home/test/*_"$line"_*.jpg | head -n2 >> lista_test
done < "$FILEMASTER"
This script works, creating a single file (lista_test) containing the last two .jpg files for each date. Output sample:
/var/home/test/MAN_20190804_jolly1.jpg
/var/home/test/CAT_20190804_selly2.jpg
/var/home/test/RET_20190805_jolly1.jpg
/var/home/test/GES_20190805_angyt2.jpg
/var/home/test/TOR_20190806_jolly1.jpg
/var/home/test/GIL_20190806_gally2.jpg
/var/home/test/POE_20190807_frity1.jpg
/var/home/test/TAR_20190807_tally2.jpg
My problem is this:
I need to extract a separate result file ("lista_test1", "lista_test2", "lista_test3", "lista_test4", etc.) for every line read. NOT all results in a single file.

Since you want one file list per date, reuse the loop variable in the output file name. Like this:
Assuming you have a list of dates in a file (named dates_list.txt) like this:
20200101
20200202
20200303
20200404
Then your script could look like this:
while IFS= read -r line
do
ls -tr /var/home/test/*_"$line"_*.jpg | head -n2 > "$line.list"
done < dates_list.txt
Note that I use > instead of >> to ensure you do not append the same files over and over each time you run the script.
The result would be:
20200101.list # contains the files *_20200101_*.jpg
20200202.list # contains the files *_20200202_*.jpg
20200303.list # contains the files *_20200303_*.jpg
20200404.list # contains the files *_20200404_*.jpg
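If the file names may contain spaces or other awkward characters, a variant that avoids parsing ls output is safer. A minimal sketch, assuming GNU find (%T@ prints the modification time as a sortable number; newlines in file names would still break it):
while IFS= read -r line
do
# oldest two matches first, mirroring ls -tr | head -n2
find /var/home/test -maxdepth 1 -name "*_${line}_*.jpg" -printf '%T@ %p\n' | sort -n | head -n2 | cut -d' ' -f2- > "$line.list"
done < dates_list.txt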

Related

How to sort files by modified timestamp in unix for a shell script to pick them one at a time

I am writing a shell script that picks one file at a time and processes it.
I want the script to pick the files in ascending order of their modified time.
I used the code below to pick .csv files with a particular filename pattern.
for file in /filepath/file*.csv
do
#mystuff
done
But I expect the script to pick the .csv files in ascending order of their modified time. Please suggest.
Thanks in advance
If you are sure the file names don't contain any "strange" characters, e.g. newline, you could use the sorting capability of ls and read the output with a while read... loop. This will also work for file names that contain spaces.
ls -tr1 /filepath/file*.csv | while read -r file
do
mystuff "$file"
done
Note this solution should be preferred over something like
for file in $(ls -tr /filepath/file*.csv) ...
because this will fail if you have a file name that contains a space due to the word-splitting involved here.
You can store the results of ls -t in an array (-t sorts by modified time):
csvs=($(ls -t /filepath/file*.csv))
Then apply your for loop, expanding the whole array; plain $csvs would only give you the first element:
for file in "${csvs[@]}"
do
#mystuff
done
with your "for" loop (note that this relies on word splitting, so it breaks on file names with spaces, as explained above):
for file in $(ls -tr /filepath/file*.csv)
do
mystuff "$file"
done
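For completeness, a sketch that handles arbitrary file names, including spaces and newlines, assuming the GNU versions of find, sort, and cut (mystuff stands for your processing, as above):
while IFS= read -r -d '' file
do
mystuff "$file"
done < <(find /filepath -maxdepth 1 -name 'file*.csv' -printf '%T@\t%p\0' | sort -zn | cut -z -f2-)
The process substitution keeps the loop in the current shell, and the NUL delimiters mean no file name can be split or mangled.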

matching files in while read with for loop

I am trying to combine a for loop inside a while read command. If run alone, this for loop works as needed:
for file in *postp*/*
do
ls "$file"/*/*/sequences/*/*_supercontig.fasta | xargs cat > My_New_File.txt
done
However, I only want to cat the files (*.fasta) that are named based on a given input list (Files_to_cat.txt). Here is the code I am trying, but returns an empty file so I have something wrong.
while read -r name
do
for file in *postp*/*
do
ls "$file"/*/*/sequences/*/"$name"_supercontig.fasta | xargs cat > My_New_File.txt
done
done < Files_to_cat.txt
Note that the list in Files_to_cat.txt matches the prefix of *_supercontig.fasta.
Any help would be greatly appreciated.
I can't spot any mistake in the way you use while read.
You probably get this result because you use overwrite redirection > My_New_File.txt. If the last file which is cat-ed to My_New_File.txt is empty, then My_New_File.txt will be empty as well.
I expect what you want to do is either:
Append to file: >> My_New_File.txt;
Have a different file name for each output: > "Copy_of_${file##*/}_$name"
${file##*/} removes the longest match of */ from the beginning of the string, leaving the basename of the file. We could just as well do ${file//\//-} to replace all slashes with dashes.
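Putting the append fix together, a minimal sketch under the same layout as the question (cat with the glob replaces the unnecessary ls | xargs cat):
: > My_New_File.txt # truncate once, before the loops
while read -r name
do
for file in *postp*/*
do
cat "$file"/*/*/sequences/*/"${name}"_supercontig.fasta >> My_New_File.txt
done
done < Files_to_cat.txt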

save filename and information from the file into a two-column txt doc (ubuntu terminal)

I have a question regarding the manipulation and creation of text files in the Ubuntu terminal. I have a directory that contains several thousand subdirectories. In each subdirectory there is a file whose name ends in stats.txt. I want to write a piece of code that runs from the parent directory and creates a file with the name of each stats.txt file in the first column and the fifth line of that same stats.txt file in the next column. The fifth line of each stats.txt file is a sentence of six words, not a single value.
For reference, I have successfully used the sed command in combination with find and cat to make a file containing the fifth line from each stats.txt file. I then used the ls command to save a list of all my subdirectories. I assumed both files would be in alphabetical order of the subdirectories, and thus easy to merge, but I was wrong. The find and cat commands, or at least my implementation of them, resulted in a file that appeared to be in random order (see below). No need to try to remedy this code, I'm open to all solutions.
# loop through subdirectories and save the 5th line of stats.txt as a different file.
for f in ~/*; do [ -d "$f" ] && cd "$f" && sed -n 5p *stats.txt > final.stats.txt; done
# find the final.stats.txt files and save them as a single file
find ./ -name 'final.stats.txt' -exec cat {} \; > compiled.stats.txt
Maybe something like this can help you get on track:
find . -name "*stats.txt" -exec awk 'FNR==5{print FILENAME, $0}' '{}' + > compiled.stats
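FNR==5 selects the fifth line of each input file, and FILENAME is the file awk is currently reading, so each output line pairs a file name with its fifth line. If you want an explicit tab between the two columns (easier to split later), a small variation:
find . -name "*stats.txt" -exec awk 'FNR==5{print FILENAME "\t" $0}' '{}' + > compiled.stats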

Sort files in directory then execute command on each one of them

I have a directory containing files numbered like this
1>chr1:2111-1111_mask.txt
1>chr1:2111-1111_mask2.txt
1>chr1:2111-1111_mask3.txt
2>chr2:345-678_mask.txt
2>chr2:345-678_mask2.txt
2>chr2:345-678_mask3.txt
100>chr19:444-555_mask.txt
100>chr19:444-555_mask2.txt
100>chr19:444-555_mask3.txt
Each file contains a name like >chr1:2111-1111 in the first line and a series of characters in the second line.
I need to sort the files in this directory numerically, using the number before the > as a guide, then execute a command on each of the files ending in _mask3.
I have this code
ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
It works, but when I check the list of the strings inside the output file they are like this
>chr19:444-555
>chr1:2111-1111
>chr2:345-678
why?
So... I'm not sure what "works" here, as your question states.
It seems like you have two problems:
1. Your files are not processed in sorted order
2. The file names have their leading digits removed
Addressing 1: your command ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt doesn't make a whole lot of sense. You are getting a list of files from ls and then piping that to sort. That probably gives you the ordering you are looking for, but then you pipe it to for, which doesn't read a word list from standard input at all.
In fact you can rewrite your entire script to
for f in ./"$INPUT"_temp/*_mask3.txt
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
And you'll have the exact same output. To get this sorted you could do something like:
for f in `ls ./"$INPUT"_temp/*_mask3.txt | sort -n`
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
As for the unexpected truncation: the > character in your file names is significant to the shell, since it redirects the stdout of the preceding command to the named file. You'll need to ensure that whenever you use the variable $f from your loop you put quotes around it, to keep bash from misinterpreting the file name as a command > file redirection.
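A minimal sketch of that quoting, assuming (as in the listing above) that the file names contain no whitespace, so the word splitting done by $(...) is harmless; output.txt is a hypothetical results file:
for f in $(ls ./"$INPUT"_temp/*_mask3.txt | sort -n)
do
# the quotes around "$f" keep the > in the name from becoming a redirection
head -n 1 "$f" >> output.txt # collect each file's >chr... name line
done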

How do I write a bash script to copy files into a new folder based on name?

I have a folder filled with ~300 files, named in the form username#mail.com.pdf. I need about 40 of them, and I have a list of the usernames I need (saved in a file called names.txt, one username per line). I would like to copy the files I need into a new folder.
Where the file names.txt has as its first line the username only (e.g., eternalmothra), the PDF file I want to copy over is named eternalmothra#mail.com.pdf.
while read p; do
ls | grep $p > file_names.txt
done <names.txt
This seems like it should read from the list and, for each line, match username to username#mail.com.pdf. Unfortunately, it seems like only the last match is saved to file_names.txt.
The second part of this is to copy all the files over:
while read p; do
mv $p foldername
done <file_names.txt
(I haven't tried that second part yet because the first part isn't working).
I'm doing all this with Cygwin, by the way.
1) What is wrong with the first script that it won't copy everything over?
2) If I get that to work, will the second script correctly copy them over? (Actually, I think it's preferable if they just get copied, not moved over).
Edit:
I would like to add that I figured out how to read lines from a txt file from here: Looping through content of a file in bash
Solution from comment: Your problem is just that echo a > b overwrites the file, while echo a >> b appends to it, so replace
ls | grep $p > file_names.txt
with
ls | grep $p >> file_names.txt
There might be more efficient solutions if the task runs every day, but for a one-shot of 300 files your script is good.
Assuming you don't have file names with newlines in them (in which case your original approach would not have a chance of working anyway), try this.
printf '%s\n' * | grep -f names.txt | xargs cp -t foldername
The printf is necessary to work around the various issues with ls; passing the list of all the file names to grep in one go produces a list of all the matches, one per line; and passing that to xargs cp performs the copying. (To move instead of copy, use mv instead of cp, obviously; both support the -t option so as to make it convenient to run them under xargs.) The function of xargs is to convert standard input into arguments to the program you run as the argument to xargs.
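Since each username maps to exactly one file name, a sketch that skips the intermediate file_names.txt entirely; it assumes every wanted file really is named username#mail.com.pdf and that foldername already exists:
while read -r p
do
cp "$p"#mail.com.pdf foldername/ # cp rather than mv, since you prefer copying
done < names.txt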
