Bash remove half of the files in the directory - bash

I am trying to remove half of the files in the corpora directory to make my spam filter trained a little bit faster and, in the future, save some space. Normally I would do it by trial and error, but since these files took a while to download etc, plus it's shell (which I am obviously not an expert in), I do not want to mess this up.
I would try something like this:
ls *.* > list
for i in 'cat list'; do rm -f i++; done
But I'm pretty sure i++ like this isn't a proper way to skip every second item in the list. Perhaps I should use some other loop?
Secondly, there are two types of files in that directory:
0000.* to 1500.*
0000.* to 0250.*
I want to delete half of the first type and half of the second type. They're probably sorted in the standard way in the list, meaning that the two types interleave from 0000.* to 0250.* and after 0250.* only the first type remains, so they might be deleted the wrong way (everything of the second type could be deleted).
So IMHO, I should do it like this:
Both types delete 0000.*
Both types skip 0001.*
Both types delete 0002.*
etc.
Do you guys have an idea how to delete these files like above?

If you just want to delete every second file, then you can use a simple alternating state machine. Since *.* will give you the files in sorted order, you can just delete every second file, with something like:
del=1
for fspec in *.* ; do
if [[ ${del} -eq 1 ]] ; then
del=0
echo rm ${fspec}
else
echo ok ${fspec}
del=1
fi
done
If you run that script you'll get a series of alternating lines saying:
rm file1
ok file2
rm file3
ok file4
and so on.
Once you're happy with the behaviour, you can comment out the ok line entirely and remove the echo from the rm line.
However, if your intent is to actually delete all files of the form NNNN.*, where NNNN is in the set {0000, 0002, 0004, ..., 9998}, that can be done more concisely (again, remove the echo when you're happy):
for id in {0000..9998..2} ; do
echo rm -f ${id}.*
done
That 0000 will ensure the strings are four digits long, assuming you have a recent enough bash. If yours isn't, you can just use:
for id in {0..9998..2} ; do
echo rm -f $(printf "%04d" ${id}).*
done
Regardless of the method you choose, I'd be making a backup of the directory you're working in before testing as well.
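If the two kinds of file need to be thinned independently, one option is to run the same alternating loop once per kind. The sketch below is only an illustration under stated assumptions: the suffixes ham and spam are made-up stand-ins for whatever actually distinguishes the two types, and it works in a scratch directory from mktemp rather than in the real corpus.

```bash
#!/usr/bin/env bash
# Sketch: thin each file type separately so neither type is wiped out.
# ASSUMPTION: the types differ by suffix; "ham" and "spam" are made-up names.

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch 000{0..5}.ham 000{0..3}.spam    # scratch files standing in for the corpus

deleted=()
for suffix in ham spam; do
    del=1                              # restart the toggle for every type
    for fspec in *."$suffix"; do
        if (( del )); then
            rm -- "$fspec"
            deleted+=("$fspec")
        fi
        del=$(( 1 - del ))             # flip between 1 (delete) and 0 (keep)
    done
done

remaining=(*)
printf 'deleted: %s\n' "${deleted[*]}"
printf 'kept:    %s\n' "${remaining[*]}"
```

Resetting del for each suffix means both types start by deleting their 0000 file, which matches the "delete 0000, skip 0001, delete 0002" pattern described in the question.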

Related

Using brace expansion to move files on the command line

I have a question concerning why this doesn't work. Probably, it's a simple answer, but I just can't seem to figure it out.
I want to move a couple of files I have. They all have the same filename (let's say file1) but they are all in different directories (let's say /tmp/dir1, /tmp/dir2 and /tmp/dir3). If I were to move these individually I could do something along the lines of:
mv /tmp/dir1/file1 /tmp
That works. However, I have multiple directories and they're all going to end up in the same spot....AND I don't want to overwrite. So, I tried something like this:
mv /tmp/{dir1,dir2,dir3}/file1 /tmp/file1.{a,b,c}
When I try this I get:
/tmp/file1.c is not a directory
Just to clarify...this also works:
mv /tmp/dir1/file1 /tmp/file1.c
Pretty sure this has to do with brace expansion but not certain why.
Thanks
Just do echo to understand how the shell expands:
$ echo mv /tmp/{dir1,dir2,dir3}/file1 /tmp/file1.{a,b,c}
mv /tmp/dir1/file1 /tmp/dir2/file1 /tmp/dir3/file1 /tmp/file1.a /tmp/file1.b /tmp/file1.c
Now you can see that your command is not what you want, because in a mv command, the destination (directory or file) is the last argument.
That's unfortunately not how the shell expansion works.
You'll probably have to use an associative array.
#!/bin/bash
declare -A MAP=( [dir1]=a [dir2]=b [dir3]=c )
for ext in "${!MAP[@]}"; do
echo mv "/tmp/$ext/file1" "/tmp/file1.${MAP[$ext]}"
done
You get the following output when you run it:
mv /tmp/dir2/file1 /tmp/file1.b
mv /tmp/dir3/file1 /tmp/file1.c
mv /tmp/dir1/file1 /tmp/file1.a
As in many other languages, key ordering is not guaranteed.
${!MAP[@]} returns a list of all the keys, while ${MAP[@]} returns a list of all the values.
Your syntax of /tmp/{dir1,dir2,dir3}/file1 expands to /tmp/dir1/file1 /tmp/dir2/file1 /tmp/dir3/file1. This is similar to the way the * expansion works. The shell does not execute your command once for each possible combination; it executes the command a single time, with your one word expanded into as many words as are required.
Perhaps instead of a/b/c you could differentiate them with the actual number of the dir they came from?
$: for d in 1 2 3
do echo mv /tmp/dir$d/file1 /tmp/file1.$d
done
mv /tmp/dir1/file1 /tmp/file1.1
mv /tmp/dir2/file1 /tmp/file1.2
mv /tmp/dir3/file1 /tmp/file1.3
When happy with it, take out the echo.
A relevant point - brace expansion is not a wildcard. It has nothing to do with what's on disk. It just creates strings.
So, if you create a bunch of files named with single letters or digits, echo ? will wildcard and list them all, but only the ones actually present. If there are files for vowels but not consonants, only the vowels will show. But -
if you say echo {foo,bar,nope} it will output foo bar nope regardless of whether or not any or all of those exist as files or directories, etc.
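Because brace expansion only produces strings, the pairing of source and destination has to be done explicitly. A minimal sketch, assuming a scratch tree from mktemp in place of /tmp/dir1, /tmp/dir2 and /tmp/dir3: two indexed arrays are walked in step, which keeps the order predictable, and mv -n is used so an existing destination is never overwritten.

```bash
#!/usr/bin/env bash
# Sketch: pair each source directory with a suffix by array index.
# ASSUMPTION: a scratch tree from mktemp stands in for /tmp/dir1..dir3.

base=$(mktemp -d) || exit 1
mkdir "$base"/dir{1,2,3}
for d in 1 2 3; do
    echo "from dir$d" > "$base/dir$d/file1"
done

dirs=(dir1 dir2 dir3)
suffixes=(a b c)

# Indexed arrays keep their order, unlike the keys of an associative array.
for i in "${!dirs[@]}"; do
    # -n refuses to overwrite an existing destination (GNU and BSD mv).
    mv -n -- "$base/${dirs[i]}/file1" "$base/file1.${suffixes[i]}"
done

ls "$base"
```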

Wildcard on mv folder destination

I'm writing a small piece of code that checks for .mov files in a specific folder over 4gb and writes it to a log.txt file by name (without an extension). I'm then reading the names into a while loop line by line which signals some archiving and copying commands.
Consider a file named abcdefg.mov (new) and a corresponding folder somewhere else named abcdefg_20180525 (note the underscore and timestamp) that also contains a file named abcdefg.mov (old).
When reading in the filename from the log.txt, I strip the extension to store the variable "abcdefg" ($in1), and I'm using that variable to locate a folder elsewhere whose name begins with that string.
My problem is with how the mv command seems to support a wild card in the "source" string, but not in the "destination" string.
For example, I can write:
mv -f /Volumes/Myshare/SourceVideo/$in1*/$in1.mov /Volumes/Myshare/Archive
However a wildcard on the destination doesn't work in the same way. For example;
mv -f /Volumes/Myshare/Processed/$in1.mov Volumes/Myshare/SourceVideo/$in1*/$in1.mov
Is there an easy fix here that doesn't involve using another method?
Cheers for any help.
mv accepts a single destination path. Suppose that $in1 is abcdefg, and that $in1* matches two directories, abcdefg_20180525 and abcdefg_20180526, each of which already contains abcdefg.mov. Then the command
mv -f /dir1/$in1.mov /dir2/$in1*/$in1.mov
is expanded by the shell into a single command with three operands:
mv -f /dir1/abcdefg.mov /dir2/abcdefg_20180525/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
With more than two operands, mv treats the last one as the destination and requires it to be a directory, so here it fails instead of putting the file where you intended.
You should create a precise list and do a precise copy instead of using wild cards.
This is what I would probably do: generate a list of results in a file with FULL path information, then read those results in another function. I could have used arrays, but I wanted to keep it simple. At the bottom of the script, one function call scans for files with the extension in EXT (mp4, case-insensitive) and writes the results to a file in /tmp; a second function then reads that file and performs some operation on each entry (mv etc.). Note: if functions are confusing, you can remove the function name { } wrappers and the calls, and it becomes a normal script again. Functions are really handy, learn to love them!
#!/usr/bin/env bash
readonly SIZE_CHECK_LIMIT_MB="10M"
readonly FOLDER="/tmp"
readonly DESTINATION_FOLDER="/tmp/archive"
readonly SAVE_LIST_FILE="/tmp/$(basename "$0")-save-list.txt"
readonly EXT="mp4"
readonly CASE="-iname" #change to -name for exact ext type upper/lower
function find_files_too_large() {
: > "${SAVE_LIST_FILE}" # truncate the list from any previous run
find "${FOLDER}" -maxdepth 1 -type f "${CASE}" "*.${EXT}" -size +${SIZE_CHECK_LIMIT_MB} -print0 | while IFS= read -r -d $'\0' line ; do
echo "FOUND => $line"
echo "$line" >> "${SAVE_LIST_FILE}"
done
}
function archive_large_files() {
local read_file="${SAVE_LIST_FILE}"
local write_folder="$DESTINATION_FOLDER"
if [ ! -s "${read_file}" ] || [ ! -f "${read_file}" ] ;then
echo "No work to be done ... "
return
fi
while IFS= read -r line ;do
echo "mv $line $write_folder" ;sleep 1
done < "${read_file}"
}
# MAIN (this is where the script starts) We just call two functions.
find_files_too_large
archive_large_files
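The answer mentions that arrays could have been used instead of a list file. A rough sketch of that variant follows, assuming bash 4.4 or later for mapfile -d ''. The helper name find_large, the scratch directories and the deliberately tiny 1k threshold are all illustrative choices, not part of the original script.

```bash
#!/usr/bin/env bash
# Sketch: collect matches in an array instead of a list file.
# ASSUMPTIONS: bash 4.4+ (for mapfile -d ''); find_large and the scratch
# paths are hypothetical names used only for this demonstration.

# find_large FOLDER EXT SIZE: fill the global array "large" with matches.
find_large() {
    local folder=$1 ext=$2 size=$3
    mapfile -d '' -t large < <(
        find "$folder" -maxdepth 1 -type f -iname "*.${ext}" -size "+${size}" -print0
    )
}

src=$(mktemp -d) || exit 1
dest=$(mktemp -d) || exit 1
head -c 4096 /dev/zero > "$src/big clip.MP4"   # 4 KiB, name contains a space
echo tiny > "$src/small.mp4"

find_large "$src" mp4 1k                        # small threshold for the demo
for path in "${large[@]}"; do
    echo "FOUND => $path"
    mv -- "$path" "$dest"/
done
```

Keeping the NUL-delimited names in an array avoids the round trip through a newline-separated file, so names containing spaces survive unchanged.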
It might be easier, I think, to change the filenames to the folder name initially, so abcdefg.mov would become abcdefg_timestamp.mov. I can always strip the timestamp from the filename easily enough after it's copied to the right location. I was hoping I had a small syntax issue, but I think there is no easy way of doing what I thought I could...
I think you have a basic misunderstanding of how wildcards work here. The mv command doesn't support wildcards at all; the shell expands all wildcards into lists of matching files before they get passed to the mv command as arguments. Furthermore, the mv command doesn't know whether the list of arguments it got came from wildcards or not, and the shell doesn't know anything about what the command is going to do with them. For instance, if you run the command grep *, the grep command just gets a list of names of files in the current directory as arguments, and will treat the first of them as a regex pattern ('cause that's what the first argument to grep is) to search for in the rest of the files. If you ran mv * (note: don't do this!), it would interpret all but the last filename as sources, and the last one as the destination.
I think there's another source of confusion as well: when the shell expands a string containing a wildcard, it tries to match the entire thing against existing files and/or directories. So when you use Volumes/Myshare/SourceVideo/$in1*/$in1.mov, it looks for an already-existing file in a matching directory; AIUI the file isn't there yet, so there's no match. What the shell does in that case is pass the raw (unexpanded) wildcard-containing string to mv as an argument; mv looks for that exact name, doesn't find it, and gives you an error.
(BTW, should there be a "/" at the front of that pattern? I assume so below.)
If I understand the situation correctly, you might be able to use this:
mv -f /Volumes/Myshare/Processed/$in1.mov /Volumes/Myshare/SourceVideo/$in1*/
Since the filename isn't supplied in the second string, it doesn't look for existing files by that name, just directories with the right prefix; mv will automatically retain the filename from the source.
However, I'll echo @Sergio's warning about chaos from multiple matches. In this case, it won't overwrite files (well, it might, but for other reasons), but if it gets multiple matching target directories it'll move all but the last one into the last one (along with the file you meant to move). You say you're 100% certain this won't be a problem, but in my experience that means that there's at least a 50% chance that something you'd never have thought of will go ahead and make it happen anyway. For instance, is it possible that $in1 could wind up empty, or contain a space, or...?
Speaking of spaces, I'd also recommend double-quoting all variable references. You want the variables inside double-quotes, but the wildcards outside them (or they won't be expanded), like this:
mv -f "/Volumes/Myshare/Processed/$in1.mov" "/Volumes/Myshare/SourceVideo/$in1"*/
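One way to guard against the multiple-match and empty-variable cases is to expand the pattern into an array first and only move when exactly one directory matched. This is a sketch under assumptions: move_into_match is a hypothetical helper name, and the scratch tree from mktemp merely imitates the Processed and SourceVideo layout.

```bash
#!/usr/bin/env bash
# Sketch: expand the destination glob first, then move only on a unique match.
# ASSUMPTION: move_into_match and the scratch tree are illustrative names.

# move_into_match FILE PREFIX: move FILE into the one directory starting with PREFIX.
move_into_match() {
    local file=$1 prefix=$2 matches
    # Refuse a prefix whose last component is empty (e.g. $in1 was unset).
    [[ -n ${prefix##*/} ]] || { echo "empty name, refusing" >&2; return 1; }
    shopt -s nullglob            # an unmatched glob expands to nothing
    matches=( "$prefix"*/ )      # trailing slash: directories only
    shopt -u nullglob
    if (( ${#matches[@]} != 1 )); then
        echo "expected 1 directory for $prefix, found ${#matches[@]}" >&2
        return 1
    fi
    mv -f -- "$file" "${matches[0]}"
}

root=$(mktemp -d) || exit 1
mkdir -p "$root/Processed" "$root/SourceVideo/abcdefg_20180525"
echo new > "$root/Processed/abcdefg.mov"

in1=abcdefg
move_into_match "$root/Processed/$in1.mov" "$root/SourceVideo/$in1"
```

Note that the helper switches nullglob off again unconditionally, which is fine for a short script but would need adjusting if the caller relies on nullglob being set.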

Append to list of files in bash

so I'm trying to get a simple bash script to continuously read a directory and update a list of files to play through a command. However, I'm having some trouble thinking out the logic in it. What I need to do is put the current items in the directory into the list, have each item in the directory run through a program, and when a new item comes in, just append it to the list. I'm attempting to use inotifywait but can't seem to think of the proper logic. I may need it to run in the background, as the process that is running on these files will run before inotifywait is read again, at which point it will not pick up any new files that have been added as it only checks when it runs. Here's the code so hopefully it makes more sense.
#!/bin/bash
#Initial check to see if files are converted.
if [ ! -d "/home/pi/rpitx/converted" ]; then
echo "Converted directory does not exist, cannot play!"
exit 1
fi
CYAN='\e[36m'
NC='\e[39m'
LGREEN='\e[92m'
#iterate through directory first and act upon each item
for f in $FILES
do
echo -e "${CYAN}Now playing ${f##*/}...${NC}"
#Figure out a way to always watch directory even when it is playing
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to |
while read path action file; do
echo -e "${LGREEN}New file found: ${CYAN}${file}${NC}"
FILES+=($file)
done
# take action on each file. $f store current file name
sudo ./rpitx -m RF -i "${f}" -f 101100
done
exit 0
So for example. if rpitx is currently playing something, and a file is converted, it won't pick up the latest file and add it to the list, nor will it make it since it's always reading. Is there a way to get inotifywait to run in the background of this script somehow? Thanks.
This is actually quite a difficult problem to get 100% perfect, but it is possible to get pretty close.
It is easy to get all the files in a directory, and it is easy to use inotifywait to get iteratively informed of new files being placed into the directory. The issue is getting the two to be consistent. If inotifywait isn't started until all the files have been processed (or even just listed), then you might miss new files created between the listing and the invocation of inotifywait. If, on the other hand, you start inotifywait first, then a file created between the invocation of inotifywait and the extraction of the current file list will be listed twice.
Since it is easier to filter duplicates than notice orphans, the recommended approach is the second one.
As a first approximation, we could ignore the duplicate problem on the assumption that the window of vulnerability is pretty short and so it is probably unlikely to happen. This simplifies the code, but it's not that difficult to track and eliminate duplicates: we could, for example, store each filename as the key in an associative array, ignoring the file if the key already exists.
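The duplicate filter can be sketched in a few lines, assuming bash 4 or later for associative arrays. The input format ("ACTION filename" per line) and the sample events in the here-document are made up for illustration; in the real pipeline the lines would come from the directory listing and inotifywait.

```bash
#!/usr/bin/env bash
# Sketch: drop repeated filenames using an associative array (bash 4+).
# ASSUMPTION: input lines look like "ACTION filename"; the sample events
# below are invented to stand in for the combined listing/inotifywait stream.

declare -A seen=()
handled=()

while read -r action file; do
    [[ -n $file ]] || continue           # skip malformed lines
    if [[ -n ${seen[$file]+x} ]]; then
        continue                         # already handled under this name
    fi
    seen[$file]=1
    handled+=("$action:$file")
done <<'EOF'
EXISTING a.wav
EXISTING b.wav
CREATE b.wav
MOVED_TO c.wav
EOF

printf '%s\n' "${handled[@]}"
```

The ${seen[$file]+x} test checks whether the key exists at all, so it works regardless of the value stored under it.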
We need three processes: one to execute inotifywait; one to produce the list of initial files; and one to handle each file as it is identified. So the basic structure of the code will be:
list_new_files |
{ list_existing_files; pass_through; } |
while read -r action file; do
handle "$action" "$file"
done
Note that the second process first produces the existing files, and then calls pass_through, which reads from standard input and writes to standard output, thus passing through the files being discovered by list_new_files. Since pipes have a finite capacity, it is possible that the execution of list_existing_files will block a few times (if there are lots of existing files and handling them takes a long time), so when pass_through finally gets executed, it could have quite a bit of queued-up input to pass through. That doesn't matter, unless the first pipe also fills up, which will happen if a large number of new files are created. And that still won't matter as long as inotifywait doesn't lose notifications while it is blocked on a write. (This may actually be a problem, since the manpage for inotifywait on my system includes in the "BUGS" section the note, "It is assumed the inotify event queue will never overflow." We could fix the problem by inserting another process which carefully buffers inotifywait's output, but that shouldn't be necessary unless you intend to flood the directory with lots of files.)
Now, let's examine each of the functions in turn.
list_new_files could be just the call to inotifywait from your original script:
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to
Listing existing files is also easy. Here's one simple solution:
printf "%s\n" /home/pi/rpitx/converted/*
However, that will print out the full file path, which is different from the output from inotifywait. To make them the same, we cd into the directory in order to do the listing. Since we might not actually want to change the working directory, we use a subshell by surrounding the commands inside parentheses:
( cd /home/pi/rpitx/converted; printf "%s\n" *; )
The printf just prints its arguments each on a separate line. Since glob-expansions are not word-split or recursively glob-expanded, this is safe against whitespace or metacharacters in filenames, except newline characters. Filenames with newline characters are pretty rare; for now, I'll ignore the issue but I'll indicate how to handle it at the end.
Even with the change indicated above, the output from these two commands is not compatible: the first one outputs three things on each line (directory, action, filename), and the second one just one thing (the filename). In the listing below, you'll see how we modify the format to printf and introduce a format for inotifywait in order to make the outputs fully compatible, with the "action" for existing files set to EXISTING.
pass_through could, in theory, just be cat, and that's how I've coded it below. However, it is important that it operate in line-buffered mode; otherwise, nothing will happen until "enough" files have been written by list_existing_files. On my system, cat in this configuration works perfectly; if that doesn't work for you or you don't want to count on it, you could write it explicitly as a while read loop:
pass_through() {
while read -r line; do echo "$line"; done
}
Finally, handle is essentially the code from the original post, but modified a bit to take the new format into account, and to do the right thing with action EXISTING.
# Colours. Note the use of `$'...'` to actually store the code,
# thereby avoiding the need to later reinterpret backslash sequences
CYAN=$'\e[36m'
NC=$'\e[39m'
LGREEN=$'\e[92m'
converted=/home/pi/rpitx/converted
list_new_files() {
inotifywait -m "$converted" -e create -e moved_to --format "%e %f"
}
# Note the use of ( ) around the body instead of { }
# This is the same as `{( ... )}'; it makes the `cd` local to the function.
list_existing_files() (
cd "$converted"
printf "EXISTING %s\n" *
)
# Invoked as `handle action filename`
handle() {
case "$1" in
EXISTING)
echo "${CYAN}Now playing ${2}...${NC}"
;;
*)
echo "${LGREEN}New file found: ${CYAN}${2}${NC}"
;;
esac
# $2 is a bare filename, so prefix it with the watched directory
sudo ./rpitx -m RF -i "${converted}/${2}" -f 101100
}
# Put everything together
list_new_files |
{ list_existing_files; cat; } |
while read -r action file; do handle "$action" "$file"; done
What if we thought a filename might have a newline character in it? There are two "safe" characters which could be used to delimit the filenames, in the sense that they cannot appear inside a filename. One is /, which can obviously appear in a path, but cannot appear in a simple filename, which is what we're working with here. The other one is the NUL character, which cannot appear inside a filename at all, but can sometimes be a bit annoying to deal with.
Normally, faced with this problem, we would use a NUL, but that depends on the various utilities we're using allowing the separation of data with NUL instead of newline. That's not the case for inotifywait, which always outputs a newline after a notification line. So in this case it seems simpler to use a /. First we modify the formats:
inotifywait -m "$converted" -e create -e moved_to --format "%e %f/"
printf "EXISTING %s/\n" *
Now, when we're reading the lines, we need to read until we find a line ending with / (and remember to remove it). read doesn't allow two-character line terminators, so we need to accumulate the lines ourselves:
while read -r action file; do
# If file doesn't end with a slash, we need to read another line
while [[ $file != */ ]] && read -r line; do
file+=$'\n'"$line"
done
# Remember to remove the trailing slash
handle "$action" "${file%/}"
done
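The slash-terminated reader can be tried without inotifywait by feeding it a hand-made stream. In this sketch the handle function only records the filename it receives (an assumption made for the demonstration; the real one would play the file), and read_records is a hypothetical wrapper around the accumulation loop.

```bash
#!/usr/bin/env bash
# Sketch: exercise the slash-terminated record reader with a test stream.
# ASSUMPTION: "handle" here only records its arguments; read_records is a
# made-up wrapper name for the loop described above.

names=()
handle() { names+=("$2"); }

read_records() {
    local action file line
    while read -r action file; do
        # Keep reading until the accumulated name ends with the "/" marker.
        while [[ $file != */ ]] && read -r line; do
            file+=$'\n'"$line"
        done
        handle "$action" "${file%/}"     # strip the marker before handling
    done
}

# Two records; the second filename contains an embedded newline.
read_records < <(printf 'EXISTING %s/\n' 'plain.wav' $'odd\nname.wav')

printf 'got %d names\n' "${#names[@]}"
```

Process substitution is used instead of a pipe so that the loop runs in the current shell and the names array is still populated afterwards.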

How to rename files with ordering using Bash {thisfile.jpg -> newfile1.jpg, thatfile.jpg -> newfile2.jpg}

Let's say I have 100 jpg files.
DSC_0001.jpg
DSC_0002.jpg
DSC_0003.jpg
....
DSC_0100.jpg
And I want to rename them like
summer_trip_1.jpg
summer_trip_2.jpg
summer_trip_3.jpg
.....
summer_trip_100.jpg
So I want these properties to be modified:
1. filename
2. order of the file(as the order by date files were created)
How could I achieve this by using bash? Like:
for file in *.jpg ; do mv blah blah blah ; done
Thank you!
It's very simple: have a variable and increment it at each step.
Example:
cur_number=1
prefix="summer_trip_"
suffix=""
for file in *.jpg ; do
echo mv "$file" "${prefix}${cur_number}${suffix}.jpg" ;
let cur_number=$cur_number+1 # or : cur_number=$(( $cur_number + 1 ))
done
and once you think it's ready, take out the echo to let the mv occur.
If you prefer them to be ordered by file date (useful, for example, when mixing photos from several cameras, or if the "numbers" on yours rolled over):
change
for file in *.jpg ; do
into
for file in $( ls -1t *.jpg ) ; do
Note that the second example will only work if your original filenames don't contain spaces (or other weird characters), which is the case for almost all cameras I know about.
Finally, instead of ${cur_number} you could use $(printf '%04d' "${cur_number}") so that the new number has leading zeros, making sorting much easier.
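As a small illustration of the zero-padding idea, the pad width can be derived from the number of files rather than hard-coded. This is only a sketch: it runs in a scratch directory with twelve empty stand-in files, and it relies on the bash printf builtin accepting * as a width taken from the argument list.

```bash
#!/usr/bin/env bash
# Sketch: number files with a zero-padded counter sized to the file count.
# ASSUMPTION: runs in a scratch directory with twelve empty stand-in files.

cd "$(mktemp -d)" || exit 1
for (( i = 1; i <= 12; i++ )); do
    touch "$(printf 'DSC_%04d.jpg' "$i")"
done

files=(DSC_*.jpg)
width=${#files[@]}          # number of files, here 12
width=${#width}             # digits needed to print that number, here 2

n=0
for file in "${files[@]}"; do
    n=$(( n + 1 ))
    printf -v new 'summer_trip_%0*d.jpg' "$width" "$n"
    mv -n -- "$file" "$new"
done
ls
```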
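How about using rename?
rename DSC_ summer_trip_ *.jpg
See the man page of rename. This is the util-linux syntax; the Perl-based rename found on some systems takes an expression such as 's/DSC_/summer_trip_/' instead. Either way the original zero-padded numbers are kept.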
This works if your original numbers are all padded with zeroes to the same length:
i=0; for f in DSC_*.jpg; do mv "$f" "summer_trip_$((++i)).jpg"; done
If I understand your goal correctly:
So I want these properties to be modified: 1. filename 2. order of the file(as the order by date files were created)
if numbers of renamed files shall increment in order by file creation date, then use the following for loop:
for file in $(ls -t -r *.jpg); do
-t sorts by mtime (last modification time, which is not exactly creation time), newest first, and -r reverses the order so the oldest is listed first. This helps in case the original .jpg file numbers are not in the same order as the pictures were taken.
As was mentioned previously, this won't work if file names contain whitespace. If your files have spaces, try modifying the IFS variable before the for loop:
IFS=$'\n'
It forces the output of the ls command to be split on newlines only (not on other whitespace). It would still fail if there is a newline in a file name (rather exotic IMHO :). Changing IFS may have subtle effects further on in your script, so you can save the old value and restore it after the loop.
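One way to keep the IFS change from leaking is to make it local to a function, so it is restored automatically on return. The sketch below assumes a hypothetical helper named rename_by_mtime and a scratch directory with three files whose modification times are set explicitly; filenames must still be free of newlines and glob characters, because the ls output is word-split.

```bash
#!/usr/bin/env bash
# Sketch: rename in modification-time order with IFS changed only locally.
# ASSUMPTIONS: rename_by_mtime is a hypothetical helper; filenames contain
# no newlines or glob characters, since the ls output is still word-split.

rename_by_mtime() {
    local prefix=$1 n=0 file
    local IFS=$'\n'                 # restored automatically on return
    for file in $(ls -tr -- *.jpg); do
        n=$(( n + 1 ))
        mv -n -- "$file" "${prefix}${n}.jpg"
    done
}

cd "$(mktemp -d)" || exit 1
echo first  > "b pic.jpg"; touch -t 202001010000 "b pic.jpg"
echo second > "a.jpg";     touch -t 202001020000 "a.jpg"
echo third  > "c.jpg";     touch -t 202001030000 "c.jpg"

rename_by_mtime summer_trip_
cat summer_trip_1.jpg summer_trip_2.jpg summer_trip_3.jpg
```

The oldest file here is "b pic.jpg", which contains a space, so it also shows that splitting on newlines alone keeps such a name in one piece.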

whats the correct way to loop this

I have a script where inotifywait is piped into a while loop that executes the following logic.
cp "$S3"/2/post2.png "$S3";
mv "$S3"/1/post1.png "$S3"/2/post2.png;
cp "$S3"/3/post3.png "$S3";
mv "$S3"/post2.png "$S3"/3/post3.png;
so forth and so on..... then at the end of the script...
mv "$dir"/$file "$S3"/1/post1.png
That line represents a fresh post; the lines above are the rotation of older posts.
I can hand-code the iterations all the way down to 100+, but I would like to program more efficiently and save time.
So, what's some correct ways to loop this?
I think a better mechanism would list the directories in "$S3" in reverse numeric order, and arrange to process them like that. It isn't clear if the 100 directories are all present or whether they need to be created. We'll assume that directories 1..100 might exist, and directory N will always and only contain postN.png.
I'm assuming that there are no spaces, newlines or other awkward characters in the file paths; this means that ls can be used without too much risk.
for dirnum in $(cd "$S3"; ls */*.png | sed 's%/.*%%' | sort -nr)
do
next=$(($dirnum + 1))
mv "$S3/$dirnum/post$dirnum.png" "$S3/$next/post$next.png"
done
The cd "$S3" means I don't get a possibly long pathname included in the output; the ls */*.png lists the files that exist; the sed removes the file name and slash, leaving just a list of directory numbers containing files; and the sort puts the directories in reverse numeric order.
The rest is straight-forward, given the assumption that the necessary directories already exist. It would not be hard to add [ -d "$S3/$next" ] || mkdir -p "$S3/$next" before moving the file. Clearly, after the loop you can use your final command:
mv "$dir/$file" "$S3/1/post1.png"
Note that I've enclosed complete names in double quotes; it generally leads to fewer nasty surprises if something acquires spaces unexpectedly.
Try this:
for i in $(ls "$S3" | sort -nr); do
mkdir -p "$S3/$((i+1))"
mv "$S3/$i/post$i.png" "$S3/$((i+1))/post$((i+1)).png"
done
mv "$dir/$file" "$S3/1/post1.png"
The loop will iterate through all directories in reverse order and move the files.
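The rotation can also be written without parsing ls output, by globbing the directories and sorting their names numerically. This is a sketch only: rotate_posts is a hypothetical helper, directory names are assumed to be plain decimal numbers, and a scratch tree from mktemp stands in for the real "$S3".

```bash
#!/usr/bin/env bash
# Sketch: shift postN.png from directory N to N+1, highest N first.
# ASSUMPTIONS: rotate_posts is a hypothetical helper; directories are named
# with plain decimal numbers and directory N holds only postN.png.

rotate_posts() {
    local root=$1 d n next
    local nums=()
    for d in "$root"/*/; do
        n=${d%/}; n=${n##*/}             # keep only the directory name
        [[ $n =~ ^[0-9]+$ ]] && nums+=("$n")
    done
    while read -r n; do
        [[ -f $root/$n/post$n.png ]] || continue
        next=$(( 10#$n + 1 ))            # 10# forces base 10
        mkdir -p "$root/$next"
        mv -- "$root/$n/post$n.png" "$root/$next/post$next.png"
    done < <(printf '%s\n' "${nums[@]}" | sort -nr)
}

S3=$(mktemp -d) || exit 1
for n in 1 2 10; do
    mkdir "$S3/$n"
    echo "post $n" > "$S3/$n/post$n.png"
done

rotate_posts "$S3"
echo "fresh" > "$S3/1/post1.png"         # stands in for the final mv of the new post
```

Processing the highest number first matters: moving 1 to 2 before 2 has been moved to 3 would overwrite post2.png.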

Resources