How to delete files with spaces in their names that are older than 20 days, keeping the newest 4 even if they are older than 20 days? - bash

How to delete files with spaces in their names that are older than 20 days, keeping the newest 4 in each group even if they are older than 20 days?
Example files (in each name, 1 marks the oldest file):
reckless 1.txt
reckless 2.txt
reckless 3.txt
reckless 4.txt
reckless 5.txt
reckless 6.txt
reckless 7.txt
reckless 8.txt
confidence1.txt
confidence2.txt
confidence3.txt
choke-1.txt
choke-2.txt
choke-3.txt
choke-4.txt
choke-5.txt
choke-6.txt
choke-7.txt
choke-8.txt
choke-9.txt
choke-10.txt
cruel_1_1.txt
cruel_1_2.txt
cruel_1_3.txt
cruel_1_4.txt
cruel_1_5.txt
cruel_2_1.txt
cruel_2_2.txt
cruel_2_3.txt
cruel_2_4.txt
cruel_2_5.txt
cruel_2_6.txt
cruel_2_7.txt
level_1.txt
From this list, the following files should be deleted:
reckless 1.txt
reckless 2.txt
reckless 3.txt
reckless 4.txt
choke-1.txt
choke-2.txt
choke-3.txt
choke-4.txt
choke-5.txt
choke-10.txt
cruel_1_1.txt
cruel_2_1.txt
cruel_2_2.txt
cruel_2_3.txt
I tried something similar to this answer: https://stackoverflow.com/a/20034914
But I could not work out how to sort the files according to the required condition.

Assuming each filename does not contain newline characters,
please try the following:
while IFS= read -r line; do
    array=()
    while IFS= read -r file; do
        # strip the leading "mtime " field (%T@ prints fractional seconds)
        [[ $file =~ ^[0-9.]+\ (.*)$ ]] && array+=("${BASH_REMATCH[1]}")
    done < <(
        find . -name "$line*.txt" -type f -mtime +20 -printf "%T@ %p\n" | sort -n
    )
    # delete all but the newest four of the matched files
    for (( i=0; i<${#array[@]}-4; i++ )); do
        rm -- "${array[$i]}"
    done
done < <(
    for f in *.txt; do
        [[ $f =~ ^(.*)[-_\ ]+[0-9]+\.txt$ ]] && echo "${BASH_REMATCH[1]}"
    done | uniq
)
[Explanations]
First, it strips the trailing numbers and extensions from the filenames
to extract a root name, which serves as the group name.
Next, it finds the files in each group that are older than 20 days and
sorts them by mtime.
Then it removes the older files, preserving up to the four newest files in the
group.
If the filenames may contain newline characters, a small modification using
NUL-delimited records and the -z option of sort will meet the requirement.
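The NUL-delimited variant mentioned above could be sketched roughly as follows. This is only an illustration, assuming GNU find and GNU sort; the function name list_prunable and its four parameters are hypothetical, and it only prints the candidates instead of deleting them:

```shell
#!/bin/bash
# Hypothetical helper (not from the answer above): print the files of one
# group that would be deleted. Assumes GNU find (-printf) and GNU sort (-z).
# Usage: list_prunable DIR PREFIX DAYS KEEP
list_prunable() {
    local dir=$1 prefix=$2 days=$3 keep=$4
    local files=() entry i
    # Records look like "<mtime> <path>" and end in NUL, so newlines in
    # file names cannot break the parsing.
    while IFS= read -r -d '' entry; do
        files+=("${entry#* }")          # drop the "<mtime> " field
    done < <(
        find "$dir" -maxdepth 1 -type f -name "$prefix*.txt" \
            -mtime +"$days" -printf '%T@ %p\0' | sort -z -n
    )
    # The array is ordered oldest first; skip the newest $keep entries.
    for (( i = 0; i < ${#files[@]} - keep; i++ )); do
        printf '%s\n' "${files[i]}"
    done
}
```

Calling list_prunable . 'reckless ' 20 4 prints one candidate per line; once the list looks right, the printf could be replaced with rm -- "${files[i]}".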

Related

Check whether the contents of files are the same

I have many .txt files (named 1.txt, 2.txt, 3.txt, etc.) saved in a directory, and I want to check whether the contents of the files in that directory are the same.
Every file should be compared with every other file: if the content is the same, print yes; if not, print no.
For example:
1.txt
a
b
c
2.txt
a
b
c
3.txt
1
2
3
Expected output when comparing 1.txt and 2.txt:
1.txt 2.txt yes
Expected output when comparing 1.txt and 3.txt:
1.txt 3.txt no
Expected output when comparing 2.txt and 3.txt:
2.txt 3.txt no
I tried this script:
#!/bin/sh
for file in /home/nir/dat/*.txt
do
echo $file
diff $file $file+1
done
But the problem is that it does not give the expected output. Please suggest a better solution, thanks.
Something like this in bash:
for i in *
do
for j in *
do
if [[ "$i" < "$j" ]]
then
if cmp -s "$i" "$j"
then
echo $i $j equal
else
echo $i $j differ
fi
fi
done
done
Output:
1.txt 2.txt equal
1.txt 3.txt differ
2.txt 3.txt differ
One idea using an array of the filenames, and borrowing jamesbrown's cmp solution:
# load list of files into array flist[]
flist=(*)
# iterate through all combinations; '${#flist[@]}' ==> number of elements in array
for ((i=0; i<${#flist[@]}; i++))
do
    for ((j=i+1; j<${#flist[@]}; j++))
    do
        # default status = "no" (ie, are files the same?)
        status=no
        # if the files are identical, cmp returns 0 (success),
        # so the follow-on assignment (status=yes) is executed
        cmp -s "${flist[${i}]}" "${flist[${j}]}" && status=yes
        echo "${flist[${i}]} ${flist[${j}]} ${status}"
    done
done
For the 3 files listed in the question this generates:
1.txt 2.txt yes
1.txt 3.txt no
2.txt 3.txt no
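The same pairwise loop can be wrapped in a function so that it works on any directory. This is a minimal sketch; the name compare_all and its directory argument are assumptions for illustration, not part of the answers above:

```shell
#!/bin/bash
# Hypothetical wrapper: compare every pair of *.txt files in a directory
# and print "<file1> <file2> yes|no" for each pair.
compare_all() {
    local dir=$1 files i j status
    files=("$dir"/*.txt)
    for (( i = 0; i < ${#files[@]}; i++ )); do
        for (( j = i + 1; j < ${#files[@]}; j++ )); do
            status=no
            # cmp -s is silent and exits 0 only when the contents match
            cmp -s "${files[i]}" "${files[j]}" && status=yes
            # ${var##*/} strips the directory part for display
            printf '%s %s %s\n' "${files[i]##*/}" "${files[j]##*/}" "$status"
        done
    done
}
```

For the directory from the question, this would be called as compare_all /home/nir/dat.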

Splitting CSV file into text files

I have a CSV file of the form:
1,frog
2,truck
3,truck
4,deer
5,automobile
and so on, for about 50 000 entries. I want to create 50 000 separate .txt files named with the number before the comma and containing the word after the comma, like so:
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile
and so on.
This is the script I've written so far, but it does not work properly:
#!/bin/bash
folder=/home/data/cifar10
for file in $(find "$folder" -type f -iname "*.csv")
do
name=$(basename "$file" .txt)
while read -r tag line; do
printf '%s\n' "$line" >"$tag".txt
done <"$file"
rm "$file"
done
The issue is in your inner loop:
while read -r tag line; do
printf '%s\n' "$line" > "$tag".txt
done < "$file"
You need to set IFS to , so that tag and line are parsed correctly:
while IFS=, read -r tag line; do
printf '%s\n' "$line" > "$tag".txt
done < "$file"
You can use shopt -s globstar instead of find, with Bash 4.0+. Unlike the unquoted $(find ...) expansion in the question, the glob results are not subject to word splitting or further globbing:
shopt -s globstar nullglob
for file in /home/data/cifar10/**/*.csv; do
while IFS=, read -r tag line; do
printf '%s\n' "$line" > "$tag".txt
done < "$file"
done
Note that the name set through name=$(basename "$file" .txt) statement is not being used in your code.
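An awk alternative (closing each output file after writing keeps awk from running out of file descriptors across 50 000 files):
awk -F, '{ out = $1 ".txt"; print $2 > out; close(out) }' file.csv
To only print the mapping, without creating any files:
awk 'BEGIN{FS=","} {print $1".txt contains: "$2}' file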
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile
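Putting the IFS=, fix into a reusable form might look like the sketch below. The function name split_csv and the output-directory argument are assumptions added for illustration; the question writes into the current directory instead:

```shell
#!/bin/bash
# Hypothetical helper: write one "<tag>.txt" file per "tag,value" line.
# Usage: split_csv FILE.csv OUTDIR
split_csv() {
    local csv=$1 outdir=$2 tag line
    mkdir -p -- "$outdir"
    # IFS=, applies to this read only, so the first comma splits tag from line
    while IFS=, read -r tag line; do
        [[ -n $tag ]] || continue       # skip blank lines
        printf '%s\n' "$line" > "$outdir/$tag.txt"
    done < "$csv"
}
```

Because line is the last variable given to read, any further commas in a record stay inside the value.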

Compare File headers using Awk or Cmp

I have many flat files in 1 directory.
Each file has a header and some data in it.
I want to compare the header of one file with the headers of all the other files in that directory.
This can be achieved with a shell script, but I want to do it with a single line of code.
I tried the following, but it compares the whole file, not just the header.
for i in `ls -1 *a*` ; do cmp a.dat $i ; done
Can someone please let me know how I can do that?
Also, can it be achieved using awk?
I just need to check whether the headers match or not.
I would try this: grab the first line of every file, extract the unique lines, and count them. The result should be one.
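number_uniq=$( awk 'FNR == 1' * | sort -u | wc -l )
(A plain sed '1q' * would not work here, because sed treats all its input files as one stream and quits after the first line of the first file.)
That won't tell you which file is different. For that, compare each header against a reference: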
files=(*)
reference_header=$( sed '1q' "${files[0]}" )
for file in "${files[@]:1}"; do
    if [[ "$reference_header" != "$( sed '1q' "$file" )" ]]; then
        echo "wrong header: $file"
    fi
done
For what you describe, you can use md5 or cksum to take a signature of the bytes in the header.
Given 5 files (note that File 4.txt does not match):
$ for fn in *.txt; do echo "$fn:"; cat "$fn"; printf "\n\n"; done
File 1.txt:
what a great ride! it is a lovely day
/tmp/files/File 1.txt
File 2.txt:
what a great ride! it is a lovely day
/tmp/files/File 2.txt
File 3.txt:
what a great ride! it is a lovely day
/tmp/files/File 3.txt
File 4.txt:
what an awful ride! it is a horrible day
/tmp/files/File 4.txt
reference.txt:
what a great ride! it is a lovely day
/tmp/files/reference.txt
You can use md5 to get a signature and then check whether the other files match (md5 is the BSD/macOS command; on GNU/Linux the equivalent is md5sum).
First get the reference signature:
$ sig=$(head -1 reference.txt | md5)
$ echo $sig
549560de062a87ec69afff37abe18d8f
Then loop through the files:
for fn in *.txt; do
if [[ "$sig" != "$(head -1 "$fn" | md5)" ]]; then
echo "header of \"$fn\" does not match";
fi;
done
Prints:
header of "File 4.txt" does not match
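Where md5 is not installed, the same idea can be sketched with cksum, which POSIX specifies. The function name mismatched_headers and its argument order are assumptions for illustration:

```shell
#!/bin/bash
# Hypothetical helper: report files whose first line differs from the
# first line of a reference file, comparing cksum signatures.
# Usage: mismatched_headers REFERENCE FILE...
mismatched_headers() {
    local ref=$1 sig fn
    shift
    sig=$(head -n 1 -- "$ref" | cksum)
    for fn in "$@"; do
        if [[ $sig != "$(head -n 1 -- "$fn" | cksum)" ]]; then
            printf 'header of "%s" does not match\n' "$fn"
        fi
    done
}
```

With the five files above, mismatched_headers reference.txt *.txt should report only File 4.txt. cksum is a CRC rather than a cryptographic hash, which seems adequate for spotting accidental header differences.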

How can I merge before move files?

I have a few million files, and I keep the list of them in files.txt like this:
/home/user/1.txt
/home/user/2.txt
/home/user/3.txt
/home/user/4.txt
/home/user/5.txt
I need to move them all, but before moving I must also merge them.
I can move them like this:
#!/bin/bash
for files in $(cat files.txt); do
mv $files /home/user/hop/
done
I can merge them all with cat *, but I need to merge them in pairs, like this:
1.txt and 2.txt merge --> 1.txt, then move.
3.txt and 4.txt merge --> 3.txt, then move.
5.txt --> 5.txt, then move.
But I must merge before moving, in /home/user/, not in /home/user/hop/
How can I do this?
You can use cat file1 file2 file3 file4 file5 file6 > out.txt after you have moved them; this also lets you set the order in which the files are merged.
This also works for binary files.
You can use this script:
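i=0
while IFS= read -r f; do
    if ((++i % 2)); then
        # odd line: remember the file that receives the merge
        p="$f"
    else
        # even line: append to the previous file, move it, drop the source
        cat "$f" >> "$p"
        mv "$p" /home/user/hop/
        rm "$f"
        unset p
    fi
done < files.txt
# an odd number of files leaves one unpaired file to move
[[ -n $p ]] && mv "$p" /home/user/hop/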
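With a few million files it may be worth stopping short of rm when a merge fails. The sketch below is one way to do that; the function name merge_pairs and its two parameters are hypothetical, and it assumes one path per line in the list:

```shell
#!/bin/bash
# Hypothetical helper: merge files in pairs, then move the merged file.
# Usage: merge_pairs LIST_FILE DEST_DIR
merge_pairs() {
    local list=$1 dest=$2 f p='' i=0
    while IFS= read -r f; do
        if (( ++i % 2 )); then
            p=$f                        # odd line: merge target
        else
            # chain with && so the source is removed only after the
            # append and the move have both succeeded
            cat -- "$f" >> "$p" && mv -- "$p" "$dest"/ && rm -- "$f"
            p=''
        fi
    done < "$list"
    # an odd count leaves one unpaired file, which is moved unchanged
    if [[ -n $p ]]; then
        mv -- "$p" "$dest"/
    fi
}
```

For the layout in the question this would be invoked as merge_pairs files.txt /home/user/hop.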

How to zero pad numbers in file names in Bash?

What is the best way, using Bash, to rename files in the form:
(foo1, foo2, ..., foo1300, ..., fooN)
With zero-padded file names:
(foo00001, foo00002, ..., foo01300, ..., fooN)
It's not pure bash, but much easier with the Perl version of rename:
rename 's/\d+/sprintf("%05d",$&)/e' foo*
Where 's/\d+/sprintf("%05d",$&)/e' is a Perl substitution expression:
\d+ will match the first run of digits (at least one digit)
sprintf("%05d",$&) will pass the matched digits to Perl's sprintf, and %05d will zero-pad them to five digits
In case N is not a priori fixed:
for f in foo[0-9]*; do
mv "$f" "$(printf 'foo%05d' "${f#foo}")"
done
I had a more complex case where the file names had a postfix as well as a prefix. I also needed to perform a subtraction on the number from the filename.
For example, I wanted foo56.png to become foo00000055.png.
I hope this helps if you're doing something more complex.
#!/bin/bash
prefix="foo"
postfix=".png"
targetDir="../newframes"
paddingLength=8
for file in ${prefix}[0-9]*${postfix}; do
# strip the prefix off the file name
postfile=${file#$prefix}
# strip the postfix off the file name
number=${postfile%$postfix}
# subtract 1 from the resulting number
i=$((number-1))
# copy to a new name with padded zeros in a new folder
cp "$file" "$targetDir/$(printf "${prefix}%0${paddingLength}d${postfix}" "$i")"
done
Pure Bash, no external processes other than 'mv':
for file in foo*; do
newnumber='00000'${file#foo} # get number, pack with zeros
newnumber=${newnumber:(-5)} # the last five characters
mv "$file" "foo$newnumber" # rename
done
The oneline command that I use is this:
ls * | cat -n | while read i f; do mv "$f" `printf "PATTERN" "$i"`; done
PATTERN can be for example:
rename with increment counter: %04d.${f#*.} (keep original file extension)
rename with increment counter with prefix: photo_%04d.${f#*.} (keep original extension)
rename with increment counter and change extension to jpg: %04d.jpg
rename with increment counter with prefix and file basename: photo_$(basename $f .${f#*.})_%04d.${f#*.}
...
You can filter the file to rename with for example ls *.jpg | ...
You have available the variable f that is the file name and i that is the counter.
For your question the right command is:
ls * | cat -n | while read i f; do mv "$f" `printf "foo%05d" "$i"`; done
To left-pad numbers in filenames:
$ ls -l
total 0
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 010
-rw-r--r-- 1 victoria victoria 0 Mar 28 18:09 050
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 050.zzz
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 10
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 1.zzz
$ for f in [0-9]*.[a-z]*; do tmp=`echo $f | awk -F. '{printf "%04d.%s\n", $1, $2}'`; mv "$f" "$tmp"; done;
$ ls -l
total 0
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 0001.zzz
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:23 0050.zzz
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 010
-rw-r--r-- 1 victoria victoria 0 Mar 28 18:09 050
-rw-r--r-- 1 victoria victoria 0 Mar 28 17:24 10
Explanation
for f in [0-9]*.[a-z]*; do tmp=`echo $f | \
awk -F. '{printf "%04d.%s\n", $1, $2}'`; mv "$f" "$tmp"; done;
note the backticks: `echo ... $2}'` is a command substitution (the backslash, \, at the end of the first line above just splits that one-liner over two lines for readability)
in a loop find files that are named as numbers with lowercase alphabet extensions: [0-9]*.[a-z]*
echo that filename ($f) to pass it to awk
-F. : awk field separator, a period (.): if matched, separates the file names as two fields ($1 = number; $2 = extension)
format with printf: print first field ($1, the number part) as 4 digits (%04d), then print the period, then print the second field ($2: the extension) as a string (%s). All of that is assigned to the $tmp variable
lastly, move the source file ($f) to the new filename ($tmp)
The following will do it:
for ((i=1; i<=N; i++)) ; do mv foo$i `printf foo%05d $i` ; done
EDIT: changed to use ((i=1,...)), thanks mweerden!
My solution replaces numbers, everywhere in a string
for f in * ; do
number=`echo $f | sed 's/[^0-9]*//g'`
padded=`printf "%04d" $number`
echo $f | sed "s/${number}/${padded}/";
done
You can easily try it, since it just prints transformed file names (no filesystem operations are performed).
Explanation:
Looping through the list of files
The loop for f in * ; do ... ; done iterates over all files and passes each filename to the loop body as the variable $f.
Grabbing the number from the string
With echo $f | sed we pipe the value of $f to the sed program.
In the command sed 's/[^0-9]*//g', the part [^0-9]* (where ^ negates the character class) matches any run of non-digit characters, which is then replaced with the empty string. Why not just remove [a-z]? Because a filename can contain dots, dashes, etc. So we strip everything that is not a digit and are left with the number.
Next, we assign the result to the number variable. Remember not to put spaces around the = in an assignment, as in number = …, because the shell would then try to run number as a command.
We capture the output of a command in a variable by wrapping the command in backticks (`).
Zero padding
The command printf "%04d" $number formats the number with 4 digits, adding leading zeros if it has fewer than 4 digits.
Replacing the number with the zero-padded number
We use sed again with a substitution command of the form s/substring/replacement/. So that the shell expands our variables, we use double quotes and reference the variables as ${number}.
The script above just prints transformed names, so, let's do actual renaming job:
for f in *.js ; do
number=`echo $f | sed 's/[^0-9]*//g'`
padded=`printf "%04d" $number`
new_name=`echo $f | sed "s/${number}/${padded}/"`
mv $f $new_name;
done
Hope this helps someone.
I spent several hours to figure this out.
This answer is derived from Chris Conway's accepted answer but assumes your files have an extension (unlike Chris' answer). Just paste this (rather long) one liner into your command line.
for f in foo[0-9]*; do mv "$f" "$(printf 'foo%05d' "${f#foo}" 2> /dev/null)"; done; for f in foo[0-9]*; do mv "$f" "$f.ext"; done;
OPTIONAL ADDITIONAL INFO
This script will rename
foo1.ext > foo00001.ext
foo2.ext > foo00002.ext
foo1300.ext > foo01300.ext
To test it on your machine, just paste this one liner into an EMPTY directory.
rm * 2> /dev/null; touch foo1.ext foo2.ext foo1300.ext; for f in foo[0-9]*; do mv "$f" "$(printf 'foo%05d' "${f#foo}" 2> /dev/null)"; done; for f in foo[0-9]*; do mv "$f" "$f.ext"; done;
This deletes the content of the directory, creates the files in the above example and then does the batch rename.
For those who don't need a one liner, the script indented looks like this.
for f in foo[0-9]*;
do mv "$f" "$(printf 'foo%05d' "${f#foo}" 2> /dev/null)";
done;
for f in foo[0-9]*;
do mv "$f" "$f.ext";
done;
Here's a quick solution that assumes a fixed length prefix (your "foo") and fixed length padding. If you need more flexibility, maybe this will at least be a helpful starting point.
#!/bin/bash
# some test data
files="foo1
foo2
foo100
foo200
foo9999"
for f in $files; do
prefix=`echo "$f" | cut -c 1-3` # chars 1-3 = "foo"
number=`echo "$f" | cut -c 4-` # chars 4-end = the number
printf "%s%04d\n" "$prefix" "$number"
done
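Several of the answers above pass the digits straight to printf %d, which treats a number with a leading zero (for example 08) as octal. A minimal pure-Bash sketch that sidesteps this and keeps any suffix is shown below; the function name pad_name and its optional width parameter are assumptions for illustration:

```shell
#!/bin/bash
# Hypothetical helper: zero-pad the first run of digits in a file name.
# Usage: pad_name NAME [WIDTH]   (WIDTH defaults to 5)
pad_name() {
    local name=$1 width=${2:-5}
    local re='^([^0-9]*)([0-9]+)(.*)$'
    if [[ $name =~ $re ]]; then
        # 10#... forces base 10, so "08" is read as 8 rather than bad octal;
        # %0*d takes the field width from the argument list
        printf '%s%0*d%s\n' "${BASH_REMATCH[1]}" "$width" \
            "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
    else
        printf '%s\n' "$name"           # no digits: leave unchanged
    fi
}
```

It only prints the new name, so a rename loop could look like for f in foo[0-9]*; do mv -- "$f" "$(pad_name "$f")"; done, ideally after a dry run with echo in place of mv.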
