SVN status: show files in descending order (date modified) - sorting

Does anyone know how to sort the output of 'svn st' to show the files in descending order? In essence, the equivalent of 'ls -lt'
I've been running 'find ./ -mtime -1 -print' to see what files I've changed in the last day, but I'd like to know if there's a way to use svn to show me a list of SVN files that I've changed in descending order.
I've been working on a project for about two months, all local edits, and now have 100+ files that I'd like to sort by the time I last edited them.

svn status | while read -N 8 status && read file; do
    mtime=$(stat -c %Y "$file" 2>/dev/null || echo 0)
    printf '%010d\t%s%s\n' "$mtime" "$status" "$file"
done | sort -rn | cut -f 2-
The while loop separates the file names from the status indicators and then prepends each line with the files' modification times. This output is then piped to sort to order them by modification time. Finally, cut removes the timestamps, leaving the original output but in sorted order.
Deleted files end up at the bottom since the time you deleted them is unknown. If you want them at the top, change the echo 0 to echo 999999999.

svn stat | grep "^M" | awk '{print $NF}' | xargs ls -lt
This gets all modified files and runs ls -lt on the batch.
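If any of the paths contain spaces, awk '{print $NF}' keeps only the last word of each path. A more robust sketch (not from the original answer) assumes GNU xargs for -d '\n' and relies on the same eight-character status prefix the loop above does:
svn status | grep '^M' | cut -c9- | xargs -d '\n' ls -lt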

Related

Count unique words in all text files in directory, and delete those having less than 2?

This gets me the count. But how to delete those files having count < 2?
$ cat ./a1esso.doc | grep -o -E '\w+' | sort -u -f | wc --words
1
$ cat ./a1brit.doc | grep -o -E '\w+' | sort -u -f | wc --words
4
How do I grab the filenames of those that have fewer than 2, so we can delete them? I will be scanning millions of files. A find command can find all the files, but the filename needs to be propagated through the pipeline, and at the right end an rm command can remove it.
Thanks for reading.
Update:
The correct answer is going to use an input pipeline to feed filenames. This is not negotiable. This program is not for use on the single input file shown in the example; the filenames come from a dynamic list of many files.
A filter that identifies the names of the files meeting the criterion will also be present in the accepted answer. This is not negotiable either.
You could do this …
test $(grep -o -E '\w+' ./a1esso.doc | sort -u -f | wc --words) -lt 2 && rm ./a1esso.doc
Update: removed useless cat as per David's comment.
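To meet the updated requirements (filenames fed in through a pipeline, plus a filter that selects the files meeting the criterion), here is a minimal sketch along the same lines; it assumes GNU find/grep/wc, and the '*.doc' pattern is only taken from the example filenames:
find . -type f -name '*.doc' -print0 |
while IFS= read -r -d '' f; do
    words=$(grep -o -E '\w+' "$f" | sort -u -f | wc --words)
    if [ "$words" -lt 2 ]; then
        echo rm -- "$f"    # drop the echo once the list of doomed files looks right
    fi
done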

Retain the latest file sets in a directory for a given file pattern

I have multiple sets of files in an FTP folder, and each set contains a text file and a marker file.
I need to get the latest set of files matching the pattern below from a given directory, based on arrival time.
File format:
<FileName>_<FileID>_<Date>_<TimeStamp>.csv
<FileName>_<FileID>_<Date>_<TimeStamp>.mrk
File1 has three sets coming at different times:
file1_123_20180306_654321.csv
file1_123_20180306_654321.mrk
file1_123_20180306_866321.csv
file1_123_20180306_866321.mrk
file1_123_20180306_976321.csv
file1_123_20180306_976321.mrk
File2 has two sets coming at different times:
file2_456_20180306_277676.csv
file2_456_20180306_277676.mrk
file2_456_20180306_788988.csv
file2_456_20180306_788988.mrk
If it's a single file I'm able to use the commands below, but in the case of a set I need help.
ls -t *123*.mrk | head -1
ls -t *123*.csv | head -1
I need to retain only the latest set of files (from file1 and file2) and move the other files into a different folder.
Expected output:
file1_123_20180306_976321.csv
file1_123_20180306_976321.mrk
file2_456_20180306_788988.csv
file2_456_20180306_788988.mrk
How would I do this using shell or python2.6? Any help is much appreciated.
If a more or less exact answer to this question already exists, please point to it.
You may use this awk to get the latest file entry for each set from your two files:
printf '%s\0' *_*_*_*.csv *_*_*_*.mrk |
awk -v RS='\0' -v ORS='\0' -F '[_.]' 'NF{a[$1,$2,$3,$NF]=$0}
    END{for (i in a) print a[i]}' |
xargs -0 -I {} echo mv '{}' /dest/dir
Output:
mv file2_456_20180306_788988.csv /dest/dir
mv file1_123_20180306_976321.mrk /dest/dir
mv file1_123_20180306_976321.csv /dest/dir
mv file2_456_20180306_788988.mrk /dest/dir
When you're satisfied with the output, you can remove the echo before the mv command to actually move these files into the destination directory.
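Note that the command above selects the latest set in each group; the question asks to keep those in place and move the older sets away instead. A hedged variation on the same idea (a sketch only, with a placeholder /archive/dir) remembers the latest entry per group and prints everything else:
printf '%s\0' *_*_*_*.csv *_*_*_*.mrk |
awk -v RS='\0' -v ORS='\0' -F '[_.]' '
    NF {
        k = $1 SUBSEP $2 SUBSEP $3 SUBSEP $NF    # group by name, id, date and extension
        files[NR] = $0; key[NR] = k
        if ($0 > latest[k]) latest[k] = $0       # timestamps are fixed-width, so string comparison picks the newest
    }
    END { for (i = 1; i <= NR; i++) if (files[i] != latest[key[i]]) print files[i] }' |
xargs -0 -I {} echo mv '{}' /archive/dir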

How do I print oldest file and include its timestamp from a directory via UNIX

I have a number of files in a directory, and I would like to print the oldest file along with its timestamp.
The following command gives me the correct filename; however, it does not show me the timestamp.
ls -ltr | head -1
EDIT: non-gnu based system.
You can use stat to print each file's modification timestamp both in seconds since the epoch and in human-readable form. Then use a numerical sort on the first field, and finally cut to discard that field.
stat -c $'%Y\t%y\t%n' * | sort -nk1 | cut -f 2-
EDIT: To use GNU find, you can use:
find . -maxdepth 1 -printf '%T@\t%t\t%p\0' | sort -z -nk1 | cut -z -f2- | head -z -n1; echo
ls -ltr | head -1 | awk '{OFS="\t"} {print $6, $7, $8, $9}'
awk's print statement takes field indices:
If the output of your ls -ltr is
-rw-r--r-- 1 abc abc 185 Dec 19 11:23 testfile.csv
then
$6: month (Dec)
$7: day of month (19)
$8: time (11:23)
$9: file name (testfile.csv)
Field indices start at 1, counting from left to right.
Note: awk is crazy powerful; see https://www.gnu.org/s/gawk/manual/gawk.html.
None of the answers so far (aside from anubhava's edit, which I either overlooked or was added after I last looked) accommodate file names that contain newlines. However, you don't actually need to see the timestamp for each file to determine which one is the oldest in bash. There is a conditional operator, -ot, to determine if one file is older than another.
oldest=
for f in *; do
    [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
done
ts=$(stat -f %m "$oldest")  # Get the timestamp for the oldest file
stat is not covered by the POSIX specification, so you'll need to consult your documentation to determine the correct call for your system. (BSD stat shown here.)
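To also see the result with a human-readable timestamp, BSD stat can format the modification time directly; a small sketch (assuming the BSD stat shown above, where %Sm is the formatted mtime and %N the file name):
# Print the oldest file with its human-readable modification time (BSD stat).
stat -f '%Sm %N' "$oldest"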

Bash script to store list of files in an array with number of occurrences of each word in all files

So far, my bash script takes two arguments: an input, which can be a file or a directory, and an output, which is the output file. It finds all files recursively, and if the input is a file it finds all occurrences of each word in the files found and lists them in the output file with the count on the left and the word on the right, sorted from greatest to least. Right now it is also counting numbers as words, which it shouldn't do; how can I have it count only valid words and no numbers?
Also, in the last if statement, where the input is a directory, I am having trouble getting it to do the same thing it does for a file. It needs to find all files in that directory, and if there is another directory inside it, all files in that one too, and so on. Then it needs to count all occurrences of each word in all those files and store them in the output file, just as in the file case. I was thinking of storing the file names in an array, but I'm not sure it's the best way, and my syntax is off because it's not working. So I would like to know: how can I do this? Thanks!
#!/bin/bash
INPUT="$1"
OUTPUT="$2"
ARRAY=();
# Check that there are two arguments
if [ "$#" -ne 2 ]
then
    echo "Usage: $0 {dir-name}";
    exit 1
fi
# Check that INPUT is different from OUTPUT
if [ "$INPUT" = "$OUTPUT" ]
then
    echo "$INPUT must be different from $OUTPUT";
fi
# Check if INPUT is a file...if so, find number of occurrences of each word
# and store in OUTPUT file sorted in greatest to least
if [ -f "$INPUT" ]
then
    for name in $INPUT; do
        if [ -f "$name" ]
        then
            xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
        fi
    done
# If INPUT is a directory, find number of occurrences of each word
# and store in OUTPUT file sorted in greatest to least
elif [ -d "$INPUT" ]
then
    find $name -type f > "${ARRAY[@]}"
    for name in "${ARRAY[@]}"; do
        if [ -f "$name" ]
        then
            xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
        fi
    done
fi
I don't recommend specifying the output file as an argument, because then you must do more validity checking on it, e.g.
the output shouldn't already exist (if you don't want to allow overwriting)
if you do want to allow overwriting and the output exists, it must be a plain file
and so on..
It is also better to be able to pass more than one input directory/file as arguments.
It is therefore better (and more bash-ish) to produce output on standard output and redirect it to a file at invocation, like
bash wordcounter.sh file_or_directory [more files or directories ...] > to_some_file
e.g.
bash wordcounter.sh some_dir > result.txt
#or
bash wordcounter.sh file1.txt file2.txt .... fileN.txt > result2.txt
#or
bash wordcounter.sh dir1 file1 dir2 file2 > result2.txt
The whole wordcounter.sh could be as follows:
for arg
do
    find "$arg" -type f -print0
done | xargs -0 grep -hoP '\b[[:alpha:]]+\b' | sort | uniq -c | sort -nr
where:
find searches for plain files under each of the arguments
and the counting pipeline then runs on the generated file list
The script still has some drawbacks, e.g. it will try to count words in image files and the like; maybe in the next question in this series you will ask about that ;)
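If the image-file drawback matters, one hedged tweak is GNU grep's -I option, which treats binary files as if they contained no matches:
for arg
do
    find "$arg" -type f -print0
done | xargs -0 grep -hoIP '\b[[:alpha:]]+\b' | sort | uniq -c | sort -nr   # -I skips binary files such as images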
EDIT
If you really want a two-argument script, e.g. script where_to_search output (which isn't very bash-like), put the above pipeline into a function and do whatever you want, e.g.:
#!/bin/bash
wordcounter() {
    for arg
    do
        find "$arg" -type f -print0
    done | xargs -0 grep -hoP '\b[[:alpha:]]+\b' | sort | uniq -c | sort -nr
}
where="$1"
output="$2"
#do here the necessary checks
#...
#and run the function
wordcounter "$where" > "$output"
#end of script
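For example (with a hypothetical input directory and output file), the wrapper could then be invoked as:
bash wordcounter.sh ./some_dir counts.txt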

How to loop over files in natural order in Bash?

I am looping over all the files in a directory with the following command:
for i in *.fas; do some_code; done;
However, I get them in this order
vvchr1.fas
vvchr10.fas
vvchr11.fas
vvchr2.fas
...
instead of
vvchr1.fas
vvchr2.fas
vvchr3.fas
...
which is natural order.
I have tried the sort command, but to no avail.
readarray -d '' entries < <(printf '%s\0' *.fas | sort -zV)
for entry in "${entries[@]}"; do
    # do something with $entry
done
where printf '%s\0' *.fas yields a NUL separated list of directory entries with the extension .fas, and sort -zV sorts them in natural order.
Note that you need GNU sort (for the -z and -V options) and bash 4.4 or newer (for readarray -d) in order for this to work.
With the -g option, sort compares according to general numerical value:
for FILE in `ls ./raw/ | sort -g`; do echo "$FILE"; done
0.log
1.log
2.log
...
10.log
11.log
This will only work if the file names are numerical. If they are strings, you will get them in alphabetical order. E.g.:
for FILE in `ls ./raw/* | sort -g`; do echo "$FILE"; done
raw/0.log
raw/10.log
raw/11.log
...
raw/2.log
You will get the files in ASCII order. This means that vvchr10* comes before vvchr2*. I realise that you can not rename your files (my bioinformatician brain tells me they contain chromosome data, and we simply don't call chromosome 1 "chr01"), so here's another solution (not using sort -V which I can't find on any operating system I'm using):
ls *.fas | sed 's/^\([^0-9]*\)\([0-9]*\)/\1 \2/' | sort -k2,2n | tr -d ' ' |
while read filename; do
    # do work with $filename
done
This is a bit convoluted and will not work with filenames containing spaces.
Another solution: Suppose we'd like to iterate over the files in size-order instead, which might be more appropriate for some bioinformatics tasks:
du *.fas | sort -k1,1n |
while read filesize filename; do
    # do work with $filename
done
To reverse the sorting, just add r after -k1,1n (to get -k1,1nr).
You mean that files with the number 10 come before files with the number 3 in your list? That's because ls sorts its results very simply, so something-10.whatever sorts before something-3.whatever.
One solution is to rename all files so they have the same number of digits (pad the single-digit numbers with a leading 0).
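A minimal sketch of such a renaming, assuming names of the form vvchrN.fas as in the question (the echo is left in so you can verify before running for real):
for f in vvchr[0-9].fas; do
    [ -e "$f" ] || continue                 # nothing matched the glob
    echo mv -- "$f" "${f/vvchr/vvchr0}"     # vvchr1.fas -> vvchr01.fas; drop echo when happy
done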
while IFS= read -r file ; do
    ls -l "$file" # or whatever
done < <(find . -name '*.fas' 2>/dev/null | sed -r -e 's/([0-9]+)/ \1/' | sort -k 2 -n | sed -e 's/ //;')
Solves the problem, presuming the file naming stays consistent, doesn't rely on very-recent versions of GNU sort, does not rely on reading the output of ls and doesn't fall victim to the pipe-to-while problems.
Like @Kusalananda's solution (perhaps easier to remember?) but catering for all files(?):
array=("$(ls | sed 's/[^0-9]*\([0-9]*\)\..*/\1 &/' | sort -n | sed 's/^[^ ]* //')")
for x in "${array[@]}"; do echo "$x"; done
In essence add a sort key, sort, remove sort key.
Use sort -rh and a while loop:
du -sh * | sort -rh | grep -P "avi$" |awk '{print $2}' | while read f; do fp=`pwd`/$f; echo $fp; done;
