# bash recursive concatenation over multiple folders, per folder - bash

I have a system that creates a folder per day, with a txt file generated every 10 minutes.
I need to write a bash script that runs from the start folder over each day's folder, merges all txt files into one file per day, and writes this file into a destination folder.
The last solution I had was something like this:
for i in $dirm;
do
ls -1U | find . -name "*.txt" | xargs cat *.txt > all
cut -c 1-80 $i/all > $i/${i##*/}
.....
done
For some reason I can't get the loop right to go through each folder. This finds all .txt files, but not per folder. The cut is there because I only need the first 80 characters of each line.
Probably a really easy problem, but I can't get my head around it.

I assume $dirm is the directory list; then you should run find on $i and not on the current directory (.):
for i in $dirm;
do
find "$i" -name "*.txt" | xargs cat > "$i/all"
cut -c 1-80 "$i/all" > "$i/${i##*/}"
.....
done

I think you're trying to combine the output of ls and find. To do that, piping one command into the other does not work. Instead, run them together in a subshell:
(ls; find) | xargs...
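
Putting the pieces together, here is a minimal sketch of the whole per-day merge (assuming, as in the question, that $dirm holds the list of day directories and that names contain no whitespace; destdir is a made-up name for the destination folder):

#!/bin/bash
destdir=/path/to/destination   # hypothetical destination folder

for i in $dirm; do
    # merge this day's txt files, keep the first 80 characters of each line,
    # and write the result into the destination under the day folder's name
    cat "$i"/*.txt | cut -c 1-80 > "$destdir/${i##*/}"
done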

Related

Batch rename all files in a directory to basename-sequentialnumber.extension

I have a directory containing .jpg files, currently named photo-1.jpg, photo-2.jpg etc. There are about 20,000 of these files, sequentially numbered.
Sometimes I delete some of these files, which creates gaps in the file naming convention.
Can you guys help me with a bash script that would sequentially rename all the files in the directory to eliminate the gaps? I have found many posts about renaming files and tried a bunch of things, but can't quite get exactly what I'm looking for.
For example:
photo-1.jpg
photo-2.jpg
photo-3.jpg
Delete photo-2.jpg:
photo-1.jpg
photo-3.jpg
Run the script to sequentially rename all files:
photo-1.jpg
photo-2.jpg
Done.
With find and sort.
First check the output of
find directory -type f -name '*.jpg' | sort -nk2 -t-
If the output is not what you expected, meaning the sort order is not correct, it might have something to do with your locale. Add LC_ALL=C before the sort:
find directory -type f -name '*.jpg' | LC_ALL=C sort -nk2 -t-
To record the output in a file, add | tee output.txt after the sort.
Add LC_ALL=C before the sort in the code below if it is needed.
#!/bin/sh
counter=1
find directory -type f -name '*.jpg' |
  sort -nk2 -t- | while read -r file; do
    ext=${file##*[0-9]} filename=${file%-*}
    [ ! -e "$filename-$counter$ext" ] &&
      echo mv -v "$file" "$filename-$counter$ext"
    counter=$((counter+1))
  done # 2>&1 | tee log.txt
Change directory in the command to the actual name of the directory that contains the files you need to rename.
If your sort has the -V flag/option, that should work too.
sort -nk2 -t-: the -n means numeric sort, -k2 means sort on the second field, and -t- means the delimiter/separator is a dash - (it can also be written as -t -). Caveat: if the directory name contains a dash - as well, the sort is expected to fail; adjust the value of -k and it should work.
ext=${file##*[0-9]} is a parameter expansion; it leaves only the .jpg.
filename=${file%-*} is also a parameter expansion; it leaves only the photo, plus the directory name before it.
[ ! -e "$filename-$counter$ext" ] triggers the mv ONLY if the target file does not exist.
If you want some record or log, remove the comment # after the done.
Remove the echo once you think the output is correct.
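For illustration, if directory contains the leftover files photo-1.jpg and photo-3.jpg, the dry run (with the echo still in place) prints something like
mv -v directory/photo-3.jpg directory/photo-2.jpg
and removing the echo performs the actual rename.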

plus 1 to filename

My task is log rotation, and I can't find any command which can add 1 to a number taken from a filename.
For example, I have some files named wrapper.log.1, wrapper.log.2.
I need to rename and move those files to another directory, getting wrapper_1.log, wrapper_2.log. After a file has been moved, it should be deleted from the origin directory.
It is possible that the new folder already contains files with the same name.
So I should take the last file and add 1 to the number in its filename, like wrapper_(2+1).log.
For my whole task I found something like
find . -name "wrapper.log.*"
mkdir $(date '+ %d.%m.%y')
find . -name "wrapper.log.*" |sort -r |head -n1 | sed -E 's/(.log)(.[0-9])/_$(2+1)\1/'
But, of course, it doesn't work past the second line.
And, in the future, it needs to be in bash.
P.S.: Also, I think it is possible to just create a new file in the new folder with a timestamp or something like that as a postfix.
For example:
folder      file
01.01.19    wrapper_00_00_01
            wrapper_00_01_07
            wrapper_01_10_53
            wrapper_13_07_11
02.01.19    wrapper_01_00_01
            wrapper_03_01_07
            wrapper_05_10_53
            wrapper_13_07_11
To find the highest number of the wrapper_ log files:
find . -type f -name "*.log" -exec basename {} \; | ggrep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1
I'm using grep's Perl switch (-P) to do a look-behind for "wrapper_", then reverse-sorting the numbers found numerically and taking the first one. Note the [0-9]+, so that multi-digit numbers are matched whole. If you want to generate a new file name, I'd use awk, e.g.:
find . -type f -name "*.log" -exec basename {} \; | ggrep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1 | awk '{print "wrapper_"$1 + 1".log" }'
This will produce a file name with the next number in the sequence.
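For example, if the directory contains wrapper_2.log and wrapper_13.log (hypothetical names, just to illustrate), the first pipeline prints 13 and the awk variant prints wrapper_14.log.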
I don't understand your question entirely, but I know that using a dollar sign and double parentheses you can do arithmetic:
$ echo $((1+1))
2
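The same construct works on a number pulled out of a filename; n below is just an illustrative variable:
n=13
echo $((n+1))    # prints 14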
Finally, I found two solutions.
First, it's bash, something like this:
#!/bin/bash
#DECLARE
FILENAME=$1
DATE=$(date '+%d.%m.%y')
SRC_DIR="/usr/local/apache-servicemix-6.1.0/data/log"
DEST_DIR="/mnt/smxlog/$HOSTNAME"
#START
mkdir -m 777 "$DEST_DIR/$DATE"
if [ -d "$DEST_DIR/$DATE" ]
then
    for f in $(find "$SRC_DIR/" -name "$FILENAME.log.*")
    do
        TIME=$(date '+%H_%M_%S.%3N')
        NEW_FILENAME="$FILENAME-$TIME.log"
        NEW_DEST_WITH_FILE="$DEST_DIR/$DATE/$NEW_FILENAME"
        # quote the paths so names with unusual characters do not break mv
        mv "$f" "$NEW_DEST_WITH_FILE"
        gzip "$NEW_DEST_WITH_FILE"
    done
else
    exit 1
fi
#END
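For reference, the script takes the base log name as its first argument, so with a made-up script name it would be invoked as:
./rotate-logs.sh wrapper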
And the second variant is using log4j logger properties, but it requires uploading log4j-1.2.17_fragment.jar and apache-log4j-extras-1.2.17_fragment to the servicemix system folder. Maybe it is possible to upload them as a bundle; I didn't try.
The two jars use different APIs; see
https://logging.apache.org/log4j/1.2/apidocs/index.html?overview-summary.html and
http://logging.apache.org/log4j/companions/apidocs/index.html?overview-summary.html
And the properties would be:
log4j.logger.wrapper.log=DEBUG, wrapper
log4j.additivity.logger.wrapper.log=false
log4j.appender.wrapper=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.wrapper.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
#This setting should be used with commented line log4j.appender.wrapper.File=... if it needs to zip to target directory immediately
#log4j.appender.wrapper.rollingPolicy.FileNamePattern=/mnt/smxlog/${env:HOSTNAME}/wrapper.%d{HH:mm:ss}.log.gz
#Or it is possible to log and zip in the same folder, and after that with cron replace zipped files to required folder
log4j.appender.wrapper.rollingPolicy.FileNamePattern=${karaf.data}/log/wrapper.%d{HH:mm:ss}.log.gz
log4j.appender.wrapper.File=${karaf.data}/log/wrapper.log
log4j.appender.wrapper.triggeringPolicy=org.apache.log4j.rolling.SizeBasedTriggeringPolicy
#Size in bytes
log4j.appender.wrapper.triggeringPolicy.MaxFileSize=1000000
log4j.appender.wrapper.layout=org.apache.log4j.PatternLayout
log4j.appender.wrapper.layout.ConversionPattern=%d{dd-MM-yyyy_HH:mm:ss} %-5p [%t] - %m%n
log4j.appender.wrapper.Threshold=DEBUG
log4j.appender.wrapper.append=true

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files, all with a random number prefix (e.g. 12345--name.jpg). I want to find files with similar names and remove all but one of each; I don't care which one survives, because they are duplicates.
To find the duplicated names I've used
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list for a for loop.
To delete all but one, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow; I want to speed it up. Any suggestions?
I would do it like this.
Note that you are dealing with the rm command, so make sure you have a backup of the existing directory in case something goes south.
Create a backup directory and take a backup of the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temp directory where we will keep only one file for each similar name; all the unique file names will end up here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
Explanation of the command is at the end. Once executed, check the tmp directory for the unique instances of the files.
Remove all *.jpg files from the main directory. Saying it again: please verify that all files have been backed up before executing the rm command.
rm *.jpg
Copy the unique instances back from the temp directory.
cp tmp/*.jpg .
Explanation of the command in step 2.
The command that gets the unique file names for step 2 is
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) gets the unique file names, like file1.jpg, file2.jpg.
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done copies one file for each of those names to the tmp/ directory.
You should not be using ls in scripts, and there is no reason to use a separate file list like in userunknown's reply.
keepone () {
    shift
    rm "$@"
}
keepone *name.jpg
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/.*\//, "", f); sub(/^[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
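If whitespace in filenames ever becomes a concern, the same dedup idea can be done null-delimited in bash (a sketch, assuming bash 4+ for associative arrays; this is not the answer's original code):

#!/bin/bash
declare -A seen                    # maps deduplicated name -> already seen
while IFS= read -r -d '' f; do
    base=${f##*/}                  # strip the directory part
    key=${base#*--}                # strip the numeric prefix up to the first --
    if [[ -n ${seen[$key]} ]]; then
        echo rm "$f"               # take out the echo once the list looks right
    fi
    seen[$key]=1
done < <(find . -type f -name '*.jpg' -print0)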
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's/^[0-9]*--//' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read -r n; do removebutone *--"$n"; done
or, better readable:
removebutone () {
    shift
    echo rm "$@"
}
cat namelist | while read -r n; do removebutone *--"$n"; done
shift takes the first parameter off the argument list, so the first matching file is kept.
Note that the parens around the -name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there which needed to be covered.
If it looks promising, you have, of course, to remove the echo in front of rm.

Can we store the creation date of a folder (not file) using bash script?

Actually, I'm a newbie at Bash and I'm learning with some hands-on exercises. I used the following stat command:
find "$DIRECTORY"/ -exec stat \{} --printf="%w\n" \; | sort -n -r | head -n 1 > timestamp.txt
where DIRECTORY is any path, say c:/some/path. It contains a lot of folders. I need to extract the creation date of the most recently created folder and store it in a variable for further use. Here I started by storing it in a txt file, but the script never completes; it stays stuck at the point where it reaches this command. Please help. I'm using Cygwin. I had used --printf="%y\n" to extract the last-modified date of the latest folder, and that worked fine.
The command is okay (save for the escaped \{}, which I believe is a mistake in the post). It only seems like it never finishes; given enough time, it will finish.
Direct approach - getting the path
The main bottleneck lies in executing stat for each file. Spawning a process under Cygwin is extremely slow, and executing one for each of possibly thousands of files is totally infeasible. The only way to circumvent this is not to spawn processes like this.
That said, I see a few areas for improvement:
If you need only directories, as the title of your post suggests, you can pass -type d to your find command to filter out the files.
If you need only the modification time (see what directory modification time means on Linux here; I guess it is similar on Cygwin), you can use find's built-in facilities rather than stat's, like this:
find "$DIRECTORY"/ -type d -printf '%TY-%Tm-%Td %TH:%TM:%TS %Tz %p\n' \
| sort -nr \
| head -n1 \
| cut -f4 -d' '
Example line before we cut away the path with cut (most of the -printf directives are used to format the date):
2014-09-25 09:41:50.3907590000 +0200 ./software/sqldeveloper/dataminer/demos/obe
After cut:
./software/sqldeveloper/dataminer/demos/obe
It took 0.7s to scan 560 directories and 2300 files.
The original command from your post took 28s without the -type d trick, and 6s with it, when run on the same directory.
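To capture the path straight into a variable, wrap the same pipeline in command substitution (this still assumes no spaces in the path, since cut splits on spaces):

latest_dir=$(find "$DIRECTORY"/ -type d -printf '%TY-%Tm-%Td %TH:%TM:%TS %Tz %p\n' | sort -nr | head -n1 | cut -f4 -d' ')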
Last but not least, if $DIRECTORY is empty, your command will traverse the whole filesystem starting from /, which will take a massive amount of time.
Another approach - getting just the date
If you only need the creation date of a subdirectory within a directory (i.e. not the path to the directory), you can probably just use stat:
stat --printf '%Y' "$DIRECTORY"/
I'm not sure whether this includes file creations as well, though.
Alternative approaches
Since getting the last created folder is clearly expensive, you could also either:
Save the directory name somewhere when creating said directory, or
Use a naming convention with a sortable date prefix, such as yyyymmdd-name-of-directory, which doesn't require any extra syscalls: just find -type d|... (see the sketch below).
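A minimal sketch of that convention, assuming directory names carry a lexicographically sortable date prefix such as yyyymmdd:

latest=$(find . -maxdepth 1 -type d -name '[0-9]*' | sort | tail -n 1)
echo "$latest"    # e.g. ./20160520-backups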
You could add a -type d option to include only the directories from the current folder and, as discussed in the comments section, if you need the output from stat in just yyyy-mm-dd format, use awk as below.
find "$DIRECTORY"/ -type d -exec stat \{} --printf="%w\n" \; | sort -n -r | head -n 1 | awk '{print $1}'
To store the value in a bash variable:
$ myvar=$(find "$DIRECTORY"/ -type d -exec stat \{} --printf="%w\n" \; | sort -n -r | head -n 1 | awk '{print $1}')
$ echo $myvar
2016-05-20

Renaming multiple files in one line of shell

Problem
In a directory there are files of the format: *-foo-bar.txt.
Example directory:
$ ls *-*
asdf-foo-bar.txt ghjk-foo-bar.txt l-foo-bar.txt tyui-foo-bar.txt
bnm-foo-bar.txt iop-foo-bar.txt qwer-foo-bar.txt zxcv-foo-bar.txt
Desired directory:
$ ls *.txt
asdf.txt bnm.txt ghjk.txt iop.txt l.txt qwer.txt tyui.txt zxcv.txt
Solution 1
The first solution that came to my mind looks somewhat like this ugly hack:
ls *-* | cut -d- -f1 | sed 's/.*/mv "\0-foo-bar.txt" "\0.txt"/' > rename.sh && sh rename.sh
The above solution creates a script, on the fly, to rename the files. It also tries to parse the output of ls, which is not a good thing to do, as per http://mywiki.wooledge.org/ParsingLs.
Solution 2
This problem can be solved more elegantly with a shell script like this:
for i in *-*
do
mv "$i" "`echo $i | cut -f1 -d-`.txt"
done
The above solution uses a loop to rename the files.
Question
Is there a way to solve this problem in a single line such that we do not have to explicitly script a loop, or generate a script, or invoke a new or the current shell (i.e. avoid sh, bash, ., etc. commands)?
Have you tried the rename command (the Perl-based rename)?
For example:
rename 's/-foo-bar//' *-foo-bar.txt
If you don't have that available, I would use find, sed, and xargs:
find . -maxdepth 1 -mindepth 1 -type f -name '*-foo-bar.txt' | sed 's/-foo-bar.txt//' | xargs -I{} mv {}-foo-bar.txt {}.txt
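
As an aside, if a loop is acceptable after all, the echo $i | cut subshell in Solution 2 can be avoided entirely with parameter expansion:

for i in *-foo-bar.txt; do mv "$i" "${i%%-*}.txt"; done

${i%%-*} strips everything from the first dash onward, doing what cut -f1 -d- did without spawning a process per file.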
