Join all files in a directory, with a separator - bash

I have a directory containing hundreds of files (each only a few characters long). I want to join them into a single file with a separator, "|".
I tried
find . -type f | (while read line; do cat $line; echo "|"; done;) > output.txt
But that created an infinite loop.

You can exclude output.txt from the output of find using -not -name output.txt (or as you already pointed out in the comments below, simply place the output file outside the target directory).
For example:
find . -type f -not -name output.txt -exec cat {} \; -exec echo "|" \; > output.txt
I've also taken the liberty of replacing your while/cat/echo with a couple of -exec actions, so the whole thing can be done with a single find call.
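If the file names might contain spaces or other odd characters, a NUL-delimited loop is a safe variant of the same idea (a minimal sketch, assuming bash and a find that supports -print0; writing the output file outside the directory sidesteps the self-inclusion problem entirely):
find . -type f -print0 |
while IFS= read -r -d '' f; do
    cat "$f"        # append the file's contents
    echo "|"        # then the separator
done > ../output.txt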

To answer the title of the question, since it's the first hit in Google results (the output.txt problem is actually unrelated):
This is what I use to join .jar files into a classpath to run a Java app with the libraries in lib/:
ondra@lenovo:~/work/TOOLS/JawaBot/core$ ls
catalog.xml nbactions.xml nb-configuration.xml pom.xml prepare.sh resources run.sh sql src target workdir
ondra@lenovo:~/work/TOOLS/JawaBot/core$ echo `ls -1` | sed 's/\W/:/g'
catalog:xml:nbactions:xml:nb:configuration:xml:pom:xml:prepare:sh:resources:run:sh:sql:src:target:workdir
The file listing may of course be replaced with find ... or anything else.
The echo is there to replace newlines with spaces.
Final form:
java -cp $(echo `ls -1 *.jar` | sed 's/\W/:/g') com.foo.Bar
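Note that s/\W/:/g also turns the dots inside the names into colons, as the listing above shows. A sketch of an alternative that leaves the names intact, assuming a paste that supports -s and -d (GNU and BSD coreutils both do) and the lib/ layout mentioned above:
# print one jar per line, then join the lines with ":"
java -cp "$(printf '%s\n' lib/*.jar | paste -sd: -)" com.foo.Bar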

I reused Ondra's answer, but with absolute paths instead.
Command:
echo $( \find '/home/user/[path-to-webapp]/WEB-INF/lib' -name '*.jar' -print0) |  \
sed 's#\.jar/#.jar:#g'
Note: I use # as sed's delimiter so the slashes in the paths don't need escaping; since the pattern is .jar/ (with a trailing slash), the last jar in the list is left untouched.
Results:
/home/user/[path-to-webapp]/WEB-INF/lib/jar1.jar:home/user/[path-to-webapp]/WEB-INF/lib/jar2.jar[... and so on ...]:/home/user/[path-to-webapp]/WEB-INF/lib/last-jar.jar
Then, I can use this output in a javac -classpath command.
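If GNU find is available, -printf can build the same colon-separated list of absolute paths directly (just a sketch; the trailing sed only trims the final colon so no empty classpath entry is added):
find '/home/user/[path-to-webapp]/WEB-INF/lib' -name '*.jar' -printf '%p:' | sed 's/:$//'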


plus 1 to filename

My task is log rotation, and I can't find any command that can take the number in a file name and increase it by 1.
For example, I have files named wrapper.log.1, wrapper.log.2.
I need to rename those files and move them to another directory, ending up with wrapper_1.log, wrapper_2.log. After a file has been moved it should be deleted from the original directory.
It is possible that the new folder already contains files with the same names.
In that case I should take the last file and add 1 to the number in its name, e.g. wrapper_(2+1).log.
For the whole task I found something like
find . -name "wrapper.log.*"
mkdir $(date '+ %d.%m.%y')
find . -name "wrapper.log.*" |sort -r |head -n1 | sed -E 's/(.log)(.[0-9])/_$(2+1)\1/'
But, of course, it doesn't work after the second line.
And, eventually, it needs to be a bash script.
P.S.: I also think it would be possible to simply create a new file in the new folder with a timestamp (or something similar) as a suffix.
For example:
folder      files
01.01.19    wrapper_00_00_01
            wrapper_00_01_07
            wrapper_01_10_53
            wrapper_13_07_11
02.01.19    wrapper_01_00_01
            wrapper_03_01_07
            wrapper_05_10_53
            wrapper_13_07_11
To find the highest number of the wrapper_ log files:
find . -type f -name "*.log" -exec basename {} \; | ggrep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1
I'm using grep's Perl switch (-P) to do a look-behind for "wrapper_", then reverse-sorting the numbers found and taking the first one. If you want to generate a new file name, I'd use awk, e.g.:
find . -type f -name "*.log" -exec basename {} \; | ggrep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1 | awk '{print "wrapper_"$1 + 1".log" }'
This will produce a file name with the next number in the sequence.
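Tying the two pieces together, a rough sketch of the rename step itself; SRC and DEST are hypothetical placeholders for your directories, and grep -P assumes GNU grep (ggrep on macOS):
SRC=/path/to/logs        # placeholder: where wrapper.log.1, wrapper.log.2, ... live
DEST=/path/to/archive    # placeholder: where wrapper_N.log files accumulate
last=$(find "$DEST" -maxdepth 1 -name 'wrapper_*.log' -exec basename {} \; |
       grep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1)
next=$(( ${last:-0} + 1 ))               # starts at 1 if the folder is still empty
mv "$SRC/wrapper.log.1" "$DEST/wrapper_${next}.log"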
I don't understand your question entirely, but I know that using a dollar sign and double parentheses you can perform a calculation:
Prompt>echo $((1+1))
2
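Applied to the question, the same arithmetic can build the next file name once the current number has been extracted (n=2 is just an example value):
n=2
echo "wrapper_$((n + 1)).log"    # prints wrapper_3.log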
Finally, I found two solutions.
The first is a bash script, something like this:
#!/bin/bash
#DECLARE
FILENAME=$1
DATE=$(date '+%d.%m.%y')
SRC_DIR="/usr/local/apache-servicemix-6.1.0/data/log"
DEST_DIR="/mnt/smxlog/$HOSTNAME"
#START
mkdir -m 777 "$DEST_DIR/$DATE"
if [ -d "$DEST_DIR/$DATE" ]
then
    for f in $(find "$SRC_DIR/" -name "$FILENAME.log.*")
    do
        TIME=$(date '+%H_%M_%S.%3N')
        NEW_FILENAME="$FILENAME-$TIME.log"
        NEW_DEST_WITH_FILE="$DEST_DIR/$DATE/$NEW_FILENAME"
        mv "$f" "$NEW_DEST_WITH_FILE"
        gzip "$NEW_DEST_WITH_FILE"
    done
else
    exit 1
fi
#END
The second variant is to use log4j's rolling appender, but that requires uploading log4j-1.2.17_fragment.jar and apache-log4j-extras-1.2.17_fragment to the ServiceMix system folder. It may be possible to upload them as a bundle; I didn't try.
The two jars use different APIs:
https://logging.apache.org/log4j/1.2/apidocs/index.html?overview-summary.html and
http://logging.apache.org/log4j/companions/apidocs/index.html?overview-summary.html
And the properties will be:
log4j.logger.wrapper.log=DEBUG, wrapper
log4j.additivity.wrapper.log=false
log4j.appender.wrapper=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.wrapper.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
#This setting should be used with commented line log4j.appender.wrapper.File=... if it needs to zip to target directory immediately
#log4j.appender.wrapper.rollingPolicy.FileNamePattern=/mnt/smxlog/${env:HOSTNAME}/wrapper.%d{HH:mm:ss}.log.gz
#Or it is possible to log and zip in the same folder, and after that with cron replace zipped files to required folder
log4j.appender.wrapper.rollingPolicy.FileNamePattern=${karaf.data}/log/wrapper.%d{HH:mm:ss}.log.gz
log4j.appender.wrapper.File=${karaf.data}/log/wrapper.log
log4j.appender.wrapper.triggeringPolicy=org.apache.log4j.rolling.SizeBasedTriggeringPolicy
#Size in bytes
log4j.appender.wrapper.triggeringPolicy.MaxFileSize=1000000
log4j.appender.wrapper.layout=org.apache.log4j.PatternLayout
log4j.appender.wrapper.layout.ConversionPattern=%d{dd-MM-yyyy_HH:mm:ss} %-5p [%t] - %m%n
log4j.appender.wrapper.Threshold=DEBUG
log4j.appender.wrapper.append=true

bash: rename files in indeterminate sub-subfolders based on grandparent directory name

Bash newbie here trying to insert the name of a folder into certain files inside that folder.
The problem is that these files are in subfolders of subfolders of the main directory, and the names of each level are different in each case.
For example, the main folder interviews may contain John Doe and under John Doe is a directory Images with a file Screenshot.jpg. But there might also be John Smith with a folder Etc in which is 12_Screenshot 2.jpg.
I want to rename all these files containing Screenshot inserting John Doe or John Smith before the filename.
I tried adapting a couple of scripts I found and ran them from the interviews directory:
for i in `ls -l | egrep '^d'| awk '{print $10}'`; do find . -type f -name "*Screenshot*" -exec sh -c 'mv "$0" "${i}${0}"' '{}' \; done
after which the terminal gives the caret prompt, as if I'm missing something. I also tried
find -regex '\./*' -type d -exec mv -- {}/*/*Screenshot* {}/{}.jpg \; -empty -delete
which returns find: illegal option -- r
The fact that the second one theoretically moves the file up to the parent folder is not a problem since I'll have to do this eventually anyways.
The following script will work as desired:
dir=$1
find "$dir" -name "*Screenshot*" -type f | while IFS= read -r file
do
    base=$(basename "$file")
    dirpath=$(dirname "$file")
    extr=$(echo "$file" | awk -F/ '{print $(NF-2)}') # extract the grandparent directory
    mv "$file" "$dirpath/$extr-$base"
done
As @loneswap mentioned, this must be invoked as a script. So if your main directory is mainDir, you would invoke it like so...
./script mainDir
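For completeness, a NUL-delimited variant of the same loop (assuming bash and a find with -print0) is safe even for file names containing newlines, not just the spaces in names like 12_Screenshot 2.jpg:
dir=$1
find "$dir" -type f -iname "*Screenshot*" -print0 |
while IFS= read -r -d '' file; do
    base=$(basename "$file")
    parent=$(dirname "$file")
    grandparent=$(basename "$(dirname "$parent")")   # e.g. "John Doe"
    mv "$file" "$parent/$grandparent-$base"
done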
For each directory in the current working directory, recursively find files containing the string "screenshot" (case-insensitive, due to OSX). Split each found path into a parent part (always present, at least in the form './') and the file name, and produce two lines: the original file path, and the original folder plus the modified target file name. Then execute mv via xargs, taking two arguments at a time (separated by newlines to allow whitespace in paths):
for i in `ls -l | sed -n '/^d\([^[:space:]]\+[[:space:]]\+\)\+\([^[:space:]]\+\)$/s//\2/p'`; do
find "$i" -type f -iname "*Screenshot*" \
| sed -n '\!^\(\([^/]\+/\)\+\)\([^/]\+\)$!s!!\1\3\n\1'$i'\3!p' \
| xargs -d '\n' -n 2 mv;
done
Drawback: xargs on OSX does not know --no-run-if-empty, so for directories that contain no files with "screenshot" in the name an empty mv is invoked. The proper option would need to be added (I don't have access to the OSX man pages), or use xargs ... 2>/dev/null to ignore all errors...

how to grep large number of files?

I am trying to grep 40k files in the current directory and I am getting this error.
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like any other programming language because much of the interpretation of lines comes from the power of the shell interpolating them before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
echo The text files are *.txt
> echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually interpolates the glob and then passes the result to the command as its arguments. The > points to the line that is actually executed by the command.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then, and only then, does the echo command actually execute.
When you have 40K plus files and you do grep ... *.kaks, you're expanding that *.kaks to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long comes from.
Fortunately, Unix has a way around this dilemma:
$ find . -name "*.kaks" -type f -maxdepth 1 | xargs grep -f A01/genes.txt
The find . -name "*.kaks" -type f -maxdepth 1 will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not directories.
The find command pipes the names of the files into xargs, and xargs appends the names of the files to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute the grep when the command line buffer is full, then pass another batch of files to grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of our files are used.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines, but by the NUL character. The -0 parameter for xargs tells xargs that the file separator isn't whitespace, but the NUL character. That fixes the issue.
You could also do this too:
$ find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This executes a separate grep for each and every file found, instead of what xargs does (running grep once for as many files as it can stuff on the command line). The advantage is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
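One more variant worth timing: find's + terminator to -exec batches file names onto each grep invocation much like xargs does, without needing a pipe at all:
find . -maxdepth 1 -type f -name "*.kaks" -exec grep -f A01/genes.txt {} +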
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use the recursive feature of grep:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
Though if you want to select only .kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
    grep -H "$i" "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists in there one or more times? If it is "good enough" to know the string occurs at least once, you can specify "-m 1" to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save a lot of time.
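Combined with the find/xargs form above, and assuming GNU or BSD grep, that looks roughly like this; -m 1 stops searching each file after its first match, and adding -l would print just the matching file names:
find . -maxdepth 1 -type f -name "*.kaks" -print0 | xargs -0 grep -m 1 -f A01/genes.txt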
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

Copying list of files to a directory

I want to make a search for all .fits files that contain a certain text in their name and then copy them to a directory.
I can use a command called fetchKeys to list the files that contain say 'foo'
The command looks like this: fetchKeys -t 'foo' -F | grep .fits
This returns a list of .fits files that contain 'foo'. Great! Now I want to copy all of these to a directory /path/to/dir. There are too many files to copy individually; I need to copy them all with one command.
I'm thinking something like:
fetchKeys -t 'foo' -F | grep .fits > /path/to/dir
or
cp fetchKeys -t 'foo' -F | grep .fits /path/to/dir
but of course neither of these works. Any other ideas?
If this is on Linux/Unix, can you use the find command? It seems to do much the same as fetchKeys.
$ find . -name "*foo*.fits" -type f -print0 | while IFS= read -r -d $'\0' file
do
    basename=$(basename "$file")
    cp "$file" "$fits_dir/$basename"
done
The find command will find all files that match *foo*.fits in their name. The -type f says they have to be files and not directories. The -print0 means print out the files found, but separate them with the NUL character. Normally, the find command will simply return a file on each line, but what if the file name contains spaces, tabs, new lines, or even other strange characters?
The -print0 will separate out files with nulls (\0), and the read -d $'\0' file means to read in each file separating by these null characters. If your files don't contain whitespace or strange characters, you could do this:
$ find . -name "*foo*.fits" -type f | while read file
do
    basename=$(basename "$file")
    cp "$file" "$fits_dir/$basename"
done
Basically, you read each file found by your find command into the shell variable file. Then, you can use that to copy the file into your $fits_dir or wherever you want.
Again, maybe there's a reason to use fetchKeys, and it is possible to replace that find with fetchKeys, but I don't know that fetchKeys command.
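If fetchKeys has to remain the source of the list and it prints one path per line (an assumption, since I don't know the tool), the same loop pattern applies:
fetchKeys -t 'foo' -F | grep '\.fits$' | while IFS= read -r file; do
    cp "$file" /path/to/dir/
done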
Copy all files with the name containing foo to a certain directory:
find . -name "*foo*.fits" -type f -exec cp {} "/path/to/dir/" \;
Copy all files themselves containing foo to a certain directory (solution without xargs):
for f in `find . -type f -exec grep -l foo {} \;`; do cp "$f" /path/to/dir/; done
The find command has very useful arguments -exec, -print, -delete. They are very robust and eliminate the need to manually process the file names. The syntax for -exec is: -exec (what to do) \;. The name of the file currently processed will be substituted instead of the placeholder {}.
Other commands that are very useful for such tasks are sed and awk.
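Assuming GNU cp, its -t option names the destination first, which lets find batch many files into a single cp call via the + terminator instead of running cp once per file:
find . -name "*foo*.fits" -type f -exec cp -t /path/to/dir/ {} +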
The xargs tool can execute a command using the lines it gets from stdin as arguments. This time, we execute a cp command:
fetchKeys -t 'foo' -F | grep .fits | xargs -n 500 cp -vfa -t /path/to/dir
xargs is a very useful tool, although its parametrization is not really trivial. This command reads 500 .fits files at a time and calls a single cp command for each group (cp -t, available in GNU coreutils, names the target directory first so the grouped file names can go at the end). I haven't tested it deeply; if it doesn't work, leave a comment.

How to create a backup of files' lines containing "foo"

Basically I have a directory and sub-directories that need to be scanned for .csv files. From there I want to copy all lines containing "foo" from the CSVs found into new files (in the same directory as the original), with names reflecting the file they were found in.
So far I have
find -type f -name "*.csv" | xargs egrep -i "foo" > foo.csv
which yields one backup file (foo.csv) with everything in it, and the location each line was found in becomes part of the data. I don't want either of those.
What I want:
For example if I have:
csv1.csv
csv2.csv
and they both have lines containing "foo", I would like those lines copied to:
csv1_foo.csv
csv2_foo.csv
and I don't want anything extra in the backups, other than the full line containing "foo" from the original file. I.e. I don't want the original file name in the backup data, which is what my current code does.
Also, I suppose I should note that I'm using egrep, but my example doesn't use regex. I will be using regex in my search when I apply it to my specific scenario, so this probably needs to be taken into account when naming the new file. If that seems too difficult, an answer that doesn't account for regex would be fine.
Thanks ahead of time!
Try this, hope it helps anyway:
find . -type f -name "*.csv" | xargs -I {} sh -c 'filen=$(echo "{}" | sed "s/\.csv\$//;s/^\.\///"); egrep -i "foo" "{}" > "${filen}_foo.log"'
You can try this:
$ find . -type f -exec grep -H foo '{}' \; | perl -ne '`echo $2 >> $1_foo` if /(.*):(.*)/'
It uses:
find to iterate over files
grep to print file path:line tuples (-H switch)
perl to echo those lines to the output files (using backticks, but it could be done prettier).
You can also try:
find -type f -name "*.csv" -a ! -name "*_foo.csv" | while IFS= read -r f; do
    grep foo "$f" > "${f%.csv}_foo.csv"
done
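And a sketch of the same idea made safe for file names containing newlines, assuming bash and a find with -print0:
find . -type f -name "*.csv" ! -name "*_foo.csv" -print0 |
while IFS= read -r -d '' f; do
    grep foo "$f" > "${f%.csv}_foo.csv"
done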
