I have a problem writing the output of a program to a different directory when I loop over input files as variables. I run this on the command line. The problem is that I do not know how to tell the program to put the output, with a changed filename, into a directory other than the input directory.
Here is the command. It is a bioinformatics tool that requires specific input file formats, so I am sorry that I could not give a better example. The program is called computeMatrix and is part of a toolbox called deeptools2.
command:
for f in ~/my/path/*spc_files*; do computeMatrix reference-point --referencePoint center --regionsFileName /target/region.bed --binSize 500 --scoreFileName "$f" --outFileName "$f.matrix"; done
So far, I have tried to use basename to get just the filename and then put a different directory in front of it. However, I could not figure out:
whether this can be combined with the command above
what the correct order of the commands would be (e.g. outputFile=$(basename "$f"), or "~/new/targetDir/$(basename "$f")")
There are probably other options to solve the problem that I could not think of or find.
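Roughly, what I am after is something like this (a sketch only; it assumes a target directory ~/new/targetDir that already exists, written as $HOME because ~ does not expand inside quotes, and it reuses the computeMatrix options from the command above unchanged):

for f in ~/my/path/*spc_files*; do
    out="$HOME/new/targetDir/$(basename "$f").matrix"    # keep only the input file's name, drop its directory
    computeMatrix reference-point --referencePoint center \
        --regionsFileName /target/region.bed --binSize 500 \
        --scoreFileName "$f" --outFileName "$out"
done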
I'm trying to get this script to process all files in a directory: https://github.com/kieranjol/ifi-ffv1/blob/master/ifi-ffv1.sh
I run it in the terminal and pass it the path to a file: ./ifi-ffv1.sh /path/to/file.mov. How can I get it to move on to the next file? I'll also need to make sure that it only processes AV files, such as .avi, .mkv, .mov, etc.
I've tried using while loops with shift but I can't get that to work either.
I've tried adding a specific path as shown here, but I'm failing: http://www.cyberciti.biz/faq/unix-loop-through-files-in-a-directory/
I've tried this (https://askubuntu.com/a/315338), but it keeps looping over the same file rather than moving on to the next one. This (http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html) didn't help me either.
I know this is going to be a horribly simple solution but I'm very new to this.
You don't actually have any kind of loop in your code. You need to do something like
for file in path/to/*.avi path/to/*.mkv path/to/*.mov
do
./ifi-ffv1.sh "$file"
done
which will loop through all the specified files and substitute each one for $1
You can put whatever file name patterns you want instead of path/to/*.avi path/to/*.mkv path/to/*.mov. If you cd to the directory first, you can leave out the paths and just use *.avi *.mkv *.mov
To do it all in one script, do something like this:
cd <your directory>
for file in *.avi *.mkv *.mov
do
<your existing script here>
done
replacing all the $1's in your script with "$file" (not duplicating any quotes you already have, of course)
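One detail worth guarding against: if one of the patterns matches nothing, the shell passes the literal pattern string (e.g. path/to/*.mkv) to the script. A small existence check avoids that (a sketch; /path/to is a placeholder):

for file in /path/to/*.avi /path/to/*.mkv /path/to/*.mov; do
    [ -e "$file" ] || continue    # skip patterns that matched no files
    ./ifi-ffv1.sh "$file"
done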
This is not a duplicate, as I want to achieve the result without using the Windows tools mentioned in this post: https://superuser.com/questions/78434/tool-for-deleting-directories-with-path-names-too-long-for-normal-delete
I am deleting SVN-ignored files that Maven has created (something like this):
jaxb/com.mycompany.test.jaxb.upload-document/target
jaxb/com.mycompany.test.jaxb.fpml/target
jaxb/com.mycompany.test.jaxb.fixml/target
jaxb/com.mycompany.test.jaxb.trade-header/target
jaxb/com.mycompany.test.jaxb.schema-versioning/target
test-risk/target
I am using the following command but it fails when the path is too long on Windows.
echo "Deleting ignored files"
svn stat --no-ignore | awk '$1=="I" { print $0 }'|awk '{$1=""; print $0}'|tr '\\' '/'|while read file;do rm -rf "$file" ;done;
This is run as part of a script that runs on Unix as well as Windows which is why I can't use any of the previously mentioned tools.
What I need to do is write a routine that checks for errors caused by over-long paths and, if there are any, recursively renames the directories from the top down to single-character names in an attempt to shorten the path.
I'm not sure how to check for the errors ("Directory not empty" and "File or path name too long").
Ideally I'd like to fail the "rm -rf" immediately rather than throwing the error hundreds of times, roughly as sketched below.
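A sketch of the fail-fast part only: it reuses the pipeline above and simply stops the loop at the first rm failure.

svn stat --no-ignore | awk '$1=="I" { print $0 }'|awk '{$1=""; print $0}'|tr '\\' '/'|while read file; do
    rm -rf "$file" || { echo "rm failed for: $file" >&2; break; }    # bail out instead of erroring hundreds of times
done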
Finally, I'm not sure what the most efficient way to recursively rename the directories is.
EDIT: If I just assume that all directories are going to cause the issue, I could take the root directories, cd into each of them in turn, and recursively rename the sub-directories. That way I don't have to worry about catching errors from rm.
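The renaming idea, roughly (a sketch under two assumptions: it is run against each root in turn, and the short numeric names 1, 2, 3, ... don't collide with anything already in the tree):

# Recursively rename the sub-directories of $1 to short numeric names,
# shortening each level before descending into it.
shorten_dirs() {
    local dir=$1 i=0 d
    for d in "$dir"/*/; do
        [ -d "$d" ] || continue      # the glob matched nothing
        i=$((i + 1))
        mv "$d" "$dir/$i"
        shorten_dirs "$dir/$i"
    done
}

shorten_dirs ./jaxb                  # then: rm -rf ./jaxb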
I have been working on how to verify that millions of files that were on file system A have in fact been moved to file system B. While working on a system migration, it became evident that all the files needed to be audited to prove that they had been moved. The files were initially moved via rsync, which does produce logs, although not in a format that is helpful for an audit. So, I wrote this script to index all the files on system A:
#!/bin/bash
# Get directories and file list to be used to verify proper file moves have worked successfully.
LOGDATE=`/usr/bin/date +%Y-%m-%d`
FILE_LIST_OUT=/mounts/A_files_$LOGDATE.txt
MOUNT_POINTS="/mounts/AA /mounts/AB"
touch $FILE_LIST_OUT
echo TYPE,USER,GROUP,BYTES,OCTAL,OCTETS,FILE_NAME > $FILE_LIST_OUT
for directory in $MOUNT_POINTS; do
# format: type,user,group,bytes,octal,octets,file_name
gfind $directory -mount -printf "%y","%u","%g","%s","%m","%p\n" >> $FILE_LIST_OUT
done
The file indexing works fine and takes about two hours to index ~30 million files.
Side B is where we run into issues. I have written a very simple shell script that reads the index file, tests whether each file is there, and counts how many files are present, but it runs out of memory while looping through the 30 million lines of indexed file names. Effectively it runs the little bit of code below inside a while loop, with counters that increment for files found and not found.
# $FILENAME (and $TYPE) are parsed from each line of the index file
if [ -f "$FILENAME" ]; then
    echo "file found"
    found=$((found + 1))
else
    echo "file not found"
    missing=$((missing + 1))
fi
My questions are:
Can a shell script do this type of reporting from such a large list? A 64-bit Unix system ran out of memory while trying to execute this script. I have already considered breaking up the input file into smaller chunks to make it faster. Currently it can
If a shell script is inappropriate, what would you suggest?
You just used rsync, use it again...
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn’t affect the data that goes into the file-lists, and thus it doesn’t affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don’t get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.
That will actually fix any problems (at least in the same sense that any diff list built on file-existence tests could fix them). Using --ignore-existing means rsync only does the file-existence tests (so it'll construct the diff list as you request and use it internally). If you just want information on the differences, check --dry-run and --itemize-changes.
Let's say you have two directories, foo and bar. Let's say bar has three files: 1, 2, and 3. Let's say that bar also has a directory quz, which contains a file 1. The directory foo is empty:
Now, here is the result,
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/
>f+++++++++ 1
>f+++++++++ 2
>f+++++++++ 3
cd+++++++++ quz/
>f+++++++++ quz/1
Note, you're not interested in the cd+++++++++ line; that's just rsync reporting the directory quz/ itself (c = change/creation, d = directory). Now, let's add a file called 1 to foo, and let's use grep to remove the directory lines,
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/ | grep -v '^cd'
>f+++++++++ 2
>f+++++++++ 3
>f+++++++++ quz/1
f is for file. The +++++++++ means the file doesn't exist in the DEST dir.
Here is the bonus: remove --dry-run, and it'll go ahead and make the changes for you.
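For the audit itself, the dry-run output can be reduced to a count (a sketch; /mounts/A/ and /mounts/B/ are placeholders for the two trees):

# Itemize what --ignore-existing would still copy, keep only regular files (">f" lines), and count them.
# A count of 0 means every file on side A already exists on side B.
missing=$(rsync -rin --ignore-existing /mounts/A/ /mounts/B/ | grep -c '^>f')
echo "files missing on side B: $missing"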
Have you considered a solution such as kdiff3, which will diff directories of files?
Note this feature from version 0.9.84:
Directory-Comparison: Option "Full Analysis" allows to show the number
of solved vs. unsolved conflicts or deltas vs. whitespace-changes in
the directory tree.
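For two directory trees, kdiff3 can simply be pointed at both of them (a sketch; the paths are placeholders):

kdiff3 /mounts/A /mounts/B    # opens kdiff3's directory-comparison view for the two trees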
There is absolutely no problem reading a 30 million line file in a shell script. The reason your process failed is most likely that you tried to read the entire file into memory, e.g. by doing something like for i in $(cat file).
The correct way of reading a file is:
while IFS= read -r line
do
echo "Something with $line"
done < someFile
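Applied to the index file from the question, that pattern might look like this (a sketch: the index file name is a placeholder, the path is assumed to be the last comma-separated field with no commas inside file names, and the recorded paths are assumed to resolve on side B):

#!/bin/bash
index=/mounts/A_files_2016-01-01.txt    # placeholder; point this at the real index file
found=0
missing=0

# Stream the index one line at a time (constant memory); skip the CSV header.
while IFS= read -r line; do
    filename=${line##*,}                # the path is the last comma-separated field
    if [ -f "$filename" ]; then
        found=$((found + 1))
    else
        missing=$((missing + 1))
        printf 'missing: %s\n' "$filename"
    fi
done < <(tail -n +2 "$index")

printf 'found: %d  missing: %d\n' "$found" "$missing"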
A shell script is inappropriate, yes. You should be using a diff tool:
diff -rNq /original /new
If you're not particular about the solution being a script, you could also look into meld, which would let you diff directory trees quite easily and you can also set ignore patterns if you have any.
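To turn that into numbers for an audit, one option (a sketch; /original and /new are placeholders) is to drop -N, so that missing files are reported as "Only in" lines, and count those:

# Files present under /original but absent under /new show up as "Only in /original..." lines.
diff -rq /original /new | grep -c '^Only in /original'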
Is there a way to tell when a file was moved to a certain directory?
I'm being asked why a script of mine did not find a file in a certain directory. The file was created last January but I suspect it was placed in the directory after the script was run. Is there a way for me to confirm my suspicion?
Viewing the file properties gives me the created, modified, and accessed times, and the first two do not change when moving files from one directory to another.
EDIT: I have cygwin installed, if that helps at all. Is there a unix way of determining when a directory entry was created?
If the file in question can be shown to have been the last file added to that directory, you can look at the last modified date of the directory itself, since directories are modified when files are inserted into them. Otherwise, I don't hold much hope.
If you're on Windows XP or 2000 or higher, you should be able to use dir /tc to get the creation time of the file (which will be when it was copied to the directory). Under Cygwin, you can use ls -lc.
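Under Cygwin, for example (the path is a placeholder):

ls -lc /cygdrive/c/some/dir/report.xls    # -c lists the ctime instead of the mtime
stat /cygdrive/c/some/dir/report.xls      # shows Access, Modify and Change times (and Birth, where supported) together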
Using wmic, and/or creating a layer for yourself, really helps when using Cygwin. For example, a function like this will return everything in the actual Windows properties dialog for a file...
finfo() {
    # Bail out early if the argument is not an existing file.
    [[ -f "$(cygpath "$@")" ]] || { echo "bad-file"; return 1; }
    # Ask WMI for every property of the file (escaped Windows path), strip CR and
    # blank lines, then print each NAME=value pair with the value in bold.
    echo "$(wmic datafile where name=\""$(echo "$(cygpath -wa "$@")" | sed 's/\\/\\\\/g')"\" get /value)" \
        | sed 's/\r//g;s/^M$//;/^$/d' \
        | awk -F"=" '{print $1"=""\033[1m"$2"\033[0m"}'
}
This way, regardless of how the file was touched, you have multiple ways of knowing.
CMD Line FU Info link
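Typical use, roughly (the file name is a placeholder; WMI's data-file properties include things like CreationDate, LastAccessed and LastModified):

finfo ./some-file.mov    # prints every WMI property of the file, one NAME=value pair per line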