Merge 2 svn directories into one directory automatically - shell

I have two directories which have a similar structure.
./projectA
    /directory1
        /file1
    /directory2
        /file2
        /file3
        /file4
    /directory3
    /directoryA
./projectB
    /directory1
    /directory2
        /file3
    /directory3
    /directoryB
I would like to merge projectA into projectB. If a directory or file exists in A but not in B, do an svn copy from A to B at the corresponding location. If a file in A has a corresponding file in B, echo a warning. How can I do this automatically? It can be a shell script or a tool.
Thanks

With this command you get all files that are in projectA but not in projectB, ignoring the .svn folder:
diff -qr projectA projectB --exclude=.svn | grep "^Only in projectA:" | cut -d: -f2 | sed 's/^ *//g'
With this command you get all files that exist in both folders and that differ (i.e. files you might need to check before copying):
diff -qr projectA projectB --exclude=.svn | grep "^Files " | cut -d" " -f2 | sed 's!projectA!!g'
The second command will not work with files that have spaces in them, though.
Now that you've got two lists with the file names you need, you can easily write a small script that does the right thing with them.
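For instance, here is a rough, untested sketch that drives the svn copy and the warnings straight from the diff output instead of from the two intermediate lists (entries that exist only in projectB are simply ignored):
diff -qr projectA projectB --exclude=.svn | while read -r line; do
    case "$line" in
        "Only in projectA"*)
            dir=${line#Only in }        # e.g. "projectA/directory2: file4"
            dir=${dir%%:*}              # "projectA/directory2"
            name=${line#*: }            # "file4"
            rel=${dir#projectA}; rel=${rel#/}
            svn copy "$dir/$name" "projectB/${rel:+$rel/}$name"
            ;;
        "Files "*)
            echo "WARNING: $line"       # same file exists in both and differs
            ;;
    esac
done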

Related

Copy files matching both sub directories and file names from strings in file

I want to copy files from multiple subdirectories to a new directory within the main directory called copiedFiles/. I only want to copy files that can be matched to strings in the file strs2bMatchd.csv. The names of the subdirectories also match the first part of the strings to be matched (see example below).
The main directory with sub directories looks like this
main_dir/
    strs2bMatchd.csv
    1111/
        1111_aaa1_x873.csv
        1111_aaa2_x874.csv
        1111_ddd1_x443.csv
        1111_ddd2_x444.csv
    1112/
        1112_bbb1_x912.csv
        1112_bbb2_x913.csv
        1112_fff1_x664.csv
        1112_fff2_x665.csv
    1113/
        1113_ccc1_x912.csv
        1113_ccc2_x913.csv
The files to be copied should match the strings in the strs2bMatchd.csv file:
cat strs2bMatchd.csv
1111_aaa1
1111_aaa2
1112_bbb1
1112_bbb2
1113_ccc1
1113_ccc2
This is the expected result
main_dir/
    strs2bMatchd.csv
    1111/
        1111_aaa1_x873.csv
        1111_aaa2_x874.csv
        1111_ddd1_x443.csv
        1111_ddd2_x444.csv
    1112/
        1112_bbb1_x912.csv
        1112_bbb2_x913.csv
        1112_fff1_x664.csv
        1112_fff2_x665.csv
    1113/
        1113_ccc1_x912.csv
        1113_ccc2_x913.csv
    copiedFiles/
        1111_aaa1_x873.csv
        1111_aaa2_x874.csv
        1112_bbb1_x912.csv
        1112_bbb2_x913.csv
        1113_ccc1_x912.csv
        1113_ccc2_x913.csv
As an alternative, consider
M=main_dir
mkdir -p $M/copiedFiles
find $M | grep -F -v "$M/copiedFiles" | grep -Ff $M/strs2bMatchd.csv | xargs cp -t $M/copiedFiles/
It executes the find only once.
If the file names may contain spaces or other special characters, consider using the safe version (NUL-terminated strings) - for find (-print0), grep (-z/-Z), and xargs (-0).
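For example, a sketch of that NUL-safe variant (untested; -z alone covers both the input and output record terminators here, and the pattern file given to -f is still read as newline-separated patterns):
M=main_dir
mkdir -p "$M/copiedFiles"
# -type f skips the directories themselves; everything else mirrors the pipeline above
find "$M" -type f -print0 | grep -zF -v "$M/copiedFiles" | grep -zFf "$M/strs2bMatchd.csv" | xargs -0 cp -t "$M/copiedFiles/"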
Update 1: The OP indicates his version of cp does not have -t. An alternative solution without this option is posted below.
I cannot test it, so please try to resolve any remaining problems using man, etc.
M=main_dir
mkdir -p $M/copiedFiles
find $M | grep -F -v "$M/copiedFiles" | grep -Ff $M/strs2bMatchd.csv | xargs -I{} cp {} $M/copiedFiles/

Compare two folders and copy missing files to new folder based on sequence numbers using bash

I have two folders with sequentially numbered files. Folder "Originals" contains all the files, but some are missing in folder "Modified". Is there a way to copy those that are missing in folder "Modified" from folder "Originals" to a new folder using bash?
The files differ in content and filename, but are related by the numbering at the end of their filenames.
The files are still images - .png - from a video that have been modified using ImageMagick. Ten folders each contain 15000 images, with about 100 missing irregularly from each "Modified" folder due to errors while processing with ImageMagick.
Originals:
xy_abc_00000.png
xy_abc_00001.png
xy_abc_00002.png
.
.
xy_abc_15000.png
Modified:
zz_def_00000.png
zz_def_00002.png
.
.
zz_def_14999.png
list="$(diff <(ls -X Originals | sed "s:^.*[^0-9]\([0-9]*.png\)$:\1:") \
             <(ls -X Modified | sed "s:^.*[^0-9]\([0-9]*.png\)$:\1:"))"
for file in $(grep "^<" <<<"$list" | cut -d" " -f2); do
    cp Originals/xy_abc_$file Modified/zz_def_$file
done
Not overly elegant, and it would break on names with spaces, but it is still suitable for the task described in the OP's post.
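If the prefixes really are fixed as in the example, a whitespace-tolerant sketch (untested; assumes the literal prefixes xy_abc_ and zz_def_) can avoid parsing ls output altogether:
cd Originals || exit 1
for f in xy_abc_*.png; do
    num=${f#xy_abc_}                                  # numeric suffix, e.g. 00042.png
    [ -e "../Modified/zz_def_$num" ] || cp "$f" "../Modified/zz_def_$num"
done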

find - grep taking too much time

First of all, I'm a newbie at bash scripting, so forgive me if I'm making easy mistakes.
Here's my problem. I needed to download my company's website. I accomplished this using wget with no problems, but because some files have the ? symbol in their names and Windows doesn't like filenames containing ?, I had to create a script that renames those files and also updates the source code of all files that reference the renamed files.
To accomplish this I use the following code:
find . -type f -name '*\?*' | while read -r file ; do
    SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
    NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
    mv "$file" "${file//\?/-}"
    grep -rl "$SUBSTRING" * | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
done
This has two problems:
It is taking way too long; I've waited more than 5 hours and it is still going.
It looks like it is appending to the source code, because when I stop the script and search for changes, the URL is repeated like 4 times (or more).
Thanks all for your comments. I will try the two separate steps and see. Also, just as an FYI, there are 3291 files that were downloaded with wget. Do you still think that bash scripting is preferable over other tools for this?
It seems odd that a file would have ? in it. Website URLs use ? to indicate the passing of parameters. wget from a website also doesn't guarantee you're getting the site, especially if server-side execution takes place, as with PHP files. So I suspect that as wget recurses, it is finding URLs that pass parameters and thus creating such files for you.
To really get the site, you should have direct access to the files.
If I were you, I'd start over and not use wget.
You may also be having issues with files or directories that have spaces in their names.
As for that line with xargs: you're already handling one file at a time, but then grepping for it across everything recursively. Just do the sed on the new file itself.
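As I read that suggestion, the loop would become something like this (untested sketch; names containing sed-special characters such as & would still need extra escaping, and on macOS/BSD sed write -i '' instead of -i):
find . -type f -name '*\?*' | while read -r file; do
    SUBSTRING=$(basename "$file")
    NEWSTRING=${SUBSTRING//\?/-}
    NEWFILE=${file//\?/-}
    mv "$file" "$NEWFILE"
    # update references inside the renamed file only, instead of grepping the whole tree
    sed -i "s/$SUBSTRING/$NEWSTRING/g" "$NEWFILE"
done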
Ok, here's the idea (untested):
in the first loop, just move the files and compose a global sed replacement file
once that is done, scan all the files and apply sed with all the patterns at once, thus saving a lot of read/write operations, which are likely the cause of the performance issue here
I would avoid putting the script itself in the current directory, or it will be processed by sed too, so I assume that all the files to be processed are not in the current dir but in a data directory
code:
sedfile=/tmp/tmp.sed
data=data
rm -f $sedfile
# locate ourselves in the subdir to preserve the naming logic
cd $data
# rename the files and compose the big sedfile
find . -type f -name '*\?*' | while read -r file ; do
    SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
    NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
    mv "$file" "${file//\?/-}"
    echo "s/$SUBSTRING/$NEWSTRING/g" >> $sedfile
done
# now apply the big sedfile once on all the files:
# if you need to go recursive:
find . -type f | xargs sed -i -f $sedfile
# if you don't:
sed -i -f $sedfile *
Instead of using grep, you can use the find command or ls command to list the files and then operate directly on them.
For example, you could do:
ls -1 /path/to/files/* | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
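A variant that avoids parsing ls output at all might look like this (untested sketch; again, on macOS/BSD sed write -i '' instead of -i):
find /path/to/files -type f -print0 | xargs -0 sed -i "s/$SUBSTRING/$NEWSTRING/g"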
Here's where I got the idea based on another question where grep took too long:
Linux - How to find files changed in last 12 hours without find command

Keep latest file and move older files to a folder using UNIX Shell script

I have many files (File Format: ABC_YYYYMMDD.TXT) in my folder.
- ABC_20150101.TXT
- ABC_20150201.TXT
- ABC_20150301.TXT
- ABC_20150501.TXT
I need output as below.
- ABC_20150101.TXT - moved to a folder named ARCHV in the current path.
- ABC_20150201.TXT - moved to a folder named ARCHV in the current path.
- ABC_20150301.TXT - moved to a folder named ARCHV in the current path.
- ABC_20150501.TXT - kept in the current path, since it is the latest.
That is, the latest file is kept in the current folder itself, but the other files are moved to a folder named ARCHV inside the present working directory.
Please let me know the UNIX statement that does the task.
Thanks
Here is a quick solution, which relies on some installed programs:
$ find -maxdepth 1 -type f -iname 'ABC*.TXT' -printf '%T@|%p\n' | sort -r -n | tail -n +2 | cut -d'|' -f2 | xargs -i mv {} ARCHV
find prints the filenames, each preceded by its Unix timestamp (%T@)
sort sorts them by timestamp
tail removes the first (most recent file)
cut takes the filenames only (removes the timestamp)
xargs mv moves the files
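Since the date is encoded in the file name itself, a name-sort variant is also possible (untested sketch; head -n -1, which drops the last line, is a GNU coreutils extension, and the ABC_YYYYMMDD.TXT names contain no spaces):
mkdir -p ARCHV
ls ABC_*.TXT | head -n -1 | xargs -I{} mv {} ARCHV/    # ls sorts by name, so the last entry is the newest date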
quick and dirty:
ls -1|awk 'p{printf "mv %s /archive\n",p}{p=$0}'|sh
Test this line in the directory containing those ABC_.... files.
ls without any sorting options sorts the list by name.
Pipe the result to awk, which skips the last line (file).
Remove the trailing |sh to see the commands the pipeline would run. If everything looks OK, add the |sh back to execute them.
I see your example file names don't have spaces. If they do contain spaces, change the mv %s into mv \"%s\".
The target archive in my one-liner is named /archive; change it to the right path for you.
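With the example files from the question, the pipeline without the trailing |sh should print something like:
mv ABC_20150101.TXT /archive
mv ABC_20150201.TXT /archive
mv ABC_20150301.TXT /archive
The newest file, ABC_20150501.TXT, is not printed and therefore stays in place.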

Batch to rename files with metadata name

I recently accidentally formatted a 2TB hard drive (Mac OS Extended, journaled)!
I was able to recover the files with Data Rescue 3; the only problem is the program didn't give me the files as they were, with their original directory tree and names.
For example I had
|-Music
||-Enya
|||-Sonadora.mp3
|||-Now we are free.mp3
|-Documents
||-CV.doc
||-LetterToSomeone.doc
...and so on
And now I got
|-MP3
||-M0001.mp3
||-M0002.mp3
|-DOCUMENTS
||-D0001.doc
||-D0002.doc
So with a huge amount of data it would take me centuries to manually open each file, see what it is, and rename it.
Is there some batch script which can scan all my subfolders and recover the previous names? By metadata, perhaps?
Or do you know a better tool which will keep the same names and paths of the files? (It doesn't matter if I must pay; there's always a solution for that :P)
Thank you
My contribution, for your music at least...
The idea is to go through all of the MP3 files found and distribute them based on their ID3 tags.
I'd do something like:
find /MP3 -type f -iname "*.mp3" -print0 | while IFS= read -r -d '' i; do
    ARTIST=$(id3v2 -l "$i" | grep TPE1 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')    # This gets you the Artist
    ALBUM=$(id3v2 -l "$i" | grep TALB | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')     # This gets you the Album title
    TRACK_NUM=$(id3v2 -l "$i" | grep TRCK | cut -d":" -f2 | sed -e 's/^[[:space:]]*//' | cut -d"/" -f1)  # Track position like "2/13"; keep only the leading number so the "/" can't be taken for a directory
    TR_TITLE=$(id3v2 -l "$i" | grep TIT2 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')  # Track title
    mkdir -p "/MUSIC/$ARTIST/$ALBUM/"
    cp "$i" "/MUSIC/$ARTIST/$ALBUM/$TRACK_NUM.$TR_TITLE.mp3"
done
Basically:
* It looks for all ".mp3" files in /MP3,
* then analyses each file's ID3 tags and parses them to fill four variables, using the "id3v2" tool (you'll need to install it first). The tags are cleaned to keep only the value; sed is used to trim the leading spaces that might pollute it,
* then creates (if needed) a tree in /MUSIC/ with the artist name and album name,
* then copies the input files into the new tree, renaming them based on the tags.
