How do I generate a diff or patch file only showing the differences of the files included in another patch file? - bash

I was banging my head into my wall looking for an easy way to do this but couldn't find anything online, so I figured I'd share the solution I came up with.
This is useful for when you need to break apart specific diff files.

So while working on a bigger company project that required the work to be broken into multiple tickets, I found myself needing to create a separate diff/patch file for each ticket, even though I was working on a single branch with quite a few file changes. Each patch file needed to contain only the differences for specific files, which you can do with:
git diff master FILENAME_X FILENAME_Y FILENAME_Z (etc.)
but that was quite time-consuming to do manually for each file every single time I needed to generate a patch, and it left room for me to forget files every time.
The following command will create a diff/patch file of all files included in a different patch file:
`cat NAME_OF_INPUT_DIFF.diff | grep "+++" | awk -F "+++ b/" '{if (NR == 1)printf "git diff master " $2; else printf " " $2}'` > NAME_OF_OUTPUT_DIFF.diff
The cat NAME_OF_INPUT_DIFF.diff | grep "+++" part reduces the previous diff to just the lines naming the files that were added or modified; awk -F "+++ b/" then parses out just the file names, stripping the preceding characters. The rest builds the lengthy git command that generates a diff of those specific files, and wrapping it all in backticks causes it to be evaluated on execution. Finally, > NAME_OF_OUTPUT_DIFF.diff writes the resulting diff to a file.
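If you'd rather skip building a command string, the same idea can be written with command substitution passing the file list straight to git diff. A minimal sketch, assuming the input diff uses the standard "+++ b/" headers and that no path contains spaces:
# extract the changed paths from the old diff, then diff exactly those files
git diff master -- $(grep '^+++ b/' NAME_OF_INPUT_DIFF.diff | sed 's|^+++ b/||') > NAME_OF_OUTPUT_DIFF.diff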
Anyways, I hope this helps someone like it did me! Quite a timesaver.

Related

Replace/sync only certain lines using Bash, SSH and rsync

I am looking for a quick and dirty one-liner to sync only certain settings in remote config files. Need to preserve what's unique and sync generic settings. Example:
Config1.conf:
HOSTNAME=COMP1
IP=10.10.13.10
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
Remote-Config2.txt:
HOSTNAME=COMP2
IP=10.10.13.11
LOCATION=FOO
BUILDING=BAR
ROOM=BAZ
I need to sync or copy/replace only the bottom 3 lines over ssh. The line numbers are predictable, by the way: always lines 4, 5, and 6 in this case.
Here's a working idea that is missing one piece (a standard replacement for the non-standard utility I used to replace the vars in the local conf):
for var in $(ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf'); do <missing piece> ${var/=*}=${var/*=} local-conf.conf; done
So this uses variable expansion and a non-standard utility, but it needs something like a sed or Perl routine to replace the info in the local conf.
Update
The last line of code actually works. Tested! However, the missing piece is a custom, non-standard utility. I'm asking if someone can think of something, using standard Linux tools, to replace it.
One solution would be to take the left side and match it, then replace the right side. This is basically what that utility does: it looks for the variable in the conf, then sets it. Using variable expansion is one way (shown above).
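For instance, here is a sketch of one possible standard-tools stand-in for the missing piece, using GNU sed's -i; this assumes the keys and values contain no characters special to sed (such as |, &, or \):
for var in $(ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf'); do
    key=${var%%=*}    # text before the first "="
    val=${var#*=}     # text after the first "="
    sed -i "s|^${key}=.*|${key}=${val}|" local-conf.conf
done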
Here's an alternative solution that does not require the command to have special knowledge of the file contents:
Take a copy of the files you want to sync. Then, in the copy, deliberately vandalise (arbitrarily modify) the lines you do not want synced. It doesn't matter what they say as long as there are the same number of lines and they'll never match the actual file contents. Have some fun. This becomes your base version. Your example might look like this:
HOSTNAME=foo
IP=bar
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
rsync the remote files into a temporary location. This is the remote version.
For each file, take a three-way diff:
diff3 -3 <localfile> <basefile> <remotefile>
The output of diff3 is an "ed script" that describes what edits to make to the local file so that it would look like the remote file.
The -3 option tells it to only output the non-conflicting differences. This is why we vandalised the base files in the first place: so those lines would have conflicts.
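With the example files above (Config1.conf as local, the vandalised copy as base, Remote-Config2.txt as remote), the output should look something like the following sketch: lines 1-2 conflict with the vandalised base and are dropped, so only the edit for lines 3-5 survives.
3,5c
LOCATION=FOO
BUILDING=BAR
ROOM=BAZ
.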
Once you have the ed script for a file, you can visually check it, if you choose, and then apply the update using patch:
cat <ed-script> | patch --ed <localfile>
So, to do this recursively, you might have:
cd "$localdir"
for file in `find . -type f`; do
    diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
done
You probably need to add some checks that the base and remote files actually exist.
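A sketch of what those checks might look like, with the same variable names as above:
cd "$localdir"
find . -type f | while read -r file; do
    if [ -f "$basedir/$file" ] && [ -f "$remotedir/$file" ]; then
        diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
    else
        echo "skipping $file: missing base or remote copy" >&2
    fi
done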

Using S3cmd, how do I get the first and last file in a folder?

I'm doing some processing on Hive. Usually, the result of this process is a folder (on S3), with multiple files (named with some random letters and numbers, in order) that I can just 'cat' together.
But for reports, I only need the first and the last file in the folder. Now, if the files number in the hundreds, I can simply download them via the web GUI.
But if it's in the thousands, scrolling down is a pain. Not to mention, Amazon loads things on the fly when needed, as opposed to showing it all.
I tried s3cmd get but my experience with that is basic at best. I end up downloading the contents of the entire folder.
As far as I know one can pipe in extra commands, but I'm not sure how to do that.
So, how do I use s3cmd get to download only the last file in a specific folder?
Thanks.
I guess this command should work for you:
s3cmd get $(s3cmd ls s3://bucket_name/folder_name/ | tail -1 | awk '{ print $4 }')
tail -1 picks the last line of the folder listing, and awk '{ print $4 }' picks the name of the file (the fourth field).
For the first file, just replace tail -1 with head -1.
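To grab both in one go, something along these lines should work, assuming the folder contains only files (bucket and folder names are placeholders, as above):
# list once, then fetch the first and last object
listing=$(s3cmd ls s3://bucket_name/folder_name/ | awk '{ print $4 }')
s3cmd get "$(echo "$listing" | head -1)"
s3cmd get "$(echo "$listing" | tail -1)"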

How can I debug .gitignore file handling?

I’m having lots of trouble convincing git to ignore files in my project.
Basically, sometimes it works, sometimes it just seems to ignore the .gitignore file for no obvious reason. (By “seems” I mean that there are patterns in it that look as if they should exclude something, but that something is not excluded.)
There’s a 'git check-ignore' command, but it only says which pattern matched a file; I can’t find any option to make it say which patterns it found and where, nor why those patterns did not match a file.
Is there a way to do this kind of debugging?
P.S. There is a single issue which I did find, and I’m mentioning it here in case it helps others:
I was adding patterns using “echo pattern >> .gitignore”, which at least on my system results in spaces at the end of the line (i.e., everything between “echo” and “>>” is echoed into the file, except for the first space character after “echo”).
Git does not trim those spaces when matching patterns, so for the command above it wouldn’t match a file named “pattern” but it would match “pattern{space}”.
I think most of my issues stem from this. Those extra spaces are hard to notice, so I’d still like a debug command that makes sure I notice them, if there is one.
Edit #1:
Yes, I did try -v. For example:
> mkdir test
> touch test/file.txt
> echo test >> .gitignore
> git check-ignore -v test/file.txt
(nothing is printed)
> echo test>> .gitignore
> git check-ignore -v test/cuc.txt
.gitignore:8:test test/cuc.txt
Note the extra space before “>>” in the first echo line, which makes it enter “test[space]” as the pattern. As I mentioned, “check-ignore” tells you what matched, but it doesn’t tell you what didn’t match, nor why.
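In the meantime, trailing whitespace in .gitignore is easy to expose with standard tools; for example:
grep -n '[[:blank:]]$' .gitignore   # list lines ending in a space or tab
cat -A .gitignore                   # GNU cat: line ends show as "$", tabs as "^I"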

move files/directories with similar names, matching rule

We were moving about 4TB of files from a Mac server to a Windows Server 2008 R2 machine. A lot of the filenames wouldn't transfer because of incompatible characters. We ran a renaming tool to fix the problem and copied again.
My problem is now I have a lot of folders that have very similar names like "O'Neil" and "O_Neil". In fact as far as I can tell they all conform to this rule.
There are too many to do by hand, so I'm thinking of writing a script, but I have limited experience with scripting. I'd like to compare the modification date or the file size, then merge or move the folders to an archive and leave one set in place. I'm not sure about best practice in this situation.
1) In theory, what is the best practice: merge by date, archive the smaller versions?
2) In practice, how can I go about fixing this? Are there tools? Script ideas?
Any help is hugely appreciated.
One way to find candidate duplicates is to checksum every file and report any hash seen twice:
find /path -type f -print0 | xargs -0 md5sum |
awk '
{
    # $1 is the checksum, $2 the file name (assumes paths without spaces)
    if ($1 in seen)
        printf "duplicate: %s and %s\n", $2, seen[$1]
    else
        seen[$1] = $2
}
'
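To eyeball a specific pair of renamed folders before merging, a dry-run rsync is one option; it lists what would be copied without changing anything (folder names from the example above):
rsync -rnv "O'Neil/" "O_Neil/"   # -n = dry run: show files in O'Neil/ that are missing from or differ in O_Neil/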

method for merging two files, opinion needed

Problem: I have two folders (one is the Delta folder, where files get updated, and the other is the Original folder, where the original files live). Every time a file updates in the Delta folder, I need to merge that file from the Original folder with the updated file from the Delta folder.
Note: the file names in the Delta folder and the Original folder are the same, but the content of the files may differ. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now, I need to merge Delta_Folder/1.properties with Original_Folder/1.properties so that my updated Original_Folder/1.properties will be:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for:
1) Find all *.properties files in the Delta folder and save the list to a temp file (delta-files.txt).
2) Find all *.properties files in the Original folder and save the list to a temp file (original-files.txt).
3) Get the list of files common to both folders and loop over those files.
4) For each file, read each line (delta-line="account.org.com.email=New-Email") from the Delta folder's property file and split it on the "=" delimiter into two string variables
(delta-line-string1=account.org.com.email; delta-line-string2=New-Email;)
5) Read each line (orig-line="account.org.com.email=Old-Email") from the Original folder's property file and split it on the "=" delimiter into two string variables
(orig-line-string1=account.org.com.email; orig-line-string2=Old-Email;)
6) If delta-line-string1 == orig-line-string1, update $orig-line with $delta-line,
i.e.:
if account.org.com.email == account.org.com.email then replace
account.org.com.email=Old-Email in Original_Folder/1.properties with
account.org.com.email=New-Email
Once the loop finishes all lines in a file, it moves on to the next file, and continues until all the common files are done.
For looping I used for loops, for splitting lines I used awk, and for replacing content I used sed.
Overall it works fine, but it takes a long time (about 4 minutes) per file, because it runs three nested loops for every line: splitting the line, finding the variable in the other file, and replacing the line.
I'm wondering if there is any way to reduce the loops so that the script executes faster.
With paste and awk:
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1:
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command and its output:
paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command, if sorting is not important:
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
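To apply that awk merge across all the properties files, one possible wrapper (folder names from the question; assumes at most one "=" per line and updates the originals in place):
for delta in Delta_Folder/*.properties; do
    orig="Original_Folder/$(basename "$delta")"
    if [ -f "$orig" ]; then
        # read the original first, then the delta, so delta values win
        awk -F'=' '{arr[$1]=$2} END {for (x in arr) print x "=" arr[x]}' \
            "$orig" "$delta" > "$orig.tmp" && mv "$orig.tmp" "$orig"
    fi
done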
I think your two main options are:
Completely reimplement this in a more featureful language, like Perl.
While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape &, /, and \ in the replacement text.
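A sketch of that approach, using the file names from the question's example and GNU sed's -f to apply the generated script in one pass (again assuming no sed-special characters in the data):
script=$(mktemp)
while IFS='=' read -r key val; do
    # one substitution command per delta line, e.g. s|^key=.*|key=val|
    printf 's|^%s=.*|%s=%s|\n' "$key" "$key" "$val" >> "$script"
done < Delta_Folder/1.properties
sed -i -f "$script" Original_Folder/1.properties
rm -f "$script"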
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.
