I have a tricky question about how to keep only the latest log data, because my server posted the same data twice.
This is the result after I grep in my folder (I have tons of data; this is trimmed to keep it simple):
...
20150630-201427.csv:20150630,CFIIASU,233,96.21786,0.44644,
20150630-201427.csv:20150630,CFIIASU_AU,65,90.71109,0.28569
20150630-201427.csv:20150630,CFIIASU_CN,68,102.19569,0.10692
20150630-201427.csv:20150630,CFIIASU_ID,37,98.02484,0.27775
20150630-201427.csv:20150630,CFIIASU_KR,39,98.42257,0.83055
20150630-201427.csv:20150630,CFIIASU_TH,24,99.94482,0.20743
20150701-151654.csv:20150630,CFIIASU,233,96.21450,0.44294
20150701-151654.csv:20150630,CFIIASU_AU,65,90.71109,0.28569
20150701-151654.csv:20150630,CFIIASU_CN,68,102.16538,0.07723
20150701-151654.csv:20150630,CFIIASU_ID,37,98.02484,0.27775
20150701-151654.csv:20150630,CFIIASU_KR,39,98.42257,0.83055
20150701-151654.csv:20150630,CFIIASU_TH,24,99.94482,0.20743
...
The data actually comes from many csv files; I only picked two of them for the example. Some explanation:
The example comes from two files, 20150630-201427.csv and 20150701-151654.csv, and it has 5 columns, which correspond to date, dataname, data_column1, data_column2, data_column3.
These lines have the same data date 20150630 and the same datanames (CFIIASU, CFIIASU_AU, etc.), but some of the numbers in the fourth and fifth columns (data_column2 and data_column3) are different.
How can I keep the data of 20150701-151654.csv, based on the file's name and the data date, and apply that to my whole data set?
To make it clearer: I'd like to keep the lines from "the latest csv". Which csv is the latest is determined by the file's name, which in this example is 20150701-151654.csv. But in my whole data set I have to handle so many 20xxxxxx.csv files that I can't check them one by one.
For the example above, the result should end up like this:
20150701-151654.csv:20150630,CFIIASU,233,96.21450,0.44294
20150701-151654.csv:20150630,CFIIASU_AU,65,90.71109,0.28569
20150701-151654.csv:20150630,CFIIASU_CN,68,102.16538,0.07723
20150701-151654.csv:20150630,CFIIASU_ID,37,98.02484,0.27775
20150701-151654.csv:20150630,CFIIASU_KR,39,98.42257,0.83055
20150701-151654.csv:20150630,CFIIASU_TH,24,99.94482,0.20743
Thanks in advance.
Your question isn't clear but it sounds like this might be what you're trying to do (print all lines from the last csv mentioned in the input file):
$ tac file | awk -F':' 'NR>1 && $1!=prev{exit} {print; prev=$1}' | tac
20150701-151654.csv:20150630,CFIIASU,233,96.21450,0.44294
20150701-151654.csv:20150630,CFIIASU_AU,65,90.71109,0.28569
20150701-151654.csv:20150630,CFIIASU_CN,68,102.16538,0.07723
20150701-151654.csv:20150630,CFIIASU_ID,37,98.02484,0.27775
20150701-151654.csv:20150630,CFIIASU_KR,39,98.42257,0.83055
20150701-151654.csv:20150630,CFIIASU_TH,24,99.94482,0.20743
or maybe this (print the last line seen for every 20150630,CFIIASU etc. pair in the input file):
$ tac file | awk -F'[:,]' '!seen[$2,$3]++' | tac
20150701-151654.csv:20150630,CFIIASU,233,96.21450,0.44294
20150701-151654.csv:20150630,CFIIASU_AU,65,90.71109,0.28569
20150701-151654.csv:20150630,CFIIASU_CN,68,102.16538,0.07723
20150701-151654.csv:20150630,CFIIASU_ID,37,98.02484,0.27775
20150701-151654.csv:20150630,CFIIASU_KR,39,98.42257,0.83055
20150701-151654.csv:20150630,CFIIASU_TH,24,99.94482,0.20743
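Both commands above rely on the lines from the newest file coming last in the input. If that ordering is not guaranteed, one option is to sort by file name first. The following is only a minimal sketch, assuming the file names sort chronologically as plain text (as the YYYYMMDD-HHMMSS.csv names in the question do); the function name keep_latest is made up for illustration.

```shell
#!/bin/sh
# keep_latest: read grep-style "file:date,name,..." lines on stdin and
# print, for every (date, name) pair, the line from the newest file.
# Assumption: file names sort chronologically as plain text.
keep_latest() {
    # Newest file name first, then keep the first line seen for each pair.
    LC_ALL=C sort -t: -k1,1r |
        awk -F'[:,]' '!seen[$2,$3]++'
}

# Hypothetical usage, with "file" holding the grep output:
#   keep_latest < file
```

Note that the output is ordered by file name, newest first, rather than in the original input order.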
What I'm trying to do is to compare the content of two different files. I don't know what I'm doing wrong, but things I searched online regarding diff command didn't work.
For example if fileA content is this:
AAA:111
BBB:222
CCC:333
And fileB content is:
AAA:111
BBB:222
All I want to see as an output is the difference which is CCC:333. No "<" no ">", just plainly CCC:333. I want to use this later in the bash script I'm working on.
Also would it matter if those files were reversed? I mean if it was fileB containing CCC:333?
I don't know if it matters, but the files I'm working on are MAC addresses.
Is the diff command I was trying to use case sensitive?
You can use two diff options as follows :
diff --changed-group-format='%<' --unchanged-group-format='' fileA fileB
In case anyone else is looking for these answers, I just want to add that they both work!
The sort and uniq solution by Cyrus shows the lines that differ between the two files in either direction (for example, if both files have the lines aaa and bbb, but only one has xxx and only the other has yyy, it prints both xxx and yyy).
The diff command solution by Philippe can give different output depending on whether you put fileA first and then fileB, or fileB first and then fileA.
Test it yourself.
Correct me if I'm wrong please!
Thank you for your help.
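To make the difference between the two behaviours concrete, here is a small sketch using plain grep, sort and uniq. It is not the exact command from either answer; the function names only_in_first and in_either_only are made up for illustration, and the second one assumes that no line is repeated within a single file. Both compare lines byte for byte, so they are case sensitive; for MAC addresses it may be worth normalising the case first (for example with tr).

```shell
#!/bin/sh
# only_in_first A B: print the lines of A that do not occur in B.
# The order of the arguments matters.
only_in_first() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -vxFf "$2" "$1"
}

# in_either_only A B: print the lines that occur in exactly one of the files.
# The order of the arguments does not matter.
# Assumption: no line is repeated within a single file.
in_either_only() {
    sort "$1" "$2" | uniq -u
}
```

With the fileA and fileB from the question, only_in_first fileA fileB prints CCC:333 and only_in_first fileB fileA prints nothing, while in_either_only prints CCC:333 either way.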
I was banging my head into my wall looking for an easy way to do this but couldn't find anything online, so I figured I'd share the solution I came up with.
This is useful for when you need to break apart specific diff files.
So while working on a bigger company project that required the issue being broken into multiple tickets, I found myself needing to create separate diff/patch files for each ticket, even though I was working on the same branch with quite a few file changes. Each patch file needed to only contain the differences for specific files, which you can do with:
git diff master FILENAME_X FILENAME_Y FILENAME_Z (etc.)
but that was quite time consuming to do manually for each file every single time I needed to generate a patch, and it includes the possibility of me forgetting files every time.
The following command will create a diff/patch file of all files included in a different patch file:
`cat NAME_OF_INPUT_DIFF.diff | grep "+++" | awk -F "+++ b/" '{if (NR == 1)printf "git diff master " $2; else printf " " $2}'` > NAME_OF_OUTPUT_DIFF.diff
The cat NAME_OF_INPUT_DIFF.diff | grep "+++" breaks the previous diff into only the lines with the names of files that were added/modified, then awk -F "+++ b/" breaks that into just the parsed names of the files, stripping the preceding characters. The rest creates the lengthy git command to generate a diff of specific files, and wrapping it all in backticks makes it be evaluated on execution. Then > NAME_OF_OUTPUT_DIFF.diff outputs the resulting diff to a file.
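A slightly more defensive variant of the same idea, as a sketch only: it anchors the match at the start of the line so that ordinary added lines containing "+++" are not picked up, and it avoids using the file name as a printf format string. It assumes the default a/ and b/ prefixes and paths without spaces or newlines; the function name files_in_patch is made up for illustration.

```shell
#!/bin/sh
# files_in_patch: read a git-style patch on stdin and print the path of
# every file it adds or modifies, one per line, taken from the
# "+++ b/<path>" header lines.
# Assumption: default a/ and b/ prefixes, no newlines in paths.
files_in_patch() {
    sed -n 's|^+++ b/||p'
}

# Hypothetical usage. The unquoted $(...) is split into one argument per
# file on purpose, so this breaks on paths that contain spaces:
#   git diff master -- $(files_in_patch < NAME_OF_INPUT_DIFF.diff) > NAME_OF_OUTPUT_DIFF.diff
```

Deleted files show up as "+++ /dev/null" in a patch, so they are skipped here.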
Anyways, I hope this helps someone like it did me! Quite a timesaver.
I am trying to automate a redundant deployment process in my project. To achieve that, I am trying to get the difference between two branches using "git diff", and I am able to do so using the following command.
git diff <BRANCH_NAME1> -- common_folder_name/ <BRANCH_NAME2> -- common_folder_name/ > toStoreResponse.txt
Now the response that I get, looks something like below:
diff --git a/cmc-database/common/readme.txt b/cmc-database/common/readme.txt
index 7820f3d..5a0e484 100644
--- a/cmc-database/common/readme.txt
+++ b/cmc-database/common/readme.txt
@@ -1 +1,5 @@
-This folder contains common database scripts.
\ No newline at end of file
+This folder contains common database scripts.
+TEST STTESA
\ No newline at end of file
In the response above, the only text that is actually new between the two branches is TEST STTESA, and I want to store only that text in a separate text file using shell or git,
i.e. a file named readme.txt which contains only TEST STTESA.
Workaround solution:
I have found a workaround to filter the response, but it is not 100% what I am looking for. The command looks like this:
git diff <Branch_Name1> -- common-directory/ <Branch_Name2> -- common-directory/ | grep -v common-directory | grep -v index | grep -v @ | grep -v '\\'
The above command returns below response:
-This folder contains common database scripts.
+This folder contains common database scripts.
+TEST STTESA
But I want to be able to store only the difference which is TEST STTESA
As you can easily realize, your solution won't work every time. The grep -v parts make it unportable.
Here is a "step0" solution: you want to match lines that start with a "+" or a "-" whose next character is not the same sign (which skips the "+++" and "---" file headers). Use grep for that!
git diff ... | grep "^+[^+]\|^-[^-]"
Some explanation:
First, the \| part in the middle is an "or" operator.
Then, each side starts with a ^, which anchors the match at the beginning of the line. Finally, after the first character, we reject certain characters using the [^...] syntax.
The line above translates to English as "Run the diff, and find all the lines that either start with a +, followed by something that is not a +, OR start with a -, followed by something that is not a -".
This will not work properly if you remove a line that started with a -, nor if you add a line that starts with a +.
For such scenarios, I would tinker with git diff --color and grep for the [32m color code, just for fun.
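To get from the +/- lines down to only the text that is really new (TEST STTESA in the question), one possible follow-up is to drop every added line whose text also shows up as a removed line. This is a rough sketch, not a general diff parser: the function name new_lines_only is made up, and it has the same blind spot as the grep above, because a removed line whose own text starts with "-- " or an added line whose text starts with "++ " looks like a file header and is skipped.

```shell
#!/bin/sh
# new_lines_only: read a unified diff on stdin and print the text of the
# added lines, skipping any added line whose text also appears as a
# removed line (for example a line re-added after a newline-only change).
new_lines_only() {
    awk '
        /^(\+\+\+|---) / { next }                     # file header lines
        /^-/  { removed[substr($0, 2)] = 1; next }    # remember removed text
        /^\+/ { added[++n] = substr($0, 2) }          # keep added text in order
        END {
            for (i = 1; i <= n; i++)
                if (!(added[i] in removed)) print added[i]
        }
    '
}

# Hypothetical usage with the placeholders from the question:
#   git diff <BRANCH_NAME1> <BRANCH_NAME2> -- common_folder_name/ | new_lines_only > readme.txt
```

Because removed text is matched across the whole input, a diff covering several files can hide an added line in one file if the same text was removed from another.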
--diff-filter=[ACDMRTUXB*]
Select only files that are
A Added
C Copied
D Deleted
M Modified
R Renamed
T have their type (mode) changed
U Unmerged
X Unknown
B have had their pairing Broken
and * All-or-none
Problem: I have two folders (one is the Delta Folder, where the files get updated, and the other is the Original Folder, where the original files exist). Every time a file is updated in the Delta Folder, I need to merge the file from the Original Folder with the updated file from the Delta Folder.
Note: Although the file names in the Delta Folder and the Original Folder are the same, the content of the files may differ. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now, I need to merge Delta_Folder/1.properties with Original_Folder/1.properties so, my updated Original_Folder/1.properties will be:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for is:
Find all *.properties files in Delta-Folder and save the list to a temp file (delta-files.txt).
Find all *.properties files in Original-Folder and save the list to a temp file (original-files.txt).
Then I need to get the list of files that exist in both folders and loop over them.
For each file, I need to read each line of the property file (1.properties).
Then I need to read each line (delta-line="account.org.com.email=New-Email") from the property file in the delta folder and split the line on the delimiter "=" into two string variables.
(delta-line-string1=account.org.com.email; delta-line-string2=New-Email;)
Then I need to read each line (orig-line="account.org.com.email=Old-Email") from the property file in the original folder and split the line on the delimiter "=" into two string variables.
(orig-line-string1=account.org.com.email; orig-line-string2=Old-Email;)
If delta-line-string1 == orig-line-string1, then update $orig-line with $delta-line,
i.e.:
if account.org.com.email == account.org.com.email then replace
account.org.com.email=Old-Email in original folder/1.properties with
account.org.com.email=New-Email
Once the loop has gone through all lines in a file, it moves on to the next file. The loop continues until it has finished all common files in the folder.
For looping I used for loops, for splitting lines I used awk, and for replacing content I used sed.
Overall it works fine, but it takes a long time (4 minutes) to finish each file, because it goes through three loops for every line: splitting the line, finding the variable in the other file, and replacing the line.
I am wondering if there is any way to reduce the loops so that the script executes faster.
With paste and awk :
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1 :
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command + output :
paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command if sorting is not important :
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
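If the original line order matters, or values may themselves contain "=", a variant of the single awk command can split on the first "=" only and print the original file in order. This is a sketch under a few assumptions: the function name merge_properties is made up, the key is everything before the first "=", and keys that exist only in the delta file are appended at the end (drop the END block if they should be ignored instead).

```shell
#!/bin/sh
# merge_properties ORIGINAL DELTA: print ORIGINAL in its own line order,
# with every line whose key also appears in DELTA replaced by the DELTA
# line. Keys found only in DELTA are appended at the end.
merge_properties() {
    awk '
        { eq = index($0, "="); key = eq ? substr($0, 1, eq - 1) : $0 }
        # First file on the command line is DELTA: remember its lines.
        NR == FNR { if (!(key in delta)) order[++n] = key; delta[key] = $0; next }
        # Second file is ORIGINAL: swap in the DELTA line when the key matches.
        { if (key in delta) { print delta[key]; used[key] = 1 } else print }
        END { for (i = 1; i <= n; i++) if (!(order[i] in used)) print delta[order[i]] }
    ' "$2" "$1"
}

# Hypothetical loop over the files present in both folders:
#   for f in Delta_Folder/*.properties; do
#       o=Original_Folder/${f##*/}
#       [ -f "$o" ] && merge_properties "$o" "$f" > "$o.tmp" && mv "$o.tmp" "$o"
#   done
```

Each pair of files is read once by a single awk process, so there is no per-line loop in the shell.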
I think your two main options are:
Completely reimplement this in a more featureful language, like perl.
While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape & / and \ as mentioned in this answer.
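The sed-script idea could look roughly like the sketch below. It is one possible way to do it, not a tested production script: the function name build_sed_script and the file name script.sed are made up, keys are assumed to be everything before the first "=", and lines without "=" are skipped.

```shell
#!/bin/sh
# build_sed_script: read "key=value" lines (the delta file) on stdin and
# print one sed substitution per line that rewrites the matching
# "key=..." line of an original file.
build_sed_script() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in *=*) ;; *) continue ;; esac   # skip lines without "="
        key=${line%%=*}      # text before the first "="
        value=${line#*=}     # text after the first "="
        # Escape characters that are special in a basic regular expression.
        key_re=$(printf '%s\n' "$key" | sed 's|[][\.*^$/]|\\&|g')
        # Escape characters that are special in the replacement text.
        value_rep=$(printf '%s\n' "$value" | sed 's|[\&/]|\\&|g')
        # Emits: s/^\(key=\).*$/\1value/
        printf 's/^\\(%s=\\).*$/\\1%s/\n' "$key_re" "$value_rep"
    done
}

# Hypothetical usage: build the script once, then apply it in one pass.
#   build_sed_script < Delta_Folder/1.properties > script.sed
#   sed -f script.sed Original_Folder/1.properties
```

The original file is then read only once, however many keys the delta file contains.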
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.