Removing Lines of Text That Exist in Another File [duplicate] - windows

This question already has answers here:
How to remove the lines which appear on file B from another file A?
(12 answers)
Closed 9 years ago.
I’ve got two text files, each with several hundred lines. Some of the lines exist in both files, and I want to remove those so that they exist in only one of the files. Basically, I want to reduce them to get a unique set of lines. The catch is that I can’t sort them (they are stripped-down dumps of my Chromium history).
What is the easiest way to do this?
I tried WinDiff, but that gave incorrect results. I figure that I could knock together a PHP script in a while, but am hoping that there is an easier way (preferably a command-line tool).

Well, I ended up writing a PHP script after all.
I read both files into strings, then exploded the strings into arrays using \r\n as the delimiter. I then iterated through the arrays to remove any elements that exist in both, and finally dumped them back out to a file.
The only problem was that when I tried to refactor the stripping routine into a function, passing the array that gets changed (elements removed) by reference slowed it down to the point of needing to be Ctrl-C’d, so I just passed it by value and returned the new array (counterintuitive). Also, using unset to delete the elements was slow no matter what, so I just set each element to an empty string and skipped those during the dump.

If you have a bash shell (Cygwin), the following commands remove from a.txt every line that also appears in b.txt:
comm -12 <(sort -u a.txt) <(sort -u b.txt) | while IFS= read -r dupe; do
  # escape regex metacharacters so the line is matched literally
  dupe_escaped=$(printf '%s\n' "$dupe" | sed 's/[][\.*^$/]/\\&/g')
  # anchor with ^ and $ so only whole-line matches are deleted
  sed -i -e "/^${dupe_escaped}\$/d" a.txt
done
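A shorter alternative, as a minimal sketch: grep can do the whole-line comparison itself, without sorting either file, so the original order of a.txt is preserved. The function name remove_common_lines is only for illustration and is not from the answer above.

```shell
# Print the lines of the first file that do not appear in the second file.
# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from a file.
remove_common_lines() {
  grep -Fvx -f "$2" "$1"
}

# Example usage (file names are placeholders):
#   remove_common_lines a.txt b.txt > a.unique.txt
```

The output goes to a new file on purpose; redirecting it straight back onto a.txt would truncate a.txt before grep reads it.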

Related

Replace a whole line using sed [duplicate]

This question already has answers here:
Difference between single and double quotes in Bash
(7 answers)
Closed 4 years ago.
I am very new to all this and have used this website to help me find the answers I'm looking for.
I want to replace a line in multiple files across multiple directories. However I have struggled to do this.
I have created multiple directories 'path_{0..30}', each directory has the same 'input' file, and another file 'opt_path_rx_00i.xyz' where i corresponds to the directory that the file is in (i = {0..30}).
I need to be able to change one of the lines (line 7) in the input file, so that it changes with the directory that the input file is in (path_{0..30}). The line is:
pathfile opt_path_rx_00i.xyz
Where i corresponds to the directory that the file is in (i={0..30})
However, I'm struggling to do this using sed. I manage to change the line for each input file in the respective directories, but I'm unable to ensure that the number i changes with the directory. Instead, the input file in each directory just changes line 7 to:
pathfile opt_path_rx_00i.xyz
where i, in this case, is the letter i, and not the numbers {0..30}.
I'll show what I've done below to make this clearer.
for i in {0..30}
do
sed -i '7s/.*/pathfile-opt_path_rx_00$i.xyz/' path_$i/input
done
What I want to happen is, for example in directory path_3, line 7 in the input file will be:
pathfile opt_path_rx_003.xyz
Any help would be much appreciated
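Try double quotes. Inside single quotes the shell does not expand $i, so sed receives the literal text; inside double quotes $i is replaced by the loop value:
for i in {0..30}; do
sed -i "7s/.*/pathfile opt_path_rx_00$i.xyz/" "path_$i/input"
done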
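To see the difference without touching any files, here is a small sketch that only prints the script text sed would receive in each case. The variable names single_quoted and double_quoted are made up for the illustration.

```shell
i=3

# Single quotes: the shell passes $i through untouched.
single_quoted='7s/.*/pathfile opt_path_rx_00$i.xyz/'

# Double quotes: the shell substitutes the current value of i.
double_quoted="7s/.*/pathfile opt_path_rx_00$i.xyz/"

printf '%s\n' "$single_quoted"   # prints 7s/.*/pathfile opt_path_rx_00$i.xyz/
printf '%s\n' "$double_quoted"   # prints 7s/.*/pathfile opt_path_rx_003.xyz/
```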

Removing Timestamp from multiple files [duplicate]

This question already has answers here:
remove date from filename but keep the file extension
(2 answers)
Closed 4 years ago.
Is there a quick and clever way to remove various timestamps from multiple files with different names? The timestamp format always remains the same, although the values differ. An example of my files would be...
A_BB_CC_20180424_134312
A_B_20180424_002243
AA_CC_DD_E_20180424_223422
C_DD_W_E_D_20180423_000001
with the expected output
A_BB_CC
A_B
AA_CC_DD_E
C_DD_W_E_D
Notice the last file has a different timestamp. I don't mind whether the removal is specific to one day, covers all timestamps, or comes as two variations. My problem is that I can't think of the code for an ever-changing time value :(
Thanks in advance
EDIT - Adding edit in to show why this is not a duplicate as Tripleee thinks. His duplicate link is for files with the same prefix, my question is about files with different names so the answer is different.
Using the %% parameter expansion, which removes the longest suffix matching a pattern (it is POSIX, not bash-specific):
for i in /my/path/*; do mv "$i" "${i%%_2018*}"; done
This relies on the timestamp starting with 2018.
Using awk:
for i in /my/path/*; do mv "$i" "$(awk -v FS=_ 'NF-=2' OFS="_" <<< "$i")"; done
This awk script uses _ as the field separator and prints the filename without the last two fields, which hold the timestamp.
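If you would rather not depend on the year, a variation on the parameter expansion idea is to spell the timestamp out as a glob pattern. This is only a sketch: strip_timestamp is a hypothetical helper name, and it assumes the suffix is always _YYYYMMDD_HHMMSS as in the question.

```shell
# Print the given name with a trailing _YYYYMMDD_HHMMSS removed.
# Names without such a suffix are printed unchanged.
strip_timestamp() {
  local name="$1"
  local d='[0-9]'
  local date_pat="$d$d$d$d$d$d$d$d"   # eight digits: YYYYMMDD
  local time_pat="$d$d$d$d$d$d"       # six digits: HHMMSS
  # Unquoted expansions inside ${...%...} are treated as glob patterns.
  printf '%s\n' "${name%_${date_pat}_${time_pat}}"
}

# Example usage (the path is a placeholder):
#   for f in /my/path/*; do
#     new=$(strip_timestamp "$f")
#     [ "$f" != "$new" ] && mv -- "$f" "$new"
#   done
```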
To rename a set of files using regular expressions, you can use the rename command (this assumes the Perl-based rename; the util-linux rename uses a different syntax).
So in your example:
rename 's#_[0-9]+_[0-9]+$##' *_[0-9]*
This applies to all files in the current directory whose names contain _ followed by a digit.
It cuts away a trailing _ followed by digits followed by _ followed by digits.

sed rewrite leaving file blank [duplicate]

This question already has answers here:
Find and replace in file and overwrite file doesn't work, it empties the file
(12 answers)
Closed 6 years ago.
I am working on a bash script and running into a problem with sed leaving the file that I am using it to clean blank.
Here are the blocks that define the file and the function that I created to clean the file:
# Define Review Log file
reviewlog=/home/serverreview-$(date +%d%^b%y).txt
# Bleaches the Review Log of the color customization
bleach ()
{
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" $reviewlog >> $reviewlog
}
Using the >> it does append the info to the bottom of the file as it is supposed to. However if I use:
bleach ()
{
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" $reviewlog > $reviewlog
}
It leaves the output file totally blank.
This is expected. The shell opens the output file for writing, which truncates it, before sed even starts, so sed finds nothing left to read and the file stays empty.
A tool that opens the output file itself and happens to read a buffer before writing may appear to work for small inputs, but that is not something you want to depend on. If the tool does not have an in-place overwrite option, don't write its output onto its own input file.
You can write to a temporary output file and rename it over the input file, or rename the input file (or open it, then delete it) and then write to the expected location. Otherwise you have to make sure everything is read into memory first.
sed -i works the same way with or without a backup extension. See for example https://robots.thoughtbot.com/sed-102-replace-in-place which describes the -i variations.
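The temp-file approach could look roughly like this. It is a sketch, not the asker's final script: bleach takes the file as an argument here instead of using $reviewlog, the escape character is produced with printf rather than \x1B, and the character class is written as [mK] on the assumption that the | in the original was not meant literally.

```shell
# Strip ANSI color/erase sequences from a file without emptying it.
bleach() {
  local file="$1"
  local esc tmp
  esc=$(printf '\033')          # literal ESC character
  tmp=$(mktemp) || return 1
  # Write the cleaned text to a temp file first; the input is only
  # overwritten after sed has finished reading it.
  if sed -E "s/${esc}\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g" "$file" > "$tmp"; then
    cat "$tmp" > "$file"        # keeps the original file's permissions
  fi
  rm -f "$tmp"
}

# Example usage: bleach "$reviewlog"
```

With GNU sed, sed -i -r with the original expression would be the one-line equivalent.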

use sed with for loop to delete lines from text file

I am essentially trying to use sed to remove a few lines from a text document, to clean it up. But I'm not getting it right at all. I'm missing something and I have no idea what...
#!/bin/bash
items[0]='X-Received:'
items[1]='Path:'
items[2]='NNTP-Posting-Date:'
items[3]='Organization:'
items[4]='MIME-Version:'
items[5]='References:'
items[6]='In-Reply-To:'
items[7]='Message-ID:'
items[8]='Lines:'
items[9]='X-Trace:'
items[10]='X-Complaints-To:'
items[11]='X-DMCA-Complaints-To:'
items[12]='X-Abuse-and-DMCA-Info:'
items[13]='X-Postfilter:'
items[14]='Bytes:'
items[15]='X-Original-Bytes:'
items[16]='Content-Type:'
items[17]='Content-Transfer-Encoding:'
items[18]='Xref:'
for f in "${items[@]}"; do
sed '/${f}/d' "$1"
done
What I am thinking, incorrectly it seems, is that I can set up a for loop to check each item in the array that I want removed from the text file. But it's simply not working. Any ideas? I'm sure this is basic and simple, and yet I can't figure it out.
Thanks,
Marek
It is much better to run a single sed script than 19 small ones in sequence.
Fortunately, building one regex by joining the array elements is moderately easy in Bash:
regex=$(printf '\|%s' "${items[@]}")
regex=${regex#'\|'}
sed "/^\($regex\)/d" "$1"
(Notice also the addition of ^ to the final regex -- I assume you only want to match at beginning of line. The \( \) grouping makes the ^ apply to every alternative, not just the first one.)
Properly, you should not delete any lines from the message body, so the script should leave anything after the first empty line alone:
sed "1,/^\$/!b;/^\($regex\)/d" "$1"
Add -i if you want in-place editing of the target file.
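The same idea wrapped in a function, as a rough sketch. strip_headers is a hypothetical name, it uses extended regex syntax (sed -E) instead of \| and it assumes the header names contain no regex metacharacters, which holds for the list in the question.

```shell
# Usage: strip_headers FILE HEADER...
# Deletes lines starting with any of the given header names, but only
# within the header block (from line 1 up to the first empty line).
strip_headers() {
  local file="$1"
  shift
  local regex
  regex=$(printf '|%s' "$@")    # "|X-Received:|Path:|..."
  regex=${regex#'|'}            # drop the leading "|"
  sed -E "1,/^\$/{/^(${regex})/d;}" "$file"
}

# Example usage inside the original script:
#   strip_headers "$1" "${items[@]}"
```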

method for merging two files, opinion needed

Problem: I have two folders: a Delta folder, where the files get updated, and an Original folder, where the original files exist. Every time a file is updated in the Delta folder, I need to merge the file from the Original folder with the updated file from the Delta folder.
Note: Although the file names in the Delta folder and the Original folder are unique, the content of the files may differ. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now, I need to merge Delta_Folder/1.properties with Original_Folder/1.properties so, my updated Original_Folder/1.properties will be:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for is:
Find all *.properties files in Delta-Folder and save the list to a temp file (delta-files.txt).
Find all *.properties files in Original-Folder and save the list to a temp file (original-files.txt).
Then get the list of files that are unique in both folders and put those in a loop.
Then loop over each file to read each line from a property file (1.properties).
Then read each line (delta-line="account.org.com.email=New-Email") from a property file of the delta folder and split the line at the delimiter "=" into two string variables.
(delta-line-string1=account.org.com.email; delta-line-string2=New-Email;)
Then read each line (orig-line="account.org.com.email=Old-Email") from a property file of the original folder and split the line at the delimiter "=" into two string variables.
(orig-line-string1=account.org.com.email; orig-line-string2=Old-Email;)
If delta-line-string1 == orig-line-string1, then update $orig-line with $delta-line,
i.e.:
if account.org.com.email == account.org.com.email then replace
account.org.com.email=Old-Email in original folder/1.properties with
account.org.com.email=New-Email
Once the loop finishes going through all lines in a file, it moves on to the next file. The loop continues until it has processed all unique files in a folder.
For looping I used for loops, for splitting lines I used awk, and for replacing content I used sed.
Overall it works fine, but it takes a long time (4 minutes) to finish each file, because it goes through three loops for every line, splitting the line, finding the variable in the other file and replacing the line.
I am wondering if there is any way to reduce the loops so that the script executes faster.
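With paste and awk (note that this only works when the delta lines pair up line by line with the first lines of the original, and when no line contains whitespace):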
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1 :
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command + output :
paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command if sorting is not important :
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
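If the order of the original file matters, a two-pass awk along the same lines can keep it. This is a sketch under some assumptions: merge_properties is a made-up name, keys present only in the delta file are not added, and everything after the first = is treated as the value.

```shell
# Usage: merge_properties DELTA_FILE ORIGINAL_FILE
# Prints the original file with values replaced by those from the delta
# file wherever the key matches; line order of the original is kept.
merge_properties() {
  local delta="$1"
  local original="$2"
  awk '
    # First file (delta): remember key -> value, print nothing.
    FILENAME == ARGV[1] {
      eq = index($0, "=")
      if (eq > 0) new[substr($0, 1, eq - 1)] = substr($0, eq + 1)
      next
    }
    # Second file (original): swap in the delta value when the key is known.
    {
      eq = index($0, "=")
      key = (eq > 0) ? substr($0, 1, eq - 1) : ""
      if (eq > 0 && (key in new)) print key "=" new[key]
      else print
    }
  ' "$delta" "$original"
}

# Example usage:
#   merge_properties Delta_Folder/1.properties Original_Folder/1.properties > merged.properties
```

Each file is read exactly once, so the cost grows with the number of lines rather than with lines times lines.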
I think your two main options are:
Completely reimplement this in a more featureful language, like perl.
While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape & / and \ as mentioned in this answer.
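One way the sed-script idea could be sketched is below. delta_to_sed_script is a hypothetical helper; it escapes regex metacharacters in the key and escapes \, / and & in the value, and it reuses the key through a \1 back-reference so the key does not need to be escaped a second time.

```shell
# Usage: delta_to_sed_script DELTA_FILE
# Prints one sed substitution per key=value line of the delta file.
delta_to_sed_script() {
  local line key value key_re value_esc
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *=*) ;;                # only handle key=value lines
      *) continue ;;
    esac
    key=${line%%=*}          # text before the first "="
    value=${line#*=}         # text after the first "="
    # Escape regex metacharacters in the key.
    key_re=$(printf '%s\n' "$key" | sed 's|[][\.*^$/]|\\&|g')
    # Escape characters that are special in a sed replacement.
    value_esc=$(printf '%s\n' "$value" | sed 's|[\\/&]|\\&|g')
    printf 's/^\\(%s=\\).*$/\\1%s/\n' "$key_re" "$value_esc"
  done < "$1"
}

# Example usage (file names are placeholders):
#   delta_to_sed_script Delta_Folder/1.properties > merge.sed
#   sed -f merge.sed Original_Folder/1.properties > merged.properties
```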
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.
