How to find specific characters between two files? - cmd

I have two files (file1, file2) and I want to make a third one showing their differences using cmd, like this:
file1: qwertyuiop
file2: qwartyuioa
file3: chr(3)=e a
       chr(10)=p a
That is, the characters at positions 3 and 10 differ (e vs. a, and p vs. a).
Any good ideas?
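The related threads below all lean on Unix-style tools, so here is a minimal sketch of the idea in bash (bash rather than cmd, and assuming each file holds a single line; both are my assumptions, not the asker's setup):
a=$(<file1)   # e.g. qwertyuiop
b=$(<file2)   # e.g. qwartyuioa
for ((i = 0; i < ${#a}; i++)); do
    # compare position i of both lines; report the 1-based position on mismatch
    if [ "${a:i:1}" != "${b:i:1}" ]; then
        echo "chr($((i + 1)))=${a:i:1} ${b:i:1}"
    fi
done > file3
For the sample input this writes chr(3)=e a and chr(10)=p a to file3.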

Related

Making a list from data in a few variable bash script [duplicate]

I want to merge different lists with delimiter "-".
The first list has 2 words:
$ cat first
one
who
The second list has 10000 words:
$ cat second
languages
more
simple
advanced
home
expert
......
......
test
nope
I want to merge the two lists into every pairing, like this:
$ cat merge-list
one-languages
one-more
....
....
who-more
....
who-test
who-nope
....
Paste should do the trick.
paste is a Unix command line utility which is used to join files horizontally (parallel merging) by outputting lines consisting of the sequentially corresponding lines of each file specified, separated by tabs, to the standard output.
Example
paste -d - file1 file2
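Note that with the sample lists this pairs lines positionally, padding the shorter file with empty strings:
one-languages
who-more
-simple
-advanced
...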
EDIT:
I just saw that your two files have different lengths, and that you actually want every pairing (a cross product) rather than a line-by-line merge. Unfortunately paste does not help with that kind of problem. But you could of course use something like this:
while read -r i; do        # each word from file1...
    while read -r j; do    # ...paired with each word from file2
        echo "$i-$j"
    done < file2
done < file1

Combine .csv files on Mac OSX terminal does not use a new line in between

I have multiple csv files that I wish to merge into one.
a.csv
Field1,Field2,Field3
1,2,3
4,5,6
b.csv
Field4,Field5,Field6
7,8,9
10,11,12
When I run the following command on Mac OSX Terminal
cat *.csv >merged.csv
The files get concatenated as follows -
Field1,Field2,Field3
1,2,3
4,5,6Field4,Field5,Field6
7,8,9
10,11,12
However, I would like each file to start on a separate line:
Field1,Field2,Field3
1,2,3
4,5,6
Field4,Field5,Field6
7,8,9
10,11,12
How can this be done best?
cat *.csv + new line >merged.csv
The problem is that your first file (and probably the rest as well) doesn't have a newline at the end of the last line. In unix-style text files, every line is supposed to have a newline terminator at the end. Result: when you catenate the files together, there's no terminator at the end of the "4,5,6" line, so "Field4,Field5,Field6" gets treated as part of the same line.
Fortunately, there's a pretty simple solution: use something that processes (and appends) files line-by-line rather than just blindly sticking them together. Here's an example using awk:
awk '{print $0}' *.csv
BTW, I wouldn't recommend using the format somecmd *.csv >merged.csv, because merged.csv can wind up being both an input and output, leading to weird results. Whether this happens (and whether it matters) is complicated, but it's best to just avoid the issue by using a more specific wildcard pattern, putting the input and output in different directories, or something like that.
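For example, naming the input files explicitly (the two files from the question) sidesteps that issue, because merged.csv can then never match the input list:
awk '{print $0}' a.csv b.csv > merged.csv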

Comparing two files with accented characters (Mac OS / Terminal)

Goal: create a file listing all lines found in one file but not the other
OS: Mac OS X, using Terminal
Problem: lines contain accented characters (UTF-8) and comparison doesn't seem to work
I've used the following command for comparing both files:
comm -13 <(sort file1) <(sort file2) > file3
That command works fine except with lines in files containing accented characters. Would you have any solutions?
One non-optimal thing I've tried is to replace all accented characters with non-accented ones using sed -i, but that didn't seem to work on one of my two files, so I assume one file is oddly encoded. In fact, ü is displayed as u¨ when opening the file in TextMate, but correctly as ü in TextEdit. I had generated that file using find Photos/ -type f > list_photos.txt to list all the filenames, which contain accented characters. Maybe I should add another parameter to the find command in the first place? Any thoughts about this as well?
Many thanks.
Update:
I manually created text files with accented characters. The comm command worked without requiring LC_ALL. So the issue must be with the output of filenames into a text file (find command).
Test file A:
Istanbul 001 Mosquée Süleymaniye.JPG
Istanbul 002 Mosquée Süleymaniye.JPG
Test file B:
Istanbul 001 Mosquée Süleymaniye.JPG
Istanbul 002 Mosquée Süleymaniye - Angle.JPG
Istanbul 003 Ville.JPG
Comparison produces expected results. But when I generate those files automatically, I get, for instance, Su¨leymaniye in the text file. When I don't write to an output file, the terminal shows me the correct word Süleymaniye.
Many, many thanks for looking into it. Much appreciated.
You need to set the ENVIRONMENT for comm.
ENVIRONMENT
The LANG, LC_ALL, LC_COLLATE, and LC_CTYPE environment variables affect
the execution of comm as described in environ(7).
For example:
LC_COLLATE=C comm -13 <(sort file1) <(sort file2) > file3
or
LC_ALL=C comm -13 <(sort file1) <(sort file2) > file3
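As for the Su¨leymaniye you see in find's output: the Mac file system (HFS+) reports file names in a decomposed, NFD-style form, so ü comes back as u plus a combining diaeresis, and comm treats that as different from the composed ü typed by hand. One possible fix, assuming the iconv that ships with macOS (which understands the decomposed UTF-8-MAC encoding), is to normalize the list as you generate it:
# Convert decomposed (NFD) file names from find into composed (NFC) UTF-8
find Photos/ -type f | iconv -f UTF-8-MAC -t UTF-8 > list_photos.txt
After that, both files should contain composed characters and the comm comparison above should behave.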

Bash script compare values from 2 files and print output values from one file

I have two files like this:
File1
114.4.21.198,cl_id=1J3W7P7H0S3L6g85900g736h6_101ps
114.4.21.205,cl_id=1O3M7A7Q0S3C6h85902g7b3h7_101pf
114.4.21.205,cl_id=1W3C7Z7W0U3J6795197g177j9_117p1
114.4.21.213,cl_id=1I3A7J7N0M3W6e950i7g2g2i0_1020h
File2
cl_id=1B3O7M6C8T4O1b559i2g930m0_1165d
cl_id=1X3J7M6J0W5S9535180h90302_101p5
cl_id=1G3D7X6V6A7R81356e3g527m9_101nl
cl_id=1L3J7R7O0F0L74954h2g495h8_117qk
cl_id=1L3J7R7O0F0L74954h2g495h8_117qk
cl_id=1J3W7P7H0S3L6g85900g736h6_101ps
cl_id=1W3C7Z7W0U3J6795197g177j9_117p1
cl_id=1I3A7J7N0M3W6e950i7g2g2i0_1020h
cl_id=1Q3Y7Q7J0M3E62953e5g3g5k0_117p6
I want to find the cl_id values that exist in file1 but not in file2, and print the first field from file1 (the IP address).
The output should be like this:
114.4.21.198
114.4.21.205
114.4.21.205
114.4.21.213
114.4.23.70
114.4.21.201
114.4.21.211
120.172.168.36
I have tried awk, grep, diff, and comm, but nothing comes close. Please tell me the correct command to do this.
Thanks.
One proper way to do that is this:
grep -vFf file2 file1 | sed 's|,cl_id.*$||'
I do not see how you get your output. Where does 120.172.168.36 come from?
Here is one awk solution to compare the two files:
awk -F, 'NR==FNR {a[$0]++;next} !a[$2] {print $1}' file2 file1
114.4.21.205
(file2 is read first and its cl_id lines are hashed; for each file1 line, the IP in field 1 is printed only when the cl_id in field 2 was not seen in file2.)
Feed both files into AWK or perl with field separator=",". If there are two fields, add the fields to a dictionary/map/two arrays/whatever ("file1Lines"). If there is just one field (this is file 2), add it to a set/list/array/whatever ("file2Lines"). After reading all input:
Loop over the file1Lines. For each element, check whether the key part is present in file2Lines. If not, print the value part.
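A minimal awk sketch of that approach (assuming, as in the samples, that file1 lines contain exactly one comma and file2 lines contain none):
awk -F, '
    NF == 2 { ip[++n] = $1; key[n] = $2; next }   # file1 line: IP,cl_id
    NF == 1 { in2[$1] = 1 }                       # file2 line: bare cl_id
    END {
        for (i = 1; i <= n; i++)                  # preserve file1 order
            if (!(key[i] in in2))
                print ip[i]
    }' file1 file2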
This seems like what you want to do, and it should be efficient:
grep -Ff file2.txt file1.txt | cut -f1 -d,
First the grep takes the lines from file2.txt to use as patterns, and finds the matching lines in file1.txt. The -F is to use the patterns as literal strings rather than regular expressions, though it doesn't really matter with your sample.
Finally the cut takes the first column from the output, using , as the column delimiter, resulting in a list of IP addresses.
The output is not exactly the same as your sample, but the sample didn't make sense anyway, as it contains text that was not in any of the input files. Not sure if this is what you wanted or something more.

Find same words in two text files

I have two text files, each containing more than 50,000 lines. I need to find the words that appear in both text files. I tried the COMM command but got the answer "file 2 is not in sorted order". I tried to sort the file with the SORT command, but it doesn't work. I'm working on Windows. It doesn't have to be solved on the command line; it can be solved in some program or something else. Thank you for every idea.
If you want to sort the files you will have to use some sort of external sort (like merge sort) so that you have enough memory. As another way, you could go through the first file, store all its words in a hash table, then go through the second file and check each word against that table. If the words are actual words and not gibberish, the second method will work and be easier. Since the files are so large you may not want to use a scripting language, but it might work.
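A rough sketch of that hash-table approach, using awk (splitting words on whitespace and the file names are both assumptions):
awk '
    NR == FNR { for (i = 1; i <= NF; i++) words[$i] = 1; next }  # file 1: hash every word
    {
        for (i = 1; i <= NF; i++)        # file 2: print each common word once
            if ($i in words && !seen[$i]++)
                print $i
    }' first.txt second.txt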
If the words are not on their own line, then comm cannot help you.
If you have a set of unix utilities handy, like Cygwin (you mentioned comm, so you may have others as well), you can do:
$ tr -cs "[:alpha:]" "\n" < firstFile | sort > firstFileWords
$ tr -cs "[:alpha:]" "\n" < secondFile | sort > secondFileWords
$ comm -12 firstFileWords secondFileWords > commonWords
The first two lines turn each file into one word per line and sort the result.
If you're only interested in individual words, you can change sort to sort -u to get the unique set.
