Check if the contents of files are the same - bash

I have many .txt files (namely 1.txt, 2.txt, 3.txt, etc.) saved in a directory, and I want to check whether the contents of the files inside the directory are the same or not.
Every file should be compared with every other file; if the content is the same, print yes, and if it is not, print no.
For example:
1.txt
a
b
c
2.txt
a
b
c
3.txt
1
2
3
Expected output when comparing the two files 1.txt and 2.txt:
1.txt 2.txt yes
Expected output when comparing 1.txt and 3.txt:
1.txt 3.txt no
Expected output when comparing 2.txt and 3.txt:
2.txt 3.txt no
I tried this script:
#!/bin/sh
for file in /home/nir/dat/*.txt
do
    echo $file
    diff $file $file+1
done
But the problem here is that it does not give the expected output. Please suggest a better solution. Thanks.

Something like this in bash (the [[ "$i" < "$j" ]] test ensures each pair is compared only once, and no file is compared with itself):
for i in *
do
    for j in *
    do
        if [[ "$i" < "$j" ]]
        then
            if cmp -s "$i" "$j"
            then
                echo "$i" "$j" equal
            else
                echo "$i" "$j" differ
            fi
        fi
    done
done
Output:
1.txt 2.txt equal
1.txt 3.txt differ
2.txt 3.txt differ

One idea using an array of the filenames, and borrowing jamesbrown's cmp solution:
# load list of files into array flist[]
flist=(*)
# iterate through all combinations; '${#flist[@]}' ==> number of elements in array
for ((i=0; i<${#flist[@]}; i++))
do
    for ((j=i+1; j<${#flist[@]}; j++))
    do
        # default status = "no" (ie, are files the same?)
        status=no
        # if the files are identical, cmp -s exits 0 (success),
        # so the follow-on assignment (status=yes) is executed
        cmp -s "${flist[${i}]}" "${flist[${j}]}" && status=yes
        echo "${flist[${i}]} ${flist[${j}]} ${status}"
    done
done
For the 3 files listed in the question this generates:
1.txt 2.txt yes
1.txt 3.txt no
2.txt 3.txt no
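If the directory contains many files, comparing every pair with cmp reads each file many times. A sketch of an alternative, assuming bash 4+ (associative arrays) and that md5sum from GNU coreutils is available: hash each file once, then compare the cached hashes (hash equality stands in for content equality here):
# hash each file once
declare -A sum
for f in *.txt; do
    sum[$f]=$(md5sum < "$f" | cut -d' ' -f1)
done
# compare the cached hashes instead of re-reading file contents for every pair
flist=(*.txt)
for ((i=0; i<${#flist[@]}; i++)); do
    for ((j=i+1; j<${#flist[@]}; j++)); do
        status=no
        [[ ${sum[${flist[i]}]} == "${sum[${flist[j]}]}" ]] && status=yes
        echo "${flist[i]} ${flist[j]} ${status}"
    done
done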


How to delete files (with spaces in their names) that are older than 20 days, keeping the newest 4 in each group even if they are older than 20 days?

Example files (where the number 1 in a name marks the oldest file in its group):
reckless 1.txt
reckless 2.txt
reckless 3.txt
reckless 4.txt
reckless 5.txt
reckless 6.txt
reckless 7.txt
reckless 8.txt
confidence1.txt
confidence2.txt
confidence3.txt
choke-1.txt
choke-2.txt
choke-3.txt
choke-4.txt
choke-5.txt
choke-6.txt
choke-7.txt
choke-8.txt
choke-9.txt
choke-10.txt
cruel_1_1.txt
cruel_1_2.txt
cruel_1_3.txt
cruel_1_4.txt
cruel_1_5.txt
cruel_2_1.txt
cruel_2_2.txt
cruel_2_3.txt
cruel_2_4.txt
cruel_2_5.txt
cruel_2_6.txt
cruel_2_7.txt
level_1.txt
From this list, the following should be deleted:
reckless 1.txt
reckless 2.txt
reckless 3.txt
reckless 4.txt
choke-1.txt
choke-2.txt
choke-3.txt
choke-4.txt
choke-5.txt
choke-10.txt
cruel_1_1.txt
cruel_2_1.txt
cruel_2_2.txt
cruel_2_3.txt
I tried something similar from here: https://stackoverflow.com/a/20034914
But I did not understand how to make the sorting match the necessary condition.
Assuming each filename does not contain newline characters,
please try the following:
while IFS= read -r line; do
    array=()
    while IFS= read -r file; do
        # find's %T@ prints epoch seconds with a fractional part, hence [0-9.]
        [[ $file =~ ^[0-9.]+\ (.*)$ ]] && array+=("${BASH_REMATCH[1]}")
    done < <(
        find . -name "$line*.txt" -type f -mtime +20 -printf "%T@ %p\n" | sort
    )
    for (( i=0; i<${#array[@]}-4; i++ )); do
        rm -- "${array[$i]}"
    done
done < <(
    for f in *.txt; do
        [[ $f =~ ^(.*)[-_\ ]+[0-9]+\.txt$ ]] && echo "${BASH_REMATCH[1]}"
    done | uniq
)
[Explanations]
First of all, it removes the suffix numbers and extensions from the filenames
to extract a root name to use as a group name.
Next it finds the files of each group which are older than 20 days and
sorts them by mtime.
Then it removes the older files, preserving up to the latest four files in each
group.
If the filenames do contain newline characters, a small modification using the -z
option of the sort command will answer the requirement.
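As a quick check of the group-name extraction, applying the regex to one of the names from the question strips the trailing number and extension and leaves the group name:
f='cruel_2_7.txt'
[[ $f =~ ^(.*)[-_\ ]+[0-9]+\.txt$ ]] && echo "${BASH_REMATCH[1]}"
# prints: cruel_2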

Splitting CSV file into text files

I have a CSV file of the form:
1,frog
2,truck
3,truck
4,deer
5,automobile
and so on, for about 50 000 entries. I want to create 50 000 separate .txt files named with the number before the comma and containing the word after the comma, like so:
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile
and so on.
This is the script I've written so far, but it does not work properly:
#!/bin/bash
folder=/home/data/cifar10
for file in $(find "$folder" -type f -iname "*.csv")
do
    name=$(basename "$file" .txt)
    while read -r tag line; do
        printf '%s\n' "$line" >"$tag".txt
    done <"$file"
    rm "$file"
done
The issue is in your inner loop:
while read -r tag line; do
    printf '%s\n' "$line" > "$tag".txt
done < "$file"
You need to set IFS to , so that tag and line are parsed correctly:
while IFS=, read -r tag line; do
    printf '%s\n' "$line" > "$tag".txt
done < "$file"
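Note that with IFS=, and two variable names, read splits only at the first comma; anything after it, including further commas, stays in line. A quick demonstration (made-up input, not from the question's data):
printf '7,dog,brown\n' | while IFS=, read -r tag line; do
    echo "tag=$tag line=$line"
done
# prints: tag=7 line=dog,brown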
With Bash 4.0+, you can use shopt -s globstar instead of find. Unlike the unquoted $(find ...) loop, this approach is immune to word splitting and globbing:
shopt -s globstar nullglob
for file in /home/data/cifar10/**/*.csv; do
    while IFS=, read -r tag line; do
        printf '%s\n' "$line" > "$tag".txt
    done < "$file"
done
Note that the name variable set through the name=$(basename "$file" .txt) statement is never used in your code (and since the files match *.csv, the suffix to strip would be .csv, not .txt).
An awk alternative (the parentheses around the output-file expression keep the redirection unambiguous):
awk -F, '{print $2 > ($1 ".txt")}' file.csv
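With about 50,000 distinct output files, an awk that keeps every file open can run out of file descriptors. A hedged variant that closes each output file right after writing (close() is standard awk; this is safe here because each number appears only once, so no file is reopened and truncated):
awk -F, '{out = $1 ".txt"; print $2 > out; close(out)}' file.csv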
Another awk variation; note that this one only prints the mapping described in the question, it does not create the files:
awk 'BEGIN{FS=","} {print $1".txt contains: "$2}' file
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile

Compare File headers using Awk or Cmp

I have many flat files in one directory.
Each file has a header and some data in it.
I want to compare the header of one file with all the other files available in that directory.
This can be achieved using shell scripting, but I want to do it with a single line of code.
I tried it using the awk command, but it is comparing the whole file, not just the header.
for i in `ls -1 *a*` ; do cmp a.dat $i ; done
Can someone please let me know how I can do that?
Also, can it be achieved using awk?
I just need to check whether the header matches or not.
I would try this: grab the first line of every file, extract the unique lines, and count them. The result should be one.
number_uniq=$( awk 'FNR==1' * | sort -u | wc -l )
(awk 'FNR==1' prints the first line of each input file; the more obvious sed '1q' * would quit after the first line of the first file only.)
That won't tell you which file is different.
files=(*)
reference_header=$( sed '1q' "${files[0]}" )
for file in "${files[@]:1}"; do
    if [[ "$reference_header" != "$( sed '1q' "$file" )" ]]; then
        echo "wrong header: $file"
    fi
done
For what you describe, you can use md5 or cksum to take a signature of the bytes in the header.
Given 5 files (note that File 4.txt does not match):
$ for fn in *.txt; do echo "$fn:"; cat "$fn"; printf "\n\n"; done
File 1.txt:
what a great ride! it is a lovely day
/tmp/files/File 1.txt
File 2.txt:
what a great ride! it is a lovely day
/tmp/files/File 2.txt
File 3.txt:
what a great ride! it is a lovely day
/tmp/files/File 3.txt
File 4.txt:
what an awful ride! it is a horrible day
/tmp/files/File 4.txt
reference.txt:
what a great ride! it is a lovely day
/tmp/files/reference.txt
You can use md5 to get a signature and then check whether the other files are the same.
First get the reference signature:
$ sig=$(head -1 reference.txt | md5)
$ echo $sig
549560de062a87ec69afff37abe18d8f
Then loop through the files:
for fn in *.txt; do
    if [[ "$sig" != "$(head -1 "$fn" | md5)" ]]; then
        echo "header of \"$fn\" does not match"
    fi
done
Prints:
header of "File 4.txt" does not match
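Since the question also asks about awk, here is a minimal one-liner sketch (assuming every file is non-empty) that compares each file's first line against the first file's header. NR==1 is true only for the very first line read, which captures the reference header, while FNR==1 fires at the first line of every file:
awk 'NR==1 {hdr=$0} FNR==1 && $0!=hdr {print "wrong header: " FILENAME}' *.txt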

How can I merge files before moving them?

I have some files (a few million) and I keep the file list in files.txt like this:
/home/user/1.txt
/home/user/2.txt
/home/user/3.txt
/home/user/4.txt
/home/user/5.txt
I need to move them all, but before moving I must merge them too.
I can move them like this:
#!/bin/bash
for files in $(cat files.txt); do
    mv $files /home/user/hop/
done
I can merge everything with cat *, but I need to merge in pairs, like this:
1.txt and 2.txt merge --> 1.txt and move.
3.txt and 4.txt merge --> 3.txt and move.
5.txt --> 5.txt and move.
But I must do the merging before the move, in /home/user/, not in /home/user/hop/.
How can I do this?
You can use cat file1 file2 file3 file4 file5 file6 > out.txt after you have moved them; with this you can also set the order in which the files are merged.
This also works for binary files.
You can use this script (reading from files.txt, as in the question):
while read -r f; do
    if ((++i % 2)); then
        p="$f"
    else
        cat "$f" >> "$p"
        mv "$p" /home/user/hop/
        rm "$f"
        unset p
    fi
done < files.txt
[[ -n $p ]] && mv "$p" /home/user/hop/
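An equivalent sketch that reads two paths per iteration; it relies on read leaving its variable empty at end of input, so an odd leftover file is caught the same way:
while IFS= read -r a && IFS= read -r b; do
    cat "$b" >> "$a"              # append the second file of the pair onto the first
    rm -- "$b"
    mv -- "$a" /home/user/hop/
done < files.txt
# if the list had an odd count, the failed read of b leaves a holding the last path
[[ -n $a ]] && mv -- "$a" /home/user/hop/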

How do I use the `join` UNIX command? Nothing is ever output

I feel like I'm misunderstanding a very basic part of the join command, because I cannot get it to work (running OS X).
echo "testing" > 1.txt
echo "text" > 2.txt
join 1.txt 2.txt
No output.
Shouldn't it have "testing text" as the result?
You don't have a join because you don't have 'matching fields'. You need something like
echo "1 testing" > 1.txt
echo "1 text" > 2.txt
join 1.txt 2.txt
to create 1 testing text, because it 'joins' (or matches) on the shared key 1.
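Note that join also expects both inputs to be sorted on the join field; with unsorted input it can silently skip lines. A small made-up example with more than one key:
printf '1 a\n2 b\n' > 1.txt
printf '1 x\n2 y\n' > 2.txt
join 1.txt 2.txt
Prints:
1 a x
2 b y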
For what you're trying to do, use paste, not join:
> echo "testing" > 1.txt
> echo "text" > 2.txt
> paste 1.txt 2.txt
testing text
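paste glues corresponding lines together with a tab by default; the POSIX -d option selects a different delimiter:
> paste -d, 1.txt 2.txt
testing,text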
