Taking line intersection of several files - bash

I see comm can do 2 files and diff3 can do 3 files. I want to do this for more files (5-ish).
One way:
comm -12 file1 file2 >tmp1
comm -12 tmp1 file3 >tmp2
comm -12 tmp2 file4 >tmp3
comm -12 tmp3 file5
This process could be turned into a script:
comm -12 $1 $2 > tmp1
for i in $(seq 3 1 $# 2>/dev/null); do
comm -12 tmp`expr $i - 2` $(eval echo '$'$i) >tmp`expr $i - 1`
done
if [ $# -eq 2 ]; then
cat tmp1
else
cat tmp`expr $i - 1`
fi
rm tmp*
This seems like poorly written code, even to a newbie like me. Is there a better way?

It's quite a bit more convoluted than it has to be. Here's another way of doing it:
#!/bin/bash
# Create some temp files to avoid trashing and deleting tmp* in the directory
tmp=$(mktemp)
result=$(mktemp)
# The intersection of one file is itself
cp "$1" "$result"
shift
# For each additional file, intersect with the intermediate result
for file
do
comm -12 "$file" "$result" > "$tmp" && mv "$tmp" "$result"
done
cat "$result" && rm "$result"
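One caveat: comm requires both inputs to be lexically sorted. A sketch of the same loop, wrapped in a function (the name intersect_all is mine) that sorts each input on the fly, assuming bash for the process substitution:

```shell
#!/bin/bash
# intersect_all: print lines common to all of the given files.
# comm needs sorted input, so each file is sorted as it is read.
intersect_all() {
    local tmp result file
    tmp=$(mktemp); result=$(mktemp)
    sort "$1" > "$result"
    shift
    # For each additional file, intersect it with the running result
    for file; do
        comm -12 <(sort "$file") "$result" > "$tmp" && mv "$tmp" "$result"
    done
    cat "$result"
    rm -f "$tmp" "$result"
}
```

For example, intersect_all file1 file2 file3 prints only the lines present in all three files, regardless of their original order.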

Related

Bash script that compares 2 csv files, old and new: how to check that the new file's line count is at least x% of the old file's?

As of now, the way I am writing the script is to count the number of lines in the 2 files.
Then I put the counts through a condition checking whether the new count is greater than the old.
However, I am not sure how to compare based on a percentage of the old file.
Is there a better way to design the script?
#!/bin/bash
declare -i new=$(< "$(ls -t file name*.csv | head -n 1)" wc -l)
declare -i old=$(< "$(ls -t file name*.csv | head -n 2 | tail -n 1)" wc -l)
echo "$new"
echo "$old"
if [ "$new" -gt "$old" ]; then
echo "okay"
else
echo "fail"
fi
If you need to check for at most x% changed lines, you can count the number of '<' lines in the diff output. Recall that the diff output will look like:
+ diff node001.html node002.html
2,3c2,3
< 4
< 7
---
> 2
> 3
So the code will look like:
old=$(wc -l < file1)
diff1=$(diff file1 file2 | grep -c '^<')
pct=$((diff1*100/(old-1)))
# Check Percent
if [ "$pct" -gt 60 ] ; then
...
fi
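Putting the two pieces together, here is a minimal sketch as a function (the name pct_changed and the sample threshold are mine, not from the question); it returns the percentage of the old file's lines that diff reports as changed or removed:

```shell
#!/bin/bash
# pct_changed OLD NEW: print the percentage of OLD's lines that were
# changed or removed in NEW, based on '<' lines in the diff output.
pct_changed() {
    local old changed
    old=$(wc -l < "$1")
    changed=$(diff "$1" "$2" | grep -c '^<')
    echo $(( changed * 100 / old ))
}
```

A typical check would then be: [ "$(pct_changed old.csv new.csv)" -gt 60 ] && echo "fail".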

Execute a program over all pairs of files in a directory using bash script

I have a directory with a bunch of files. I need to create a bash file to qsub and run a program over all pairs of all files:
for $file1, $file2 in all_pairs
do
/path/program -i $file1 $file2 -o $file1.$file2.result
done
So I could do:
qsub script.sh
to get:
file1.file2.result
file1.file3.result
file2.file3.result
for directory with:
file1
file2
file3
The following is probably the easiest.
If the pair a-b is different from the pair b-a (ordered pairs):
set -- file1 file2 file3 file4 ...
for f1; do
for f2; do
/path/program -i "$f1" "$f2" -o "$f1.$f2.result"
done
done
If the pair a-b is the same as the pair b-a (unordered pairs):
set -- file1 file2 file3 file4 ...
for f1; do
shift
for f2; do
/path/program -i "$f1" "$f2" -o "$f1.$f2.result"
done
done
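To see why the shift version yields each unordered pair exactly once, here is the same pattern with echo and three placeholder names:

```shell
#!/bin/bash
# Demo: the shift trick enumerates each unordered pair once.
set -- a b c
for f1; do        # "$@" is expanded once, when the loop starts
    shift         # drop f1 from the remaining positional parameters
    for f2; do    # iterates over what is left after the shift
        echo "$f1-$f2"
    done
done
# prints: a-b, a-c, b-c (each on its own line)
```

The key detail is that the outer `for f1` captures the full list before any shift runs, while the inner `for f2` re-expands the shrinking "$@" on every iteration.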
You can do it as in every other programming language:
files=(file1 file2 file3) # or use a glob to list the files automatically, for instance files=(*)
max="${#files[@]}"
for ((i=0; i<max; i++)); do
for ((j=i+1; j<max; j++)); do
echo -i "${files[i]}" "${files[j]}" -o "${files[i]}.${files[j]}.result"
done
done
Replace echo with /path/program when you are happy with the result.

How to copy files that have 100 lines or more to another folder

I want to copy files that have 100 lines or more to another folder:
$ cd "/home/john/folder a"
$ wc -l *
10 file1.txt
50 file2.txt
100 file3.txt
150 file4.txt
I want to copy file3.txt and file4.txt (files that have 100 lines or more) to folder /home/john/folder b.
Can someone help me please? Thank you very much.
Try this:
declare -i numfile
for f in *; do
numfile=$([ -f "$f" ] && wc -l < "$f")
[ $numfile -ge 100 ] && cp "$f" otherdir
done
For each file in the current directory, numfile is assigned the number of lines in the file.
If numfile is greater than or equal to 100, the file is copied to otherdir.
Edit:
As William Pursell mentioned, a more robust approach is to test whether the item is a regular file before running the comparison and copy commands:
for f in *; do
if [ -f "$f" ]; then
[ "$(wc -l < "$f")" -ge 100 ] && cp "$f" otherdir
fi
done
Here is one more:
# Assuming that we are in source folder ...
cp $(wc -l *|grep -Eo '[0-9]{3,} (.*)'|head -n -1|cut -d ' ' -f 2) /dev/null "/home/john/folder b"
The head removes the total line printed by wc, and the /dev/null takes care of the case where no files match the criterion.
Of course this solution - like the others presented here - will run into problems if your source directory contains so many files that the maximum command-line length is exceeded.
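If that argument-length limit is a concern, one sketch is to copy one file at a time from a find pipeline; the function name copy_big is mine, and -maxdepth is a common GNU/BSD find extension rather than strict POSIX:

```shell
#!/bin/sh
# copy_big SRC_DIR DST_DIR MIN_LINES: copy each regular file in SRC_DIR
# with at least MIN_LINES lines into DST_DIR, one cp per file, so no
# command line ever has to hold the whole file list.
copy_big() {
    find "$1" -maxdepth 1 -type f | while IFS= read -r f; do
        if [ "$(wc -l < "$f")" -ge "$3" ]; then
            cp "$f" "$2/"
        fi
    done
}
```

For the question's layout this would be called as: copy_big "/home/john/folder a" "/home/john/folder b" 100. (Reading find output line by line still assumes no newlines in file names.)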
Try something like this (POSIX sh):
#!/bin/sh
SOURCE_FOLDER="/home/john/folder a"
COPY_TO="/home/john/folder b"
for dir in "$SOURCE_FOLDER" "$COPY_TO"; do
if [ ! -d "$dir" ]; then
echo "Directory ${dir} does not exist." >&2
exit 1
fi
done
if [ "x`ls -A "$SOURCE_FOLDER"`" = "x" ]; then
echo "Directory '${SOURCE_FOLDER}' is empty." >&2
exit 1
fi
for file in "$SOURCE_FOLDER"/*; do
LINES=`wc -l < "$file"`
echo "File ${file} has ${LINES} lines..."
if [ "$LINES" -ge 100 ]; then
echo "Copying ${file}..."
cp -a "$file" "${COPY_TO}/"
fi
done
Here's a Bash version for systems with Bash (you said Unix, not Linux, so you probably want the top version):
#!/bin/bash
SOURCE_FOLDER="/home/john/folder a"
COPY_TO="/home/john/folder b"
for dir in "$SOURCE_FOLDER" "$COPY_TO"; do
if [[ ! -d "$dir" ]]; then
echo "Directory ${dir} does not exist." >&2
exit 1
fi
done
if [[ -z "$(ls -A "$SOURCE_FOLDER")" ]]; then
echo "Directory '${SOURCE_FOLDER}' is empty." >&2
exit 1
fi
for file in "$SOURCE_FOLDER"/*; do
LINES="$(wc -l < "$file")"
echo "File ${file} has ${LINES} lines..."
if [[ "$LINES" -ge 100 ]]; then
echo "Copying ${file}..."
cp -a "$file" "${COPY_TO}/"
fi
done
I tested it like this:
$ mkdir "folder a"
$ mkdir "folder b"
$ chmod +x script.sh
$ cd folder\ a/
$ seq 1 1000 > file1.txt
$ seq 1 1000 > file2.txt
$ seq 1 100 > file4.txt
$ seq 1 100 > file3.txt
$ seq 1 99 > file4.txt
$ seq 1 1 > file5.txt
$ seq 1 20 > file6.txt
$ cd ..
$ ./script.sh
File /.../dev/scratch/stack/folder a/file1.txt has 1000 lines...
Copying /.../dev/scratch/stack/folder a/file1.txt...
File /.../dev/scratch/stack/folder a/file2.txt has 1000 lines...
Copying /.../dev/scratch/stack/folder a/file2.txt...
File /.../dev/scratch/stack/folder a/file3.txt has 100 lines...
Copying /.../dev/scratch/stack/folder a/file3.txt...
File /.../dev/scratch/stack/folder a/file4.txt has 99 lines...
File /.../dev/scratch/stack/folder a/file5.txt has 1 lines...
File /.../dev/scratch/stack/folder a/file6.txt has 20 lines...
I encourage you to step through this line by line, figuring out how each part of the script works, to help you with similar tasks in the future.
I'll explain a few things here:
wc -l < "$file" gives us only the line count of the file, without the filename.
[ "$LINES" -ge 100 ] is true if there are at least 100 lines in the file.
echo "..." >&2 outputs a line to standard error rather than standard output.
cp -a copies a file while retaining all of its attributes, such as owner, permissions, and modification time.
Make sure to quote all variables unless you have a good reason not to, to prevent issues with whitespace.
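The difference described in the first bullet is easy to see directly; a tiny demo with a temporary file:

```shell
#!/bin/bash
# wc -l FILE prints the count followed by the file name;
# wc -l < FILE reads stdin, so wc has no name to print.
f=$(mktemp)
seq 1 5 > "$f"
wc -l "$f"       # count plus file name
wc -l < "$f"     # count alone (may have leading spaces on some systems)
rm -f "$f"
```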

Detecting when a new file arrives in a folder with "comm"

I'm using comm in an infinite loop to detect when a new file arrives in a folder, but I don't get just the difference between the 2 files. For example, when file "a" arrives, I see this output:
a a.out a.txt b.txt test.cpp testshell.sh
a.out a.txt b.txt test.cpp testshell.sh
My code is this:
#! /bin/ksh
ls1=$(ls);
echo $ls1 > a.txt;
while [[ 1 > 0 ]] ; do
ls2=$(ls);
echo $ls2 > b.txt;
#cat b.txt;
#sort b.txt > b.txt;
#diff -u a.txt b.txt;
#diff -a --suppress-common-lines -y a.txt b.txt
comm -3 a.txt b.txt;
printf "\n";
ls1=$ls2;
echo $ls1 > a.txt;
#cat a.txt;
#sleep 2;
#sort a.txt > a.txt;
done
Thanks!
#! /bin/ksh
set -vx
PreCycle="$( ls -1 )"
while true
do
ThisCycle="$( ls -1 )"
printf '%s\n%s\n' "${PreCycle}" "${ThisCycle}" | sort | uniq -u
PreCycle="${ThisCycle}"
sleep 10
done
This gives both the added and the removed entries without using files. It could report only the new files the same way, but uniq -f 1 failed (I don't understand why) when used on a list whose entries are prefixed with + or - depending on the source.
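Since the question already uses comm, the listing comparison itself can also be done with comm on process substitutions (supported by bash and ksh93), again with no temporary files. A sketch, with the helper name dir_delta being mine:

```shell
#!/bin/bash
# dir_delta OLD_LISTING NEW_LISTING: print entries that differ between
# two newline-separated, sorted listings. Column 1 = removed since the
# old listing; column 2 (tab-indented, comm's default) = newly added.
dir_delta() {
    comm -3 <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}
```

In the polling loop you would keep the previous listing in a variable (prev=$(ls -1)), call dir_delta "$prev" "$cur" each cycle, then set prev=$cur before sleeping.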

Nested for loop comparing files

I am trying to write a bash script that looks at two files with the same name, each in a different directory.
I know this can be done with diff -r; however, I would like to take everything that is in the second file that is not in the first file and output it into a new file (also with the same file name).
I have written a (nested) loop with a grep command, but it's not good and gives back a syntax error:
#!/bin/bash
FILES=/Path/dir1/*
FILES2=/Path/dir2/*
for f in $FILES
do
for i in $FILES2
do
if $f = $i
grep -vf $i $f > /Path/dir3/$i
done
done
Any help much appreciated.
Try this:
#!/bin/bash
cd /Path/dir1/
for f in *; do
comm -13 <(sort "$f") <(sort "/Path/dir2/$f") > "/Path/dir3/$f"
done
The if syntax in shell is:
if test_command; then commands; fi
The commands are executed if test_command's exit code is 0:
if [ $f = $i ]; then grep ... ; fi
But in your case it will be more efficient to derive the file name directly:
for i in $FILES; do
f=/Path/dir2/`basename $i`
grep
done
Finally, comm may be more efficient than grep -v:
comm -13 <(sort $f) <(sort $i)
comm -13 will get everything which is in the second file and not in the first; comm without arguments generates 3 columns of output: the first is lines only in the first file, the second lines only in the second, and the third lines common to both.
-13 (or -1 -3) removes the first and third columns.
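A quick demonstration of the three columns and of -13, using two sorted throwaway files:

```shell
#!/bin/bash
# comm needs sorted input. Column 2 is indented by one tab,
# column 3 by two tabs (comm's default delimiter).
f1=$(mktemp); f2=$(mktemp)
printf 'a\nb\nc\n' > "$f1"
printf 'b\nc\nd\n' > "$f2"
comm "$f1" "$f2"       # column 1: a; column 2: d; column 3: b, c
comm -13 "$f1" "$f2"   # suppress columns 1 and 3: prints only d
rm -f "$f1" "$f2"
```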
#!/bin/bash
DIR1=/Path/dir1
DIR2=/Path/dir2
DIR3=/Path/dir3
for f in $DIR1/*
do
for i in $DIR2/*
do
if [ "$(basename $f)" = "$(basename $i)" ]
then
grep -vf "$i" "$f" > "$DIR3/$(basename $i)"
fi
done
done
This assumes no special characters in filenames (e.g., whitespace; use double quotes if that is unacceptable):
a=/path/dir1
b=/path/dir2
for i in $a/*; do test -e $b/${i##*/} &&
diff $i $b/${i##*/} | sed -n '/^< /s///p'; done
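About the sed expression: -n suppresses default printing, /^< / matches diff's "only in the first file" lines, the empty s/// reuses that regex to strip the "< " prefix, and p prints the result. Note this extracts lines unique to the first file; for "in the second file but not the first", as the question asked, match '^> ' instead. A small demo:

```shell
#!/bin/bash
# Extract diff's one-sided lines with their "< " / "> " markers stripped.
f1=$(mktemp); f2=$(mktemp)
printf '1\n2\n3\n' > "$f1"
printf '2\n3\n4\n' > "$f2"
diff "$f1" "$f2" | sed -n '/^< /s///p'   # lines only in the first file
diff "$f1" "$f2" | sed -n '/^> /s///p'   # lines only in the second file
rm -f "$f1" "$f2"
```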