Merge files with sort -m and give an error if the files are not pre-sorted? - sorting

I need some help here.
I have two files:
file1.txt >
5555555555
1111111111
7777777777
file2.txt >
0000000000
8888888888
2222222222
4444444444
3333333333
When I run
$ sort -m file1.txt file2.txt > file-c.txt
the output file-c.txt contains file1 and file2 merged together, but it is not sorted.
file-c.txt >
0000000000
5555555555
1111111111
7777777777
8888888888
2222222222
4444444444
3333333333
When this happens I need an error saying that the files (file1 and file2) are not sorted and cannot be merged until they have been sorted. So when I run $ sort -m file1.txt file2.txt > file-c.txt, I should get an error saying that file1 and file2 cannot be merged into file-c because they are not yet sorted.
Hope you guys understand me :D

If I understand what you're asking, you could do this:
DIFF1=$(diff <(cat file1.txt) <(sort file1.txt))
DIFF2=$(diff <(cat file2.txt) <(sort file2.txt))
if [ "$DIFF1" != "" ]; then
    echo 'file1 is not sorted'
elif [ "$DIFF2" != "" ]; then
    echo 'file2 is not sorted'
else
    sort -m file1.txt file2.txt
fi
This works in Bash (and other shells that support process substitution) and does the following:
Set the DIFF1 variable to the output of a diff of a cat and a sort of file1 (this will be empty if the cat and the sort produce the same output, meaning the file is sorted)
Set the DIFF2 variable in the same manner as DIFF1 but for file2
Do a simple if .. elif .. else to check whether both file1 and file2 are sorted, and if so, do a command-line merge sort of the two
Is this what you were looking for?
EDIT: Alternatively, per @twalberg, if your version of sort supports it, you can do this:
if ! sort -c file1.txt; then
    echo 'file1 is not sorted'
elif ! sort -c file2.txt; then
    echo 'file2 is not sorted'
else
    sort -m file1.txt file2.txt
fi
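Putting the pieces together, here's a minimal sketch (assuming a sort that supports -c and -m, and the asker's filenames) that refuses to merge and prints the requested error when an input isn't pre-sorted:

```shell
# Recreate the asker's (unsorted) example files.
printf '5555555555\n1111111111\n7777777777\n' > file1.txt
printf '0000000000\n8888888888\n2222222222\n4444444444\n3333333333\n' > file2.txt

merge_sorted() {
    # Refuse to merge unless every input passes the pre-sorted check.
    for f in "$@"; do
        if ! sort -c "$f" 2>/dev/null; then
            echo "cannot merge: $f is not sorted" >&2
            return 1
        fi
    done
    sort -m "$@"
}

merge_sorted file1.txt file2.txt > file-c.txt \
    || echo "merge aborted: inputs are not sorted"
```

With the unsorted inputs above this prints the error and leaves file-c.txt empty; sort the inputs first and the same call produces a correctly merged file-c.txt.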

Related

join file based on two columns for ALL files in directory

I have four files in my directory: say a.txt; b.txt; c.txt; d.txt. I would like to join every file with all other files based on two common columns (i.e. join a.txt with b.txt, c.txt and d.txt; join b.txt with a.txt, c.txt and d.txt; join c.txt with a.txt, b.txt and d.txt). To do this for two of the files I can do:
join -j 2 <(sort -k2 a.txt) <(sort -k2 b.txt) > a_b.txt
How do I write this in a loop for all files in the directory? I've tried the code below but that's not working.
for i j in *; do join -j 2 <(sort -k2 $i) <(sort -k2 $j) > ${i_j}.txt
Any help/direction would be helpful! Thank you.
This might be a way to do it:
#!/bin/bash
files=( *.txt )
for i in "${files[@]}"; do
    for j in "${files[@]}"; do
        if [[ "$i" != "$j" ]]; then
            join -j 2 <(sort -k2 "$i") <(sort -k2 "$j") > "${i%.*}_$j"
        fi
    done
done
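If each unordered pair is only needed once (a_b.txt but not also b_a.txt), the inner loop can start past the outer index. A sketch with hypothetical two-column sample data, where the join field is column 2:

```shell
#!/bin/bash
mkdir -p join_demo && cd join_demo
# Hypothetical sample files; column 2 is the common key.
printf 'x1 k1\nx2 k2\n' > a.txt
printf 'y1 k1\ny2 k2\n' > b.txt
printf 'z1 k1\n' > c.txt

files=( *.txt )
for (( i = 0; i < ${#files[@]}; i++ )); do
    for (( j = i + 1; j < ${#files[@]}; j++ )); do
        a=${files[i]} b=${files[j]}
        # Strip the .txt extensions to build names like a_b.txt.
        join -j 2 <(sort -k2 "$a") <(sort -k2 "$b") > "${a%.*}_${b%.*}.txt"
    done
done
```

This produces a_b.txt, a_c.txt, and b_c.txt but skips the reversed duplicates.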

Bash script to add numbers from all files (each containing an integer) in a directory

I have many .txt files in a directory. Each file has only an integer.
How to write a bash script to add these integers and save the output to a file?
Just loop through the files extracting their integers, then sum them:
grep -ho '[0-9]*' files* | awk '{sum+=$1} END {print sum}'
Explanation
grep -ho '[0-9]*' files* extract numbers from the files whose name matches files*. We use -h to prevent getting the file name of the match and -o to just get the match, not the whole line.
awk '{sum+=$1} END {print sum}' loop through the values coming from grep and sum them. Finally, print the result.
Test
$ tail a*
==> a1 <==
hello 23 asd
asdfasfd
==> a2 <==
asdfasfd
is 15
==> a3 <==
$ grep -ho '[0-9]*' a* | awk '{sum+=$1} END {print sum}'
38
You can cat your files and then sum up using awk:
cat *.txt | awk '{x+=$0}END{print x}' > test.txt
test.txt should contain the sum.
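One more compact sketch under the same assumption (each .txt file holds a single integer): paste joins the values into one arithmetic expression, which the shell then evaluates:

```shell
#!/bin/bash
mkdir -p sum_demo && cd sum_demo
# Hypothetical sample files, one integer each.
echo 10 > a.txt
echo 20 > b.txt
echo 12 > c.txt

# paste -sd+ turns the values into the expression "10+20+12",
# which $(( ... )) then evaluates.
sum=$(( $(cat *.txt | paste -sd+ -) ))
echo "$sum" > test.txt
```

This avoids awk entirely, at the cost of requiring that every file really contains a plain integer.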
Create some test files:
$ for f in {a,b,c,d}.txt; do
> echo $RANDOM > "$f"
> done
$ cat *.txt
18419
25511
31919
28810
Sum it using Bash:
$ i=0;
$ for f in *.txt; do
> ((i+=$(<"$f")));
> done
$ echo $i
104659

Check whether a new file has arrived in a folder with "comm"

I'm using comm in an infinite loop to see whether a new file has arrived in a folder, but instead of just the difference between the two listings, for example when a file "a" arrives, I see this output:
a a.out a.txt b.txt test.cpp testshell.sh
a.out a.txt b.txt test.cpp testshell.sh
my Code is this:
#! /bin/ksh
ls1=$(ls);
echo $ls1 > a.txt;
while [[ 1 > 0 ]] ; do
ls2=$(ls);
echo $ls2 > b.txt;
#cat b.txt;
#sort b.txt > b.txt;
#diff -u a.txt b.txt;
#diff -a --suppress-common-lines -y a.txt b.txt
comm -3 a.txt b.txt;
printf "\n";
ls1=$ls2;
echo $ls1 > a.txt;
#cat a.txt;
#sleep 2;
#sort a.txt > a.txt;
done
THANKS
#! /bin/ksh
set -vx
PreCycle="$( ls -1 )"
while true
do
    ThisCycle="$( ls -1 )"
    # join the two snapshots with a newline, then keep only unique lines
    echo "${PreCycle}"$'\n'"${ThisCycle}" | sort | uniq -u
    PreCycle="${ThisCycle}"
    sleep 10
done
This gives both added and removed differences without using files. It could report only new files the same way, but uniq -f 1 failed (I don't understand why) when used on a list whose lines are prefixed with + or - depending on the source.
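For the original goal, reporting only newly arrived files, comm can compare the two snapshots directly if each is fed one name per line through process substitution. A bash sketch with hypothetical file names:

```shell
#!/bin/bash
mkdir -p watch_demo && cd watch_demo
touch a.out a.txt b.txt
PreCycle=$(ls -1)          # first snapshot

touch newfile              # a file arrives between snapshots
ThisCycle=$(ls -1)         # second snapshot

# comm needs sorted input; ls -1 output is already sorted.
# -13 suppresses lines unique to the first snapshot and common lines,
# leaving only names that appeared since the previous cycle.
new_files=$(comm -13 <(printf '%s\n' "$PreCycle") <(printf '%s\n' "$ThisCycle"))
echo "$new_files"
```

Deleted files could be reported the same way with comm -23.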

bash script: check if all words from one file are contained in another, otherwise issue error

I was wondering if you could help. I am new to bash scripting.
I want to be able to compare two lists. File1.txt will contain a list of a lot of parameters and file2.txt will only contain a section of those parameters.
File1.txt
dbipAddress=192.168.175.130
QAGENT_QCF=AGENT_QCF
QADJUST_INVENTORY_Q=ADJUST_INVENTORY_Q
QCREATE_ORDER_Q=CREATE_ORDER_Q
QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
File2.txt
AGENT_QCF
ADJUST_INVENTORY_Q
CREATE_ORDER_Q
I want to check if all the Qs in file1.txt are contained in file2.txt (after the =). If they aren't, then the bash script should stop and echo a message.
So, in the example above the script should stop as File2.txt does not contain the following Q: LOAD_INVENTORY_Q.
The Qs in file1.txt or file2.txt do not follow any particular order.
The following command will print out lines in file1.txt with values (anything appearing after =) that do not appear in file2.txt.
[me#home]$ awk -F= 'FNR==NR{keys[$0];next};!($2 in keys)' file2.txt file1.txt
dbipAddress=192.168.175.130
QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
Breakdown of the command:
awk -F= 'FNR==NR{keys[$0];next};!($2 in keys)' file2.txt file1.txt
-F= : change the field delimiter to '='.
FNR==NR{keys[$0];next} : store each line of file2.txt as a key in the keys[] array.
!($2 in keys) : target lines in file1.txt whose second column (delimited by '=') does not exist in the keys[] array.
To do something more elaborate, say if you wish to run the command as a pre-filter to make sure the file is valid before proceeding with your script, you can use:
awk -F= 'FNR==NR{K[$0];N++;next};!($2 in K) {print "Line "(NR-N)": "$0; E++};END{exit E}' file2.txt file1.txt
ERRS=$?
if [ $ERRS -ne 0 ]; then
# errors found, do something ...
fi
That will print out all lines (with line numbers) in file1.txt that do not fit the bill, and returns an exit code equal to the number of non-conforming lines. That way your script can detect the errors easily by checking $? and respond accordingly.
Example output:
[me#home]$ awk -F= 'FNR==NR{K[$0];N++;next};!($2 in K) {print "Line "(NR-N)": "$0;E++};END{exit E}' file2.txt file1.txt
Line 1: dbipAddress=192.168.175.130
Line 5: QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
[me#home]$ echo $?
2
You can use cut to get only the part after =. comm can be used to output the lines contained in the first file but not the second one:
grep ^Q File1.txt | cut -d= -f2- | sort | comm -23 - <(sort File2.txt)
The following command line expression will filter out the lines that occur in file2.txt but not file1.txt:
cat file1.txt | grep -Fvf file2.txt | grep '^Q'
explanation:
-F : match patterns exactly (no expansion etc.) ; much faster
-v : only print lines that don't match
-f : get your patterns from the file specified
| grep '^Q' : pipe the output into grep, and look for lines that start with "Q"
This isn't exactly "stop the bash script when..." since it will process and print every mismatch; also, it doesn't test that there's an "=" in front of the pattern - but I hope it's useful.
Here's another way:
missing=($(comm -23 <(awk -F= '/^Q/ {print $2}' file1.txt | sort) <(sort file2.txt)))
if (( ${#missing[@]} )); then
echo >&2 "The following items are missing from file2.txt:"
printf '%s\n' "${missing[@]}"
exit 1
fi
Assuming that the relevant lines in file1.txt always start with a Q:
grep "^Q" file1.txt | while IFS= read -r line
do
    what=${line#*=}
    grep -Fxq "$what" file2.txt || echo "error: $what not found"
done
Output:
error: LOAD_INVENTORY_Q not found
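Note that a return or exit inside a command | while pipeline only leaves the pipeline's subshell, so the loop above cannot stop the calling script by itself. A variant sketch (recreating the question's files) that reads from process substitution instead, so the first missing value really aborts:

```shell
#!/bin/bash
mkdir -p check_demo && cd check_demo
cat > File1.txt <<'EOF'
dbipAddress=192.168.175.130
QAGENT_QCF=AGENT_QCF
QADJUST_INVENTORY_Q=ADJUST_INVENTORY_Q
QCREATE_ORDER_Q=CREATE_ORDER_Q
QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
EOF
cat > File2.txt <<'EOF'
AGENT_QCF
ADJUST_INVENTORY_Q
CREATE_ORDER_Q
EOF

check_queues() {
    # Reading from process substitution (not a pipe) keeps the loop
    # in the current shell, so 'return 1' really leaves the function.
    while IFS= read -r line; do
        what=${line#*=}
        if ! grep -Fxq "$what" File2.txt; then
            echo "error: $what not found" >&2
            return 1
        fi
    done < <(grep '^Q' File1.txt)
}

check_queues || echo "stopping: File2.txt is missing a queue"
```

With the sample data this stops at LOAD_INVENTORY_Q; once that value is added to File2.txt, the check passes.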

In bash, How do I test if a word is not in a list?

Good day everybody,
I want to make an if conditional with the following aim:
I have two files. The script should check whether a word from file1 (stored in the variable $word2test) exists in file2 (each word of which is stored in the variable $wordINlist). If the word exists, do nothing; if the word is not in file2, print it to stdout.
My first approach is:
if ! [[ "$word2test" =~ "$wordINlist" ]] ; then
echo $word2test
fi
Thanks in advance for any suggestion
Try this simple bash sample script:
word=foobar
grep -q "\<$word\>" FILE || echo "$word is *not* in FILE"
Another way, with a regex:
word=foobar
grep -q "^$word *$" FILE || echo "$word is *not* in FILE"
If your files are simple lists of one word per line, try this:
grep -Fvf file2 file1
or
join -v 1 <(sort file1) <(sort file2)
Assuming $wordINlist is an array (you say "list" but I'm assuming you meant array), you can iterate through it like so:
for item in "${wordINlist[@]}"; do
    [[ $item == "$word2test" ]] || echo "$word2test"
done
If $wordINlist is a file, then you can simply grep through it:
egrep -q "\b${word2test}\b" "$wordINlist" || echo "$word2test"
When egrep finds a match it returns true, otherwise it returns false. So that simply says, "either a match was found, or echo $word2test"
If all you're wanting to do is see which items are in file1 and NOT in file2, use comm:
comm -23 <(sort -u file1) <(sort -u file2)
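A quick illustration of that comm call with hypothetical word lists:

```shell
#!/bin/bash
mkdir -p comm_demo && cd comm_demo
printf 'apple\nbanana\ncherry\nbanana\n' > file1
printf 'banana\ncherry\n' > file2

# -23 drops lines unique to file2 and lines common to both,
# leaving only words present in file1 but missing from file2.
# sort -u also collapses the duplicate "banana" in file1.
only_in_1=$(comm -23 <(sort -u file1) <(sort -u file2))
echo "$only_in_1"
```

Here only "apple" is printed, since every other word in file1 also appears in file2.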
