AutoIt - Find duplicate images by content? - image

I am looking for a way to find duplicate images using AutoIt. I've looked into PixelSearch and SearchImage but neither do exactly what I need them to do.
I am trying to compare 2 images by filename and see if they are the same image (a duplicate). The best way I've thought to do it would be to:
1) Get both image sizes in pixels
2) Use a while loop to get the color of each pixel and store it in an array
3) Check to see if both arrays are equal to each other.
Does anybody have any ideas on how to achieve this?

I just did some more research on this subject and built a small UDF based on a few answers I read. (Mainly based off of monoceres's answer on AutoItScript.com). I figured I would post my solution here to help any future developers!
CompareImagesUDF.au3
Func _CompareImages($ciImageOne, $ciImageTwo)
_GDIPlus_Startup()
$fname1=$ciImageOne
If $fname1="" Then Exit
$fname2=$ciImageTwo
If $fname2="" Then Exit
$bm1 = _GDIPlus_ImageLoadFromFile($fname1)
$bm2 = _GDIPlus_ImageLoadFromFile($fname2)
; MsgBox(0, "bm1==bm2", CompareBitmaps($bm1, $bm2))
Return CompareBitmaps($bm1, $bm2)
_GDIPlus_ImageDispose($bm1)
_GDIPlus_ImageDispose($bm2)
_GDIPlus_Shutdown()
EndFunc
Func CompareBitmaps($bm1, $bm2)
$Bm1W = _GDIPlus_ImageGetWidth($bm1)
$Bm1H = _GDIPlus_ImageGetHeight($bm1)
$BitmapData1 = _GDIPlus_BitmapLockBits($bm1, 0, 0, $Bm1W, $Bm1H, $GDIP_ILMREAD, $GDIP_PXF32RGB)
$Stride = DllStructGetData($BitmapData1, "Stride")
$Scan0 = DllStructGetData($BitmapData1, "Scan0")
$ptr1 = $Scan0
$size1 = ($Bm1H - 1) * $Stride + ($Bm1W - 1) * 4
$Bm2W = _GDIPlus_ImageGetWidth($bm2)
$Bm2H = _GDIPlus_ImageGetHeight($bm2)
$BitmapData2 = _GDIPlus_BitmapLockBits($bm2, 0, 0, $Bm2W, $Bm2H, $GDIP_ILMREAD, $GDIP_PXF32RGB)
$Stride = DllStructGetData($BitmapData2, "Stride")
$Scan0 = DllStructGetData($BitmapData2, "Scan0")
$ptr2 = $Scan0
$size2 = ($Bm2H - 1) * $Stride + ($Bm2W - 1) * 4
$smallest = $size1
If $size2 < $smallest Then $smallest = $size2
$call = DllCall("msvcrt.dll", "int:cdecl", "memcmp", "ptr", $ptr1, "ptr", $ptr2, "int", $smallest)
_GDIPlus_BitmapUnlockBits($bm1, $BitmapData1)
_GDIPlus_BitmapUnlockBits($bm2, $BitmapData2)
Return ($call[0]=0)
EndFunc ;==>CompareBitmaps
Now to compare imagages, all you have to do is include the CompareImagesUDF.au3 file and call the function.
CompareImagesExample.au3
#Include "CompareImagesUDF.au3"
; Define the two images (They can be different file formats)
$img1 = "Image1.jpg"
$img2 = "Image2.jpg"
; Compare the two images
$duplicateCheck = _CompareImages($img1, $img2)
MsgBox(0,"Is Duplicate?", $duplicateCheck)

If you want to find out if both images are an exact match, regardless if names are the same or different, use the built-in Crypt function _Crypt_HashFile with MD2 or MD5 to make a hash of both files and compare that.

Related

Divisor is equal to zero, Need guidance with case statement

This is my query & I'm getting error divisor is equal to zero, I know i need to build this as a case statement just tried a few things and cant get it to work, thanks in advance.
NVL(ROUND(((SELECT PC.BUCKET_ACCUM_COST
FROM PART_CB PC WHERE
PART_CB_NO = '201'
AND PC.PART_NO = I.PART_NO
AND
PC.CONTRACT = P.CONTRACT
AND
PC.TOP_LEVEL_PART_NO || '' =
Z_BEL_FINANCE_API.GET_PART_COST_TOP_PART_NO(P.CONTRACT, P.PART_NO,
P.COST_SET, P.ALTERNATIVE_NO,
P.ROUTING_ALTERNATIVE_NO)
AND
PC.COST_SET = P.COST_SET
AND
PC.COST_BUCKET_ID != 'SYS'
AND
PC.TOP_ALTERNATIVE_NO =
Z_BEL_FINANCE_API.GET_PART_COST_TOP_ALT_NO(P.CONTRACT, P.PART_NO,
P.COST_SET, P.ALTERNATIVE_NO,
P.ROUTING_ALTERNATIVE_NO)
AND
PC.TOP_ROUTING_NO =
Z_BEL_FINANCE_API.GET_PART_COST_TOP_ROUTING_NO(P.CONTRACT, P.PART_NO,
P.COST_SET, P.ALTERNATIVE_NO,
P.ROUTING_ALTERNATIVE_NO)
AND
PC.BUCKET_SEQ = Z_BEL_FINANCE_API.GET_PART_COST_BUCKET_SEQ(P.CONTRACT,
P.PART_NO, P.COST_SET, P.ALTERNATIVE_NO,
P.ROUTING_ALTERNATIVE_NO)) /
(SELECT WC_RATE
FROM WCT
WHERE WORK_CENTER_NO = 'COST1'
AND COST_SET = '1'
AND CONTRACT = P.CONTRACT)), 4), 0) MACHINE_SETUP_TIME,
The only dividing in this mess is here:
/ (SELECT WC_RATE FROM WCT ...)
If you don't want to divide with zero, you'll have to handle it.
For example, use DECODE (or CASE) and - if you want to get 0 as the result, divide with a very large number (e.g. 1E99)

Lua Random number generator always produces the same number

I have looked up several tutorials on how to generate random numbers with lua, each said to use math.random(), so I did. however, every time I use it I get the same number every time, I have tried rewriting the code, and I always get the lowest possible number. I even included a random seed based on the OS time. code below.
require "math"
math.randomseed(os.time())
num = math.random(0,10)
print(num)
I'm using the random function like this:
math.randomseed(os.time())
num = math.random() and math.random() and math.random() and math.random(0, 10)
This is working fine. An other option would be to improve the built-in random function, described here.
This might help! I had to use these functions to write a class that generates Nano IDs. I basically used the milliseconds from the os.clock() function and used that for math.randomseed().
NanoId = {
validCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-",
generate = function (size, validChars)
local response = ""
local ms = string.match(tostring(os.clock()), "%d%.(%d+)")
local temp = math.randomseed(ms)
if (size > 0 and string.len(validChars) > 0) then
for i = 1, size do
local num = math.random(string.len(validChars))
response = response..string.sub(validChars, num, num)
end
end
return response
end
}
function NanoId:Generate()
return self.generate(21, self.validCharacters)
end
-- Runtime Testing
for i = 1, 10 do
print(NanoId:Generate())
end
--[[
Output:
>>> p2r2-WqwvzvoIljKa6qDH
>>> pMoxTET2BrIjYUVXNMDNH
>>> w-nN7J0RVDdN6-R9iv4i-
>>> cfRMzXB4jZmc3quWEkAxj
>>> aFeYCA2kgOx-s4UN02s0s
>>> xegA--_EjEmcDk3Q1zh7K
>>> 6dkVRaNpW4cMwzCPDL3zt
>>> R2Fct5Up5OwnHeExDnqZI
>>> JwnlLZcp8kml-MHUEFAgm
>>> xPr5dULuv48UMaSTzdW5J
]]

how to generate endless random objects in corona SDK?

so I am very new to coding in general and I am trying to make a vertically-scrolling endless runner which basically involves jumping onto platforms to stay alive.I want to generate the same platform in three different locations endlessly. I basically copied some code from an article on the internet and then changed it around to try to make it suit my needs. However, when I run my code in the simulator, one platform is generated in the same location and no others appear. Also, when I look at the console, random numbers do appear. here is the code I am using
local blocks = display.newGroup ()
local groundMin = 200
local groundMax = 100
local groundLevel = groundMin
local function blockgenerate( event )
for a = 1, 1, -1 do
isDone = false
numGen = math.random(3)
local newBlock
print (numGen)
if (numGen == 1 and isDone == false) then
newBlock = display.newImage ("platform.jpg")
end
if (numGen == 2 and isDone == false) then
newBlock = display.newImage ("platform.jpg")
end
if (numGen == 3 and isDone == false) then
newBlock = display.newImage ("platform.jpg")
end
newBlock.name = ("block" .. a)
newBlock.id = a
newBlock.x = (a * 100) - 100
newBlock.y = groundLevel
blocks : insert(newBlock)
end
end
timer.performWithDelay (1000, blockgenerate, -1)
thank you very much in advance and sorry my description was so long
Your "a" variable is always going to be 1. Perhaps you meant to use:
a = a + 1

Detect duplicate videos from YouTube

In consideration to my M.tech Project
I want to know if there is any algorithm to detect duplicate videos from youtube.
For example (here are links of two videos):
random user upload
upload by official channel
Amongst these second is official video and T-series has it's copyright.
Is youtube officially doing something to remove duplicate videos from youtube?
Not only videos, there exists duplicate youtube channels also.
Sometimes the original video has less number of views than that of pirated version.
So, while searching found this
(see page number [49] of pdf)
What I learnt from the given link
Original vs copyright infringed video detection Classifier is used.
Given a query, firstly top k search results are being retrieved.Thereafter three parameters are used to classify the videos
Number of subscribers
user profile
username popularity
and on the basis of these parameters, original video is identified as described in the link.
EDIT 1:
There are basically two different objectives
To identify original video with the above method
To eliminate the duplicate videos
obviously identifying original video is easier than finding out all the duplicate videos.
So i preferred to first find out the original video.
Approach which i can think till now
to improve the accuracy:
We can first find out the original videos with above method
And then use the most popular publicized frames(may be multiple) of that video to search on google image. This method therefore retrieves the list of duplicate videos in google image search results.
After getting these duplicate videos, we can once again check frame by frame and reach a level of satisfaction(yes retrieved videos were "exact or "almost" duplicate copy of original video)
Will this approach work?
if not, is there any better algorithm, to improve upon the given method?
Please write in the comments section if i am unable to explain my approach clearly.
I will soon add some more details.
I've recently hacked together a small tool for that purpose. It's still work in progress but usually pretty accurate. The idea is to simply compare time between brightness maxima in the center of the video. Therefore it should work with different resolutions, frame rates and rotation of the video.
ffmpeg is used for decoding, imageio as bridge to python, numpy/scipy for maxima computation and some k-nearest-neighbor library (annoy, cyflann, hnsw) for comparison.
At the moment it's not polished at all so you should know a little python to run it or simply copy the idea.
Me too had the same problem.. So wrote a program myself..
Problem is I had videos of various formats and resolution.. So needed to take hash of each video frame and compare.
https://github.com/gklc811/duplicate_video_finder
you can just change the directories at top and you are good to go..
from os import path, walk, makedirs, rename
from time import clock
from imagehash import average_hash
from PIL import Image
from cv2 import VideoCapture, CAP_PROP_FRAME_COUNT, CAP_PROP_FRAME_WIDTH, CAP_PROP_FRAME_HEIGHT, CAP_PROP_FPS
from json import dump, load
from multiprocessing import Pool, cpu_count
input_vid_dir = r'C:\Users\gokul\Documents\data\\'
json_dir = r'C:\Users\gokul\Documents\db\\'
analyzed_dir = r'C:\Users\gokul\Documents\analyzed\\'
duplicate_dir = r'C:\Users\gokul\Documents\duplicate\\'
if not path.exists(json_dir):
makedirs(json_dir)
if not path.exists(analyzed_dir):
makedirs(analyzed_dir)
if not path.exists(duplicate_dir):
makedirs(duplicate_dir)
def write_to_json(filename, data):
file_full_path = json_dir + filename + ".json"
with open(file_full_path, 'w') as file_pointer:
dump(data, file_pointer)
return
def video_to_json(filename):
file_full_path = input_vid_dir + filename
start = clock()
size = round(path.getsize(file_full_path) / 1024 / 1024, 2)
video_pointer = VideoCapture(file_full_path)
frame_count = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_COUNT)))
width = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_WIDTH)))
height = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_HEIGHT)))
fps = int(VideoCapture.get(video_pointer, int(CAP_PROP_FPS)))
success, image = video_pointer.read()
video_hash = {}
while success:
frame_hash = average_hash(Image.fromarray(image))
video_hash[str(frame_hash)] = filename
success, image = video_pointer.read()
stop = clock()
time_taken = stop - start
print("Time taken for ", file_full_path, " is : ", time_taken)
data_dict = dict()
data_dict['size'] = size
data_dict['time_taken'] = time_taken
data_dict['fps'] = fps
data_dict['height'] = height
data_dict['width'] = width
data_dict['frame_count'] = frame_count
data_dict['filename'] = filename
data_dict['video_hash'] = video_hash
write_to_json(filename, data_dict)
return
def multiprocess_video_to_json():
files = next(walk(input_vid_dir))[2]
processes = cpu_count()
print(processes)
pool = Pool(processes)
start = clock()
pool.starmap_async(video_to_json, zip(files))
pool.close()
pool.join()
stop = clock()
print("Time Taken : ", stop - start)
def key_with_max_val(d):
max_value = 0
required_key = ""
for k in d:
if d[k] > max_value:
max_value = d[k]
required_key = k
return required_key
def duplicate_analyzer():
files = next(walk(json_dir))[2]
data_dict = {}
for file in files:
filename = json_dir + file
with open(filename) as f:
data = load(f)
video_hash = data['video_hash']
count = 0
duplicate_file_dict = dict()
for key in video_hash:
count += 1
if key in data_dict:
if data_dict[key] in duplicate_file_dict:
duplicate_file_dict[data_dict[key]] = duplicate_file_dict[data_dict[key]] + 1
else:
duplicate_file_dict[data_dict[key]] = 1
else:
data_dict[key] = video_hash[key]
if duplicate_file_dict:
duplicate_file = key_with_max_val(duplicate_file_dict)
duplicate_percentage = ((duplicate_file_dict[duplicate_file] / count) * 100)
if duplicate_percentage > 50:
file = file[:-5]
print(file, " is dup of ", duplicate_file)
src = analyzed_dir + file
tgt = duplicate_dir + file
if path.exists(src):
rename(src, tgt)
# else:
# print("File already moved")
def mv_analyzed_file():
files = next(walk(json_dir))[2]
for filename in files:
filename = filename[:-5]
src = input_vid_dir + filename
tgt = analyzed_dir + filename
if path.exists(src):
rename(src, tgt)
# else:
# print("File already moved")
if __name__ == '__main__':
mv_analyzed_file()
multiprocess_video_to_json()
mv_analyzed_file()
duplicate_analyzer()

Very simple set value of array cell, program very slow when he writes on specify column

I am using a continuous and old professional program. My program builds several simple data arrays and writes the array to an excel cell like this:
Sheets("toto").Cells(4,i) = "blabla"
But for one value of i, the write time is very long and I don't understand why.
Here is my code :
...
For No_Bug = 0 To Indtab - 1
If mesComments(No_Bug) <> "" Then
Sheets(feuille_LBT).Cells(Ligne_Bug, 1) = Ligne_Bug - 5
Sheets(feuille_LBT).Cells(Ligne_Bug, 2) = mesID_Test(No_Bug)
Sheets(feuille_LBT).Cells(Ligne_Bug, 3) = mesResultats(No_Bug)
Sheets(feuille_LBT).Cells(Ligne_Bug, 4) = mesComments(No_Bug)
Sheets(feuille_LBT).Cells(Ligne_Bug, 5).FormulaLocal = mesScreens(No_Bug)
Sheets(feuille_LBT).Cells(Ligne_Bug, 6) = 2 'If I comment only this line, the programm is fast, ifnot the programm is very slow (~1, 2 secondes per loop), What the hell ??? xD
Sheets(feuille_LBT).Cells(Ligne_Bug, 7) = 1
End If
...
Is this cell referenced from other cells? Check if any complicated computations related with this cell.

Resources