for loop and same context (docxtpl) - image

I wrote code to generate a Word document report, but even with the for loop I end up with multiple documents that differ only in the image; all the other data, such as title, volume, rate and price, is the same in every document.
I used docxtpl (DocxTemplate) for the coding.
I created the template Word doc with an image and text.
Then, in the code, I tried to change the text context first and then the images.
for i in csv:  # the CSV file has multiple columns named title, volume, rate, price
    number = number + 1
    DEST_FILE = "dir/auto_" + str(number) + ".docx"  # to save each doc file individually
    Title = products[0].product_title
    Volume = products[0].lastest_volume
    Rate = products[0].evaluate_rate
    Price = products[0].sale_price
    context = {"Title": Title, "Volume": Volume, "Rate": Rate, "Price": Price}
    print(context)
    for file in files:
        old_im = 'dir.media_to_paste.jpg'
        new_im = f"image/{file}"
        tpl.replace_media(old_im, new_im)
        tpl.render(context)
        tpl.save(DEST_FILE)
I changed the code so the image is replaced first, but the results are the same. The results show as:
auto1.docx
Image1 + Title 1, Volume 1....
auto2.docx
Image2 + Title 1, Volume 1 ....

Fixed it with the loop below; each iteration now reads its values from the current CSV row:
for i in op[1:]:
    number = number + 1
    Title = i.split(",")[0]
    Volume = i.split(",")[1]
    Rate = i.split(",")[2]
    Price = i.split(",")[3]
    Currency = i.split(",")[4]
    context = {"Title": Title, "Volume": Volume, "Rate": Rate, "Price": Price}  # build the context from the current row
    old_im = 'dir.media_to_paste.jpg'
    new_im = f"image/{file}"
    tpl.replace_media(old_im, new_im)
    doc.render(context)
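For reference, a minimal end-to-end sketch (the file names and sample rows are assumptions, not from my real data): re-opening the template inside the loop guarantees every document starts from a clean copy, because a template that has already been rendered no longer contains its placeholders.
from docxtpl import DocxTemplate

rows = ["Widget,120,4.5,9.99", "Gadget,80,4.2,19.99"]   # title,volume,rate,price
images = ["image/widget.jpg", "image/gadget.jpg"]       # one image per row, same order

for number, (row, image) in enumerate(zip(rows, images), start=1):
    tpl = DocxTemplate("template.docx")  # fresh copy of the template each time
    Title, Volume, Rate, Price = row.split(",")
    context = {"Title": Title, "Volume": Volume, "Rate": Rate, "Price": Price}
    tpl.replace_media("dir.media_to_paste.jpg", image)  # placeholder image inside the template
    tpl.render(context)
    tpl.save(f"dir/auto_{number}.docx")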

openpyxl convert scraped data in time format

I am scraping data from a website. The times are extracted as text (for example 9:15) and inserted into cells. At the bottom of the column I would like to total the hours, but for that I need Python to convert the values to a numeric format that can be added up.
Any ideas?
def excel():
    # write to an Excel file
    filename = f"Monatsplan {userfinder} {month} {year}.xlsx"
    try:
        wb = load_workbook(filename)
        ws = wb.worksheets[0]  # select first worksheet
    except FileNotFoundError:
        headers_row = [
            "Datum",
            "Tour",
            "Funktion",
            "Von",
            "Bis",
            "Schichtdauer",
            "Bezahlte Zeit",
        ]
        wb = Workbook()
        ws = wb.active
        ws.append(headers_row)
        wb.save(filename)
    ws.append(
        [
            datumcleaned[:10],
            tagesinfo,
            "",
            "",
            "",
            "",
            "",
        ]
    )
    wb.save(filename)
    wb.close()

excel()
You should split the data you scraped:
time_scrapped = '9:15'
time_split = time_scrapped.split(":")
hours = int(time_split[0])
minutes = int(time_split[1])
Then you can place the values in separate columns and create a formula at the bottom of each column.
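If you would rather have Excel itself total the column, here is a minimal sketch (hypothetical filename and cell positions): store each scraped time as an Excel time serial, i.e. a fraction of a day, and give the cells an elapsed-hours number format so a plain SUM formula adds them up.
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

times = ["9:15", "8:30", "7:45"]  # example scraped values
for row, text in enumerate(times, start=1):
    hours, minutes = (int(part) for part in text.split(":"))
    cell = ws.cell(row=row, column=1)
    cell.value = (hours + minutes / 60) / 24   # Excel stores times as fractions of a day
    cell.number_format = "[h]:mm"              # display as elapsed hours:minutes

total = ws.cell(row=len(times) + 1, column=1)
total.value = f"=SUM(A1:A{len(times)})"
total.number_format = "[h]:mm"
wb.save("times.xlsx")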

Python Matplotlib Add 2 Dynamic Title Components

I have 6 subplots that need 2 dynamic title components. I can code for one, but after searching the literature I'm not sure how to change my code below to add a second dynamic title component on the same line. Here is my for loop to generate the 6 subplots, with the "plt.title.." line below:
list = [0,1,2,3,4,5]
now = datetime.datetime.now()
currm = now.month
import calendar
fig, ax = plt.subplots(6)
for x in list:
    # iterate by the index of column "1" (the years)
    dam = DS.where(DS['time.year']==rmax.iloc[x,1]).groupby('time.month').mean()
    dam = dam.sel(month=3)  # current month mean 500
    dam = dam.sel(level=500)
    damc = dam.to_array()
    lats = damc['lat'].data
    lons = damc['lon'].data
    # plot data
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines(lw=1)
    damc = damc.squeeze()
    cnplot = plt.contour(lons,lats,damc,cmap='jet')
    plt.title('Mean 500mb Hgt + Phase {} 2020'.format(calendar.month_name[currm-1]))
plt.show()
#plt.clf()
In the loop, I need to add one item from this list to the "plt.title..." line above, between the "+" and the word "Phase":
tindices = ['SOI','AO','NAO','PNA','EPO','PDO']
Thank you for any help with this!
Try accessing the tindices one by one and passing them to the title:
plt.title('Mean 500mb Hgt + {} Phase {} 2020'.format(tindices[x],
                                                     calendar.month_name[currm - 1]))
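Equivalently, with an f-string (Python 3.6+), the two dynamic components read a little more directly:
plt.title(f'Mean 500mb Hgt + {tindices[x]} Phase {calendar.month_name[currm - 1]} 2020')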

In Windows, how do I find out a folder's sort-by parameters

I am building an image viewing app in Node.js. I noticed that in Windows, the pictures in a folder can be sorted by name, size, status, type, date, tags and so on, and grouped after sorting by the same list and more.
Is there a way of getting the sort parameters, or maybe just retrieving the sorted list of files matching the regular expression /\.(jpg|jpg_large|jpeg|jpe|jfif|jif|jfi|jpe|gif|png|ico|bmp|webp|svg)$/i as an array (e.g. ['c:\man.jpg', 'c:\woman.jpg']), using PowerShell?
EDIT:
This article got me closer to a solution: https://cyberforensicator.com/2019/02/03/shellbags-forensics-directory-viewing-preferences/
Unfortunately it doesn't explain how to get the NodeList value for a given folder, so I used an app called ShellBagsView from NirSoft to get this value. In any case, once the value is found the rest is easy. I have included a sample Python script below which explains how this is done.
from winreg import *

# The registry key is of the form:
# HKEY_CURRENT_USER\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\Bags\1375\Shell\{5C4F28B5-F869-4E84-8E60-F11DB97C5CC7}
# where 1375 is a value called the NodeList, and {5C4F28B5-F869-4E84-8E60-F11DB97C5CC7}
# is a value under Shell chosen based on creation date. It is a good idea to look at
# the registry after getting the NodeList from ShellBagsView.
folder_reg_path = "Software\\Classes\\Local Settings\\Software\\Microsoft\\Windows\\Shell\\Bags\\1375\\Shell\\{5C4F28B5-F869-4E84-8E60-F11DB97C5CC7}"

# the size of the icons used by the folder
def get_folder_icon_size(reg_key):
    with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
        value = QueryValueEx(key, 'IconSize')
        return '%d pixels' % (value[0])

# the folder view: details, list, tiles, etc.
def get_logical_view_mode(reg_key):
    with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
        value = QueryValueEx(key, 'LogicalViewMode')
        logical_view_mode_dict = {1 : "Details view", 2 : "Tiles view", 3 : "Icons view", 4 : "List view", 5 : "Content view"}
        return logical_view_mode_dict[value[0]]

# the folder view is based on the view mode, so you can have a logical view mode of
# icons view with a view mode of large icons, for instance
def get_folder_view_mode(reg_key):
    with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
        value = QueryValueEx(key, 'Mode')
        # view_mode 7 is only available on XP, a dead OS
        view_mode_dict = {1 : "Medium icons", 2 : "Small icons", 3 : "List", 4 : "Details", 5 : "Thumbnail icons", 6 : "Large icons", 8 : "Content"}
        return view_mode_dict[value[0]]

# how the folder is sorted
def get_folder_sort_by(reg_key):
    with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
        value = QueryValueEx(key, 'Sort')
        folder_sort_dict = {"0E000000" : "Date Modified", "10000000" : "Date Accessed", "0F000000" : "Date Created", "0B000000" : "Type", "0C000000" : "Size", "0A000000" : "Name", "02000000" : "Title", "05000000" : "Tags"}
        # we get a byte value which we hexify into a rather long string, similar to:
        # 000000000000000000000000000000000100000030f125b7ef471a10a5f102608c9eebac0c000000ffffffff
        reg_value = value[0].hex()
        # from this string we take the last 16 hex characters, then the first 8 of those
        folder_sort_dict_key = (reg_value[-16:][:8]).upper()
        return folder_sort_dict[folder_sort_dict_key]

# in what order the folder is sorted: ascending or descending
def get_folder_sort_by_order(reg_key):
    with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
        value = QueryValueEx(key, 'Sort')
        folder_sort_dict = {"01000000" : "Ascending", "FFFFFFFF" : "Descending"}
        # the same byte value as above, hexified
        reg_value = value[0].hex()
        # this time we take the last 8 of the final 16 hex characters
        folder_sort_dict_key = (reg_value[-16:][-8:]).upper()
        return folder_sort_dict[folder_sort_dict_key]

icon_size = get_folder_icon_size(folder_reg_path)
logical_view_mode = get_logical_view_mode(folder_reg_path)
view_mode = get_folder_view_mode(folder_reg_path)
sorted_by = get_folder_sort_by(folder_reg_path)
sorted_by_order = get_folder_sort_by_order(folder_reg_path)
print('The folder icon size is %s' % icon_size)
print('The folder logical view mode is %s' % logical_view_mode)
print('The folder view mode is %s' % view_mode)
print('The folder is sorted by %s in %s order' % (sorted_by, sorted_by_order))
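If all you finally need is the sorted array of image paths, here is a sketch that combines the registry result above with a plain directory listing (the folder path is an assumption; only filesystem-backed sort fields are handled, and anything else falls back to a name sort):
import os
import re

IMAGE_RE = re.compile(r'\.(jpg|jpg_large|jpeg|jpe|jfif|jif|jfi|gif|png|ico|bmp|webp|svg)$', re.I)
SORT_KEYS = {
    "Name": lambda e: e.name.lower(),
    "Size": lambda e: e.stat().st_size,
    "Date Modified": lambda e: e.stat().st_mtime,
    "Date Created": lambda e: e.stat().st_ctime,   # creation time on Windows
    "Date Accessed": lambda e: e.stat().st_atime,
}

def sorted_images(folder, sorted_by, sorted_by_order):
    entries = [e for e in os.scandir(folder) if e.is_file() and IMAGE_RE.search(e.name)]
    key = SORT_KEYS.get(sorted_by, SORT_KEYS["Name"])  # Type/Title/Tags are not plain file attributes
    return [e.path for e in sorted(entries, key=key, reverse=(sorted_by_order == "Descending"))]

print(sorted_images(r"C:\Test", sorted_by, sorted_by_order))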
The question itself and the environment to run this in are unclear, but since you reference PowerShell and a regex to limit the files to specific extensions, here is a PowerShell approach.
With this sample tree:
> tree /f a:\
A:\
└───Test
boy.bmp
child.ico
girl.gif
man.jpg
woman.jpg
this script:
Get-ChildItem -Path A:\Test -File |
Where-Object Extension -match '\.(jpg|jpg_large|jpeg|jpe|jfif|jif|jfi|jpe|gif|png|ico|bmp|webp|svg)$' |
Sort-Object Name |
Select-Object -ExpandProperty FullName |
ConvertTo-Json -Compress
yields:
["A:\\Test\\boy.bmp","A:\\Test\\child.ico","A:\\Test\\girl.gif","A:\\Test\\man.jpg","A:\\Test\\woman.jpg"]
The IShellView implementation (the file-list part of Explorer) asks its IShellBrowser for a stream when it needs to load/save its state. My suggestion would be to host an IExplorerBrowser instance "browsed to the folder" and ask the view for its items. I don't know if you can ask it which column it has sorted by, but just getting the items in sorted order should be enough for your needs.
I don't know how to do this in a scripting language, but I assume PowerShell supports enough COM for it to be possible.

Detect duplicate videos from YouTube

For my M.Tech project, I want to know whether there is an algorithm to detect duplicate videos on YouTube.
For example (here are links to two videos):
random user upload
upload by official channel
Of these, the second is the official video, and T-Series holds its copyright.
Is YouTube officially doing something to remove duplicate videos?
Not only videos: duplicate YouTube channels exist as well.
Sometimes the original video has fewer views than the pirated version.
While searching, I found this
(see page 49 of the PDF).
What I learnt from the given link: a classifier is used to distinguish the original video from copyright-infringing ones. Given a query, the top k search results are first retrieved; then three parameters are used to classify the videos:
number of subscribers
user profile
username popularity
On the basis of these parameters, the original video is identified, as described in the link.
EDIT 1:
There are basically two different objectives:
to identify the original video with the above method
to eliminate the duplicate videos
Obviously, identifying the original video is easier than finding all of its duplicates, so I preferred to find the original video first.
The approach I can think of so far to improve accuracy:
First find the original videos with the above method.
Then use the most popular publicized frames (maybe multiple) of that video in a Google image search. This retrieves a list of candidate duplicate videos from the image search results.
After getting these candidates, check frame by frame until reaching a level of confidence (yes, the retrieved videos are "exact" or "almost" duplicate copies of the original video).
Will this approach work? If not, is there a better algorithm that improves on the given method?
Please write in the comments if I have not explained my approach clearly. I will soon add some more details.
I've recently hacked together a small tool for that purpose. It's still a work in progress but usually pretty accurate. The idea is to simply compare the time between brightness maxima in the center of the video, so it should work across different resolutions, frame rates and rotations of the video.
ffmpeg is used for decoding, imageio as a bridge to Python, numpy/scipy for the maxima computation, and a k-nearest-neighbour library (annoy, cyflann, hnsw) for the comparison.
At the moment it's not polished at all, so you should know a little Python to run it, or simply copy the idea.
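Not the tool itself, just a sketch of the stated idea: given the mean brightness of the video's center crop for every frame, fingerprint the video by the time between brightness peaks; normalizing by fps makes the fingerprint comparable across frame rates, and using only the center makes it tolerant of rotation and resolution changes.
import numpy as np
from scipy.signal import find_peaks

def brightness_fingerprint(center_brightness, fps):
    # center_brightness: 1-D array, mean pixel brightness of the center crop per frame
    peaks, _ = find_peaks(center_brightness, distance=fps)  # at most ~one peak per second
    return np.diff(peaks) / fps  # seconds between consecutive maxima

# synthetic stand-in for decoded frames; real use would feed ffmpeg/imageio output
brightness = np.abs(np.sin(np.linspace(0, 20, 600)))
print(brightness_fingerprint(brightness, fps=30))
Two videos are then duplicate candidates when their fingerprint vectors nearly match, e.g. via the nearest-neighbour search mentioned above.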
I had the same problem, so I wrote a program myself.
The problem was that I had videos of various formats and resolutions, so I needed to take a hash of each video's frames and compare them.
https://github.com/gklc811/duplicate_video_finder
You can just change the directories at the top and you are good to go.
from os import path, walk, makedirs, rename
from time import perf_counter as clock  # time.clock was removed in Python 3.8
from imagehash import average_hash
from PIL import Image
from cv2 import VideoCapture, CAP_PROP_FRAME_COUNT, CAP_PROP_FRAME_WIDTH, CAP_PROP_FRAME_HEIGHT, CAP_PROP_FPS
from json import dump, load
from multiprocessing import Pool, cpu_count

input_vid_dir = r'C:\Users\gokul\Documents\data\\'
json_dir = r'C:\Users\gokul\Documents\db\\'
analyzed_dir = r'C:\Users\gokul\Documents\analyzed\\'
duplicate_dir = r'C:\Users\gokul\Documents\duplicate\\'

if not path.exists(json_dir):
    makedirs(json_dir)
if not path.exists(analyzed_dir):
    makedirs(analyzed_dir)
if not path.exists(duplicate_dir):
    makedirs(duplicate_dir)

def write_to_json(filename, data):
    file_full_path = json_dir + filename + ".json"
    with open(file_full_path, 'w') as file_pointer:
        dump(data, file_pointer)
    return

def video_to_json(filename):
    file_full_path = input_vid_dir + filename
    start = clock()
    size = round(path.getsize(file_full_path) / 1024 / 1024, 2)
    video_pointer = VideoCapture(file_full_path)
    frame_count = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_COUNT)))
    width = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_WIDTH)))
    height = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_HEIGHT)))
    fps = int(VideoCapture.get(video_pointer, int(CAP_PROP_FPS)))
    success, image = video_pointer.read()
    video_hash = {}
    while success:
        frame_hash = average_hash(Image.fromarray(image))
        video_hash[str(frame_hash)] = filename
        success, image = video_pointer.read()
    stop = clock()
    time_taken = stop - start
    print("Time taken for ", file_full_path, " is : ", time_taken)
    data_dict = dict()
    data_dict['size'] = size
    data_dict['time_taken'] = time_taken
    data_dict['fps'] = fps
    data_dict['height'] = height
    data_dict['width'] = width
    data_dict['frame_count'] = frame_count
    data_dict['filename'] = filename
    data_dict['video_hash'] = video_hash
    write_to_json(filename, data_dict)
    return

def multiprocess_video_to_json():
    files = next(walk(input_vid_dir))[2]
    processes = cpu_count()
    print(processes)
    pool = Pool(processes)
    start = clock()
    pool.starmap_async(video_to_json, zip(files))
    pool.close()
    pool.join()
    stop = clock()
    print("Time Taken : ", stop - start)

def key_with_max_val(d):
    max_value = 0
    required_key = ""
    for k in d:
        if d[k] > max_value:
            max_value = d[k]
            required_key = k
    return required_key

def duplicate_analyzer():
    files = next(walk(json_dir))[2]
    data_dict = {}
    for file in files:
        filename = json_dir + file
        with open(filename) as f:
            data = load(f)
        video_hash = data['video_hash']
        count = 0
        duplicate_file_dict = dict()
        for key in video_hash:
            count += 1
            if key in data_dict:
                if data_dict[key] in duplicate_file_dict:
                    duplicate_file_dict[data_dict[key]] = duplicate_file_dict[data_dict[key]] + 1
                else:
                    duplicate_file_dict[data_dict[key]] = 1
            else:
                data_dict[key] = video_hash[key]
        if duplicate_file_dict:
            duplicate_file = key_with_max_val(duplicate_file_dict)
            duplicate_percentage = ((duplicate_file_dict[duplicate_file] / count) * 100)
            if duplicate_percentage > 50:
                file = file[:-5]
                print(file, " is dup of ", duplicate_file)
                src = analyzed_dir + file
                tgt = duplicate_dir + file
                if path.exists(src):
                    rename(src, tgt)
                # else:
                #     print("File already moved")

def mv_analyzed_file():
    files = next(walk(json_dir))[2]
    for filename in files:
        filename = filename[:-5]
        src = input_vid_dir + filename
        tgt = analyzed_dir + filename
        if path.exists(src):
            rename(src, tgt)
        # else:
        #     print("File already moved")

if __name__ == '__main__':
    mv_analyzed_file()
    multiprocess_video_to_json()
    mv_analyzed_file()
    duplicate_analyzer()
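A note on the matching step: the script relies on exact equality of the hash strings, which catches frames that hash identically. imagehash hashes also support a Hamming distance (via subtraction), which is more tolerant of re-encoding artifacts; a minimal illustration with stand-in frames:
from PIL import Image
from imagehash import average_hash

# two solid-colour images stand in for decoded video frames
frame_a = Image.new("RGB", (64, 64), "white")
frame_b = Image.new("RGB", (64, 64), "white")
print(average_hash(frame_a) - average_hash(frame_b))  # Hamming distance; 0 = perceptually identical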

How to extract a string from a large file only if a specific string appears previously, using Ruby?

I am trying to extract information from a large file and cannot figure out how to extract strings from file lines only when a previous line in the same record within the file has been matched by regex. An example of one record in the file is as follows:
*NEW RECORD
RECTYPE = D
MH = Informed Consent
AQ = ES HI LJ PX SN ST
ENTRY = Consent, Informed
MN = N03.706.437.650.312
MN = N03.706.535.489
FX = Disclosure
FX = Mental Competency
FX = Therapeutic Misconception
FX = Treatment Refusal
ST = T058
ST = T078
AN = competency to consent: coordinate IM with MENTAL COMPETENCY (IM)
PI = Jurisprudence (1966-1970)
PI = Physician-Patient Relations (1966-1970)
MS = Voluntary authorization, by a patient or research subject, etc,...
This file contains over 20,000 records like this example. I want to identify a small percentage of those records using the "MH" field. In this example, I want to find "Informed Consent", and then use regex to extract the information in the FX, AN, and MS fields only within that record. So far, I have opened the file, accessed the collection that the MH terms are stored in, and been able to extract those terms from the records in the file. I also have a functioning regex that identifies the content in the "FX" field.
File.open('mesh_descriptor.bin').each do |file_line|
  file_line = file_line.chomp
  # read each key of candidate_descriptor_keys
  candidate_descriptor_keys.each do |cand_term|
    if file_line =~ /^MH\s=\s(#{cand_term})$/
      mesh_header = $1
      puts "MH from Mesh Descriptor file is: #{mesh_header}"
      if file_line =~ /^FX\s=\s(.*)$/
        see_also = $1
        puts " See_Also from Descriptor file is: #{see_also}"
      end
    end
  end
end
The array contains the following MH keys:
candidate_descriptor_keys = ["Body Weight", "Obesity", "Thinness", "Fetal Weight", "Overweight"]
I had success extracting "FX" when I put that statement outside of the "if" statement that extracts "MH", but then all of the "FX" lines from the whole file were retrieved, which is not what I need. I thought putting the "if" statement for "FX" within the previous "if" statement would restrict the results to only those found when the first statement is true, but I am getting no results (and no errors) with this strategy. What I would like as a result is:
> Informed Consent
> Disclosure
> Mental Competency
> Therapeutic Misconception
> Treatment Refusal
as well as the strings within the "AN" and "MS" fields for only those records matching "MH". Any suggestions would be helpful!
I think this may be what you are looking for, but if not, let me know and I will change it. Look especially at the very end to see if that is the sort of output (for input having two records, both with a "MH" field) you want. I will also add an "explanation" section at the end once I have understood your question correctly.
I have assumed that each record begins
*NEW RECORD
and you wish to identify all lines beginning "MH" whose field is one of the elements of:
candidate_descriptor_keys =
["Body Weight", "Obesity", "Thinness", "Informed Consent"]
and for each match, you would like to print the contents of the lines for the same record that begin with "FX", "AN" and "MS".
Code
NEW_RECORD_MARKER = "*NEW RECORD"

def getem(fname, candidate_descriptor_keys)
  line = 0
  found_mh = false
  File.open(fname).each do |file_line|
    file_line = file_line.strip
    case
    when file_line == NEW_RECORD_MARKER
      puts # space between records
      found_mh = false
    when found_mh == false
      candidate_descriptor_keys.each do |cand_term|
        if file_line =~ /^MH\s=\s(#{cand_term})$/
          found_mh = true
          puts "MH from line #{line} of file is: #{cand_term}"
          break
        end
      end
    when found_mh
      ["FX", "AN", "MS"].each do |des|
        if file_line =~ /^#{des}\s=\s(.*)$/
          see_also = $1
          puts " Line #{line} of file is: #{des}: #{see_also}"
        end
      end
    end
    line += 1
  end
end
Example
Let's begin by creating a file, starting with a "here document" that contains two records:
records =<<_
*NEW RECORD
RECTYPE = D
MH = Informed Consent
AQ = ES HI LJ PX SN ST
ENTRY = Consent, Informed
MN = N03.706.437.650.312
MN = N03.706.535.489
FX = Disclosure
FX = Mental Competency
FX = Therapeutic Misconception
FX = Treatment Refusal
ST = T058
ST = T078
AN = competency to consent
PI = Jurisprudence (1966-1970)
PI = Physician-Patient Relations (1966-1970)
MS = Voluntary authorization
*NEW RECORD
MH = Obesity
AQ = ES HI LJ PX SN ST
ENTRY = Obesity
MN = N03.706.437.650.312
MN = N03.706.535.489
FX = 1st FX
FX = 2nd FX
AN = Only AN
PI = Jurisprudence (1966-1970)
PI = Physician-Patient Relations (1966-1970)
MS = Only MS
_
If you puts records you will see it is just a string. (You'll see that I shortened two of them.) Now write it to a file:
File.write('mesh_descriptor', records)
If you wish to confirm the file contents, you could do this:
puts File.read('mesh_descriptor')
We also need to define the array candidate_descriptor_keys:
candidate_descriptor_keys =
["Body Weight", "Obesity", "Thinness", "Informed Consent"]
We can now execute the method getem:
getem('mesh_descriptor', candidate_descriptor_keys)
This prints:
MH from line 2 of file is: Informed Consent
Line 7 of file is: FX: Disclosure
Line 8 of file is: FX: Mental Competency
Line 9 of file is: FX: Therapeutic Misconception
Line 10 of file is: FX: Treatment Refusal
Line 13 of file is: AN: competency to consent
Line 16 of file is: MS: Voluntary authorization
MH from line 18 of file is: Obesity
Line 23 of file is: FX: 1st FX
Line 24 of file is: FX: 2nd FX
Line 25 of file is: AN: Only AN
Line 28 of file is: MS: Only MS
