openpyxl convert scraped data in time format - time

I am scraping data from a website; the times are extracted as text (for example 9:15) and written into a cell. At the bottom of the column I would like to total the hours, but for that Python needs to convert the values into a numeric time format that can be summed.
Any ideas?
def excel():
    # Write to an Excel file
    filename = f"Monatsplan {userfinder} {month} {year}.xlsx"
    try:
        wb = load_workbook(filename)
        ws = wb.worksheets[0]  # select first worksheet
    except FileNotFoundError:
        headers_row = [
            "Datum",
            "Tour",
            "Funktion",
            "Von",
            "Bis",
            "Schichtdauer",
            "Bezahlte Zeit",
        ]
        wb = Workbook()
        ws = wb.active
        ws.append(headers_row)
        wb.save(filename)
    ws.append(
        [
            datumcleaned[:10],
            tagesinfo,
            "",
            "",
            "",
            "",
            "",
        ]
    )
    wb.save(filename)
    wb.close()
excel()

You should split the data you scraped:
time_scrapped = '9:15'
time_split = time_scrapped.split(":")
hours = int(time_split[0])
minutes = int(time_split[1])
Then you can place the hours and minutes in separate columns and create a formula at the bottom of each column.
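If you would rather have Excel total the values itself, here is a minimal, self-contained sketch (the file name, column and example values are illustrative only, independent of the Monatsplan workbook above) that stores each scraped value as a real duration and adds a SUM formula at the bottom:
# Minimal sketch: write scraped "H:MM" strings as real durations so Excel can sum them.
# File name, column and example values are assumptions for illustration.
from datetime import timedelta
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
scraped = ["9:15", "7:30", "8:45"]  # example values

for row, value in enumerate(scraped, start=1):
    hours, minutes = value.split(":")
    cell = ws.cell(row=row, column=1,
                   value=timedelta(hours=int(hours), minutes=int(minutes)))
    cell.number_format = "[h]:mm"  # display as elapsed hours:minutes

# Total row: a SUM formula that Excel evaluates when the file is opened
total = ws.cell(row=len(scraped) + 1, column=1, value=f"=SUM(A1:A{len(scraped)})")
total.number_format = "[h]:mm"
wb.save("times.xlsx")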

Related

Fine-tune a pre-trained model

I am new to transformer-based models. I am trying to fine-tune the following model (https://huggingface.co/Chramer/remote-sensing-distilbert-cased) on my dataset. The code and the resulting error were posted as screenshots (not reproduced here).
I would be thankful if anyone could help.
The preprocessing steps I followed:
input_ids_t = []
attention_masks_t = []
for sent in df_train['text_a']:
    encoded_dict = tokenizer.encode_plus(
        sent,
        add_special_tokens=True,
        max_length=128,
        pad_to_max_length=True,
        return_attention_mask=True,
        return_tensors='tf',
    )
    input_ids_t.append(encoded_dict['input_ids'])
    attention_masks_t.append(encoded_dict['attention_mask'])
# Convert the lists into tensors.
input_ids_t = tf.concat(input_ids_t, axis=0)
attention_masks_t = tf.concat(attention_masks_t, axis=0)
labels_t = np.asarray(df_train['label'])
and I did the same for the testing data. Then:
train_data = tf.data.Dataset.from_tensor_slices((input_ids_t,attention_masks_t,labels_t))
and the same for the testing data.
It sounds like you are feeding transformer_model one input instead of three. Try removing the square brackets around transformer_model([input_ids, input_mask, segment_ids])[0] so that it reads transformer_model(input_ids, input_mask, segment_ids)[0]. That way the call passes three arguments instead of a single list.
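As a hedged illustration (the question's code is only visible as a screenshot, so the sequence length, number of labels and classification head below are assumptions), wiring the inputs as separate named arguments might look like this:
# Hedged sketch only: the head, the 128-token length and the 2-label output are assumptions.
import tensorflow as tf
from transformers import TFAutoModel

# add from_pt=True if the checkpoint only ships PyTorch weights
transformer_model = TFAutoModel.from_pretrained("Chramer/remote-sensing-distilbert-cased")

input_ids = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="input_ids")
input_mask = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

# Pass the tensors as separate (named) arguments, not wrapped in one list.
# Note: DistilBERT takes no token_type_ids/segment_ids, so only two inputs are used here.
sequence_output = transformer_model(input_ids, attention_mask=input_mask)[0]

cls_token = sequence_output[:, 0, :]  # representation of the [CLS] token
output = tf.keras.layers.Dense(2, activation="softmax")(cls_token)

model = tf.keras.Model(inputs=[input_ids, input_mask], outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])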

for loop and same context (doxctpl)

I wrote some code to generate Word document reports, but even with the for loop I end up with multiple documents that differ only in the image: all other data such as title, volume, rate and price is the same in every document.
I used docxtpl (DocxTemplate) for this.
I created the template Word document with an image and text placeholders.
Then I tried to change the text context first and then the image in the code.
for i in csv:  # the csv file has columns named title, volume, rate, price
    number = number + 1
    DEST_FILE = "dir/auto_" + str(number) + ".docx"  # save each doc file individually
    Title = products[0].product_title
    Volume = products[0].lastest_volume
    Rate = products[0].evaluate_rate
    Price = products[0].sale_price
    context = {"Title": Title, "Volume": Volume, "Rate": Rate, "Price": Price}
    print(context)
    for file in files:
        old_im = 'dir.media_to_paste.jpg'
        new_im = f"image/{file}"
        tpl.replace_media(old_im, new_im)
        tpl.render(context)
        tpl.save(DEST_FILE)
I changed the code to swap the image first, but the result is the same.
The results show as:
auto1.docx
Image1 + Title 1, Volume 1 ...
auto2.docx
Image2 + Title 1, Volume 1 ...
for i in op[1:]:
    number = number + 1
    Title = i.split(",")[0]
    Volume = i.split(",")[1]
    Rate = i.split(",")[2]
    Price = i.split(",")[3]
    Currency = i.split(",")[4]
    # rebuild the context for every row so the rendered text changes per document
    context = {"Title": Title, "Volume": Volume, "Rate": Rate, "Price": Price}
    old_im = 'dir.media_to_paste.jpg'
    new_im = f"image/{file}"
    tpl.replace_media(old_im, new_im)
    doc.render(context)
This fixed it.
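For reference, a minimal per-row sketch with docxtpl might look like this (the file paths, CSV column names and image naming scheme are assumptions); the key point is to build a fresh DocxTemplate and context for every document:
# Hedged sketch: paths, CSV column names and image file names are assumptions.
import csv
from docxtpl import DocxTemplate

with open("data.csv", newline="", encoding="utf-8") as f:
    for number, row in enumerate(csv.DictReader(f), start=1):
        tpl = DocxTemplate("template.docx")  # fresh template per document
        context = {
            "Title": row["title"],
            "Volume": row["volume"],
            "Rate": row["rate"],
            "Price": row["price"],
        }
        tpl.replace_media("dir.media_to_paste.jpg", f"image/image{number}.jpg")
        tpl.render(context)
        tpl.save(f"dir/auto_{number}.docx")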

How to update bokeh active interaction with GeoJSON as data source?

I have made an interactive choropleth map with bokeh, and I'm trying to add active interactions using the dropdown widget (Select). However, most tutorials and SO questions about active interactions use ColumnDataSource, and not GeoJSONDataSource.
The issue is that GeoJSONDataSource doesn't have a .data attribute like ColumnDataSource does, so I don't know exactly what the syntax looks like when updating it.
My dataset is a dictionary in the form of city_dict = {'Amsterdam': <some data frame>, 'Antwerp': <some data frame>, ...}, where the dataframe is in geojson format. I have already confirmed that this format works when making glyphs.
def update(attr, old, new):
    s_value = dropdown.value
    p.title.text = '%s', s_value
    new_src1 = make_dataset(s_value)
    val1 = GeoJSONDataSource(new_src1)
    r1.data_source = val1
where make_dataset is a function that transforms my original dataset into a dataset that can feed into the GeoJSONDataSource function. make_dataset requires a string (name of the city) to work eg. 'Amsterdam'. It works on passive interactions.
The main plot code (removed unnecessary stuff) is:
dropdown = Select(value='Amsterdam', options = cities)
controls = WidgetBox(dropdown)
initial_city = 'Amsterdam'
a = make_dataset(initial_city)
src1 = GeoJSONDataSource(a)
p = figure(title = 'Amsterdam', plot_height = 750 , plot_width = 900, toolbar_location = 'right')
r1 = p.patches('xs','ys', source = src1, fill_color = {'field' :'norm', 'transform' : color_mapper})
dropdown.on_change('value', update)
layout = row(controls, p)
curdoc().add_root(layout)
I've added the error I get:
error handling message Message 'PATCH-DOC' (revision 1) content: {'events': [{'kind': 'ModelChanged', 'model': {'type': 'Select', 'id': '1147'}, 'attr': 'value', 'new': 'Antwerp'}], 'references': []}: ValueError("expected a value of type str, got ('%s', 'Antwerp') of type tuple",)
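For what it's worth, GeoJSONDataSource exposes a geojson string property that can be reassigned in place, so a sketch of the callback could look like this (it assumes make_dataset returns a GeoJSON string, and it avoids assigning the tuple ('%s', s_value) to title.text, which is what triggers the ValueError above):
# Hedged sketch: assumes make_dataset returns a GeoJSON string.
def update(attr, old, new):
    s_value = dropdown.value
    p.title.text = s_value                 # plain string, not the tuple ('%s', s_value)
    src1.geojson = make_dataset(s_value)   # update the existing source in place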

Using a CSV file to insert values using Ruby

I have some sample code I can execute for our Nexpose server and I need to do some mass asset tagging. Here is an example of the code.
nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
nsc.login
criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', ['ip1', 'ip2'])
criteria = Nexpose::Tag::Criteria.new(criterion)
tag = Nexpose::Tag.new("tagname", Nexpose::Tag::Type::Generic::CUSTOM)
tag.search_criteria = criteria
tag.save(nsc)
I have a file with the following data.
ip1,ip2,tagname
192.168.1.1,192.168.1.255,Workstations
How would I go about running a for loop and using the CSV to quickly process the above code? I have no experience with Ruby and tried to follow some examples, but I'm confused at this point.
There's a CSV library in Ruby's standard lib collection that you can use.
Basic example based on your code example and data, not tested:
require 'csv'
nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
nsc.login
CSV.foreach("path/to/file.csv", headers: true) do |row|
  criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', [row['ip1'], row['ip2']])
  criteria = Nexpose::Tag::Criteria.new(criterion)
  tag = Nexpose::Tag.new(row['tagname'], Nexpose::Tag::Type::Generic::CUSTOM)
  tag.search_criteria = criteria
  tag.save(nsc)
end
I made a directory with input.csv and main.rb
input.csv
ip1,ip2,tagname
192.168.1.1,192.168.1.255,Workstations
main.rb
require "csv"
CSV.foreach("input.csv", headers: true) do |row|
puts "ip1: #{row['ip1']}"
puts "ip2: #{row['ip2']}"
puts "tagname: #{row['tagname']}"
end
The output is:
ip1: 192.168.1.1
ip2: 192.168.1.255
tagname: Workstations
I hope this can help. If you have questions I'm here :)
If you just need to loop through each line of the file and fire that chunk of code for each line, you could do something like this:
require 'net/http' # needed for Net::HTTP and URI
file = Net::HTTP.get(URI(<whatever_your_file_name_is>))
file.each_line.with_index do |line, index|
  next if index == 0 # skip the header record
  split_line = line.split(',')
  ip1 = split_line[0]
  ip2 = split_line[1]
  tagname = split_line[2]
  nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
  nsc.login
  criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', [ip1, ip2])
  criteria = Nexpose::Tag::Criteria.new(criterion)
  tag = Nexpose::Tag.new(tagname, Nexpose::Tag::Type::Generic::CUSTOM)
  tag.search_criteria = criteria
  tag.save(nsc)
end
NOTE: This code example is assuming that the CSV file is stored remotely, not locally.
ALSO: In case you're wondering, the next if index == 0 is there to skip your header record.
UPDATE
To use this approach for a local file, you can use File.open() instead of Net::HTTP.get(), like so:
file = File.open(<whatever_your_file_name_is>).read
Two things to note:
Make sure you use the fully-qualified name of the file - i.e. ~/folder/folder/filename.csv instead of just filename.csv.
If the files you're going to be loading are enormous, this might not be an ideal approach because it's actually reading the whole file into memory. But considering your file only has 3 columns, you'd have to have an extreme number of rows in the file for this to be an issue.

How to automatically turn BibTex citation into something parseable by Zotero?

I have a citation system which publishes users' notes to a wiki (Researchr). Programmatically, I have access to the full BibTeX record of each entry, and I also display it on the individual pages (for example - click on BibTeX). This is in the interest of making it easy for users of other citation managers to automatically import the citation of a paper that interests them. I would also like other citation managers, especially Zotero, to be able to automatically detect and import a citation.
Zotero lists a number of ways of exposing metadata that it will understand, including meta tags with RDF, COinS, Dublin Core and unAPI. Is there a Ruby library for converting BibTeX to any of these standards automatically - or a JavaScript library? I could probably create something, but if something existed, it would be far more robust (BibTeX has so many publication types and fields etc.).
There's a BibTeX2RDF converter available here; it might be what you're after.
unAPI is not a data standard - it's a way to serve data (to Zotero and other programs). Zotero imports Bibtex, so serving Bibtex via unAPI works just fine. Inspire is an example of a site that does that:
http://inspirehep.net/
Nowadays one can simply import BibTeX (.bib) files directly into Zotero. However, I noticed my BibTeX entries were often less complete than Zotero's records (in particular they often lacked a DOI), and I did not find an "auto-complete" function in Zotero (based on the data in the BibTeX entries).
So I import the .bib file into Zotero to ensure all entries are there. Then I run a Python script that looks up the missing DOIs it can find for the entries in that .bib file and exports them to a space-separated .txt file:
# pip install habanero
from habanero import Crossref
import re

def titletodoi(keyword):
    cr = Crossref()
    result = cr.works(query=keyword)
    items = result["message"]["items"]
    item_title = items[0]["title"]
    tmp = ""
    for it in item_title:
        tmp += it
    title = keyword.replace(" ", "").lower()
    title = re.sub(r"\W", "", title)
    # print('title: ' + title)
    tmp = tmp.replace(" ", "").lower()
    tmp = re.sub(r"\W", "", tmp)
    # print('tmp: ' + tmp)
    if title == tmp:
        doi = items[0]["DOI"]
        return doi
    else:
        return None

def get_dois(titles):
    dois = []
    for title in titles:
        try:
            doi = titletodoi(title)
            print(f"doi={doi}, title={title}")
            if doi is not None:
                dois.append(doi)
        except:
            pass
            # print("An exception occurred")
    print(f"dois={dois}")
    return dois

def read_titles_from_file(filepath):
    with open(filepath) as f:
        lines = f.read().splitlines()
        split_lines = splits_lines(lines)
        return split_lines

def splits_lines(lines):
    split_lines = []
    for line in lines:
        new_lines = line.split(";")
        for new_line in new_lines:
            split_lines.append(new_line)
    return split_lines

def write_dois_to_file(dois, filename, separation_char):
    textfile = open(filename, "w")
    for doi in dois:
        textfile.write(doi + separation_char)
    textfile.close()

filepath = "list_of_titles.txt"
titles = read_titles_from_file(filepath)
dois = get_dois(titles)
write_dois_to_file(dois, "dois_space.txt", " ")
write_dois_to_file(dois, "dois_per_line.txt", "\n")
The DOIs in the .txt file are fed into Zotero's magic wand ("Add Item by Identifier"). Next, I manually remove the duplicates by keeping the most recently added entry (because that one comes from the magic wand with the most data).
After that, I run another script to update all the reference IDs in my .tex and .bib files to those generated by Zotero:
# Importing libraries
import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import *
import os, fnmatch
import Levenshtein as lev

# Let's define a function to customize our entries.
# It takes a record and returns this record.
def customizations(record):
    """Use some functions delivered by the library
    :param record: a record
    :returns: -- customized record
    """
    record = type(record)
    record = author(record)
    record = editor(record)
    record = journal(record)
    record = keyword(record)
    record = link(record)
    record = page_double_hyphen(record)
    record = doi(record)
    return record

def get_references(filepath):
    with open(filepath) as bibtex_file:
        parser = BibTexParser()
        parser.customization = customizations
        bib_database = bibtexparser.load(bibtex_file, parser=parser)
        # print(bib_database.entries)
        return bib_database

def get_reference_mapping(main_filepath, sub_filepath):
    found_sub = []
    found_main = []
    main_into_sub = []
    main_references = get_references(main_filepath)
    sub_references = get_references(sub_filepath)
    for main_entry in main_references.entries:
        for sub_entry in sub_references.entries:
            # Match the reference IDs if 85% similar titles are detected
            lev_ratio = lev.ratio(
                remove_curly_braces(main_entry["title"]).lower(),
                remove_curly_braces(sub_entry["title"]).lower(),
            )
            if lev_ratio > 0.85:
                print(f"lev_ratio={lev_ratio}")
                if main_entry["ID"] != sub_entry["ID"]:
                    print(f'replace: {sub_entry["ID"]} with: {main_entry["ID"]}')
                    main_into_sub.append([main_entry, sub_entry])
                    # Keep track of which entries have been found
                    found_sub.append(sub_entry)
                    found_main.append(main_entry)
    return (
        main_into_sub,
        found_main,
        found_sub,
        main_references.entries,
        sub_references.entries,
    )
def remove_curly_braces(string):
    left = string.replace("{", "")
    right = left.replace("}", "")
    return right

def replace_references(main_into_sub, directory):
    for pair in main_into_sub:
        main = pair[0]["ID"]
        sub = pair[1]["ID"]
        print(f"replace: {sub} with: {main}")
        # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
        # findReplace(latex_root_dir, sub, main, "*.tex")
        # findReplace(latex_root_dir, sub, main, "*.bib")

def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)

def list_missing(main_references, sub_references):
    for sub in sub_references:
        if not sub["ID"] in list(map(lambda x: x["ID"], main_references)):
            print(f'the following reference has a changed title: {sub["ID"]}')
latex_root_dir = "some_path/"
main_filepath = f"{latex_root_dir}latex/Literature_study/zotero.bib"
sub_filepath = f"{latex_root_dir}latex/Literature_study/references.bib"

(
    main_into_sub,
    found_main,
    found_sub,
    main_references,
    sub_references,
) = get_reference_mapping(main_filepath, sub_filepath)

replace_references(main_into_sub, latex_root_dir)
list_missing(main_references, sub_references)

# For those references whose Levenshtein ratio is below 85% you can specify a manual swap:
manual_swap = []  # main into sub
# manual_swap.append(["cantley_impact_2021", "cantley2021impact"])
# manual_swap.append(["widemann_envision_2021", "widemann2020envision"])
for pair in manual_swap:
    main = pair[0]
    sub = pair[1]
    print(f"replace: {sub} with: {main}")
    # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
    # findReplace(latex_root_dir, sub, main, "*.tex")
    # findReplace(latex_root_dir, sub, main, "*.bib")
