discord.py sqlite3.OperationalError: no such column

...
winner_of_lottery = bot.get_user(person_id)
money = 100 * len(list_of_participants)
if winner_of_lottery and id != 9090:
    base.execute("UPDATE users SET balance = balance + {} WHERE id = {}".format(money, winner))
    base.execute("""UPDATE general SET last_winner = {} WHERE num = 1""".format(winner_of_lottery.name))
    base.execute("""UPDATE general SET last_winner_sum = {} WHERE num = 1""".format(money))
I get this error:
sqlite3.OperationalError: no such column:
on this line:
base.execute("""UPDATE general SET last_winner = {} WHERE num = 1""".format(winner_of_lottery.name))
The SQL table is created like this:
cursor.execute("""CREATE TABLE IF NOT EXISTS general (
last_winner TEXT,
last_winner_sum INT,
num INT
)""")
I don't know what I need to do; please help. I've been trying to fix this since yesterday.
P.S. The table is not empty:
if cursor.execute("""SELECT last_winner FROM general WHERE num = 1""").fetchone() is None:
    cursor.execute("""INSERT INTO general VALUES ("аноним:alien:", 200, 1)""")
    base.commit()
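For reference, and as my own assumption since no answer is included here: the interpolated winner name ends up unquoted in the SQL text, so SQLite parses it as a column identifier and raises "no such column". A minimal sketch of the parameterized form that avoids this (and SQL injection):
# Sketch only: pass values as parameters; SQLite then treats the name as a text value,
# not as a column identifier.
base.execute("UPDATE general SET last_winner = ? WHERE num = 1", (winner_of_lottery.name,))
base.execute("UPDATE general SET last_winner_sum = ? WHERE num = 1", (money,))
base.commit()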

Related

Problem resetting a QTreeWidget

I am developing an application for placing orders with Python and Qt Designer. I can't manage to place two orders in a row: the first order goes through without any problem, but when I want to place another order without closing the application, this error is displayed:
self.ui.treeWidgetcommande.topLevelItem(self.Ligne).setText(0, str(Id))
AttributeError: 'NoneType' object has no attribute 'setText'
def AddCommande(self):
    QtWidgets.QTreeWidgetItem(self.ui.treeWidgetcommande)
    Libelle = self.ui.comboBoxproduit.currentText()
    Qte = int(self.ui.lineEditQteproduit.text())
    Info = self.stock.GetProductName(Libelle)[0]
    Id = str(int(Info[0]))
    Pu = Info[1]
    Total = int(Qte)*int(Pu)
    data = (Libelle, Qte, Id, Pu, Total)
    #print(data)
    self.ui.treeWidgetcommande.topLevelItem(self.Ligne).setText(0, str(Id))
    self.ui.treeWidgetcommande.topLevelItem(self.Ligne).setText(1, str(Libelle))
    self.ui.treeWidgetcommande.topLevelItem(self.Ligne).setText(2, str(Qte))
    self.ui.treeWidgetcommande.topLevelItem(self.Ligne).setText(3, str(Pu))
    self.ui.treeWidgetcommande.topLevelItem(self.Ligne).setText(4, str(Total))
    self.Ligne += 1

def ValiderCommande(self):
    Client = self.ui.comboBoxclient.currentText()
    IdClient = self.stock.GetClientIdByName(Client.split(" ")[0])
    PrixTotal = 0
    UniqueId = random.random()
    Date = date.today()
    Data = (IdClient, PrixTotal, Date, UniqueId)
    if self.stock.AddCommande(Data) == 0:
        for i in range(self.Ligne):
            IdCommande = self.stock.GetClientIdByUniqueId(UniqueId)
            Libelle = self.ui.treeWidgetcommande.topLevelItem(i).text(1)
            IdProduit = self.ui.treeWidgetcommande.topLevelItem(i).text(0)
            Pu = self.ui.treeWidgetcommande.topLevelItem(i).text(3)
            Qte = self.ui.treeWidgetcommande.topLevelItem(i).text(2)
            Total = int(self.ui.treeWidgetcommande.topLevelItem(i).text(4))
            InfoData = (IdCommande, Libelle, Qte, Pu, Total)
            data = (Qte, IdProduit)
            if self.stock.AjoutInfoCommande(InfoData) == 0:
                PrixTotal += Total
                self.stock.UpdateQteStock(data)
        if self.stock.UpdateCommande(PrixTotal, IdCommande) == 0:
            self.ui.treeWidgetcommande.clear()
            #self.ui.treeWidgetcommande.topLevelItem(self.Ligne).setHidden(True)
            self.ui.lineEditQteproduit.setText(" ")
After placing an order, I would like to reset my tree widget and be able to place further orders.
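For reference (my own note, since no answer is included here): after self.ui.treeWidgetcommande.clear() removes every row, self.Ligne still holds the old row count, so the next AddCommande call indexes a row that no longer exists and topLevelItem(self.Ligne) returns None. A minimal sketch of one way to fix it, assuming self.Ligne is meant to track the next free row:
# Sketch only: reset the row counter whenever the tree widget is cleared.
if self.stock.UpdateCommande(PrixTotal, IdCommande) == 0:
    self.ui.treeWidgetcommande.clear()
    self.Ligne = 0  # keep the counter in sync with the now-empty widget
    self.ui.lineEditQteproduit.setText("")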

applyInPandas() aggregation runs slowly on big delta table

I'm trying to create a gold table notebook in Databricks, however it would take 9 days to fully reprocess the historical data (43GB, 35k parquet files). I tried scaling up the cluster but it doesn't go above 5000 records/second. The bottleneck seems to be the applyInPandas() function. I'm wondering if I could replace pandas with anything else to make the gold notebook execute faster.
The silver table has 60 columns (read_id, reader_id, tracker_timestamp, event_type, ebook_id, page_id, agent_ip, agent_device_type, ...). Each row of data represents a read event of an ebook, e.g. 'page turn', 'click on image', 'click on link', ... All of the events that occurred in a single session share the same read_id. In the gold table I'm trying to group those events into sessions and calculate the number of times each event occurred in a single session. So instead of 100+ rows of data for a read session in the silver table, I would end up with just a single aggregated row in the gold table.
Input is the silver delta table:
import pyspark.sql.functions as F
import pyspark.sql.types as T
import pandas as pd
from pyspark.sql.functions import pandas_udf

input = (spark
    .readStream
    .format("delta")
    .option("withEventTimeOrder", "true")
    .option("maxFilesPerTrigger", 100)
    .load(f"path_to_silver_bucket")
)
I use the withWatermark and session_window functions to ensure I group all of the events from a single read session (a read session automatically ends 30 minutes after the last reader activity):
group = input.withWatermark("tracker_timestamp", "10 minutes").groupBy("read_id", F.session_window(input.tracker_timestamp, "30 minutes"))
In the next step I use the applyInPandas function like so:
sessions = group.applyInPandas(processing_function, schema=processing_function_output_schema)
Definition of the processing_function used in applyInPandas:
def processing_function(df):
    surf_time_ms = df.query('event_type == "surf"')['duration'].sum()
    immerse_time_ms = df.query('event_type == "immersion"')['duration'].sum()
    min_timestamp = df['tracker_timestamp'].min()
    max_timestamp = df['tracker_timestamp'].max()
    shares = len(df.query('event_type == "share"'))
    leads = len(df.query('event_type == "lead_store"'))
    is_read = len(df.query('event_type == "surf"')) > 0
    distinct_pages = df['page_id'].nunique()
    data = {
        "read_id": df['read_id'].values[0],
        "surf_time_ms": surf_time_ms,
        "immerse_time_ms": immerse_time_ms,
        "min_timestamp": min_timestamp,
        "max_timestamp": max_timestamp,
        "shares": shares,
        "leads": leads,
        "is_read": is_read,
        "number_of_events": len(df),
        "distinct_pages": distinct_pages
    }
    for field in not_calculated_string_fields:
        data[field] = df[field].values[0]
    new_df = pd.DataFrame(data=data, index=['read_id'])
    for x in all_events:
        new_df[f"count_{x}"] = df.query(f"type == '{x}'").count()
    for x in duration_events:
        duration = df.query(f"event_type == '{x}'")['duration']
        duration_sum = duration.sum()
        new_df[f"duration_{x}_ms"] = duration_sum
        if duration_sum > 0:
            new_df[f"mean_duration_{x}_ms"] = duration.mean()
        else:
            new_df[f"mean_duration_{x}_ms"] = 0
    return new_df
And finally, I'm writing the calculated row to the gold table like so:
for_partitioning = (sessions
    .withColumn("tenant", F.col("story_tenant"))
    .withColumn("year", F.year(F.col("min_timestamp")))
    .withColumn("month", F.month(F.col("min_timestamp"))))

checkpoint_path = "checkpoint-path"
gold_path = f"gold-bucket"

(for_partitioning
    .writeStream
    .format('delta')
    .partitionBy('year', 'month', 'tenant')
    .option("mergeSchema", "true")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .start(gold_path))
Can anybody think of a more efficient way to do a UDF in PySpark than applyInPandas for the above example? I simply cannot afford to wait 9 days to reprocess 43GB of data...
I've tried playing around with different input and output options (e.g. .option("maxFilesPerTrigger", 100)) but the real problem seems to be applyInPandas.
You could rewrite your processing_function into native Spark if you really wanted.
"read_id": df['read_id'].values[0]
F.first('read_id').alias('read_id')
"surf_time_ms": df.query('event_type == "surf"')['duration'].sum()
F.sum(F.when(F.col('event_type') == 'surf', F.col('duration'))).alias('surf_time_ms')
"immerse_time_ms": df.query('event_type == "immersion"')['duration'].sum()
F.sum(F.when(F.col('event_type') == 'immersion', F.col('duration'))).alias('immerse_time_ms')
"min_timestamp": df['tracker_timestamp'].min()
F.min('tracker_timestamp').alias('min_timestamp')
"max_timestamp": df['tracker_timestamp'].max()
F.max('tracker_timestamp').alias('max_timestamp')
"shares": len(df.query('event_type == "share"'))
F.count(F.when(F.col('event_type') == 'share', F.lit(1))).alias('shares')
"leads": len(df.query('event_type == "lead_store"'))
F.count(F.when(F.col('event_type') == 'lead_store', F.lit(1))).alias('leads')
"is_read": len(df.query('event_type == "surf"')) > 0
(F.count(F.when(F.col('event_type') == 'surf', F.lit(1))) > 0).alias('is_read')
"number_of_events": len(df)
F.count(F.lit(1)).alias('number_of_events')
"distinct_pages": df['page_id'].nunique()
F.countDistinct('page_id').alias('distinct_pages')
for field in not_calculated_string_fields:
    data[field] = df[field].values[0]
*[F.first(field).alias(field) for field in not_calculated_string_fields]
for x in all_events:
    new_df[f"count_{x}"] = df.query(f"type == '{x}'").count()
The above can probably be skipped? As far as my tests go, new columns get NaN values, because .count() returns a Series object instead of one simple value.
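If those per-event counts are still wanted, the same count/when pattern as above could produce them; a sketch, assuming all_events holds the event_type values (the pandas version queries a type column, which looks like a typo for event_type):
# Sketch only: one count column per value in all_events.
*[F.count(F.when(F.col('event_type') == x, F.lit(1))).alias(f"count_{x}") for x in all_events]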
for x in duration_events:
    duration = df.query(f"event_type == '{x}'")['duration']
    duration_sum = duration.sum()
    new_df[f"duration_{x}_ms"] = duration_sum
    if duration_sum > 0:
        new_df[f"mean_duration_{x}_ms"] = duration.mean()
    else:
        new_df[f"mean_duration_{x}_ms"] = 0
*[F.sum(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"duration_{x}_ms") for x in duration_events]
*[F.mean(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"mean_duration_{x}_ms") for x in duration_events]
So, instead of
def processing_function(df):
    ...

sessions = group.applyInPandas(processing_function, schema=processing_function_output_schema)
you could use efficient native Spark:
sessions = group.agg(
    F.first('read_id').alias('read_id'),
    F.sum(F.when(F.col('event_type') == 'surf', F.col('duration'))).alias('surf_time_ms'),
    F.sum(F.when(F.col('event_type') == 'immersion', F.col('duration'))).alias('immerse_time_ms'),
    F.min('tracker_timestamp').alias('min_timestamp'),
    F.max('tracker_timestamp').alias('max_timestamp'),
    F.count(F.when(F.col('event_type') == 'share', F.lit(1))).alias('shares'),
    F.count(F.when(F.col('event_type') == 'lead_store', F.lit(1))).alias('leads'),
    (F.count(F.when(F.col('event_type') == 'surf', F.lit(1))) > 0).alias('is_read'),
    F.count(F.lit(1)).alias('number_of_events'),
    F.countDistinct('page_id').alias('distinct_pages'),
    *[F.first(field).alias(field) for field in not_calculated_string_fields],
    # skipped count_{x}
    *[F.sum(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"duration_{x}_ms") for x in duration_events],
    *[F.mean(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"mean_duration_{x}_ms") for x in duration_events],
)
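One side note that is not part of the answer above: .agg() keeps the grouping columns (read_id and the session_window struct), so if downstream code prefers plain timestamp columns over the struct, it can be flattened, for example:
# Sketch only: expose the session window bounds as ordinary columns.
sessions_flat = (sessions
    .withColumn("session_start", F.col("session_window.start"))
    .withColumn("session_end", F.col("session_window.end"))
    .drop("session_window"))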

Understanding the distance metric in company name matching using KNN

I am trying to understand the following code that I found for matching a messy list of company names against a clean list of company names. My question is how the 'Ratio' metric is calculated. It appears that the ratio comes from scorer = fuzz.token_sort_ratio, which I understand is part of the fuzzywuzzy package and therefore a Levenshtein distance calculation, correct? I'm trying to understand why the author uses this as the scorer rather than the distance output from KNN. When I try changing the metric inside NearestNeighbors, it doesn't appear to change the results. Does the metric in NearestNeighbors matter then?
Original article:
https://audhiaprilliant.medium.com/fuzzy-string-matching-optimization-using-tf-idf-and-knn-b07fce69b58f
def build_vectorizer(
    clean: pd.Series,
    analyzer: str = 'char',
    ngram_range: Tuple[int, int] = (1, 4),
    n_neighbors: int = 1,
    **kwargs
) -> Tuple:
    # Create vectorizer
    vectorizer = TfidfVectorizer(analyzer = analyzer, ngram_range = ngram_range, **kwargs)
    X = vectorizer.fit_transform(clean.values.astype('U'))
    # Fit nearest neighbors corpus
    nbrs = NearestNeighbors(n_neighbors = n_neighbors, metric = 'cosine').fit(X)
    return vectorizer, nbrs

# String matching - KNN
def tfidf_nn(
    messy,
    clean,
    n_neighbors = 1,
    **kwargs
):
    # Fit clean data and transform messy data
    vectorizer, nbrs = build_vectorizer(clean, n_neighbors = n_neighbors, **kwargs)
    input_vec = vectorizer.transform(messy)
    # Determine best possible matches
    distances, indices = nbrs.kneighbors(input_vec, n_neighbors = n_neighbors)
    nearest_values = np.array(clean)[indices]
    return nearest_values, distances

# String matching - match fuzzy
def find_matches_fuzzy(
    row,
    match_candidates,
    limit = 5
):
    row_matches = process.extract(
        row, dict(enumerate(match_candidates)),
        scorer = fuzz.token_sort_ratio,
        limit = limit
    )
    result = [(row, match[0], match[1]) for match in row_matches]
    return result

# String matching - TF-IDF
def fuzzy_nn_match(
    messy,
    clean,
    column,
    col,
    n_neighbors = 100,
    limit = 5, **kwargs):
    nearest_values, _ = tfidf_nn(messy, clean, n_neighbors, **kwargs)
    results = [find_matches_fuzzy(row, nearest_values[i], limit) for i, row in enumerate(messy)]
    df = pd.DataFrame(itertools.chain.from_iterable(results),
                      columns = [column, col, 'Ratio']
                      )
    return df

# String matching - Fuzzy
def fuzzy_tf_idf(
    df: pd.DataFrame,
    column: str,
    clean: pd.Series,
    mapping_df: pd.DataFrame,
    col: str,
    analyzer: str = 'char',
    ngram_range: Tuple[int, int] = (1, 3)
) -> pd.Series:
    # Create vectorizer
    clean = clean.drop_duplicates().reset_index(drop = True)
    messy_prep = df[column].drop_duplicates().dropna().reset_index(drop = True).astype(str)
    messy = messy_prep.apply(preprocess_string)
    result = fuzzy_nn_match(messy = messy, clean = clean, column = column, col = col, n_neighbors = 1)
    # Map value from messy to clean
    return result
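For reference (my own illustration, not from the article): fuzz.token_sort_ratio tokenizes both strings, sorts the tokens alphabetically, and then applies fuzz.ratio, a Levenshtein-style similarity scaled to 0-100. So the 'Ratio' column is a similarity score from fuzzywuzzy, independent of the cosine distances that NearestNeighbors uses only to shortlist candidates (and which fuzzy_nn_match discards):
# Small demo of what the 'Ratio' column holds (0-100 similarity, not a distance).
from fuzzywuzzy import fuzz

print(fuzz.ratio("acme corp ltd", "ltd acme corp"))             # order-sensitive, lower score
print(fuzz.token_sort_ratio("acme corp ltd", "ltd acme corp"))  # 100 after sorting the tokens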

problems with the leaderboard discord.py

The leaderboard shows the same username for different users when they have the same value.
I don't know how to solve it, but when I print the dictionary in the code it gives me only 3 elements and not 4, even though 4 should come out.
code:
@client.command(aliases = ["lb"])
async def leaderboard(ctx, x = 10):
    leader_board = {}
    total = []
    for user in economy_system:
        name = int(user)
        total_amount = economy_system[user]["wallet"] + economy_system[user]["bank"]
        leader_board[total_amount] = name
        total.append(total_amount)
    print(leader_board)
    total = sorted(total, reverse=True)
    embed = discord.Embed(
        title = f"Top {x} Richest People",
        description = "This is decided on the basis of raw money in the bank and wallet",
        color = 0x003399
    )
    index = 1
    for amt in total:
        id_ = leader_board[amt]
        member = client.get_user(id_)
        name = member.name
        print(name)
        embed.add_field(
            name = f"{index}. {name}",
            value = f"{amt}",
            inline = False
        )
        if index == x:
            break
        else:
            index += 1
    await ctx.send(embed=embed)
print outputs this:
{100: 523967502665908227, 350: 554617490806800387, 1100: 350886488235311126}
Padre Mapper
Flore (Orsolinismo)
Aetna
Aetna
In theory there should also be 100: 488826524791734275 (i.e. my user id) but it doesn't find it.
Your problem comes from this line:
leader_board[total_amount] = name
If total_amount is already a key (e.g. two users have the same amount of money), the previous value (a user ID) gets overwritten with another user ID. In this situation, if multiple users have the same amount of money, only one of them will be saved in leader_board.
Then, you have this line:
total.append(total_amount)
In this case, if two users have the same amount of money, you would just have two identical values, which is normal but, considering the problem above, this will create a shift.
Let's say you have ten users with two of them who have the same amount of money. leader_board will only contain 9 items whereas total will contain 10 values. That's the reason why you have two of the same name in your message.
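A tiny illustration of that overwrite, with made-up IDs and amounts:
# Hypothetical data: two users share a total of 100; keying by amount keeps only one of them.
leader_board = {}
for user_id, amount in [(111, 100), (222, 100), (333, 350)]:
    leader_board[amount] = user_id

print(leader_board)  # {100: 222, 350: 333} -- 3 users in, only 2 entries out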
To solve the problem:
@client.command(aliases = ["lb"])
async def leaderboard(ctx, x=10):
    # Key by user ID so two users with the same total no longer collide
    d = {user_id: info["wallet"] + info["bank"] for user_id, info in economy_system.items()}
    # Sort the users by their total, highest first
    leaderboard = {user_id: amount for user_id, amount in sorted(d.items(), key=lambda item: item[1], reverse=True)}
    embed = discord.Embed(
        title = f"Top {x} Richest People",
        description = "This is decided on the basis of raw money in the bank and wallet",
        color = 0x003399
    )
    for index, infos in enumerate(leaderboard.items()):
        user_id, amount = infos
        member = client.get_user(user_id)
        embed.add_field(
            name = f"{index}. {member.display_name}",
            value = f"{amount}",
            inline = False
        )
    await ctx.send(embed=embed)
If I guessed right and your dictionary is organized like this, it should work:
economy_system = {
    user_id: {"bank": x, "wallet": y}
}

R psych::statsBy() error: "'x' must be numeric"

I'm trying to do a multilevel factor analysis using the "psych" package. The recommended first step is to use the statsBy() function to get the correlation data:
statsBy(study2, group = "ID")
However, it gives this "Error in FUN(data[x, , drop = FALSE], ...) : 'x' must be numeric".
For the dataset, I only included a grouping variable "ID" and two other numeric variables. I ran the following line to check whether the variables are numeric:
sapply(study2, is.numeric)
ID v1 V2
FALSE TRUE TRUE
Here is the traceback of the error. I don't know what 'x' refers to here, and I noticed that in frames 8 and 9 the X is capitalized, while it is lowercase in frame 10.
10. FUN(data[x, , drop = FALSE], ...)
9. FUN(X[[i]], ...)
8. lapply(X = ans[index], FUN = FUN, ...)
7. tapply(seq_len(728L), list(z = c("5edfa35e60122c277654d35b", "5ed69fbc0a53140e516ad4ed", "5d52e8160ebbe900196e252e", "5efa3da57a38f213146c7352", "5ef98f3df4d541726b1bcc48", "5debb7511e806c2a59cad664", "5c28a4530091e40001ca4d00", "5872a0d958ca4c00018ce4fe", "5c87868eddda2d00012add18", "5e80b7427567f07891655e7e", ...
6. eval(substitute(tapply(seq_len(nd), IND, FUNx, simplify = simplify)), data)
5. eval(substitute(tapply(seq_len(nd), IND, FUNx, simplify = simplify)), data)
4. structure(eval(substitute(tapply(seq_len(nd), IND, FUNx, simplify = simplify)), data), call = match.call(), class = "by")
3. by.data.frame(data, z, colMeans, na.rm = na.rm)
2. by(data, z, colMeans, na.rm = na.rm)
1. statsBy(study2, group = "ID")
The dataset has 728 rows and those like "5edfa35e60122c277654d35b" are IDs. Could anyone help explain what might have gone wrong?
I had the same error; the only way I found was to convert the group variable to the numeric class.
Try:
study2$ID<-as.numeric(study2$ID)
statsBy(study2, group = "ID")
If study2$ID is of class character:
study2$ID<-as.numeric(as.factor(study2$ID))
statsBy(study2, group = "ID")
