Streamlit Unhashable TypeError when I use st.cache - huggingface-transformers

When I use the st.cache decorator to cache a Hugging Face transformers model, I get an unhashable TypeError.
This is the code:
from transformers import pipeline
import streamlit as st
from io import StringIO

@st.cache(hash_funcs={StringIO: StringIO.getvalue})
def model():
    return pipeline("sentiment-analysis", model='akhooli/xlm-r-large-arabic-sent')

After searching the issues section of the Streamlit repo,
I found that the hashing argument is not required; you just need to pass this argument:
allow_output_mutation = True
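A minimal sketch of that approach (assuming the legacy st.cache API, which accepts allow_output_mutation):
from transformers import pipeline
import streamlit as st

@st.cache(allow_output_mutation=True)  # skip the mutation check on the cached return value
def get_model():
    return pipeline("sentiment-analysis", model='akhooli/xlm-r-large-arabic-sent')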

This worked for me:
from transformers import pipeline
import tokenizers
import streamlit as st
import copy

@st.cache(hash_funcs={tokenizers.Tokenizer: lambda _: None, tokenizers.AddedToken: lambda _: None})
def get_model():
    return pipeline("sentiment-analysis", model='akhooli/xlm-r-large-arabic-sent')

input = st.text_input('Text')
bt = st.button("Get Sentiment Analysis")
if bt and input:
    model = copy.deepcopy(get_model())
    st.write(model(input))
Note 1:
Calling the pipeline with input, model(input), changes the model, and we shouldn't change a cached value, so we need to copy the model and run it on the copy.
Note 2:
The first run will load the model using the get_model function; subsequent runs will use the cache.
Note 3:
You can read more about advanced caching in Streamlit in their documentation.
Output examples: (screenshots omitted)

Related

ljspeech Hugging Face examples not working

When trying to run the ljspeech example, I get the following error, even when the model is moved to the only GPU in the system. I am using CUDA 11.7, PyTorch 1.13.1, and fairseq 0.12.2.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
The code used:
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd
import torch

models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/fastspeech2-en-ljspeech",
    arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0].to(torch.device('cuda'))
models[0] = model
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
generator = task.build_generator(models, cfg)

text = "Hello, this is a test run."
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)
ipd.Audio(wav, rate=rate)
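One plausible cause (an assumption on my part, not confirmed in this thread) is that the sample dict returned by TTSHubInterface.get_model_input still holds CPU tensors while the model lives on the GPU. A rough sketch of moving the input onto the same device with fairseq's utils.move_to_cuda:
from fairseq import utils

sample = TTSHubInterface.get_model_input(task, text)
sample = utils.move_to_cuda(sample)  # recursively move the tensors in the sample dict to the GPU
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)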

python-telegram-bot (v20.x): TypeError: bad operand type for unary ~: 'type'

I am trying to build a Telegram bot; I have just reproduced this code:
from telegram.ext import MessageHandler
from telegram.ext import filters
from telegram.ext import Application
from telegram import Update
from telegram.ext import ContextTypes
from decouple import config

def testFunc(update: Update, context: ContextTypes.DEFAULT_TYPE):
    print('Hi')

def main():
    BOT_TOKEN = config('TELE_BOT_API_KEY_2')
    application = Application.builder().token(BOT_TOKEN).build()
    application.add_handler(MessageHandler(filters.Text & ~filters.Command, testFunc))
    application.run_polling()

if __name__ == '__main__':
    main()
The error this code shows is:
Bot\AsyncAdvanceBot\test3.py", line 16, in main
application.add_handler(MessageHandler(filters.Text & ~filters.Command, testFunc))
TypeError: bad operand type for unary ~: 'type'
I am using the python-telegram-bot API v20.x.
I know this might be a naive problem that I'm missing.
Thanks!
I tried changing the code to a different format, but it doesn't work.
I got it! As I said, it was a naive error; I wasn't thinking straight 😅
I had seen the documentation earlier and was only focused on changing filters.Command to filters.COMMAND, but forgot to also change filters.Text to filters.TEXT.
I just replaced
filters.Text & ~filters.Command
with
filters.TEXT & ~filters.COMMAND
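In v20 the uppercase names (filters.TEXT, filters.COMMAND) are ready-made filter instances, while the capitalized names are the filter classes themselves, which is why ~ on the class fails with "bad operand type for unary ~: 'type'". For reference, a minimal sketch of the corrected registration (assuming the rest of the original script; note that v20 also expects async callbacks):
from telegram import Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

async def testFunc(update: Update, context: ContextTypes.DEFAULT_TYPE):
    print('Hi')

def main():
    application = Application.builder().token("YOUR_BOT_TOKEN").build()
    # filters.TEXT and filters.COMMAND are filter instances, so & and ~ are defined on them
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, testFunc))
    application.run_polling()

if __name__ == '__main__':
    main()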

Error when importing sklearn in pipeline component

When I run this simple pipeline (in GCP's Vertex AI Workbench) I get an error:
ModuleNotFoundError: No module named 'sklearn'
Here is my code:
from kfp.v2 import compiler
from kfp.v2.dsl import pipeline, component
from google.cloud import aiplatform

@component(
    packages_to_install=["sklearn"],
    base_image="python:3.9",
)
def test_sklearn():
    import sklearn

@pipeline(
    pipeline_root=PIPELINE_ROOT,
    name="sklearn-pipeline",
)
def pipeline():
    test_sklearn()

compiler.Compiler().compile(pipeline_func=pipeline, package_path="sklearn_pipeline.json")

job = aiplatform.PipelineJob(
    display_name=PIPELINE_DISPLAY_NAME,
    template_path="sklearn_pipeline.json",
    pipeline_root=PIPELINE_ROOT,
    location=REGION
)
job.run(service_account=SERVICE_ACCOUNT)
What am I doing wrong? :)
It seems that the package name sklearn no longer works after a version upgrade. You need to change the value of packages_to_install from "sklearn" to "scikit-learn" in the @component block.
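A minimal sketch of the corrected component, assuming everything else in the pipeline stays the same:
from kfp.v2.dsl import component

@component(
    packages_to_install=["scikit-learn"],  # install under the real PyPI name
    base_image="python:3.9",
)
def test_sklearn():
    import sklearn  # the import name is still sklearn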

Huggingface reformer for long document summarization

I understand Reformer is able to handle a large number of tokens. However, it does not appear to support the summarization task:
>>> from transformers import ReformerTokenizer, ReformerModel
>>> from transformers import pipeline
>>> summarizer = pipeline("summarization", model="reformer")
404 Client Error: Not Found for url: https://huggingface.co/reformer/resolve/main/config.json
...
How would you construct the pipeline "manually" to use reformer for summarization?
Try this:
summarizer = pipeline("summarization", model="google/reformer-enwik8")
via here.
However, this produces...
/lib/python3.7/site-packages/sentencepiece.py", line 177, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

Why is the output file from Biopython not found?

I work on a Mac. I have been trying to make a multiple sequence alignment in Python using MUSCLE. This is the code I have been running:
from Bio.Align.Applications import MuscleCommandline
cline = MuscleCommandline(input="testunaligned.fasta", out="testunaligned.aln", clwstrict=True)
print(cline)
from Bio import AlignIO
align = AlignIO.read(open("testunaligned.aln"), "clustal")
print(align)
I keep getting the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'testunaligned.aln'
Does anyone know how I could fix this? I am very new to Python and computer science in general, and I am totally at a loss. Thanks!
cline in your code is an instance of the MuscleCommandline object that you initialized with all the parameters. After initialization, this instance can run MUSCLE, but it will only do that if you call it. That means you have to invoke cline().
When you simply print the cline object, it returns the string corresponding to the command you could run manually on the command line to get the same result as when you invoke cline().
And here is the working code:
from Bio.Align.Applications import MuscleCommandline

cline = MuscleCommandline(
    input="testunaligned.fasta",
    out="testunaligned.aln",
    clwstrict=True
)
print(cline)
cline()  # this is where muscle runs

from Bio import AlignIO

align = AlignIO.read(open("testunaligned.aln"), "clustal")
print(align)
