How to remove input from generated text in GPTNeo?

I'm writing a program to generate text...
I need to remove the input from the generated text. How can I do this?
The code:
input_ids = tokenizer(context, return_tensors="pt").input_ids
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
strs = tokenizer.batch_decode(gen_tokens)[0]
Here the strs contains the input I've given...
How to remove that?

The Transformers library does not do this for you, but you can easily achieve it with one line of code:
strs = strs.replace(context, "")
This is what I do behind my NLP Cloud API, which uses Transformers under the hood.
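A hedged alternative to string replacement: decoder-only models such as GPT-Neo return the prompt tokens followed by the newly generated tokens, so the prompt can be dropped by position before decoding. This sidesteps cases where the decoded text does not reproduce context character for character. The sketch below uses plain lists as stand-ins for the tensors; strip_prompt and the toy token ids are illustrative names, not part of Transformers.

```python
def strip_prompt(generated_ids, prompt_length):
    """Return each generated sequence without its first prompt_length token ids."""
    return [list(seq[prompt_length:]) for seq in generated_ids]


# Toy stand-ins for input_ids and gen_tokens (hypothetical token ids).
prompt_ids = [[101, 102, 103]]
generated_ids = [[101, 102, 103, 7, 8, 9]]

# Keep only the tokens that come after the prompt.
new_token_ids = strip_prompt(generated_ids, len(prompt_ids[0]))
print(new_token_ids)
```

With the tensors from the question, the equivalent slice would be gen_tokens[:, input_ids.shape[1]:], which can then be passed to tokenizer.batch_decode.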

Related

How to convert json into data type that is supported by Azure Form Recognizer

I'd like to convert JSON into the data type that is supported by Azure Form Recognizer. I'm able to convert the result into a dict and then into JSON, but I'm not able to do the opposite without analysing the document again. How can I get back the data type supported by Azure Form Recognizer without having to analyse the document more than once?
Here is what I have.
import json

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

endpoint = "endpoint"
key = "key"
# create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
# Extract text from doc using "prebuilt-document"
with open("file.pdf", "rb") as f:
    poller = document_analysis_client.begin_analyze_document(
        "prebuilt-document", document=f)
form_pages = poller.result()
# Convert the result to a dict, then to a JSON string
d = form_pages.to_dict()
json_string = json.dumps(d)
print(json_string)
# Try to convert the JSON back into the SDK type
data = json.loads(json_string)
poller1 = form_pages.from_dict(data)

What's the scenario for converting the JSON model representation back to the SDK model? The operations don't take the result models as input. As a solution, the original result model can be stored somewhere until it needs to be used again.
Also, for converting the model to JSON, it would be better to use the AzureJSONEncoder so that the SDK can properly serialize all types. For example:
import json
from azure.core.serialization import AzureJSONEncoder
# save the dictionary as JSON content in a JSON file, use the AzureJSONEncoder
# to help make types, such as dates, JSON serializable
# NOTE: AzureJSONEncoder is only available with azure.core>=1.18.0.
with open('data.json', 'w') as f:
    json.dump(analyze_result_dict, f, cls=AzureJSONEncoder)
Here is a link to the full sync sample: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/formrecognizer/azure-ai-formrecognizer/samples/v3.2/sample_convert_to_and_from_dict.py
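To illustrate why a custom encoder matters for values such as dates, here is a minimal standard-library sketch. DateAwareEncoder is a hypothetical stand-in written for this example, not the Azure implementation, and the dictionary keys are made up rather than taken from a real analysis result.

```python
import json
from datetime import date, datetime


class DateAwareEncoder(json.JSONEncoder):
    """Illustrative encoder: turns dates and datetimes into ISO 8601 strings."""

    def default(self, o):
        if isinstance(o, (date, datetime)):
            return o.isoformat()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


# Hypothetical stand-in for the dictionary returned by to_dict().
analyze_result_dict = {"model_id": "prebuilt-document", "analyzed_on": date(2022, 1, 31)}

# A plain json.dumps(analyze_result_dict) would raise TypeError on the date value.
json_string = json.dumps(analyze_result_dict, cls=DateAwareEncoder)
restored = json.loads(json_string)
print(restored)
```

Note that the date comes back as a plain string after json.loads, so converting it back to a date is a separate step.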

Should I adjust the weights of embedding of newly added tokens?

I'm a beginner in natural language processing. Recently, I have been trying to train a text generation model based on GPT-2 with Hugging Face Transformers. I added some new tokens to the tokenizer and resized the model's embedding with model.resize_token_embeddings(len(tokenizer)). Suppose I added 6 new tokens: should I add the weights of the 6 tokens to the optimizer? How should I do it? Thank you very much!

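Just call the resize_token_embeddings function. The new rows become part of the model's embedding matrix, so an optimizer created from model.parameters() after the resize trains them along with everything else:
from transformers import AutoModelForCausalLM, AutoTokenizer

gpt2_tokenizer = AutoTokenizer.from_pretrained('gpt2')
gpt2_model = AutoModelForCausalLM.from_pretrained('gpt2')
ATTR_TO_SPECIAL_TOKEN = {'additional_special_tokens': ['SPEC1', 'SPEC2']}
orig_num_tokens = len(gpt2_tokenizer)
num_added_tokens = gpt2_tokenizer.add_special_tokens(ATTR_TO_SPECIAL_TOKEN)  # doesn't add if they are already there
if num_added_tokens > 0:
    gpt2_model.resize_token_embeddings(new_num_tokens=orig_num_tokens + num_added_tokens)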

Count and print images from URL

This is my first time using Spark/Scala and I am lost.
I am supposed to write a program that takes in a URL and outputs the number of images and the name of each image file.
So far I was able to get the image count. I am doing this all in the command prompt, which makes it quite difficult to go back and edit my def without retyping the whole thing. Is there a better alternative? It took me quite a while just to get Spark/Scala working (I would have liked to use PySpark but was unable to get them to communicate).
scala> def URLcount(url : String) : String = {
| var html = scala.io.Source.fromURL(url).mkString
| var list = html.split("\n").filter(_ != "")
| val rdds = sc.parallelize(list)
| val count = rdds.filter(_.contains("img")).count()
| return("There are " + count + " images at the " + url + " site.")
| }
URLcount: (url: String)String
scala> URLcount("https://www.yahoo.com/")
res14: String = There are 9 images at the https://www.yahoo.com/ site.
So I'm assuming that after I parallelize the list I should be able to apply a filter and create a list of all the strings that contain "img src".
How would I create such a list and then print it line by line to display the image URLs?

I'm not sure Spark is a good fit for parsing HTML. Spark is general purpose, but it was built for big data, and I did not find an easy way to parse HTML with it (although I easily found ways for both XML and JSON). Filtering lines also means you will often print one very long string, because HTML pages are frequently minified. It also matches any line that merely mentions "img"; run against this very page, your program would print lines like this:
<p>So I'm assuming after I parallelize the list I should be able to apply a filter and create a list of all the strings that contain "img src"
I would advise using Jsoup:
import org.jsoup.Jsoup
val yahoo = Jsoup.connect("https://www.yahoo.com").get
val images = yahoo.select("img[src]")
images.forEach(println)
You can use Spark for other purposes.
PS: I found 39 image tags with a src attribute on https://www.yahoo.com. It is very easy to get errors if you don't use a good HTML parser.
Another way: prepare your data first and then use Spark.
Sorry for my English.
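Since the question mentions wanting to use Python, here is a hedged sketch of the same parser-based idea using only Python's standard library html.parser. ImgSrcCollector and the sample markup are made up for illustration; in practice the HTML would be fetched from the URL first.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Hypothetical sample markup; the text mentioning "img src" is not a tag.
sample_html = (
    '<p>see "img src" here</p>'
    '<img src="a.png"><IMG SRC="b.jpg" alt="b"><img alt="no source">'
)

collector = ImgSrcCollector()
collector.feed(sample_html)
for source in collector.sources:
    print(source)
```

A substring filter on "img" would count the paragraph text and the tag without a source as well, which is the kind of error a real parser avoids.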

Sorting record in Adobe Air

Ok probably barking up the wrong tree with this one but some guidance would be nice!
Currently got an app that exports data to a text file
stream.open(file, FileMode.APPEND);
stream.writeUTFBytes(data1 + data2);
stream.close();
and then use the following to import that data
var textloader:URLLoader = URLLoader(event.target);
MyTextFile_txt.text = textloader.data;
Now is there any way of sorting this information (for example, putting it in order of the data2 records)? I know sorting from a text file is probably a little difficult. Would there be a better way of exporting the file instead? Or, when importing the file, can I get it to import into a specific text box?
Dunno just throwing some ideas out.

Although not essential, you can use stream.readUTFBytes instead of URLLoader.
Regarding sorting the data, you can add all the loaded data to an array and then use sort() on the array.
e.g.
var someArray:Array = [];
for (var i:int = 0; i < loadedData.xmlNodeName.length; i++) {
    someArray.push(loadedData.xmlNodeName[i]);
}
someArray.sort();
http://help.adobe.com/en_US/ActionScript/3.0_ProgrammingAS3/WS5b3ccc516d4fbf351e63e3d118a9b90204-7fa4.html
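The same load, split, and sort idea can be sketched in a language-neutral way; the example below uses Python rather than ActionScript. It assumes each record is exported on its own line with a tab between data1 and data2, which is an assumption for illustration (the original export writes the two values with no separator). sort_records is a hypothetical helper name.

```python
def sort_records(text, field_index=1, delimiter="\t"):
    """Split exported text into records and sort them by one field."""
    records = [line.split(delimiter) for line in text.splitlines() if line.strip()]
    return sorted(records, key=lambda record: record[field_index])


# Hypothetical exported file content: data1, a tab, then data2 on each line.
exported = "pear\t30\napple\t10\nfig\t20\n"

sorted_records = sort_records(exported)
for data1, data2 in sorted_records:
    print(data1, data2)
```

The fields are compared as strings here, so numeric values of different lengths would need converting before sorting.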

Creating barcode images

I need to send emails out to several thousand customers, each with a unique barcode so they can redeem it either in store or online.
We have a list of coupon/barcode codes to use and have a way to dynamically pull these codes into the email so a customer will see a unique code. The problem is I need to somehow generate several thousand barcode images that are created using the unique codes. How can I solve this?
This would be perfect if our email marketing company had this functionality but unfortunately they don't:
http://www.emaildirect.com/blog/2011/11/create-unique-barcodes-with-emaildirect/
Any help would be greatly appreciated.
I have found my answer!
By using the barcode generator www.barcodesinc.com I generated a URL and input this into my email.
Eg: http://www.barcodesinc.com/generator/image.php?code=999999999&style=197&type=C128B&width=200&height=50&xres=1&font=3
I then replaced the 999999999 in the URL with my conditional code, so that each email uses that person's specific code and also brings back the barcode image for that code!
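As a rough sketch of that substitution, the per-customer image URLs can be built from a list of codes. The query parameters mirror the example URL above; barcode_url and the sample codes are illustrative, and whether the generator still serves this endpoint is not verified here.

```python
from urllib.parse import urlencode

# Base address taken from the example URL above.
BASE_URL = "http://www.barcodesinc.com/generator/image.php"


def barcode_url(code):
    """Build the image URL for one coupon code, reusing the example's settings."""
    params = {
        "code": code,
        "style": 197,
        "type": "C128B",
        "width": 200,
        "height": 50,
        "xres": 1,
        "font": 3,
    }
    return BASE_URL + "?" + urlencode(params)


# Hypothetical coupon codes; in practice these come from the coupon list.
coupon_codes = ["999999999", "123456789"]
urls = [barcode_url(code) for code in coupon_codes]
for url in urls:
    print(url)
```

Each URL can then be used as the src of an img tag in the corresponding email.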

<img src="http://qrfree.kaywa.com/?s=8&d=your+text+here" alt="QRCode"/>
OR
http://qrfree.kaywa.com/?s=8&d=your+text+here

I'm no expert on this and haven't touched html but you could serialize each image and follow this example that has some sample code on QR code given a string.
Imports ThoughtWorks.QRCode.Codec
Dim objQRCode As QRCodeEncoder = New QRCodeEncoder()
Dim imgImage As Image
Dim objBitmap As Bitmap
objQRCode.QRCodeEncodeMode = QRCodeEncoder.ENCODE_MODE.BYTE
objQRCode.QRCodeScale = 2
objQRCode.QRCodeVersion = 5
objQRCode.QRCodeErrorCorrect = ThoughtWorks.QRCode.Codec.QRCodeEncoder.ERROR_CORRECTION.L
imgImage = objQRCode.Encode("Test Data")
objBitmap = New Bitmap(imgImage)
objBitmap.Save("C:\QRCode.jpg")

Hi, try getting in touch with http://www.linktagger.com and ask if they can help. They provide enterprise-type services for the city where I live, on maps and bus terminals, so it might help you.

Here is a working example that generates barcodes for a list of codes. When there are thousands of codes, they could also be loaded from a CSV file using pandas.
For each code, the example calls the API and saves the image returned in the response as a .png file.
import shutil
import requests

data = [11111111111, 22222222222222222, 33333333333333, 4444444444444]
url = 'https://www.barcodesinc.com/generator_files/image.php'
for d in data:
    params = {
        'code': d,
        'style': '197',
        'type': 'C128B',
        'width': '200',
        'height': '50',
        'xres': '1',
        'font': '3',
    }
    # stream=True lets the raw response body be copied straight into the file
    response = requests.get(url, params=params, stream=True)
    with open('image-%s.png' % d, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
    del response
