Perspective API: Proper way to send requests with auto-detection of language

I am a bit confused on the proper way to send requests using Google's Perspective API.
Sending the following request works:
{"comment":{"text":"yo hamburger"},"languages":["en"],"requestedAttributes":{"TOXICITY":{}}}
In the documentation, it says, "...If you are using a production attribute, language is auto-detected if not specified in the request." So, I tried:
{"comment":{"text":"yo hamburger"},"requestedAttributes":{"TOXICITY":{}}}
And in response, I got an HTTP/1.0 400 Bad Request.
I also tried including all of the languages listed on the documentation page, like this:
{"comment":{"text":"yo hamburger"},"languages":["en","fr","es","de","it","pt"],"requestedAttributes":{"TOXICITY":{}}}
But that also gave me a response of HTTP/1.0 400 Bad Request.
I also tried leaving the array of languages empty, like this:
{"comment":{"text":"yo hamburger"},"languages":[],"requestedAttributes":{"TOXICITY":{}}}
However, it still gave me a response of HTTP/1.0 400 Bad Request.
I was wondering, what is the proper way to send a request to the API and have it auto-detect language?

User x00 provided the path to the solution in the question's comment section. By using curl, I was able to see what was going on.
Here's what was happening:
In this first example, the system worked without error.
CURL:
curl -H "Content-Type: application/json" --data \
'{comment: {text: "yo hamburger"},
languages: ["en"],
requestedAttributes: {TOXICITY:{}} }' \
https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=[API_KEY]
RESPONSE:
{
"attributeScores": {
"TOXICITY": {
"spanScores": [
{
"begin": 0,
"end": 12,
"score": {
"value": 0.050692778,
"type": "PROBABILITY"
}
}
],
"summaryScore": {
"value": 0.050692778,
"type": "PROBABILITY"
}
}
},
"languages": [
"en"
],
"detectedLanguages": [
"tr",
"ja-Latn",
"de",
"en"
]
}
In this second example, the system was indeed auto-detecting the language, but since "yo hamburger" was detected as Turkish, it could not score the comment and returned a 400 instead.
CURL:
curl -H "Content-Type: application/json" --data \
'{comment: {text: "yo hamburger"},
requestedAttributes: {TOXICITY:{}} }' \
https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=[API_KEY]
RESPONSE:
{
"error": {
"code": 400,
"message": "Attribute TOXICITY does not support request languages: tr",
"status": "INVALID_ARGUMENT",
"details": [
{
"@type": "type.googleapis.com/google.commentanalyzer.v1alpha1.Error",
"errorType": "LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE",
"languageNotSupportedByAttributeError": {
"detectedLanguages": [
"tr"
],
"attribute": "TOXICITY"
}
}
]
}
}
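The error body names the language that caused the rejection, so a client can detect this case programmatically instead of just seeing "400". A minimal Python sketch, assuming the error layout shown in these responses (the function name `unsupported_languages` is hypothetical):

```python
import json
from typing import List, Optional


def unsupported_languages(body: str) -> Optional[List[str]]:
    """Languages named in a LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE error, else None."""
    error = json.loads(body).get("error")
    if error is None:
        return None  # not an error response
    for detail in error.get("details", []):
        if detail.get("errorType") != "LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE":
            continue
        info = detail.get("languageNotSupportedByAttributeError", {})
        # Auto-detected requests report "detectedLanguages";
        # requests with explicit languages report "requestedLanguages".
        return info.get("detectedLanguages") or info.get("requestedLanguages") or []
    return None
```

For the response above, this returns `["tr"]`, which tells you the failure came from language detection rather than from the request format.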
This next example is more puzzling to me: the language field in the request is plural, "languages", so it seems you should be able to provide more than one language. However, the API said it doesn't support that.
CURL:
curl -H "Content-Type: application/json" --data \
'{comment: {text: "yo hamburger"},
languages:["en","fr","es","de","it","pt"],
requestedAttributes: {TOXICITY:{}} }' \
https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=[API_KEY]
RESPONSE:
{
"error": {
"code": 400,
"message": "Attribute TOXICITY does not support request languages: en,fr,es,de,it,pt",
"status": "INVALID_ARGUMENT",
"details": [
{
"@type": "type.googleapis.com/google.commentanalyzer.v1alpha1.Error",
"errorType": "LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE",
"languageNotSupportedByAttributeError": {
"requestedLanguages": [
"en",
"fr",
"es",
"de",
"it"
],
"attribute": "TOXICITY"
}
}
]
}
}
In this next example, leaving the languages array empty also triggered language auto-detection, but again "yo hamburger" was detected as Turkish, so the API could not score it.
CURL:
curl -H "Content-Type: application/json" --data \
'{comment: {text: "yo hamburger"},
languages:[],
requestedAttributes: {TOXICITY:{}} }' \
https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=[API_KEY]
RESPONSE:
{
"error": {
"code": 400,
"message": "Attribute TOXICITY does not support request languages: tr",
"status": "INVALID_ARGUMENT",
"details": [
{
"@type": "type.googleapis.com/google.commentanalyzer.v1alpha1.Error",
"errorType": "LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE",
"languageNotSupportedByAttributeError": {
"detectedLanguages": [
"tr"
],
"attribute": "TOXICITY"
}
}
]
}
}
Since the Perspective API would not let me specify all of the languages listed as supported for the TOXICITY attribute, I tried just two. The response was the same. Apparently the Perspective API rejects a request that specifies more than one language. Perhaps the field was named "languages" with future support in mind.
CURL:
curl -H "Content-Type: application/json" --data \
'{comment: {text: "yo hamburger"},
languages: ["en","fr"],
requestedAttributes: {TOXICITY:{}} }' \
https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=[API_KEY]
RESPONSE:
{
"error": {
"code": 400,
"message": "Attribute TOXICITY does not support request languages: en,fr",
"status": "INVALID_ARGUMENT",
"details": [
{
"@type": "type.googleapis.com/google.commentanalyzer.v1alpha1.Error",
"errorType": "LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE",
"languageNotSupportedByAttributeError": {
"requestedLanguages": [
"en",
"fr"
],
"attribute": "TOXICITY"
}
}
]
}
}

Maybe you're using a bad client library, or some other issue is causing the problem. Here is the documentation for the client library; in its example, the language is auto-detected without a problem. Check that, and if it doesn't help, provide more details for further investigation.

As I said in the comments, the general approach to this kind of issue is: use curl. It helps a lot.
To sum up your findings:
Auto-detection with a set of languages doesn't seem to work.
The correct way to send a request with auto-detection enabled is
{comment: {text: "some text"}, requestedAttributes: {TOXICITY:{}} }
but it sometimes fails on short texts, especially ones containing slang.
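The same request can be built from code. A minimal Python sketch using only the standard library, assuming the endpoint from the curl examples above; the helper names are hypothetical. It sends strict JSON (quoted keys) and simply leaves `languages` out when auto-detection is wanted:

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Endpoint from the curl examples above.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_analyze_body(text: str, languages: Optional[List[str]] = None) -> dict:
    """TOXICITY request body; leaving out "languages" asks the API to auto-detect."""
    body = {"comment": {"text": text}, "requestedAttributes": {"TOXICITY": {}}}
    if languages:  # an empty list is treated like "not specified"
        body["languages"] = languages
    return body


def analyze(text: str, api_key: str, languages: Optional[List[str]] = None) -> dict:
    """POST the body and return the decoded JSON response.

    A 400 (e.g. LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE) raises urllib.error.HTTPError.
    """
    url = ANALYZE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    request = urllib.request.Request(
        url,
        data=json.dumps(build_analyze_body(text, languages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Treating an empty list like a missing field matches what the experiments above showed: `"languages": []` also triggered auto-detection.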
So what can be done about it?
The easiest way is to assign some weight to Bad Requests (probably something around 0.5). After all, what you get in a response is a probability, not a definitive answer. So
toxicity score = 1 means "definitely toxic"
toxicity score = 0 means "not toxic at all"
and toxicity score = 0.5 means "we have no idea"
the same goes for a Bad Request: "you have no idea"
and you will get 0.5 from time to time, so you must deal with comments of that score somehow anyway, as well as with network errors, etc.
But I would say that the probability of toxicity for a comment that results in LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE is higher than 0.5. It's up to you to decide on the exact number.
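This weighting can be sketched in Python as follows. The value 0.6 for unsupported-language errors is an assumption (the answer only says "higher than 0.5"), and passing `None` for a failed network call is a convention of this sketch:

```python
from typing import Optional

UNSCORED = 0.5              # "we have no idea": network errors, malformed replies
UNSUPPORTED_LANGUAGE = 0.6  # assumption: the answer only says "higher than 0.5"


def toxicity_score(response: Optional[dict]) -> float:
    """TOXICITY summary score, or a fixed weight when the API gave no usable score."""
    if not response:
        return UNSCORED  # e.g. the request never completed
    error = response.get("error")
    if error is not None:
        details = error.get("details", [])
        if any(d.get("errorType") == "LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE" for d in details):
            return UNSUPPORTED_LANGUAGE
        return UNSCORED
    try:
        return float(response["attributeScores"]["TOXICITY"]["summaryScore"]["value"])
    except (KeyError, TypeError, ValueError):
        return UNSCORED
```

Downstream code then only ever sees a number in [0, 1], so real scores and "no idea" cases go through the same moderation thresholds.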
Since auto-detection doesn't work well with short texts, you can improve the chance of correct detection by adding some context to your request: a couple of other comments in the thread, or better yet, a couple of other comments from the same user. Not too long and not too short.
Make three requests, each specifying a single language. As far as I could tell, TOXICITY works only with English, Spanish, and French. On GitHub I got this reply:
"TOXICITY is currently supported in English (en), Spanish (es), French (fr), German (de), Portuguese (pt), and Italian (it). We will work to remove the contradictions you identified."
Auto-detect the language yourself before sending a request. That will take some effort, but it shouldn't be too hard, given that you have much more context available than the Perspective API (or any other third-party API) does.
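The last two options can be combined. A rough Python sketch, assuming a hypothetical `score_in(text, lang)` wrapper around one Perspective request with an explicit `"languages": [lang]`. It uses a deliberately crude stopword guess (toy word lists, not a real language detector) and, when that guess is inconclusive, scores the text once per language from the GitHub reply and keeps the highest score:

```python
import re
from typing import Callable, Optional

# Languages quoted in the GitHub reply above as supported for TOXICITY.
TOXICITY_LANGUAGES = ("en", "es", "fr", "de", "pt", "it")

# Toy stopword samples (assumption): hand-picked, NOT a real language detector.
STOPWORDS = {
    "en": {"the", "and", "is", "you", "of", "to"},
    "es": {"el", "la", "y", "es", "de", "que", "los"},
    "fr": {"le", "la", "et", "est", "les", "des", "que"},
}


def guess_language(text: str, min_hits: int = 2) -> Optional[str]:
    """Return the language with the most stopword hits, or None if too few to trust."""
    words = re.findall(r"\w+", text.lower())
    best_lang, best_hits = None, 0
    for lang, stopwords in STOPWORDS.items():
        hits = sum(word in stopwords for word in words)
        if hits > best_hits:
            best_lang, best_hits = lang, hits
    return best_lang if best_hits >= min_hits else None


def toxicity(text: str, score_in: Callable[[str, str], Optional[float]]) -> Optional[float]:
    """Score in our guessed language if we have one, else in every supported language.

    score_in(text, lang) is hypothetical: one request with "languages": [lang],
    returning the summary score, or None if that request failed.
    """
    guess = guess_language(text)
    candidates = (guess,) if guess else TOXICITY_LANGUAGES
    scores = [s for s in (score_in(text, lang) for lang in candidates) if s is not None]
    return max(scores) if scores else None
```

Keeping the maximum is a conservative choice (a comment counts as toxic if any reading finds it toxic). The fallback costs up to six requests per comment, so it is worth reserving for texts your own detection can't classify.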
Also:
These kinds of APIs are not supposed to run unattended. Fine-tuning and moderation on your part are required, or else we'll end up in the worst-case scenario of algocracy :).
I also think it's a good idea in general to store toxicity statistics for each user's comments, as well as some manual coefficient, because, for example, mathematical formulas get high toxicity scores.
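One way to keep such per-user statistics, sketched in Python. The structure (a running mean scaled by a moderator-set coefficient and capped at 1.0) is an illustrative assumption, not something the Perspective API provides:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UserToxicity:
    """Running toxicity statistics for one user (hypothetical structure)."""
    total: float = 0.0
    count: int = 0
    coefficient: float = 1.0  # manual adjustment set by a moderator

    def add(self, score: float) -> None:
        self.total += score
        self.count += 1

    def adjusted_mean(self) -> float:
        """Mean score scaled by the manual coefficient, capped at 1.0."""
        if self.count == 0:
            return 0.0
        return min(1.0, self.coefficient * self.total / self.count)


# One entry per user, created on first use.
stats = defaultdict(UserToxicity)
```

A moderator who notices that a user mostly posts formulas could, for example, set that user's `coefficient` to 0.5 so that the inflated scores carry less weight.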
I've posted a couple of issues on GitHub and am still waiting for a reply on the second one. If/when I get it, I'll update my answer with the details.

Related

Get more than one header using the metadataHeaders[] query parameter (Gmail API)

When I make a GET HTTP request that asks for only one of the metadataHeaders, like so:
https://gmail.googleapis.com/gmail/v1/users/me/messages/${messageId}?$format=metadata&metadataHeaders=From
It works just fine. But how do I send the array of headers I want [to, from, subject] in my request? Basically, how would I restructure the metadataHeaders query parameter in this URL:
https://gmail.googleapis.com/gmail/v1/users/me/messages/${messageId}?$format=metadata&metadataHeaders=From
so that it contains From, To, and Subject?
I have been trying to figure out how to get these headers for quite a while, to no avail. I looked at the documentation ( https://developers.google.com/gmail/api/reference/rest/v1/users.messages/get ), and although it tells me this is possible, I can't work out how to do it in my HTTP request. I also looked through Stack Overflow answers to similar questions, but most weren't useful because the questions were different from mine, used the OAuth library, or were tied to a specific programming language. All I care about is how to make the HTTP request.
metadataHeaders is just an array, so you can repeat the parameter once per header:
metadataHeaders=from&metadataHeaders=to&metadataHeaders=subject
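If you build the URL in code, most URL encoders can produce repeated parameters from a list. A minimal Python sketch using the standard library's `urlencode` with `doseq=True` (the function name is hypothetical; the request still needs the `Authorization: Bearer` header shown below):

```python
from typing import List
from urllib.parse import urlencode


def message_metadata_url(message_id: str, headers: List[str]) -> str:
    """users.messages.get URL that repeats metadataHeaders once per header."""
    base = f"https://gmail.googleapis.com/gmail/v1/users/me/messages/{message_id}"
    # doseq=True expands the list into repeated key=value pairs.
    query = urlencode({"format": "metadata", "metadataHeaders": headers}, doseq=True)
    return f"{base}?{query}"
```

For `["From", "To", "Subject"]` the query string becomes `format=metadata&metadataHeaders=From&metadataHeaders=To&metadataHeaders=Subject`.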
Request:
GET https://gmail.googleapis.com/gmail/v1/users/me/messages/185cf8d12166fc7a?format=metadata&metadataHeaders=from&metadataHeaders=to&metadataHeaders=subject&key=[YOUR_API_KEY] HTTP/1.1
Authorization: Bearer [YOUR_ACCESS_TOKEN]
Accept: application/json
Response:
"payload": {
"partId": "",
"headers": [
{
"name": "From",
"value": "[REDACTED]"
},
{
"name": "To",
"value": "[REDACTED]@gmail.com"
},
{
"name": "Subject",
"value": "Security alert"
}
]
},

A few NEAR blocks are missing on mainnet but present on testnet, where they show as testnet-specific blocks. How do I get those block details from mainnet?

There are a few blocks in NEAR that are missing on mainnet but present on testnet, where they show up as testnet-specific blocks. Please suggest how we should treat these blocks, or how to get them using the API "https://archival-rpc.mainnet.near.org". Below is the scenario for one of the blocks.
If I try to get the details of block 73685420 with the curl query below:
curl --location --request POST 'https://archival-rpc.mainnet.near.org' \
--header 'Content-Type: application/json' \
--data-raw '{
"jsonrpc": "2.0",
"id": "dontcare",
"method": "block",
"params": {
"block_id": 73685420
}
}'
I get the following output:
{
"jsonrpc": "2.0",
"error": {
"name": "HANDLER_ERROR",
"cause": {
"info": {},
"name": "UNKNOWN_BLOCK"
},
"code": -32000,
"message": "Server error",
"data": "DB Not Found Error: BLOCK HEIGHT: 73685420 \n Cause: Unknown"
},
"id": "dontcare"
}
But when I searched for the above block in the testnet explorer, I was able to find it.
How do I get the details from mainnet?
Testnet Explorer Block
Maybe you can find the answer here: Why Blocks are Missing or Skipped on NEAR
A short explanation of the information in the link:
Blocks are produced very quickly in NEAR Protocol, and transactions should resolve quickly. Sometimes a specific validator answers late; the block is then skipped and its transactions are resolved in the next block. This works as expected.
That is the reason for the skipped blocks.
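If you are walking the chain by height, a skipped height just needs to be stepped over. A minimal Python sketch, assuming (as in the question's output) that a skipped height comes back with `cause.name == "UNKNOWN_BLOCK"`; the helper names and the `max_skip` limit are hypothetical. Note that heights not produced yet also return UNKNOWN_BLOCK, which is why the search is bounded:

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

ARCHIVAL_RPC = "https://archival-rpc.mainnet.near.org"


def fetch_block(height: int, url: str = ARCHIVAL_RPC) -> dict:
    """Send the JSON-RPC "block" query from the curl example and return the reply."""
    payload = {"jsonrpc": "2.0", "id": "dontcare", "method": "block",
               "params": {"block_id": height}}
    request = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                     headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # Assumption: an error reply may arrive with a non-2xx status; its body is still JSON.
        return json.load(err)


def is_unknown_block(reply: dict) -> bool:
    """True when the node reports UNKNOWN_BLOCK, as for the skipped height above."""
    return reply.get("error", {}).get("cause", {}).get("name") == "UNKNOWN_BLOCK"


def next_produced_block(height: int, fetch: Callable[[int], dict] = fetch_block,
                        max_skip: int = 10) -> Optional[dict]:
    """Return the first block at or after `height`, stepping over skipped heights."""
    for h in range(height, height + max_skip + 1):
        reply = fetch(h)
        if is_unknown_block(reply):
            continue  # skipped height: its transactions resolved in a later block
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]
    return None
```

Injecting `fetch` keeps the skipping logic testable without touching the network.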

Near mainnet api: Error Block Missing (unavailable on the node)

I was testing the NEAR APIs, and only a few endpoints are working as expected.
https://rpc.mainnet.near.org
I was trying to fetch a block by ID, and it threw this error:
{
"jsonrpc": "2.0",
"error": {
"code": -32000,
"message": "Server error",
"data": "Block Missing (unavailable on the node): BBht2EZwfrGrucZKUuW91tMctfE3rMsUQJcFSduTRCGR \n Cause: Unknown"
},
"id": "dontcare"
}
Fetching the final block works, and it even works for blocks about 50 back, but for older blocks it throws the error above.
Is there any range of blocks this API supports?
Can I rely on this API to fetch historical data?
curl request:
curl --location --request POST 'https://rpc.mainnet.near.org' --header 'Content-Type: application/json' --data-raw '{
"jsonrpc": "2.0",
"id": "dontcare",
"method": "block",
"params": {
"block_id": 33929500
}
}'
This block was garbage collected. Regular nodes only keep blocks for the last 5 epochs; if you need historical data, query archival nodes instead (https://archival-rpc.mainnet.near.org).
See this answer for more details https://stackoverflow.com/a/67199078/4950797
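If you'd rather keep most traffic on the regular endpoint and only go to the archival node when needed, a fallback can be sketched in Python. This is a simplification under stated assumptions: it retries on the archival node after *any* error from the regular node, not only garbage-collected blocks, and the helper names are hypothetical:

```python
import json
import urllib.error
import urllib.request
from typing import Callable

REGULAR_RPC = "https://rpc.mainnet.near.org"
ARCHIVAL_RPC = "https://archival-rpc.mainnet.near.org"


def post_json_rpc(url: str, method: str, params: dict) -> dict:
    """POST one JSON-RPC call (same shape as the curl request above)."""
    payload = {"jsonrpc": "2.0", "id": "dontcare", "method": method, "params": params}
    request = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                     headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # Assumption: the JSON-RPC error body may come with a non-2xx status.
        return json.load(err)


def get_block(block_id, call: Callable[[str, str, dict], dict] = post_json_rpc) -> dict:
    """Try a regular node first; fall back to the archival node on any error."""
    params = {"block_id": block_id}
    reply = call(REGULAR_RPC, "block", params)
    if "error" in reply:
        # Regular nodes only keep the last ~5 epochs; older blocks need an archival node.
        reply = call(ARCHIVAL_RPC, "block", params)
    return reply
```

If you mostly need historical data, calling the archival endpoint directly avoids the extra round trip.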

PUT vs POST when adding documents in Elasticsearch

I am new to Elasticsearch and am trying to add documents to an index. I'm confused about PUT vs POST, since both produce the same result in the scenario below:
curl -H "Content-Type: application/json" -XPUT "localhost:9200/products/mobiles/1?pretty" -d'
{
"name": "iPhone 7",
"camera": "12MP",
"storage": "256GB",
"display": "4.7inch",
"battery": "1,960mAh",
"reviews": ["Incredibly happy after having used it for one week", "Best iPhone so far", "Very expensive, stick to Android"]
}
'
vs
curl -H "Content-Type: application/json" -XPOST "localhost:9200/products/mobiles/1?pretty" -d'
{
"name": "iPhone 7",
"camera": "12MP",
"storage": "256GB",
"display": "4.7inch",
"battery": "1,960mAh",
"reviews": ["Incredibly happy after having used it for one week", "Best iPhone so far", "Very expensive, stick to Android"]
}
'
POST: used to get auto-generated IDs.
PUT: used when you want to specify an ID.
see this
Neither is a "safe" HTTP method (safe methods, such as GET, don't modify anything); PUT is idempotent, while POST is not.
Usually we use POST to create a resource and PUT to modify it. Besides, if you're free to set up the server side, you can use either, because they have similar properties: both carry a body, the data is not shown in the URL, and so on.
Still, it is better to follow the standard convention I mentioned above (POST to create, PUT to modify); that way your code is more readable and easier to change.
To go deeper, you can consider these tips from put-versus-post:
Deciding between POST and PUT is easy: use PUT if and only if the endpoint will follow these 2 rules:
The endpoint must be idempotent: so safe to redo the request over and over again;
The URI must be the address to the resource being updated.
When we use PUT, we’re saying that we want the resource that we’re sending in our request to be stored at the given URI. We’re literally “putting” the resource at this address.
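The idempotency rule can be illustrated with a toy in-memory model. This is a Python sketch of the HTTP semantics only; `ToyIndex` is hypothetical and is not how Elasticsearch is implemented:

```python
import itertools
from typing import Dict


class ToyIndex:
    """In-memory stand-in for an index (NOT Elasticsearch) showing PUT vs POST."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self._ids = itertools.count(1)

    def put(self, doc_id: str, doc: dict) -> str:
        # PUT stores the document *at* the given ID: repeating it leaves one copy.
        self.docs[doc_id] = doc
        return doc_id

    def post(self, doc: dict) -> str:
        # POST lets the server pick the ID: repeating it creates another document.
        doc_id = f"auto-{next(self._ids)}"
        self.docs[doc_id] = doc
        return doc_id
```

Sending the same PUT twice leaves a single document at that ID, while sending the same POST twice leaves two documents with different generated IDs. That is what makes PUT safe to retry.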
The only difference between POST and PUT is that you cannot use PUT to create documents with auto-generated IDs.
The following query will create a document and auto generate an ID:
POST /products/_doc
{
"name": "Shoes",
"price": 100,
"in_stock": 64
}
Trying the same with PUT results in an "Incorrect HTTP method" error.
PUT /products/_doc
{
"name": "Shoes",
"price": 100,
"in_stock": 64
}
Unless I didn't experiment hard enough, this is the only difference between POST and PUT when creating documents.
Other than this, POST and PUT will get you to achieve the same things.
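The choice can be captured in a small Python helper that picks the method and path for the `_doc` endpoint used above. It is a sketch (the name `index_request` is hypothetical), and it does not URL-encode the ID:

```python
import json
from typing import Optional, Tuple


def index_request(index: str, doc: dict, doc_id: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (method, path, body) for indexing `doc` via the `_doc` endpoint."""
    body = json.dumps(doc)
    if doc_id is None:
        # No ID: POST and let Elasticsearch generate one (PUT here is rejected).
        return "POST", f"/{index}/_doc", body
    # Explicit ID: PUT stores the document at that address (POST would also work).
    return "PUT", f"/{index}/_doc/{doc_id}", body
```

The returned triple can be passed to any HTTP client, for example as the method, URL path, and JSON body of a request to `localhost:9200`.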

How do you create an entity using an api call in DialogFlow [formerly api.ai]

I am trying to create a chatbot in Dialogflow. The docs say:
You can create your own entities for your agents, either through web forms, uploading them in JSON or CSV formats, or via API calls.
How do I create an entity using an API call?
Send a POST request! Dialogflow has good REST endpoints.
curl -X POST \
'https://api.dialogflow.com/v1/entities?v=20150910' \
-H 'Authorization: Bearer YOUR_DEVELOPER_ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
--data '{
"entries": [{
"synonyms": ["apple", "red apple"],
"value": "apple"
},
{
"value": "banana"
}
],
"name": "fruit"
}'
From the docs.
This is exactly what I was looking for.
But I've just spent a couple of hours googling how to send this curl POST from my code, and unfortunately I didn't find anything that helps.
If someone can shed some light on this, I'd be very happy.
Some details:
I communicate with my chatbot through a Python Flask server, which means I am using the Python SDK.
In which part of the code in the server should I make this request?
Here is the solution that I found:
import json

import requests

DEVELOPER_ACCESS_TOKEN = 'your developer token'


def sending_entities():
    # 1 DEFINE THE URL
    url = 'https://api.dialogflow.com/v1/entities?v=20150910'
    # 2 DEFINE THE HEADERS
    headers = {'Authorization': 'Bearer ' + DEVELOPER_ACCESS_TOKEN,
               'Content-Type': 'application/json'}
    # 3 CREATE THE DATA
    data = json.dumps({
        "name": "fruit",
        "entries": [
            {
                "synonyms": ["apple", "red apple"],
                "value": "apple"
            },
            {
                "value": "banana"
            }
        ]
    })
    # 4 MAKE THE REQUEST
    response = requests.post(url, headers=headers, data=data)
    # .json() is a method: call it to get the parsed body
    print(response.json())
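If you create more than one entity, the payload can be generated from a plain mapping of values to synonyms. A small Python sketch that produces the same `name`/`entries` shape as the request above (the helper `build_entity` is hypothetical):

```python
import json
from typing import Dict, List


def build_entity(name: str, values: Dict[str, List[str]]) -> str:
    """Serialize an entity: one entry per value, with synonyms only when given."""
    entries = []
    for value, synonyms in values.items():
        entry = {"value": value}
        if synonyms:
            entry["synonyms"] = synonyms
        entries.append(entry)
    return json.dumps({"name": name, "entries": entries})
```

For example, `build_entity("fruit", {"apple": ["apple", "red apple"], "banana": []})` yields the same data as the hard-coded `json.dumps(...)` call, and can be passed as `data=` to `requests.post`.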
