Stanford NLP core not producing 'Before' and 'After' nodes in tokenization - stanford-nlp

I am running Stanford CoreNLP 3.9.1 (updated 2018/04/05), the current version, as an HTTP server,
using the default .properties file that ships with each language download.
The French output is missing the 'before' and 'after' nodes:
"tokens": [
{
"index": 1,
"word": "Je",
"originalText": "Je",
"characterOffsetBegin": 0,
"characterOffsetEnd": 2,
"pos": "PRON"
},
Whereas all the other Latin-script languages available, e.g. German, include the nodes:
{
"index": 2,
"word": "durchgecheckt",
"originalText": "durchgecheckt",
"characterOffsetBegin": 10,
"characterOffsetEnd": 23,
"pos": "VVPP",
"before": " ",
"after": " "
},
How do I configure it to include them in the output?

Do you have the same issue with version 3.9.2?
I think in 3.9.2 you want these options:
tokenize.options = ellipses=ptb3,normalizeParentheses=true,ptb3Dashes=false,splitContractions=true,splitCompounds=true,invertible=true
We're trying to release version 4.0.0 fairly soon with new tokenization for French that is designed to match modern Universal Dependencies data sets.
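If you cannot edit the .properties file, the same options can probably be passed per request, since the server accepts a JSON `properties` URL parameter. The following is only a sketch: the host and port (`localhost:9000`), the annotator list, and the helper names are assumptions, and whether the French pipeline honors these options in your version is exactly what the answer above is asking you to check.

```python
import json
from urllib.parse import urlencode

# Tokenizer options quoted from the answer above.
TOKENIZE_OPTIONS = (
    "ellipses=ptb3,normalizeParentheses=true,ptb3Dashes=false,"
    "splitContractions=true,splitCompounds=true,invertible=true"
)


def build_corenlp_url(base_url="http://localhost:9000/"):
    """Build a server URL whose `properties` parameter carries the options.

    base_url is an assumption; adjust it to wherever your server listens.
    """
    properties = {
        "annotators": "tokenize,ssplit,pos",  # assumed annotator list
        "tokenize.options": TOKENIZE_OPTIONS,
        "outputFormat": "json",
    }
    return base_url + "?" + urlencode({"properties": json.dumps(properties)})


def tokens_missing_whitespace(tokens):
    """Return the indexes of tokens that lack "before" or "after"."""
    return [t["index"] for t in tokens if "before" not in t or "after" not in t]
```

POST the text to the URL returned by `build_corenlp_url()` and run `tokens_missing_whitespace` over each sentence's `tokens` list; an empty result means the nodes are back.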

Related

Yahoo finance symbol suggest no longer working?

import requests

result = requests.get('http://d.yimg.com/autoc.finance.yahoo.com/autoc?query=tesla&callback=YAHOO.Finance.SymbolSuggest.ssCallback').json()
result
When I run the Python code above, I get <Response [404]>. Does anyone know why that might be? I am worried this API no longer works, even though I saw posts from just a year ago saying it worked.
If it's not documented, you can't rely on it working. The only (relatively) sure bet is to use some official API instead (which usually comes at a cost).
That said, if you want to continue using undocumented stuff (with the same risk of it getting shut down or you getting blocked any day), give this a try:
https://query2.finance.yahoo.com/v1/finance/search?q=tesla
(I looked at what https://finance.yahoo.com/ uses.)
This delivers results like these:
{
"explains": [],
"count": 15,
"quotes": [
{
"exchange": "NMS",
"shortname": "Tesla, Inc.",
"quoteType": "EQUITY",
"symbol": "TSLA",
"index": "quotes",
"score": 2048451,
"typeDisp": "Equity",
"longname": "Tesla, Inc.",
"exchDisp": "NASDAQ",
"isYahooFinance": true
},
{
"exchange": "NEO",
"shortname": "TESLA, INC. CDR (CAD HEDGED)",
"quoteType": "EQUITY",
"symbol": "TSLA.NE",
"index": "quotes",
"score": 24083,
"typeDisp": "Equity",
"longname": "Tesla, Inc.",
"exchDisp": "NEO",
"isYahooFinance": true
},
// ...
],
"news": [
// Also delivers news here...
],
// Some meta stuff here
}
Use at your own risk.
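For what it's worth, here is a small sketch of how the payload above could be consumed from Python. It only assumes the response shape shown in this answer (a top-level "quotes" list whose items carry "symbol" and "shortname"); the function names are made up for illustration, and since the endpoint is undocumented, the shape may change without notice.

```python
from urllib.parse import urlencode

# Undocumented endpoint quoted from the answer above.
SEARCH_ENDPOINT = "https://query2.finance.yahoo.com/v1/finance/search"


def build_search_url(query):
    """Return the search URL for a free-text query such as 'tesla'."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})


def extract_symbols(payload):
    """Return (symbol, shortname) pairs from a decoded search response.

    Entries without a "symbol" key are skipped rather than raising,
    because nothing guarantees the shape of an undocumented response.
    """
    return [
        (quote["symbol"], quote.get("shortname", ""))
        for quote in payload.get("quotes", [])
        if "symbol" in quote
    ]
```

You would fetch `build_search_url('tesla')` with whatever HTTP client you already use, decode the JSON, and hand the result to `extract_symbols`.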

How do I know which rule is active or inactive with sonarqube web api?

Hi all,
I use the SonarQube Web API /api/rules/show?key=squid:S4087&actives=true to get detailed information about a rule. The result is JSON. Which field indicates whether the rule is active or inactive? If there is none, is there another way to find out?
The SonarQube version is 6.7 (build 33306).
You can use the following:
api/rules/search?rule_key=squid:S4087&f=actives
Result is:
{
"total": 1,
"p": 1,
"ps": 100,
"rules": [
{
"key": "squid:S4087",
"type": "CODE_SMELL"
}
],
"actives": {
"squid:S4087": [
{
"qProfile": "AWWHfPzOrB_d62qUtqCX",
"inherit": "NONE",
"severity": "MINOR",
"params": [],
"createdAt": "2018-08-29T23:00:39+0200"
}
]
},
"qProfiles": {
"AWWHfPzOrB_d62qUtqCX": {
"name": "Sonar way",
"lang": "java",
"langName": "Java"
}
}
}
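Reading that response in code could look roughly like this. It is a sketch that assumes the rule counts as active when its key appears under "actives" with at least one activation, and as inactive in every profile otherwise; the function names are illustrative, not part of any SonarQube client.

```python
def active_profiles(response, rule_key):
    """Return the names of the quality profiles in which the rule is activated.

    `response` is the decoded JSON of api/rules/search?...&f=actives.
    Falls back to the raw profile key when "qProfiles" has no name for it.
    """
    activations = response.get("actives", {}).get(rule_key, [])
    profiles = response.get("qProfiles", {})
    names = []
    for activation in activations:
        profile_key = activation["qProfile"]
        names.append(profiles.get(profile_key, {}).get("name", profile_key))
    return names


def is_rule_active(response, rule_key):
    """True if the rule is activated in at least one quality profile."""
    return bool(active_profiles(response, rule_key))
```

With the result shown above, `active_profiles(result, 'squid:S4087')` gives the "Sonar way" profile, so the rule is active for Java in that profile.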

Microsoft LUIS builtin.number

I used builtin.number in my LUIS app trying to collect a 4 digit pin number. The following is what's returned from LUIS when my input is "one two three four".
"entities": [
{
"entity": "one",
"type": "builtin.number",
"startIndex": 0,
"endIndex": 2,
"resolution": {
"value": "1"
}
},
{
"entity": "two",
"type": "builtin.number",
"startIndex": 4,
"endIndex": 6,
"resolution": {
"value": "2"
}
},
{
"entity": "three",
"type": "builtin.number",
"startIndex": 8,
"endIndex": 12,
"resolution": {
"value": "3"
}
},
{
"entity": "four",
"type": "builtin.number",
"startIndex": 14,
"endIndex": 17,
"resolution": {
"value": "4"
}
},
As you can see, it returns the individual digits in both text and digit format. It seems to me that returning the whole number is more important than returning the individual digits. Is there a way to get '1234' as the result for builtin.number?
Thanks!
It's not possible to do what you're asking for by only using LUIS. The way LUIS does its tokenization is that it recognizes each word/number individually due to the whitespace. It goes without saying that 'onetwothreefour' will also not return 1234.
Additionally, users are unable to modify the recognition of the prebuilt entities on an individual model level. The recognizers for certain languages are open-source, and contributions from the community are welcome.
All of that said, a way you could achieve what you're asking for is by concatenating the numbers. A JavaScript example might be something like the following:
var pin = '';
entities.forEach(entity => {
  // Append the resolved value of every number entity, in order.
  if (entity.type === 'builtin.number') {
    pin += entity.resolution.value;
  }
});
console.log(pin); // '1234'
After that you would need to perform your own handling/regexp, but I'll leave that to you. (After all, what if someone provides "seven eight nine ten"? Or "twenty seventeen"?)
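As one possible take on that handling, here is a Python sketch that concatenates the values and rejects anything that is not exactly four single digits. It assumes the entity shape from the question (each entity has "type" and "resolution.value"); the function name and the strict single-digit rule are my own choices, not something LUIS prescribes.

```python
import re


def pin_from_entities(entities, length=4):
    """Join builtin.number values into a PIN, or return None if invalid.

    Every value must be a single digit and there must be exactly `length`
    of them, so a multi-digit value such as "10" is rejected instead of
    being silently glued into the PIN.
    """
    values = [
        entity["resolution"]["value"]
        for entity in entities
        if entity.get("type") == "builtin.number"
    ]
    if len(values) != length:
        return None
    if not all(re.fullmatch(r"[0-9]", value) for value in values):
        return None
    return "".join(values)
```

Being strict here means the bot can ask the user to repeat the PIN rather than guess what an ambiguous utterance meant.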

SonarQube Component Tree response data

I'm having some trouble understanding some of the data in the response from the SonarQube GET api/measures/component_tree API.
Some metrics have a value attribute while others don't. I've figured out that the value displayed in the UI is the "value" attribute unless it does not exist, in which case the value of the earliest period is used. The other periods then appear to be deltas between measurements. Could anyone explain what the response values actually mean? Unfortunately, the API documentation that SonarQube provides gives no detail about the response data. Specifically, I'm wondering when a value attribute is and is not present, what the index means given that not all metrics have the same indexes (i.e. some go 1-4, others have just 3 and 4), and what the period data represents.
{
"metric": "new_lines_to_cover",
"periods": [
{
"index": 1,
"value": "572"
},
{
"index": 2,
"value": "572"
},
{
"index": 3,
"value": "8206"
},
{
"index": 4,
"value": "186574"
}
]
},
{
"metric": "duplicated_lines",
"value": "80819",
"periods": [
{
"index": 1,
"value": "-158"
},
{
"index": 2,
"value": "-158"
},
{
"index": 3,
"value": "-10544"
},
{
"index": 4,
"value": "-6871"
}
]
},
{
"metric": "new_line_coverage",
"periods": [
{
"index": 3,
"value": "3.9900249376558605"
},
{
"index": 4,
"value": "17.221615720524017"
}
]
},
Your heuristic is very close to the truth:
if the metric name starts with "new_", it is a metric that counts new elements over a period of time. Starting with 6.3, only the leak period is supported.
otherwise, "value" holds the raw value.
For example, to compute the number of issues:
violations computes the total number of issues
new_violations computes the number of new issues on the leak period
To know more about the leak period concept in SonarQube, please check this article.
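Put into code, the heuristic from the question might look like the sketch below. It assumes each measure is a dict shaped like the ones in the question, and it implements the asker's rule (prefer "value", otherwise take the period with the lowest index), which is not an official SonarQube contract; the function name is illustrative.

```python
def displayed_value(measure):
    """Return the value to display for one measure of a component_tree response.

    Raw metrics carry "value" directly. "new_*" metrics only carry
    "periods", so fall back to the period with the lowest index.
    Returns None when neither is present.
    """
    if "value" in measure:
        return measure["value"]
    periods = measure.get("periods", [])
    if not periods:
        return None
    earliest = min(periods, key=lambda period: period["index"])
    return earliest["value"]
```

On 6.3 and later, where only the leak period is reported, the fallback simply returns that single period's value.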

ElasticSearch _Source is always empty on the return

I am posting a query to http://localhost:9200/movie_db/movie/_search, but the _source attribute is always empty in the response. I enabled it in the mapping, but that doesn't help.
Movie DB:
TRY DELETE /movie_db
PUT /movie_db {"mappings": {"movie": {"properties": {"title": {"type": "string", "analyzer": "snowball"}, "actors": {"type": "string", "position_offset_gap" : 100, "analyzer": "standard"}, "genre": {"type": "string", "index": "not_analyzed"}, "release_year": {"type": "integer", "index": "not_analyzed"}, "description": {"_source": true, "type": "string", "analyzer": "snowball"}}}}}
BULK INDEX movie_db/movie
{"_id": 1, "title": "Hackers", "release_year": 1995, "genre": ["Action", "Crime", "Drama"], "actors": ["Johnny Lee Miller", "Angelina Jolie"], "description": "High-school age computer expert Zero Cool and his hacker friends take on an evil corporation's computer virus with their hacking skills."}
{"_id": 2, "title": "Johnny Mnemonic", "release": 1995, "genre": ["Science Fiction", "Action"], "actors": ["Keanu Reeves", "Dolph Lundgren"], "description": "A guy with a chip in his head shouts incomprehensibly about room service in this dystopian vision of our future."}
{"_id": 3, "title": "Swordfish", "release_year": 2001, "genre": ["Action", "Crime"], "actors": ["John Travolta", "Hugh Jackman", "Halle Berry"], "description": "A cast of characters challenge society's commonly held view that computer experts are not the beautiful people. Somehow, the CIA is hacked in under 5 minutes."}
{"_id": 4, "title": "Tomb Raider", "release_year": 2001, "genre": ["Adventure", "Action", "Fantasy"], "actors": ["Angelina Jolie", "Jon Voigt"], "description": "The story of a girl and her quest for antiquities in the face of adversity. This epic is adapter from its traditional video-game format to the big screen"}
Query:
{
"query" :
{
"term" : { "genre" : "Crime" }
}
}
Results:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.30685282,
"hits": [
{
"_index": "movie_db",
"_type": "movie",
"_id": "3",
"_score": 0.30685282,
"_source": {}
},
{
"_index": "movie_db",
"_type": "movie",
"_id": "1",
"_score": 0.30685282,
"_source": {}
}
]
}
}
I had the same problem: despite enabling _source in my query as well as in my mappings, _source would always be {}.
Your proposed solution of setting cluster.name in elasticsearch.yml gave me the hint that the problem must be some hidden setting in the old cluster.
I found out that I had an index template definition that came with a plugin I installed (in my case elasticsearch-transport-couchbase), which said
"_source" : {
"includes" : [ "meta.*" ]
},
thereby implicitly excluding all fields other than meta.* from _source.
Check your templates like this:
curl -XGET localhost:9200/_template/?pretty
I deleted the couchbase template like so
curl -XDELETE localhost:9200/_template/couchbase
and created a new, almost identical one but with source enabled.
Here is how:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
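If you have many templates, eyeballing the output gets tedious. Here is a rough Python sketch that walks the decoded JSON from the _template call and reports every "_source" block that restricts what gets stored. The exact nesting of the template response is an assumption on my part (it differs between versions), which is why the walk is fully recursive; the function name is illustrative.

```python
def find_source_restrictions(node, path=()):
    """Recursively collect "_source" settings that can hide stored fields.

    Returns a list of (path, setting) tuples for every "_source" dict that
    disables storage or filters it with "includes"/"excludes".
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = path + (key,)
            if key == "_source" and isinstance(value, dict):
                restricted = (
                    value.get("enabled") is False
                    or "includes" in value
                    or "excludes" in value
                )
                if restricted:
                    found.append(("/".join(here), value))
            found.extend(find_source_restrictions(value, here))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_source_restrictions(item, path))
    return found
```

Any path it reports names the template (and mapping type) worth inspecting before you delete or replace it.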
Solution:
In the Elasticsearch config folder, open elasticsearch.yml, set cluster.name to a different value, then restart elasticsearch.bat.
I once accidentally passed a single field in the _source array, and that field didn't exist either, for example "_source": ["bazinga"]. In the aggregation results, _source came back empty.
So maybe you could simply pass a totally unrelated string in the _source array. This may be a better solution than making changes to the elasticsearch.yml file.
