How do I add synonyms in regexp.json? - watson-assistant-solutions

After I added the following synonyms in regexp.json, the regexp engine failed to start.
What are the rules for adding synonyms in regexp.json?
{
  "intents" : [
    {
      "name" : ["greetings"],
      "grammar" : [
        "morning"
      ]
    }
  ],
  "entities" : {
  },
  "synonyms" : [
    "good-bye","hello"
  ]
}

The synonyms value should be an array of arrays, where each inner array is one group of words that are synonyms of each other. Try this:
"synonyms" : [
[ "goodbye", "bye", "bye bye", "bye now","ok bye","then bye","bye then", "adieu", "adios", "au","ciao", "toodles" ],
[ "hi", "hello", "aloha", "bonjour", "buenous", "greetings", "Hey", "heya", "Hola", "yello","yo"]
]
Does that work?
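As a quick sanity check before restarting the engine, that shape can be verified with a short script (a sketch: the "synonyms" field name comes from the snippet above, everything else here is an assumption, not part of the regexp engine):

```python
import json

def validate_synonyms(config: dict) -> None:
    """Check that "synonyms" is a list of lists of strings."""
    synonyms = config.get("synonyms", [])
    if not isinstance(synonyms, list):
        raise ValueError('"synonyms" must be a list')
    for group in synonyms:
        if not isinstance(group, list) or not all(isinstance(w, str) for w in group):
            raise ValueError("each synonym entry must be a list of strings, got: %r" % (group,))

# The flat list of strings from the failing config is rejected,
# while the nested array-of-arrays form passes.
bad = {"synonyms": ["good-bye", "hello"]}
good = {"synonyms": [["goodbye", "bye"], ["hi", "hello"]]}

validate_synonyms(good)  # passes silently
try:
    validate_synonyms(bad)
except ValueError as e:
    print("rejected:", e)
```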

Related

To index geojson data to elasticsearch using curl

I'd like to index geojson data into Elasticsearch using curl.
The geojson data looks like this:
{
  "type": "FeatureCollection",
  "name": "telco_new_development",
  "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
  "features": [
{ "type": "Feature", "properties": { "ogc_fid": 1, "name": "Yarrabilba", "carrier_name": "OptiComm", "uid": "35", "development_name": "Yarrabilba", "stage": "None", "developer_name": "Refer to Carrier", "development_nature": "Residential", "development_type": "Sub-division", "estimated_number_of_lots_or_units": "18500", "status": "Ready for service", "developer_application_date": "Check with carrier", "contract_date": "TBC", "estimated_service_date": "30 Jul 2013", "technology_type": "FTTP", "last_modified_date": "8 Jul 2020" }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 153.101112, -27.797998 ], [ 153.09786, -27.807122 ], [ 153.097715, -27.816313 ], [ 153.100598, -27.821068 ], [ 153.103789, -27.825047 ], [ 153.106079, -27.830225 ], [ 153.108248, -27.836107 ], [ 153.110692, -27.837864 ], [ 153.116288, -27.840656 ], [ 153.119923, -27.844818 ], [ 153.122317, -27.853523 ], [ 153.127785, -27.851777 ], [ 153.131234, -27.85115 ], [ 153.135634, -27.849741 ], [ 153.138236, -27.848668 ], [ 153.141703, -27.847075 ], [ 153.152205, -27.84496 ], [ 153.155489, -27.843381 ], [ 153.158613, -27.841546 ], [ 153.161937, -27.84059 ], [ 153.156361, -27.838492 ], [ 153.157097, -27.83451 ], [ 153.15036, -27.832705 ], [ 153.151126, -27.827536 ], [ 153.15169, -27.822564 ], [ 153.148492, -27.820801 ], [ 153.148375, -27.817969 ], [ 153.139019, -27.815804 ], [ 153.139814, -27.808556 ], [ 153.126486, -27.80576 ], [ 153.124679, -27.803584 ], [ 153.120764, -27.802953 ], [ 153.121397, -27.797353 ], [ 153.100469, -27.79362 ], [ 153.099828, -27.793327 ], [ 153.101112, -27.797998 ] ] ] ] } },
{ "type": "Feature", "properties": { "ogc_fid": 2, "name": "Elliot Springs", "carrier_name": "OptiComm", "uid": "63", "development_name": "Elliot Springs", "stage": "None", "developer_name": "Refer to Carrier", "development_nature": "Residential", "development_type": "Sub-division", "estimated_number_of_lots_or_units": "11674", "status": "Ready for service", "developer_application_date": "Check with carrier", "contract_date": "TBC", "estimated_service_date": "29 Nov 2018", "technology_type": "FTTP", "last_modified_date": "8 Jul 2020" }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 146.862725, -19.401424 ], [ 146.865987, -19.370253 ], [ 146.872767, -19.370901 ], [ 146.874484, -19.354706 ], [ 146.874913, -19.354301 ], [ 146.877059, -19.356811 ], [ 146.87972, -19.35835 ], [ 146.889161, -19.359321 ], [ 146.900062, -19.367581 ], [ 146.884955, -19.38507 ], [ 146.88341, -19.402558 ], [ 146.862725, -19.401424 ] ] ] ] } },
...
However, my curl command returns the error The bulk request must be terminated by a newline [\n]:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/geo/building/_bulk?pretty' --data-binary @building.geojson
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "The bulk request must be terminated by a newline [\\n]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "The bulk request must be terminated by a newline [\\n]"
  },
  "status" : 400
}
Any suggestion?
Your file is not in the format _bulk expects; it's missing the structure described at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
You need:
an action line like { "index" : { "_index" : "INDEX-NAME-HERE" } } before each of the documents,
each document on a single line,
and a \n at the end of every line (including the very last one, which is what this error is complaining about) so that the bulk API knows where each action/record ends.
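The conversion from a GeoJSON FeatureCollection to a bulk body can be done with a few lines of Python (a sketch; the file name building.geojson and the index name geo are just taken from the curl command above):

```python
import json

def geojson_to_bulk(geojson: dict, index: str) -> str:
    """Turn a GeoJSON FeatureCollection into an NDJSON _bulk body.

    Each feature becomes two lines: an action line and the document
    itself, and the body ends with the trailing newline that the
    bulk API requires.
    """
    lines = []
    for feature in geojson["features"]:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(feature))
    return "\n".join(lines) + "\n"  # the final \n is what the error was about

# Usage sketch:
# with open("building.geojson") as f:
#     body = geojson_to_bulk(json.load(f), index="geo")
# with open("building.ndjson", "w") as f:
#     f.write(body)
```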

How to add a custom key-value pair to a grok pattern?

How can I add a custom key-value pair to a grok pattern?
For example, I would like to add a key-value pair of "city": [["New York"]] to the data result, even though it doesn't exist in the log line.
How do I do this? Tyvm, Keith :^)
Complete, Minimal, and Verifiable Example
Data:
WARN 10/11/2017 kmiklas
Grok:
%{WORD:logLevel}\s%{DATE:date}\s%{USER:user}
{
"logLevel": [
[
"WARN"
]
],
"date": [
[
"10/11/2017"
]
],
"DATE_US": [
[
"10/11/2017"
]
],
"MONTHNUM": [
[
"10",
null
]
],
"MONTHDAY": [
[
"11",
null
]
],
"YEAR": [
[
"2017",
null
]
],
"DATE_EU": [
[
null
]
],
"user": [
[
"kmiklas"
]
],
"USERNAME": [
[
"kmiklas"
]
]
}
Since that will be a fixed field, you need to use the mutate filter, like this:
mutate { add_field => { "city" => "New York" } }
(The [["New York"]] nesting in the output above is just how the grok debugger displays captured fields; in the Logstash config the value is a plain string.)
If you want the new field to be added only to some logs, wrap it in a conditional:
if "some_test" in [message] {
  mutate { add_field => { "city" => "New York" } }
}

Step function cloud formation issue with Fn::Sub when passing list as second parameter

I am trying to create a step function using CloudFormation. I want to pass the Lambda ARNs as the second argument to the Fn::Sub function. It works if I pass just one ARN but fails when I pass multiple (with Fn::GetAtt). I checked the template with a YAML validator and did not see any issues.
CloudFormation template definition for the step function:
---
Resources:
  ContractDraftStateMachine:
    Type: "AWS::StepFunctions::StateMachine"
    Properties:
      RoleArn:
        Fn::GetAtt: [ StepFunctionExecutionRole, Arn ]
      DefinitionString:
        Fn::Sub:
          - |-
            {
              "Comment" : "Sample draft process",
              "StartAt" : "AdvanceWorkflowToDraftInProgress",
              "States" : {
                "AdvanceWorkflowToDraftInProgress" : {
                  "Type" : "Task",
                  "Resource": "${WorkflowStateChangeLambdaArn}",
                  "InputPath":"$.contractId",
                  "OutputPath":"$",
                  "ResultPath":null,
                  "Next" : "CheckQuestionnaireType",
                  "Retry" : [
                    {
                      "ErrorEquals" : ["States.TaskTimeout"],
                      "MaxAttempts": 5,
                      "IntervalSeconds": 1
                    },
                    {
                      "ErrorEquals" : ["CustomErrorA"],
                      "MaxAttempts": 5
                    }
                  ],
                  "Catch": [
                    {
                      "ErrorEquals": [ "States.ALL" ],
                      "Next": "FailureNotifier"
                    }
                  ]
                },
                "CheckQuestionnaireType" : {
                  "Type" : "Choice",
                  "Choices" : [
                    {
                      "Variable" : "$.questionnaireType",
                      "StringEquals" : "CE",
                      "Next" : "PublishQuestionnaireAnswersToCE"
                    },
                    {
                      "Variable" : "$.questionnaireType",
                      "StringEquals" : "LEAF",
                      "Next" : "PublishQuestionnaireAnswersToLeaf"
                    }
                  ]
                },
                "PublishQuestionnaireAnswersToCE" : {
                  "Type" : "Task",
                  "Resource": "${WorkflowStateChangeLambdaArn}",
                  "Next" : "UpdateCEMetadataAndGenerateDocuments",
                  "ResultPath" : null,
                  "OutputPath" : "$",
                  "Retry" : [
                    {
                      "ErrorEquals" : ["States.TaskTimeout"],
                      "MaxAttempts": 5,
                      "IntervalSeconds": 1
                    },
                    {
                      "ErrorEquals" : ["CustomErrorA"],
                      "MaxAttempts": 5
                    }
                  ],
                  "Catch": [
                    {
                      "ErrorEquals": [ "States.ALL" ],
                      "Next": "FailureNotifier"
                    }
                  ]
                },
                "PublishQuestionnaireAnswersToLeaflet" : {
                  "Type" : "Task",
                  "Resource": "${WorkflowStateChangeLambdaArn}",
                  "End" : true,
                  "Retry" : [
                    {
                      "ErrorEquals" : ["States.TaskTimeout"],
                      "MaxAttempts": 5,
                      "IntervalSeconds": 1
                    },
                    {
                      "ErrorEquals" : ["CustomErrorA"],
                      "MaxAttempts": 5
                    }
                  ],
                  "Catch": [
                    {
                      "ErrorEquals": [ "States.ALL" ],
                      "Next": "FailureNotifier"
                    }
                  ]
                },
                "UpdateCEMetadataAndGenerateDocuments" : {
                  "Type" : "Task",
                  "Resource": "${WorkflowStateChangeLambdaArn}",
                  "End" : true,
                  "Retry" : [
                    {
                      "ErrorEquals" : ["States.TaskTimeout"],
                      "MaxAttempts": 5,
                      "IntervalSeconds": 1
                    },
                    {
                      "ErrorEquals" : ["CustomErrorA"],
                      "MaxAttempts": 5
                    }
                  ],
                  "Catch": [
                    {
                      "ErrorEquals": [ "States.ALL" ],
                      "Next": "FailureNotifier"
                    }
                  ]
                },
                "FailureNotifier" : {
                  "Type" : "Task",
                  "Resource": "${FailureNotifierLambdaArn}",
                  "End" : true,
                  "Retry" : [
                    {
                      "ErrorEquals" : ["States.TaskTimeout"],
                      "MaxAttempts": 5,
                      "IntervalSeconds": 1
                    },
                    {
                      "ErrorEquals" : ["CustomErrorA"],
                      "MaxAttempts": 5
                    }
                  ]
                }
              }
            }
          - WorkflowStateChangeLambdaArn:
              Fn::GetAtt: [ CreateContractFromQuestionnaireFunction, Arn ]
          - FailureNotifierLambdaArn:
              Fn::GetAtt: [ CreateContractFromQuestionnaireFunction, Arn ]
Error - Template error: One or more Fn::Sub intrinsic functions don't specify expected arguments. Specify a string as first argument, and an optional second argument to specify a mapping of values to replace in the string
This is just a sample with same lambda used multiple times but the problem is in passing list/map to Fn::Sub.
Could anyone help me resolve this issue or provide an alternate solution to achieve the same?
Thanks,
Fn::Sub takes either a single string as a parameter or a list. When using the list form there should be exactly two elements in the list: the first element is a string (the template) and the second is a map.
From the Fn::Sub documentation
Fn::Sub:
- String
- { Var1Name: Var1Value, Var2Name: Var2Value }
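Applied to the template in the question, that means the two variables must live in a single map, as the second list element, rather than as two separate list items (the shape below is a trimmed sketch, not the full state machine):

```yaml
DefinitionString:
  Fn::Sub:
    - |-
      {
        "StartAt" : "AdvanceWorkflowToDraftInProgress",
        "States" : {
          "AdvanceWorkflowToDraftInProgress" : {
            "Type" : "Task",
            "Resource" : "${WorkflowStateChangeLambdaArn}",
            "Next" : "FailureNotifier"
          },
          "FailureNotifier" : {
            "Type" : "Task",
            "Resource" : "${FailureNotifierLambdaArn}",
            "End" : true
          }
        }
      }
    - WorkflowStateChangeLambdaArn:
        Fn::GetAtt: [ CreateContractFromQuestionnaireFunction, Arn ]
      FailureNotifierLambdaArn:
        Fn::GetAtt: [ CreateContractFromQuestionnaireFunction, Arn ]
```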
Note: since you are just using Fn::GetAtt to build the substitution value, you can write ${CreateContractFromQuestionnaireFunction.Arn} directly and use the single-string version of Fn::Sub.
E.g. (I've shortened the step function for clarity):
Fn::Sub: |-
  {
    "Comment" : "Sample draft process",
    "StartAt" : "AdvanceWorkflowToDraftInProgress",
    "States" : {
      "AdvanceWorkflowToDraftInProgress" : {
        "Type" : "Task",
        "Resource": "${CreateContractFromQuestionnaireFunction.Arn}",
        "InputPath":"$.contractId",
        "OutputPath":"$",
        "ResultPath":null,
        "Next" : "CheckQuestionnaireType",
        "Retry" : [
        ...

Google places API not returning address components

My program is trying to determine the city, state, and country based on some text; for example, for "New york yankee stadium" I want to get New York City, NY, USA. I am using the Google Places API to do this. According to the documentation, the API should return a list of address components: https://developers.google.com/places/web-service/details. However, right now it's only returning the formatted address "1 E 161st St, Bronx, NY 10451, United States".
here is my web service url
https://maps.googleapis.com/maps/api/place/textsearch/json?key=MY_KEY&query=new%20york%20yankee%20stadium
Can anyone familiar with the Google Places API let me know if I am not writing the right query or parameters?
{
  "html_attributions" : [],
  "results" : [
    {
      "formatted_address" : "1 E 161st St, Bronx, NY 10451, United States",
      "geometry" : {
        "location" : {
          "lat" : 40.82964260000001,
          "lng" : -73.9261745
        },
        "viewport" : {
          "northeast" : {
            "lat" : 40.83279975,
            "lng" : -73.92236575000001
          },
          "southwest" : {
            "lat" : 40.82643674999999,
            "lng" : -73.93052034999999
          }
        }
      },
      "icon" : "https://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png",
      "id" : "3d78036d61d35f48650bda737226432b57d82511",
      "name" : "Yankee Stadium",
      "opening_hours" : {
        "open_now" : true,
        "weekday_text" : []
      },
      "photos" : [
        {
          "height" : 540,
          "html_attributions" : [
            "\u003ca href=\"https://maps.google.com/maps/contrib/101696810905045719819/photos\"\u003eYankee Stadium\u003c/a\u003e"
          ],
          "photo_reference" : "CoQBdwAAAIxmCLrNS_XZ2FcJqVvRVtBUlNYMBVTVKppOWBu7sICj2q70cqJARBoJlTcZpydbMTzURKWWMVJhYpVCqsnia5pjmDhjvjsTirrEnAc6gvmRYKuUwgewB9Re--FulXzXZ5DY3P9fkwIwuc4U9BJVbqHD5O-N6SbbHcqn4XHUj_OdEhCoNPZ3kiNJhxOCGdYG5O4DGhTqVfUjdq7JzasqYATvQxkL1-H3xg",
          "width" : 1242
        }
      ],
      "place_id" : "ChIJcWnnWiz0wokRCB6aVdnDQEk",
      "rating" : 4.4,
      "reference" : "CmRRAAAA5dHiw1YmLxW60_jITBZjMiUs48L4aVUqlPnPDpN_ySa7rw8kPp04WWk0qf8mG-kkMFSNzh39lP0YwfynW54tLcY4s_EYbAPvNWTMe6wXHm_FJiVbI0Lfenyxz4yOTzunEhDgI64EWoXkQe9k45y6qP3-GhSVSdCMPPZA3joFbnYGV-bqo2e0lw",
      "types" : [ "stadium", "point_of_interest", "establishment" ]
    }
  ],
  "status" : "OK"
}
It's a two-step process: first, search and get the place_id from the Google Places search service,
then pass the returned place_id in a second call to the Place Details endpoint to receive the individual address components:
https://maps.googleapis.com/maps/api/place/details/json?place_id=ChIJcWnnWiz0wokRCB6aVdnDQEk&key=
{"html_attributions": [],
"result": {
"address_components": [
{
"long_name": "1",
"short_name": "1",
"types": [
"street_number"
]
},
{
"long_name": "East 161st Street",
"short_name": "E 161st St",
"types": [
"route"
]
},
{
"long_name": "Concourse",
"short_name": "Concourse",
"types": [
"neighborhood",
"political"
]
},
{
"long_name": "Bronx",
"short_name": "Bronx",
"types": [
"sublocality_level_1",
"sublocality",
"political"
]
},
{
"long_name": "Bronx County",
"short_name": "Bronx County",
"types": [
"administrative_area_level_2",
"political"
]
},
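In code, the two steps chain together like this (a sketch using only the standard library; the helper names are mine, and you'd still need an HTTP client such as urllib.request to actually fetch the URLs):

```python
from urllib.parse import urlencode

DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"

def details_url_from_search(search_response: dict, key: str) -> str:
    """Step 2: take the first result's place_id and build the Place Details request."""
    place_id = search_response["results"][0]["place_id"]
    return DETAILS_URL + "?" + urlencode({"place_id": place_id, "key": key})

def pick(components, wanted_type):
    """Return the long_name of the first address component with the given type."""
    for component in components:
        if wanted_type in component["types"]:
            return component["long_name"]
    return None

# With the search response shown above (only the relevant field kept):
search_response = {"results": [{"place_id": "ChIJcWnnWiz0wokRCB6aVdnDQEk"}], "status": "OK"}
print(details_url_from_search(search_response, key="MY_KEY"))

# And with address_components from the details response:
components = [
    {"long_name": "Bronx", "short_name": "Bronx",
     "types": ["sublocality_level_1", "sublocality", "political"]},
    {"long_name": "Bronx County", "short_name": "Bronx County",
     "types": ["administrative_area_level_2", "political"]},
]
print(pick(components, "sublocality_level_1"))
```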

How to insert large polar polygons in elasticsearch?

When inserting a large polygon near the south pole:
"polygon" : {
  "type" : "polygon",
  "coordinates" : [
    [
      [ -134.97410583496094, -61.81480026245117 ],
      [ -130.1757049560547, -63.236000061035156 ],
      [ -125.17160034179688, -64.40799713134766 ],
      [ -152.0446014404297, -75.72830200195312 ],
      [ 143.52340698242188, -77.68319702148438 ],
      [ 147.41830444335938, -75.44519805908203 ],
      [ 150.2816925048828, -73.01909637451172 ],
      [ -162.17909240722656, -71.5260009765625 ],
      [ -134.97410583496094, -61.81480026245117 ]
    ]
  ]
},
the following error is returned:
{
  "error" : "RemoteTransportException[[ISAAC][inet[/x.x.x.x:9300]][indices:data/write/index]]; nested: MapperParsingException[failed to parse [polygon]]; nested: InvalidShapeException[Self-intersection at or near point (-142.29442281263474, -71.62101996804898, NaN)]; ",
  "status" : 400
}
The mapping of the type is:
curl -XPUT http://localhost:9200/files/_mapping/polar -d '
{
  "polar" : {
    "properties" : {
      "startTimeRange" : { "type" : "date" },
      "endTimeRange" : { "type" : "date" },
      "productShortName" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "polygon" : {
        "type" : "geo_shape",
        "tree" : "quadtree",
        "precision" : "1000m"
      }
    }
  }
}
'
The intended shape is essentially a rectangle crossing the dateline (anti-meridian).
It looks like the shape is being interpreted as a self-intersecting polygon crossing the prime meridian (0° longitude).
What is the best way to represent the intended shape in elasticsearch?
Dateline and pole crossing is a known issue. Dateline crossing was fixed in ES 1.4.3, but the pole-crossing patch will be released in a future version. For now (and this can be a serious PITA for ambiguous polygons) you'll have to unwrap the Polygon into a MultiPolygon yourself (presumably at the application layer).
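Until then, a cheap way to decide whether a ring needs this unwrapping is to look for consecutive vertices whose longitudes jump by more than 180 degrees (a heuristic sketch only; it detects the crossing, it does not perform the split):

```python
def crosses_antimeridian(ring):
    """Heuristic: a GeoJSON ring (list of [lon, lat] pairs) is assumed to
    cross the dateline if two consecutive longitudes differ by more than
    180 degrees, since the shorter great-circle path then wraps around ±180."""
    for (lon1, _), (lon2, _) in zip(ring, ring[1:]):
        if abs(lon2 - lon1) > 180.0:
            return True
    return False

# The ring from the question: the jump from -152.04... to +143.52... flags it.
ring = [
    [-134.97410583496094, -61.81480026245117],
    [-130.1757049560547, -63.236000061035156],
    [-125.17160034179688, -64.40799713134766],
    [-152.0446014404297, -75.72830200195312],
    [143.52340698242188, -77.68319702148438],
    [147.41830444335938, -75.44519805908203],
    [150.2816925048828, -73.01909637451172],
    [-162.17909240722656, -71.5260009765625],
    [-134.97410583496094, -61.81480026245117],
]
print(crosses_antimeridian(ring))
```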
Here's an example using your data.
The original self-crossing poly (as seen at the following gist: https://gist.github.com/nknize/ea0e103a22bddae13dfb)
{
  "type" : "Polygon",
  "coordinates" : [[
    [ -134.97410583496094, -61.81480026245117 ],
    [ -130.1757049560547, -63.236000061035156 ],
    [ -125.17160034179688, -64.40799713134766 ],
    [ -152.0446014404297, -75.72830200195312 ],
    [ 143.52340698242188, -77.68319702148438 ],
    [ 147.41830444335938, -75.44519805908203 ],
    [ 150.2816925048828, -73.01909637451172 ],
    [ -162.17909240722656, -71.5260009765625 ],
    [ -134.97410583496094, -61.81480026245117 ]
  ]]
}
Corrected version using polar "unwrapping" (seen at the following gist: https://gist.github.com/nknize/8e87ee88d3915498507e)
{
  "type" : "MultiPolygon",
  "coordinates" : [
    [[
      [ -180.0, -72.092693931 ],
      [ -162.17909240722656, -71.5260009765625 ],
      [ -134.97410583496094, -61.81480026245117 ],
      [ -130.1757049560547, -63.236000061035156 ],
      [ -125.17160034179688, -64.40799713134766 ],
      [ -152.0446014404297, -75.72830200195312 ],
      [ -173.3707512019, -90.0 ],
      [ -180.0, -90.0 ],
      [ -180.0, -72.092693931 ]
    ]],
    [[
      [ 173.3707512019, -90.0 ],
      [ 143.52340698242188, -77.68319702148438 ],
      [ 147.41830444335938, -75.44519805908203 ],
      [ 150.2816925048828, -73.01909637451172 ],
      [ 180.0, -72.092693931 ],
      [ 180.0, -90.0 ],
      [ 173.3707512019, -90.0 ]
    ]]
  ]
}
Note that the above "corrected" MultiPolygon is a rough calculation (with floating point error) used just for this example.
Not an ideal answer, but I hope it helps!
