How to use JSONata to change JSON response - jsonata

I am working with an API that returns the following JSON:
{
"rows": [
{
"keys": [
"search term 1",
"https://example.com/article-about-keyword-1/"
],
"clicks": 24,
"impressions": 54,
"ctr": 0.4444444444444444,
"position": 2.037037037037037
},
{
"keys": [
"search term 2",
"https://example.com/article-about-keyword-2/"
],
"clicks": 17,
"impressions": 107,
"ctr": 0.1588785046728972,
"position": 2.663551401869159
}
],
"responseAggregationType": "byPage"
}
And I'm trying to use JSONata to change it to something more like this:
{
"rows": [
{
"keyword": "search term 1",
"URL": "https://example.com/article-about-keyword-1/",
"clicks": 24,
"impressions": 54,
"ctr": 0.4444444444444444,
"position": 2.037037037037037
},
{
"keyword": "search term 2",
"URL": "https://example.com/article-about-keyword-2/",
"clicks": 17,
"impressions": 107,
"ctr": 0.1588785046728972,
"position": 2.663551401869159
}
],
"responseAggregationType": "byPage"
}
Basically, I'm trying to break the 'keys' part out into 'keyword' and 'URL'.
Have been playing around for a while in https://try.jsonata.org/ but I'm not getting very far. Any help appreciated.

Splitting the keys array should be achievable by accessing each element there by its index (given that the keyword and the URL are guaranteed to appear on the same index).
Here’s the full JSONata expression to translate from your source file to the desired target shape:
{
"rows": rows.{
"keyword": keys[0],
"URL": keys[1],
"clicks": clicks,
"impressions": impressions,
"ctr": ctr,
"position": position
}[],
"responseAggregationType": responseAggregationType
}
By the way, I’ve built this solution in 2 minutes by using the Mappings tool that my team is building at Stedi.
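For comparison, here is a minimal sketch of the same reshaping in plain JavaScript, in case you want to check the JSONata output against something. It assumes, like the expression above, that the search term is always at index 0 and the URL at index 1 of keys. reshapeRows is a made-up helper name, not part of JSONata.

```javascript
// Sample response, copied from the question.
const response = {
  rows: [
    {
      keys: ["search term 1", "https://example.com/article-about-keyword-1/"],
      clicks: 24,
      impressions: 54,
      ctr: 0.4444444444444444,
      position: 2.037037037037037
    },
    {
      keys: ["search term 2", "https://example.com/article-about-keyword-2/"],
      clicks: 17,
      impressions: 107,
      ctr: 0.1588785046728972,
      position: 2.663551401869159
    }
  ],
  responseAggregationType: "byPage"
};

// Hypothetical helper: replaces `keys` with named fields and keeps the rest.
function reshapeRows(input) {
  return {
    rows: input.rows.map(({ keys, ...metrics }) => ({
      keyword: keys[0], // assumes the search term always comes first
      URL: keys[1],     // assumes the page URL always comes second
      ...metrics
    })),
    responseAggregationType: input.responseAggregationType
  };
}

const reshaped = reshapeRows(response);
console.log(JSON.stringify(reshaped, null, 2));
```

The destructuring drops keys and copies every other property through, so new metric fields in the response would be carried over without touching the helper.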

Related

Inconsistent results returned with white space in query

Using NEST.
I have the following code.
var q = new QueryContainerDescriptor<ProductIndex>();
var queryContainer = new QueryContainer();
queryContainer &= q.Match(m => m.Field(f => f.Code).Query(parameters.Code));
I would like both of these criteria
code=FRUIT 12 //with space
code=FRUIT12 //no space
to return products 1 and 2.
Currently I get products 1 and 2 if I set code=FRUIT 12 //with space
and I only get product 2 if I set code=FRUIT12 //no space
Sample data
Products
[
{
"id": 1,
"name": "APPLE",
"code": "FRUIT 12"
},
{
"id": 2,
"name": "ORANGE",
"code": "FRUIT12"
}
]
By default, a string field uses the standard tokenizer, which emits a single token "FRUIT12" for the "FRUIT12" input.
You need to use a word_delimiter token filter in your field analyzer to get the behavior you are expecting:
GET _analyze
{
"text": "FRUIT12",
"tokenizer": "standard"
}
gives
{
"tokens": [
{
"token": "FRUIT12",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 0
}
]
}
and
GET _analyze
{
"text": "FRUIT12",
"tokenizer": "standard",
"filter": ["word_delimiter"]
}
gives
{
"tokens": [
{
"token": "FRUIT",
"start_offset": 0,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "12",
"start_offset": 5,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
If you add the word_delimiter token filter to your field's analyzer, any search query on this field will also go through the word_delimiter token filter (unless you override it with the search_analyzer option in the mapping),
so the single-term query "FRUIT12" will be "translated" into the multi-term query ["FRUIT", "12"].
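To make the effect concrete, here is a rough JavaScript approximation of the two analysis chains. It is only a sketch and not how Elasticsearch implements them: standardTokenize and wordDelimiter are made-up function names, and the real word_delimiter filter also splits on case changes and has more options than the letter/digit boundary modelled here.

```javascript
// Rough stand-in for the standard tokenizer:
// split on anything that is not a letter or a digit.
function standardTokenize(text) {
  return text.split(/[^A-Za-z0-9]+/).filter(Boolean);
}

// Rough stand-in for word_delimiter:
// additionally split each token where letters meet digits.
function wordDelimiter(tokens) {
  return tokens.flatMap(token => token.match(/[A-Za-z]+|[0-9]+/g) || []);
}

const plain = standardTokenize("FRUIT12");
const joined = wordDelimiter(standardTokenize("FRUIT12"));
const spaced = wordDelimiter(standardTokenize("FRUIT 12"));

console.log(plain);  // one token: FRUIT12
console.log(joined); // two tokens: FRUIT and 12
console.log(spaced); // the same two tokens
```

Once both spellings produce the same tokens at index time and at search time, a match query for either one can find both products.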

Elasticsearch Nest Getting and updating a single Document

I would like to be able to select a single document. Here is a sample of how a document looks:
{
"_index": "myindex_products",
"_type": "product",
"_id": "8Wct9mEBlkDZwzEMRfbG",
"_version": 1,
"_score": 1,
"_source": {
"productId": 5749,
"name": "Product Name Here",
"productCode": "PRODCODE",
"productCategoryId": 73,
"length": 6,
"height": 0,
"productTypeId": 1,
"url": "product-name-here",
"productBrandId": 7,
"width": 0,
"dispatchTimeInDays": 10,
"leadTimeInDays": 6,
"stockAvailable": 0,
"weightKg": 0.001,
"reviewRating": 5,
"reviewRatingCount": 17,
"limitedStock": false,
"price": 16.3,
"productImage": "28796-14654.jpg",
"productCategory": {
"productCategoryId": 73,
"name": "Accessories - New",
"fullPath": "Accessories - New",
"code": "00057"
},
"productSpecification": [
{
"productSpecificationId": 127151,
"productId": 5749,
"specificationId": 232,
"name": "Brand",
"value": "Brand1"
}
,
{
"productSpecificationId": 127175,
"productId": 5749,
"specificationId": 10,
"name": "Guarantee",
"value": "10 years"
}
]
}
}
The _id is generated when I index, so I don't know it at the point where I want to update. I do have the productId value, and I would like to use it to select a document to then update or delete. Is there a way to return a single document if you know a particular exact value?
Thanks
While indexing, you can use something like PUT your_index/product/5749 (product being the type from your sample document, 5749 being your product id) and ES will use that value for the _id field instead of auto-generating it.
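As a small sketch of that idea, the helper below only assembles the request and does not talk to a cluster. buildIndexRequest is a hypothetical name, and the index/type/id path layout assumes a version that still uses mapping types, as your sample document with "_type": "product" suggests. Newer versions use _doc in place of the type.

```javascript
// Hypothetical helper: builds the index request so that the product id
// doubles as the document _id.
function buildIndexRequest(index, type, product) {
  return {
    method: "PUT",
    path: `/${index}/${type}/${product.productId}`,
    body: JSON.stringify(product)
  };
}

const request = buildIndexRequest("myindex_products", "product", {
  productId: 5749,
  name: "Product Name Here"
});

console.log(request.method, request.path);
```

Because the path is derived from productId alone, the same path can later be reused to fetch, update or delete that document without knowing any generated id.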

Microsoft LUIS builtin.number

I used builtin.number in my LUIS app trying to collect a 4 digit pin number. The following is what's returned from LUIS when my input is "one two three four".
"entities": [
{
"entity": "one",
"type": "builtin.number",
"startIndex": 0,
"endIndex": 2,
"resolution": {
"value": "1"
}
},
{
"entity": "two",
"type": "builtin.number",
"startIndex": 4,
"endIndex": 6,
"resolution": {
"value": "2"
}
},
{
"entity": "three",
"type": "builtin.number",
"startIndex": 8,
"endIndex": 12,
"resolution": {
"value": "3"
}
},
{
"entity": "four",
"type": "builtin.number",
"startIndex": 14,
"endIndex": 17,
"resolution": {
"value": "4"
}
},
As you can see, it's returning the individual digits in both text and digit format. It seems to me that it's more important to return the whole number than the individual digits. Is there a way to get '1234' as the result for builtin.number?
Thanks!
It's not possible to do what you're asking for by only using LUIS. The way LUIS does its tokenization is that it recognizes each word/number individually due to the whitespace. It goes without saying that 'onetwothreefour' will also not return 1234.
Additionally, users are unable to modify the recognition of the prebuilt entities on an individual model level. The recognizers for certain languages are open-source, and contributions from the community are welcome.
All of that said, a way you could achieve what you're asking for is by concatenating the numbers. A JavaScript example might be something like the following:
var pin = '';
entities.forEach(entity => {
  // Only the prebuilt number entities contribute digits.
  if (entity.type === 'builtin.number') {
    pin += entity.resolution.value;
  }
});
console.log(pin); // '1234'
After that you would need to perform your own handling/regexp, but I'll leave that to you. After all, what if someone provides "seven eight nine ten"? Or "twenty seventeen"?
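One possible shape for that follow-up check is sketched below. collectPin and isFourDigitPin are made-up names, and the resolution values in the second input ("7", "8", "9", "10") are an assumption about what LUIS would return for "seven eight nine ten", not captured output.

```javascript
// Hypothetical helper: joins the resolved values of all number entities.
function collectPin(entities) {
  return entities
    .filter(entity => entity.type === "builtin.number")
    .map(entity => entity.resolution.value)
    .join("");
}

// Accept exactly four digits and nothing else.
function isFourDigitPin(pin) {
  return /^\d{4}$/.test(pin);
}

// Small factory for test entities in the shape shown in the question.
const numberEntity = (entity, value) => ({
  entity,
  type: "builtin.number",
  resolution: { value }
});

const goodPin = collectPin([
  numberEntity("one", "1"),
  numberEntity("two", "2"),
  numberEntity("three", "3"),
  numberEntity("four", "4")
]);

// Assumed resolutions for "seven eight nine ten".
const badPin = collectPin([
  numberEntity("seven", "7"),
  numberEntity("eight", "8"),
  numberEntity("nine", "9"),
  numberEntity("ten", "10")
]);

console.log(goodPin, isFourDigitPin(goodPin));
console.log(badPin, isFourDigitPin(badPin));
```

Rejecting anything that is not exactly four digits lets the bot ask again instead of silently accepting a five-character string.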

Exclude from CamelCase tokenizer in Elasticsearch

Struggling to make iPhone match when searching for iphone in Elasticsearch.
Since I have some source code at stake, I surely need CamelCase tokenizer, but it appears to break iPhone into two terms, so iphone can't be found.
Does anyone know of a way to add exceptions to breaking camelCase words into tokens (camel + case)?
UPDATE: to make it clear, I want NullPointerException to be tokenized as [null, pointer, exception], but I don't want iPhone to become [i, phone].
Any other solution?
UPDATE 2: @ChintanShah's answer suggests a different approach that gives us even more: NullPointerException will be tokenized as [null, pointer, exception, nullpointer, pointerexception, nullpointerexception], which is definitely much more useful from the point of view of the person searching. And indexing is also faster! The price to pay is index size, but it is a superior solution.
You can achieve your requirements with word_delimiter token filter.
This is my setup
{
"settings": {
"analysis": {
"analyzer": {
"camel_analyzer": {
"tokenizer": "whitespace",
"filter": [
"camel_filter",
"lowercase",
"asciifolding"
]
}
},
"filter": {
"camel_filter": {
"type": "word_delimiter",
"generate_number_parts": false,
"stem_english_possessive": false,
"split_on_numerics": false,
"protected_words": [
"iPhone",
"WiFi"
]
}
}
}
},
"mappings": {
}
}
This will split words on case changes, so NullPointerException will be tokenized as null, pointer and exception, but iPhone and WiFi will remain as they are because they are protected. word_delimiter has a lot of options for flexibility. You can also set preserve_original, which will help you a lot.
GET logs_index/_analyze?text=iPhone&analyzer=camel_analyzer
Result
{
"tokens": [
{
"token": "iphone",
"start_offset": 0,
"end_offset": 6,
"type": "word",
"position": 1
}
]
}
Now with
GET logs_index/_analyze?text=NullPointerException&analyzer=camel_analyzer
Result
{
"tokens": [
{
"token": "null",
"start_offset": 0,
"end_offset": 4,
"type": "word",
"position": 1
},
{
"token": "pointer",
"start_offset": 4,
"end_offset": 11,
"type": "word",
"position": 2
},
{
"token": "exception",
"start_offset": 11,
"end_offset": 20,
"type": "word",
"position": 3
}
]
}
Another approach is to analyze your field twice with different analyzers but I feel word_delimiter will do the trick.
Does this help?
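If it helps to see the idea outside Elasticsearch, here is a rough JavaScript sketch of case-change splitting with a protected-word list followed by lowercasing. It only approximates the analyzer above: camelTokens is a made-up name, the regex handles letters only, and the real filter has many more rules.

```javascript
// Words that must not be split, mirroring protected_words in the settings.
const protectedWords = new Set(["iPhone", "WiFi"]);

// Hypothetical helper: split on case changes unless the word is protected,
// then lowercase every resulting token.
function camelTokens(word) {
  if (protectedWords.has(word)) {
    return [word.toLowerCase()];
  }
  // An optional capital followed by lowercase letters, or a run of capitals.
  const parts = word.match(/[A-Z]?[a-z]+|[A-Z]+(?![a-z])/g) || [];
  return parts.map(part => part.toLowerCase());
}

console.log(camelTokens("NullPointerException")); // split into three tokens
console.log(camelTokens("iPhone"));               // protected, kept whole
console.log(camelTokens("iPad"));                 // not protected, so it is split
```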

Configuring line charts with remote data in Kendo UI

I am looking to render a line chart using Kendo UI. http://demos.telerik.com/kendo-ui/line-charts/remote-data-binding
It expects the JSON data to be a top-level array, as in this format (from their example):
[
{
"date": "12/30/2011",
"close": 405,
"volume": 6414369,
"open": 403.51,
"high": 406.28,
"low": 403.49,
"symbol": "2. AAPL"
},
{
"date": "11/30/2011",
"close": 382.2,
"volume": 14464710,
"open": 381.29,
"high": 382.276,
"low": 378.3,
"symbol": "2. AAPL"
}
]
However, I have a URL that returns the data in the following format. Note the extra object 'ranks' at the beginning which has the array:
{
"ranks": [
{
"id": 2,
"rank": 3,
"rankdate": "2015-05-17T00:00:00+0000",
"student": {
"id": 203,
"name": "Student1",
"currentRank": 3,
"LastVerified": "2015-05-17T22:30:00+0000"
}
},
{
"id": 1,
"rank": 4,
"rankdate": "2015-05-16T00:00:00+0000",
"student": {
"id": 203,
"name": "Student1",
"currentRank": 3,
"LastVerified": "2015-05-17T22:30:00+0000"
}
}
]
}
I was wondering if there was a way to have the datasource look inside "ranks" for the array instead of expecting it directly.
Found it. One can customize the schema in Kendo for the datasource using:
schema: {
data: "ranks"
},
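For context, here is a sketch of where that schema setting could sit in the chart options. The URL "/api/ranks" and the "#chart" selector are placeholders I made up, and the choice of rank and rankdate as the plotted fields is an assumption based on the sample response. pickData is not a Kendo function; it only illustrates what schema.data effectively does with the response.

```javascript
// Chart options as a plain object; the endpoint is a placeholder.
const chartOptions = {
  dataSource: {
    transport: {
      read: { url: "/api/ranks", dataType: "json" }
    },
    // Tell the data source to read the array from the "ranks" property.
    schema: { data: "ranks" }
  },
  series: [{ type: "line", field: "rank" }],
  categoryAxis: { field: "rankdate" }
};

// Illustration only: the lookup that schema.data stands for.
function pickData(response, schema) {
  return response[schema.data];
}

const sampleResponse = {
  ranks: [
    { id: 2, rank: 3, rankdate: "2015-05-17T00:00:00+0000" },
    { id: 1, rank: 4, rankdate: "2015-05-16T00:00:00+0000" }
  ]
};

console.log(pickData(sampleResponse, chartOptions.dataSource.schema));

// In the browser, with jQuery and Kendo UI loaded:
// $("#chart").kendoChart(chartOptions);
```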
