Exact match in Alfresco FTS with special characters - full-text-search

I'm trying to perform a query in Alfresco 5.2.7, in Alfresco FTS language using the API, to highlight some results. As long as I don't have special characters, everything is fine. The issue comes when I try to perform a query to search (in exact match) the word "name?". I can't escape the question mark to be able to force the query to search the question mark along with the whole word. The results in the highlighting array are always without the special characters.
This is my functional query:
{
"query": {
"language": "afts",
"query": "(content:\"content\") AND TYPE:\"cm:content\""
},
"paging": {
"maxItems": 100,
"skipCount": 0
},
"scope": {
"locations": "nodes"
},
"highlight": {
"snippetCount":10,
"mergeContiguous": true,
"fields": [
{
"field": "cm:content"
}
]
}
}
And this is the result; note that the highlighted word is wrapped in <em> tags.
{
"list": {
"pagination": {
"count": 1,
"hasMoreItems": false,
"totalItems": 1,
"skipCount": 0,
"maxItems": 100
},
"context": {},
"entries": [
{
"entry": {
"isFile": true,
"createdByUser": {
"id": "admin",
"displayName": "Administrator"
},
"modifiedAt": "2021-01-15T15:29:03.275+0000",
"nodeType": "miims:contenytrOPI",
"content": {
"mimeType": "text/html",
"mimeTypeName": "HTML",
"sizeInBytes": 489,
"encoding": "UTF-8"
},
"parentId": "8b7c5c54-293b-4c95-a850-824efd402667",
"createdAt": "2020-12-22T08:12:12.369+0000",
"isFolder": false,
"search": {
"score": 0.21531886,
"highlight": [
{
"field": "cm:content",
"snippets": [
"\n\n\n2??3pppusa\n2 <em>content</em>?"
]
}
]
},
"modifiedByUser": {
"id": "admin",
"displayName": "Administrator"
},
"name": "nodeName",
"location": "nodes",
"id": "b7811537-b3af-47bf-9f9c-c4bfaa43832a"
}
}
]
}
}
The question is simple: how can I force FTS to treat special characters like "?" or "*" literally, rather than as wildcards, so that they are searched as part of the word? I have tried ? and /? in the query, with no results.

This should be pretty straightforward: escape the special character with a backslash.
https://docs.alfresco.com/4.2/concepts/rm-searchsyntax-escaping.html
https://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters
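To illustrate the escaping described in the links above, here is a minimal Python sketch. The helper name escape_query_term and the request body shape are assumptions made for illustration; the character set is the one listed in the Lucene classic query parser documentation. Note that the backslash has to be doubled again once the query is embedded in a JSON body, and that escaping only affects query parsing: if the analyzer dropped the "?" when the content was indexed, the highlight may still omit it.

```python
import json
import re

# Characters the Lucene classic query parser treats as syntax.
# "&&" and "||" are handled by escaping each "&" and "|" individually.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_query_term(term: str) -> str:
    """Prefix every Lucene special character in `term` with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


escaped = escape_query_term("name?")

# Hypothetical request body, modelled on the one in the question.
body = json.dumps({
    "query": {
        "language": "afts",
        "query": 'content:{} AND TYPE:"cm:content"'.format(escaped),
    }
})

print(escaped)  # the term with a single backslash before "?"
print(body)     # json.dumps doubles the backslash inside the JSON string
```

Building the body with a JSON serializer rather than by hand avoids having to count backslashes across the two escaping layers.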

Related

How can I form the property in a Compose action to return int(0) if the condition is true and return nothing if the condition is false?

How can I write this expression so that it returns the integer value 0 if the condition is true and omits the property entirely if it is false? WareHouseEvent is an array variable, and the property is inside a Compose action.
Expression:
if(contains(variables('WareHouseEvent'), 'OB_2910'), int(0), <return nothing>)
An alternative to the first answer is to always add the property and then remove it after the fact.
This is an example you can copy into your own tenant for testing.
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Compose_JSON_Object": {
"inputs": {
"AnotherProperty": "Another Value",
"TestProperty": "@variables('Data')"
},
"runAfter": {
"Initialize_Integer": [
"Succeeded"
]
},
"type": "Compose"
},
"Initialize_Integer": {
"inputs": {
"variables": [
{
"name": "Data",
"type": "integer",
"value": 0
}
]
},
"runAfter": {},
"type": "InitializeVariable"
},
"New_JSON_Object": {
"inputs": {
"variables": [
{
"name": "New Object",
"type": "object",
"value": "@if(equals(variables('Data'), 1), removeProperty(outputs('Compose_JSON_Object'), 'TestProperty'), outputs('Compose_JSON_Object'))"
}
]
},
"runAfter": {
"Compose_JSON_Object": [
"Succeeded"
]
},
"type": "InitializeVariable"
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"parameters": {},
"triggers": {
"manual": {
"conditions": [],
"inputs": {},
"kind": "Http",
"type": "Request"
}
}
},
"parameters": {}
}
I have an integer variable at the top which stores a value of either 1 or 0.
Then in my compose, I add that value to a property in the compose statement.
Then beneath that, I set a new variable with the (potentially) updated object using an expression to determine if the added property should be removed or not.
You'd just need to adjust the condition portion of the IF statement with your expression.
if(equals(variables('Data'), 1), removeProperty(outputs('Compose_JSON_Object'), 'TestProperty'), outputs('Compose_JSON_Object'))
The property will be removed depending on the value of the Data variable.
Removed
Retained
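The add-then-remove pattern can be mirrored outside Logic Apps to check the intended behaviour. The following Python sketch is only an analogy: remove_property and build_object are hypothetical helpers standing in for the removeProperty() expression and the Compose action, and the property names are taken from the example definition.

```python
def remove_property(obj: dict, name: str) -> dict:
    """Return a copy of `obj` without the key `name` (the input is not mutated)."""
    return {key: value for key, value in obj.items() if key != name}


def build_object(data: int) -> dict:
    # Step 1: always add the property, as the Compose action does.
    composed = {"AnotherProperty": "Another Value", "TestProperty": data}
    # Step 2: remove it again when the condition holds (Data equals 1 here).
    if data == 1:
        return remove_property(composed, "TestProperty")
    return composed


print(build_object(1))  # property removed
print(build_object(0))  # property retained
```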
Another workaround is to use a Condition action: when the condition is satisfied, the true branch runs; otherwise the false branch runs. Each branch can then contain its own Compose action with the same content, with or without the property.

NiFi: ReplaceText alternatives to modify JSON

My NiFi application receives two slightly different types of JSON.
First of them looks like:
[
{
"campaign": {
"resourceName": "customers/8952771329/campaigns/11381694617",
"status": "ENABLED",
"name": "Saint_Spring_Active Minerals_oct-nov_2020_trueview_skip_5766500views",
"id": "11381694617"
},
"metrics": {
"interactionEventTypes": [
"VIDEO_VIEW"
],
"clicks": "6",
"videoQuartileP100Rate": 0.44493171079034244,
"videoQuartileP25Rate": 0.9747718298919024,
"videoQuartileP50Rate": 0.7339309987701469,
"videoQuartileP75Rate": 0.5337562301767105,
"videoViewRate": 0.4471109114825628,
"videoViews": "27872",
"viewThroughConversions": "0",
"contentBudgetLostImpressionShare": 0.0000013066088274492382,
"contentImpressionShare": 0.0999,
"contentRankLostImpressionShare": 0.9001,
"conversionsValue": 0,
"conversions": 0,
"costMicros": "9338700950",
"ctr": 0.00009624947864865732,
"currentModelAttributedConversions": 0,
"currentModelAttributedConversionsValue": 0,
"engagementRate": 0,
"engagements": "0"
"segments": {
"device": "CONNECTED_TV",
"date": "2020-12-20"
}
}
]
And second:
[
{
"adGroup": {
"resourceName": "customers/5404177717/adGroups/110501283582",
"campaign": "customers/5404177717/campaigns/11628802542"
},
"metrics": {
"interactionEventTypes": [
"CLICK"
],
"clicks": "1",
"averageCpm": 95497428.02172929,
"gmailForwards": "0",
"gmailSaves": "0",
"gmailSecondaryClicks": "0",
"impressions": "4418",
"interactionRate": 0.00022634676324128565,
"interactions": "1"
},
"adGroupAd": {
"resourceName": "customers/5404177717/adGroupAds/110501283582~480227690139",
"status": "ENABLED",
"ad": {
"resourceName": "customers/5404177717/ads/480227690139",
"id": "480227690139",
"name": "20 sec perek"
},
"adGroup": "customers/5404177717/adGroups/110501283582"
},
"segments": {
"device": "DESKTOP",
"date": "2020-11-21"
}
}
]
I already have two tables in my database to store this data. I use an attribute, table.name, so that I don't have to duplicate the same processor block when only the table name differs.
My next processor is FlattenJson. After that I use ReplaceText with the following search value (the replacement value is an empty string): (customers\\\/${client.customer.id}\\\/campaigns\\\/|customers\\\/${client.customer.id}\\\/adGroups\\\/).
Why? From the line "adGroup": "customers/5404177717/adGroups/110501283582" I only need the last value, 110501283582, as ad_group_id. And from the line "campaign": "customers/5404177717/campaigns/11628802542" I only need 11628802542. ${client.customer.id} can vary, so I'm using Expression Language.
I also need to rename the JSON key adGroup to ad.group.id, for which I use a second ReplaceText.
Can I do this faster, without two ReplaceText processors?
Take a look at the following processors; I think either of them could serve as an alternative:
JoltTransformJSON:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.JoltTransformJSON/
UpdateRecord:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.UpdateRecord/index.html
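Whichever processor ends up doing the work, it helps to pin down the target transformation first. The Python sketch below is not NiFi configuration; it only models the two steps from the question (keep the trailing ID, rename the key) in a single pass over an already flattened record. The function name transform, the RENAME table, and the sample keys are assumptions for illustration.

```python
import re

# Matches customers/<customer id>/campaigns/<id> or customers/<customer id>/adGroups/<id>
# and captures only the trailing id.
_RESOURCE = re.compile(r"^customers/\d+/(?:campaigns|adGroups)/(\d+)$")

# Hypothetical rename table; only adGroup -> ad.group.id comes from the question.
RENAME = {"adGroup": "ad.group.id"}


def transform(flat_record: dict) -> dict:
    """Strip resource-name prefixes and rename keys in one pass."""
    result = {}
    for key, value in flat_record.items():
        if isinstance(value, str):
            match = _RESOURCE.match(value)
            if match:
                value = match.group(1)
        result[RENAME.get(key, key)] = value
    return result


sample = {
    "adGroup": "customers/5404177717/adGroups/110501283582",
    "campaign": "customers/5404177717/campaigns/11628802542",
    "segments.device": "DESKTOP",
}
print(transform(sample))
```

Matching on \d+ for the customer ID removes the dependency on ${client.customer.id}, at the cost of also accepting IDs that belong to other customers.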

spring mongodb criteria API: check two values on the same nested element

I have the following query:
Criteria crit = Criteria.where("nestedObj.date").lt(LocalDate.now())
.and("nestedObj.active").is(true)
.and("someId").is(null)
.and("somethingElse").exists(false);
How can I make sure that nestedObj.active and nestedObj.date are checked on the same nestedObj?
I only want this to match if a document has a nestedObj that is active AND has a date older than today.
Example:
If the nestedObj array on a document looks like this, the query should match:
[
{
"nestedObj": {
"active": "true",
"date": "2010-29-10"
}
},
{
"nestedObj": {
"active": "false",
"date": "2010-29-10"
}
},
{
"nestedObj": {
"active": "true",
"date": "2022-29-10"
}
}
]
But if it looks like this, it shouldn't:
[
{
"nestedObj": {
"active": "false",
"date": "2010-29-10"
}
},
{
"nestedObj": {
"active": "true",
"date": "2022-29-10"
}
}
]
Check the $elemMatch operator: https://docs.mongodb.com/manual/reference/operator/query/elemMatch/
For instance:
where("nestedObj.date").elemMatch(where("attribute1").is("value1").and("attribute2").regex("(?i).*$something.*"))
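The difference between separate dotted-path conditions and $elemMatch can be shown without a database. The Python sketch below only simulates the matching semantics on in-memory lists. The field name "items" in the filter document is a hypothetical name for the array field, and the dates are rewritten as real date objects with a fixed "today" so the result does not depend on when it is run.

```python
from datetime import date


def matches_dotted(items, today):
    # Two independent conditions: each may be satisfied by a DIFFERENT element.
    return any(i["active"] for i in items) and any(i["date"] < today for i in items)


def matches_elem_match(items, today):
    # $elemMatch semantics: ONE element must satisfy both conditions.
    return any(i["active"] and i["date"] < today for i in items)


today = date(2021, 1, 1)

# Equivalent MongoDB filter document; "items" is an assumed array field name.
elem_match_filter = {"items": {"$elemMatch": {"active": True, "date": {"$lt": today}}}}

should_match = [
    {"active": True, "date": date(2010, 10, 29)},
    {"active": False, "date": date(2010, 10, 29)},
    {"active": True, "date": date(2022, 10, 29)},
]
should_not_match = [
    {"active": False, "date": date(2010, 10, 29)},
    {"active": True, "date": date(2022, 10, 29)},
]

print(matches_dotted(should_not_match, today))      # dotted paths match anyway
print(matches_elem_match(should_not_match, today))  # $elemMatch does not
print(matches_elem_match(should_match, today))
```

In the second document the active flag and the old date sit on different elements, which is exactly the case the original dotted-path criteria cannot rule out.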

I'm using Sentiment on NLU, getting this error: "warnings": [ "sentiment: cannot locate keyphrase"

When I send this request:
{
"text": "Il sindaco pensa solo a far realizzare rotonde...non lo disturbate per le cavolate! ,Che schifo!",
"features":
{
"sentiment": {
"targets": [
"aggressione", "aggressioni", "agguati", "agguato", "furto", "furti", "lavoro nero",
"omicidi", "omicidio", "rapina", "rapine", "ricettazione", "ricettazioni", "rom", "zingari", "zingaro",
"scippo", "scippi", "spaccio", "scommesse"
]
},
"categories": {},
"entities": {
"emotion": true,
"sentiment": true,
"limit": 5
},
"keywords": {
"emotion": true,
"sentiment": true,
"limit": 5
}
}
}
I get this response:
{
"language": "it",
"keywords": [
{
"text": ",Che schifo",
"relevance": 0.768142
}
],
"entities": [],
"categories": [
{
"score": 0.190673,
"label": "/law, govt and politics/law enforcement/police"
},
{
"score": 0.180499,
"label": "/style and fashion/clothing/pants"
},
{
"score": 0.160763,
"label": "/society/crime"
}
],
"warnings": [
"sentiment: cannot locate keyphrase"
]
}
Why don't I receive any output for the document sentiment? If NLU does not find the key phrase, it returns this warning without the sentiment for the text. Is this an NLU bug that needs fixing?
If NLU does not find any of the keyphrases you passed, it returns the warning "cannot locate keyphrase". It does return the document sentiment if even one of the targets is present in the text.
If you are not sure whether the target phrases are present in your text, make a separate API call just for sentiment, without any targets, to retrieve the document sentiment.
I would not call this a bug on the NLU side, but the service could be more lenient, rather than strict, when it does not find any target phrase in the given text.
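One way to avoid the extra round trip is to decide on the client side whether to send targets at all. The Python sketch below only builds the "features" part of the request and makes no API call. The helper build_features is hypothetical, and its case-insensitive whole-word check is an assumption that may not match how the service itself locates keyphrases.

```python
import re


def build_features(text, targets):
    """Keep only targets that appear in `text`; omit targets when none do."""
    present = [
        target for target in targets
        if re.search(r"\b" + re.escape(target) + r"\b", text, re.IGNORECASE)
    ]
    # With no targets, the sentiment feature asks for document sentiment only.
    sentiment = {"targets": present} if present else {}
    return {"sentiment": sentiment}


text = ("Il sindaco pensa solo a far realizzare rotonde..."
        "non lo disturbate per le cavolate! ,Che schifo!")
targets = ["aggressione", "furto", "lavoro nero", "rapina", "rom", "scippo"]

print(build_features(text, targets))
print(build_features("Ieri hanno denunciato un furto in centro.", targets))
```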

What are good ways to solve a strange data retrieval issue in elastic search?

I've got a strange issue with an elastic search server.
The Elasticsearch version is 1.6. 'records' is the name of the type. The URL for the search is http://some.domain:9200/user/records/_search. The field mapping for 'un' is string.
The following query, which has been working for years, now sometimes fails depending on the value of {someId}: newer IDs fail, old ones work. The data is there; it's just not being found.
{
"from": 0,
"size": 1,
"sort": {
"un": "desc",
"_score": "desc"
},
"query": {
"query_string": {
"query": "un:\"{someId}\"",
"fields": [
"id",
"un",
"e",
"fn",
"ln",
"bn",
"jt",
"sy",
"c",
"st",
"p",
"fbid",
"lnid"
]
}
}
}
After doing some diagnostics, I discovered that the following query always works, regardless of whether {someId} is old or new:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "records.un",
"query": "{someId}"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 10,
"sort": [],
"aggs": {}
}
This is a sample document that matches with the second query and fails with the first.
{
"un": "xxxxxxx.xxxxxxx",
"e": "xxxxxxx",
"pswd": "xxxxxxx",
"fn": "xxxxxxx",
"ln": "xxxxxxx",
"bn": "xxxxxxx",
"jt": "",
"sy": "xxxxxxx",
"urole": "User",
"id": "xxxxxxx",
"status": "1",
"lld": "201704280016",
"cd": "201702100132",
"md": "201704280549",
"cc": "0",
"p": "",
"logo": "",
"mlogo": "",
"ad": "201702100132",
"com": "xxxxxxx",
"rr": "true",
"sid": "00000000-0000-0000-0000-000000000000",
"fbidp": "",
"lnidp": "",
"role": "Lots of data is in this one",
"dim": "",
"drm": "",
"drcm": "xxxxxxx",
"drcfbm": "xxxxxxx",
"drclnm": "xxxxxxx",
"as": "false",
"apr": "true",
"iuid": "xxxxxxx",
"vcount": "9",
"pplatform": "",
"pname": "",
"pid": "00000000-0000-0000-0000-000000000000",
"preciept": "",
"ms": "Free"
}
I'm thinking that reindexing the server might solve the issue. What are good ways to solve strange data retrieval issues in elastic search?
There is a significant difference between your first query ("query": "un:\"{someId}\"") and your second ("query": "{someId}"). In the first, someId is wrapped in quotes, so it is searched as an exact phrase: given xxx.yyy, it looks for the whole ID including the dot (.), so the ID matches only when it doesn't contain a dot. In the second, someId is analyzed: xxx.yyy is tokenized into two strings (xxx and yyy), so it matches even when the ID contains a dot.
You need to change the mapping of the un field. If you are not running any full-text queries on un, I'd suggest making it not_analyzed. Otherwise you need a different analyzer, such as whitespace, instead of the default standard analyzer. I'd strongly recommend the first option, since structured exact-match fields are more efficient than analyzed ones.
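As a rough sketch of the first option, the Python snippet below builds the two JSON bodies involved: a 1.x-style mapping that marks un as not_analyzed, and a term query for an exact lookup. The helper names are hypothetical, nothing is sent to a server, and the type and field names are taken from the question. Because the index setting of an existing field cannot be changed in place, applying this would mean creating a new index with the mapping and reindexing the documents into it.

```python
import json


def not_analyzed_mapping(type_name="records", field="un"):
    """Mapping body that stores `field` as a single, unanalyzed term (ES 1.x syntax)."""
    return {
        type_name: {
            "properties": {
                field: {"type": "string", "index": "not_analyzed"}
            }
        }
    }


def exact_lookup(field, value, size=1):
    """Term query: compares `value` against the stored term without analysis."""
    return {"from": 0, "size": size, "query": {"term": {field: value}}}


print(json.dumps(not_analyzed_mapping(), indent=2))
print(json.dumps(exact_lookup("un", "xxxxxxx.xxxxxxx"), indent=2))
```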
