How to remove field from document which matches a pattern in elasticsearch using Java? - elasticsearch

I have crawled a few documents and created an index in Elasticsearch. I am using Sense to run this query:
POST /index/_update_by_query
{
"script": {
"inline": "ctx._source.remove(\"home\")"
},
"query": {
"wildcard": {
"url": {
"value": "http://search.com/*"
}
}
}
}
This is my Java program:
Client client = TransportClient.builder().addPlugin(ReindexPlugin.class)
.build().addTransportAddress(new InetSocketTransportAddress(
InetAddress.getByName("127.0.0.1"), 9300));
UpdateByQueryRequestBuilder ubqrb = UpdateByQueryAction.INSTANCE
.newRequestBuilder(client);
Script script1 = new Script("ctx._source.remove" +FieldName);
BulkIndexByScrollResponse r = ubqrb.source("index").script(script1)
.filter(wildcardQuery("url", patternvalue)).get();
FieldName is a String variable holding the name of the field I want to remove from my documents (here, home), and patternvalue holds the pattern "http://search.com/*". When I run this Java program, it does not remove the home field from my documents. Instead, it adds a new field called remove. I might be missing something. Any help would be appreciated.

If FieldName is the string home, then the expression "ctx._source.remove" + FieldName evaluates to "ctx._source.removehome", which is not the script you intended: the parentheses and the quotes around the field name are missing. The correct code for that line is:
Script script1 = new Script("ctx._source.remove(\"" + FieldName + "\")");
This way the script will be:
ctx._source.remove("home")
This is the same script you wrote in the JSON request:
"inline": "ctx._source.remove(\"home\")"
(\" in that json is just a " escaped in the json syntax)
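The string concatenation above can be wrapped in a small helper so the quoting is handled in one place. This is only a sketch: the class and method names are hypothetical and not part of the Elasticsearch API, and the escaping covers just backslashes and double quotes in the field name.

```java
final class RemoveFieldScript {

    // Hypothetical helper (not an Elasticsearch API): builds the script source
    // that removes one field from ctx._source.
    static String removeFieldScript(String fieldName) {
        // Escape backslashes first, then double quotes, so the field name
        // stays inside the quoted string literal of the script.
        String escaped = fieldName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "ctx._source.remove(\"" + escaped + "\")";
    }

    public static void main(String[] args) {
        // prints: ctx._source.remove("home")
        System.out.println(removeFieldScript("home"));
    }
}
```

The returned string can then be passed to new Script(...) exactly as in the corrected line above.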

Related

How to use carriage return in a script template with a runtime mapping field?

Here is an example that illustrates the problem we are having with "mustache" and the carriage return.
In our script template, we need :
a runtime mapping field : to compute a result (with a big script in our real case)
conditional template : to build search criteria according to params existence (many criteria in our real case)
We use Elasticsearch 7.16 and kibana debug console to make our tests.
We create this script template with this request :
POST _scripts/test
{
"script": {
"lang": "mustache",
"source": """{
"runtime_mappings": {
"result": {
"type": "long",
"script": {
"source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
}
}
}
{{#foo}}
,"fields": [
"result"
]
{{/foo}}
}"""
}
}
Here are 2 examples of requests that show how this script works:
Request 1 : Search request with param
Return the computed field "result" with the "foo" parameter value (12345)
GET _search/template
{
"id": "test",
"params": {
"foo": 12345
}
}
Request 2 : Search request without param
Don't return computed field "result".
GET _search/template
{
"id": "test"
}
As I said before, in our real case the computed field contains a very large Painless script.
For readability, we wrote this script across several lines, and that is when the problem appears.
An error occurs when we declare:
"source": "
emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})
"
instead of:
"source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
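Because JSON does not allow raw line breaks inside a string value (the error reports character code 10, a line feed), we cannot split the script across lines this way, otherwise we get the following error: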
Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
We also cannot use the notation with """ because it will conflict with the one used to declare the source of the script template.
Is there a trick to set the computed field script to multiple lines in Kibana debug console ?

Compare two fields in same document without using script elasticsearch

We are using Elasticsearch version 7.10.2. I want to compare two fields from the same document. Scripting is disabled in my organization.
Please help me build the query below without using a script.
The condition is: nickname is null, or nickname is empty, or nickname is equal to firstname.
The hard part is building a query that returns the records where nickname is equal to firstname.
This is the script query that needs to be converted to a normal query:
{
"query": {
"bool": {
"must": [{
"script": {
"script": {
"inline": "doc['nickname.keyword'].value==null || doc['nickname.keyword'].value =='' || doc['nickname.keyword'].value == doc['firstname.keyword'].value",
"lang": "painless"
}
}
}]
}
}
}
I see you are already comparing nickname.keyword to firstname, and you mention that this is the hard part. You do not need a script for this: you can simply run a search query on the keyword field and get the result you want.
You can use the term query below, providing your nickname as the value.
{
"query": {
"term": {
"nickname.keyword": {
"value": "your-nickname"
}
}
}
}

Getting Distinct fields from Elasticsearch

I have 1 million documents, each with a field called id. The id value is different in every document.
For example:
1. id: http://www.bing.com/search?q=malaysia
2. id: http://www.google.com/search?q=singapore
3. id: http://www.bing.com/search?q=india
4. id: http://www.google.com/search?q=america
5. id: http://www.duckduckgo.com/?q=africa
6. id: http://www.duckduckgo.com/?q=asia
Can someone help me form a query that returns only the 3 distinct sites here? I just want to get google.com, bing.com and duckduckgo.com.
I can't test the syntax right now, but this should work. Just use a script to split your URL string.
{
"aggs": {
"urls": {
"terms": {
"field": "id",
"script" : "def path = doc['id'].value; int currentSplit = path.indexOf(\"//\"); if (currentSplit > 0) { path = path.substring(currentSplit + 2); currentSplit = path.indexOf(\"/\"); if (currentSplit > 0) { path = path.substring(0, currentSplit); } } return path;"
}
}
}
}
If you need this aggregation often, the better practice is to index the domain name as its own field on the document.
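If the domain is indexed as its own field, it has to be extracted from the URL before the document is sent to Elasticsearch. A minimal Java sketch of that extraction, assuming the id values are well-formed URLs; the class and method names are hypothetical, and stripping a leading "www." is an assumption based on the desired output in the question:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DomainExtractor {

    // Returns the host of the URL without a leading "www.", or null if the
    // URL has no host part.
    static String domainOf(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null;
        }
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    // Collects the distinct domains in first-seen order.
    static Set<String> distinctDomains(List<String> urls) {
        Set<String> domains = new LinkedHashSet<>();
        for (String url : urls) {
            String domain = domainOf(url);
            if (domain != null) {
                domains.add(domain);
            }
        }
        return domains;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "http://www.bing.com/search?q=malaysia",
                "http://www.google.com/search?q=singapore",
                "http://www.bing.com/search?q=india",
                "http://www.duckduckgo.com/?q=africa");
        // prints: [bing.com, google.com, duckduckgo.com]
        System.out.println(distinctDomains(ids));
    }
}
```

With the value of domainOf stored in a separate field at index time, a plain terms aggregation on that field would return the distinct domains without any script.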

Couchbase full-text search and compound keys

I have the following data in Couchbase:
Document 06001:
{
"type": "box",
"name": "lxpag",
"number": "06001",
"materials": [
{
"type": "material",
"number": "070006",
"name": "hosepipe"
},
{
"type": "material",
"number": "080006",
"name": "Philips screw 4mm"
}
]
}
Document 12345:
{
"type": "material",
"number": "12345",
"name": "Another screw"
}
Now I want to be able to query by type and name or number: for a given query type only the documents with the respective type property shall be returned. Furthermore, a second query string specifies which kinds of materials should be searched for. If a material's id or name contains (not starts with) the search term, it shall be included. If one of the materials inside a box matches the term accordingly, the whole box shall be included.
What I have come up with is:
function (doc, meta) {
if (doc.type === 'box' && Array.isArray(doc.materials)) {
var queryString = "";
for (var i = 0; i < doc.materials.length; ++i) {
var material = doc.materials[i];
if (material.name && material.number) {
queryString += " " + material.name + " " + material.number;
}
}
emit([doc.type, queryString], doc);
} else if (doc.type === 'material') {
var queryString = doc.name + " " + doc.number;
emit([doc.type, queryString], doc);
}
}
I see that this view might not be fit for substring searches (Do I need ElasticSearch for this?). Nevertheless, when I use the following query parameters:
startKey=["box","pag"]&endKey=["box\u02ad","pag\u02ad"]
...not only do I get the box but also all other documents that are returned by the view. Thus, with these keys, nothing is filtered. On the other hand, searching by key works.
How is this possible?
There is no good way of doing substring search with view keys. Your options are either integrating with ElasticSearch, or using N1QL, which lets you do wildcard string matches: "SELECT * FROM bucket WHERE type = 'material' and name LIKE '%screw%'"
I just saw the flaw in the queries: the parameter names must be written in lowercase, otherwise Couchbase does not recognize them and silently ignores them (it would be really helpful to get an error message here instead of the usual result list...). So instead, I have to query with
startkey=["box","pag"]&endkey=["box\u02ad","pag\u02ad"]
What I have not yet worked out is how to manage the substring search. Since pag is only a substring of lxpag, not a prefix, the above query does not return any results. Any ideas on this matter?
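The reason a key range cannot do substring matching can be illustrated outside Couchbase. The sketch below is a simplification: it compares a single string component with plain String.compareTo, whereas Couchbase orders view keys with its own Unicode collation, and "pagoda" is a made-up key used only for illustration.

```java
final class KeyRangeVsSubstring {

    // True when key sorts between start and end (inclusive), which is what a
    // startkey/endkey range selects on a sorted index.
    static boolean inKeyRange(String key, String start, String end) {
        return key.compareTo(start) >= 0 && key.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        String start = "pag";
        String end = "pag\u02ad";

        // A key that starts with "pag" falls inside the range.
        System.out.println(inKeyRange("pagoda", start, end)); // true

        // "lxpag" contains "pag" but sorts before it, so the range misses it.
        System.out.println(inKeyRange("lxpag", start, end));  // false
        System.out.println("lxpag".contains("pag"));          // true
    }
}
```

A range over sorted keys therefore behaves like a prefix match at best, which is why substring search needs something like N1QL's LIKE or a full-text engine.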

Elasticsearch - Create field using script if doesn't exist

Is there a way to dynamically add fields using scripts? I am running a script that checks whether a field exists and creates it if it does not.
I'm trying:
script: 'if (ctx._source.attending == null) { ctx._source.attending = { events: newField } } else if (ctx._source.attending.events == null) { ctx._source.attending.events = newField } else { ctx._source.attending.events += newField }'
Unless my _source already has a field explicitly named attending, I get:
[Error: ElasticsearchIllegalArgumentException[failed to execute script];
nested: PropertyAccessException[
[Error: could not access: attending; in class: java.util.LinkedHashMap]
To check whether a field exists use the ctx._source.containsKey function, e.g.:
curl -XPOST "http://localhost:9200/myindex/message/1/_update" -d'
{
"script": "if (!ctx._source.containsKey(\"attending\")) { ctx._source.attending = newField }",
"params" : {"newField" : "blue" },
"myfield": "data"
}'
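The error message in the question shows that _source is a java.util.LinkedHashMap, so the existence checks can be reasoned about with an ordinary Java map. The sketch below mirrors the three branches of the original script using containsKey. It is an illustration only, not code that runs inside Elasticsearch; the class and method names are hypothetical, and it assumes events is meant to be a list of values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AttendingUpdate {

    // Adds newField to source.attending.events, creating the intermediate
    // "attending" map and the "events" list only when they are missing.
    @SuppressWarnings("unchecked")
    static void addEvent(Map<String, Object> source, String newField) {
        if (!source.containsKey("attending")) {
            source.put("attending", new LinkedHashMap<String, Object>());
        }
        Map<String, Object> attending = (Map<String, Object>) source.get("attending");
        if (!attending.containsKey("events")) {
            attending.put("events", new ArrayList<String>());
        }
        ((List<String>) attending.get("events")).add(newField);
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        addEvent(source, "blue");
        addEvent(source, "red");
        // prints: {attending={events=[blue, red]}}
        System.out.println(source);
    }
}
```

Checking with containsKey before touching a key avoids reading a property that does not exist, which is what triggered the PropertyAccessException in the question.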
I would consider if it's really necessary to see if the field exists at all. Just apply the new mapping to ES and it will add it if it's required and do nothing if it already exists.
Our system re-applies the mappings on every application startup.
