Filter with special characters in Elasticsearch 6.0.0 - elasticsearch

I am trying to filter all data that contains special characters such as '#', '.', '/', etc., but I have not been able to succeed.
I want to fetch the cities that contain '#' or a dot ('.'), so I need a query that returns the documents containing these special characters.
I am quite new to Elasticsearch queries, so please help me.
Thanks
Below is the index:
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "student",
"_type" : "data",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "Mirja",
"city" : "pune # bandra",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Rohan",
"city" : "BBSR /. patia",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Diya",
"city" : "pune_bandra",
"contact number" : 9723124343
}
}
]
}

You need to check the analyzer on your city field. If it is the standard analyzer, it will strip special characters like '#' when creating tokens. Instead, use the mapping below on the city field (the whitespace tokenizer keeps special characters as part of the tokens) and search using a regular match query:
PUT test_index
{
"mappings": {
"properties": {
"city": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
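With this mapping in place, a regular match query can find the special character, since the whitespace tokenizer keeps '#' as a token of its own (a sketch, assuming documents like the ones above have been indexed into test_index):

POST test_index/_search
{
"query": {
"match": {
"city": "#"
}
}
}

This should return only the documents whose city value contains a '#' token, e.g. "pune # bandra".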

Related

Is it possible to extract the stored value of a keyword field when _source is disabled in Elasticsearch 7

I have the following index:
{
"articles_2022" : {
"mappings" : {
"_source" : {
"enabled" : false
},
"properties" : {
"content" : {
"type" : "text",
"norms" : false
},
"date" : {
"type" : "date"
},
"feed_canonical" : {
"type" : "boolean"
},
"feed_id" : {
"type" : "integer"
},
"feed_subscribers" : {
"type" : "integer"
},
"language" : {
"type" : "keyword",
"doc_values" : false
},
"title" : {
"type" : "text",
"norms" : false
},
"url" : {
"type" : "keyword",
"doc_values" : false
}
}
}
}
}
I have a very specific one-time need: I want to extract the stored values of the url field for all documents. Is this possible with Elasticsearch 7? Thanks!
Since in your index mapping you have defined the url field as keyword type with "doc_values": false, you cannot perform a terms aggregation on it.
As far as I understand your question, you only need to get the value of the url field from several documents. For that you can use an exists query.
Adding a working example:
Index Mapping:
PUT idx1
{
"mappings": {
"properties": {
"url": {
"type": "keyword",
"doc_values": false
}
}
}
}
Index Data:
POST idx1/_doc/1
{
"url":"www.google.com"
}
POST idx1/_doc/2
{
"url":"www.youtube.com"
}
Search Query:
POST idx1/_search
{
"_source": [
"url"
],
"query": {
"exists": {
"field": "url"
}
}
}
Search Response:
"hits" : [
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"url" : "www.google.com"
}
},
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"url" : "www.youtube.com"
}
}
]
Since your mapping has
"_source" : { "enabled" : false }
you can add "store": true to the field whose value you want to extract, like so:
PUT indexExample2
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"url": {
"type": "keyword",
"doc_values": false,
"store": true
}
}
}
}
Now, once you index data (thanks @ESCoder for the example):
POST indexExample2/_doc/1
{
"url":"www.google.com"
}
POST indexExample2/_doc/2
{
"url":"www.youtube.com"
}
You can extract only the stored field in your search queries, even if _source is disabled.
POST indexExample2/_search
{
"query": {
"exists": {
"field": "url"
}
},
"stored_fields": ["url"]
}
This will output:
"hits" : [
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"url" : [
"www.google.com"
]
}
},
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"url" : [
"www.youtube.com"
]
}
}
]
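For a single known document, the stored value can also be fetched directly with the GET document API, using the stored_fields URL parameter (a sketch against the same indexExample2 index):

GET indexExample2/_doc/1?stored_fields=url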

Adding a new document to a separate index using Elasticsearch processors

Is there a way to populate a separate index when I index some document(s)?
Let's assume I have something like:
PUT person/_doc/1
{
"name": "Jonh Doe",
"languages": ["english", "spanish"]
}
PUT person/_doc/2
{
"name": "Jane Doe",
"languages": ["english", "russian"]
}
What I want is that every time a person is added, a language is added to a language index.
Something like:
GET languages/_search
would give:
...
"hits" : [
{
"_index" : "languages",
"_type" : "doc",
"_id" : "russian",
"_score" : 1.0,
"_source" : {
"value" : "russian"
}
},
{
"_index" : "languages",
"_type" : "doc",
"_id" : "english",
"_score" : 1.0,
"_source" : {
"value" : "english"
}
},
{
"_index" : "languages",
"_type" : "doc",
"_id" : "spanish",
"_score" : 1.0,
"_source" : {
"value" : "spanish"
}
}
...
Thinking of pipelines, but I don't see any processor that allow such a thing.
Maybe the answer is to create a custom processor. I have one already, but not sure how could I insert a document in a separate index there.
Update: Using transforms as described in @Val's answer works, and seems to be the right answer indeed...
However, I am using Open Distro for Elasticsearch, and transforms are not available there. An alternative solution that works there would be greatly appreciated :)
Update 2: It looks like OpenSearch is replacing Open Distro for Elasticsearch, and there is a transform API \o/
A document entering an ingest pipeline cannot be cloned or split the way it can be in Logstash, for instance, so from a single document you cannot index two documents.
However, just after indexing your person documents, it is definitely possible to hit the _transform API endpoint and create the languages index from the person one.
First, create the transform:
PUT _transform/languages-transform
{
"source": {
"index": "person"
},
"pivot": {
"group_by": {
"language": {
"terms": {
"field": "languages.keyword"
}
}
},
"aggregations": {
"count": {
"value_count": {
"field": "languages.keyword"
}
}
}
},
"dest": {
"index": "languages",
"pipeline": "set-id"
}
}
You also need to create the pipeline that will set the proper ID for your language documents:
PUT _ingest/pipeline/set-id
{
"processors": [
{
"set": {
"field": "_id",
"value": "{{language}}"
}
}
]
}
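Before starting the transform, the pipeline can be sanity-checked with the simulate API (a sketch; the sample document below mimics a hypothetical pivot result):

POST _ingest/pipeline/set-id/_simulate
{
"docs": [
{
"_source": {
"language": "english",
"count": 4
}
}
]
}

The simulated document should come back with its _id set to "english".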
Then, you can start the transform:
POST _transform/languages-transform/_start
And when it's done you'll have a new index called languages whose content is
GET languages/_search
=>
"hits" : [
{
"_index" : "languages",
"_type" : "_doc",
"_id" : "english",
"_score" : 1.0,
"_source" : {
"count" : 4,
"language" : "english"
}
},
{
"_index" : "languages",
"_type" : "_doc",
"_id" : "russian",
"_score" : 1.0,
"_source" : {
"count" : 2,
"language" : "russian"
}
},
{
"_index" : "languages",
"_type" : "_doc",
"_id" : "spanish",
"_score" : 1.0,
"_source" : {
"count" : 2,
"language" : "spanish"
}
}
]
Note that you can also set that transform on schedule so that it runs regularly, or you can run it manually whenever suits you, to rebuild the languages index.
OpenSearch has its own _transform API. It works slightly differently; the transform can be created this way:
PUT _plugins/_transform/languages-transform
{
"transform": {
"enabled": true,
"description": "Insert languages",
"schedule": {
"interval": {
"period": 1,
"unit": "minutes"
}
},
"source_index": "person",
"target_index": "languages",
"data_selection_query": {
"match_all": {}
},
"page_size": 1,
"groups": [{
"terms": {
"source_field": "languages.keyword",
"target_field": "value"
}
}]
}
}
You will just need to change your _index field name in the ingest pipeline:
{
"set": {
"description": "route documents to the languages index",
"if": "[*your condition here*]",
"field": "_index",
"value": "languages",
"override": true
}
}

Why can't I get Elasticsearch's completion suggester to sort based on a field?

I'm trying to get autocomplete suggestions from Elasticsearch, but sorted by an internal popularity score that I supply in the data, so that the most popular ones show at the top. My POST looks like this:
curl "http://penguin:9200/node/_search?pretty" --silent --show-error \
--header "Content-Type: application/json" \
-X POST \
-d '
{
"_source" : [
"name",
"popular_score"
],
"sort" : [ "popular_score" ],
"suggest" : {
"my_suggestion" : {
"completion" : {
"field" : "searchbar_suggest",
"size" : 10,
"skip_duplicates" : true
},
"text" : "f"
}
}
}
'
I get back valid autocomplete suggestions, but they aren't sorted by the popular_score field:
{
...
"suggest" : {
"my_suggestion" : [
{
"text" : "f",
"offset" : 0,
"length" : 1,
"options" : [
{
"text" : "2020 Fact Longlist",
"_index" : "node",
"_type" : "_doc",
"_id" : "245105",
"_score" : 1.0,
"_source" : {
"popular_score" : "35",
"name" : "2020 Fact Longlist"
}
},
{
"text" : "Fable",
"_index" : "node",
"_type" : "_doc",
"_id" : "125903",
"_score" : 1.0,
"_source" : {
"popular_score" : "69.33333333333333333333333333333333333333",
"name" : "Fable"
}
},
{
"text" : "Fables",
"_index" : "node",
"_type" : "_doc",
"_id" : "172986",
"_score" : 1.0,
"_source" : {
"popular_score" : "24",
"name" : "Fables"
}
}
...
]
}
]
}
}
My mappings are:
{
"mappings": {
"properties": {
"nodeid": {
"type": "integer"
},
"name": {
"type": "text",
"copy_to": "searchbar_suggest"
},
"popular_score": {
"type": "float"
},
"searchbar_suggest": {
"type": "completion"
}
}
}
}
What am I doing wrong?
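One thing to note (not from the thread, a hedged sketch): the completion suggester ranks options by suggestion weight, not by sort or by other document fields, and a weight cannot be supplied through copy_to. One possible approach is to index the completion field explicitly with a per-document integer weight derived from the popularity score (field names reused from the mapping above):

PUT node/_doc/245105
{
"name": "2020 Fact Longlist",
"popular_score": 35,
"searchbar_suggest": {
"input": ["2020 Fact Longlist"],
"weight": 35
}
}

Options with a higher weight should then be returned first by the suggester.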

Returning all documents when query string is empty

Say I have the following mapping:
{
"properties": {
"title": { "type": "text" },
"created": { "type": "text" }
}
}
Sometimes the user will query by created, and sometimes by title and created. In both cases I want the query JSON to be as similar as possible. What's a good way to create a query that filters only by created when the user is not using the title to query?
I tried something like:
{
bool: {
must: [
{range: {created: {gte: '2010-01-01'}}},
{query: {match_all: {}}}
]
}
}
But that didn't work. What would be the best way of writing this query?
Your query didn't work because created is of type text, not date; range queries on string dates will not work as expected. You should change the mapping from text to date and reindex your data.
Follow this to reindex your data (with the new mappings) step by step.
Now, if I understand correctly, you want a generic query that filters on title and/or created depending on the user input.
In this case, my suggestion is to use a Query String query.
An example (version 7.4.x):
Mappings
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"created": { -------> change type to date instead of text
"type": "date"
}
}
}
}
Index a few documents
PUT my_index/_doc/1
{
"title":"test1",
"created": "2010-01-01"
}
PUT my_index/_doc/2
{
"title":"test2",
"created": "2010-02-01"
}
PUT my_index/_doc/3
{
"title":"test3",
"created": "2010-03-01"
}
Search Query (created)
GET my_index/_search
{
"query": {
"query_string": {
"query": "created:>=2010-02-01",
"fields" : ["created"]
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "test2",
"created" : "2010-02-01"
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "test3",
"created" : "2010-03-01"
}
}]
Search Query (title)
GET my_index/_search
{
"query": {
"query_string": {
"query": "test2",
"fields" : ["title"]
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9808292,
"_source" : {
"title" : "test2",
"created" : "2010-02-01"
}
}
]
Search Query (title and created)
GET my_index/_search
{
"query": {
"query_string": {
"query": "(created:>=2010-02-01) AND test3"
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808292,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.9808292,
"_source" : {
"title" : "test3",
"created" : "2010-03-01"
}
}
]
Regarding fields in the query string: you can mention both fields. If you remove fields, the query will apply to all fields in your mappings.
Hope this helps.
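As a side note, the bool attempt from the question can also be fixed directly: match_all must appear as a clause inside must, not wrapped in a query object, and the range clause only filters correctly once created is mapped as date (a sketch against the my_index example above):

GET my_index/_search
{
"query": {
"bool": {
"filter": [
{ "range": { "created": { "gte": "2010-01-01" } } }
],
"must": [
{ "match_all": {} }
]
}
}
}

When the user also supplies a title, the match_all clause can be swapped for a match on title, keeping the overall shape of the query the same.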

Elasticsearch neglecting special characters

Word search in Elasticsearch is working fine, but it seems to ignore all special characters. For example, I have the data "(123) apple" and "123 pear", but when I query "(123)", I expect "(123) apple" to be the first to appear instead of "123 pear". I have tried changing the tokenizer from the standard tokenizer to the whitespace tokenizer, but it is still not working. Kindly advise. Thanks!
Data:
(123) apple
123 pear
Query: "(123)"
Expected:
(123) apple
123 pear
Actual result:
123 pear
(123) apple
I tried with the whitespace tokenizer and it worked:
PUT /index25
{
"mappings": {
"properties": {
"message":{
"type": "text",
"analyzer": "my_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
Data:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 1.0,
"_source" : {
"message" : "123 pear"
}
},
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 1.0,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "(123)"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 0.47000363,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "123"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 0.9808292,
"_source" : {
"message" : "123 pear"
}
}
]
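To see why the whitespace analyzer keeps the parentheses, the tokenization can be inspected with the _analyze API (a sketch against the index25 example above):

GET index25/_analyze
{
"analyzer": "my_analyzer",
"text": "(123) apple"
}

This should produce the tokens "(123)" and "apple", which is why the exact query "(123)" matches "(123) apple" but not "123 pear".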
