Elasticsearch view indexed data - elasticsearch

I have filter which replace characters
char_filter:
lt_characters:
type: mapping
mappings: ["a=>bbbbbb", "c=>tttttt", "ddddddd=>k" ]
I'm add this filter to index, now how to check does this filter work, where I can found indexed data ?
I mean exactly view replacments.

To see what tokens are created with your char_filter you can use the Analyze API.
curl -XGET 'localhost:9200/_analyze?char_filters=lt_characters' -d 'this is a test'

Related

Elasticsearch - check which all fields are indexed in an index?

I browse to curl -XGET 'http://localhost:9200/<index_name>/_mappings and it returns the fields in an index.
This is one of the field.
{"dob" : { "type": "string", "analyzer" : "my_custom_analyzer"}}
With above response, does this mean DOB field is by default indexed? or index: true has to be explicitly there for this field to be indexed?
It seems you're using a very old version of Elasticsearch, likely 2.x or earlier.
However, based on the mapping you've shared, string fields are indexed by default, in your case, dob is analyzed by a custom analyzer called my_custom_analyzer and the resulting tokens will be indexed automatically.

Elastic search simple query to find all ids

I am trying to get all id's for a type, but I am pulling my hair out.
Please see my attacment.
HERE IS THE cURL call :
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json'
-d'{ "query": { "wildcard" : { "id" : "Account[enter image description here][1]*" } }}'
cURL call with no results
I would guess there is an issue with the way your id-field is analyzed. You can retrieve the mapping by using the _mapping endpoint (described in the docs). Your id field should be analyzed as a string (with break characters, tokenizers and all) for the wildcard query to work. If it is not analyzed, as you might expect for an id-field, the wildcard query will not work. Then you would need to change the mapping and reindex your data to make it work.

Can't deal with accents in Elasticsearch indexing and search

I have an issue with elasticsearch and the way the data are indexed/retrieved. I don't understand what happens.
This is the mapping I use (sorry, it's yaml format) :
The idea is simple, in theory... I have a string analyzer with lowercase and asciifolding filters. I don't want to care about case or accents, and I would like to use this analyzer to index and search.
settings:
index:
analysis:
filter:
autocomplete_filter:
type: edgeNGram
side: front
min_gram: 1
max_gram: 20
analyzer:
autocomplete:
type: custom
tokenizer: standard
filter: [lowercase, asciifolding, autocomplete_filter]
string_analyzer:
type: custom
tokenizer: standard
filter: [lowercase, asciifolding]
types:
city:
mappings:
cityName:
type: string
analyzer: string_analyzer
search_analyzer: string_analyzer
location: {type: geo_point}
When I run this query :
{
"query": {
"prefix":{
"cityName":"per"
}
}
,
"size":20
}
I get some results like "Perpezat", "Pern", "Péreuil" which is the excepted result.
But if I run the following query :
{
"query": {
"prefix":{
"cityName":"pér"
}
}
,
"size":20
}
Then I get no result at all.
If you have any clue or help, I would be happy to know it.
Thanks
In the Prefix Query, your search input is not analyzed like in other cases:
Matches documents that have fields containing terms with a specified prefix (not analyzed)
Your first example works because the documents are analyzed at index time using your analyzer with lowercase and asciifolding, so they contain a term starting with per (perpezat, pern, pereuil).
Your second example does not work because those documents don't contain any terms starting with pér.
Since I couldn't find a way to tell Elasticsearch to analyze the prefix before performing the search, you could achieve your goal by manually adding this step:
Ask Elastisearch to analyze your input calling the Analyze API
Use the output from step 1 (it should be per in the examples) for the prefix query
For this to work, your search input should be a single term (I think that could be why Elasticsearch doesn't want to analyze it in the first place)
#mario-trucco Finally, I've found this post that explains a better way to analyze the strings.
What is an effective way to search world-wide location names with ElasticSearch?
Of course it doesn't answer my initial question and I still don't understand what happened, but it solves my problem by removing it.
Thanks again for your help and time.

How to search fields with '-' characters in elastic search

I am new to elastic search. I have got following document where one of the field "eventId" has "-" in value.
When i try to search with complete value of eventId, i don't get any results.
Sample Document app/event
{
"tags": {}
"eventId": "cc98d57b-c6bc-424c-b54c-df1e3df0d942",
}
I haven't created any explicit settings for my index.
Thanks.
you should check if the tokenizer splits your value into multiple fields. Maybe your value is stored as 5 fields: "cc98d57b", "c6bc", "424c", "b54c" and "df1e3df0d942"
You can analyze that with the 'Kopf' Plugin (https://github.com/lmenezes/elasticsearch-kopf).
If that is your problem you should change your field mapping, so that the value is not analyzed ("index" : "not_analyzed").
For an example how to set that mapping see here: Elasticsearch mapping settings 'not_analyzed' and grouping by field in Java
After that, you should be able to search for your specific value.

Is there a way to "escape" ElasticSearch stop words?

I am fairly new to ElasticSearch and have a question on stop words. I have an index that contains state names for the USA....ex: New York/NY, California/CA,Oregon/OR. I believe Oregon's abbreviation, 'OR' is a stop word, so when I insert the state data into the index, I cannot search on 'OR'. Is there a way I can set up custom stopwords for this or am I doing something wrong?
Here is how I am building the index:
curl -XPUT http://localhost:9200/test/state/1 -d '{"stateName": ["California","CA"]}'
curl -XPUT http://localhost:9200/test/state/2 -d '{"stateName": ["New York","NY"]}'
curl -XPUT http://localhost:9200/test/state/3 -d '{"stateName": ["Oregon","OR"]}'
A search for 'NY', works fine. Ex:
curl -XGET 'http://localhost:9200/test/state/_search?pretty=1' -d '
{
"query" : {
"match" : {
"stateName" : "NY"
}
}
}'
But a search for 'OR', returns zero hits:
curl -XGET 'http://localhost:9200/test/state/_search?pretty=1' -d '
{
"query" : {
"match" : {
"stateName" : "OR"
}
}
}'
I believe this search returns no results because OR is stop word, but I don't know how to work around this. Thanks for you help.
You can (and definitely should) control the way you index data by modifying your mapping according to your data and the way you want to search against it.
In your case I would disable stopwords for that specific field rather than modifying the stopword list, but you could do the latter too if you wish to. The point is that you're using the default mapping which is great to start with, but as you can see you need to tweak it depending on your needs.
For each field, you can specify what analyzer to use. An analyzer defines the way you split your text into tokens (tokenizer) that will be indexed and also additional changes you can make to each token (even remove or add new ones) using token filters.
You can specify your mapping either while creating your index or update it afterwards using the put mapping api (as long as the changes you make are backwards compatible).

Resources