Is there a way to "escape" ElasticSearch stop words?

I am fairly new to ElasticSearch and have a question about stop words. I have an index that contains US state names, e.g. New York/NY, California/CA, Oregon/OR. I believe Oregon's abbreviation, 'OR', is a stop word, so when I insert the state data into the index, I cannot search on 'OR'. Is there a way I can set up custom stop words for this, or am I doing something wrong?
Here is how I am building the index:
curl -XPUT http://localhost:9200/test/state/1 -d '{"stateName": ["California","CA"]}'
curl -XPUT http://localhost:9200/test/state/2 -d '{"stateName": ["New York","NY"]}'
curl -XPUT http://localhost:9200/test/state/3 -d '{"stateName": ["Oregon","OR"]}'
A search for 'NY' works fine:
curl -XGET 'http://localhost:9200/test/state/_search?pretty=1' -d '
{
  "query" : {
    "match" : {
      "stateName" : "NY"
    }
  }
}'
But a search for 'OR' returns zero hits:
curl -XGET 'http://localhost:9200/test/state/_search?pretty=1' -d '
{
  "query" : {
    "match" : {
      "stateName" : "OR"
    }
  }
}'
I believe this search returns no results because OR is a stop word, but I don't know how to work around this. Thanks for your help.

You can (and definitely should) control the way you index data by modifying your mapping according to your data and the way you want to search against it.
In your case I would disable stopwords for that specific field rather than modifying the stopword list, but you could do the latter too if you wish. The point is that you're using the default mapping, which is great to start with, but as you can see you need to tweak it depending on your needs.
For each field, you can specify which analyzer to use. An analyzer defines how your text is split into the tokens that will be indexed (the tokenizer), plus any additional changes applied to each token, including removing tokens or adding new ones, via token filters.
You can specify your mapping either while creating your index or update it afterwards using the put mapping api (as long as the changes you make are backwards compatible).
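For example, here is a minimal sketch of such a mapping (the analyzer name no_stopwords is mine, and this uses the pre-5.x string type to match the examples above). It creates the index with a standard analyzer whose stopword list is empty and points the stateName field at it; you would need to create the index this way before indexing (or delete and recreate it, then reindex your three documents):
curl -XPUT 'http://localhost:9200/test' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "no_stopwords": {
          "type": "standard",
          "stopwords": "_none_"
        }
      }
    }
  },
  "mappings": {
    "state": {
      "properties": {
        "stateName": {
          "type": "string",
          "analyzer": "no_stopwords"
        }
      }
    }
  }
}'
With that in place, the match query for 'OR' should return the Oregon document, since 'or' is now kept as a token at both index and search time.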

Related

Elastic search simple query to find all ids

I am trying to get all ids for a type, but I am pulling my hair out.
Please see my attachment.
Here is the cURL call:
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json'
-d'{ "query": { "wildcard" : { "id" : "Account*" } }}'
[screenshot: cURL call with no results]
I would guess there is an issue with the way your id field is analyzed. You can retrieve the mapping using the _mapping endpoint (described in the docs). Your id field should be analyzed as a string (with break characters, tokenizers and all) for the wildcard query to work. If it is not analyzed, as you might expect for an id field, the wildcard query will not work, and you would need to change the mapping and reindex your data.
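Retrieving the mapping is a one-liner; the index name myindex below is a placeholder, since your query searches across all indices:
curl -XGET 'localhost:9200/myindex/_mapping?pretty'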

ElasticSearch - How to make a 1-to-1 copy of an existing index

I'm using Elasticsearch 2.3.3 and trying to make an exact copy of an existing index (using the reindex plugin bundled with the Elasticsearch installation).
The problem is that the data is copied but settings such as the mapping and the analyzer are left out.
What is the best way to make an exact copy of an existing index, including all of its settings?
My main goal is to create a copy, change the copy and only if all went well switch an alias to the copy. (Zero downtime backup and restore)
In my opinion, the best way to achieve this would be to leverage index templates. Index templates allow you to store a specification of your index, including settings (hence analyzers) and mappings. Then whenever you create a new index which matches your template, ES will create the index for you using the settings and mappings present in the template.
So, first create an index template called index_template with the template pattern myindex-*:
PUT /_template/index_template
{
  "template": "myindex-*",
  "settings": {
    ... your settings ...
  },
  "mappings": {
    "type1": {
      "properties": {
        ... your mapping ...
      }
    }
  }
}
What will happen next is that whenever you index a new document into any index whose name matches myindex-*, ES will use this template (with its settings and mappings) to create the new index.
So say your current index is called myindex-1 and you want to reindex it into a new index called myindex-2. You'd send a reindex request like this one:
POST /_reindex
{
  "source": {
    "index": "myindex-1"
  },
  "dest": {
    "index": "myindex-2"
  }
}
myindex-2 doesn't exist yet, but it will be created in the process using the settings and mappings of index_template because the name myindex-2 matches the myindex-* pattern.
Simple as that.
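If you want to double-check the result, the Get Index API returns the settings and mappings of both indices side by side (a quick sanity check, not strictly necessary):
curl -XGET 'localhost:9200/myindex-1,myindex-2?pretty'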
The following seems to achieve exactly what I wanted:
Using Snapshot And Restore I was able to restore to a different index:
POST /_snapshot/index_backup/snapshot_1/_restore
{
  "indices": "original_index",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "original_index",
  "rename_replacement": "replica_index"
}
As far as I can currently tell, it has accomplished exactly what I needed.
A 1-to-1 copy of my original index.
I also suspect this operation has better performance than re-indexing for my purposes.
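For completeness: the restore call above assumes a snapshot repository named index_backup and a snapshot snapshot_1 already exist. Registering a filesystem repository and taking the snapshot might look roughly like this (the location path is a placeholder and must be whitelisted in path.repo):
curl -XPUT 'localhost:9200/_snapshot/index_backup' -d '
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/index_backup"
  }
}'
curl -XPUT 'localhost:9200/_snapshot/index_backup/snapshot_1?wait_for_completion=true' -d '
{
  "indices": "original_index",
  "include_global_state": false
}'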
I'm facing the same issue when using the reindex API.
Basically I'm merging daily, weekly, monthly indices to reduce shards.
We have a lot of indices with different data inputs, and maintaining a template for all cases is not an option. Thus we use dynamic mapping.
Due to dynamic mapping the reindex process can produce conflicts if your data is complicated (say, JSON stored in a string field), and the reindexed field can end up mapped as something else.
Solution:
1. Copy the mapping of your source index
2. Create a new index, applying the mapping
3. Disable dynamic mapping
4. Start the reindex process.
A script can be created, and should of course have error checking in place.
Abbreviated scripts below.
Create a new empty index with the mapping from the original index:
#!/bin/bash
SRC=$1
DST=$2
# Create a temporary file for holding the SRC mapping
TMPF=$(mktemp)
# Extract the SRC mapping, use `jq` to get the first record
# write to TMPF
curl -f -s "${URL:?}/${SRC}/_mapping" | jq -M -c 'first(.[])' > ${TMPF:?}
# Create the new index
curl -s -H 'Content-Type: application/json' -XPUT ${URL:?}/${DST} -d @${TMPF:?}
# Disable dynamic mapping
curl -s -H 'Content-Type: application/json' -XPUT \
${URL:?}/${DST}/_mapping -d '{ "dynamic": false }'
Start reindexing:
curl -s -XPOST "${URL:?}/_reindex" -H 'Content-Type: application/json' -d'
{
  "conflicts": "proceed",
  "source": {
    "index": "'${SRC}'"
  },
  "dest": {
    "index": "'${DST}'",
    "op_type": "create"
  }
}'

Elasticsearch view indexed data

I have a char_filter which replaces characters:
char_filter:
  lt_characters:
    type: mapping
    mappings: ["a=>bbbbbb", "c=>tttttt", "ddddddd=>k"]
I added this filter to the index. Now, how can I check whether the filter works? Where can I find the indexed data? I mean, specifically, how can I view the replacements?
To see what tokens are created with your char_filter you can use the Analyze API.
curl -XGET 'localhost:9200/_analyze?char_filters=lt_characters' -d 'this is a test'
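Note that since lt_characters is defined in your index settings, you would run the Analyze API against that index (the index name myindex is a placeholder) and feed it a sample containing the mapped characters:
curl -XGET 'localhost:9200/myindex/_analyze?tokenizer=standard&char_filters=lt_characters' -d 'a c ddddddd'
The response lists the tokens after the char filter has run, so you should see bbbbbb, tttttt and k in place of a, c and ddddddd.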

Why are Elasticsearch aliases not unique

The Elasticsearch documentation describes aliases as a feature to reindex data with zero downtime:
Create a new index and index the whole data
Let your alias point to the new index
Delete the old index
This would be a great feature if aliases were unique, but it's possible for one alias to point to multiple indexes. Considering that the deletion of the old index might fail, my application could end up speaking to two indexes which might not be in sync. Even worse: the application doesn't know about that.
Why is it possible to reuse an alias?
It allows you to easily have several indexes that are used both individually and together with other indexes. This is useful, for example, with a logging setup where sometimes you want to query only the most recent index (a logs-recent alias) and sometimes everything (a logs alias). There are probably lots of other use cases, but this one pops up first for me.
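Sketched with made-up daily index names, the newest index would sit in both aliases while older ones stay only in logs:
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "add" : { "index" : "logs-2016-06-02", "alias" : "logs-recent" } },
    { "add" : { "index" : "logs-2016-06-02", "alias" : "logs" } },
    { "add" : { "index" : "logs-2016-06-01", "alias" : "logs" } }
  ]
}'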
As per the documentation you can send both the remove and add in one request:
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "test1", "alias" : "alias1" } },
    { "add" : { "index" : "test2", "alias" : "alias1" } }
  ]
}'
After that succeeds you can remove your old index, and if that fails you will just have an extra index taking up some space until it's cleaned out.

Substring and similarity matching in elasticsearch

I am learning to use elasticsearch as an alternative to database queries, and I am not able to perform substring matches on the built index.
The mapping I used to create the index is:
"mappings" : {
"user" : {
"properties" : {
"name" : {"type": "string"},
"specialty" : {"type": "string" ,"analyzer":"snowball"},
"address : {"type": "string" ,"analyzer":"snowball"}
}
}
}
The document I am indexing is:
{
  "name" : "John Doe",
  "speciality": ["pediatrician", "Child Doctor"],
  "address": ["#123 park road Abbeyville", "#423 park road AbbeyTown"]
}
When I perform a search like
curl -XGET localhost:9200/test/user/_search?q=speciality:pediatrician
I get the correct document
However, when I search for strings like
curl -XGET localhost:9200/test/user/_search?q=speciality:pedia
curl -XGET localhost:9200/test/user/_search?q=speciality:pediatricians
no results are retrieved.
P.S. I know that wildcards can be used for matching, but I need to be able to search for both whole words and parts of words based on user input, so as to return the most relevant documents.
Did you try reindexing after changing the mapping? Also try setting the search analyzer to snowball in the settings; a sketch follows below.
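A rough sketch of what that could look like when creating a fresh index (the index name test2 is mine, pre-5.x string syntax to match the question, and note that the question's documents use the field name speciality while the mapping says specialty):
curl -XPUT localhost:9200/test2 -d '
{
  "mappings" : {
    "user" : {
      "properties" : {
        "speciality" : {
          "type" : "string",
          "analyzer" : "snowball",
          "search_analyzer" : "snowball"
        }
      }
    }
  }
}'
You would then reindex your documents into test2 and point your queries (or an alias) at it.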
UPDATE:
You can go for a wildcard search. Prefer a trailing wildcard alone rather than both a leading and a trailing wildcard, since leading wildcards are expensive to execute.
curl -XGET localhost:9200/test/user/_search?q=speciality:pedia*
curl -XGET localhost:9200/test/user/_search?q=speciality:pediatricians*
