How to ingest an Elasticsearch database definition file - elasticsearch

I have a file with some definitions for an Elasticsearch database in the following format:
PUT /drafts
{
  "settings": {
    "max_result_window": "100000"
  }
}
PUT /drafts/draft/_mapping
{
  "draft": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "analysis_id": {
        "type": "keyword"
      }
    }
  }
}
PUT /variants
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    },
    "max_result_window": "100000"
  }
}
How can I ingest that into my Elasticsearch server in bulk?
I tried the _bulk API call, but that does not seem to work:
curl localhost:9200/_bulk -d @file
I have seen this format in the Elasticsearch tutorials, but they never state how to run those files... Is it even possible?

The Bulk API is for indexing, updating, and deleting documents. The operations mentioned above either create indices, alter mappings, or change index settings; this is not possible with the Bulk API.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
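The request blocks in your file are in the Kibana Console format used throughout the Elasticsearch documentation, and the usual way to run them is to paste them into Kibana's Dev Tools console and execute them one by one; there is no built-in endpoint that accepts the whole file at once. Outside Kibana, each block becomes a separate curl call, for example (a sketch using the first request from the question; adjust the host as needed):
curl -XPUT 'localhost:9200/drafts' -H 'Content-Type: application/json' -d '{
  "settings": {
    "max_result_window": "100000"
  }
}'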

Related

Update Index Settings Analyzer in Java

I want to update a filter in my analyzer, so I looked at UpdateSettingsRequestBuilder, but there we need to pass the whole updated settings string. Can we just pass the updated filter, as in the Elastic query below?
My index settings:
"settings": {
"number_of_shards": "3",
"index.mapping.total_fields.limit": 10000,
"analysis": {
"filter": {
"minimal_english": {
"type": "stemmer",
"language": "minimal_english"
},
"synonym_graph": {
"type": "synonym_graph",
"updateable": "true"
}
}
}
}
Elastic query:
PUT /test_index2/_settings
{
  "analysis": {
    "filter": {
      "synonym_graph": {
        "type": "synonym_graph",
        "updateable": "true",
        "synonyms": ["i-phone, i phone => iphone"]
      }
    }
  }
}
Is there a way to pass just this filter, as in the Elastic query, to update the filter in Java?
Updating index settings with the put settings API is currently not possible directly through the Spring Data Elasticsearch API.
You can build such a request yourself and send it to Elasticsearch using the ElasticsearchRestTemplate.execute() method:
ElasticsearchRestTemplate restTemplate; // injected

// set up the request; the body can contain just the changed analysis settings
// (settingsJson is the JSON body of the PUT /test_index2/_settings call above)
UpdateSettingsRequest updateSettingsRequest = new UpdateSettingsRequest("test_index2");
updateSettingsRequest.settings(settingsJson, XContentType.JSON);
AcknowledgedResponse response = restTemplate.execute(client ->
    client.indices().putSettings(updateSettingsRequest, RequestOptions.DEFAULT));
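Note that, whichever client you use, the analysis section of the settings cannot be changed on an open index (see also the last answer on this page); a typical sequence is to close the index, apply the settings, and reopen it:
curl -XPOST localhost:9200/test_index2/_close
# send the PUT /test_index2/_settings request from the question here
curl -XPOST localhost:9200/test_index2/_open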

Elasticsearch Text with Path Hierarchy vs Keyword using Prefix query performance

I'm trying to find the best way to filter results based on folder hierarchies. We will use this to get all assets/documents in a provided folder and all of its subfolders (a recursive search).
So, for example, given a structure like
/someFolder/someSubfolder/1
/someFolder/someSubfolder/1/subFolder
/someFolder/someSubfolder/2
/someFolder/someSubfolder/2/subFolder
if we search for /someFolder/someSubfolder/1, we want to get as results
/someFolder/someSubfolder/1
/someFolder/someSubfolder/1/subFolder
Now, I've found two ways to do this, and I am not sure which one would be better from a performance perspective:
Use a text field with the path_hierarchy tokenizer
Use a keyword field and a prefix query
Both of the above seem to work as I want them to (unless I missed something), but I am not sure which one is better. On one hand, I've read that filtering should be done on keyword fields; on the other hand, the path_hierarchy tokenizer seems to be made exactly for these scenarios, but it can only be used with text fields.
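For context, the path_hierarchy tokenizer emits one token per path prefix, which is what lets a plain term filter on the text field match whole subtrees. A quick way to see this is the _analyze API (a standalone sketch, no index required):
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/someFolder/someSubfolder/1"
}
This produces the tokens /someFolder, /someFolder/someSubfolder, and /someFolder/someSubfolder/1.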
Below I have prepared some sample code.
Create the index and push some test data into it:
PUT test-index-2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "folderPath": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
POST test-index-2/_doc/
{
  "folderPath": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces"
}
POST test-index-2/_doc/
{
  "folderPath": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces/SomeTestValue/11"
}
Now both of the queries below will return two results when matching a partial path hierarchy.
1.
GET test-index-2/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "folderPath": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces" }}
      ]
    }
  }
}
2.
GET test-index-2/_search
{
  "query": {
    "prefix": { "folderPath.keyword": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces" }
  }
}
Now the question would be: which solution is better if we want to get a subset of results?

Elasticsearch - Not giving the correct search result when bulk inserting data

I would like to apply a snowball analyzer to search the data; however, today I ran into a very strange issue. Let me explain step by step.
I created the index below using the Kibana Dev Tools console:
PUT test_index_version_1
{
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "analyzer": {
        "ana_tenderinfo": {
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "workDesc": {
        "type": "text",
        "analyzer": "ana_tenderinfo"
      }
    }
  }
}
I inserted dummy data using Kibana:
POST test_index_version_1/_doc
{
  "workDesc": "he works hard today"
}
POST test_index_version_1/_doc
{
  "workDesc": "I worked yesterday"
}
POST test_index_version_1/_doc
{
  "workDesc": "work"
}
POST test_index_version_1/_doc
{
  "workDesc": "I am working"
}
Now I search using the query below:
GET test_index_version_1/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "intervals": {
            "workDesc": {
              "match": {
                "query": "working",
                "max_gaps": 5
              }
            }
          }
        }
      ]
    }
  },
  "size": 10
}
It gives me the expected result.
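For reference, this is expected: the snowball filter stems works, worked, and working to the same root, so the analyzed query term matches all four documents. The _analyze API shows this (a quick check against the index above):
GET test_index_version_1/_analyze
{
  "analyzer": "ana_tenderinfo",
  "text": "working"
}
which returns the single token work.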
But the search query does not give the correct result when:
a. I create the index using Kibana, and
b. transfer the bulk data from SQL Server to Elasticsearch using the Bulk API.
Even stranger:
a. I create the index using Kibana,
b. add dummy data using Kibana, and
c. transfer the bulk data from SQL Server;
d. then the search query gives the proper result.
Can anyone give any explanation for the above strange behavior?

Elasticsearch - Do searches for alternative country codes

I have a document with a field called 'countryCode', and a term query that searches on its keyword value. But I am having some issues:
Some records say UK while others say GB
Some records say US while others say USA
And the list goes on...
Can I instruct my index to handle all those variations somehow, instead of having to expand the terms in my query filter?
What you are looking for is a way to treat different tokens as equivalent, even when they share few or no characters. This is exactly what synonyms are for.
Elasticsearch lets you configure synonyms and have your queries use them, returning results accordingly.
I have configured a field with a custom analyzer that uses a synonym token filter. Below are a sample mapping and query so that you can play with them and see if they fit your needs.
Mapping
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "usa, us",
            "uk, gb"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "mydocs": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "my_synonyms"
        }
      }
    }
  }
}
Sample Document
POST my_index/mydocs/1
{
  "name": "uk is pretty cool country"
}
And when you use the query below, it returns the above document as well.
Query
GET my_index/mydocs/_search
{
  "query": {
    "match": {
      "name": "gb"
    }
  }
}
Refer to the official documentation to learn more about this. Hope this helps!
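You can also inspect the synonym expansion directly with the _analyze API (a quick sketch against the index above):
GET my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "gb"
}
This returns both gb and uk as tokens at the same position, which is why the match query for either code finds the document.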
To handle this within ES itself, without using Logstash, I'd suggest a simple ingest pipeline with a gsub processor that updates the field in place, for example (the pipeline name below is illustrative):
PUT _ingest/pipeline/normalize-country-codes
{
  "description": "Normalize country code variants",
  "processors": [
    {
      "gsub": {
        "field": "countryCode",
        "pattern": "GB",
        "replacement": "UK"
      }
    }
  ]
}
https://www.elastic.co/guide/en/elasticsearch/reference/master/gsub-processor.html
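Documents then go through the pipeline when you name it at index time, e.g. (the index name is illustrative):
POST my_index/_doc?pipeline=normalize-country-codes
{
  "countryCode": "GB"
}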

Change settings and mappings on an existing index in Elasticsearch

I would like to set the following settings and mapping on an already existing index in Elasticsearch:
{
  "analysis": {
    "analyzer": {
      "dot-analyzer": {
        "type": "custom",
        "tokenizer": "dot-tokenizer"
      }
    },
    "tokenizer": {
      "dot-tokenizer": {
        "type": "path_hierarchy",
        "delimiter": "."
      }
    }
  }
}
{
  "doc": {
    "properties": {
      "location": {
        "type": "string",
        "index_analyzer": "dot-analyzer",
        "search_analyzer": "keyword"
      }
    }
  }
}
I have tried to add these two lines of code:
client.admin().indices().prepareUpdateSettings(Index).setSettings(settings).execute().actionGet();
client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
But this is the result:
org.elasticsearch.index.mapper.MapperParsingException: Analyzer [dot-analyzer] not found for field [location]
Anyone? Thanks a lot,
Stine
This seems to work:
if (client.admin().indices().prepareExists(Index).execute().actionGet().exists()) {
    // the index exists: close it, update the analysis settings, then reopen it
    client.admin().indices().prepareClose(Index).execute().actionGet();
    client.admin().indices().prepareUpdateSettings(Index).setSettings(settings.string()).execute().actionGet();
    client.admin().indices().prepareOpen(Index).execute().actionGet();
    // then replace the mapping for the type
    client.admin().indices().prepareDeleteMapping(Index).setType(Type).execute().actionGet();
    client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
} else {
    // the index does not exist yet: create it with settings and mapping in one go
    client.admin().indices().prepareCreate(Index).addMapping(Type, mapping).setSettings(settings).execute().actionGet();
}
If you look at your settings after sending the changes, you'll notice that the analyzer is not there. In fact, you can't change the analysis section of the settings on a live index. It's better to create the index with the desired settings from the start; otherwise, you can close it:
curl -XPOST localhost:9200/index_name/_close
While the index is closed you can send the new settings. After that, you can reopen the index:
curl -XPOST localhost:9200/index_name/_open
While the index is closed it doesn't use any cluster resources, but it is neither readable nor writable. If you want to close and reopen the index using the Java API, you can use the following code:
client.admin().indices().prepareClose(indexName).execute().actionGet();
//TODO update settings
client.admin().indices().prepareOpen(indexName).execute().actionGet();
