Python code fails to create correct mapping - Elasticsearch

The following curl commands work as expected: they return the correct mapping, but the Python code leaves the mapping blank.
curl -X PUT localhost:9200/geotest/
curl -X PUT localhost:9200/geotest/geotest/_mapping -d '{
"geotest": {
"properties": {
"location": {
"type": "geo_point",
"lat_lon": true,
"geohash": true
}
}
}
}'
curl -XGET localhost:9200/geotest/_mapping
{"geotest":{"mappings":{"geotest":{"properties":{"location":{"type":"geo_point","lat_lon":true,"geohash":true}}}}}}
I expect this Python code to do the same as the above...
import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping={"geotest":{"properties":{"location":{"type":"geo_point","lat_lon":True,"geohash":True}}}}
es.indices.delete(index = 'geotest')
es.indices.create(index = 'geotest', body = mymapping)
curl -XGET localhost:9200/geotest/_mapping
{"geotest":{"mappings":{}}}
Why does the Python code not create the correct mapping the way curl does?
Update:
Using the put_mapping method, I am not able to create the Wikipedia content index.
import urllib2
myfile=urllib2.urlopen('https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump&format=json').read()
import ast
myfile1=ast.literal_eval(myfile)['content']['page']['properties']
import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
es.indices.delete(index ='enwiki_todel')
es.indices.create(index ='enwiki_todel')
es.indices.put_mapping(index ='enwiki_todel', doc_type='page', body = myfile1)
Update 2:
I tried keeping only the content part using the ast module, and I am still getting a mapper parsing exception.
import urllib2
myfile=urllib2.urlopen('https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump&format=json').read()
import ast
myfile1=ast.literal_eval(myfile)['content']
import elasticsearch
es = elasticsearch.Elasticsearch('http://ec2-52-91-179-95.compute-1.amazonaws.com:9200/')
es.indices.delete(index ='enwiki_todel')
es.indices.create(index ='enwiki_todel')
es.indices.put_mapping(index ='enwiki_todel', doc_type='page', body = myfile1)
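To see the full detail behind the mapper parsing exception, one option is to catch the client's RequestError and print its info attribute (a minimal sketch, assuming elasticsearch-py's standard exceptions):
from elasticsearch import exceptions

try:
    es.indices.put_mapping(index='enwiki_todel', doc_type='page', body=myfile1)
except exceptions.RequestError as e:
    # e.info holds the deserialized error body returned by Elasticsearch
    print(e.info)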

You're almost there. If you want to create an index with a mapping in one shot, you need to wrap your mapping in a "mappings": {} structure in the body of your create index call, like this:
import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping={"mappings": {"geotest":{"properties":{"location":{"type":"geo_point","lat_lon":True,"geohash":True}}}}}
^
|
enclose your mapping in "mappings"
es.indices.delete(index = 'geotest')
es.indices.create(index = 'geotest', body = mymapping)
An alternate solution is to use put_mapping after the call to create; then you'll be able to use the same structure you initially had, i.e. without the enclosing "mappings": {} structure.
import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping={"geotest":{"properties":{"location":{"type":"geo_point","lat_lon":True,"geohash":True}}}}
es.indices.delete(index = 'geotest')
es.indices.create(index = 'geotest')
es.indices.put_mapping(index = 'geotest', body = mymapping)
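Either way, you can then verify the result from Python, the equivalent of the curl -XGET above:
print(es.indices.get_mapping(index='geotest'))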

Related

elasticsearch.helpers.BulkIndexError: 500 document(s) failed to index

I'm very new to this, so I would appreciate help very much.
Code is below:
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch import helpers
elastic_user = "elastic"
elastic_password = "pass"
SOURCE = 'netflix_titles.csv'
netflix_df = pd.read_csv(SOURCE)
elastic_client = Elasticsearch("https://localhost:9200",verify_certs=False,basic_auth=(elastic_user,elastic_password))
def doc_generator(df):
    df_iter = df.iterrows()
    for index, document in df_iter:
        yield {
            "_index": "netflix_shows",
            "_source": document,
        }
helpers.bulk(elastic_client, doc_generator(netflix_df))
When I try to push the df into the index I get:
elasticsearch.helpers.BulkIndexError: 500 document(s) failed to index.
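To find out why the documents were rejected, one option is to catch the exception and inspect its errors attribute (a minimal sketch; the slice just keeps the output short):
from elasticsearch import helpers

try:
    helpers.bulk(elastic_client, doc_generator(netflix_df))
except helpers.BulkIndexError as e:
    # each entry describes one failed document and why Elasticsearch rejected it
    for failure in e.errors[:5]:
        print(failure)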

Elasticsearch wrapper query does not work with base64-encoded string

ES version: 5.2.3
To encode I have used base64:
char[] data = Base64Coder.encode(text.getBytes());
return data.toString();
Note: text is the underlying JSON query.
query:
curl -XPOST 'http://localhost:9200/entitymaster_qa_t4/_search' -d '{
"query" : {
"wrapper" : {
"query" : "W0NAMTZiN2MzYw=="
}
}
}'
Response:
{"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"entitymaster_qa_t4","node":"8WVaVr9ATmaqOPDHGpNyHw","reason":{"type":"parse_exception","reason":"Failed to derive xcontent"}}]},"status":400}
The wrapper query appeared in ES 6.0 according to the docs, so if you want to use it you need to upgrade your version. Also, the base64 string must decode into a valid query, and yours does not: W0NAMTZiN2MzYw== decodes to [C@16b7c3c, which is the default toString() of a Java char[], not the JSON query. Build the string with new String(data) (or base64-encode the query's bytes directly) instead of calling toString() on the array.
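For reference, a minimal Python sketch of producing a valid wrapper payload (the match_all query is just a placeholder):
import base64
import json

inner = {"match_all": {}}  # placeholder for the real query
encoded = base64.b64encode(json.dumps(inner).encode("utf-8")).decode("ascii")

body = {"query": {"wrapper": {"query": encoded}}}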

elasticsearch-pyspark: not getting specific fields from document (getting all fields) even after specifying with Spark

I'm trying to extract some data from Elasticsearch with pyspark, and I want to extract only a few fields (not all) from the documents. For testing purposes I'm making a POST request from Postman with the following URL and body, and it gives the expected output. But when I use the same body in the Spark code, it extracts all the fields from the specified documents, which is not desired. Can anyone tell me what might be the reason for this behavior? Thanks in advance!
Spark version 2.3, Elasticsearch version 6.2, postman body type = application/json
This is what I'm doing with postman :
url: localhost:9200/test-index4/school/_search
body:
{
  "query": {
    "ids": {
      "values": ["8", "9", "10"]
    }
  },
  "_source": {
    "includes": ["name"]
  }
}
Below is what I'm doing with pyspark :
`body = "{"query":{"ids":{"values":["8","9","10"]}},"_source":{"includes":["name"]}}"
df = self.__sql_context.read.format("org.elasticsearch.spark.sql") \
.option("es.nodes", "localhost") \
.option("es.port", "9200") \
.option("es.query", body) \
.option("es.resource", "test-index4/school") \
.option("es.read.metadata", "true") \
.option("es.read.metadata.version", "true") \
.option("es.read.field.as.array.include", "true") \
.load()
`
Try setting es.read.field.include in the config, with the value being a comma-separated field list,
e.g. "es.read.field.include", "field1,field2,..."

Elasticsearch multiple indices wildcard query string not working

The current [5.0] Elasticsearch docs say that
all multi-index APIs support the following URL query string parameters:
ignore_unavailable and allow_no_indices
I deleted all existing indices and tried to create a new one with a mapping:
curl -XDELETE "http://elastic:elastic@127.0.0.1:9200/mail-*?pretty=true"
curl -XPUT "http://elastic:elastic@127.0.0.1:9200/mail-*?ignore_unavailable=true&pretty=true" -d ' {
"mappings": {
"ex": {
"properties": {
...
I got this error:
"request [/mail-*] contains unrecognized parameter: [ignore_unavailable]"
I need to create this mapping because the indices are created by Logstash, with a new index every day (index => "mail-%{+YYYY.MM.dd}").
If I remove the wildcard in the index name, it works!
The reason I need this is that I use the geoip filter in Logstash, but geoip.location does not have the type "geo_point", and the Kibana tile map doesn't work without it.
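ignore_unavailable and allow_no_indices apply to multi-index read APIs; index creation takes a single concrete name, so a wildcard PUT is rejected. For daily Logstash indices the usual approach is an index template, which is applied automatically to every new mail-* index. A minimal sketch in Python (the template name and mapping body are illustrative):
from elasticsearch import Elasticsearch

es = Elasticsearch('http://elastic:elastic@127.0.0.1:9200/')

# applied automatically whenever Logstash creates a new mail-* index
es.indices.put_template(name='mail_template', body={
    "template": "mail-*",  # called "index_patterns" in ES >= 6.0
    "mappings": {
        "ex": {
            "properties": {
                "geoip": {
                    "properties": {
                        "location": {"type": "geo_point"}
                    }
                }
            }
        }
    }
})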

Using Python, how can I register a specific mapping definition?

In the elasticsearch-py docs I can't find an example registering a mapping, which performs what these REST API docs say:
The put mapping API allows to register specific mapping definition for a specific type.
$ curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '
{
"tweet" : {
"properties" : {
"message" : {"type" : "string", "store" : true }
}
}
}
'
Doing anything with the index involves the Indices APIs. Putting a mapping is one of the many Indices APIs. They can be found in the Indices section under API Documentation in the Python client docs.
And you need this: put_mapping(*args, **kwargs).
Here is a complete example:
from elasticsearch import Elasticsearch

def fMap(document_type):
    # note: "string" and "index": "analyzed" are pre-5.x mapping conventions
    mapping = {document_type: {
        "properties": {
            "a_field": {"type": "integer", "store": "yes"},
            "other_field": {"type": "string", "index": "analyzed", "store": "no"},
            "title": {"type": "string", "store": "yes", "index": "analyzed", "term_vector": "yes", "similarity": "BM25"},
            "content": {"type": "string", "store": "yes", "index": "analyzed", "term_vector": "yes", "similarity": "BM25"}
        }
    }}
    return mapping

def dSetting(nShards, nReplicas):
    # "analysis" must live inside "settings" for the create index call
    setting = {
        "settings": {
            "index": {"number_of_shards": nShards, "number_of_replicas": nReplicas},
            "analysis": {
                "filter": {
                    "my_english": {"type": "stop", "stopwords_path": "english.txt"},
                    "english_stemmer": {"type": "stemmer", "language": "english"}
                },
                "analyzer": {
                    "english": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_english", "english_stemmer"]
                    }
                }
            }
        }
    }
    return setting

def EsSetup(con, idx, docType, dset, mapping):
    con.indices.delete(index=idx, ignore=[400, 404])
    con.indices.create(index=idx, body=dset, ignore=400)
    con.indices.put_mapping(index=idx, doc_type=docType, body=mapping)

conEs = Elasticsearch([{'host': 'localhost', 'port': 9200, 'timeout': 180}])
dMap = fMap('docTypeName')
dSet = dSetting(8, 1)
EsSetup(conEs, 'idxName', 'docTypeName', dSet, dMap)
