I'm very new to this, so I would very much appreciate any help. My code is below:
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch import helpers
elastic_user = "elastic"
elastic_password = "pass"
SOURCE = 'netflix_titles.csv'
netflix_df = pd.read_csv(SOURCE)
elastic_client = Elasticsearch(
    "https://localhost:9200",
    verify_certs=False,
    basic_auth=(elastic_user, elastic_password),
)
def doc_generator(df):
    df_iter = df.iterrows()
    for index, document in df_iter:
        yield {
            "_index": "netflix_shows",
            "_source": document,
        }

helpers.bulk(elastic_client, doc_generator(netflix_df))
When I try to push the df into the index I get:
elasticsearch.helpers.BulkIndexError: 500 document(s) failed to index.
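One way to see why the documents fail (a minimal debugging sketch, reusing the names above): the raised BulkIndexError keeps the per-document responses from Elasticsearch in its errors attribute. With CSV data, a common culprit is NaN in empty cells, which does not serialize to valid JSON.

from elasticsearch import helpers

try:
    helpers.bulk(elastic_client, doc_generator(netflix_df))
except helpers.BulkIndexError as e:
    # e.errors is the list of per-document error dicts returned by Elasticsearch
    for err in e.errors[:5]:  # inspect the first few failures
        print(err)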
I'm trying to do a bulk insert using elasticsearch-py, but I don't want to specify a type. However, it won't let me pass None or "" as the value of type. How can I get around this?
bulk_data = []
this_merchant_product = {'field1': 'value1'}
op_dict = {
    "index": {
        "_index": "product",
        "_type": None,
        "_id": str(this_merchant_product_id)
    }
}
bulk_data.append(op_dict)
bulk_data.append(this_merchant_product)

es = Elasticsearch()
res = es.bulk(index='product', body=bulk_data)
I've also tried to set _type to "", but that doesn't work either.
These are the error messages.
This is the error when I set _type to None:
elasticsearch.exceptions.RequestError: RequestError(400, 'action_request_validation_exception', 'Validation Failed: 1: type is missing;
and this is the error I get when I set _type to "":
java.lang.IllegalArgumentException: name cannot be empty string
Each index has exactly one mapping type in Elasticsearch 6.x. In Elasticsearch 7.x+ types are removed entirely, and in versions 2.x through 5.6 an index could have more than one mapping type. Assuming you're on 6.x, you need to give the documents in your index a type.
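Concretely, that means putting a non-empty type in each action line. A minimal sketch with the variables from the question (on 6.x the conventional single type name is _doc):

op_dict = {
    "index": {
        "_index": "product",
        "_type": "_doc",  # one type per index in 6.x; "_doc" is the convention
        "_id": str(this_merchant_product_id)
    }
}
bulk_data = [op_dict, this_merchant_product]
res = es.bulk(index='product', body=bulk_data)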
Does anybody know the process to upload bulk data into Elasticsearch? I am not able to upload data into Elasticsearch.
Thanks
Gaurav Singh
from elasticsearch import helpers, Elasticsearch

data = [{'id': 1, 'content': 'some content'}]
INDEX_NAME = 'index'
TYPE = 'list'

def get_actions():
    actions = []
    for d in data:
        action = {
            '_op_type': 'update',
            '_index': INDEX_NAME,
            '_type': TYPE,
            '_id': d['id'],
            'doc': d,  # an update action needs the partial document to apply
        }
        actions.append(action)
    return actions

# IP is your cluster address, e.g. 'localhost:9200'
result = helpers.bulk(Elasticsearch(IP), get_actions())
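If the goal is plain inserts rather than partial updates, the same helper also takes index actions; a sketch under that assumption, with the same names:

actions = [
    {
        '_op_type': 'index',  # create-or-overwrite instead of update
        '_index': INDEX_NAME,
        '_type': TYPE,
        '_id': d['id'],
        '_source': d,
    }
    for d in data
]
helpers.bulk(Elasticsearch(IP), actions)  # IP as above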
I have filtered the bitrates from live streams. I have constructed an API with Python and pipe its continuous output into InfluxDB, which should be monitored, like python api.py | python influx.py. However, I am unable to store this output into InfluxDB. If necessary I can show my API code.
#!/usr/bin/python
import sys
import datetime
from influxdb import InfluxDBClient
from influxdb.client import InfluxDBClientError
from influxdb import DataFrameClient
import os
import time

client = InfluxDBClient('localhost', 8086, 'admin', 'admin', database='stackanm')
client.create_database('stackanm')

def store(bitrate, time):
    json = [
        {
            "measurement": "bitrates",
            "tags": {
                "time": time,
                "fields": {
                    "bitrate": bitrate
                }
            }
        }
    ]
    client.write_points(json, time_precision='u')

f = os.fdopen(sys.stdin.fileno(), 'r', 0)
for line in f:
    elements = line.strip().split()
    if len(elements) == 1:
        bitrate = elements[0]
        unixtime = elements[1].split('.')
        stdtime = datetime.datetime.utcfromtimestamp(long(float(unixtime[1]))).strftime('%Y-%m-%dT%H:%M:%S')
        influxtime = ".".join([stdtime, unixtime[1]])
        store(bitrate, float(elements[1]), influxtime)
You probably solved this by now but I just stumbled across it.
Are you sure that time is supposed to be listed as a tag?
I've been working on some python and influxdb stuff lately and have noticed that time seems to be outside of the tags in the JSON body.
Similar to:
"points": [
{
"measurement": "cpu_load_short",
"tags": {
"host": "server01",
"region": "us-west"
},
"time": "2009-11-10T23:00:00Z",
"fields": {
"value": 0.64
}
}
]
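Applied to the store() function from the question, that layout would look something like this (a sketch; the measurement and field names are taken from the question):

def store(bitrate, influxtime):
    points = [
        {
            "measurement": "bitrates",
            "time": influxtime,  # time sits at the top level of the point, not inside tags
            "fields": {
                "bitrate": bitrate
            }
        }
    ]
    client.write_points(points, time_precision='u')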
The following curl commands work as expected and return the correct mapping, but the Python code returns a blank mapping.
curl -X PUT localhost:9200/geotest/
curl -X PUT localhost:9200/geotest/geotest/_mapping -d '{
    "geotest": {
        "properties": {
            "location": {
                "type": "geo_point",
                "lat_lon": true,
                "geohash": true
            }
        }
    }
}'
curl -XGET localhost:9200/geotest/_mapping
{"geotest":{"mappings":{"geotest":{"properties":{"location":{"type":"geo_point","lat_lon":true,"geohash":true}}}}}}
I expect this Python code to do the same as the curl commands above...
import elasticsearch

es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping = {"geotest": {"properties": {"location": {"type": "geo_point", "lat_lon": True, "geohash": True}}}}
es.indices.delete(index='geotest')
es.indices.create(index='geotest', body=mymapping)
curl -XGET localhost:9200/geotest/_mapping
{"geotest":{"mappings":{}}}
Why does the Python code not create the correct mapping the way curl does?
Update:
Using the put_mapping method I am not able to create the wikipedia content index.
import urllib2
myfile = urllib2.urlopen('https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump&format=json').read()

import ast
myfile1 = ast.literal_eval(myfile)['content']['page']['properties']

import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
es.indices.delete(index='enwiki_todel')
es.indices.create(index='enwiki_todel')
es.indices.put_mapping(index='enwiki_todel', doc_type='page', body=myfile1)
Update 2:
I tried to keep only the content mapping using the ast module, and I am still getting a mapper parsing exception.
import urllib2
myfile = urllib2.urlopen('https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump&format=json').read()

import ast
myfile1 = ast.literal_eval(myfile)['content']

import elasticsearch
es = elasticsearch.Elasticsearch('http://ec2-52-91-179-95.compute-1.amazonaws.com:9200/')
es.indices.delete(index='enwiki_todel')
es.indices.create(index='enwiki_todel')
es.indices.put_mapping(index='enwiki_todel', doc_type='page', body=myfile1)
You're almost there. If you want to create an index with a mapping in one shot, you need to use the "mappings": {} structure in the body of your create index call. Like this:
import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping = {"mappings": {"geotest": {"properties": {"location": {"type": "geo_point", "lat_lon": True, "geohash": True}}}}}
#             ^
#             enclose your mapping in "mappings"
es.indices.delete(index='geotest')
es.indices.create(index='geotest', body=mymapping)
An alternate solution is to use put_mapping after the call to create, and then you'll be able to use the same structure you initially had, i.e. without the "mappings": {} wrapper.
import elasticsearch

es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping = {"geotest": {"properties": {"location": {"type": "geo_point", "lat_lon": True, "geohash": True}}}}
es.indices.delete(index='geotest')
es.indices.create(index='geotest')
es.indices.put_mapping(index='geotest', doc_type='geotest', body=mymapping)
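Either way, you can verify the result from Python rather than curl; a quick check with the same client:

# should now mirror the curl -XGET localhost:9200/geotest/_mapping output
print(es.indices.get_mapping(index='geotest'))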
I have the following query and I want to convert it to PyES:
{
    "facets": {
        "participating-org.name": {
            "terms": {
                "field": "participating-org.name"
            },
            "nested": "participating-org"
        }
    }
}
I have searched the PyES documentation and found:
class pyes.facets.TermsFacetFilter(field=None, values=None, _name=None, execution=None, **kwargs)
I don't know how to use it, and I couldn't find any examples related to it. I hope the PyES folks come out with good documentation and examples in the future.
I have just figured it out myself:
from pyes import *
from pyes.facets import *

conn = ES('localhost:9200', default_indices='org', default_types='activity')
q2 = MatchAllQuery().search()
q2.facet.add_term_facet('participating-org.role', nested="participating-org")

# Display the ES JSON query.
print q2

resultset = conn.search(q2)

# Display all the result sets.
for r in resultset:
    print r

# Display the facet counts.
print resultset.facets
This code produces the JSON query above and gives me the exact counts.
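Note that the code above facets on participating-org.role; for the facet in the original JSON, it's the same call with the other field name:

q2.facet.add_term_facet('participating-org.name', nested="participating-org")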