Load elasticsearch data in knime - elasticsearch

I have a database loaded in elasticsearch and I would like to access it from KNIME to be able to make a recommendation system. How can I connect both applications?

You can do it with Python code in KNIME. The code is below:
import json
import pandas as pd
from elasticsearch import Elasticsearch

esclient = Elasticsearch(['http://XX.YYYY.ZZZ.TTT:9200'])
response = esclient.search(
    index='rtimlog-2018.11.02',
    body={
        "query": {
            "match": {
                "message.eventParameters.Msisdn.keyword": {
                    "query": "532XXXXX",
                }
            }
        }
    }
)

# Put each hit's message into one column of a DataFrame
df = pd.DataFrame(columns=['A'])
for i, hit in enumerate(response['hits']['hits']):
    df.loc[i] = json.dumps(hit['_source']['message'])

# KNIME's Python node picks up this variable as the node output
output_table = df

This is an older topic, and back then the workarounds were REST, Python, or JDBC. I vaguely remember that the REST route was fiddly because KNIME does not support GET requests with a body.
Fortunately, for a few years now there has been a dedicated commercial extension that supports reading from and writing to Elasticsearch (currently for versions 6 and 7).
You can find more information here: https://nodepit.com/product/elasticsearch
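Regarding the GET-with-body limitation mentioned above: Elasticsearch also accepts the _search body via POST, so a plain HTTP POST (from a generic REST node or from Python) works as well. A minimal sketch using the requests library, with a placeholder host and index name:

import requests

# _search accepts POST with a JSON body, so no GET-with-body support is needed.
body = {"query": {"match_all": {}}}
resp = requests.post('http://localhost:9200/my-index/_search', json=body)
print(resp.json()['hits']['total'])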

Related

PDF ingesting through KIBANA

I am new to Elasticsearch, and I have a requirement to ingest and index PDFs using Kibana. I have figured out that we have to create a pipeline for this purpose, but I do not know which processor to use or how to configure it. I discovered that my Elasticsearch node has the ingest-attachment plugin installed. The version I am using is Elasticsearch 7.14, so any help is appreciated, thank you.
This might be useful for you: the ingest attachment processor plugin extracts and ingests data from a PDF that has been base64-encoded. You would be required to base64-encode the file and send it through a pipeline. For example:
import base64
import json

# data is the raw bytes of the file you are parsing;
# contentDocumentId, contentVersionId, _index and client come from your own context
encoded_data = base64.b64encode(data).decode('utf-8')

body = {
    'query': {
        'bool': {
            "filter": [
                {"ids": {'values': [contentDocumentId]}},
                {"term": {"contentVersionId": contentVersionId}}
            ]
        }
    },
    'script': {
        'source': 'ctx._source["file_data"] = params._file_data',
        'params': {'_file_data': encoded_data}
    }
}

response = client.update_by_query(conflicts='proceed', index=_index,
                                  pipeline='attachment', body=json.dumps(body))
I am using update_by_query for my use case; you can check whether update or update_by_query fits yours better.
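Note that pipeline='attachment' in the call above assumes an ingest pipeline with that id already exists and that it contains an attachment processor reading the field named file_data. A minimal sketch of how such a pipeline could be created with the Python client (the host and pipeline id are placeholders you would adjust):

from elasticsearch import Elasticsearch

client = Elasticsearch('http://localhost:9200')  # placeholder host

# The attachment processor decodes the base64 content in "file_data"
# and stores the extracted text and metadata in an "attachment" object.
client.ingest.put_pipeline(
    id='attachment',
    body={
        "description": "Extract text and metadata from base64-encoded PDFs",
        "processors": [
            {
                "attachment": {
                    "field": "file_data"
                }
            }
        ]
    }
)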

bitrate and time into influxdb

I have filtered the bitrates from live streams and got the output below. I have built an API with Python and pipe the continuous data into InfluxDB for monitoring, like python api.py | python influx.py. However, I am unable to store this output in InfluxDB. If necessary, I can show my API code.
This is the script that reads the piped output and writes it to InfluxDB:
#!/usr/bin/python
import sys
import datetime
import os
import time
from influxdb import InfluxDBClient
from influxdb.client import InfluxDBClientError
from influxdb import DataFrameClient

client = InfluxDBClient('localhost', 8086, 'admin', 'admin', database='stackanm')
client.create_database('stackanm')

def store(bitrate, influxtime):
    json_body = [
        {
            "measurement": "bitrates",
            "tags": {
                "time": influxtime,
                "fields": {
                    "bitrate": bitrate
                }
            }
        }
    ]
    client.write_points(json_body, time_precision='u')

# read "bitrate timestamp" pairs from stdin
f = os.fdopen(sys.stdin.fileno(), 'r', 0)
for line in f:
    elements = line.strip().split()
    if len(elements) == 2:
        bitrate = elements[0]
        unixtime = elements[1].split('.')
        stdtime = datetime.datetime.utcfromtimestamp(float(elements[1])).strftime('%Y-%m-%dT%H:%M:%S')
        influxtime = ".".join([stdtime, unixtime[1]])
        store(bitrate, influxtime)
You probably solved this by now but I just stumbled across it.
Are you sure that time is supposed to be listed as a tag?
I've been working on some python and influxdb stuff lately and have noticed that time seems to be outside of the tags in the JSON body.
Similar to:
"points": [
{
"measurement": "cpu_load_short",
"tags": {
"host": "server01",
"region": "us-west"
},
"time": "2009-11-10T23:00:00Z",
"fields": {
"value": 0.64
}
}
]
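To make that concrete for the script above, here is a minimal sketch of the store function with time moved out of tags, assuming the same client, measurement, and timestamp string as in the question:

def store(bitrate, influxtime):
    # "time" sits at the top level of the point, next to "fields",
    # and the numeric value goes under "fields" rather than "tags"
    point = [
        {
            "measurement": "bitrates",
            "time": influxtime,
            "fields": {
                "bitrate": float(bitrate)
            }
        }
    ]
    client.write_points(point, time_precision='u')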

Can Kibana issue a command similar to Splunks 'stats'?

I'm pretty well versed in Splunk, and am trying to pick up ELK.
I have an instance up and running, but I am struggling to build a mental map of ELK (likely due to my experience with Splunk).
Is there a 'stats' like command in ELK, where I could say something like
* | stats count by Variable
or even better
* | stats p50(Variable)
what would those commands be? (or is my mental model incorrect?)
I don't know Splunk, so I can't really tell what stats means, but I guess what you want here is an Elasticsearch aggregation, which looks like:
GET test/_search
{
  "aggs": {
    "my_stats": {
      "stats": {
        "field": "variable"
      }
    },
    "my_p50": {
      "percentiles": {
        "field": "variable"
      }
    }
  }
}
Note that this is meant to run on Elasticsearch. Kibana offers visualizations that do the same thing, but through the frontend; there is no CLI in Kibana.
You can run the query I pasted in the Kibana Console, available at http://0.0.0.0:5601/app/kibana#/dev_tools/console if you are using Elastic Stack 5.0.
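For the "count by Variable" part of the question, the closer Elasticsearch equivalent is a terms aggregation. A minimal sketch in the same console format, assuming a test index with a variable field, and with size set to 0 so only the aggregation results come back:

GET test/_search
{
  "size": 0,
  "aggs": {
    "count_by_variable": {
      "terms": {
        "field": "variable"
      }
    }
  }
}

Each bucket in the response then carries a key and a doc_count, which corresponds to a "stats count by" result per value.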

python code fails to create correct mapping

The following curl commands work as expected and create the correct mapping, but the Python code below leaves the mapping blank.
curl -X PUT localhost:9200/geotest/
curl -X PUT localhost:9200/geotest/geotest/_mapping -d '{
  "geotest": {
    "properties": {
      "location": {
        "type": "geo_point",
        "lat_lon": true,
        "geohash": true
      }
    }
  }
}'
curl -XGET localhost:9200/geotest/_mapping
{"geotest":{"mappings":{"geotest":{"properties":{"location":{"type":"geo_point","lat_lon":true,"geohash":true}}}}}}
I expected this Python code to do the same as the above...
import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping={"geotest":{"properties":{"location":{"type":"geo_point","lat_lon":True,"geohash":True}}}}
es.indices.delete(index = 'geotest')
es.indices.create(index = 'geotest', body = mymapping)
curl -XGET localhost:9200/geotest/_mapping
{"geotest":{"mappings":{}}}
Why does the Python code not create the correct mapping the way curl does?
Update:
Using the put_mapping method, I am not able to create the wikipedia content index.
import urllib2
myfile=urllib2.urlopen('https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump&format=json').read()
import ast
myfile1=ast.literal_eval(myfile)['content']['page']['properties']
import elasticsearch
es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
es.indices.delete(index ='enwiki_todel')
es.indices.create(index ='enwiki_todel')
es.indices.put_mapping(index ='enwiki_todel', doc_type='page', body = myfile1)
Update 2:
I tried to keep only the content mapping using the ast module, and I am still getting a mapper parsing exception.
import urllib2
myfile=urllib2.urlopen('https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump&format=json').read()
import ast
myfile1=ast.literal_eval(myfile)['content']
import elasticsearch
es = elasticsearch.Elasticsearch('http://ec2-52-91-179-95.compute-1.amazonaws.com:9200/')
es.indices.delete(index ='enwiki_todel')
es.indices.create(index ='enwiki_todel')
es.indices.put_mapping(index ='enwiki_todel', doc_type='page', body = myfile1)
You're almost there. If you want to create an index with a mapping in one shot, you need to use the "mappings": {} structure in the body of your create index call. Like this:
import elasticsearch

es = elasticsearch.Elasticsearch('http://some_site.com:9200/')

# note: the whole mapping is enclosed in a top-level "mappings" key
mymapping = {"mappings": {"geotest": {"properties": {"location": {"type": "geo_point", "lat_lon": True, "geohash": True}}}}}

es.indices.delete(index='geotest')
es.indices.create(index='geotest', body=mymapping)
An alternate solution is to use put_mapping after the call to create, and you'll be able to use the same structure you initially had, i.e. without the "mappings": {} wrapper.
import elasticsearch

es = elasticsearch.Elasticsearch('http://some_site.com:9200/')
mymapping = {"geotest": {"properties": {"location": {"type": "geo_point", "lat_lon": True, "geohash": True}}}}

es.indices.delete(index='geotest')
es.indices.create(index='geotest')
es.indices.put_mapping(index='geotest', doc_type='geotest', body=mymapping)
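To verify from Python (rather than curl) that the mapping actually took effect, something along these lines should work with the same es client:

# An empty "mappings" object in the result means the index was created
# without the intended mapping.
print(es.indices.get_mapping(index='geotest'))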

Translating ElasticSearch Facets Query into PyES

I have the following query and I want to translate it into PyES:
{
  "facets": {
    "participating-org.name": {
      "terms": {
        "field": "participating-org.name"
      },
      "nested": "participating-org"
    }
  }
}
I have searched the PyES documentation and found:
class pyes.facets.TermsFacetFilter(field=None, values=None, _name=None, execution=None, **kwargs)
but I don't know how to use it, and I couldn't find any examples related to it. I hope the PyES developers will come out with good documentation and examples in the future.
I have just found it out myself:
from pyes import *
from pyes.facets import *

conn = ES('localhost:9200', default_indices='org', default_types='activity')

q2 = MatchAllQuery().search()
q2.facet.add_term_facet('participating-org.role', nested="participating-org")

# Display the ES JSON query.
print q2

resultset = conn.search(q2)

# Display all result rows.
for r in resultset:
    print r

# Display the facet counts.
print resultset.facets
This code produces the JSON query above and gives me the exact counts.
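For reference, PyES is no longer maintained and facets were later replaced by aggregations in Elasticsearch, so on newer versions the same counts can be obtained with the official elasticsearch-py client and a nested terms aggregation. A rough sketch, assuming the same org index and a nested participating-org field:

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')  # placeholder host

body = {
    "size": 0,
    "aggs": {
        "participating-org": {
            "nested": {"path": "participating-org"},
            "aggs": {
                "by_name": {
                    "terms": {"field": "participating-org.name"}
                }
            }
        }
    }
}

resp = es.search(index='org', body=body)

# Each bucket's doc_count corresponds to a facet count in the PyES version.
for bucket in resp['aggregations']['participating-org']['by_name']['buckets']:
    print(bucket['key'], bucket['doc_count'])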
