Elasticsearch query in Julia

How do I connect Julia with Elasticsearch? Has anyone tried it, or found a package that is ready to use?
I know that in Julia we can use Python packages, but I still have no idea how to do that.

Here it is:
# Installation
using Conda
Conda.add("elasticsearch")
# loading module and getting connection
using PyCall
elasticsearch = pyimport("elasticsearch")
es = elasticsearch.Elasticsearch() # <== this is the connection to ES
es.info() # connection information
# put some data
dat = Dict("a1"=>"blaaa", "a2"=>"hello")
res = es.index(index="data", doc_type="data", id="1", body=dat)
# fetch some data
q1 = Dict("query"=>Dict("match"=>Dict("a1"=>Dict("query"=>"blaaa"))))
es.search(index="data", body=q1)["hits"]["hits"]
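Since PyCall forwards these calls directly to the Python client, the equivalent plain-Python version can be a handy reference (a minimal sketch using the same index, document, and query as above; the doc_type argument is only needed on older Elasticsearch versions):
# Minimal Python sketch mirroring the Julia/PyCall calls above
from elasticsearch import Elasticsearch

es = Elasticsearch()   # connects to http://localhost:9200 by default
print(es.info())       # cluster / connection information

dat = {"a1": "blaaa", "a2": "hello"}
es.index(index="data", doc_type="data", id="1", body=dat)    # put some data

q1 = {"query": {"match": {"a1": {"query": "blaaa"}}}}
print(es.search(index="data", body=q1)["hits"]["hits"])      # fetch some data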

Related

Dialogflow CX - Location settings have to be initialized - FAILED_PRECONDITION

I am automating Dialogflow CX using the Python client libraries. That includes agent/intent/entity etc. creation/update/deletion.
But on the first run, I am encountering the error below from Python.
If I log in to the console, set the location from there, and rerun the code, it works fine and I am able to create the agent.
I followed this GCP documentation:
https://cloud.google.com/dialogflow/cx/docs/concept/region
I am looking for a way to automate the region & location setting before running the Python code. Kindly provide me with the code.
Below is the code I am using to create the agent.
Error -
google.api_core.exceptions.FailedPrecondition: 400 com.google.apps.framework.request.FailedPreconditionException: Location settings have to be initialized before creating the agent in location: us-east1. Code: FAILED_PRECONDITION
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "com.google.apps.framework.request.FailedPreconditionException: Location settings have to be initialized before creating the agent in location: us-east1. Code: FAILED_PRECONDITION"
debug_error_string = "{"created":"#1622183899.891000000","description":"Error received from peer ipv4:142.250.195.170:443","file":"src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"com.google.apps.framework.request.FailedPreconditionException: Location settings have to be initialized before creating the agent in location: us-east1. Code: FAILED_PRECONDITION","grpc_status":9}"
main.py -
# Import libraries
import google.auth
import google.auth.transport.requests
from google.cloud import dialogflowcx as df
from google.protobuf.field_mask_pb2 import FieldMask
import os, time
import pandas as pd

# Function - Authentication
def gcp_auth():
    cred, project = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
    auth_req = google.auth.transport.requests.Request()
    cred.refresh(auth_req)

# Function - Create Agent (time_zone is taken as a parameter so it is defined when building the Agent)
def create_agent(agent_name, agent_description, language_code, time_zone, location_id, location_path):
    if location_id == "global":
        agentsClient = df.AgentsClient()
    else:
        agentsClient = df.AgentsClient(client_options={"api_endpoint": f"{location_id}-dialogflow.googleapis.com:443"})
    agent = df.Agent(display_name=agent_name, description=agent_description, default_language_code=language_code, time_zone=time_zone, enable_stackdriver_logging=True)
    createAgentRequest = df.CreateAgentRequest(agent=agent, parent=location_path)
    agent = agentsClient.create_agent(request=createAgentRequest)
    return agent
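For reference, calling the function above could look roughly like this (a sketch; the project ID, agent details, and region are placeholders, and the parent path follows the projects/<project-id>/locations/<location-id> format):
# Hypothetical usage of gcp_auth() and create_agent(); all values below are placeholders.
project_id = "my-gcp-project"
location_id = "us-east1"
location_path = f"projects/{project_id}/locations/{location_id}"

gcp_auth()
agent = create_agent(
    agent_name="TestAgent",
    agent_description="Agent created via the API",
    language_code="en",
    time_zone="America/New_York",
    location_id=location_id,
    location_path=location_path,
)
print(agent.name)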
Currently, Dialogflow CX does not support configuring location settings through the API, so you cannot initialise them programmatically; they can only be set through the Console.
As an alternative, since the location setting only has to be initialised once per region per project, you can set the location manually and then automate the agent creation process; some useful links: 1 and 2.
On the other hand, if you would find this feature useful, you can file a Feature Request here; it will be evaluated by Google's product team.
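Until such a feature exists, one option is to fail fast with a clearer hint when the location has not been initialised yet. A rough sketch, wrapping the create_agent function from the question (the wrapper name and message are illustrative):
# Sketch: surface a clearer message when the region was never initialised in the Console.
from google.api_core.exceptions import FailedPrecondition

def create_agent_safely(*args, **kwargs):
    # Delegates to create_agent() from the question, translating FAILED_PRECONDITION into a hint.
    try:
        return create_agent(*args, **kwargs)
    except FailedPrecondition as exc:
        raise RuntimeError(
            "Location settings must be initialised once in the Dialogflow CX Console "
            "for this region before agents can be created there."
        ) from exc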
Many thanks, Alexandre Moraes. I have raised a feature request for this.

how to change spark.r.backendConnectionTimeout value in RStudio?

I am using RStudio to connect to my HDFS file using SparkR. When I leave Spark analyses running overnight, I get an "R session aborted" error the next day. From Spark's documentation on SparkR (https://spark.apache.org/docs/latest/configuration.html), the default value of spark.r.backendConnectionTimeout is 6000s. I would like to change this value to something larger so that my connection doesn't time out after the analysis is done.
I have tried the following:
sparkR.session(master = "local[*]", sparkConfig = list(spark.r.backendConnectionTimeout = 10))
sparkR.session(master = "local[*]", spark.r.backendConnectionTimeout = 10)
I get the same output for both commands:
Spark package found in SPARK_HOME: C:\Spark\spark-2.3.2-bin-hadoop2.7
Launching java with spark-submit command C:\Spark\spark-2.3.2-bin-hadoop2.7/bin/spark-submit2.cmd sparkr-shell C:\Users\XYZ\AppData\Local\Temp\3\RtmpiEaE5q\backend_port696c18316c61
Java ref type org.apache.spark.sql.SparkSession id 1
It seems that the parameter was not passed correctly. Also, I am not sure where to pass that parameter.
Any help would be appreciated.
There is a similar post, but it involves Zeppelin (how to change spark.r.backendConnectionTimeout value?).
Thanks.
I found the solution: it is to modify the spark-defaults.conf file and add the following line:
spark.r.backendConnectionTimeout = 6000000
(or whatever time limit you want)
Important note: restart the Hadoop and YARN services, then try connecting to Spark with SparkR as usual:
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local")
You can check whether the settings took effect at http://localhost:4040/environment/.
I hope this is useful for other people.

Gremlin-Python Connecting to existing JanusGraph

I have created a graph using the Gremlin console:
gremlin> ConfiguredGraphFactory.graphNames
==>MYGRAPH
gremlin> ConfiguredGraphFactory.getConfiguration('MYGRAPH')
==>storage.backend=cql
==>graph.graphname=MYGRAPH
==>storage.hostname=127.0.0.1
==>Template_Configuration=false
gremlin> g.V().properties()
==>vp[name->SFO]
==>vp[country->USA]
==>vp[name->ALD]
==>vp[country->IND]
==>vp[name->BLR]
==>vp[country->IND]
gremlin>
I want to connect to MYGRAPH using gremlin-python.
Can someone please tell me how to access the graph named "MYGRAPH" using gremlin-python?
Thanks in advance...
First of all, you will need to install some JAR files so that JanusGraph can handle gremlin-python scripts:
./bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.9
Please note that the version of gremlin-python you install must match the TinkerPop version JanusGraph is compatible with. You can find compatibility information on the JanusGraph releases page. For example, JanusGraph 0.2.2 is compatible with TinkerPop 3.2.9.
Next you need to start a JanusGraph server using ConfiguredGraphFactory. You just have to use the file conf/gremlin-server/gremlin-server-configuration.yaml from the distribution.
bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration.yaml
This file differs from the traditional conf/gremlin-server/gremlin-server.yaml in these few lines:
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties
}
Then we need to load the graph MYGRAPH in the server's initialization script. Create an init script scripts/init.groovy; here you can load as many different graphs as you want.
def globals = [:]
myGraph = ConfiguredGraphFactory.open("MYGRAPH")
globals << [myGraphTraversal : myGraph.traversal()]
Make sure this script is executed when the Gremlin server starts by listing it in conf/gremlin-server/gremlin-server-configuration.yaml:
scriptEngines: {
gremlin-groovy: {
imports: [java.lang.Math],
staticImports: [java.lang.Math.PI],
scripts: [scripts/init.groovy]}}
Finally, in your Python project, install the gremlin-python package that matches the TinkerPop version of your JanusGraph release. For JanusGraph 0.2.2, this is version 3.2.9.
pip install gremlin-python==3.2.9
Start a Python shell and start coding:
>>> from gremlin_python import statics
>>> from gremlin_python.structure.graph import Graph
>>> from gremlin_python.process.graph_traversal import __
>>> from gremlin_python.process.strategies import *
>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
>>> graph = Graph()
>>> myGraphTraversal = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','myGraphTraversal'))
>>> myGraphTraversal.V().count().next()
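Keep in mind that gremlin-python traversals are lazy: nothing is sent to the server until a terminal step such as next() or toList() is applied. A couple of follow-up queries on the same connection (a sketch based on the data shown in the console session above):
>>> myGraphTraversal.V().valueMap().toList()                           # name/country properties of each vertex
>>> myGraphTraversal.V().has('name', 'BLR').values('country').next()   # should return 'IND' per the data above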

cx_Oracle connection by python3.5

I've made several attempts to connect to an Oracle DB but am still unable to connect. Below is my code. However, I can connect to the Oracle DB from the terminal like this:
$ sqlplus64 uid/passwd@192.168.0.5:1521/WSVC
My environment: Ubuntu 16.04 / 64-bit / Python 3.5
I would appreciate any knowledge or experience you can share on this issue. Thank you.
import os
os.chdir("/usr/lib/oracle/12.2/client64/lib")
import cx_Oracle
# 1st attempt
ip = '192.168.0.5'
port = 1521
SID = 'WSVC'
dsn_tns = cx_Oracle.makedsn(ip, port, SID)
# dsn_tns = cx_Oracle.makedsn(ip, port, service_name=SID)
db = cx_Oracle.connect('uid', 'passwd', dsn_tns)
cursor = db.cursor()
-------------------------------------------------
# 2nd attempt
conn = "uid/passwd#(DESCRIPTION=(SOURCE_ROUTE=OFF)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.5)(PORT=1521)))(CONNECT_DATA=(SID=WSVC)(SRVR=DEDICATED)))"
db = cx_Oracle.connect(conn)
cursor = db.cursor()
------------------------------------------------------
# ERROR Description
cx_Oracle.InterfaceError: Unable to acquire Oracle environment handle
The error "unable to acquire Oracle environment handle" is due to your Oracle configuration being incorrect. A few things that should help you uncover the source of the problem:
when using Instant Client, do NOT set the environment variable ORACLE_HOME; that should only be set when using a full Oracle Client or Oracle Database installation
the value of LD_LIBRARY_PATH should contain the path which contains libclntsh.so; the value you selected looks like it is incorrect and should be /usr/lib/oracle/12.2/client64/lib instead (see the sketch below)
you can verify which Oracle Client libraries are being loaded by using the ldd command, as in ldd cx_Oracle.cpython-35m-x86_64-linux-gnu.so
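To illustrate the LD_LIBRARY_PATH point, here is a minimal sketch (connection details are the ones from the question). Note that the dynamic linker reads LD_LIBRARY_PATH when the process starts, so exporting it inside the script or calling os.chdir() is too late:
# Sketch: check that the Instant Client directory is on LD_LIBRARY_PATH, then connect.
import os

client_lib = "/usr/lib/oracle/12.2/client64/lib"
ld_path = os.environ.get("LD_LIBRARY_PATH", "")
if client_lib not in ld_path.split(os.pathsep):
    raise SystemExit("Add {} to LD_LIBRARY_PATH (or an ld.so.conf.d entry) and restart Python.".format(client_lib))

import cx_Oracle

dsn = cx_Oracle.makedsn("192.168.0.5", 1521, service_name="WSVC")  # WSVC is treated as a service name here
db = cx_Oracle.connect("uid", "passwd", dsn)
print(db.version)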

Spark app unable to write to elasticsearch cluster running in docker

I have an Elasticsearch Docker image listening on 127.0.0.1:9200. I tested it using Sense and Kibana and it works fine; I am able to index and query documents. But when I try to write to it from a Spark app:
val sparkConf = new SparkConf().setAppName("ES").setMaster("local")
sparkConf.set("es.index.auto.create", "true")
sparkConf.set("es.nodes", "127.0.0.1")
sparkConf.set("es.port", "9200")
sparkConf.set("es.resource", "spark/docs")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
val rdd = sc.parallelize(Seq(numbers, airports))
rdd.saveToEs("spark/docs")
It fails to connect and keeps retrying:
16/07/11 17:20:07 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
16/07/11 17:20:07 INFO HttpMethodDirector: Retrying request
I tried using the IP address given by docker inspect for the Elasticsearch container, but that also does not work. However, when I use a native installation of Elasticsearch, the Spark app runs fine. Any ideas?
Also, set the config es.nodes.wan.only to true, as mentioned in this answer, if you are having issues writing to ES.
A couple of things I would check:
The Elasticsearch-Hadoop Spark connector version you are working with. Make sure it is not a beta; there was a bug related to IP resolving that has since been fixed.
Since 9200 is the default port, you may remove the line sparkConf.set("es.port", "9200") and check.
Check that there is no proxy configured in your Spark environment or config files.
I assume that you run Elasticsearch and Spark on the same machine. Can you try configuring your machine's IP address instead of 127.0.0.1? (A quick connectivity check follows below.)
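As a quick sanity check before changing any Spark settings, it can help to confirm that Elasticsearch is reachable from the machine running the Spark driver (a minimal sketch with the Python requests package; host and port are the ones from the question):
# Sketch: confirm the Dockerised Elasticsearch answers on the host-mapped port.
import requests

resp = requests.get("http://127.0.0.1:9200")
print(resp.status_code)  # expect 200
print(resp.json())       # cluster name and version info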
Hope this helps! :)
I had the same problem, and a further issue was that the configs set using sparkConf.set() didn't have any effect. But supplying the configs to the saving function worked, like this:
rdd.saveToEs("spark/docs", Map("es.nodes" -> "127.0.0.1", "es.nodes.wan.only" -> "true"))
