I have a data set of questions, each mapped to multiple tags, and these tags are hierarchical in nature.
For example, question A is mapped to tags t1 and t2.
t1 has parent p1, and p1 has parent p2 (p2 -> p1 -> t1 --mapped to--> A).
I store this data in Neo4j, and I want to get A as a result for tag p2. I can get that result easily with Cypher, but now I need sort and limit in the same query, and since Neo4j can't use an index for those, I am thinking of integrating Neo4j with Elasticsearch. I am not sure how to query it, though.
$query = "MATCH p = (n:messages)-[r:TAGGED_TO]->(k:tags {tag_id: {tag_id}}) RETURN p, n ORDER BY n.msgId DESC LIMIT 5";
$params['tag_id'] = (int)$tag_id;
$result = $this->dbHandle->run($query, $params);
Right now the sort and limit are not using an index; I want to run this query in an optimized way.
You can use the GraphAware plugin to connect Neo4j to Elasticsearch, or the APOC plugin, specifically the apoc.es.* procedures; see the documentation for more.
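Whichever plugin does the syncing, the Elasticsearch side of the split usually looks like this: Elasticsearch handles the sort and limit, and Neo4j keeps the graph. A minimal sketch of the search body follows; the index and field names ("messages", "tag_ids", "msgId") are assumptions, not taken from the question, and the tag hierarchy would have to be flattened into `tag_ids` at index time, since Elasticsearch knows nothing about parent tags.

```python
# Sketch of the Elasticsearch half: sort/limit happens here, then the
# returned ids are used to hydrate the matching nodes from Neo4j.
# Index and field names are hypothetical.

def build_sorted_tag_query(tag_id, size=5):
    """Return a search body: messages carrying tag_id, newest msgId first."""
    return {
        "query": {"term": {"tag_ids": tag_id}},   # tag_ids holds the flattened ancestor tags
        "sort": [{"msgId": {"order": "desc"}}],
        "size": size,
        "_source": False,                          # only ids are needed to fetch nodes from Neo4j
    }

body = build_sorted_tag_query(42)
```

Because the sort field lives in Elasticsearch, the `ORDER BY ... LIMIT` work no longer depends on a Neo4j index at all.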
I have created an Elasticsearch index from a news table in SQL Server using Logstash via the JDBC driver. This all looks good in Elasticsearch.
Using Index Server, the type of query that gets built for that takes the following form:
SELECT News.*, fulltextsearch.rank
FROM News
INNER JOIN CONTAINSTABLE(News, (Headline, BodyText),
    'ISABOUT("car track race" WEIGHT(0.65), car NEAR track NEAR race)') fulltextsearch
    ON News.NewsID = fulltextsearch.[Key]
WHERE DateSubmitted <= '01/11/2017'
ORDER BY fulltextsearch.rank DESC
Is there any kind of query I can run in Elasticsearch to get a similar or identical outcome to the above?
No, Elasticsearch (version 5.3) does not support a JOIN like this. See https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html.
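Since the documents are already denormalized into one index by Logstash, the JOIN itself is not needed; the ranking-plus-date-filter part of the SQL can be approximated with a bool query. This is only a sketch: the field names are taken from the SQL, the ISABOUT weighting is loosely mimicked by a field boost, and there is no direct NEAR equivalent here.

```python
# Rough Elasticsearch stand-in for the CONTAINSTABLE query: a boosted
# multi_match for relevance plus a range filter for DateSubmitted,
# sorted by score (like ORDER BY fulltextsearch.rank DESC).

def build_news_query(terms, date_limit):
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": " ".join(terms),
                        # boost Headline over BodyText, loosely mirroring WEIGHT(0.65)
                        "fields": ["Headline^2", "BodyText"],
                    }
                },
                "filter": {"range": {"DateSubmitted": {"lte": date_limit}}},
            }
        },
        "sort": ["_score"],  # _score sorts descending by default
    }

q = build_news_query(["car", "track", "race"], "2017-01-11")
```

Proximity-aware matching (the NEAR part) would need something like a `match_phrase` query with `slop`, which scores differently from SQL Server's full-text rank.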
For example, I have the following documents.
{"sourceIP": "1.1.1.1", "destIP": "2.2.2.2"}
{"sourceIP": "1.1.1.1", "destIP": "3.3.3.3"}
{"sourceIP": "1.1.1.1", "destIP": "4.4.4.4"}
Is there any way to automatically aggregate them into one document that contains the following data?
{"sourceIP": "1.1.1.1", "destIP": ["2.2.2.2", "3.3.3.3", "4.4.4.4"]}
So it is like GROUP BY in SQL, but generating a new document in Elasticsearch in place of the old ones.
I don't think there is any way to do automatic merging of documents at indexing time.
However, whatever result you are planning to query for should be achievable with one of the querying options Elasticsearch offers, while still indexing one document per event.
For example:
You can index separate documents, query by sourceIP, and use an aggregation to return the dest IPs.
Take the count of documents if the goal is just to find the dest IPs for a source IP.
Also, if you want to avoid duplicate source_ip + dest_ip combinations, you can concatenate them and use the result as the document's _id.
Hope this helps.
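The first suggestion above can be sketched as a request body: a terms aggregation on sourceIP with a destIP sub-aggregation, which yields the grouped {source -> [dest, ...]} view at query time without merging any documents. This assumes both fields are mapped as exact-value (keyword / not_analyzed) fields.

```python
# Query-time GROUP BY: bucket documents by sourceIP, and inside each
# bucket collect the distinct destIP values. No documents are rewritten.

def build_ip_aggregation(max_buckets=10):
    return {
        "size": 0,  # no raw hits, only aggregation buckets
        "aggs": {
            "by_source": {
                "terms": {"field": "sourceIP", "size": max_buckets},
                "aggs": {
                    # each by_source bucket's doc_count is already the
                    # number of flows for that source IP
                    "dest_ips": {"terms": {"field": "destIP", "size": max_buckets}},
                },
            }
        },
    }

agg = build_ip_aggregation()
```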
I have an Elasticsearch index with a collection of tweets. I'd like to plot a network with Gephi from the relations inferred from the tweets, adding edges between people who replied to each other or retweeted one another.
So I need to somehow aggregate these pairs. If each tweet has an author_name field plus rt_user_name and rp_user_name fields, how could I get:
#bob <-> #alice = 7 tweets
#alice <-> #robert = 3 tweets
#robert <-> #bob = 1 tweet
with an aggregation function?
I am planning to use the ruby gem
If each individual document (tweet) has the needed information, you could use a terms aggregation to get, for example, the most frequent pairs of users. But as Elasticsearch doesn't support JOINs across documents, a graph database like Neo4j might provide a more flexible data model.
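Since every tweet already carries both endpoints of an edge, the undirected pair counts can also be computed client-side after fetching the documents (the question mentions a Ruby gem; this sketch uses Python, and the field handling is an assumption):

```python
from collections import Counter

def count_edges(tweets):
    """Count undirected author <-> target pairs from reply/retweet fields."""
    edges = Counter()
    for t in tweets:
        for field in ("rt_user_name", "rp_user_name"):
            other = t.get(field)
            if other and other != t["author_name"]:
                # sort the pair so bob<->alice and alice<->bob count as one edge
                edges[tuple(sorted((t["author_name"], other)))] += 1
    return edges

tweets = [
    {"author_name": "bob", "rt_user_name": "alice", "rp_user_name": None},
    {"author_name": "alice", "rt_user_name": None, "rp_user_name": "bob"},
]
edges = count_edges(tweets)
# edges[("alice", "bob")] == 2
```

The resulting Counter maps ("alice", "bob")-style tuples to tweet counts, which is exactly the edge list + weight format Gephi can import.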
I have a requirement as follows:
Whatever data is in Hadoop, I need to make it searchable (and vice versa).
So for this I use Elasticsearch, where the elasticsearch-hadoop plug-in can send data from Hadoop to Elasticsearch, and real-time search then becomes possible.
But my question is: isn't there a duplication of data? Whatever data is in Hadoop is duplicated in Elasticsearch along with its index. Is there any way to get rid of this duplication, or is my concept wrong? I have searched a lot but found no clue about this duplication issue.
If you specify an immutable ID for each row in Elasticsearch (e.g. a customer ID), all inserts of existing data will simply be updates.
Extract from the official documentation about the insertion method (cf. http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html#_operation):
index (default): new data is added while existing data (based on its id) is replaced (reindexed).
If you have a "customer" dataset in Pig, just store the data like this:
A = LOAD '/user/hadoop/customers.csv' USING PigStorage()
    ....;
B = FOREACH A GENERATE customerid, ...;
STORE B INTO 'foo/customer' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes = localhost',
    'es.http.timeout = 5m',
    'es.index.auto.create = true',
    'es.input.json = true',
    'es.mapping.id = customerid',
    'es.batch.write.retry.wait = 30',
    'es.batch.size.entries = 500');
    --, 'es.mapping.parent = customer');
To search the data back from Hadoop, just use the custom loader:
A = LOAD 'foo/customer' USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?me*');
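The idempotence that `es.mapping.id` gives the Pig job can be illustrated outside Pig as well: when every action carries the immutable customerid as `_id`, re-running the export updates documents in place instead of duplicating them. A Python sketch (the row shape is hypothetical):

```python
# Build Elasticsearch bulk-API action/source pairs keyed by customerid,
# mirroring what es.mapping.id does in the Pig job: same id in, same
# document overwritten, so repeated runs never create duplicates.

def to_bulk_actions(rows, index="customer"):
    """Yield alternating action and source lines for the _bulk endpoint."""
    for row in rows:
        yield {"index": {"_index": index, "_id": row["customerid"]}}
        yield row

actions = list(to_bulk_actions([{"customerid": 1, "name": "a"}]))
```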
I'm working on a simple side project, and have a tech stack that involves both a SQL database and ElasticSearch. I only have ElasticSearch because I assumed that as my project grows, my full text searching would be most efficiently performed by ES. My ES schema is very simple - documents that I insert into ES have 2 fields, one being the id and the other being the field with the body of text to search. The id being inserted into ES corresponds to that document's primary key id from the SQL database.
insert record into SQL -> insert record into ES using PK from SQL
Searching would be the reverse of that. Query ES and grab all the matching ids, and then turn around and use those ids to get records from SQL.
search ES to get all matching PK ids -> use those ids to get documents from SQL
The problem that I am facing though, is that ES can only return documents in a paginated manner. This is a problem because I also have a WHERE clause on my SQL query, beyond just the ids. My SQL query might look like this ...
SELECT * FROM foo WHERE id IN (1,2,3,4,5) AND bar != 'baz'
Well, with ES paginating the results, my WHERE clause will always only be querying a subset of the full results from ES. Even if I utilize ES' skip and take, I'm still only querying SQL using a subset of document ids.
Is there a way to get Elasticsearch to return the entire list of matching document ids? I realize this restriction exists to keep me from shooting myself in the foot, because doing this across all shards and many, many documents is not efficient. Is there really no way, though?
After putting in some hours on this project, I've only now realized that I've engineered this poorly, unless I can get all of these ids from ES. One alternative implementation I've thought of would be to store the things I'm filtering on in SQL in ES as well. A problem there is that I'd have to update the ES document every time I update the document in SQL, which would require a pretty big rewrite of some of my data access code. I could also scrap Elasticsearch altogether and just perform searching in Postgres, for now, until I can think of a better way to structure this.
Elasticsearch does not support returning every document that matches a query in a single response, because that would overload the system. Instead, use the scroll concept in Elasticsearch; it is like the cursor concept in databases:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
For more examples, refer to this GitHub repo: https://github.com/sidharthancr/elasticsearch-java-client
Hope it helps.
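The scroll loop has the following shape. The client below is a tiny stand-in so the paging logic runs on its own; in a real project, `search` and `scroll` map onto Elasticsearch's `?scroll=` search request and its follow-up scroll endpoint.

```python
# Illustrative scroll loop: keep pulling pages via the scroll cursor until
# an empty page comes back, accumulating every matching id along the way.

class FakeScrollClient:
    """Minimal stand-in that pages through ids like a scroll cursor."""
    def __init__(self, ids, page_size):
        self._ids, self._page, self._pos = ids, page_size, 0

    def search(self, **kwargs):
        return self.scroll()

    def scroll(self, **kwargs):
        hits = self._ids[self._pos:self._pos + self._page]
        self._pos += self._page
        return {"hits": {"hits": [{"_id": i} for i in hits]}, "_scroll_id": "s1"}

def collect_all_ids(client):
    ids = []
    resp = client.search(scroll="2m", size=2)
    while resp["hits"]["hits"]:
        ids.extend(h["_id"] for h in resp["hits"]["hits"])
        resp = client.scroll(scroll_id=resp["_scroll_id"], scroll="2m")
    return ids

ids = collect_all_ids(FakeScrollClient([1, 2, 3, 4, 5], page_size=2))
# ids == [1, 2, 3, 4, 5]
```

With the full id list in hand, the SQL side can apply its extra WHERE conditions to the complete result set rather than to one page at a time.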
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
Please have a look at the Elasticsearch documentation above, where you can specify that only particular fields are returned from the matching documents.
Hope this resolves your problem:
{
    "fields" : ["user", "postDate"],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}