Message aggregation in Elasticsearch

For example, I have the following documents:
{"sourceIP": "1.1.1.1", "destIP": "2.2.2.2"}
{"sourceIP": "1.1.1.1", "destIP": "3.3.3.3"}
{"sourceIP": "1.1.1.1", "destIP": "4.4.4.4"}
Is there any way to automatically aggregate them into one document containing the following data?
{"sourceIP": "1.1.1.1", "destIP": ["2.2.2.2", "3.3.3.3", "4.4.4.4"]}
So it is like GROUP BY in SQL, but it would generate new documents in Elasticsearch in place of the old ones.

I don't think there is any way to auto-merge documents at indexing time.
However, whatever result you are planning to query for should be achievable with one of the querying options Elasticsearch offers, while still indexing one document per sourceIP/destIP pair.
For example:
You can index separate documents, query by sourceIP, and use an aggregation to return the destIP values.
If you only need the number of destIPs for a sourceIP, just take the count of matching documents.
Also, if you want to avoid duplicate sourceIP + destIP combinations, you can concatenate the two values and use the result as the document's _id.
Hope this helps.
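A minimal sketch of the aggregation and _id suggestions above, written in Python as plain request bodies rather than calls to a live cluster. It assumes sourceIP and destIP are mapped as keyword fields (the mapping isn't shown in the question), and the helper names and the "_" separator are illustrative choices, not anything Elasticsearch requires. The body would be sent to the _search endpoint of your index.

```python
import json


def group_by_source_body(max_sources: int = 100, max_dests: int = 100) -> dict:
    """Search body that buckets documents by sourceIP, then by destIP inside each bucket.

    Assumes sourceIP and destIP are keyword fields (terms aggregations need that).
    """
    return {
        "size": 0,  # only the aggregation is needed, not the raw hits
        "aggs": {
            "by_source": {
                "terms": {"field": "sourceIP", "size": max_sources},
                "aggs": {
                    "dest_ips": {"terms": {"field": "destIP", "size": max_dests}}
                },
            }
        },
    }


def pair_doc_id(source_ip: str, dest_ip: str) -> str:
    """Deterministic _id: indexing the same pair twice overwrites instead of duplicating."""
    return f"{source_ip}_{dest_ip}"


def grouped_docs(response: dict) -> list:
    """Reshape the aggregation response into {"sourceIP": ..., "destIP": [...]} records."""
    return [
        {
            "sourceIP": bucket["key"],
            "destIP": [dest["key"] for dest in bucket["dest_ips"]["buckets"]],
        }
        for bucket in response["aggregations"]["by_source"]["buckets"]
    ]


if __name__ == "__main__":
    print(json.dumps(group_by_source_body(), indent=2))
```

This gives the grouped view at query time instead of rewriting the stored documents. If there are many distinct sourceIP values, a composite aggregation may be a better fit than a single terms aggregation, since it lets you page through the buckets.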

Related

Elasticsearch query to return a limited number of results (10) containing 2 from each specified keyword

I have articles stored in Elasticsearch and I've been wondering if there is a way to query by date but have the result contain a specific number of articles from each publisher. More specifically, I have 5 different publishers and I want to get the 10 latest articles, 2 from each publisher. I'm storing the publisher's name as a keyword field in Elasticsearch.
The only idea I've come up with is to run a query for each publisher separately, limit the result to the first 2, and then merge the results programmatically, but I think it would be more efficient if there is a way to do this in a single query.
Thanks
This sounds like a case for field collapsing.
You would collapse on the publisher field (as long as it is a keyword or a number) and then request inner_hits to get the actual articles.
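A minimal sketch of such a collapsed search, built as a Python dict. It assumes `publisher` is the keyword field from the question and `published_date` is a date field; that second name is hypothetical, as are the helper names and the inner_hits name `latest_articles`.

```python
def latest_per_publisher_body(publishers: list, per_publisher: int = 2) -> dict:
    """One search request: newest articles, at most `per_publisher` from each publisher.

    Assumes `publisher` is a keyword field and `published_date` (hypothetical) is a date.
    """
    newest_first = [{"published_date": "desc"}]
    return {
        "size": len(publishers),  # one collapsed top hit per publisher
        "query": {"terms": {"publisher": publishers}},
        "sort": newest_first,
        "collapse": {
            "field": "publisher",
            "inner_hits": {
                "name": "latest_articles",
                "size": per_publisher,  # how many articles to keep per publisher
                "sort": newest_first,
            },
        },
    }


def flatten_articles(response: dict) -> list:
    """Collect the inner_hits of every collapsed group into one flat list of _source docs."""
    articles = []
    for group in response["hits"]["hits"]:
        inner = group["inner_hits"]["latest_articles"]["hits"]["hits"]
        articles.extend(hit["_source"] for hit in inner)
    return articles
```

With 5 publishers and `per_publisher=2` this asks for at most 10 articles in a single round trip; the merge step that was done programmatically becomes a simple flattening of the inner_hits.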

Elasticsearch extract/add IDs from multiple queries

I have multiple queries that filter data in Elasticsearch. These queries return the IDs of documents in the indexes that match the filter.
However, depending on the user's selection, I need to do another operation: subtract or add the unique document IDs of the current query from or to the combined result of the previous queries. At most 5 queries are combined.
Is there an option in Elasticsearch to subtract/add document IDs from a previous query? Right now I am doing this part in PHP with a foreach loop, which takes a lot of time.
Edit
Example:
OK, let's say we have one query on the same index:
{"query":{"bool":{"filter":[{"wildcard":{"182_empanalyzed":"example"}}]}}}
We then need to subtract the document IDs returned by the following query on the same index:
{"query":{"bool":{"must_not":[{"nested":{"path":"184","query":{"exists":{"field":"184.*"}}}}]}}}
Keep in mind that these example queries have only one condition each; real queries may be more complex, with many fields searched in each one. And for each subsequent query there is an option to subtract or add its document IDs.
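The original thread has no answer here. One possible approach, sketched under that caveat, is to let Elasticsearch do the set arithmetic by nesting the queries in a bool query instead of diffing ID lists in PHP: "subtract" becomes a must_not clause and "add" becomes a should clause with minimum_should_match set to 1. The helper names are hypothetical; the inner queries are the "query" values from the examples above.

```python
def subtract(base: dict, other: dict) -> dict:
    """Documents matching `base` but not `other` (set difference)."""
    return {"bool": {"filter": [base], "must_not": [other]}}


def add(base: dict, other: dict) -> dict:
    """Documents matching `base` or `other` (set union)."""
    return {"bool": {"should": [base, other], "minimum_should_match": 1}}


def combine(first: dict, steps: list) -> dict:
    """Fold ("subtract" | "add", query) steps onto `first`, left to right."""
    query = first
    for op, other in steps:
        if op == "subtract":
            query = subtract(query, other)
        elif op == "add":
            query = add(query, other)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return {"query": query}
```

Because each step wraps the previous result, `combine(A, [("subtract", B), ("add", C)])` matches documents in (A minus B) or C, which mirrors applying the user's selections in order. With at most 5 steps the nesting stays shallow, and a single search returns the final ID set.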

Possible to use GroupBy in ElasticSearch querystring?

I have a few records in my Elasticsearch collection and I want to use a GroupBy aggregation in an Elasticsearch query string.
I want to know if it is possible, because every time I google it I only get results about this.
I want to use something like this in the query string, which can give me the records in each group.
For example:
http://localhost:9200/_all/tweets/_count?q=user:Pu*+user:Kim*
This gives me the count of all records whose name starts with Pu or Kim,
but I want to know how many records have a name starting with Pu and how many have a name starting with Kim.
Aggregations need to be specified separately in the search request body; you cannot specify them as part of a query string query.
For this particular requirement, you could also just execute two count queries...
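A minimal sketch of the aggregation route, built as a Python dict for the request body of a _search call: a filters aggregation with one named bucket per prefix returns both counts in a single request. It reuses the `user:Pu*` and `user:Kim*` query strings from the question; the helper names and the choice of a filters aggregation are illustrative, not the only option.

```python
def prefix_counts_body(field: str, prefixes: list) -> dict:
    """Search body with one named filters bucket per prefix; no hits are returned."""
    return {
        "size": 0,
        "aggs": {
            "by_prefix": {
                "filters": {
                    "filters": {
                        p: {"query_string": {"query": f"{field}:{p}*"}}
                        for p in prefixes
                    }
                }
            }
        },
    }


def counts_from_response(response: dict) -> dict:
    """Named filters come back as a dict of buckets keyed by the filter name."""
    buckets = response["aggregations"]["by_prefix"]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}
```

`prefix_counts_body("user", ["Pu", "Kim"])` replaces the two separate _count calls with one request whose response has a bucket for each prefix.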

Lucene: Filter query by doc ID

I want the search response to contain only documents with specified doc IDs. On Stack Overflow I found this question (Lucene filter with docIds), but as far as I understand, it creates an additional field in the document and then searches by that field. Is there another way to deal with it?
Lucene's docids are intended only to be internal keys. You should not be using them as search keys, or storing them for later use. Those ids are subject to change without warning. They will be changed when updating or reindexing documents, and can change at other times, such as segment merges, as well.
If you want your documents to have a unique identifier, you should generate that key separate from the docId, and index it as a field in your document.

How to retrieve all document IDs matching a search in Elasticsearch?

I'm working on a simple side project, and have a tech stack that involves both a SQL database and ElasticSearch. I only have ElasticSearch because I assumed that as my project grows, my full text searching would be most efficiently performed by ES. My ES schema is very simple - documents that I insert into ES have 2 fields, one being the id and the other being the field with the body of text to search. The id being inserted into ES corresponds to that document's primary key id from the SQL database.
insert record into SQL -> insert record into ES using PK from SQL
Searching would be the reverse of that. Query ES and grab all the matching ids, and then turn around and use those ids to get records from SQL.
search ES can get all PK ids -> use those ids to get documents from SQL
The problem I am facing, though, is that ES can only return documents in a paginated manner. This is a problem because my SQL query also has a WHERE clause beyond just the ids. My SQL query might look like this:
SELECT * FROM foo WHERE id IN (1,2,3,4,5) AND bar != 'baz'
Well, with ES paginating the results, my WHERE clause will only ever query a subset of the full results from ES. Even if I use ES's from and size (skip and take), I'm still only querying SQL with a subset of the document ids.
Is there a way to get Elasticsearch to return just the entire list of matching document ids? I realize the pagination is there to keep me from shooting myself in the foot, because doing this across all shards and many, many documents is not efficient. Is there no way, though?
After putting some hours into this project, I've only now realized that I've engineered this poorly, unless I can get all of these ids from ES. One alternative I've thought of would be to store the fields I'm filtering on in SQL in ES as well. A problem there is that I'd have to update the ES document every time I update the record in SQL, which would require a pretty big rewrite of some of my data access code. I could also scrap Elasticsearch altogether and just perform searching in Postgres for now, until I can think of a better way to structure this.
Elasticsearch does not support returning every document that matches your query in one response, because that would overload the system. Instead, use the scroll concept in Elasticsearch. It's like the cursor concept in databases.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
For more examples, refer to this GitHub repo: https://github.com/sidharthancr/elasticsearch-java-client
Hope it helps..
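A minimal sketch of the scroll loop, written against plain response dicts so it does not depend on a particular client version. `first_page` is assumed to be the response of a search request made with a scroll timeout, and `next_page` is a caller-supplied function that fetches the next page for a scroll ID. With the official Python client that would be roughly `es.search(index=..., scroll="2m", ...)` and `lambda sid: es.scroll(scroll_id=sid, scroll="2m")`, though argument names vary between client versions, and the client's `helpers.scan` already wraps this pattern.

```python
from typing import Callable, Iterator


def iter_scroll_ids(first_page: dict, next_page: Callable[[str], dict]) -> Iterator[str]:
    """Yield the _id of every hit across all scroll pages.

    Stops when a page comes back with no hits. Always continues from the
    most recent _scroll_id, since it may change from one page to the next.
    """
    page = first_page
    while True:
        hits = page["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit["_id"]
        page = next_page(page["_scroll_id"])
```

The collected IDs could then be sent to SQL in batches. When done, clearing the scroll context frees resources on the cluster instead of waiting for the timeout.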
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
Please have a look at the Elasticsearch documentation, where you can specify that only particular fields are returned from the matching documents.
Hope this resolves your problem.
{
  "fields" : ["user", "postDate"],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
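When only the IDs are needed, a related option is to turn off `_source` entirely, so each hit carries just its metadata such as `_id`. A minimal sketch as a Python dict, with hypothetical helper names; note that the `fields` parameter shown above belongs to older Elasticsearch versions and the field-selection parameters have changed since, so check the reference for your version.

```python
def ids_only_body(query: dict, page_size: int = 1000) -> dict:
    """Search body that returns hit metadata (including _id) but no _source."""
    return {"_source": False, "size": page_size, "query": query}


def ids_from_response(response: dict) -> list:
    """Pull the _id of each hit out of a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]
```

Combined with scrolling, this keeps each page small when all you intend to do with the results is look the records up in SQL.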
