Paging problems in Elasticsearch SQL API - elasticsearch

My existing system has some SQL search procedures that return data based on filters. To improve search, we have decided to use Elasticsearch for all our searches. We are currently building a prototype.
Below is what I have done so far:
De-normalized all the data from my RDBMS and stored it in Elasticsearch using Logstash.
Queried data from Elasticsearch based on the parameters using the Elasticsearch SQL API.
The main problem is pagination. Elasticsearch SQL supports a fetch_size parameter, and the response includes a cursor for the next set of records.
A cursor is fine if you want to get the next page of results, but if a user wants to go from page 10 to page 100, how can we achieve that?
I also searched for offset and skip support in Elasticsearch SQL but could not find any references.
Has anyone faced such an issue? I would appreciate any help or suggestions.
I tried to follow the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-pagination.html
{
  "query": "Select client_clientid, clientpolicy_policyname from client_paged_list group by client_clientid, clientpolicy_policyname",
  "fetch_size": 5
}
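Since the SQL API only exposes a forward cursor, one way to reach an arbitrary page is to walk the cursor from the start and discard the intermediate pages. The sketch below assumes a hypothetical `post` callable that sends a JSON body to the `_sql` endpoint and returns the parsed JSON response; the function name `fetch_page` is also made up for illustration. Walking to page 100 costs 100 round trips, so this is only reasonable for modest page numbers.

```python
from typing import Any, Callable, Dict, List


def fetch_page(
    post: Callable[[Dict[str, Any]], Dict[str, Any]],
    query: str,
    page: int,
    page_size: int,
) -> List[list]:
    """Return the rows of the 1-based `page` by following the SQL cursor.

    `post` is a hypothetical transport: it sends the body to the _sql
    endpoint and returns the decoded JSON response.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")

    # First request: the query itself plus the page size.
    response = post({"query": query, "fetch_size": page_size})

    # Every further page is requested with the cursor of the previous one.
    for _ in range(page - 1):
        cursor = response.get("cursor")
        if not cursor:
            # No cursor means the result set ended before the requested page.
            return []
        response = post({"cursor": cursor})

    return response.get("rows", [])
```

With the `requests` library, `post` could be a small lambda that calls `requests.post` on the cluster's `_sql` URL and returns `.json()`; the URL and authentication depend on your deployment.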

Related

Elasticsearch query statistics and analysis in near real time

I am pretty new to Elasticsearch and I want to create statistics and Kibana dashboards on the queries sent to an Elasticsearch index. What is the best approach to do so? Any advice or recommendations would be highly appreciated.
The idea is to analyze all queries sent to the index and do some performance optimisation in the future when the user base increases.
For the moment I am planning to store the logs in a different index, but parsing seems to be a fairly complex activity.
Ideally I need to have:
- Counting of user queries
- Counting of queries that returned no results
- Logging of all search terms
- Sorting of queries, and queries that returned no results, by most frequently contained search term
- A view of top queries, including the search term for which no results were found and the exact query
- A view of top queries returning no results, including the search term for which no results were found and the exact query
Thanks
There is no out-of-the-box functionality in Elasticsearch for search analysis, but there are some workarounds you can use to get the information you are asking for.
First option: you can enable the slow log in Elasticsearch by executing the command below, and it will log every request coming to Elasticsearch.
PUT /my-index-000001/_settings
{
  "index.search.slowlog.threshold.query.info": "0s",
  "index.search.slowlog.threshold.fetch.info": "0s"
}
Second option: you can log all queries at the application layer, or at whatever intermediate layer the application uses to talk to Elasticsearch.
Once you have logs, you can configure Logstash / Filebeat / Fleet to read the logs, transform them, and index them into Elasticsearch. Logstash provides different kinds of filters (for example, the grok filter) that you can use to transform your plain-text logs into structured logs.
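As a rough illustration of the second option, the application layer could record each search and its hit count before handing the data to a log shipper. The `QueryStats` class below is a hypothetical in-memory sketch, not part of any Elasticsearch client; in practice you would more likely emit one structured log line per query and aggregate in Kibana.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class QueryStats:
    """In-memory tally of search activity (hypothetical helper)."""

    total: int = 0
    no_results: int = 0
    terms: Counter = field(default_factory=Counter)
    no_result_terms: Counter = field(default_factory=Counter)

    def record(self, search_text: str, hit_count: int) -> None:
        """Record one user query and how many hits it returned."""
        self.total += 1
        # Naive tokenisation: lowercase and split on whitespace.
        words = search_text.lower().split()
        self.terms.update(words)
        if hit_count == 0:
            self.no_results += 1
            self.no_result_terms.update(words)
```

Calling `stats.terms.most_common(10)` or `stats.no_result_terms.most_common(10)` then gives the most frequent search terms overall and among queries that returned nothing.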

Elasticsearch slow performance for huge data retrieval with source field

I'm using Elasticsearch to search more than 10 million records; most records contain 1 to 25 words. I want to retrieve data from it, but the method I'm using now is drastically slow for large retrievals because I'm reading the data from the source field. I want a method that makes this process faster. I'm free to use another database or anything else alongside Elasticsearch. Can anyone suggest some good ideas and examples for this?
I've searched Google for a solution. One suggestion I found was pagination, which I've already applied wherever possible, but pagination is not an option when I want to retrieve many (5000+) hits in one query.
Thanks in advance.
Try using scroll
While a search request returns a single “page” of results, the scroll
API can be used to retrieve large numbers of results (or even all
results) from a single search request, in much the same way as you
would use a cursor on a traditional database.
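A minimal sketch of the scroll loop follows. It assumes a hypothetical `post(path, body)` callable that sends a JSON body to the cluster and returns the parsed response; `scroll_hits` and its parameters are illustrative names. The sketch does not clear the scroll context when it finishes, which a production version probably should.

```python
from typing import Any, Callable, Dict, Iterator


def scroll_hits(
    post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    index: str,
    query: Dict[str, Any],
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one scroll batch at a time."""
    # The initial search opens the scroll context.
    response = post(
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": query},
    )
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            # An empty batch means the scroll is exhausted.
            break
        yield from hits
        # Ask for the next batch using the latest scroll id.
        response = post(
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]},
        )
```

Because the function is a generator, the caller can process hits as they arrive instead of holding all of them in memory.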

How to create an index from search results, all on the server?

I will be getting documents from a filtered query (quite a lot of documents). I will then immediately create an index from them (in Python, using requests to directly query the REST API), without any modification.
Is it possible to make this operation directly on the server, without the round-trip of data to the script and back?
Another question was similar in intent, and its only answer is to go via Logstash (equivalent to using my code, though possibly more efficient).
Refer to http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
In short, what you need to do is:
0.) Ensure you have _source set to true.
1.) Use the scan and scroll API: pass your filtered query with search type scan.
2.) Fetch the documents using the scroll id.
3.) Bulk index the results using the _source field, which returns the JSON used to index the data.
Refer to:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
guide/en/elasticsearch/guide/current/bulk.html
guide/en/elasticsearch/guide/current/reindex.html
ES 2.3 has an experimental feature that allows reindexing from a query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
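For the server-side route, the reindex request takes a source index with an optional query and a destination index. The sketch below only builds the path and body; `build_reindex_request` is a hypothetical helper name and the index names in the usage note are placeholders. Sending it with `requests.post` (or any HTTP client) is left to the caller.

```python
from typing import Any, Dict, Tuple


def build_reindex_request(
    source_index: str,
    dest_index: str,
    query: Dict[str, Any],
) -> Tuple[str, Dict[str, Any]]:
    """Return (path, body) for a reindex call limited to `query`."""
    body = {
        # Only documents matching the query are copied.
        "source": {"index": source_index, "query": query},
        "dest": {"index": dest_index},
    }
    return "/_reindex", body
```

For example, `build_reindex_request("old-index", "new-index", {"term": {"status": "active"}})` yields a body that copies only the matching documents, with no round trip of the data through the client.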

Elasticsearch query SQL Server LAG function analog

I am looking for a SQL Server LAG/LEAD functions analog in Elasticsearch.
Assume I have a list of documents in a result set found by particular criteria. The result set is also sorted in some order.
I know the id of one of the documents in that result set and I need to find the next and/or previous document in the same result set.
SQL Server 2012 and above has LAG/LEAD functions to get the next/previous row in the record set, so I am wondering if there is such functionality in Elasticsearch.
Could you please point me to the corresponding documentation/examples?
There isn't. Lots of stuff from relational land doesn't translate directly into Elasticsearch land. What do you want to do with LAG/LEAD? Just getting the ids is simple enough: ask for more results and look up or down the list. I imagine it's something more fun, but I don't want to speculate.
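The "look up or down the list" approach can be done on the client once the sorted ids have been fetched. This is a sketch under the assumption that the relevant slice of the result set fits in memory; `neighbours` is a hypothetical helper, not an Elasticsearch feature.

```python
from typing import List, Optional, Tuple


def neighbours(ordered_ids: List[str], doc_id: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (previous_id, next_id) of `doc_id` in an already sorted id list.

    Raises ValueError if `doc_id` is not in the list.
    """
    position = ordered_ids.index(doc_id)
    previous_id = ordered_ids[position - 1] if position > 0 else None
    next_id = ordered_ids[position + 1] if position + 1 < len(ordered_ids) else None
    return previous_id, next_id
```

Here `ordered_ids` would be the `_id` values of the hits, in the order the search returned them.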

How to get a response from Elasticsearch after indexing?

I'm using the CouchDB river plugin with Elasticsearch. In my web application, I am using CouchDB's bulk insert to insert documents into CouchDB. This triggers the changes feed, and ES reads it to index my documents. The problem now is that my web UI isn't showing anything because ES is still indexing the documents.
By the way, I'm using PyES to "talk" to ES. Is there any function I can call to know whether Elasticsearch is busy indexing?
Thanks a million.
Even while ES is indexing, it should answer queries.
Could you check with
curl 'localhost:9200/_search?q=*'
that your index has docs in it while indexing from CouchDB?
[UPDATE]
Keep in mind that Elasticsearch is a near-real-time search engine, so you have to wait a few seconds before you can search for your docs.
You can retrieve your docs immediately, but for search you need to wait for the refresh process.
You can trigger the refresh API manually, but doing so could dramatically slow down your insertions.
Does it help?
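If the web UI only needs to wait until the new documents become searchable, polling the document count is one alternative to forcing a refresh. The helper below is a hypothetical sketch: `count_docs` stands for any callable that returns the current number of searchable documents (for instance, one that reads the total from a search response), and `sleep` and `clock` are injectable so the function can be tested without real delays.

```python
import time
from typing import Callable


def wait_until_searchable(
    count_docs: Callable[[], int],
    expected: int,
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until at least `expected` docs are searchable or `timeout` passes."""
    deadline = clock() + timeout
    while True:
        if count_docs() >= expected:
            return True
        if clock() >= deadline:
            # Gave up: the documents did not become visible in time.
            return False
        sleep(interval)
```

The function returns `True` as soon as the count reaches `expected` and `False` on timeout, so the caller can decide whether to show a "still indexing" message.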
