When I try to profile a large query I only get aggregate results, how do I profile query parts? - memgraphdb

When I try to profile a large query I only get aggregate results. How can I profile individual query parts? Sometimes I get an error during execution of my PROFILE query.
Can I somehow get at least part of the results if I can't get the whole thing?

At the moment I would advise you to remove part of the query and profile only the relevant part, or to profile one part at a time. I can see that this issue has already been reported to the Memgraph developers.
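To make that workaround concrete, here is a minimal sketch of profiling one fragment at a time from Java. It assumes a local Memgraph instance reachable over Bolt and a hypothetical :Person/:KNOWS schema; since Memgraph speaks the Bolt protocol, the Neo4j Java driver can be used:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Result;
    import org.neo4j.driver.Session;
    import org.neo4j.driver.summary.ResultSummary;

    public class ProfileParts {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.none());
                 Session session = driver.session()) {
                // Profile each fragment of the large query separately and
                // compare the resulting operator trees.
                String[] fragments = {
                    "PROFILE MATCH (p:Person) RETURN count(p)",
                    "PROFILE MATCH (p:Person)-[:KNOWS]->(q:Person) RETURN count(q)"
                };
                for (String fragment : fragments) {
                    Result result = session.run(fragment);
                    ResultSummary summary = result.consume();
                    if (summary.hasProfile()) {
                        System.out.println(summary.profile());
                    }
                }
            }
        }
    }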

Related

Nested queries in SmallRye GraphQL

1. I want to create a GraphQL query that can filter by multiple fields.
It is about filtering the details of statistics. For example, a statistic contains the fields "Number of deaths", "Number of cases", and "Number of recovered".
I have already written queries that can filter by the individual fields. Now I want to write a query that uses multiple filters, or a query in which multiple queries are nested.
I have already tried to define the individual steps of each query in a common query. You can see this in the attached images. The program compiles at first. However, when I execute the query in the GraphQL UI, I get error messages.
2. Unfortunately, I have not yet received any helpful tips regarding my query or my error.
[Screenshot: the individual queries (top left), the merged query (top right), and the errors raised when the query is executed (bottom).]
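For what it's worth, a common way to avoid nesting separate queries is a single query method with several optional arguments, applying each filter only when it is supplied. Below is a minimal sketch using the MicroProfile GraphQL annotations that SmallRye GraphQL implements; the Statistic type, field names, and data source are assumptions based on the question:

    import java.util.List;
    import java.util.stream.Collectors;
    import org.eclipse.microprofile.graphql.GraphQLApi;
    import org.eclipse.microprofile.graphql.Name;
    import org.eclipse.microprofile.graphql.Query;

    @GraphQLApi
    public class StatisticResource {

        // Hypothetical output type standing in for the real statistic entity.
        public record Statistic(int deaths, int cases, int recovered) {}

        @Query("filteredStatistics")
        public List<Statistic> filteredStatistics(@Name("minDeaths") Integer minDeaths,
                                                  @Name("minCases") Integer minCases,
                                                  @Name("minRecovered") Integer minRecovered) {
            // Each filter is applied only when its argument is present, so
            // one endpoint covers any combination of the three criteria.
            return loadStatistics().stream()
                    .filter(s -> minDeaths == null || s.deaths() >= minDeaths)
                    .filter(s -> minCases == null || s.cases() >= minCases)
                    .filter(s -> minRecovered == null || s.recovered() >= minRecovered)
                    .collect(Collectors.toList());
        }

        private List<Statistic> loadStatistics() {
            // Placeholder for whatever data source the application really uses.
            return List.of();
        }
    }

A client could then combine any subset of the filters in one request, e.g. query { filteredStatistics(minDeaths: 100, minCases: 1000) { deaths cases recovered } }.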

access _cat/indices information using query string query

When you run either curl http://<node_ip>:9200/_cat/indices or GET _cat/indices (the latter in the Dev Tools console), you get a summary of all the indices present in your cluster, as well as some size and count statistics.
Is there a way to access that information via a query string query?
I mean, is there an internal ES index with all that information available that I can query to get the same or similar information?
No, there isn't. That information is not kept in an index but in the cluster state, which is stored in a different location.
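If the goal is just to consume that information programmatically rather than through a query, the cat API itself can return JSON. A minimal sketch with the Elasticsearch low-level Java REST client, assuming a node on localhost:9200:

    import org.apache.http.HttpHost;
    import org.apache.http.util.EntityUtils;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    public class CatIndices {
        public static void main(String[] args) throws Exception {
            try (RestClient client = RestClient.builder(
                    new HttpHost("localhost", 9200, "http")).build()) {
                // format=json makes the cat API return structured output
                // instead of the usual plain-text table.
                Request request = new Request("GET", "/_cat/indices");
                request.addParameter("format", "json");
                Response response = client.performRequest(request);
                System.out.println(EntityUtils.toString(response.getEntity()));
            }
        }
    }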

Spring Data ElasticSearch: returned scores are off

I have a Spring Boot project with org.springframework.boot:spring-boot-starter-data-elasticsearch:jar:2.0.0.RELEASE connecting to an elasticsearch-6.3.1 server.
I have the following scenario: for a certain Elasticsearch query (which involves a bool query with should clauses), I get different scores than when I run the query manually using curl.
Steps I have tried: extracting the query with a debugger from SearchQuery before calling the repository, and extracting the query from the Elasticsearch logs (using "index.search.slowlog.threshold.fetch.debug": "0s" and "index.search.slowlog.threshold.query.debug": "0s"). In both cases, running the queries manually with curl gives a set of scores that differ from the ones returned by the Java API.
I should mention that I couldn't find a pattern by looking at the diff between the two score sets. The scores returned by the manual query seem to be the correct ones, because I expect some of them to have the same value, which does not happen for the ones returned by the API.
If you have any ideas on what might cause this or how to continue the investigation it is much appreciated.
I have managed to make the API return the same scores as the manual run by wrapping the inner query in a constantScoreQuery; it seems the TF/IDF scoring was the culprit.
It is still curious, though, why the manual query behaved as if it were ignoring TF/IDF in the first place.
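For reference, the wrapping described above looks roughly like this with the Elasticsearch 6.x QueryBuilders; the inner bool/should query and the field values are placeholders for the real one:

    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;

    public class ConstantScoreExample {
        public static QueryBuilder buildQuery() {
            // Hypothetical bool/should query standing in for the original one.
            BoolQueryBuilder inner = QueryBuilders.boolQuery()
                    .should(QueryBuilders.termQuery("category", "news"))
                    .should(QueryBuilders.termQuery("category", "sports"));
            // constant_score disables TF/IDF relevance scoring, so every
            // matching document gets the same score.
            return QueryBuilders.constantScoreQuery(inner);
        }
    }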

How to handle pagination when the source data changes frequently

Specifically, I'm using Elasticsearch to do pagination, but this question could apply to any database.
Elasticsearch provides methods to paginate search results with handy from and size parameters.
So I run a query: "get me the most recent data, results 1 to 10".
This works great.
The user clicks "next page" and the query is:
"get me the most recent data, results 11 to 20"
The problem is that in the time between the two queries, two new records have been added to the backing database, which means the paginated results will overlap (the last two results from the first page show up as the first two on the second page).
What's the best solution to avoid this? Right now, I'm adding a filter to the query that tells it to only include results later than the last result of the previous query. But it just seems hackish.
A filter is not a bad option, if you're already indexing a relevant timestamp. You have to track that timestamp on the client side in order to correctly prepare your queries. You also have to know when to get rid of it. But those aren't insurmountable problems.
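As an illustration of the timestamp approach, here is a minimal sketch with the Elasticsearch Java QueryBuilders. It pins the result set to the moment the first page was fetched; the created_at field and the page geometry are assumptions:

    import org.elasticsearch.index.query.QueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.sort.SortOrder;

    public class TimestampPagination {
        // snapshotTimestamp is the created_at value captured when the first
        // page was fetched, tracked on the client side between requests.
        public static SearchSourceBuilder page(long snapshotTimestamp, int from, int size) {
            QueryBuilder query = QueryBuilders.boolQuery()
                    // Exclude anything indexed after the first page was read,
                    // so new inserts cannot shift results between pages.
                    .filter(QueryBuilders.rangeQuery("created_at").lte(snapshotTimestamp));
            return new SearchSourceBuilder()
                    .query(query)
                    .sort("created_at", SortOrder.DESC)
                    .from(from)
                    .size(size);
        }
    }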
The Scroll API is a solid option for this, because it effectively takes a point-in-time snapshot on the Elasticsearch side. The intent of the Scroll API is to provide a stable search query for deep pagination, which has to deal with exactly the issue of change that you're experiencing.
You begin a Scrolling Search by supplying your query and the scroll parameter, for which Elasticsearch returns a scroll_id. You then make requests to /_search/scroll supplying that ID, each of which return a page of results and a new scroll_id for the next request.
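That flow looks roughly like this with the high-level Java REST client; the index name, page size, and keep-alive window are assumptions:

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchScrollRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    public class ScrollExample {
        public static void main(String[] args) throws Exception {
            try (RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
                // Initial search: the scroll parameter tells Elasticsearch
                // how long to keep the search context alive between requests.
                SearchRequest searchRequest = new SearchRequest("my_index");
                searchRequest.scroll(TimeValue.timeValueMinutes(1L));
                searchRequest.source(new SearchSourceBuilder().size(10));
                SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
                String scrollId = response.getScrollId();

                // Each follow-up request returns the next page plus a new
                // scroll_id to use for the request after it.
                while (response.getHits().getHits().length > 0) {
                    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                    scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
                    response = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                    scrollId = response.getScrollId();
                }
            }
        }
    }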
(Note that you don't want the scan search type here. That's used to extract documents en masse, and does not apply any sorting.)
Compared to filtering, you do still have to track a value: the scroll_id for your next page of results. Whether that's easier than tracking a timestamp depends on your app.
There are other potential downsides to consider. Elasticsearch persists the context for your search on a single node within the cluster. Conceivably these could accumulate in your cluster, depending on how heavily you rely on scrolling search. You'll want to test the performance implications there. And if I recall correctly, scrolling searches also do not persist through a node failure or restart.
The ES documentation for the Scroll API provides good details on all of the above.
Bottom line: filtering by timestamp is actually not a bad choice. The Scroll API is another valid option, designed for a similar use case, but is not without its drawbacks.
I realise this is a bit old, but with Elasticsearch 6.3 there's now the search_after feature for the request body, which allows for cursor-type paging:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
It is very similar to the scroll API, but unlike it, the search_after parameter is stateless: it is always resolved against the latest version of the searcher.
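Sketched with the Java SearchSourceBuilder, the pattern looks like this; the sort fields used here (created_at plus a unique id tiebreaker) are assumptions:

    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.sort.SortOrder;

    public class SearchAfterExample {
        // sortValues are the sort values of the last hit on the previous
        // page; pass null for the first page.
        public static SearchSourceBuilder nextPage(Object[] sortValues) {
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .size(10)
                    // search_after requires a deterministic sort order; a
                    // unique "id" field acts as a tiebreaker so hits are
                    // never skipped or duplicated across pages.
                    .sort("created_at", SortOrder.DESC)
                    .sort("id", SortOrder.ASC);
            if (sortValues != null) {
                source.searchAfter(sortValues);
            }
            return source;
        }
    }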
You need to use the scan API for this. The scan and scroll APIs let you do point-in-time search and pagination.

solr query- get results without scanning files

I would like to execute a Solr query and get back only the uniqueKey I've defined.
The documents are very big, so setting fl='my_key' is not fast enough: all the matching documents are still scanned, and the query can take hours (even though the search itself is fast; numFound takes a few seconds to return).
I should mention that all the data is stored, and creating a new index is not an option.
One idea I had was to get the docIds of the results and map them to my_key in the code.
I used fl=[docid], thinking it wouldn't need to scan the documents to get this info, but it still takes too long to return.
Is there a better way to get the docIds?
Or a way to unstore certain fields without reindexing?
Or perhaps a completely different way to get the results without scanning all the fields?
Thanks,
Dafna
Sorry, but the only way is to break your gigantic documents into more than one. I don't see how it would be possible to match only the fields you specified and leave the documents alone. This is not how Lucene works.
You could make a document that uses only the indexed fields needed for querying to make the job easier, or break the document apart based on the queries you need. Or simply add other documents with the structure needed for these new queries. It's up to you.
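If you do split things up, querying the slimmed-down companion documents for just the key is straightforward with SolrJ; the collection name, URL, query, and field names below are assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class KeyOnlyQuery {
        public static void main(String[] args) throws Exception {
            // "slim_collection" holds small companion documents carrying
            // only the indexed fields needed for querying plus the key.
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/slim_collection").build()) {
                SolrQuery query = new SolrQuery("text:example");
                query.setFields("my_key");  // fetch only the unique key
                query.setRows(100);
                QueryResponse response = client.query(query);
                for (SolrDocument doc : response.getResults()) {
                    System.out.println(doc.getFieldValue("my_key"));
                }
            }
        }
    }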
