I want to delete the queries I no longer use in Power Query. One approach is to identify the queries that are not called by any other query (let's call them top queries) and check whether they are still needed. If a top query is no longer needed, it can be deleted safely.
One way to achieve this is to go through each query and note which other queries it calls; any query that is called by another query is not a top query.
Besides this manual method, is there a better one?
If a query is referenced by another query, Power Query won't let you delete it. That's the best way to trace things back, other than looking at the code.
Horsing around with the CosmosDB .NET SDK, I ran into an intriguing issue.
I wanted to use the SDK LINQ support for the following query:
SELECT VALUE COUNT(1) FROM c WHERE <predicate>
Well, immediately after writing the LINQ query, I realized there might be no way to run such queries as document queries. However, since document queries let you capture query metrics and unblock the thread between pages of results, I strongly prefer them. Here's the code:
client.CreateDocumentQuery<TEntity>(collectionUri).Where(<predicate>).Count()
Even though I understand that the result type of Count() isn't IQueryable, is there a way to handle "count" queries as document queries?
Of course there is a way to do that.
There is a CountAsync extension built into the SDK.
Simply construct your IQueryable and use .CountAsync() when you are ready to get the count.
Keep in mind, however, that there will be no metrics collection, as the result is aggregated inside the SDK.
If you really need metrics, then use the usual DocumentQuery created from SQL rather than LINQ, loop with while (query.HasMoreResults) calling ExecuteNextAsync, and capture the metrics from each paginated response, per iteration.
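Both paths can be sketched as follows (a minimal sketch against the v2 SDK; collectionUri, predicate, and TEntity are assumptions standing in for your own collection link, filter, and entity type):

```csharp
// LINQ path: CountAsync() aggregates the partial counts inside the SDK,
// so no query metrics are exposed.
int count = await client.CreateDocumentQuery<TEntity>(collectionUri)
    .Where(predicate)
    .CountAsync();

// SQL path: run the aggregate as a document query so each page exposes
// its QueryMetrics. COUNT may return one partial count per page/partition,
// so the caller sums them.
var query = client.CreateDocumentQuery<int>(
        collectionUri,
        "SELECT VALUE COUNT(1) FROM c WHERE <predicate>",
        new FeedOptions { PopulateQueryMetrics = true, EnableCrossPartitionQuery = true })
    .AsDocumentQuery();

int total = 0;
while (query.HasMoreResults)
{
    FeedResponse<int> page = await query.ExecuteNextAsync<int>();
    total += page.Sum();                 // aggregate the partial counts yourself
    var metrics = page.QueryMetrics;     // per-partition metrics for this page
}
```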
I have a Neo4j database of ~70 GB. It holds 8 datasets of the same structure, just different nodes. The simple Cypher query below, which retrieves some data from one dataset, takes forever to run, even though the dataset contains only a few thousand nodes. Here is the query:
MATCH (c:Cell)-[ex:EXPRESSES]->(g:Gene)
WHERE c.DATASET = "cd1_e165" AND g.geneName = "1010001B22Rik"
RETURN c.tsneX, c.tsneY, ex.expr, c.cellId
There is a huge number of :EXPRESSES relationships in total, but if we restrict the match to one c.DATASET, I am sure it should run much faster. Maybe the issue is related to the fact that DATASET is just a property on each :Cell node and is not indexed. What can be done to speed up the query?
First of all, you should create indexes on both properties:
CREATE INDEX ON :Cell(DATASET);
CREATE INDEX ON :Gene(geneName);
Next, I would rewrite the query like this (not sure whether this will help, but it reads more naturally to me; Cypher often behaves just the way you would expect it to, and written this way it seems clear that it should use the indexes rather than search all possible paths):
MATCH (c:Cell{DATASET:'cd1_e165'})-[ex:EXPRESSES]->(g:Gene{geneName:'1010001B22Rik'})
RETURN c.tsneX, c.tsneY, ex.expr, c.cellId
As InverseFalcon mentioned, PROFILE and EXPLAIN can always help you understand what your query does and whether it matches your expectations. Take a look at the docs.
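For example, prefixing the rewritten query with PROFILE prints the actual execution plan; if the indexes are picked up, you should see NodeIndexSeek operators in the plan rather than NodeByLabelScan:

```cypher
PROFILE
MATCH (c:Cell {DATASET: 'cd1_e165'})-[ex:EXPRESSES]->(g:Gene {geneName: '1010001B22Rik'})
RETURN c.tsneX, c.tsneY, ex.expr, c.cellId
```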
I have two indices: one is called assignment and the other user. In SQL they were linked through a foreign-key field, fk, but I do not know how to perform an inner join in Elasticsearch. Can someone help me?
So you have a couple of options which might be useful. Without knowing your specific use case, I'm going to list a few potentially useful links.
1)
Parent/child mapping is really useful when you want to return all documents associated with a specific document. To make the mapping process a bit easier, I typically index the data, retrieve the mapping using the /_mapping endpoint, modify the mapping, delete the index, then re-ingest the data. Sometimes that isn't an option, e.g. in the case of short-lived data.
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html
After updating the mapping, it's possible to use one of the joining queries.
https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html
2)
When deleting the index and re-ingesting the data isn't an option, create a new index with the modified mapping as described above, and instead of deleting the old index, use the reindex API to copy the information into the new index.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
3)
It might also be possible to use an ingest processor to join the tables:
https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest-processors.html
4)
Possibly the quickest option, until you get your head wrapped around how Elasticsearch works, is to either join the information prior to ingesting it or write a script that joins the tables using one of the SDKs:
https://elasticsearch-py.readthedocs.io/en/master/
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/index.html
plus a lot more built by the community.
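The "join prior to ingesting" route in option 4 needs no Elasticsearch features at all: denormalize in application code and index the merged documents. A minimal sketch in plain Python (the field names user_id, fk, and title are made up for illustration; adapt them to your schema):

```python
# Two SQL-style "tables": user rows and assignment rows linked by fk.
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]
assignments = [
    {"assignment_id": 10, "fk": 1, "title": "report"},
    {"assignment_id": 11, "fk": 2, "title": "review"},
    {"assignment_id": 12, "fk": 1, "title": "audit"},
]

# Build a lookup table keyed by the primary key, then emulate an inner
# join: each merged document carries its user fields alongside the
# assignment, ready to be indexed as a single Elasticsearch document.
users_by_id = {u["user_id"]: u for u in users}
joined = [
    {**a, "user": users_by_id[a["fk"]]}
    for a in assignments
    if a["fk"] in users_by_id  # inner join: drop rows with no matching user
]

for doc in joined:
    print(doc["assignment_id"], doc["user"]["name"])
```

Each dict in joined would then be sent to a single index (e.g. via the bulk helper of elasticsearch-py), so queries never need a join at search time.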
I am looking for a SQL Server LAG/LEAD functions analog in Elasticsearch.
Assume I have a list of documents in a result set found by particular criteria, and the result set is ordered in some way.
I know the id of one of the documents in that result set and I need to find next and/or previous document in the same result set.
SQL Server 2012 and above has the LAG/LEAD functions to get the next/previous row in a recordset, so I am wondering if there is such functionality in Elasticsearch.
Could you please point me to the corresponding documentation/examples?
There isn't. Lots of stuff from relational land doesn't translate directly into Elasticsearch land. What do you want to do with LAG/LEAD? Just getting the ids is simple enough by asking for more results and looking up or down the list. I imagine it's something more fun, but I don't want to speculate.
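The "look up or down the list" idea can be sketched client-side: run the sorted query once, keep the ordered ids, and read the neighbors of the known id. A minimal sketch in Python (the ids are made up; in practice they would come from a sorted search that returns only _id):

```python
def neighbors(ids, target):
    """Emulate LAG/LEAD: return (previous_id, next_id) around target
    in an ordered id list, with None at either edge."""
    i = ids.index(target)
    prev_id = ids[i - 1] if i > 0 else None
    next_id = ids[i + 1] if i < len(ids) - 1 else None
    return prev_id, next_id

# Ids in the order the sorted query returned them.
ordered_ids = ["doc-3", "doc-7", "doc-1", "doc-9"]
print(neighbors(ordered_ids, "doc-1"))  # -> ('doc-7', 'doc-9')
```

For large result sets you would page through with search_after (or a point-in-time search) instead of holding every id in memory, but the neighbor lookup itself stays this simple.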
I would like to execute a Solr query and get only the uniqueKey I've defined.
The documents are very big, so setting fl='my_key' is not fast enough: all the matching documents are still scanned, and the query can take hours (even though the search itself is fast; numFound returns within a few seconds).
I should mention that all the data is stored, and creating a new index is not an option.
One idea I had was to get the docids of the results and map them to my_key in the code.
I used fl=[docid], thinking it wouldn't need to scan the stored documents to get this info, but it still takes too long to return.
Is there a better way to get the docIds?
Or a way to unstore certain fields without reindexing?
Or perhaps a completely different way to get the results without scanning all the fields?
Thanks,
Dafna
Sorry, but the only way is to break your gigantic documents into more than one. I don't see how it would be possible to read only the fields you specified and leave the rest of the documents untouched. That is not how Lucene works.
You could build documents that hold only the indexed fields needed for querying, to make the job easier, or split the documents based on the queries you need to run. Or simply add additional documents with the structure needed for these new queries. It's up to you.
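If reindexing ever becomes an option, the split can also be expressed in the schema by keeping heavy fields searchable but not stored, so that fl=my_key never touches them. A sketch of a Solr schema.xml fragment (the field names are made up):

```xml
<!-- Slim key field: stored, so fl=my_key reads almost nothing -->
<field name="my_key" type="string" indexed="true" stored="true" required="true"/>
<!-- Heavy content stays searchable but is never fetched per match -->
<field name="body" type="text_general" indexed="true" stored="false"/>
```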