PowerBI / Power Query: dynamic data in a query

In Power BI / Power Query I'm using the web connector to query a REST API that returns JSON. I convert the response to a table and end up with a single column of values (ids).
These ids now need to be used in subsequent web queries, one query per id, which will yield the 'real' data I actually need.
How can I iterate through those ids, using each one in a new web query?
I have a complete solution in Python and could just import the JSON files from the Python script into PBI, but I really want to be able to hand the PBI report to a colleague who wouldn't touch Python, so I'm keen to find a simple way to achieve this in PBI/PQ.
Would appreciate any pointers on how this could be achieved as simply as possible.
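For what it's worth, the usual M pattern here is to add a custom column that calls Web.Contents once per id. Below is a minimal sketch; the URLs, the Id column name, and the expanded field names are hypothetical stand-ins for the real API:

let
    // Hypothetical endpoint returning a JSON list of ids
    Ids = Json.Document(Web.Contents("https://api.example.com/ids")),
    IdTable = Table.FromList(Ids, Splitter.SplitByNothing(), {"Id"}),
    // One web call per id; RelativePath keeps the base URL static,
    // which plays nicer with privacy levels and the gateway
    WithDetail = Table.AddColumn(IdTable, "Detail", each
        Json.Document(Web.Contents("https://api.example.com",
            [RelativePath = "items/" & Text.From([Id])]))),
    // Expand whatever fields the API returns (these names are made up)
    Result = Table.ExpandRecordColumn(WithDetail, "Detail", {"name", "value"})
in
    Result

The key idea is that Table.AddColumn evaluates its function once per row, so each id gets its own web request without any explicit loop.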

Related

Is there a way to know which GraphQL query to Hasura generated a given SQL output?

Assume you've identified queries against a relational database that are likely hitting the pitfall of sending too many, too-small queries, and you want to figure out where they come from so you can give the team sending them a heads-up. Is there any way to tell which GraphQL query generated a given piece of compiled SQL output?
Doing things the other way around, inspecting the compiled output of a known GraphQL query, is easy. But there doesn't seem to be any easy way of acting on feedback from the actual DB.
The Hasura Query log is probably a good place to start. Do you have these logs enabled for your Hasura installation?
If you look for logs of type query-log, you'll get a structured JSON object whose properties include the operation name, the GraphQL query that was submitted to Hasura, and the generated_sql that was produced.
You'd be able to match on the generated_sql and then trace back to the actual GraphQL that caused it using that approach.
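For instance, a rough sketch of fishing the culprit out with jq; the exact log shape can vary by Hasura version, so treat the paths below as assumptions to verify against your own logs:

# Given newline-delimited JSON logs, find the query-log entries whose
# generated SQL contains a fragment seen in the DB, and print the GQL
jq 'select(.type == "query-log")
    | select((.detail.generated_sql | tostring) | contains("interesting_table"))
    | {operation: .detail.query.operationName, gql: .detail.query.query}' hasura.log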

Query to read data from URL

I am using Greenplum Database.
I need to write a query, or maybe a function, to read data from a URL.
Say there is a SharePoint URL which contains some tabular data, and I need to fetch that data from within a SQL query or function.
I found http_get, but it's no help because the version is 8.2.x.
I also tried Python as a procedural language (PL/Python), but that doesn't work either since it is listed as an untrusted language. Hence I'm looking for an alternative.
Have you tried using web external tables?
https://gpdb.docs.pivotal.io/5170/admin_guide/external/g-creating-and-using-web-external-tables.html
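Roughly, a sketch of what that could look like; the table name, columns, and URL are all placeholders, and you should check the docs above for the exact syntax your GPDB version supports:

CREATE EXTERNAL WEB TABLE ext_sharepoint_data (
    id int,
    name text,
    amount numeric
)
EXECUTE 'curl -s "https://example.sharepoint.com/sites/demo/data.csv"' ON MASTER
FORMAT 'CSV' (HEADER);

-- Then query it like any other table:
SELECT * FROM ext_sharepoint_data;

The EXECUTE form runs a shell command (here curl) at query time, which also gives you a place to handle SharePoint authentication.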

Apache Nifi - Federated Search

My team has been thrown into the deep end: we've been asked to build a federated search of customers over a variety of large datasets, which hold varying degrees of differing data about each individual (and no matching identifiers), and I was wondering how to go about implementing it.
I was thinking Apache NiFi would be a good fit: query our various databases, merge the results, deduplicate the entries via an external tool, and then push the result into a database which is then queried into an Elasticsearch instance for the application's use.
So, roughly speaking, the first flow queries the source databases, merges the results, and lands them in the result database.
For example's sake, suppose some merged customer data then exists in the result database after the first flow.

Then running https://github.com/dedupeio/dedupe over this database table would add cluster ids to aid the record linkage.

The second flow would then query the result database and feed the results into the Elasticsearch instance used by the application's API, which would use the cluster id to link the duplicates.
A couple of questions:
How would I trigger dedupe to run once the merged content has been pushed to the database?
The corollary question: how would the second flow know when to fetch results for pushing into Elasticsearch? Periodic polling?
I also haven't considered any CDC process here, which I'd need, since the databases will be getting constantly updated. So I'm really interested whether anybody has solved a similar problem or used a different approach (happy to consider other technologies too).
Thanks!
For de-duplicating...
You will probably need to write a custom processor, or use ExecuteScript. Since dedupe looks like a Python library, I'm guessing you'd write a script for ExecuteScript, unless there is a Java equivalent.
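If it helps, here's a bare-bones sketch of the clustering step with the dedupe 2.x API; the field definitions and data shape are assumptions to adapt to your actual columns. One caveat: dedupe pulls in native extensions, so it likely can't run inside NiFi's Jython-based ExecuteScript, and calling an external Python script via ExecuteStreamCommand may be the more practical route.

import dedupe

# data: record_id -> record dict, e.g. pulled from the result DB
data = {
    1: {"name": "John Smith", "postcode": "1234"},
    2: {"name": "Jon Smith", "postcode": "1234"},
}

# Which fields to compare (hypothetical; match your actual schema)
variables = [
    {"field": "name", "type": "String"},
    {"field": "postcode", "type": "Exact"},
]

deduper = dedupe.Dedupe(variables)
deduper.prepare_training(data)
dedupe.console_label(deduper)   # interactive labelling of candidate pairs
deduper.train()

# partition returns (record_ids, confidence_scores) per cluster;
# the cluster_id is what you'd write back to the result table
for cluster_id, (records, scores) in enumerate(deduper.partition(data, 0.5)):
    print(cluster_id, records, scores)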
For triggering the second flow...
Do you need that intermediate DB table for something else?
If you do need it, then you can send the success relationship of PutDatabaseRecord as the input to the follow-on ExecuteSQL.
If you don't need it, then you can just go MergeContent -> Dedupe -> ElasticSearch.

Elasticsearch join/scripted query, using output of subquery

I have a situation where I need to write a search query in Elasticsearch over data like the following:
{id:"p1",person:{name:"name",age:"12"},relatedTO:{id:"p2"}}
{id:"p2",person:{name:"name2",age:"15"},relatedTO:{id:"p3"}}
{id:"p3",person:{name:"name3",age:"17"},relatedTO:{id:"p1"}}
Scenario: users want to search for people related to p2 and, for each related person, find who they in turn are related to.
1. First, find who is related to p2: answer = p1.
2. Now find people related to p1: answer = p3. (The requirement as of now is to go only one level deep, so there is no need to find people related to p3.) The final result should be p2, p1, p3.
In a normal scenario we would write a nested SQL query to get such results. How do we achieve this with the Elasticsearch query language in one shot?
To do it in one shot you would need to use parent-child relationships, but I wouldn't recommend that in the first place, because it is not very performant. (By the way, grandparents and grandchildren are supported as well.)
You could also use application-side joins, meaning you execute several queries until you get what you want, as sketched below. (Be aware that the first result sets should be very small, otherwise this can get costly.)
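Sketching the application-side join against the data above (the index name "people" is made up, and this assumes relatedTO.id is mapped as a keyword; if it's a text field with a keyword sub-field, use relatedTO.id.keyword instead):

# Step 1: who is related to p2?  -> returns p1
POST /people/_search
{"query": {"term": {"relatedTO.id": "p2"}}}

# Step 2: collect the ids from the hits and feed them into a terms query
POST /people/_search
{"query": {"terms": {"relatedTO.id": ["p1"]}}}

The second query takes whatever ids the first one returned, so the application code in between is just a loop over hits.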
What I would really recommend is that you read the docs and rethink your use case.
If you want to model relationships as in Facebook or Google+, I would point you to a NoSQL graph database.
Note: ideally, data in Elasticsearch is flat, i.e. denormalized.

What is the easiest way to save a LINQ query for later use?

I have a request for a feature to be able to save a user's search for later.
Right now I'm building LINQ statements on the fly based on what the user has specified.
So I started wondering, is there an easy way for me to simply take the query that the user built, and persist it somewhere, preferably my database, so that I can retrieve it later?
Is there some way of persisting the query as XML or perhaps JSON, and then reconstituting the query later?
Never done this before, but I've had this idea:
Rather than having the query run against your database directly, if you were to have it run against an OData endpoint, you could conceivably extract the URL that is generated as the query string and save that URL for later use. Since OData has a well-thought-out spec already, you would be able to profit from other people's labor.
I'd go with a domain-specific object here even if such goodies did exist. What happens when you save serialized LINQ queries and your underlying model changes, invalidating everyone's saved queries? Using your own data format should shield you from this to some extent; a sketch follows.
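A minimal sketch of that idea, with hypothetical SavedSearch and Product types: persist the user's criteria as plain data, and rebuild the LINQ query from them at runtime.

using System.Linq;

// Hypothetical domain type
public class Product
{
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

// The user's criteria as plain data -- trivially serializable to JSON/XML/DB rows
public class SavedSearch
{
    public string NameContains { get; set; }
    public decimal? MaxPrice { get; set; }
}

public static class SavedSearchExtensions
{
    // Rebuild the LINQ query from the stored criteria at runtime
    public static IQueryable<Product> Apply(this IQueryable<Product> source, SavedSearch s)
    {
        if (!string.IsNullOrEmpty(s.NameContains))
            source = source.Where(p => p.Name.Contains(s.NameContains));
        if (s.MaxPrice.HasValue)
            source = source.Where(p => p.Price <= s.MaxPrice.Value);
        return source;
    }
}

If the model changes later, you only have to migrate the SavedSearch rows, not arbitrary serialized expression trees.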
Take a look at the Expression class. It will allow you to pre-compile a query, although persisting that to the DB for later use in the name of performance is of questionable benefit.
I'm writing this as I watch this presentation at PDC10. Just after the 1-hour mark, he shows how he's built a JSON serializer for expression trees. You might find that interesting.
