How to execute a pure AQL query? - go

I'm using Official Aerospike Package for golang.
Is there any way to get a list of all existing indexes, like this?
aql> show indexes
Also, I haven't found any way to execute a pure AQL query.
How can I do this?
Update:
I'm looking for something similar to this, but for Aerospike (example from Rails):
custom_query = "select * from users"
result = ActiveRecord::Base.connection.execute(custom_query)

aql>show indexes
is a valid aql command and should show you all the secondary indexes you currently have on the server.
aql runs the C API underneath. You can do pretty much everything with aql, at a rudimentary level.
Type aql>help and it will throw all the aql commands at you; cut and paste!
aql also stores command history in a text file, so it persists across sessions.
aql>run 'filepath/filename' is a handy way to store all aql commands in a text file and run them.
Re: aql queries, look at: select * from ns.set where ... You can do equality and range queries, provided you have pre-built the secondary indexes.
Aerospike ver. 3.12+ introduced predicate filtering, i.e. more complex queries; I don't think aql has been updated to run those yet.
HTH.

AQL is an admin and data browsing tool. It's not really Aerospike's SQL, as Aerospike doesn't natively implement a query language. Instead, all the Aerospike clients give you an API to make direct get, put, scan, and query calls, and those are procedural, not declarative like SQL (where you state what result you want and the server figures out a query plan). Piyush mentioned the predicate filtering API, which is fantastic and lets you create complex queries over scans and secondary-index queries.
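To make that concrete, here's what the client-side equivalent of an aql select looks like in the Go client. This is just a minimal sketch: the namespace "test", set "users", and "age" bin are illustrative, you need a pre-built secondary index on "age", and method names like SetFilter have shifted slightly between aerospike-client-go versions:

package main

import (
	"fmt"
	"log"

	as "github.com/aerospike/aerospike-client-go"
)

func main() {
	client, err := as.NewClient("127.0.0.1", 3000)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Roughly the client-side equivalent of:
	//   aql> select * from test.users where age between 10 and 20
	stmt := as.NewStatement("test", "users")
	stmt.SetFilter(as.NewRangeFilter("age", 10, 20)) // requires a secondary index on "age"

	rs, err := client.Query(nil, stmt) // nil = default query policy
	if err != nil {
		log.Fatal(err)
	}
	for res := range rs.Results() {
		if res.Err != nil {
			log.Fatal(res.Err)
		}
		fmt.Println(res.Record.Bins)
	}
}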
Specifically to your question about getting all the indexes, that's the type of thing you should use the info command for. Aerospike allows you to get and set config parameters through it, get a wide range of metrics, run microbenchmarks, etc. Everything you need for admin and monitoring.
You can run sindex through the standalone asinfo tool, or you can call it using the info command that any client provides.
asinfo -v "sindex"
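The same info command is reachable from the Go client too. A hedged sketch (RequestNodeInfo is the helper in older aerospike-client-go majors; newer versions expose an equivalent method on Node, so check your version):

package main

import (
	"fmt"
	"log"

	as "github.com/aerospike/aerospike-client-go"
)

func main() {
	client, err := as.NewClient("127.0.0.1", 3000)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Ask one node for its secondary-index list, same as: asinfo -v "sindex"
	nodes := client.GetNodes()
	if len(nodes) == 0 {
		log.Fatal("no nodes in cluster")
	}
	info, err := as.RequestNodeInfo(nodes[0], "sindex")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(info["sindex"])
}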

Related

How much of Talend functionality is translated into SQL queries and how much into Java?

I am starting an internship, and they asked me to learn how to use Talend ETL.
I did; it's not so difficult.
One of the extra tasks assigned to me is to verify how much of the operations I set up in the design workspace are executed in Java and how much through queries.
I've set up a simple join using the tMap component and monitored the SQL database with SQL Profiler. The result is that only the essential create/drop and select/insert on the table is done via SQL, while everything else, like the actual join, is done on the Java side.
For a simple operation like a join, wouldn't it be more convenient to execute it through a query without having to bother Java with it?
For those who also know SAP: in terms of performance, is there a big difference between Talend and SAP?
Only operations in tDB components (create, select, insert, etc.) are actually done through SQL. All operations done in other Talend components (tMap, tFilter, aggregate, etc.) are done in Java.
Indeed, you'll get better performance doing operations on the SQL side. You then have to find the right balance between an "all-in-SQL" type of job and an "all-Java" one (it can be harder for a Talend developer to debug operations if all the SQL is done through a single query inside one component).
You could definitely have your joins inside a tDBInput component and output the result in a single output flow.
You can also check the ELT* components: they let you use the SQL engine instead of the Java engine to perform operations (join, aggregate, filter) while still using the Talend interface.

Apollo GraphQL DataLoader DynamoDb

I'm new to GraphQL and am reading about the N+1 issue and the dataloader pattern to increase performance. I'm looking at starting a new GraphQL project with DynamoDB for the database. I've done some initial research and found a couple of small NPM packages for dataloader and DynamoDB, but they do not seem to be actively supported. So it seems to me, from my initial research, that DynamoDB may not be the best choice for supporting an Apollo GraphQL app.
Is it possible to implement the dataloader pattern against a DynamoDB database?
Dataloader doesn't care what kind of database you have. All that really matters is that there's some way to batch up your operations.
For example, for fetching a single entity by its ID, with SQL you'd have some query that's a bit like this:
select * from product where id = SOME_ID_1
The batch equivalent of this might be an IN query as follows:
select * from product where id in (SOME_ID_1, SOME_ID_2, SOME_ID_3)
The actual mechanism for single vs. batch querying will vary depending on what database you're using; it may not always be possible, but it usually is. A quick search shows that DynamoDB has BatchGetItem, which might be what you need.
Batching up queries that take additional parameters (such as pagination, or complex filtering) can be more challenging and may or may not be worth investing the effort. But batching anything that looks like "get X by ID" is always worth it.
In terms of finding libraries that support Dataloader and DynamoDB in particular, I wouldn't worry about it. You don't need this level of tooling. As long as there's some way of constructing the database query, and you can put it inside a function that takes an array of IDs and returns a result in the right shape, you can do it -- and this usually isn't complicated enough to justify adding another library.
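For illustration, a batch function along those lines might look like the sketch below in Go with the AWS SDK v2. The table name "product" and key attribute "id" are carried over from the SQL examples above and are assumptions; a production version would also need to handle BatchGetItem's 100-key limit and any UnprocessedKeys in the response:

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// batchGetProducts takes a slice of IDs and returns one result per ID, in the
// same order, which is exactly the contract a dataloader batch function needs.
func batchGetProducts(ctx context.Context, db *dynamodb.Client, ids []string) ([]map[string]types.AttributeValue, error) {
	keys := make([]map[string]types.AttributeValue, len(ids))
	for i, id := range ids {
		keys[i] = map[string]types.AttributeValue{
			"id": &types.AttributeValueMemberS{Value: id},
		}
	}

	out, err := db.BatchGetItem(ctx, &dynamodb.BatchGetItemInput{
		RequestItems: map[string]types.KeysAndAttributes{
			"product": {Keys: keys},
		},
	})
	if err != nil {
		return nil, err
	}

	// BatchGetItem returns items unordered; re-index them by ID so the
	// output lines up with the input IDs.
	byID := make(map[string]map[string]types.AttributeValue, len(ids))
	for _, item := range out.Responses["product"] {
		if id, ok := item["id"].(*types.AttributeValueMemberS); ok {
			byID[id.Value] = item
		}
	}
	results := make([]map[string]types.AttributeValue, len(ids))
	for i, id := range ids {
		results[i] = byID[id] // nil when not found, mirroring dataloader semantics
	}
	return results, nil
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	db := dynamodb.NewFromConfig(cfg)

	items, err := batchGetProducts(context.TODO(), db, []string{"SOME_ID_1", "SOME_ID_2", "SOME_ID_3"})
	if err != nil {
		log.Fatal(err)
	}
	log.Println(items)
}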

Apache NiFi - Federated Search

My team has been thrown into the deep end: we've been asked to build a federated search of customers over a variety of large datasets, which hold varying degrees of differing data about each individual (and no matching identifiers), and I was wondering how to go about implementing it.
I was thinking Apache NiFi would be a good fit to query our various databases, merge the results, deduplicate the entries via an external tool, and then push the result into a database that is then queried to feed an Elasticsearch instance for the application's use.
So, roughly speaking, something like this:
For example's sake, the following data then exists in the result database from the first flow:

Then I'd run https://github.com/dedupeio/dedupe over this database table, which will add cluster IDs to aid the record linkage, e.g.:

The second flow would then query the result database and feed the results into the Elasticsearch instance for use by the application's API, which would use the cluster ID to link the duplicates.
A couple of questions:
How would I trigger dedupe to run once the merged content has been pushed to the database?
The corollary question: how would the second flow know when to fetch results for pushing into Elasticsearch? Periodic polling?
I also haven't considered any CDC process here, which I'd need to handle since the databases will be constantly updated, so I'm really interested in whether anybody has solved a similar problem or used a different approach (happy to consider other technologies too).
Thanks!
For de-duplicating...
You will probably need to write a custom processor, or use ExecuteScript. Since dedupe looks like a Python library, I'm guessing you'd write a script for ExecuteScript, unless there is a Java library.
For triggering the second flow...
Do you need that intermediate DB table for something else?
If you do need it, then you can send the success relationship of PutDatabaseRecord as the input to the follow-on ExecuteSQL.
If you don't need it, then you can just go MergeContent -> Dedupe -> ElasticSearch.

Elasticsearch join/scripted query, using output of subquery

I have a situation where I need to write a search query in Elasticsearch, with data as follows:
{"id": "p1", "person": {"name": "name", "age": "12"}, "relatedTO": {"id": "p2"}}
{"id": "p2", "person": {"name": "name2", "age": "15"}, "relatedTO": {"id": "p3"}}
{"id": "p3", "person": {"name": "name3", "age": "17"}, "relatedTO": {"id": "p1"}}
Scenario: users want to search for people related to p2 and, using each related person, find who they are related to.
1. First find who is related to p2; answer = p1.
2. Now find people related to p1; answer = p3. (The requirement as of now is to go only one level, so there is no need to find people related to p3.) The final result should be p2, p1, p3.
In a normal scenario we would write a nested SQL query to get the results. How do we achieve this with the Elasticsearch query language in one shot?
To do it in one shot you would need to use parent-child relationships, but I wouldn't recommend that in the first place, because it is not very performant. (By the way, grandparents and grandchildren are also supported.)
You could also use application-side joins, meaning you execute several queries until you get what you want. (Be aware that the first result sets should be very small, otherwise this can get costly.)
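To sketch what that application-side join looks like for your data, here are the two round trips in Go against the plain REST API. The index name "people" and the localhost URL are assumptions; the field names come from your documents:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type hit struct {
	Source struct {
		ID string `json:"id"`
	} `json:"_source"`
}

type searchResponse struct {
	Hits struct {
		Hits []hit `json:"hits"`
	} `json:"hits"`
}

// search posts a query body to the (assumed) people index and returns the hits.
func search(query map[string]interface{}) ([]hit, error) {
	body, err := json.Marshal(map[string]interface{}{"query": query})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post("http://localhost:9200/people/_search", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var sr searchResponse
	if err := json.NewDecoder(resp.Body).Decode(&sr); err != nil {
		return nil, err
	}
	return sr.Hits.Hits, nil
}

func main() {
	// Round trip 1: who is related to p2? (returns p1 for your sample data)
	first, err := search(map[string]interface{}{
		"term": map[string]interface{}{"relatedTO.id": "p2"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Round trip 2: collect those IDs and find who is related to *them*
	// (one level only, per the requirement; returns p3).
	ids := []string{}
	for _, h := range first {
		ids = append(ids, h.Source.ID)
	}
	second, err := search(map[string]interface{}{
		"terms": map[string]interface{}{"relatedTO.id": ids},
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, h := range second {
		fmt.Println(h.Source.ID)
	}
}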
What I would really recommend is to read the documentation and rethink your use case.
If you want to model relationships like Facebook or Google+, I would look for a NoSQL graph database.
Note: Ideally in Elasticsearch the data is flat, which means denormalized.

Adding Advanced Search in ASP.NET MVC 3 / .NET

On a website I am working on, there is an advanced search form with several fields, some of them dynamic, showing or hiding depending on what is selected on the search form.
The data in the database is expected to be big, and records are spread over several tables in a very normalized fashion.
Is there a recommendation on using a 3rd-party search engine (SQL Server full-text search, Lucene.NET, etc.) rather than plain SELECT/JOIN queries?
Thank you
Thinking a little outside the box here:
Check out CSLA.NET. Using this framework you can create business objects and "denormalise" your search algorithm.
Either way, be sure the database has proper indexes in place for better performance.
On the frontend, you're going to need some JavaScript to map which top-level fields show which sub-level fields. It's pretty straightforward.
For the actual search, I would recommend some flavor of Lucene.
You have the .NET port, Lucene.NET, which Stack Overflow uses; Solr, which is arguably easier to set up and get running than Lucene itself; or the newest kid on the block, Elasticsearch, which aims to be schema-free and infinitely scalable simply by dropping more instances into the cluster.
I have only used Solr myself, and it has a nice .NET client (SolrNet).
First, index the database fields that are important and heavily used.
For searching, it's better to use full-text search; I tried it, and the results were very different from when I didn't use full text.
It's also better to put your select and join queries in a stored procedure and call the stored procedure from your program.
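To show what the full-text suggestion means in practice, here is a minimal sketch (in Go rather than C#, purely as illustration of the SQL side) that uses a CONTAINS predicate instead of LIKE. The Products table, Name column, and connection string are assumptions, and a full-text index on Products.Name must already exist:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/denisenkom/go-mssqldb"
)

func main() {
	db, err := sql.Open("sqlserver", "sqlserver://user:pass@localhost?database=shop")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Full-text search via CONTAINS instead of LIKE; requires a full-text
	// index on Products.Name to have been created beforehand.
	rows, err := db.Query(
		`SELECT Id, Name FROM Products WHERE CONTAINS(Name, @term)`,
		sql.Named("term", `"laptop"`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int
		var name string
		if err := rows.Scan(&id, &name); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, name)
	}
}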
