How do I update MongoDB query results using inner query? - spring

BACKGROUND
I have a collection of JSON documents that represent chemical compounds. A compound has an id and a name. An external process generates new compound documents at intervals, and ids may change across generations. Compound documents whose ids have changed need to be updated to point to the most recent generation's ids, and for that purpose a "lastUpdated" field and a "relatedCompoundIds" field are added. To demonstrate, consider the following compounds across 3 steps:
Step 1: initial compound document for 'acetone' is generated with id="001".
{
"id": "001",
"name": "acetone",
"lastUpdated": "2000-01-01",
}
Step 2: another iteration generates acetone, but with a different id.
{
"id": "001",
"name": "acetone",
"lastUpdated": "2000-01-01"
}
{
"id": "002",
"name": "acetone",
"lastUpdated": "2000-01-02"
}
Step 3: the compound with id "001" gets a "relatedCompoundIds" array pointing to any other compounds with the same name, and its "lastUpdated" is refreshed.
{
"id": "001",
"name": "acetone",
"lastUpdated": "2000-01-02",
"relatedCompoundIds": ["002"]
}
{
"id": "002",
"name": "acetone",
"lastUpdated": "2000-01-02"
}
I'm using MongoDB to house these records and to resolve the "relatedCompoundIds" pointers. I'm accessing Mongo using Spring's ReactiveMongoTemplate. My process is as follows:
1. Upsert newly generated compounds into MongoDB.
2. For each record where "lastUpdated" is before now, get all related compounds (searching by name) and set "relatedCompoundIds".
CODE
import java.util.Date;
import java.util.List;

import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.util.StringUtils;

import com.mongodb.reactivestreams.client.MongoClients;

import reactor.core.publisher.Mono;

public class App {

    private static final ReactiveMongoTemplate mongoOps =
            new ReactiveMongoTemplate(MongoClients.create(), "CompoundStore");

    public static void main(String[] args) {
        Date updatedDate = new Date();
        String readPath = "..."; // wherever the newly generated compound documents are read from
        upsertAll(updatedDate, readPath);
        setRelatedCompounds(updatedDate);
    }

    private static void upsertAll(Date updatedDate, String readPath) {
        // [upsertion code here] <- this is working fine
    }

    private static void setRelatedCompounds(Date updatedDate) {
        mongoOps.find(
                Query.query(Criteria.where("lastUpdated").lt(updatedDate)), Compound.class, "compound")
                .doOnNext(compound -> {
                    findRelatedCompounds(updatedDate, compound)
                            .doOnSuccess(rc -> {
                                if (rc.size() > 0) {
                                    compound.setRelatedCompoundIDs(rc);
                                    mongoOps.save(Mono.just(compound)).subscribe();
                                }
                            })
                            .subscribe();
                })
                .blockLast();
    }

    private static Mono<List<String>> findRelatedCompounds(Date updatedDate, Compound compound) {
        Query query = new Query().addCriteria(new Criteria().andOperator(
                Criteria.where("lastUpdated").gte(updatedDate),
                Criteria.where("name").is(compound.getName())));
        query.fields().include("id");
        return mongoOps.find(query, Compound.class)
                .map(c -> c.getId())
                .filter(cid -> !StringUtils.isEmpty(cid))
                .distinct()
                .collectSortedList();
    }
}
ERROR
Upon running, I get the following error:
17:08:35.957 [Thread-12] ERROR org.mongodb.driver.client - Callback onResult call produced an error
com.mongodb.MongoException: org.springframework.data.mongodb.UncategorizedMongoDbException: Too many operations are already waiting for a connection. Max number of operations (maxWaitQueueSize) of 500 has been exceeded.; nested exception is com.mongodb.MongoWaitQueueFullException: Too many operations are already waiting for a connection. Max number of operations (maxWaitQueueSize) of 500 has been exceeded.
at com.mongodb.MongoException.fromThrowableNonNull(MongoException.java:79)
Is there a better way to accomplish what I'm trying to do?
How do I adjust backpressure so as not to overload the mongo?
Other advice?
EDIT
The above error can be resolved by adding a limitRate modifier after the find method inside setRelatedCompounds.
private static void setRelatedCompounds(Date updatedDate) {
    mongoOps.find(
            Query.query(Criteria.where("lastUpdated").lt(updatedDate)), Compound.class, "compound")
            .limitRate(500)
            .doOnNext(compound -> {
                // do work here
            })
            .blockLast();
}
Still open to suggestions for alternative solutions.
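For reference, one direction that keeps the number of in-flight Mongo operations bounded is to drop the nested subscribe() calls and let flatMap's concurrency argument do the limiting. This is only a rough, untested sketch of that shape against the same methods shown above; the cap of 100 is an arbitrary assumption, not a tuned value.
private static void setRelatedCompounds(Date updatedDate) {
    mongoOps.find(
            Query.query(Criteria.where("lastUpdated").lt(updatedDate)), Compound.class, "compound")
            // flatMap's concurrency argument caps how many related-compound lookups and saves
            // run at once, instead of firing an unbounded number of inner subscribe() calls
            .flatMap(compound -> findRelatedCompounds(updatedDate, compound)
                    .filter(rc -> !rc.isEmpty())
                    .map(rc -> {
                        compound.setRelatedCompoundIDs(rc);
                        return compound;
                    })
                    .flatMap(c -> mongoOps.save(c)), 100) // 100 = assumed concurrency cap
            .blockLast();
}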

Related

GraphQL java: return a partial response and inform a user about it

I have a Spring Boot application that uses GraphQL to return data in response to requests.
What I have
One of my queries returns a list of responses based on a list of ids supplied. So my .graphqls file is as follows:
type Query {
texts(ids: [String]): [Response]
}
type Response {
id: String
text: String
}
and the following are request & response:
Request
texts(ids:["id 1","id 2"]){
id
text
}
Response
{
"data": [
{
"id": "id 1",
"text": "Text 1"
},
{
"id": "id 2",
"text": "Text 2"
}
]
}
At the moment, if one or more of the ids are not in AWS, an exception is thrown and the response is an error block saying that certain id(s) were not found. Unfortunately, the response for the other ids that were found is not displayed - instead the data block returns null. If I check whether data is present in the code via, say, an if/else statement, then a partial response can be returned, but I will not know that it is a partial response.
What I want to happen
My application fetches the data from aws and occasionally some of it may not be present, meaning that for one of the supplied ids, there will be no data. Not a problem, I can do checks and simply never process this id. But I would like to inform a user if the response I returned is partial (and some info is missing due to absence of data).
See example of the output I want at the end.
What I tried
While learning about GraphQL, I encountered Instrumentation - a great tool for logging. Since it goes through all stages of execution, I thought I could try to change the response midway - the Instrumentation class has a lot of methods, so I tried to find the one that works. I tried to make beginExecution(InstrumentationExecutionParameters parameters) and instrumentExecutionResult(ExecutionResult executionResult, InstrumentationExecutionParameters parameters) work, but neither worked for me.
I think the below may work, but as the comments suggest, there are parts that I failed to figure out:
@Override
public GraphQLSchema instrumentSchema(GraphQLSchema schema, InstrumentationExecutionParameters parameters) {
    String id = ""; // how to extract an id from the passed query (without needing to dissect parameters.getQuery())?
    log.info("The id is " + id);
    if (s3Service.doesExist(id)) {
        return super.instrumentSchema(schema, parameters);
    }
    schema.transform(); // How would I add an extra field here?
    return schema;
}
I also found this post that seems to offer a simpler solution. Unfortunately, the link provided by the original poster no longer exists, and the link provided by the person who answered the question is very brief. I wonder if anyone knows how to use this annotation and maybe has an example I can look at?
Finally, I know there is DataFetcherResult, which can construct a partial response. The problem here is that some of my other apps use reactive programming, so while it would be great for Spring MVC apps, it would not be so great for Spring WebFlux apps (because, as I understand it, DataFetcherResult waits for all the outputs and as such is blocking). Happy to be corrected on this one.
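For reference, a minimal sketch of what the DataFetcherResult approach mentioned above can look like in graphql-java (imports omitted; Response and the textById lookup are hypothetical stand-ins, not code from this application):
// Illustrative sketch only: Response and textById(...) are hypothetical stand-ins.
public DataFetcherResult<List<Response>> texts(List<String> ids, DataFetchingEnvironment env) {
    DataFetcherResult.Builder<List<Response>> result = DataFetcherResult.newResult();
    List<Response> found = new ArrayList<>();
    for (String id : ids) {
        Response r = textById(id); // may return null when the id has no data in AWS
        if (r != null) {
            found.add(r);
        } else {
            // each missing id becomes an entry in the GraphQL "errors" block,
            // while the ids that were found still come back in "data"
            result.error(GraphqlErrorBuilder.newError(env)
                    .message("No data was found for id " + id)
                    .build());
        }
    }
    return result.data(found).build();
}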
Desired output
I would like my response to look like so, when some data that was requested is not found.
Either
{
"data": [
{
"id": "id 1",
"text": "Text 1"
},
{
"id": "id 2",
"text": "Text 2"
},
{
"id": "Non existant id",
"msg": "This id was not found"
}
]
}
or
{
"error": [
"errors": [
{
"message": "There was a problem getting data for this id(s): Bad id 1"
}
]
],
"data": [
{
"id": "id 1",
"text": "Text 1"
},
{
"id": "id 2",
"text": "Text 2"
}
]
}
So I figured out one way of achieving this, using instrumentation and the extensions block (as opposed to the error block, which is what I wanted to use initially). Big thanks to Joe, who answered this question. Combine it with the DataFetchingEnvironment variable (great video here) and I got a working solution.
My instrumentation class is as follows
public class CustomInstrum extends SimpleInstrumentation {

    @Override
    public CompletableFuture<ExecutionResult> instrumentExecutionResult(
            ExecutionResult executionResult,
            InstrumentationExecutionParameters parameters) {
        if (parameters.getGraphQLContext().hasKey("Faulty ids")) {
            Map<Object, Object> currentExt = executionResult.getExtensions();
            Map<Object, Object> newExtensionMap = new LinkedHashMap<>();
            newExtensionMap.putAll(currentExt == null ? Collections.emptyMap() : currentExt);
            newExtensionMap.put("Warning:", "No data was found for the following ids: " + parameters.getGraphQLContext().get("Faulty ids").toString());
            return CompletableFuture.completedFuture(
                    new ExecutionResultImpl(
                            executionResult.getData(),
                            executionResult.getErrors(),
                            newExtensionMap));
        }
        return CompletableFuture.completedFuture(
                new ExecutionResultImpl(
                        executionResult.getData(),
                        executionResult.getErrors(),
                        executionResult.getExtensions()));
    }
}
and my DataFetchingEnvironment is used in my resolver:
public CompletableFuture<List<Article>> articles(List<String> ids, DataFetchingEnvironment env) {
    List<CompletableFuture<Article>> res = new ArrayList<>();
    // Below's list would contain the bad ids
    List<String> faultyIds = new ArrayList<>();
    for (String id : ids) {
        log.info("Getting article for id {}", id);
        if (s3Service.doesExist(id)) {
            res.add(filterService.gettingArticle(id));
        } else {
            faultyIds.add(id); // if data doesn't exist then id will not be processed
        }
    }
    // if we have any bad ids, then we add the list to the context for instrumentations to pick it up, right before returning a response
    if (!faultyIds.isEmpty()) {
        env.getGraphQlContext().put("Faulty ids", faultyIds);
    }
    return CompletableFuture.allOf(res.toArray(new CompletableFuture[0])).thenApply(item -> res.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList()));
}
You can obviously separate error-related ids into different contexts, but for my simple case one will suffice. I am, however, still interested in how the same result can be achieved via the error block, so I will leave this question hanging for a bit before accepting this as the final answer.
My response looks as follows now:
{
"extensions": {
"Warning:": "No data was found for the following ids: [234]"
},
"data": { ... }
}
My only concern with this approach is security and "doing the right thing" - is it correct to add something to the context and then use instrumentation to influence the response? Are there any potential security issues? If someone knows anything about this and could share, it would help me greatly!
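On the earlier point about achieving the same thing via the errors block instead of extensions, a rough, untested sketch of an instrumentExecutionResult that appends a GraphQLError (using graphql-java's GraphqlErrorBuilder; imports omitted) might look like this:
// Sketch only: append a GraphQLError instead of an extensions entry.
@Override
public CompletableFuture<ExecutionResult> instrumentExecutionResult(
        ExecutionResult executionResult,
        InstrumentationExecutionParameters parameters) {
    if (parameters.getGraphQLContext().hasKey("Faulty ids")) {
        List<GraphQLError> errors = new ArrayList<>(executionResult.getErrors());
        errors.add(GraphqlErrorBuilder.newError()
                .message("No data was found for the following ids: "
                        + parameters.getGraphQLContext().get("Faulty ids").toString())
                .build());
        return CompletableFuture.completedFuture(
                new ExecutionResultImpl(executionResult.getData(), errors, executionResult.getExtensions()));
    }
    return CompletableFuture.completedFuture(executionResult);
}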
Update
After further testing, it appears that if an exception is thrown this will still not work; it only works if you know beforehand that something will go wrong and add the appropriate handling yourself. It cannot be used with a try/catch block. So I am half a step back again.

Springdocs: Specifying an explicit type for Paged responses

I'm working on a "global search" for my application.
Currently, I'm using hibernate-search to search for instances of multiple different objects and return them to the user.
The relevant code looks as follows:
Search.session(entityManager)
    .search(ModelA.class, ModelB.class)
    .where(...)
    .sort(...)
    .fetch(skip, count);
Skip and count are calculated based on a Pageable and the result is used to create an instance of Page, which will be returned to the controller.
This works as I'd expect; however, the types generated by swagger-docs obviously don't know what the type within the Page is, and therefore use Object.
I'd like to expose the correct types, as I use them to generate the types for the frontend application.
I was able to set the type to an array by overriding the schema like this:
@ArraySchema(schema = @Schema(anyOf = {ModelA.class, ModelB.class}))
public Page<?> search(Pageable pageable) {
However, this just disregards the Page and also isn't correct.
The next thing I tried was extending PageImpl, overriding the getContent method, and specifying the same schema on that method, but this wasn't included in the output at all.
Next was implementing Page<T> myself (and later removing the implements reference to Page<T>) and specifying the same schema on getContent, iterator, and the field itself, but also to no effect.
How do I tell spring-docs what the content of the resulting Page might be?
I stumbled upon this when trying to solve a similar problem.
Inspired by this thread, Springdoc with a generic return type, I came up with the following solution, and it seems to apply to your case also. The code examples are in Kotlin.
I introduced a stub class that will just act as the Schema for the response:
private class PageModel(
    @Schema(oneOf = [ModelA::class, ModelB::class])
    content: List<Object>
) : PageImpl<Object>(content)
Then I annotated my controller like this:
@Operation(
    responses = [
        ApiResponse(
            responseCode = "200",
            content = [Content(schema = Schema(implementation = PageModel::class))]
        )
    ]
)
fun getPage(pageable: Pageable): Page<Object>
This generated this api response:
"PageModel": {
"properties": {
"content": {
"items": {
"oneOf": [
{
"$ref": "#/components/schemas/ModelA"
},
{
"$ref": "#/components/schemas/ModelB"
}
],
"type": "object"
},
"type": "array"
},
... -> more page stuff from spring's PageImpl<>
And in the "responses" section for the api call:
"responses": {
"200": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/PageModel"
}
}
},
"description": "OK"
}
The generated OpenAPI doc is otherwise similar to the JSON autogenerated when returning a Page; it just rewrites the "content" array property to have a specific type.
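Since the question itself is Java-based, a rough Java translation of the same stub-class trick might look like the sketch below. It is untested and simply mirrors the Kotlin example (ModelA/ModelB stand in for the real types, imports are omitted, and whether springdoc picks up the parameter-level annotation identically in Java is not verified here):
// Untested sketch mirroring the Kotlin stub above; used only as an OpenAPI schema.
private static class PageModel extends PageImpl<Object> {
    public PageModel(@Schema(oneOf = {ModelA.class, ModelB.class}) List<Object> content) {
        super(content);
    }
}

@Operation(responses = @ApiResponse(responseCode = "200",
        content = @Content(schema = @Schema(implementation = PageModel.class))))
public Page<?> search(Pageable pageable) {
    return Page.empty(); // placeholder body; the real method returns the hibernate-search results
}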

Bulk Insert object in Elasticsearch

I am trying to create an index and then do a bulk insert using RestHighLevelClient into my ES (the code is in Kotlin).
The bulk insert code is:
private fun insertEntity(entityList: List<Person>, indexName: String) {
    var count = 0
    val bulkRequest = BulkRequest()
    entityList.forEach {
        bulkRequest.add(IndexRequest(indexName).source(it, XContentType.JSON))
        count++
        if (count == batchSize) {
            performBulkInsert(bulkRequest)
        }
    }
}
When executing this, I am getting an exception saying: Limit of 1000 fields is crossed.
On analysing my code, I feel the implementation is wrong, because:
bulkRequest.add(IndexRequest(indexName).source(it,XContentType.JSON))
source takes a String, but I am passing the Person object (it) itself. So I believe that is causing some issue related to the 1000-field limit, based on my mapping or something.
Not sure if my assumption is correct. If yes, how can I achieve the bulk insert then ?
EDIT
Index creation:
private fun createIndex(indexName: String) {
    val request = CreateIndexRequest(indexName)
    val settings = FileUtils.readFileToString(
        ResourceUtils.getFile(
            ResourceUtils.CLASSPATH_URL_PREFIX + "settings/settings.json"), "UTF-8")
    val mappings = FileUtils.readFileToString(
        ResourceUtils.getFile(
            ResourceUtils.CLASSPATH_URL_PREFIX + "mappings/personMapping.json"), "UTF-8")
    request.settings(Settings
        .builder()
        .loadFromSource(settings, XContentType.JSON))
        .source(mappings, XContentType.JSON)
    restHighLevelClient.indices().create(request, RequestOptions.DEFAULT)
}
Mapping.json
Please note the original has 16 fields.
{
  "properties": {
    "accessible": {
      "type": "boolean"
    },
    "person_id": {
      "type": "long"
    },
    "person_name": {
      "type": "string",
      "analyzer": "lower_keyword"
    }
  }
}
Thanks.
It looks like you are using dynamic mapping, and due to some mistake, when you index a document it ends up creating new fields in your index, which crossed the 1000-field limit.
Please see if you can use static mapping, or debug the code that prepares the document and compare it with your mapping to see if it is creating new fields.
Please refer to this SO answer to increase the limit if it is legitimate, or use static mapping, or debug the code to figure out why you are adding new fields to the Elasticsearch index.
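One way to check that theory is to serialize each entity to a JSON string yourself before adding it to the BulkRequest, so you can compare the exact document being indexed against the mapping. A rough Java sketch of that idea follows (imports omitted; batchSize and restHighLevelClient are carried over from the question, and Jackson's ObjectMapper is an assumption):
// Sketch: serialize each entity explicitly so the indexed document matches the declared mapping.
ObjectMapper mapper = new ObjectMapper();

void insertEntities(List<Person> entityList, String indexName) throws IOException {
    BulkRequest bulkRequest = new BulkRequest();
    for (Person person : entityList) {
        String json = mapper.writeValueAsString(person); // explicit JSON payload
        bulkRequest.add(new IndexRequest(indexName).source(json, XContentType.JSON));
        if (bulkRequest.numberOfActions() == batchSize) {
            restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            bulkRequest = new BulkRequest(); // start a fresh batch
        }
    }
    if (bulkRequest.numberOfActions() > 0) { // flush any remaining documents
        restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    }
}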

azure logic app with table storage get last rowKey

How can I use the "Get Entity for Azure table storage" connector in a Logic App to return the last rowKey.
This would be used in situation where the rowkey is say an integer incremented each time a new entity is added. I recognize the flaw in design of this but this question is about how some sort of where clause or last condition could be used in the Logic app.
Currently the Logic App code view snippet looks like this:
"actions": {
"Get_entity": {
"inputs": {
"host": {
"connection": {
"name": "#parameters('$connections')['azuretables']['connectionId']"
}
},
"method": "get",
"path": "/Tables/#{encodeURIComponent('contactInfo')}/entities(PartitionKey='#{encodeURIComponent('a')}',RowKey='#{encodeURIComponent('b')}')"
},
"runAfter": {},
"type": "ApiConnection"
}
where I have hard-coded:
RowKey='@{encodeURIComponent('b')}'
This is fine if I always want this rowKey. What I want, though, is the last rowKey, so something sort of like:
RowKey= last(RowKey)
Any idea on how this can be achieved?
This is fine if I always want this rowKey. What I want though is the last rowKey so something sort of like: RowKey= last(RowKey)
AFAIK, there is no built-in function for you to achieve this purpose. I assume that you could use the Azure Functions connector to retrieve the new RowKey value. Here are the detailed steps; you could refer to them:
For a test, I created a C# HTTP trigger function, then added an Azure Table Storage input binding, retrieved all the items under the specific PartitionKey, ordered them by RowKey, and calculated the new RowKey.
function.json:
{
  "bindings": [
    {
      "authLevel": "function",
      "name": "req",
      "type": "httpTrigger",
      "direction": "in"
    },
    {
      "name": "$return",
      "type": "http",
      "direction": "out"
    },
    {
      "type": "table",
      "name": "inputTable",
      "tableName": "SampleTable",
      "take": 50,
      "connection": "AzureWebJobsDashboard",
      "direction": "in"
    }
  ],
  "disabled": false
}
run.csx:
#r "Microsoft.WindowsAzure.Storage"
using Microsoft.WindowsAzure.Storage.Table;
using System.Net;
public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, IQueryable<SampleTable> inputTable,TraceWriter log)
{
log.Info("C# HTTP trigger function processed a request.");
// parse query parameter
string pk = req.GetQueryNameValuePairs()
.FirstOrDefault(q => string.Compare(q.Key, "pk", true) == 0)
.Value;
// Get request body
dynamic data = await req.Content.ReadAsAsync<object>();
// Set name to query string or body data
pk = pk ?? data?.pk;
if(pk==null)
return req.CreateResponse(HttpStatusCode.BadRequest, "Please pass a pk on the query string or in the request body");
else
{
var latestItem=inputTable.Where(p => p.PartitionKey == pk).ToList().OrderByDescending(i=>Convert.ToInt32(i.RowKey)).FirstOrDefault();
if(latestItem==null)
return req.CreateResponse(HttpStatusCode.OK,new{newRowKey=1});
else
return req.CreateResponse(HttpStatusCode.OK,new{newRowKey=int.Parse(latestItem.RowKey)+1});
}
}
public class SampleTable : TableEntity
{
public long P1 { get; set; }
public long P2 { get; set; }
}
Test:
For more details about Azure Functions storage table bindings, you could refer here.
Azure Table storage entities are sorted lexicographically by row key. So choose a row key that actually decrements every time you add a new entity, i.e. if your row key is an integer that gets incremented when a new entity is created, then store the row key as Int.Max - entity.RowKey (zero-padded so that lexicographic order matches numeric order). The latest entity for that partition key will always be at the top, since it will have the lowest row key, so all you need to do to retrieve it is query with the partition key only and Take(1). This is called the Log Tail pattern, if you want to read more about it.
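To illustrate the inverted-key idea, here is a small sketch (in Java, purely to show the key computation; the zero-padded width of 10 is an assumption chosen to fit Int.Max):
// Log Tail pattern sketch: store an inverted, zero-padded counter as the RowKey so that
// lexicographic order puts the newest entity first.
static String invertedRowKey(int sequenceNumber) {
    int inverted = Integer.MAX_VALUE - sequenceNumber;
    // zero-pad to a fixed width so that, e.g., "2" does not sort after "10"
    return String.format("%010d", inverted);
}
// sequence 41 -> "2147483606", sequence 42 -> "2147483605"; the newer key sorts first,
// so querying by PartitionKey alone and taking the first row returns the latest entity.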

Changing bags into arrays in Pig Latin

I'm doing some transformations on a data set and need to publish it in a sane-looking format. Currently my final set looks like this when I run describe:
{memberId: long,companyIds: {(subsidiary: long)}}
I need it to look like this:
{memberId: long,companyIds: [long] }
where companyIds is the key to an array of ids of type long.
I'm really struggling with how to manipulate things in this way. Any ideas? I've tried using FLATTEN and other commands to no avail. I'm using AvroStorage to write the files into this schema:
The field schema I need to write this data to looks like this:
"fields": [
{ "name": "memberId", "type": "long"},
{ "name": "companyIds", "type": {"type": "array", "items": "int"}}
]
There is no array type in Pig (http://pig.apache.org/docs/r0.10.0/basic.html#data-types). However, if all you need is a good-looking output and you don't have too many elements in companyIds, you may want to write a simple UDF that converts the bag into a nicely formatted string.
Java code
public class BagToString extends EvalFunc<String>
{
    @Override
    public String exec(Tuple input) throws IOException
    {
        List<String> strings = new ArrayList<String>();
        DataBag bag = (DataBag) input.get(0);
        if (bag.size() == 0) {
            return null;
        }
        for (Iterator<Tuple> it = bag.iterator(); it.hasNext();) {
            Tuple t = it.next();
            strings.add(t.get(0).toString());
        }
        return StringUtils.join(strings, ":");
    }
}
PIG script
foo = foreach bar generate memberId, BagToString(companyIds);
I know this is a bit old, but I recently ran into the same problem.
Based on the AvroStorage documentation, and using the latest versions of Pig and AvroStorage, it is possible to directly cast a bag to an Avro array.
In your case, you may want something like:
STORE blah INTO 'blah' USING AvroStorage('schema','{your schema}');
where the array field in the schema is
{
  "name": "companyIds",
  "type": [
    "null",
    {
      "type": "array",
      "items": "long"
    }
  ],
  "doc": "company ids"
}
