Where do additional action parameters go in the Elasticsearch bulk API? - elasticsearch

I'm building a data backfill script for random data loss in Elasticsearch. I only want to add missing documents to an index from the backup; newer versions may already exist and I don't want to lose any updates.
The Elasticsearch index API allows me to specify the opType so that existing records are not updated:
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-index
e.g. [opType=create]
I'm trying to use the bulk API to make it more efficient: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/docs-bulk.html
What I can't figure out is where to put the 'opType' in the bulk API.
Is it in the metadata field, so that it looks like the following?
{ 'index': { '_index': indexName, '_type': 'data', '_id': <my Id>, 'opType': 'create' } }
{data for item}
Or do I put it somewhere else?

As explained in the link you refer to, if you want the same semantics as opType: create, you need to use the create action instead of the index one. That is, change the action name on the metadata line from 'index' to 'create':
{ 'create': { '_index': indexName, '_type': 'data', '_id': <my Id> } }
{ data for item }
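As a minimal sketch of how a backfill script might assemble such a body, the function below serializes documents into the newline-delimited format the bulk API expects, using the create action for every item. The function name, the (id, source) pair format, and the sample index name are assumptions made for illustration; sending the body to the cluster is left out.

```python
import json


def build_create_bulk_body(index_name, docs, doc_type="data"):
    """Serialize (doc_id, source) pairs into an NDJSON bulk body using `create`."""
    lines = []
    for doc_id, source in docs:
        # Action/metadata line: `create` fails for this item if the _id
        # already exists, so newer versions in the index are left untouched.
        action = {"create": {"_index": index_name, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # Every line of a bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical backup records, for illustration only.
body = build_create_bulk_body("my-index", [("1", {"name": "a"}), ("2", {"name": "b"})])
print(body)
```

Items whose _id already exists should come back in the bulk response with a conflict status rather than overwriting the stored document, so a backfill can treat those entries as "already present" instead of as failures.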

Related

How to insert many mutations in a single GraphQL request?

I am using https://hygraph.com/, and I want to insert (create) many products in a single GraphQL request.
At the moment I know how to insert one product:
mutation {
  createProduct(data: { title: "Face Mask", slug: "dfavce-mask", price: 1000 }) {
    id
  }
}
I read the documentation, but I didn't see any information about creating records in bulk.
Link to the Hygraph documentation:
https://hygraph.com/docs/api-reference/content-api/mutations#create-entries
The top-level query you show is just a query against the Mutation type (or another type specified in the schema). Like any other query, it can have multiple fields. At a technical level, the only special thing about GraphQL mutations is that, if you do have multiple fields, they execute sequentially.
Also like other queries, if you want to request the same field multiple times (run similarly-named mutations) you need to use an alias to disambiguate the results.
mutation {
  createFaceMask: createProduct(data: { title: "Face Mask" }) { id }
  createHandSanitizer: createProduct(data: { title: "Hand Sanitizer" }) { id }
}
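For more than a handful of products, writing the aliases by hand gets tedious. The sketch below generates one mutation document with an aliased createProduct field per product. The helper name and the product0, product1, ... alias scheme are made up for illustration, and it assumes every value is a plain string or number, so that json.dumps produces a usable GraphQL literal.

```python
import json


def build_bulk_mutation(products):
    """Return one GraphQL mutation with an aliased createProduct field per product."""
    fields = []
    for i, product in enumerate(products):
        # json.dumps gives a double-quoted, escaped string (or a bare number),
        # which is assumed here to be acceptable as a GraphQL literal.
        args = ", ".join(f"{key}: {json.dumps(value)}" for key, value in product.items())
        # Each field gets a unique alias so the results do not collide.
        fields.append(f"  product{i}: createProduct(data: {{ {args} }}) {{ id }}")
    return "mutation {\n" + "\n".join(fields) + "\n}"


print(build_bulk_mutation([
    {"title": "Face Mask", "price": 1000},
    {"title": "Hand Sanitizer", "price": 500},
]))
```

In real use, passing the data through GraphQL variables rather than inlining literals is usually the safer choice; the string-building version is only meant to show the aliasing pattern.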

Updating Apollo Cache for external query after entity mutation

I'd like to display a list of users based on a filtered Apollo query:
// pseudo query
if (user.name === 'John') return true
User names can be edited. Unfortunately, if I change a user's name to James, the user is still displayed in my list (the query is set to fetch from the cache first).
I tried to update this by using cache.modify:
cache.modify({
  id: cache.identify({
    __typename: 'User',
    id: userId,
  }),
  fields: {
    name: () => {
      return newName; // newName is the new value from the input
    },
  },
});
But I'm not quite sure this is the correct way to do so.
Of course, if I use refetchQueries: ['myUsers'], I get the correct result, but obviously, this is a bit overkill to refetch the whole list every time a name is updated.
Did I miss something?

Can I get the specific field when I have an Elasticsearch index exception?

I got an error message when I tried to index into Elasticsearch:
got response {'took': 1, 'errors': True, 'items': [{'index': {'_index': 'mapstore-development-products', '_type': 'product', '_id': '776896', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'number_format_exception', 'reason': 'empty String'}}}}]}
Is there any way to tell from this response which specific fields are empty?
This problem is caused by (1) not creating an index mapping before posting the first record and (2) having a field that is an empty string in your first record but that you later want to be a number.
Elasticsearch dynamically assigns types to your fields if you do not specify a mapping in advance. I don't think you need to track down the specific empty-string field. Instead, create the mapping for your index before posting the first record, which should resolve the problem. In your case, you might need to create a new index with the correct mapping and then reindex.
See this:
https://discuss.elastic.co/t/how-to-resolve-numberformatexception-issues-caused-by-an-empty-string/5633
https://medium.com/@eyaldahari/reindex-elasticsearch-documents-is-easier-than-ever-103f63d411c
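If you still want to know which fields are the culprits before reindexing, one option is to check the document on the client side against the mapping you intend to use. The sketch below is only an illustration: the field names and sample document are hypothetical, it looks at top-level fields only, and it takes the "properties" part of a mapping as a plain dict.

```python
# Elasticsearch numeric field types.
NUMERIC_TYPES = {
    "long", "integer", "short", "byte",
    "double", "float", "half_float", "scaled_float",
}


def empty_numeric_fields(properties, doc):
    """Return names of fields mapped as numeric whose value in doc is an empty string."""
    return [
        name
        for name, spec in properties.items()
        if spec.get("type") in NUMERIC_TYPES and doc.get(name) == ""
    ]


# Hypothetical mapping properties and document, for illustration only.
properties = {
    "price": {"type": "double"},
    "stock": {"type": "integer"},
    "title": {"type": "text"},
}
doc = {"price": "", "stock": 3, "title": ""}
print(empty_numeric_fields(properties, doc))
```

Running a check like this before building the bulk request lets you drop or null out the offending values instead of discovering them through a number_format_exception.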

GraphQL query based on a specific value of a field

I want to be able to retrieve the latest release from GitHub for a specific repo using their GraphQL API. To do that, I need to get the latest release where isDraft and isPrerelease are false. I have managed to get the first part, but can't figure out how to do the "where" part of the query.
Here is the basic query I have gotten (https://developer.github.com/v4/explorer/):
{
  repository(owner: "paolosalvatori", name: "ServiceBusExplorer") {
    releases(first: 1, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes {
        name
        tagName
        resourcePath
        isDraft
        isPrerelease
      }
    }
  }
}
Which returns:
{
  "data": {
    "repository": {
      "releases": {
        "nodes": [
          {
            "name": "3.0.4",
            "tagName": "3.0.4",
            "resourcePath": "/paolosalvatori/ServiceBusExplorer/releases/tag/3.0.4",
            "isDraft": false,
            "isPrerelease": false
          }
        ]
      }
    }
  }
}
I can't seem to find a way to do this. Part of the reason is that I am new to GraphQL (this is my first time trying to write a query) and I am not sure how to frame my question.
Can one only "query" based on those types that support arguments (like repository and releases below)? Seems like there should be a way to specify a filter on the field values.
Repository: https://developer.github.com/v4/object/repository/
Releases: https://developer.github.com/v4/object/releaseconnection/
Node: https://developer.github.com/v4/object/release/
> Can one only "query" based on those types that support arguments
Yes: GraphQL doesn't define a generic query language in the same way, say, SQL does. You can't sort or filter a field result in ways that aren't provided by the server and the application schema.
> I want to be able to retrieve the latest [non-draft, non-prerelease] release from GitHub for a specific repo using their GraphQl API.
As you've already found, the releases field on the Repository type doesn't have an option to sort or filter on these fields. Instead, you can iterate through the releases one at a time with multiple GraphQL calls. These would individually look like
query NextRelease($owner: String!, $name: String!, $after: String) {
  repository(owner: $owner, name: $name) {
    releases(first: 1,
             orderBy: {field: CREATED_AT, direction: DESC},
             after: $after) {
      pageInfo { endCursor }
      nodes { ... ReleaseData } # from the question
    }
  }
}
Run this in the same way you're running it now (I've split out the information identifying the repository into separate GraphQL variables). You can leave off the after variable for the first call. If (as in your example) it returns "isDraft": false, "isPrerelease": false, you're set. If not, you need to try again: take the value of endCursor from the response and run the same query, passing that cursor value as the after variable.
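The retry loop described above can be sketched as follows. This is an illustration under assumptions, not a finished client: fetch_page is a hypothetical callable that runs the query with the given after cursor (None on the first call) and returns the releases object from the response as a dict, and the cursor is assumed to be read from the endCursor field of pageInfo.

```python
def latest_stable_release(fetch_page, max_pages=50):
    """Walk releases newest-first until a non-draft, non-prerelease one appears.

    fetch_page(after) is a hypothetical callable that executes the GraphQL
    query with the given cursor and returns the `releases` dict.
    """
    after = None
    for _ in range(max_pages):
        releases = fetch_page(after)
        nodes = releases["nodes"]
        if not nodes:
            # Ran past the last release without finding a match.
            return None
        for node in nodes:
            if not node["isDraft"] and not node["isPrerelease"]:
                return node
        # Pass the cursor from this page as `after` on the next call.
        after = releases["pageInfo"]["endCursor"]
        if after is None:
            return None
    # Give up after max_pages calls to avoid looping indefinitely.
    return None
```

The max_pages cap is a design choice to bound the number of API calls; requesting more than one release per page would reduce the number of round trips at the cost of a larger response.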
{
  repository(owner: "paolosalvatori", name: "ServiceBusExplorer") {
    releases(first: 1, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes(isDraft: false, isPrerelease: false) {
        name
        tagName
        resourcePath
        isDraft
        isPrerelease
      }
    }
  }
}
Alternatively, please have a look at the GraphQL directives @skip and @include, as it is sometimes necessary to skip or include fields based on a condition.
The @skip directive, when used on fields or fragments, allows us to exclude fields based on some condition.
The @include directive allows us to include fields based on some condition.
GraphQL Directives

GridFS - product images & thumbnails - what is the best DB structure?

I have an e-commerce website running on MongoDB + GridFS.
Each product may have up to 5 images.
Each image has 3 thumbnails of different sizes.
I need advice on the best DB structure for this.
Currently I'm thinking of storing the image IDs and the thumbnail IDs (IDs from GridFS) in each product:
{
  '_id': 1,
  'title': 'Some Product',
  'images': [
    {'id': '11', 'thumbs': {'small': '22', 'medium': '33'}},
    {'id': '44', 'thumbs': {'small': '55', 'medium': '66'}}
  ]
}
Or would it be better to store a path in GridFS?
{
  '_id': '111',
  'filename': '1.jpg',
  'path': 'product/988/image/111/'
},
{
  '_id': '222',
  'filename': '1.jpg',
  'path': 'product/988/image/111/thumbnail_small'
},
{
  '_id': '333',
  'filename': '1.jpg',
  'path': 'product/988/image/111/thumbnail_large'
}
UPDATE: "path" field in GridFS is a "fake" path, not a real one. Just a quick way to find all related files. It is cheaper to have 1 indexed field than several fields with compound indexes.
If you are going to store the images with GridFS within MongoDB, I would go with the first one.
The second schema doesn't seem right. GridFS is meant to store files, so with the id of the image you don't need any path within those documents. If you simply want to store the path of the file, embed it directly in your primary collection, so you avoid the overhead of a somewhat useless collection.
In general, see Storing Images in DB - Yea or Nay? on whether you should store your images in a DBMS at all.
Additionally, if you only save the path, you may need few to no changes should you switch to a CDN to deliver your images to the customer.
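One practical consequence of the first schema is that everything needed to fetch or delete a product's files sits in the product document itself. As a small sketch, assuming the document shape from the question (the helper name is made up for illustration), collecting all GridFS ids for a product could look like this:

```python
def gridfs_ids_for_product(product):
    """Collect every GridFS file id referenced by a product: each image plus its thumbnails."""
    ids = []
    for image in product.get("images", []):
        ids.append(image["id"])
        # Thumbnails are stored as a size -> GridFS id mapping.
        ids.extend(image.get("thumbs", {}).values())
    return ids


# Product document shaped like the first schema in the question.
product = {
    "_id": 1,
    "title": "Some Product",
    "images": [
        {"id": "11", "thumbs": {"small": "22", "medium": "33"}},
        {"id": "44", "thumbs": {"small": "55", "medium": "66"}},
    ],
}
print(gridfs_ids_for_product(product))
```

With the ids in hand, a single lookup of the product is enough to know which GridFS files to load or clean up, with no separate path index to maintain.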
