Is it possible to use the get query to query for a value other than the primary key? It seems I can only pass in the id column, so is there no way to perform the get query with a column other than the id column?
Or can I just do this with a normal list query using maybe a filter or something? Thanks for any help!
Yes, you can issue any DynamoDB query through AppSync. This tutorial provides a good introduction that covers PutItem, UpdateItem, and GetItem: https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-dynamodb-resolvers.html. If you need to get multiple values by a key, use the DynamoDB Query operation: https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html#aws-appsync-resolver-mapping-template-reference-dynamodb-query.
When using DynamoDB you need to bake your access patterns into the key schema(s) of your DynamoDB table and secondary indexes. For example, if you want to get a record by "email", create a table whose hash key is "email"; you can then perform a GetItem operation by "email". If you need to query by email and have the records sorted by date, you need a table where the hash key is "email" and the sort key is "date", and so on.
You can also create secondary indexes and, if you want to get a bit more advanced, create composite index values and overload indexes to optimize your DynamoDB tables for your access patterns. Check out the DynamoDB docs to learn more: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes.html.
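For illustration, a minimal GetItem request mapping template for such an "email" hash key might look like the following (a sketch only; the table layout and the email argument name are assumptions, not from your schema):
{
    "version": "2017-02-28",
    "operation": "GetItem",
    "key": {
        "email": $util.dynamodb.toDynamoDBJson($ctx.args.email)
    }
}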
Is there a way to retrieve the underlying values of LowCardinality types in ClickHouse? I would also need to retrieve a mapping (in a separate query) of the underlying values to the logical values. I've tried using lowCardinalityIndices and lowCardinalityKeys, but it appears that the indices -> keys mapping returned by those functions is many-to-many.
Thank you!
Your question does not make sense.
A column with LowCardinality does not have a single dictionary. Each part has multiple dictionaries for a single LowCardinality column. That's why you observe this lowCardinalityIndices/lowCardinalityKeys behaviour.
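A quick way to see the per-part behaviour (a rough sketch; the table and values are made up) is to insert rows in separate inserts, so they land in separate parts, and then inspect the indices:
CREATE TABLE lc_demo (s LowCardinality(String)) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO lc_demo VALUES ('a'), ('b');  -- first insert, first part
INSERT INTO lc_demo VALUES ('b'), ('c');  -- second insert, second part
SELECT s, lowCardinalityIndices(s) FROM lc_demo;
-- The same logical value ('b') can come back with different indices,
-- because each part keeps its own dictionaries.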
I'm new to RethinkDB and I love it, but I found some problems when I tried to optimize my query and make it work on bigger datasets.
The problem is simple.
I need to filter my "event" table by timestamp (row.to) and by tag (row.tags), order by timestamp (row.from), and then slice for pagination.
row.tags has a multi index and works well!
row.from and row.to are start/end time of Event.
The slow query (tested on 100k entries) is this:
r.db("test").table("event")
.getAll(r.args(["148a6e03-b6c3-4092-afa0-3b6d1a4555cd","7008d4b0-d859-49f3-b9e0-2e121f000ddf"]), {"index": "tags"})
.filter(function(row) {return row("to").ge(r.epochTime(1480460400));})
.orderBy(r.asc("from"))
.slice(0,20)
I created an index on 'from' and tried to do
.orderBy(r.asc("from"),{index:'from'})
but I get
e: Indexed order_by can only be performed on a TABLE or TABLE_SLICE in:
I have already read about the problems with index intersection in RethinkDB, but maybe I am missing something and there is a way to do this simple task.
Thank you.
The reason RethinkDB complains is this:
getAll returns a selection. When filter is applied to a selection it returns a selection. When orderBy is applied to a selection the index parameter can't be used (it can only be used when orderBy is applied to a table).
orderBy can be applied to a table, sequence, or selection. Only when it's applied to a table can the index parameter be used. This makes sense, as the index is updated when rows are added to and removed from the table.
In your case, you are applying orderBy to the result of filter, which is a selection. In order to sort a selection the database needs to:
read all elements into memory (by default the max is 100,000 elements)
sort them using the provided function or field
and it can't use an index in this case.
The way to improve your query might be to sort the table first and then apply the filter. You will be able to use the index in this case.
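A sketch of that reordering (untested; it reuses the 'from' index and the values from your query, and replaces getAll with a filter on the tags) might look like this:
r.db("test").table("event")
  .orderBy({index: r.asc("from")})
  .filter(function(row) {
    return row("to").ge(r.epochTime(1480460400)).and(
      row("tags").contains("148a6e03-b6c3-4092-afa0-3b6d1a4555cd")
        .or(row("tags").contains("7008d4b0-d859-49f3-b9e0-2e121f000ddf")));
  })
  .slice(0, 20)
Note that this walks the table in 'from' order instead of using the tags index, so whether it ends up faster depends on how selective the tag filter is.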
Given two indexes, I'm trying to sort the first based on values of the second.
For example, Index 1 ('Products') has fields id, name. Index 2 ('Prices') has fields id, price.
I'm struggling to figure out how to sort 'Products' by 'Prices'.price, assuming the ids match. The reason for this quest is that, hypothetically, the 'Products' index becomes very large (with duplicate ids) and updating all documents becomes expensive.
Elasticsearch is a document-based store rather than a column-based store. What you're looking for is a way to JOIN the two indices, but this is not supported in Elasticsearch. The 'Elasticsearch way' of storing these documents is to have one index that contains all relevant data. If you're worried about update procedures taking very long, look into creating an index with an alias. When you need to do a major update, do it on a new index and only when you're done switch the alias target to the new index; this will allow you to update your data seamlessly.
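A minimal sketch of that alias switch using the _aliases API (the index and alias names here are made up):
curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions" : [
        { "remove" : { "index" : "products_v1", "alias" : "products" } },
        { "add" : { "index" : "products_v2", "alias" : "products" } }
    ]
}'
Clients keep querying the "products" alias, so from their point of view the swap happens in one step.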
I am using Elasticsearch as a document database, and each record I create has a GUID id that the system uses as the record id. The business side wants to offer a feature that lets users have their own automatic file-name convention based on the date and how many records have been created so far that day/month.
What I need is to prevent duplicate user file names. Is there a way to set up an indexed field to be unique, like a SQL unique constraint?
You'd need to use the field that is supposed to be unique as the id for your documents. By default a new document with an existing id overrides the existing document with the same id, but you can switch to op_type=create in order to get back an error if a document with the same id already exists.
There's no way to get the same behaviour with arbitrary fields though; only the _id field works that way. I would probably consider handling this logic in the application layer instead of within Elasticsearch.
One solution is to use the uniqueId field value as the document ID and use op_type=create while storing documents in ES. With this you can make sure your uniqueId field has a unique value and will not be overridden by another document with the same value.
For this, the Elasticsearch documentation says:
The index operation also accepts an op_type that can be used to force a create operation, allowing for "put-if-absent" behavior. When create is used, the index operation will fail if a document by that id already exists in the index.
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/es_index/es_type/unique_a?op_type=create' -d '{
"user" : "kimchy",
"uniqueId" : "unique_a"
}'
If you run the above request the first time it is OK, but running it again will give you an error.
You can map the column you want to have a unique constraint on to _id.
Here is a sample river that uses PostgreSQL. You can change the database driver / DB URL according to your usage.
curl -XPUT 'localhost:9200/_river/simple_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "strategy" : "simple",
        "poll" : "1s",
        "driver" : "org.postgresql.Driver",
        "url" : "jdbc:postgresql://DB-URL/DB-INSTANCE",
        "user" : "USERNAME",
        "password" : "PASSWORD",
        "sql" : "select t.id as _id, t.name from topic as t",
        "digesting" : true
    },
    "index" : {
        "index" : "jdbc",
        "type" : "topic_jdbc_river1"
    }
}'
As of ES 7.5, there is no such extra "constraint" to ensure uniqueness using a custom field in the mapping.
But you can still work around it with your own application UUID, which can be used directly as the _id (which is implicitly unique) to achieve your goal.
PUT <your_index_name>/_doc/<your_app_uuid>
{
"a_field": "a_value"
}
Another approach might be to generate the string you store in the field that should be unique by incorporating an auto-incrementing integer. This way you ensure from the start that your field values are unique.
You would put your file name together like this:
<current day/month>_<auto-incremented integer>
Auto-incrementing integers are not supported by Elasticsearch per se, but you can mimic them using this approach. If you happen to use Node.js, you can use the es-sequence module.
I am stuck on a data operation where I want to sort the results of a query by a Ref field.
Let's say I have the following data objects:
EmployeeDO {Long id, String name, Ref refCompany}
CompanyDO {Long id, String name}
Now I want to query employees ordered by company name.
I tried the query
Query<EmployeeDO> query = ofy().load().type(EmployeeDO.class).order("refCompany");
Obviously this did not sort results by company name, but it did compile successfully.
Is such sorting possible this way, or is there some other workaround I can try?
You can order by refCompany if you @Index refCompany, but it won't sort by company name; it will order by the key (if you aren't using @Parent, that is just an id order).
There are two 'usual' choices:
Load the data into RAM and sort there. This is what an RDBMS would do internally. It's not exactly true that GAE doesn't support joins; it's just that you're the query planner.
Denormalize and pre-index the companyName. Put @Index companyName in the EmployeeDO. This is what you would do with an RDBMS if the magic sorting performed poorly (say, there are too many employees).
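A rough sketch of the second option with Objectify (the companyName copy is an extra field you have to keep in sync whenever the company is renamed):
import com.googlecode.objectify.Ref;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;
import static com.googlecode.objectify.ObjectifyService.ofy;

@Entity
public class EmployeeDO {
    @Id Long id;
    String name;
    Ref<CompanyDO> refCompany;
    @Index String companyName;  // denormalized copy of CompanyDO.name, kept in sync on writes
}

// Query employees ordered by the denormalized company name:
List<EmployeeDO> employees = ofy().load()
        .type(EmployeeDO.class)
        .order("companyName")
        .list();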