Querying a parameter that’s not an index on DynamoDB - go

TableName: people

id   | name | age | location
-----|------|-----|------------
id_1 | A    | 23  | New Zealand
id_2 | B    | 12  | India
id_3 | C    | 26  | Singapore
id_4 | D    | 30  | Turkey

Keys: id -> hash (partition key), age -> range (sort key)
Question 1
I’m trying to execute the query: “Select * from people where age > 25”
I can get queries like “Select age from people where id = id_1 and age > 25” to work, but that is not what I need; I just need to select all values.
And if I don’t need age to be a range index, how should I modify my query params to just return the list of records matching the criterion age > 25?
Question 2
AWS throws an error when either the KeyConditions block or the KeyConditionExpression line is commented out:
Query Error: ValidationException: Either the KeyConditions or KeyConditionExpression parameter must be specified in the request.
status code: 400, request id: []
Is the KeyConditions/KeyConditionExpression parameter required? Does it mean that I cannot query the table on an attribute that is not part of the index?
func queryDynamo() {
    log.Println("Enter queryDynamo")
    svc := dynamodb.New(nil)

    params := &dynamodb.QueryInput{
        TableName: aws.String("people"), // Required
        Limit:     aws.Long(3),
        // IndexName: aws.String("localSecondaryIndex"),
        ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
            ":v_age": { // Required
                N: aws.String("25"),
            },
            ":v_ID": {
                S: aws.String("NULL"),
            },
        },
        FilterExpression: aws.String("age >= :v_age"),
        // KeyConditionExpression: aws.String("id = :v_ID and age >= :v_age"),
        KeyConditions: map[string]*dynamodb.Condition{
            "age": { // Required
                ComparisonOperator: aws.String("GT"), // Required
                AttributeValueList: []*dynamodb.AttributeValue{
                    { // Required
                        N: aws.String("25"),
                    },
                    // More values...
                },
            },
            "id": { // Required
                ComparisonOperator: aws.String("EQ"), // Required
                // AttributeValueList: []*dynamodb.AttributeValue{
                //     {S: aws.String("NOT_NULL")},
                // },
            },
            // More values...
        },
        Select:           aws.String("ALL_ATTRIBUTES"),
        ScanIndexForward: aws.Boolean(true),
    }

    // Get the response and print it out.
    resp, err := svc.Query(params)
    if err != nil {
        log.Println("Query Error: ", err.Error())
    }

    // Pretty-print the response data.
    log.Println(awsutil.StringValue(resp))
}

DynamoDB is a NoSQL system, so you will not be able to retrieve all records matching a condition on a non-indexed field without doing a table scan.
A table scan makes DynamoDB go through every single record in the table, which for a big table is very expensive in either time (it is slow) or money (provisioned read IOPS).
Using a filter is the correct approach and will allow the operation to complete, but only if you switch from a query to a scan: a query must always specify the hash key.
A word of warning, though: if you plan on using a scan operation on a table of more than just a few (fewer than 100) items that is exposed in a front end, you will be disappointed with the results. If this is some type of cron job or backend reporting task where response time doesn't matter, this is an acceptable approach, but be careful not to exhaust all of your IOPS and impact front-end applications.
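For illustration, here is a minimal sketch of the scan approach, in the same pre-1.0 aws-sdk-go style as the question (the #age alias and the exact ScanInput fields are my assumptions, not code from the original post):

func scanDynamo() {
    svc := dynamodb.New(nil)

    params := &dynamodb.ScanInput{
        TableName: aws.String("people"),
        // Alias the attribute name defensively, in case "age" collides
        // with a DynamoDB reserved word.
        ExpressionAttributeNames: map[string]*string{
            "#age": aws.String("age"),
        },
        ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
            ":v_age": {
                N: aws.String("25"),
            },
        },
        // A scan has no key condition, so the filter alone selects the
        // matching records. Note the filter is applied after the read,
        // so you still pay read capacity for scanning the whole table.
        FilterExpression: aws.String("#age > :v_age"),
    }

    resp, err := svc.Scan(params)
    if err != nil {
        log.Println("Scan Error: ", err.Error())
        return
    }
    // Scans are paginated: for large tables, repeat the call with
    // ExclusiveStartKey set to resp.LastEvaluatedKey until it is empty.
    log.Println(awsutil.StringValue(resp))
}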

Related

Not getting right value from state store

I am trying to use a state store to merge multiple Kafka streams. As part of it, I am consuming messages from multiple topics and putting them in the state store with keys, for example:
message from topic1 saved in the state store as key_p1 and value1
message from topic2 saved in the state store as key_p2 and value2
message from topic3 saved in the state store as key_p3 and value3.
To meet the SLA, I tried to query the state store to verify whether I had received the mandatory transactions (for example on topic2 and topic3, with keys key_p2 and key_p3).
val priorityTxns: KeyValueIterator[String, ValueAndTimestamp[String]] = kvStore.range(key_p2, key_p3)
Though I have the messages in the state store, most of the time (not always) I get only one message.
Is there a way to refresh the store before querying?
Code in my transform method:
override def transform(kafkaKey: String, value: String): KeyValue[String, String] = {
    var key = ""
    var taggedMsg = ""
    if ((kafkaKey != null) && (kafkaKey != "")) {
        key = kafkaKey + txnKeys.get(msgTag).get // this will be like key_p1, key_p2, key_p3 etc.
        kvStore.put(key, ValueAndTimestamp.make(taggedMsg, context.timestamp))
        log.info("saving value in state store with key " + key + " and value " + kvStore.get(key))
    }
    new KeyValue(key, taggedMsg) // the declared return type requires a KeyValue
}
and the code in init(context: ProcessorContext):
val priorityTxns: KeyValueIterator[String, ValueAndTimestamp[String]] = kvStore.range(key_p2, key_p3)
val tempP3 = kvStore.get(rangeKey + "_p3")
val tempP2 = kvStore.get(rangeKey + "_p2")
while (priorityTxns.hasNext) {
    log.info("available priority keys " + priorityTxns.peekNextKey())
    val e = priorityTxns.next()
}

DynamoDB Stream - Lambda to process formula

I've got a DynamoDB Table that contains attributes similar to:
{
    "pk": "pk1",
    "values": {
        "v2": 5,
        "v1": 90
    },
    "formula": "(v1 + v2) / 100",
    "calc": 5.56
}
I've got a Lambda that is triggered by the DDB Stream. Is there any way to calculate the "calc" attribute based on the formula and values? Ideally I'd like to do it during the update_item call that updates this table every time the Stream sends a message.
Your Lambda function will be triggered with an event like this one, which you can process:
def lambda_handler(event, context):
    records = event['Records']
    for record in records:
        new_record = record['dynamodb']['NewImage']
        calc = new_record.get('calc')
        # do your stuff here
        calc = some_functions()
    return event
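As a sketch of the same idea in Go (my own illustration, not from the answer above): the handler below uses the aws-lambda-go events types and, as a simplifying assumption, hard-codes the (v1 + v2) / 100 formula from the example instead of parsing the "formula" attribute, which would require an expression parser.

package main

import (
    "context"
    "log"
    "strconv"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
)

// handler recomputes "calc" for every record in the stream batch.
func handler(ctx context.Context, e events.DynamoDBEvent) error {
    for _, record := range e.Records {
        newImage := record.Change.NewImage
        values, ok := newImage["values"]
        if !ok {
            continue
        }
        // Hypothetical: hard-coded formula from the example; a real
        // implementation would parse newImage["formula"].String().
        v1, err1 := strconv.ParseFloat(values.Map()["v1"].Number(), 64)
        v2, err2 := strconv.ParseFloat(values.Map()["v2"].Number(), 64)
        if err1 != nil || err2 != nil {
            continue
        }
        calc := (v1 + v2) / 100
        log.Printf("pk=%s calc=%f", newImage["pk"].String(), calc)
        // Persist with UpdateItem here; compare against the stored calc
        // first so the write does not re-trigger the stream forever.
    }
    return nil
}

func main() {
    lambda.Start(handler)
}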

Is it possible to aggregate Loki logs by day on Grafana?

I have a set of logs like this:
{"action": "action_a", "username": "user_1", "ts": "2021-09-10T02:18:14.103Z"}
{"action": "action_a", "username": "user_2", "ts": "2021-09-10T02:17:14.103Z"}
{"action": "action_a", "username": "user_1", "ts": "2021-09-09T02:16:14.103Z"}
{"action": "action_a", "username": "user_1", "ts": "2021-09-08T02:15:14.103Z"}
Is it possible to group the logs by date and username to get a count of unique users per day?
I currently have the following query:
sum by (username) (count_over_time({job="my-app"} | json | username != "" [$__range]))
This effectively gives me a pie chart of unique users for the current dashboard range. Instead, I would like a time-series to show the number of unique users per day (for the past 60 days, for example). In other words, the Daily Active Users (DAU).
With the above logs, I need something like:
{"2021-09-10": 2}
{"2021-09-09": 1}
{"2021-09-08": 1}
Is this possible with Loki or should I look to something like Elasticsearch instead?
To aggregate by day with LogQL, you can replace $__range with your desired time grouping interval.
E.g.
sum by (username) (
  count_over_time(
    {job="my-app"} | json | username != ""
  [1d]) # <-- put your desired time interval here instead of $__range
)
You can then use a time series visualization to show the data.
Useful links:
supported time intervals
LogQL metric queries documentation
Maybe creating a new label using label_format would do the trick?
Labels format expression
sum by (day) (
  count_over_time({job="my-app"} | json | label_format day=`{{.ts | substr 0 10}}` | username != "" [$__range])
)

Golang - nested map doesn't support indexing on inner level, while outer is fine

I'm almost a Go newbie, and for the first time I have to ask a question about it, about a problem with interfaces, types and maps.
So, my starting point is a database query that retrieves an object like this one:
+-------------+---------------+----------+------------+
| category_id | category_name | group_id | group_name |
+-------------+---------------+----------+------------+
|           1 | Category1     |        1 | Group1     |
|           1 | Category1     |        2 | Group2     |
|           1 | Category1     |        3 | Group3     |
|           2 | Category2     |        4 | Group4     |
|           2 | Category2     |        5 | Group5     |
+-------------+---------------+----------+------------+
and my final goal is to have a JSON object where the groups belonging to the same category are nested under that category, like this one:
[
    {
        "id": 1,
        "name": "Category1",
        "groups": [
            {
                "id": 1,
                "name": "Group1"
            },
            {
                "id": 2,
                "name": "Group2"
            },
            {
                "id": 3,
                "name": "Group3"
            }
        ]
    },
    {
        "id": 2,
        "name": "Category2",
        "groups": [
            {
                "id": 4,
                "name": "Group4"
            },
            {
                "id": 5,
                "name": "Group5"
            }
        ]
    }
]
I don't want to use multiple queries, because this is just a part of the final query; I used just two fields to keep the example clear. In my actual situation I have 5 levels...
So I created a struct that should be used on all levels of my object, that implements an interface:
type NestedMapObjs interface {
    getOrderedKeys() []int
}
and the type that implements this interface, which has to be a map keyed by int in order to append elements to the correct map:
type BuilderMapObjs map[int]NestedMapObj
where NestedMapObj is:
type NestedMapObj struct {
    ID        int
    Name      *string
    NestedObj NestedMapObjs
}
So, in the method that builds the map object I want, I have no problem adding the first level of my object (Category), but I found some problems on the second level, the group one. In particular, this is my function that adds a new row:
func (m BuilderMapObjs) addNewRow(scanned audienceBuilderScannedObject) error {
    if _, ok := m[scanned.CategoryID]; !ok {
        var innerObjs BuilderMapObjs
        innerObjs = make(BuilderMapObjs, 0)
        m[scanned.CategoryID] = NestedMapObj{
            ID:        scanned.CategoryID,
            Name:      &scanned.CategoryName,
            NestedObj: innerObjs,
        }
    }
    if _, ok := m[scanned.CategoryID].NestedObj[scanned.GroupID]; !ok {
        m[scanned.CategoryID].NestedObj[scanned.GroupID] = NestedMapObj{
            ID:   scanned.GroupID,
            Name: &scanned.GroupName,
        }
    }
    return nil
}
(I know, I can refactor and make this code more readable, but this is not the point now...)
The problem arises when I try to get the inner object by its key, and when I try to add to it. This line:
m[scanned.CategoryID].NestedObj[scanned.GroupID]
produces this error: invalid operation: m[scanned.CategoryID].NestedObj[scanned.GroupID] (type NestedMapObjs does not support indexing)
Actually, I just found that with a better implementation, adding two more methods to the interface (getIndex and addToIndex), I could fix the problem, but I'd like to understand it.
Why do I get an error on the inner object and not on the outer one?
Thanks for reading until this point!
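For what it's worth, a minimal runnable sketch of what the compiler sees (my own illustration, not code from the post): the outer m has the concrete map type BuilderMapObjs, so indexing is allowed, while the NestedObj field has the static type of the interface NestedMapObjs, and Go never lets you index an interface value, whatever concrete type it happens to hold. A type assertion back to the concrete map type makes indexing legal again:

package main

import "fmt"

type NestedMapObjs interface {
    getOrderedKeys() []int
}

type NestedMapObj struct {
    ID        int
    Name      *string
    NestedObj NestedMapObjs
}

type BuilderMapObjs map[int]NestedMapObj

// stub implementation so BuilderMapObjs satisfies NestedMapObjs
func (m BuilderMapObjs) getOrderedKeys() []int { return nil }

func main() {
    name := "Category1"
    m := BuilderMapObjs{
        1: {ID: 1, Name: &name, NestedObj: make(BuilderMapObjs)},
    }

    // m[1].NestedObj[4] = NestedMapObj{ID: 4}
    // ^ compile error: m[1].NestedObj has the static type NestedMapObjs
    //   (the interface), and interface values cannot be indexed.

    // A type assertion recovers the concrete map type, so indexing works.
    // The map header is shared, so writing through inner also updates the
    // map stored inside m[1].
    inner := m[1].NestedObj.(BuilderMapObjs)
    inner[4] = NestedMapObj{ID: 4}

    fmt.Println(inner[4].ID) // prints 4
}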

Neo4j query: conditional match

I have this model:
Bob and Alice are Users.
CVI is a clinic.
Pluto is an animal.
The users have a property called identityId (CONSTRAINT UNIQUE) that identifies the user.
I would like to select the User with a given id only if the caller is that user itself (same identityId) or if there exists a SHARED_WITH relationship between Alice and Bob.
In terms of performance, is the query below the best query for that?
MATCH (u:User)
WHERE id(u) = {id} AND ((u.identityId = {identityId})
OR ((:User { identityId: {identityId} }) - [:OWNS] -> (:Clinic) <- [:SHARED_WITH] - (u)))
RETURN u
Example
Alice { id: 6, identityId: "5678"}
Bob { id: 3, identityId: "1234"}
Mallory { id: 5, identityId: "2222"}
First case: The caller is Alice
MATCH (u:User)
WHERE id(u) = 6 AND ((u.identityId = "5678")
OR ((:User { identityId: "5678" }) - [:OWNS] -> (:Clinic) <- [:SHARED_WITH] - (u)))
RETURN u
The u is Alice
Second case: The caller is Bob
MATCH (u:User)
WHERE id(u) = 6 AND ((u.identityId = "1234")
OR ((:User { identityId: "1234" }) - [:OWNS] -> (:Clinic) <- [:SHARED_WITH] - (u)))
RETURN u
The u is Alice
Third case: The caller is Mallory
MATCH (u:User)
WHERE id(u) = 6 AND ((u.identityId = "2222")
OR ((:User { identityId: "2222" }) - [:OWNS] -> (:Clinic) <- [:SHARED_WITH] - (u)))
RETURN u
The u is NULL (Mallory is neither the user herself nor a user Alice has shared with)
Taking into account your additional explanations:
// Get the user by id
MATCH (I:User) WHERE id(I) = {id}
// Find the user with the given {identityId} who shares with him
OPTIONAL MATCH (U:User {identityId: {identityId}})
              -[:OWNS]->()<-[:SHARED_WITH]-
              (I)
WITH I, COUNT(U) AS UC
// Test that the user has {identityId},
// or that someone with {identityId} shares with him
WHERE I.identityId = {identityId} OR UC > 0
RETURN I
