I have a set of logs like this:
{"action": "action_a", "username": "user_1", "ts": "2021-09-10T02:18:14.103Z"}
{"action": "action_a", "username": "user_2", "ts": "2021-09-10T02:17:14.103Z"}
{"action": "action_a", "username": "user_1", "ts": "2021-09-09T02:16:14.103Z"}
{"action": "action_a", "username": "user_1", "ts": "2021-09-08T02:15:14.103Z"}
Is it possible to group the logs by date and username to get a count of unique users per day?
I currently have the following query:
sum by (username) (count_over_time({job="my-app"} | json | username != "" [$__range]))
This effectively gives me a pie chart of unique users for the current dashboard range. Instead, I would like a time-series to show the number of unique users per day (for the past 60 days, for example). In other words, the Daily Active Users (DAU).
With the above logs, I need something like:
{"2021-09-10": 2}
{"2021-09-09": 1}
{"2021-09-08": 1}
Is this possible with Loki or should I look to something like Elasticsearch instead?
To aggregate by day with LogQL, you can replace $__range with your desired time grouping interval.
E.g.
sum by (username) (
  count_over_time(
    {job="my-app"} | json | username != ""
    [1d]  # <-- put your desired time interval here instead of $__range
  )
)
You can then use a time series visualization to display the data.
Useful links:
supported time intervals
LogQL metric queries documentation
Maybe creating a new label using label_format would do the trick?
Labels format expression
sum by (day) (
  count_over_time(
    {job="my-app"} | json | label_format day=`{{.ts | substr 0 10}}` | username != ""
    [$__range]
  )
)
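If what you ultimately need is the number of distinct users per day rather than line counts, a possible (untested) extension of the label_format idea is to aggregate twice: first per day and username, then counting usernames per day:
count by (day) (
  sum by (day, username) (
    count_over_time(
      {job="my-app"} | json | label_format day=`{{.ts | substr 0 10}}` | username != ""
      [$__range]
    )
  )
)
Run as an instant query (e.g. in a table panel), this should yield one value per day label, matching the desired {"2021-09-10": 2} style output.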
Problem
With UA, I was able to get the number of sessions per region per minute (a combination of minute, region, and sessions), but is this not possible with GA4?
If not, is there any plan to support this in the future?
Detail
I ran GA4 Query Explorer with date, hour, minute, region in Dimensions and sessions in Metrics.
But I got an incompatibility error.
What I tried
I have checked with the GA4 Dimensions & Metrics Explorer and confirmed that the combination of minute and region is not possible.
Checked by code execution (updated 2022/05/16 15:35)
I ran it with Ruby.
require "google/analytics/data/v1beta/analytics_data"
require 'pp'
require 'json'
ENV['GOOGLE_APPLICATION_CREDENTIALS'] = '' # service acount file path
client = ::Google::Analytics::Data::V1beta::AnalyticsData::Client.new
LIMIT_SIZE = 1000
offset = 0
loop do
request = Google::Analytics::Data::V1beta::RunReportRequest.new(
property: "properties/xxxxxxxxx",
date_ranges: [
{ start_date: '2022-04-01', end_date: '2022-04-30'}
],
dimensions: %w(date hour minute region).map { |d| { name: d } },
metrics: %w(sessions).map { |m| { name: m } },
keep_empty_rows: false,
offset: offset,
limit: LIMIT_SIZE
)
ret = client.run_report(request)
dimension_headers = ret.dimension_headers.map(&:name)
metric_headers = ret.metric_headers.map(&:name)
puts (dimension_headers + metric_headers).join(',')
ret.rows.each do |row|
puts (row.dimension_values.map(&:value) + row.metric_values.map(&:value)).join(',')
end
offset += LIMIT_SIZE
break if ret.row_count <= offset
end
The result was an error.
3:The dimensions and metrics are incompatible.. debug_error_string:{"created":"#1652681913.393028000","description":"Error received from peer ipv4:172.217.175.234:443","file":"src/core/lib/surface/call.cc","file_line":953,"grpc_message":"The dimensions and metrics are incompatible.","grpc_status":3}
There is an error in your code. Make sure you use the actual API dimension name and not the UI name: the correct name of that dimension is dateHourMinute, not date, hour, and minute.
dimensions: %w(dateHourMinute).map { |d| { name: d } },
The Query Explorer runs this request just fine.
Limited use of the region dimension
As for region: as the error message states, the dimensions and metrics are incompatible. The issue is that dateHourMinute cannot be used together with region. Switch to date or dateHour.
At the time of writing this is a beta API. I have sent a message off to Google to find out whether this is working as intended or whether it may change.
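For reference, a sketch of how the request from the question could be adjusted, assuming hour-level granularity is acceptable; only the dimensions line really changes (offset and LIMIT_SIZE come from the original loop, and whether dateHour and region are compatible for your property can be double-checked in the Dimensions & Metrics Explorer):
request = Google::Analytics::Data::V1beta::RunReportRequest.new(
  property: "properties/xxxxxxxxx",
  date_ranges: [
    { start_date: '2022-04-01', end_date: '2022-04-30' }
  ],
  dimensions: %w(dateHour region).map { |d| { name: d } }, # dateHour instead of date/hour/minute
  metrics: %w(sessions).map { |m| { name: m } },
  keep_empty_rows: false,
  offset: offset,
  limit: LIMIT_SIZE
)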
I'm looking for a method in pure Ruby of taking a timezone string such as "US/Eastern" and using it with a string representation of a timestamp to convert into a timestamp with timezone.
All I've found so far is strptime which supports a short timezone name like "EST" or a timezone offset like "-0500", but not a full timezone string.
I need to be able to run this as part of a Logstash ruby filter. I have some JSON that contains timestamps with no timezone that looks like this:
{
  "event": {
    "created": "2021-02-15_11-26-29"
  },
  "Accounts": [
    {
      "Name": "operator",
      "Caption": "SERVER\\operator",
      "Domain": "SERVER",
      "PasswordChangeable": "False",
      "PasswordRequired": "True",
      "PasswordExpires": "False",
      "Disabled": "False",
      "Lockout": "False",
      "LocalAccount": "True",
      "FullName": "operator",
      "Status": "OK",
      "LastLogon": "07/08/2020 2:14:13 PM"
    },
    ...
  ]
}
For the event.created field I can just use a date filter:
date {
  match => [ "[event][created]", "yyyy-MM-dd_HH-mm-ss" ]
  timezone => "${TIMEZONE}"
  target => "[event][created]"
}
Where ${TIMEZONE} is an environment variable holding the full timezone name, e.g. "US/Eastern". But for the Accounts.LastLogon field I can't use a date filter, because it resides in a list of variable length, so I have to resort to a ruby filter. The closest I was able to get is this:
ruby {
  code => 'event.get("[Accounts]").each_index {|x|
    tz = "-0500"
    last_logon_str = event.get("[Accounts][#{x}][LastLogon]")
    last_logon = DateTime.strptime(last_logon_str + " " + tz, "%m/%d/%Y %I:%M:%S %p %z")
    event.set("[users][#{x}][last_logon]", last_logon.strftime("%Y-%m-%dT%H:%M:%S%z"))
  }'
}
But of course this is using a hardcoded timezone offset and not the variable containing the full name.
The docs I looked at for the Time object at https://ruby-doc.org/core-2.6/Time.html state that a Time object can be created using a timezone object:
tz = timezone("Europe/Athens") # Eastern European Time, UTC+2
Time.new(2002, 10, 31, 2, 2, 2, tz) #=> 2002-10-31 02:02:02 +0200
I could then use that to extract the offset, but I couldn't find a reference to timezone anywhere.
What's the best way to handle this?
A timezone object like that comes from the TZInfo library, which you need to require. The following code
ruby {
  code => '
    require "tzinfo"
    tz = TZInfo::Timezone.get(ENV["TIMEZONE"])
    event.set("offset", tz.observed_utc_offset())
  '
}
gets me "offset" => -18000, when $TIMEZONE is "US/Eastern".
I'm using Elasticsearch 6.8 and Python 3.
I'm running on my laptop with a single node, and there are no threads/processes that insert/update/delete docs in the index while I'm running the multi search.
I'm running the following multi search command:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch()
search_arr = []

# search-1
search_arr.append({'index': 'test1', 'type': 'type1'})
search_arr.append({"query": {"term": {"confidence": "1"}}})

# search-2
search_arr.append({'index': 'test1', 'type': 'type1'})
search_arr.append({"query": {"match_all": {}}, 'from': 0, 'size': 2})

request = ''
for each in search_arr:
    request += '%s \n' % json.dumps(each)

res = es.msearch(body=request)

print("First Query, num of results = ", res['responses'][0]['hits']['total'])
print("Second Query, num of results = ", res['responses'][1]['hits']['total'])
Each time I run this code, I get different results (as I wrote before, there are no processes that insert/delete/update documents).
Why am I getting different results each time?
And what do I need to do in order to fix it and get consistent results?
I found the cause of the problem.
I needed to pass refresh=True the first time I added new data.
(I bulk-indexed thousands of new documents and ran the multi search right after.)
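For example, a minimal sketch of that fix with the Python client; the index/type names mirror the msearch example above, and the documents shown are just placeholders:
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
docs = [{"_index": "test1", "_type": "type1", "_source": {"confidence": "1"}}]

# refresh="wait_for" (or refresh=True) makes the bulk-indexed documents visible
# to search before the call returns, so a following msearch sees a stable view.
helpers.bulk(es, docs, refresh="wait_for")

# Alternatively, refresh the index explicitly before searching:
es.indices.refresh(index="test1")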
Here is my JSON file:
[{
  "name": "chetan",
  "age": 23,
  "hobby": ["cricket", "football"]
}, {
  "name": "raj",
  "age": 24,
  "hobby": ["cricket", "golf"]
}]
Here is the Go code I tried, but it didn't work as expected.
id:= "ket"
c.EnsureIndexKey("hobby")
err = c.Find(bson.M{"$hobby": bson.M{"$search": id,},}).All(&result)
It gives this error:
$hobby exit status 1
From $search I'm assuming you're trying to use a text index/search, but in your case that won't work: a text index doesn't support partial matches. You can still use a regex to find those documents, but performance-wise it probably isn't a wise choice unless the query can utilize an index, which in your case won't happen (only anchored, case-sensitive prefix patterns such as /^ket/ can use an index, and you need a substring match).
Still, you could achieve what you want with:
id := "ket"
regex := bson.M{"$regex": bson.RegEx{Pattern: id}}
err = c.Find(bson.M{"hobby": regex}).All(&result)
TableName: people

id   | name | age | location
id_1 | A    | 23  | New Zealand
id_2 | B    | 12  | India
id_3 | C    | 26  | Singapore
id_4 | D    | 30  | Turkey

Keys: id -> hash, age -> range
Question 1
I’m trying to execute a query: “Select * from people where age > 25”
I can get queries like “Select age from people where id = id_1 and age > 25” to work, which is not what I need; I just need to select all values.
And if I don't need age to be a range index, how should I modify my query params to just return the list of records matching the criterion age > 25?
Question 2
AWS throws an error when either the KeyConditionExpression line or the KeyConditions block in the code below is commented out:
Query Error: ValidationException: Either the KeyConditions or KeyConditionExpression parameter must be specified in the request.
status code: 400, request id: []
Is the KeyConditions/KeyConditionExpression parameter required? Does it mean that I cannot query the table on an attribute that's not part of the index?
func queryDynamo() {
    log.Println("Enter queryDynamo")
    svc := dynamodb.New(nil)

    params := &dynamodb.QueryInput{
        TableName: aws.String("people"), // Required
        Limit:     aws.Long(3),
        // IndexName: aws.String("localSecondaryIndex"),
        ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
            ":v_age": { // Required
                N: aws.String("25"),
            },
            ":v_ID": {
                S: aws.String("NULL"),
            },
        },
        FilterExpression: aws.String("age >= :v_age"),
        // KeyConditionExpression: aws.String("id = :v_ID and age >= :v_age"),
        KeyConditions: map[string]*dynamodb.Condition{
            "age": { // Required
                ComparisonOperator: aws.String("GT"), // Required
                AttributeValueList: []*dynamodb.AttributeValue{
                    { // Required
                        N: aws.String("25"),
                    },
                    // More values...
                },
            },
            "id": { // Required
                ComparisonOperator: aws.String("EQ"), // Required
                // AttributeValueList: []*dynamodb.AttributeValue{
                //     S: aws.String("NOT_NULL"),
                // },
            },
            // More values...
        },
        Select:           aws.String("ALL_ATTRIBUTES"),
        ScanIndexForward: aws.Boolean(true),
    }

    // Get the response and print it out.
    resp, err := svc.Query(params)
    if err != nil {
        log.Println("Query Error: ", err.Error())
    }

    // Pretty-print the response data.
    log.Println(awsutil.StringValue(resp))
}
DynamoDB is a NoSQL system, so you will not be able to retrieve all of the records based on a condition on a non-indexed field without doing a table scan.
A table scan will cause DynamoDB to go through every single record in the table, which for a big table will be very expensive in either time (it is slow) or money (provisioned read IOPS).
Using a filter is the correct approach and will allow the operation to complete if you switch from a query to a scan. A query must always specify the hash key.
A word of warning though: if you plan on using a scan operation on a table of more than just a few (fewer than 100) items that is exposed in a front end, you will be disappointed with the results. If this is some type of cron job or backend reporting task where response time doesn't matter, this is an acceptable approach, but be careful not to exhaust all of your IOPS and impact front-end applications.
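For illustration, a rough sketch of the scan-with-filter variant; it reuses the question's table and attribute names but assumes the current aws-sdk-go v1 API (session.Must, plain aws.String values) rather than the preview-era aws.Long/dynamodb.New(nil) calls shown above:
package main

import (
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
)

func scanPeopleOlderThan25() {
    sess := session.Must(session.NewSession())
    svc := dynamodb.New(sess)

    params := &dynamodb.ScanInput{
        TableName: aws.String("people"),
        // The filter runs after items are read, so the scan still consumes
        // read capacity for every item in the table.
        FilterExpression: aws.String("age > :v_age"),
        ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
            ":v_age": {N: aws.String("25")},
        },
    }

    resp, err := svc.Scan(params)
    if err != nil {
        log.Println("Scan Error: ", err.Error())
        return
    }
    log.Println("Matched items:", len(resp.Items))
}

func main() {
    scanPeopleOlderThan25()
}
A complete implementation would also follow resp.LastEvaluatedKey to page through results once the table grows beyond a single 1 MB scan page.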