I have this JSON data, extracted from qBittorrent:
[
{
"hash": "333333333333333333333333333",
"name": "testtosearchcaseinsensitive",
"magnet_uri": "magnet:somedata",
"size": 1243989552,
"progress": 1.0,
"dlspeed": 0,
"upspeed": 0,
"priority": 0,
"num_seeds": 0,
"num_complete": 2,
"num_leechs": 0,
"num_incomplete": 32,
"ratio": 0.0,
"eta": "1.01:11:52",
"state": "stalledUP",
"seq_dl": false,
"f_l_piece_prio": false,
"category": "category",
"tags": "",
"super_seeding": false,
"force_start": false,
"save_path": "/data/path/",
"added_on": 1567358333,
"completion_on": 1567366287,
"tracker": "somedata",
"dl_limit": null,
"up_limit": null,
"downloaded": 1244073666,
"uploaded": 0,
"downloaded_session": 0,
"uploaded_session": 0,
"amount_left": 0,
"completed": 1243989552,
"ratio_limit": 1.0,
"seen_complete": 1567408837,
"last_activity": 1567366979,
"time_active": "1.01:00:41",
"auto_tmm": true,
"total_size": 1243989552,
"max_ratio": 1,
"max_seeding_time": 2880,
"seeding_time_limit": 2880
},
{
"hash": "44444444444444",
"name": "dontmatch",
"magnet_uri": "magnet:somedata",
"size": 2996838603,
"progress": 1.0,
"dlspeed": 0,
"upspeed": 0,
"priority": 0,
"num_seeds": 0,
"num_complete": 12,
"num_leechs": 0,
"num_incomplete": 0,
"ratio": 0.06452786606740063,
"eta": "100.00:00:00",
"state": "stalledUP",
"seq_dl": false,
"f_l_piece_prio": false,
"category": "category",
"tags": "",
"super_seeding": false,
"force_start": false,
"save_path": "/data/path/",
"added_on": 1566420155,
"completion_on": 1566424710,
"tracker": "some data",
"dl_limit": null,
"up_limit": null,
"downloaded": 0,
"uploaded": 193379600,
"downloaded_session": 0,
"uploaded_session": 0,
"amount_left": 0,
"completed": 2996838603,
"ratio_limit": -2.0,
"seen_complete": 4294967295,
"last_activity": 1566811636,
"time_active": "10.23:07:42",
"auto_tmm": true,
"total_size": 2996838603,
"max_ratio": -1,
"max_seeding_time": -1,
"seeding_time_limit": -2
}
]
I want to match all entries whose name contains some text, so in Bash I wrote the following, but I can't make it work.
Some declarations to start (in practice the data comes in via arguments, so I use $1):
TXTIWANT="test"
MYJSONDATA='…'   # here I put the JSON data above
Then the jq filter that doesn't work for me is this:
RESULTS=$(echo "$MYJSONDATA" | jq --raw-output --arg TOSEARCH "$TXTIWANT" '.[] | select(.name|test("$TOSEARCH.";"i")) .name')
But I always get either an error or all of the data back; I think $TOSEARCH is not being expanded.
Maybe there's a better way to search for a string inside a value?
What am I doing wrong?
The right syntax for variable (or filter) interpolation with jq looks like this:
"foo \(filter_or_var) bar"
In your case:
jq --raw-output --arg TOSEARCH "$TXTIWANT" '.[] | select(.name | test("\($TOSEARCH).";"i")) | .name'
Side note: By convention, environment variables (PAGER, EDITOR, ...) and internal shell variables (SHELL, BASH_VERSION, ...) are capitalized. All other variable names should be lowercase.
If (as suggested by the name TXTIWANT and by the example, as well as by the wording of the question) the value of "$TXTIWANT" is supposed to be literal text, then using test is problematic, as test will search for a regular expression.
Since it is not clear from the question why you are appending a period to TOSEARCH, I will ignore it in this first section.
So if you simply want to find the .name values that contain $TXTIWANT literally (ignoring case), then you could convert both .name and the value of $TXTIWANT to the same case, and then check for containment.
In jq, ignoring the mysterious ".", this could be done like so:
jq --raw-output --arg TOSEARCH "$TXTIWANT" '
($TOSEARCH|ascii_upcase) as $up
| .[]
| .name
| select(ascii_upcase|index($up))'
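For example, with the array from the question in $MYJSONDATA and TXTIWANT=test, this prints the one matching name (a usage sketch of the filter above):
RESULTS=$(echo "$MYJSONDATA" | jq --raw-output --arg TOSEARCH "$TXTIWANT" '
($TOSEARCH|ascii_upcase) as $up
| .[]
| .name
| select(ascii_upcase|index($up))')
echo "$RESULTS"   # testtosearchcaseinsensitive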
Search for non-terminal occurrence of $TXTIWANT ignoring case
If the "." signifies there must be an additional character after $TXTIWANT, then you could just add another select as follows:
($TOSEARCH|length) as $len
| ($TOSEARCH|ascii_upcase) as $up
| .[]
| .name
| (ascii_upcase|index($up)) as $ix
| select($ix)
| select($ix + $len < length)
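To run this multi-line filter, you can keep it in a file (say nonterminal.jq, a name chosen here for illustration) and pass it with jq's -f option:
echo "$MYJSONDATA" | jq --raw-output --arg TOSEARCH "$TXTIWANT" -f nonterminal.jq
# with TXTIWANT=test this still prints testtosearchcaseinsensitive,
# since more characters follow "test" in that name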
Related
I like to filter json files using jq:
jq . some.json
Given the json containing an array of objects:
{
"theList": [
{
"id": 1,
"name": "Horst"
},
{
"id": 2,
"name": "Fritz"
},
{
"id": 3,
"name": "Walter"
},
{
"id": 4,
"name": "Gerhart"
},
{
"id": 5,
"name": "Harmut"
}
]
}
I want to filter that list to only show the elements with id having the value 2 and 4, so the expected output is:
{
"id": 2,
"name": "Fritz"
},
{
"id": 4,
"name": "Gerhart"
}
How do I filter the json using jq? I have played around with select and map, yet couldn't get either of them to work, e.g.:
$ jq '.theList[] | select(.id == 2) or select(.id == 4)' array.json
true
From the docs:
jq '.[] | select(.id == "second")'
Input: [{"id": "first", "val": 1}, {"id": "second", "val": 2}]
Output: {"id": "second", "val": 2}
I think you can do something like this:
jq '.theList[] | select(.id == 2 or .id == 4)' array.json
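With the sample file above, this prints the two matching objects as a stream:
{
  "id": 2,
  "name": "Fritz"
}
{
  "id": 4,
  "name": "Gerhart"
}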
You could use select within map.
.theList | map(select(.id == (2, 4)))
Or more compact:
[ .theList[] | select(.id == (2, 4)) ]
Written that way it is a little inefficient, though, since the comparison is duplicated for every value being compared. It'll be more efficient, and possibly more readable, written this way:
[ .theList[] | select(any(2, 4; . == .id)) ]
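Note that, unlike the plain .theList[] form, both bracketed variants collect the results into a single array:
[
  {
    "id": 2,
    "name": "Fritz"
  },
  {
    "id": 4,
    "name": "Gerhart"
  }
]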
Using select(.id == (2, 4)) here is generally inefficient (see below).
If your jq has IN/1, then it can be used to achieve a more efficient solution:
.theList[] | select( .id | IN(2,4))
If your jq does not have IN/1, then you can define it as follows:
def IN(s): first(select(s == .)) // false;
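Putting the two together (the def is only needed on jq versions that lack the builtin):
jq 'def IN(s): first(select(s == .)) // false;
.theList[] | select(.id | IN(2,4))' array.json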
Efficiency
One way to see the inefficiency is to use debug. The following expression, for example, results in 10 calls to debug, whereas only 9 checks for equality are actually needed:
.theList[] | select( (.id == (2,3)) | debug )
["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",true]
{
"id": 2,
"name": "Fritz"
}
["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",true]
{
"id": 3,
"name": "Walter"
}
["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",false]
index/1
In principle, using index/1 should be efficient, but as of this writing (October 2017), its implementation, though fast (it is written in C), is inefficient.
Here is a solution using indices:
.theList | [ .[map(.id)|indices(2,4)[]] ]
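To see how this works, look at the intermediate steps (with the sample file as array.json):
jq '.theList | map(.id)' array.json
# [1, 2, 3, 4, 5]
jq '.theList | [ map(.id) | indices(2,4)[] ]' array.json
# [1, 3]
so the outer .[ ... ] picks the elements at positions 1 and 3, i.e. Fritz and Gerhart.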
I am trying to write a rule but am running into an issue. I managed to extract the following as my input:
myData:= [{"Key": "use", "Value": "1"}, {"Key": "use", "Value": "2"}, {"Key": "att1", "Value": "3"}]
I am trying to count the number of times an entry with the Key "use" appears. However, when I do:
p := {keep| keep:= myData[_]; myData.Key == "use"}
I assumed this would create a listing of everything I would like to keep, but the playground errors with:
1 error occurred: policy.rego:24: rego_type_error: undefined ref: data.play.myData.Key
data.play.myData.Key
I hoped I could collect them in p and then do count(p) > 1 to check whether more than one is listed.
In your set comprehension for p, you're iterating over the objects in myData, assigning each element to keep. Then, you assert something on myData.Key. I think what you're looking for is
p := {keep| keep := myData[_]; keep.Key == "use"}
Be aware that it's a set comprehension, so p would be the same for these two inputs:
myData:= [{"Key": "use", "Value": "1"}]
myData:= [{"Key": "use", "Value": "1"}, {"Key": "use", "Value": "1"}]
You could use an array comprehension (p := [ keep | keep := ... ]) if that's not what you want.
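A minimal sketch of the counting part, assuming the package name play from the error message (the names uses and more_than_one_use are chosen here for illustration); an array comprehension is used so duplicate entries still count:
package play

myData := [{"Key": "use", "Value": "1"}, {"Key": "use", "Value": "2"}, {"Key": "att1", "Value": "3"}]

uses := [keep | keep := myData[_]; keep.Key == "use"]

more_than_one_use { count(uses) > 1 }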
I have a list of dictionaries with each entry having the following structure
{
"id": 0,
"type": "notification",
"name": "jane doe",
"loc": {
"lat": 38.8239,
"long": 104.7001
},
"data": [
{
"type": "test",
"time": "Fri Aug 13 09:17:16 2021",
"df": 80000000,
"db": 1000000,
"tp": 92
},
{
"type": "real",
"time": "Sat Aug 14 09:21:30 2021",
"df": 70000000,
"db": 2000000,
"tp:": 97
}
]
}
I need to be able to sort this list by any of these keys: name, type, time, tp and return it in memory.
I understand how to sort by the top level keys sorted(json_list, key=lambda k:k['name']) or even nested keys. For instance by lat sorted(json_list, key=lambda k:k['loc']['lat'])
so currently I have a function that works for the case when sorting by name.
def sort_by(self, param, rev=False):
    if param == NAME:
        self.json_list = sorted(self.json_list, key=lambda k: k[param], reverse=rev)
    else:
        # need help here
I'm having trouble sorting by type, time, and tp. Notice the data key is also a list of dictionaries. I would like to leverage existing methods built into the standard lib if possible. I can provide more clarification if necessary
Update:
def sort_by(self, param, rev=False):
    if param == NAME:
        self.json_list = sorted(self.json_list, key=lambda k: k[param], reverse=rev)
    else:
        self.json_list = sorted(self.json_list, key=lambda k: k['data'][0][param], reverse=rev)
    return self.json_list
This works fine if there is only one item in the data list
If json_list[i]['data'] (for each i) only contains one dict, then the following should work; otherwise modifications are required. Note that data is a list even when it holds a single dict, so the [0] index is still needed:
sorted(json_list, key = lambda k: (
    k['name'], k['data'][0]['type'], k['data'][0]['time'], k['data'][0]['tp']
))
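A fuller sketch of the sort_by method under the same single-entry assumption (NAME and the key grouping are taken from the question; parsing the time string is an extra step so ordering is chronological rather than alphabetical):
from datetime import datetime

def sort_by(self, param, rev=False):
    if param == 'time':
        # e.g. "Fri Aug 13 09:17:16 2021" -> datetime, so sorting is chronological
        key = lambda k: datetime.strptime(k['data'][0][param], '%a %b %d %H:%M:%S %Y')
    elif param in ('type', 'tp'):
        key = lambda k: k['data'][0][param]   # keys nested in the first data entry
    else:
        key = lambda k: k[param]              # top-level keys such as name
    self.json_list = sorted(self.json_list, key=key, reverse=rev)
    return self.json_list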
This issue is probably due to my noobishness to ELK, Python, and Unicode.
I have an index containing logstash-digested logs, including a field 'host_req', which contains a host name. Using Elasticsearch-py, I'm pulling that host name out of the record, and using it to search in another index.
However, if the hostname contains multibyte characters, it fails with a UnicodeDecodeError. Exactly the same query works fine when I enter it from the command line with 'curl -XGET'. The unicode character is a lowercase 'a' with a diaeresis (two dots). The UTF-8 value is C3 A4, and the unicode code point seems to be 00E4 (the language is Swedish).
These curl commands work just fine from the command line:
curl -XGET 'http://localhost:9200/logstash-2015.01.30/logs/_search?pretty=1' -d ' { "query" : {"match" :{"req_host" : "www.utkl\u00E4dningskl\u00E4derna.se" }}}'
curl -XGET 'http://localhost:9200/logstash-2015.01.30/logs/_search?pretty=1' -d ' { "query" : {"match" :{"req_host" : "www.utklädningskläderna.se" }}}'
They find and return the record
(the second line shows how the hostname appears in the log I pull it from, with the lowercase 'a' with a diaeresis in two places)
I've written a very short Python script to show the problem: It uses hardwired queries, printing them and their type, then trying to use them
in a search.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import elasticsearch

es = elasticsearch.Elasticsearch()

if __name__ == "__main__":
    #uq = u'{ "query": { "match": { "req_host": "www.utklädningskläderna.se" }}}'           # raw utf-8 characters. does not work
    #uq = u'{ "query": { "match": { "req_host": "www.utkl\u00E4dningskl\u00E4derna.se" }}}'  # quoted unicode characters. does not work
    #uq = u'{ "query": { "match": { "req_host": "www.utkl\uC3A4dningskl\uC3A4derna.se" }}}'  # quoted utf-8 characters. does not work
    uq = u'{ "query": { "match": { "req_host": "www.facebook.com" }}}'                       # non-unicode. works fine
    print "uq", type(uq), uq
    result = es.search(index="logstash-2015.01.30", doc_type="logs", timeout=1000, body=uq)
    if result["hits"]["total"] == 0:
        print "nothing found"
    else:
        print "found some"
If I run it as shown, with the 'facebook' query, it's fine - the output is:
$python testutf8b.py
uq <type 'unicode'> { "query": { "match": { "req_host": "www.facebook.com" }}}
found some
Note that the query string 'uq' is unicode.
But if I use the other three strings, which include the Unicode characters, it blows up. For example, with the second line, I get:
$python testutf8b.py
uq <type 'unicode'> { "query": { "match": { "req_host": "www.utklädningskläderna.se" }}}
Traceback (most recent call last):
File "testutf8b.py", line 15, in <module>
result = es.search(index="logstash-2015.01.30",doc_type="logs",timeout=1000,body=uq);
File "build/bdist.linux-x86_64/egg/elasticsearch/client/utils.py", line 68, in _wrapped
File "build/bdist.linux-x86_64/egg/elasticsearch/client/__init__.py", line 497, in search
File "build/bdist.linux-x86_64/egg/elasticsearch/transport.py", line 307, in perform_request
File "build/bdist.linux-x86_64/egg/elasticsearch/connection/http_urllib3.py", line 82, in perform_request
elasticsearch.exceptions.ConnectionError: ConnectionError('ascii' codec can't decode byte 0xc3 in position 45: ordinal not in range(128)) caused by: UnicodeDecodeError('ascii' codec can't decode byte 0xc3 in position 45: ordinal not in range(128))
$
Again, note that the query string is a unicode string (yes, the source code line is the one with the \u00E4 characters).
I'd really like to resolve this. I've tried various combinations of uq = uq.encode("utf-8") and uq = uq.decode("utf-8"), but it doesn't seem to help. I'm starting to wonder if there's an issue in the elasticsearch-py library.
thanks!
pt
PS: This is under Centos 7, using ES 1.5.0. The logs were digested into ES under a slightly older version, using logstash-1.4.2
Basically, you don't need to pass body as a string; use native Python data structures, or transform them on the fly. Give this a try:
>>> import elasticsearch
>>> es = elasticsearch.Elasticsearch()
>>> es.index(index='unicode-index', body={'host': u'www.utklädningskläderna.se'}, doc_type='log')
{u'_id': u'AUyGJuFMy0qdfghJ6KwJ',
u'_index': u'unicode-index',
u'_type': u'log',
u'_version': 1,
u'created': True}
>>> es.search(index='unicode-index', body={}, doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 1.0,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 1.0,
u'total': 1},
u'timed_out': False,
u'took': 5}
>>> es.search(index='unicode-index', body={'query': {'match': {'host': u'www.utklädningskläderna.se'}}}, doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 0.30685282,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 0.30685282,
u'total': 1},
u'timed_out': False,
u'took': 122}
>>> import json
>>> body={'query': {'match': {'host': u'www.utklädningskläderna.se'}}}
>>> es.search(index='unicode-index', body=body, doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 0.30685282,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 0.30685282,
u'total': 1},
u'timed_out': False,
u'took': 4}
>>> es.search(index='unicode-index', body=json.dumps(body), doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 0.30685282,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 0.30685282,
u'total': 1},
u'timed_out': False,
u'took': 5}
>>> json.dumps(body)
'{"query": {"match": {"host": "www.utkl\\u00e4dningskl\\u00e4derna.se"}}}'