I am trying to write a rule but am running into an issue. I managed to extract the following as my input:
myData:= [{"Key": "use", "Value": "1"}, {"Key": "use", "Value": "2"}, {"Key": "att1", "Value": "3"}]
I am trying to count the number of times a Key with the value "use" appears. However, when I do:
p := {keep| keep:= myData[_]; myData.Key == "use"}
I assumed this would create a set of everything I would like to keep, but the playground errors with:
1 error occurred: policy.rego:24: rego_type_error: undefined ref: data.play.myData.Key
data.play.myData.Key
I hoped I could collect them in p and then do count(p) > 1 to check whether more than one is listed.
In your set comprehension for p, you're iterating over the objects in myData, assigning each element to keep, but then you assert on myData.Key instead of keep.Key. I think what you're looking for is
p := {keep| keep := myData[_]; keep.Key == "use"}
Be aware that it's a set comprehension, so p would be the same for these two inputs:
myData:= [{"Key": "use", "Value": "1"}]
myData:= [{"Key": "use", "Value": "1"}, {"Key": "use", "Value": "1"}]
You could use an array comprehension (p := [ keep | keep := ... ]) if that's not what you want.
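To make the difference concrete, here's a minimal sketch you could paste into the playground (the package name is taken from the error message above; depending on your OPA version, the last rule may need to be written more_than_one if count(q) > 1):

package play

# input with a duplicate entry, as in the second example above
myData := [{"Key": "use", "Value": "1"}, {"Key": "use", "Value": "1"}]

p := {keep | keep := myData[_]; keep.Key == "use"}  # set: count(p) == 1, the duplicate collapses
q := [keep | keep := myData[_]; keep.Key == "use"]  # array: count(q) == 2, one entry per match

more_than_one { count(q) > 1 }  # true with the array, not with the set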
I'm having huge problems with the end portion of a regex in TextMate:
It looks like the end match becomes part of the text that's captured between begin and end.
Trying to apply multiple endings with one negative lookbehind has proved unsuccessful.
Here is some example code:
property_name: {
    test1: [1, 50, 5000]
    test2: something ;;
    test3: [
        1,
        50,
        5000
    ]
    test4: "string"
    test5: [
        "text",
        "text2"
    ]
    test6: something2
    test7: something3
}
I'm using the following code:
"begin": "\\b([a-z_]+):",
"beginCaptures": {
"1": {
"name" : "parameter.name"
}
}
"end": "(?<!,)\\n(?!\\])",
"patterns": [
{
"name": "parameter.value",
"match": "(.+)"
}
]
My logic for the end regular expression is to consider the match ended if there's a newline, but only if it's not preceded by a comma (a list of values in an array) or followed by a closing square bracket (the last value in an array).
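Broken down, the intent of that end pattern is:

(?<!,)   # the newline must not be preceded by a comma (the value list continues)
\n       # match the newline itself
(?!\])   # and the newline must not be followed by a closing square bracket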
Unfortunately it's not working as expected.
What I would like to achieve is that property_name and all the test# keys are matched as parameter.name, and the values are matched as parameter.value, apart from ;;
Can somebody please tell me what the Elasticsearch documentation means by a path relative to the config directory? I don't see any in my ES installation. I need to find a stop-words file which is defined in an ES index like "stopwords_path": "stopwords/slovak.txt", but I can't find any file with this name. Maybe Windows 10 is not able to find it because its search is really poor. Thanks a lot.
As written in the documentation, you should create the file slovak.txt according to this syntax:
A path (either relative to config location, or absolute) to a
stopwords file configuration. Each stop word should be in its own
"line" (separated by a line break). The file must be UTF-8 encoded.
So you should create a slovak.txt file like this:
a
aby
aj
ak
aká
akáže
aké
akého
akéhože
akej
akejže
akému
akémuže
akéže
ako
akom
akomže
akou
akouže
akože
akú
akúže
aký
akých
akýchže
akým
akými
akýmiže
akýmže
akýže
ale
alebo
ani
áno
asi
avšak
až
ba
bez
bezo
bol
bola
boli
bolo
buď
bude
budem
budeme
budeš
budete
budú
by
byť
cez
cezo
čej
či
čí
čia
čie
čieho
čiemu
čím
čími
čiu
čo
čoho
čom
čomu
čou
čože
ďalší
ďalšia
ďalšie
ďalšieho
ďalšiemu
ďalších
ďalším
ďalšími
ďalšiu
ďalšom
ďalšou
dnes
do
ešte
ho
hoci
i
iba
ich
im
iná
iné
iného
inej
inému
iní
inom
inú
iný
iných
iným
inými
ja
je
jeho
jej
jemu
ju
k
ká
kam
kamže
každá
každé
každého
každému
každí
každou
každú
každý
každých
každým
každými
káže
kde
ké
keď
keďže
kej
kejže
kéže
kie
kieho
kiehože
kiemu
kiemuže
kieže
koho
kom
komu
kou
kouže
kto
ktorá
ktoré
ktorej
ktorí
ktorou
ktorú
ktorý
ktorých
ktorým
ktorými
ku
kú
kúže
ký
kýho
kýhože
kým
kýmu
kýmuže
kýže
lebo
leda
ledaže
len
ma
má
majú
mal
mala
mali
mám
máme
máš
mať
máte
medzi
mi
mňa
mne
mnou
moja
moje
mojej
mojich
mojim
mojimi
mojou
moju
možno
môcť
môj
môjho
môže
môžem
môžeme
môžeš
môžete
môžu
mu
musí
musia
musieť
musím
musíme
musíš
musíte
my
na
nad
nado
najmä
nám
nami
nás
náš
naša
naše
našej
nášho
naši
našich
našim
našimi
našou
ne
neho
nech
nej
nejaká
nejaké
nejakého
nejakej
nejakému
nejakom
nejakou
nejakú
nejaký
nejakých
nejakým
nejakými
nemu
než
nič
ničím
ničoho
ničom
ničomu
nie
niečo
niektorá
niektoré
niektorého
niektorej
niektorému
niektorom
niektorou
niektorú
niektorý
niektorých
niektorým
niektorými
nielen
nich
nim
ním
nimi
no
ňom
ňou
ňu
o
od
odo
on
oň
ona
oňho
oni
ono
ony
po
pod
podľa
podo
pokiaľ
popod
popri
potom
poza
práve
pre
prečo
pred
predo
preto
pretože
pri
s
sa
seba
sebe
sebou
sem
si
sme
so
som
ste
sú
svoj
svoja
svoje
svojho
svojich
svojim
svojím
svojimi
svojou
svoju
ta
tá
tak
taká
takáto
také
takéto
takej
takejto
takého
takéhoto
takému
takémuto
takí
taký
takýto
takú
takúto
takže
tam
táto
teba
tebe
tebou
teda
tej
tejto
ten
tento
ti
tí
tie
tieto
tiež
títo
to
toho
tohto
tohoto
tom
tomto
tomu
tomuto
toto
tou
touto
tu
tú
túto
tvoj
tvoja
tvoje
tvojej
tvojho
tvoji
tvojich
tvojim
tvojím
tvojimi
ty
tých
tým
tými
týmto
u
už
v
vám
vami
vás
váš
vaša
vaše
vašej
vášho
vaši
vašich
vašim
vaším
veď
viac
vo
však
všetci
všetka
všetko
všetky
všetok
vy
z
za
začo
začože
zo
že
This file has to be inside ES_PATH_CONF, which on Linux is /etc/elasticsearch/ and on Windows is C:\ProgramData\Elastic\Elasticsearch\config. You then follow relative path notation. So if the file is C:\ProgramData\Elastic\Elasticsearch\config\slovak.txt, you should set your path this way:
"stopwords_path": "slovak.txt"
If you put it inside C:\ProgramData\Elastic\Elasticsearch\config\synonym\slovak.txt, you would set:
"stopwords_path": "synonym/slovak.txt"
What the documentation means is that you can provide either an absolute path or a path relative to the config directory, pointing to a text file that defines your own stop words.
If you use a relative path, the file should be inside the config folder of Elasticsearch, where your elasticsearch.yml is present.
If you choose an absolute path, then you can store the file in any location where Elasticsearch has access.
I just reproduced your issue and used the GET Settings API to find the current location of this file.
For example:
GET yourindex/_settings
returns the path which you gave while creating this setting:
{
  "stopwords": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "stopwords",
        "creation_date": "1587374021579",
        "analysis": {
          "filter": {
            "my_stop": {
              "type": "stop",
              "stopwords": [
                "and",
                "is",
                "the"
              ],
              "stopwords_path": "opster.txt"  <-- this is the file location, which in this case is relative
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "EQyF7JydTXGXoebh52yNpg",
        "version": {
          "created": "7060199"
        }
      }
    }
  }
}
Update: I also tried an example with an absolute path on my tar installation of Elasticsearch on an Ubuntu EC2 machine, and the same GET index settings call shows it.
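For example (the absolute path below is made up for illustration):

"stopwords_path": "/home/ubuntu/elasticsearch/config/slovak.txt"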
Is the data below in a well-known format, or is this a custom format invented by the generator?
[{
"tmsId": "MV006574730000",
"rootId": "11214341",
"subType": "Feature Film",
"title": "Doctor Strange 3D",
"releaseYear": 2016,
"releaseDate": "2016-11-04",
"titleLang": "en",
"descriptionLang": "en",
"entityType": "Movie",
"genres": ["Action", "Adventure", "Fantasy"],
"longDescription": "Dr. Stephen Strange's (Benedict Cumberbatch) life changes after a car accident robs him of the use of his hands.
When traditional medicine fails him, he looks for healing, and hope,
in a mysterious enclave. He quickly learns that the enclave is at the
front line of a battle against unseen dark forces bent on destroying
reality. Before long, Strange is forced to choose between his life of
fortune and status or leave it all behind to defend the world as the
most powerful sorcerer in existence.",
"shortDescription": "Dr. Stephen Strange discovers the world of magic after meeting the Ancient One.",
"topCast": ["Benedict Cumberbatch", "Chiwetel Ejiofor", "Rachel McAdams"],
"directors": ["Scott Derrickson"],
"officialUrl": "http://marvel.com/doctorstrange",
"ratings": [{
"body": "Motion Picture Association of America",
"code": "PG-13"
}],
Well, this is indeed JSON. I suppose the chunk of data you're giving us here is not the complete data, because some closing brackets are missing. If you delete the last comma "," and append "}]", then as you can see it passes validation in JSONLint.
You can try this here: jsonlint.com
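If you want to check it programmatically instead, here's a minimal sketch in Python (the filename is hypothetical):

import json

with open("movie.json") as f:  # hypothetical file holding the corrected data
    movies = json.load(f)      # raises ValueError if the JSON is invalid
print(movies[0]["title"])      # "Doctor Strange 3D"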
This issue is probably due to my noobishness to ELK, Python, and Unicode.
I have an index containing logstash-digested logs, including a field 'host_req', which contains a host name. Using Elasticsearch-py, I'm pulling that host name out of the record, and using it to search in another index.
However, if the hostname contains multibyte characters, it fails with a UnicodeDecodeError. Exactly the same query works fine when I enter it from the command line with 'curl -XGET'. The unicode character is a lowercase 'a' with a diaeresis (two dots). The UTF-8 value is C3 A4, and the unicode code point seems to be 00E4 (the language is Swedish).
These curl commands work just fine from the command line:
curl -XGET 'http://localhost:9200/logstash-2015.01.30/logs/_search?pretty=1' -d ' { "query" : {"match" :{"req_host" : "www.utkl\u00E4dningskl\u00E4derna.se" }}}'
curl -XGET 'http://localhost:9200/logstash-2015.01.30/logs/_search?pretty=1' -d ' { "query" : {"match" :{"req_host" : "www.utklädningskläderna.se" }}}'
They find and return the record
(the second line shows how the hostname appears in the log I pull it from, with the lowercase 'a' with a diaeresis in two places)
I've written a very short Python script to show the problem: it uses hardwired queries, printing them and their type, then trying to use them in a search.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import elasticsearch
es = elasticsearch.Elasticsearch()
if __name__=="__main__":
#uq = u'{ "query": { "match": { "req_host": "www.utklädningskläderna.se" }}}' # raw utf-8 characters. does not work
#uq = u'{ "query": { "match": { "req_host": "www.utkl\u00E4dningskl\u00E4derna.se" }}}' # quoted unicode characters. does not work
#uq = u'{ "query": { "match": { "req_host": "www.utkl\uC3A4dningskl\uC3A4derna.se" }}}' # quoted utf-8 characters. does not work
uq = u'{ "query": { "match": { "req_host": "www.facebook.com" }}}' # non-unicode. works fine
print "uq", type(uq), uq
result = es.search(index="logstash-2015.01.30",doc_type="logs",timeout=1000,body=uq);
if result["hits"]["total"] == 0:
print "nothing found"
else:
print "found some"
If I run it as shown, with the 'facebook' query, it's fine - the output is:
$python testutf8b.py
uq <type 'unicode'> { "query": { "match": { "req_host": "www.facebook.com" }}}
found some
Note that the query string 'uq' is unicode.
But if I use the other three strings, which include the Unicode characters, it blows up. For example, with the second line, I get:
$python testutf8b.py
uq <type 'unicode'> { "query": { "match": { "req_host": "www.utklädningskläderna.se" }}}
Traceback (most recent call last):
File "testutf8b.py", line 15, in <module>
result = es.search(index="logstash-2015.01.30",doc_type="logs",timeout=1000,body=uq);
File "build/bdist.linux-x86_64/egg/elasticsearch/client/utils.py", line 68, in _wrapped
File "build/bdist.linux-x86_64/egg/elasticsearch/client/__init__.py", line 497, in search
File "build/bdist.linux-x86_64/egg/elasticsearch/transport.py", line 307, in perform_request
File "build/bdist.linux-x86_64/egg/elasticsearch/connection/http_urllib3.py", line 82, in perform_request
elasticsearch.exceptions.ConnectionError: ConnectionError('ascii' codec can't decode byte 0xc3 in position 45: ordinal not in range(128)) caused by: UnicodeDecodeError('ascii' codec can't decode byte 0xc3 in position 45: ordinal not in range(128))
$
Again, note that the query string is a unicode string (yes, the source code line is the one with the \u00E4 characters).
I'd really like to resolve this. I've tried various combinations of uq = uq.encode("utf-8") and uq = uq.decode("utf-8"), but it doesn't seem to help. I'm starting to wonder if there's an issue in the elasticsearch-py library.
thanks!
pt
PS: This is under Centos 7, using ES 1.5.0. The logs were digested into ES under a slightly older version, using logstash-1.4.2
Basically, you don't need to pass body as a string; use native Python data structures, or transform them on the fly. Give it a try, please:
>>> import elasticsearch
>>> es = elasticsearch.Elasticsearch()
>>> es.index(index='unicode-index', body={'host': u'www.utklädningskläderna.se'}, doc_type='log')
{u'_id': u'AUyGJuFMy0qdfghJ6KwJ',
u'_index': u'unicode-index',
u'_type': u'log',
u'_version': 1,
u'created': True}
>>> es.search(index='unicode-index', body={}, doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 1.0,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 1.0,
u'total': 1},
u'timed_out': False,
u'took': 5}
>>> es.search(index='unicode-index', body={'query': {'match': {'host': u'www.utklädningskläderna.se'}}}, doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 0.30685282,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 0.30685282,
u'total': 1},
u'timed_out': False,
u'took': 122}
>>> import json
>>> body={'query': {'match': {'host': u'www.utklädningskläderna.se'}}}
>>> es.search(index='unicode-index', body=body, doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 0.30685282,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 0.30685282,
u'total': 1},
u'timed_out': False,
u'took': 4}
>>> es.search(index='unicode-index', body=json.dumps(body), doc_type='log')
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'AUyBTz5CsiBSSvubLioQ',
u'_index': u'unicode-index',
u'_score': 0.30685282,
u'_source': {u'host': u'www.utkl\xe4dningskl\xe4derna.se'},
u'_type': u'log'}],
u'max_score': 0.30685282,
u'total': 1},
u'timed_out': False,
u'took': 5}
>>> json.dumps(body)
'{"query": {"match": {"host": "www.utkl\\u00e4dningskl\\u00e4derna.se"}}}'