Can somebody please tell me what the Elasticsearch documentation means by a path relative to the config directory? I don't see one in my ES installation. I need to find a stop words file that is defined in an ES index as "stopwords_path": "stopwords/slovak.txt", but I can't find any file with this name. Maybe Windows 10 just can't find it, since its search is really poor. Thanks a lot.
As stated in the documentation, you should create the slovak.txt file yourself, following this syntax:
A path (either relative to config location, or absolute) to a
stopwords file configuration. Each stop word should be in its own
"line" (separated by a line break). The file must be UTF-8 encoded.
So you should create the slovak.txt file like this:
a
aby
aj
ak
aká
akáže
aké
akého
akéhože
akej
akejže
akému
akémuže
akéže
ako
akom
akomže
akou
akouže
akože
akú
akúže
aký
akých
akýchže
akým
akými
akýmiže
akýmže
akýže
ale
alebo
ani
áno
asi
avšak
až
ba
bez
bezo
bol
bola
boli
bolo
buď
bude
budem
budeme
budeš
budete
budú
by
byť
cez
cezo
čej
či
čí
čia
čie
čieho
čiemu
čím
čími
čiu
čo
čoho
čom
čomu
čou
čože
ďalší
ďalšia
ďalšie
ďalšieho
ďalšiemu
ďalších
ďalším
ďalšími
ďalšiu
ďalšom
ďalšou
dnes
do
ešte
ho
hoci
i
iba
ich
im
iná
iné
iného
inej
inému
iní
inom
inú
iný
iných
iným
inými
ja
je
jeho
jej
jemu
ju
k
ká
kam
kamže
každá
každé
každého
každému
každí
každou
každú
každý
každých
každým
každými
káže
kde
ké
keď
keďže
kej
kejže
kéže
kie
kieho
kiehože
kiemu
kiemuže
kieže
koho
kom
komu
kou
kouže
kto
ktorá
ktoré
ktorej
ktorí
ktorou
ktorú
ktorý
ktorých
ktorým
ktorými
ku
kú
kúže
ký
kýho
kýhože
kým
kýmu
kýmuže
kýže
lebo
leda
ledaže
len
ma
má
majú
mal
mala
mali
mám
máme
máš
mať
máte
medzi
mi
mňa
mne
mnou
moja
moje
mojej
mojich
mojim
mojimi
mojou
moju
možno
môcť
môj
môjho
môže
môžem
môžeme
môžeš
môžete
môžu
mu
musí
musia
musieť
musím
musíme
musíš
musíte
my
na
nad
nado
najmä
nám
nami
nás
náš
naša
naše
našej
nášho
naši
našich
našim
našimi
našou
ne
neho
nech
nej
nejaká
nejaké
nejakého
nejakej
nejakému
nejakom
nejakou
nejakú
nejaký
nejakých
nejakým
nejakými
nemu
než
nič
ničím
ničoho
ničom
ničomu
nie
niečo
niektorá
niektoré
niektorého
niektorej
niektorému
niektorom
niektorou
niektorú
niektorý
niektorých
niektorým
niektorými
nielen
nich
nim
ním
nimi
no
ňom
ňou
ňu
o
od
odo
on
oň
ona
oňho
oni
ono
ony
po
pod
podľa
podo
pokiaľ
popod
popri
potom
poza
práve
pre
prečo
pred
predo
preto
pretože
pri
s
sa
seba
sebe
sebou
sem
si
sme
so
som
ste
sú
svoj
svoja
svoje
svojho
svojich
svojim
svojím
svojimi
svojou
svoju
ta
tá
tak
taká
takáto
také
takéto
takej
takejto
takého
takéhoto
takému
takémuto
takí
taký
takýto
takú
takúto
takže
tam
táto
teba
tebe
tebou
teda
tej
tejto
ten
tento
ti
tí
tie
tieto
tiež
títo
to
toho
tohto
tohoto
tom
tomto
tomu
tomuto
toto
tou
touto
tu
tú
túto
tvoj
tvoja
tvoje
tvojej
tvojho
tvoji
tvojich
tvojim
tvojím
tvojimi
ty
tých
tým
tými
týmto
u
už
v
vám
vami
vás
váš
vaša
vaše
vašej
vášho
vaši
vašich
vašim
vaším
veď
viac
vo
však
všetci
všetka
všetko
všetky
všetok
vy
z
za
začo
začože
zo
že
This file has to be inside ES_PATH_CONF, which on Linux is /etc/elasticsearch/ and on Windows is C:\ProgramData\Elastic\Elasticsearch\config. Paths are then resolved relative to that directory. So if the file is C:\ProgramData\Elastic\Elasticsearch\config\slovak.txt, you should set your path this way:
"stopwords_path": "slovak.txt"
If you put it inside C:\ProgramData\Elastic\Elasticsearch\config\synonym\slovak.txt, you set:
"stopwords_path": "synonym/slovak.txt"
(Use a forward slash in the setting: a bare backslash is not a valid escape inside a JSON string, and Elasticsearch accepts forward slashes on Windows.)
What the documentation means is that you can provide either an absolute path or a relative path to a text file that defines your own stop words.
If you use a relative path, the file must be inside the config folder of Elasticsearch, where your elasticsearch.yml is present.
If you choose an absolute path, you can store the file in any location that Elasticsearch has access to.
I just reproduced your issue and used the GET index settings API to show the current location of this file.
For example:
GET yourindex/_settings
It returns the path which you gave while creating this setting:
{
  "stopwords": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "stopwords",
        "creation_date": "1587374021579",
        "analysis": {
          "filter": {
            "my_stop": {
              "type": "stop",
              "stopwords": [
                "and",
                "is",
                "the"
              ],
              "stopwords_path": "opster.txt" <-- this is the file location, which in this case is relative
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "EQyF7JydTXGXoebh52yNpg",
        "version": {
          "created": "7060199"
        }
      }
    }
  }
}
Update: I also tried an absolute path on my tar installation of Elasticsearch on an Ubuntu EC2 machine, and the same GET index settings call confirms it.
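The settings output then shows the absolute location instead, for example (the path below is illustrative):

"stopwords_path": "/home/ubuntu/elasticsearch/config/slovak.txt"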
Related
I have a list of cities described by their polygon in geoJSON files.
I would like to get a sample point inside each polygon.
Basic maths says the centre of gravity is inside the polygon, and that it suffices to sum all the longitudes and all the latitudes, then divide each by the number of points.
Full file to process (visualization is available on GitHub)
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[2.41101, 48.72605], [2.41554, 48.72656], [2.41718, 48.72791], [2.4211, 48.72953], [2.42603, 48.72824], [2.42756, 48.72865], [2.42922, 48.72723], [2.43133, 48.72646], [2.43404, 48.72665], [2.43513, 48.72409], [2.42554, 48.7227], [2.42072, 48.72105], [2.41426, 48.71782], [2.41327, 48.71869], [2.41582, 48.72086], [2.41238, 48.72193], [2.41136, 48.72325], [2.41101, 48.72605]]]
      },
      "properties": {
        "code": "94001",
        "nom": "Ablon-sur-Seine"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[2.41959, 48.81691], [2.4159, 48.81633], [2.40936, 48.81667], [2.40787, 48.81746
      },
      "properties": {
        "code": "94018",
        "nom": "Charenton-le-Pont"
      }
    },
    ...
  ]
}
I already have a command that computes the average of the longitudes of the first polygon's vertices.
$ curl -s https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/departements/94-val-de-marne/communes-94-val-de-marne.geojson \
> | jq '.features[0].geometry.coordinates[0][][0]' \
> | jq -s 'add/length'
2.4206944444444445
See https://unix.stackexchange.com/questions/13731/
Using jq and simple bash commands, how can I compute the sums of the longitudes and of the latitudes as well, and reinject the barycenter into the properties field in another geoJSON file?
Thank you.
So if I understood correctly, you're trying to get the averages of the first set of coordinates, then update the properties to store the result.
.features[] |= (
  (.geometry.coordinates[0]
   | length as $len
   | reduce .[] as [$x, $y] ([0, 0]; [.[0] + $x, .[1] + $y])
   | map(. / $len)) as $barycenter |
  .properties.barycenter = $barycenter
)
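A sketch of how you might apply it, assuming the filter is saved as barycenter.jq and the geoJSON as communes.json (both names are placeholders):

$ jq -f barycenter.jq communes.json > communes-with-barycenter.json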
Warning
The "center of gravity" of a polygon is in general not the same as the point defined in terms of the averages of the x and y co-ordinates of its vertices. See e.g. https://math.stackexchange.com/questions/3177/why-doesnt-a-simple-mean-give-the-position-of-a-centroid-in-a-polygon
The averages of the latitudes and of the longitudes
Here is a jq filter that, with a single invocation of jq and without redundancy, computes the averages of the latitudes and of the longitudes of each "feature":
.features[].geometry.coordinates[0]
| [ [.[][0]], [.[][1]] ]
| map(add/length)
With the -c command-line option, this produces a stream of arrays, one array for each "feature". The stream begins:
[2.4206944444444445,48.724651111111115]
[2.407614,48.82250133333333]
...
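For instance, combined with the curl command from the question (the -c flag keeps each array on one line):

$ curl -s https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/departements/94-val-de-marne/communes-94-val-de-marne.geojson \
  | jq -c '.features[].geometry.coordinates[0] | [ [.[][0]], [.[][1]] ] | map(add/length)'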
Of course there are other alternatives, but note that there is no need to use string interpolation to perform grouping, so there is usually no need for tr, which your first version used.
Here is my JSON file:
[{
  "name": "chetan",
  "age": 23,
  "hobby": ["cricket", "football"]
}, {
  "name": "raj",
  "age": 24,
  "hobby": ["cricket", "golf"]
}]
Here is the Go code I tried, but it didn't work as expected.
id:= "ket"
c.EnsureIndexKey("hobby")
err = c.Find(bson.M{"$hobby": bson.M{"$search": id,},}).All(&result)
It gives this error:
$hobby exit status 1
From $search I'm assuming you're trying to use a text index/search, but in your case that wouldn't work: a text index doesn't support partial matches. You can still use a regex to find those documents, but performance-wise it probably wouldn't be a wise choice unless the query can utilize an index, which in your case wouldn't happen (an unanchored regex cannot use one).
Still, you could achieve what you want with:
id := "ket"
regex := bson.M{"$regex": bson.RegEx{Pattern: id}}
err = c.Find(bson.M{"hobby": regex}).All(&result)
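For context, here is a self-contained sketch of the whole query using mgo; the connection string, database, and collection names are assumptions for illustration:

package main

import (
	"fmt"

	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

type Person struct {
	Name  string   `bson:"name"`
	Age   int      `bson:"age"`
	Hobby []string `bson:"hobby"`
}

func main() {
	// Assumed connection details, for illustration only.
	session, err := mgo.Dial("localhost")
	if err != nil {
		panic(err)
	}
	defer session.Close()
	c := session.DB("test").C("people")

	// Unanchored, case-sensitive substring match: "ket" matches "cricket".
	id := "ket"
	regex := bson.M{"$regex": bson.RegEx{Pattern: id}}

	var result []Person
	if err := c.Find(bson.M{"hobby": regex}).All(&result); err != nil {
		panic(err)
	}
	fmt.Println(result)
}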
I'm a bit of a beginner and think I've hit some unexpected behaviour or a bug, though it may well be operator error. Either way I'm stumped and don't know what to do.
I'm reading in a JSON string from
https://beta-api.betfair.com/exchange/betting/rest/v1/en/navigation/lhm.json
I'm parsing it with Json.NET (v6.0.3 from NuGet); I'll get to how in a minute. But I'm getting an error where two of the supposedly unique objects returned have the same ID, which is something of a problem. While trying to work out where I had messed up, I looked at the JSON string with the Visual Studio JSON Visualizer, and it shows two different IDs, as expected.
Edit
I've uploaded two pictures, but had to do it externally, and I've copied in the section of JSON that's relevant.
http://imgur.com/pk2hIJI,SZDSSLh
{
  "children": [
    {
      "children": [
        {
          "exchangeId": "1",
          "id": "1.114548892",
          "name": "Moneyline",
          "type": "MARKET"
        }
      ],
      "id": "27229997",
      "name": "Hamilton # Calgary",
      "type": "EVENT"
    },
    {
      "children": [
        {
          "exchangeId": "1",
          "id": "1.114548889",
          "name": "Moneyline",
          "type": "MARKET"
        }
      ],
      "id": "27229996",
      "name": "Toronto # Ottawa",
      "type": "EVENT"
    }
  ],
  "id": "74587734296",
  "name": "Games 18 July",
  "type": "GROUP"
},
To fetch the string I am using an object inherited from HttpClient, with:
BFresponce = Await Me.GetAsync(BetFairBetaAddress & RestAddress & Method)
Dim x = Await BFresponce.Content.ReadAsStringAsync 'not normally here, just so I can view the string
Return JsonConvertHelper.DeserializeObject(Of T)(Await BFresponce.Content.ReadAsStreamAsync())
With my own helper function:
Public Shared Function DeserializeObject(Of T)(stream As Stream) As T
Dim serializer As New JsonSerializer()
Using streamReader As New StreamReader(stream)
Return serializer.Deserialize(streamReader, GetType(T))
End Using
End Function
And the class being passed in as T is:
Namespace BetFairNS
Public Class NavigationData
Public Property name As String
Public Property id As Single
Public Property exchangeId As Integer
Public Property type As NavigationDataType
Public Property children As List(Of NavigationData)
End Class
Public Enum NavigationDataType
EVENT_TYPE
GROUP
[EVENT]
MARKET
RACE
End Enum
End Namespace
So the crux of it is: have I messed this up somewhere? Or if it's a bug, what do I do?
There is nothing wrong with Json.Net. The JSON data file you linked to has 260 instances of recurring IDs, all of them in the Horse Racing category. Here are the first 5:
Duplicate id found: 1.114591860
Path 1: ROOT > Horse Racing > 1600m 3yo > 1600m 3yo
Path 2: ROOT > Horse Racing > FRA > Chant (FRA) 14th Jul > 1600m 3yo
Duplicate id found: 1.114591859
Path 1: ROOT > Horse Racing > 1600m 3yo > To Be Placed
Path 2: ROOT > Horse Racing > FRA > Chant (FRA) 14th Jul > To Be Placed
Duplicate id found: 1.114591864
Path 1: ROOT > Horse Racing > 1600m 3yo > 1600m 3yo
Path 2: ROOT > Horse Racing > FRA > Chant (FRA) 14th Jul > 1600m 3yo
Duplicate id found: 1.114591863
Path 1: ROOT > Horse Racing > 1600m 3yo > To Be Placed
Path 2: ROOT > Horse Racing > FRA > Chant (FRA) 14th Jul > To Be Placed
Duplicate id found: 1.114591869
Path 1: ROOT > Horse Racing > 1600m Grp1 > 1600m Grp1
Path 2: ROOT > Horse Racing > FRA > Chant (FRA) 14th Jul > 1600m Grp1
You can check this simply by downloading the file using a web browser, saving it to disk, then opening it with a text editor and searching for the ID values I've listed. Each one appears twice, at different places in the hierarchy.
Does it say somewhere in the API documentation for this site that all IDs in the JSON will be distinct? It looks to me like they simply decided to list the same node at more than one level for browsing convenience (i.e. list all the races directly under "horse racing" and also list them by country/event). You are probably going to need to change your assumptions about the data and adjust your code accordingly.
EDIT
Now that you have shared the actual ID / name of the node that is giving you trouble, the problem is clear. You've declared the id field of your NavigationData class as Single when it should be String. Single is a floating point type, and is not suitable for holding ID values, even if they may have a decimal point in them.
Again, take a closer look at the actual JSON file. If you search for "Hamilton # Calgary", you will see that it has an ID of 27229997. The other node, "Toronto # Ottawa", immediately beneath it, has an ID of 27229996. In your debugger image, both values show as 27229996.0. The IDs are getting mangled because Single cannot represent the integer 27229997 exactly as a binary floating-point number (its 24-bit significand runs out above 16,777,216), so the closest representable value is chosen instead. This is a very bad thing when you need an exact representation, as you always do with an ID.
The key point is to use the right tool for the job. You cannot assume that a third-party ID will always be numeric or contain only a single decimal point, and you will never do math operations on an ID. In short, there's no reason to make it a numeric type. Declare it as String and that will fix the problem. I would also recommend the same for the exchangeId field, for the same reason.
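A corrected version of the class might look like this (a sketch; only the two property types change):

Namespace BetFairNS
    Public Class NavigationData
        Public Property name As String
        Public Property id As String          ' was Single; IDs must round-trip exactly
        Public Property exchangeId As String  ' was Integer; same reasoning applies
        Public Property type As NavigationDataType
        Public Property children As List(Of NavigationData)
    End Class
End Namespace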
I want to use Prolog in SublimeREPL.
I set up the interpreter (swipl or gprolog) correctly in the SublimeREPL config file, and everything works fine except for one thing: the prompt "?-" at the start of the line never shows up. And if I get a result like "X=1", when I press Enter the terminal prints a dot ("X=1.") and starts a new line with "?-". In SublimeREPL there is no dot, and the new line doesn't start with anything.
Is there a way to get the same behavior as in the terminal?
I had partial success with Sublime Text 2 running on Mac OS X by using the following Main.sublime-menu file contents:
[
  {
    "id": "tools",
    "children":
    [{
      "caption": "SublimeREPL",
      "mnemonic": "r",
      "id": "SublimeREPL",
      "children":
      [
        {
          "command": "repl_open",
          "caption": "Prolog",
          "id": "repl_prolog",
          "mnemonic": "l",
          "args": {
            "type": "subprocess",
            "encoding": "utf8",
            "cmd": ["gprolog"],
            "cwd": "$file_path",
            "external_id": "prolog",
            "syntax": "Packages/Logtalk/Logtalk.tmLanguage"
          }
        }
      ]
    }]
  }
]
I get the GNU Prolog startup banner and I can type queries and get the query results. But there's an issue that I have not yet solved with the handling of new lines: when typing ; to ask the Prolog interpreter for the next proof, I cannot get the last one. For example:
GNU Prolog 1.4.4 (64 bits)
Compiled Apr 23 2013, 17:24:33 with /opt/local/bin/gcc-apple-4.2
By Daniel Diaz
Copyright (C) 1999-2013 Daniel Diaz
| ?- member(X, [1,2,3]).
X = 1 ? ;
;
X = 2 ?
yes
| ?-
It seems to be a problem with the handling of echo and newlines: the first ; is echoed, but then I don't get a chance to type a second ; after getting the second solution.
I'm new to Ruby and had a question. I'm trying to create a .rb file that converts JSON to CSV.
I came across some disparate sources that got me to make:
require "rubygems"
require 'fastercsv'
require 'json'
csv_string = FasterCSV.generate({}) do |csv|
  JSON.parse(File.open("small.json").read).each do |hash|
    csv << hash
  end
end
puts csv_string
Now, it does in fact output text, but everything is squashed together without spaces, commas, etc. How do I make it a proper, well-formed CSV file that I can export?
The JSON would look like:
{
  "results": [
    {
      "reportingId": "s",
      "listingType": "Business",
      "hasExposureProducts": false,
      "name": "Medeco Medical Centre World Square",
      "primaryAddress": {
        "geoCodeGranularity": "PROPERTY",
        "addressLine": "Shop 9.01 World Sq Shopng Cntr 644 George St",
        "longitude": "151.206172",
        "suburb": "Sydney",
        "state": "NSW",
        "postcode": "2000",
        "latitude": "-33.876416",
        "type": "VANITY"
      },
      "primaryContacts": [
        {
          "type": "PHONE",
          "value": "(02) 9264 8500"
        }
      ]
    },xxx
}
The CSV to just have something like:
reportingId, s, listingType, Business, name, Medeco Medical...., addressLine, xxxxx, longitude, xxxx, latitude, xxxx, state, NSW, postcode, 2000, type, phone, value, (02) 92648544
Since your JSON structure is a mix of hashes and lists, and also has nesting of varying depth, it is not as trivial as the code you show. However (assuming your input files always look the same) it shouldn't be hard to write an appropriate converter. At the lowest level, you can transform a hash to CSV by:
hash.to_a.flatten
E.g.
input = JSON.parse(File.open("small_file.json").read)
writer = FasterCSV.open("out.csv", "w")
writer << input["results"][0]["primaryAddress"].to_a.flatten
writer.close # close the writer so the row is flushed to disk
will give you
type,VANITY,latitude,-33.876416,postcode,2000,state,NSW,suburb,Sydney,longitude,151.206172,addressLine,Shop 9.01 World Sq Shopng Cntr 644 George St,geoCodeGranularity,PROPERTY
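If you want every record flattened rather than just primaryAddress, a small recursive helper along these lines might do; this is a sketch, and the dotted key naming and file names are assumptions:

require 'fastercsv'
require 'json'

# Recursively flatten nested hashes and arrays into [key, value] pairs.
def flatten_record(value, prefix = nil)
  case value
  when Hash
    value.flat_map { |k, v| flatten_record(v, prefix ? "#{prefix}.#{k}" : k.to_s) }
  when Array
    value.each_with_index.flat_map { |v, i| flatten_record(v, "#{prefix}[#{i}]") }
  else
    [[prefix, value]]
  end
end

input = JSON.parse(File.read("small.json"))
FasterCSV.open("out.csv", "w") do |csv|
  input["results"].each do |record|
    # Interleave keys and values, matching the flat layout you described.
    csv << flatten_record(record).flatten
  end
end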
Hope that points you in the right direction.
By the way, your JSON looks invalid: you should change the },xxx line to }].