I have a list of cities described by their polygon in geoJSON files.
I would like to get a sample point inside each polygon.
Basic maths says the center of gravity is inside the polygon, and that it suffices to sum all the longitudes and all the latitudes, then divide each sum by the number of points.
Full file to process (visualization is available on GitHub)
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[2.41101, 48.72605], [2.41554, 48.72656], [2.41718, 48.72791], [2.4211, 48.72953], [2.42603, 48.72824], [2.42756, 48.72865], [2.42922, 48.72723], [2.43133, 48.72646], [2.43404, 48.72665], [2.43513, 48.72409], [2.42554, 48.7227], [2.42072, 48.72105], [2.41426, 48.71782], [2.41327, 48.71869], [2.41582, 48.72086], [2.41238, 48.72193], [2.41136, 48.72325], [2.41101, 48.72605]]]
      },
      "properties": {
        "code": "94001",
        "nom": "Ablon-sur-Seine"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[2.41959, 48.81691], [2.4159, 48.81633], [2.40936, 48.81667], [2.40787, 48.81746
      },
      "properties": {
        "code": "94018",
        "nom": "Charenton-le-Pont"
      }
    },
    ...
  ]
}
I already have a command that computes the mean of the longitudes of the first city's polygon:
$ curl -s https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/departements/94-val-de-marne/communes-94-val-de-marne.geojson \
> | jq '.features[0].geometry.coordinates[0][][0]' \
> | jq -s 'add/length'
2.4206944444444445
See https://unix.stackexchange.com/questions/13731/
Using jq and simple bash commands, how can I compute the sum of the longitudes and the sum of the latitudes as well, and reinject the barycenter into the properties field of another geoJSON file?
Thank you.
So if I understood correctly, you're trying to get the averages of the first set of coordinates, then update the properties to store the result.
.features[] |= (
(.geometry.coordinates[0] | length as $len | reduce .[] as [$x, $y] ([0,0];
[.[0] + $x, .[1] + $y]
) | map(. / $len)) as $barycenter |
.properties.barycenter = $barycenter
)
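For example, if you save the filter above as add-barycenter.jq (the file names here are just for illustration), you can apply it to the source data and write the result to a new geoJSON file:
$ curl -s https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/departements/94-val-de-marne/communes-94-val-de-marne.geojson \
>   | jq -f add-barycenter.jq \
>   > communes-94-with-barycenter.geojson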
Warning
The "center of gravity" of a polygon is in general not the same as the point defined in terms of the averages of the x and y co-ordinates of its vertices. See e.g. https://math.stackexchange.com/questions/3177/why-doesnt-a-simple-mean-give-the-position-of-a-centroid-in-a-polygon
The averages of the latitudes and of the longitudes
Here is a jq filter that will, with a single invocation of jq and without redundancy, compute the averages of the latitudes and of the longitudes of each "feature":
.features[].geometry.coordinates[0]
| [ [.[][0]], [.[][1]] ]
| map(add/length)
With the -c command-line option, this produces a stream of arrays, one array for each "feature". The stream begins:
[2.4206944444444445,48.724651111111115]
[2.407614,48.82250133333333]
...
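If you also want to reinject these averages into the properties, as the question asks, one way (a sketch combining this filter with the update shown earlier) is:

.features[] |= (.properties.barycenter =
  (.geometry.coordinates[0] | [ [.[][0]], [.[][1]] ] | map(add/length)))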
Of course there are other alternatives, but note that there is no need to use string interpolation to perform grouping, so there is usually no need for tr, which your first version used.
Related
I'd like to filter JSON files using jq:
jq . some.json
Given the json containing an array of objects:
{
  "theList": [
    {
      "id": 1,
      "name": "Horst"
    },
    {
      "id": 2,
      "name": "Fritz"
    },
    {
      "id": 3,
      "name": "Walter"
    },
    {
      "id": 4,
      "name": "Gerhart"
    },
    {
      "id": 5,
      "name": "Harmut"
    }
  ]
}
I want to filter that list to only show the elements whose id has the value 2 or 4, so the expected output is:
{
  "id": 2,
  "name": "Fritz"
},
{
  "id": 4,
  "name": "Gerhart"
}
How do I filter the json using jq? I have played around with select and map, yet didn't get any of those to work, e.g.:
$ jq '.theList[] | select(.id == 2) or select(.id == 4)' array.json
true
From the docs:
jq '.[] | select(.id == "second")'
Input [{"id": "first", "val": 1}, {"id": "second", "val": 2}]
Output {"id": "second", "val": 2}
I think you can do something like this:
jq '.theList[] | select(.id == 2 or .id == 4)' array.json
You could use select within map.
.theList | map(select(.id == (2, 4)))
Or more compact:
[ .theList[] | select(.id == (2, 4)) ]
Though written that way it is a little inefficient, since the comparison is evaluated for every value even after a match is found. It'll be more efficient and possibly more readable written this way:
[ .theList[] | select(any(2, 4; . == .id)) ]
Using select(.id == (2, 4)) here is generally inefficient (see below).
If your jq has IN/1, then it can be used to achieve a more efficient solution:
.theList[] | select(.id | IN(2, 4))
If your jq does not have IN/1, then you can define it as follows:
def IN(s): first(select(s == .)) // false;
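With a jq new enough to have IN built in, putting it together against the sample input gives exactly the requested objects:

$ jq '[ .theList[] | select(.id | IN(2, 4)) ]' array.json
[
  {
    "id": 2,
    "name": "Fritz"
  },
  {
    "id": 4,
    "name": "Gerhart"
  }
]

On older versions, paste the def above in front of the program.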
Efficiency
One way to see the inefficiency is to use debug. The following expression, for example, results in 10 calls to debug, whereas only 9 checks for equality are actually needed:
.theList[] | select( (.id == (2, 4)) | debug )

["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",true]
{
  "id": 2,
  "name": "Fritz"
}
["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",false]
["DEBUG:",true]
{
  "id": 4,
  "name": "Gerhart"
}
["DEBUG:",false]
["DEBUG:",false]
index/1
In principle, using index/1 should be efficient, but as of this writing (October 2017) its implementation, though fast (it is written in C), does more work than is strictly needed.
Here is a solution using indices:
.theList | [ .[map(.id)|indices(2,4)[]] ]
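Against the sample input, map(.id) is [1,2,3,4,5] and indices(2,4) emits the positions [1] and [3], so the outer expression collects .[1] and .[3], i.e. the Fritz and Gerhart objects.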
I have a list of dictionaries, with each entry having the following structure:
{
  "id": 0,
  "type": "notification",
  "name": "jane doe",
  "loc": {
    "lat": 38.8239,
    "long": 104.7001
  },
  "data": [
    {
      "type": "test",
      "time": "Fri Aug 13 09:17:16 2021",
      "df": 80000000,
      "db": 1000000,
      "tp": 92
    },
    {
      "type": "real",
      "time": "Sat Aug 14 09:21:30 2021",
      "df": 70000000,
      "db": 2000000,
      "tp": 97
    }
  ]
}
I need to be able to sort this list by any of these keys: name, type, time, or tp, and return it in memory.
I understand how to sort by the top-level keys, e.g. sorted(json_list, key=lambda k: k['name']), or even by nested keys, for instance by lat: sorted(json_list, key=lambda k: k['loc']['lat']).
So currently I have a function that works for the case when sorting by name:
def sort_by(self, param, rev=False):
if param == NAME:
self.json_list = sorted(self.json_list, key=lambda k: k[param], reverse=rev)
else:
# need help here
I'm having trouble sorting by type, time, and tp. Notice that the data key holds a list of dictionaries. I would like to leverage existing methods built into the standard library if possible. I can provide more clarification if necessary.
Update:
def sort_by(self, param, rev=False):
if param == NAME:
self.json_list = sorted(self.json_list, key=lambda k: k[param], reverse=rev)
else:
self.json_list = sorted(self.json_list, key=lambda k: k['data'][0][param], reverse=rev)
return self.json_list
This works fine if there is only one item in the data list.
If json_list[i]['data'] (for each i) only contains one dict, then the following should work; otherwise modifications are required (note the [0]: data is a list, so its single dict has to be indexed):
sorted(json_list, key = lambda k: (
    k['name'], k['data'][0]['type'], k['data'][0]['time'], k['data'][0]['tp']
))
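If data can contain several entries, you first have to decide which entry should represent the record when sorting. Here is a sketch of one possible convention (my own choice, not from the question): sort on the smallest value of the key across all data entries, falling back to a plain lookup for top-level keys.

def sort_by(json_list, param, rev=False):
    # Keys that live directly on each record; extend as your schema requires.
    top_level = {'name'}
    if param in top_level:
        key_func = lambda k: k[param]
    else:
        # Represent each record by its smallest value of `param`
        # across all entries of its 'data' list.
        key_func = lambda k: min(d[param] for d in k['data'])
    return sorted(json_list, key=key_func, reverse=rev)

Note that sorting by time this way compares the strings lexically; for chronological order, parse them first, e.g. with datetime.strptime(d['time'], '%a %b %d %H:%M:%S %Y').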
Can somebody please tell me what the Elasticsearch documentation means by a path relative to the config directory? I don't see any such directory in my ES installation. I need to find a stop-words file which is defined in an ES index as "stopwords_path": "stopwords/slovak.txt", but I can't find any file with this name. Maybe Win 10 is not able to find it because its search is really poor. Thanks a lot.
As written in the documentation, you should create the file slovak.txt following this syntax:
A path (either relative to config location, or absolute) to a
stopwords file configuration. Each stop word should be in its own
"line" (separated by a line break). The file must be UTF-8 encoded.
so you should create a slovak.txt file like this:
a
aby
aj
ak
aká
akáže
aké
akého
akéhože
akej
akejže
akému
akémuže
akéže
ako
akom
akomže
akou
akouže
akože
akú
akúže
aký
akých
akýchže
akým
akými
akýmiže
akýmže
akýže
ale
alebo
ani
áno
asi
avšak
až
ba
bez
bezo
bol
bola
boli
bolo
buď
bude
budem
budeme
budeš
budete
budú
by
byť
cez
cezo
čej
či
čí
čia
čie
čieho
čiemu
čím
čími
čiu
čo
čoho
čom
čomu
čou
čože
ďalší
ďalšia
ďalšie
ďalšieho
ďalšiemu
ďalších
ďalším
ďalšími
ďalšiu
ďalšom
ďalšou
dnes
do
ešte
ho
hoci
i
iba
ich
im
iná
iné
iného
inej
inému
iní
inom
inú
iný
iných
iným
inými
ja
je
jeho
jej
jemu
ju
k
ká
kam
kamže
každá
každé
každého
každému
každí
každou
každú
každý
každých
každým
každými
káže
kde
ké
keď
keďže
kej
kejže
kéže
kie
kieho
kiehože
kiemu
kiemuže
kieže
koho
kom
komu
kou
kouže
kto
ktorá
ktoré
ktorej
ktorí
ktorou
ktorú
ktorý
ktorých
ktorým
ktorými
ku
kú
kúže
ký
kýho
kýhože
kým
kýmu
kýmuže
kýže
lebo
leda
ledaže
len
ma
má
majú
mal
mala
mali
mám
máme
máš
mať
máte
medzi
mi
mňa
mne
mnou
moja
moje
mojej
mojich
mojim
mojimi
mojou
moju
možno
môcť
môj
môjho
môže
môžem
môžeme
môžeš
môžete
môžu
mu
musí
musia
musieť
musím
musíme
musíš
musíte
my
na
nad
nado
najmä
nám
nami
nás
náš
naša
naše
našej
nášho
naši
našich
našim
našimi
našou
ne
neho
nech
nej
nejaká
nejaké
nejakého
nejakej
nejakému
nejakom
nejakou
nejakú
nejaký
nejakých
nejakým
nejakými
nemu
než
nič
ničím
ničoho
ničom
ničomu
nie
niečo
niektorá
niektoré
niektorého
niektorej
niektorému
niektorom
niektorou
niektorú
niektorý
niektorých
niektorým
niektorými
nielen
nich
nim
ním
nimi
no
ňom
ňou
ňu
o
od
odo
on
oň
ona
oňho
oni
ono
ony
po
pod
podľa
podo
pokiaľ
popod
popri
potom
poza
práve
pre
prečo
pred
predo
preto
pretože
pri
s
sa
seba
sebe
sebou
sem
si
sme
so
som
ste
sú
svoj
svoja
svoje
svojho
svojich
svojim
svojím
svojimi
svojou
svoju
ta
tá
tak
taká
takáto
také
takéto
takej
takejto
takého
takéhoto
takému
takémuto
takí
taký
takýto
takú
takúto
takže
tam
táto
teba
tebe
tebou
teda
tej
tejto
ten
tento
ti
tí
tie
tieto
tiež
títo
to
toho
tohto
tohoto
tom
tomto
tomu
tomuto
toto
tou
touto
tu
tú
túto
tvoj
tvoja
tvoje
tvojej
tvojho
tvoji
tvojich
tvojim
tvojím
tvojimi
ty
tých
tým
tými
týmto
u
už
v
vám
vami
vás
váš
vaša
vaše
vašej
vášho
vaši
vašich
vašim
vaším
veď
viac
vo
však
všetci
všetka
všetko
všetky
všetok
vy
z
za
začo
začože
zo
že
This file has to be inside ES_PATH_CONF, which on Linux is /etc/elasticsearch/ and on Windows is C:\ProgramData\Elastic\Elasticsearch\config. Then you follow relative-path notation. So if the file is C:\ProgramData\Elastic\Elasticsearch\config\slovak.txt, you should set your path this way:
"stopwords_path": "slovak.txt"
If you put it inside C:\ProgramData\Elastic\Elasticsearch\config\stopwords\slovak.txt, you set:
"stopwords_path": "stopwords/slovak.txt"
What the documentation means is that you can provide either an absolute path or a path relative to the config directory when pointing at your own stop-words text file.
If you use a relative path, the file should be inside the config folder of Elasticsearch, where your elasticsearch.yml is present.
If you choose an absolute path, you can store the file in any location to which Elasticsearch has access.
I just reproduced your issue and used the GET settings API to check the current location of this file.
For example:
GET yourindex/_settings
It returns the path which you gave while creating this setting:
{
  "stopwords": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "stopwords",
        "creation_date": "1587374021579",
        "analysis": {
          "filter": {
            "my_stop": {
              "type": "stop",
              "stopwords": [
                "and",
                "is",
                "the"
              ],
              "stopwords_path": "opster.txt"  <-- the file location, which in this case is relative
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "EQyF7JydTXGXoebh52yNpg",
        "version": {
          "created": "7060199"
        }
      }
    }
  }
}
Update: I also created an example giving an absolute path, on my tar installation of Elasticsearch on an Ubuntu EC2 machine, and the same GET index-settings call shows that path as given.
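For example, the absolute variant could look like this (the path below is made up for illustration):

"stopwords_path": "/home/ubuntu/elasticsearch-7.6.2/config/stopwords/slovak.txt"

Since the path starts from the filesystem root, Elasticsearch does not resolve it against the config directory.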
I have a bunch of polygons with self-intersections, which cause errors when I postprocess them further (in particular, I can't calculate the intersection area of those polygons with other polygons). Here is an example of a broken polygon:
{
  "type": "MultiPolygon",
  "coordinates": [
    [
      [
        [6.881057785381658, 46.82373306675715],
        [6.857171686909481, 46.81861230543794],
        [6.857354659059071, 46.81856788926046],
        [6.856993473052509, 46.82693029065604],
        [6.8612894138116785, 46.83422796373707],
        [6.86720955648855, 46.835636765630476],
        [6.871281147359957, 46.83078486366309],
        [6.871573291317274, 46.8306215963777],
        [6.877608228639841, 46.82771553607934],
        [6.877758462659651, 46.82772313420989],
        [6.877852632482749, 46.827735617670285],
        [6.880928107931434, 46.82630213148064],
        [6.8810399979122305, 46.82622029042867],
        [6.881117606743071, 46.826115612819855],
        [6.881057785381658, 46.82373306675715]
      ]
    ]
  ]
}
This is what it looks like on the map - as you can see, two of the polygon's edges intersect. RGeo throws an error pointing at the intersection coordinate (I guess): => "Geos::GEOSException: TopologyException: Input geom 0 is invalid: Self-intersection at or near point 6.8573510795579145 46.818650764080992 at 6.8573510795579145 46.818650764080992". So, I have that at least.
My question is: is there a way to fix that intersection automatically? I read that a possible solution is to insert two identical points with the coordinates of the self-intersection. But the problem is that the polygon's points have a specific order, and I don't know where to insert those points.
Also, maybe there are some existing tools that help fix that...
The solution I would use is PostGIS's ST_MakeValid function for Postgres. If that is an option for you, you could do something along the lines of ST_AsText(ST_MakeValid(geom_column)), or, if you would rather pass in the text, here is an example using the bowtie polygon shown in prepair:
select ST_AsText(ST_MakeValid(ST_GeomFromText('POLYGON((0 0, 0 10, 10 0, 10 10, 0 0))')));
st_astext
-----------------------------------------------------------
MULTIPOLYGON(((0 0,0 10,5 5,0 0)),((5 5,10 10,10 0,5 5)))
(1 row)
If that doesn't interest you, you could export those geometries and use a tool like prepair to convert them. To sum up how this works behind the scenes: it splits these "bowties" into multiple polygons, which are then combined into a multipolygon. The same type of fix is applied to multipolygons.
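If you would rather fix the geometries in code than go through Postgres, the same GEOS machinery is exposed in Python by Shapely. A minimal sketch (make_valid requires Shapely >= 1.8; the zero-width buffer is the older classic workaround, though it can occasionally drop pieces):

from shapely.geometry import shape, mapping
from shapely.validation import explain_validity, make_valid

# `broken` stands for the MultiPolygon dict from the question.
geom = shape(broken)
print(explain_validity(geom))   # e.g. "Self-intersection[6.857... 46.818...]"

fixed = make_valid(geom)        # splits the bowtie into valid pieces
# fixed = geom.buffer(0)        # alternative on older Shapely versions

assert fixed.is_valid
fixed_geojson = mapping(fixed)  # back to a geoJSON-style dict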
I have a very large code block in my .rst file, of which I would like to highlight just a small portion and make it bold. Consider the following rst:
wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.
wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.
**Example 1: Explain showing a table scan operation**::

    EXPLAIN FORMAT=JSON
    SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
    {
      "query_block": {
        "select_id": 1,
        "cost_info": {
          "query_cost": "53.80" # This query costs 53.80 cost units
        },
        "table": {
          "table_name": "Country",
          "access_type": "ALL", # ALL is a table scan
          "rows_examined_per_scan": 239, # Accessing all 239 rows in the table
          "rows_produced_per_join": 11,
          "filtered": "4.76",
          "cost_info": {
            "read_cost": "51.52",
            "eval_cost": "2.28",
            "prefix_cost": "53.80",
            "data_read_per_join": "2K"
          },
          "used_columns": [
            "Code",
            "Name",
            "Continent",
            "Region",
            "SurfaceArea",
            "IndepYear",
            "Population",
            "LifeExpectancy",
            "GNP",
            "GNPOld",
            "LocalName",
            "GovernmentForm",
            "HeadOfState",
            "Capital",
            "Code2"
          ],
          "attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))"
        }
      }
    }
When it converts to html, it syntax highlights by default (good), but I also want to specify a few lines that should be bold (the ones with comments on them, but possibly others too).
I was thinking of adding a trailing character sequence to each such line (e.g. ###) and then writing a post-parser script to modify the generated html files. Is there a better way?
The code-block directive has an emphasize-lines option. The following highlights the lines with comments in your code.
**Example 1: Explain showing a table scan operation**

.. code-block:: python
   :emphasize-lines: 7, 11, 12

   EXPLAIN FORMAT=JSON
   SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
   {
     "query_block": {
       "select_id": 1,
       "cost_info": {
         "query_cost": "53.80" # This query costs 53.80 cost units
       },
       "table": {
         "table_name": "Country",
         "access_type": "ALL", # ALL is a table scan
         "rows_examined_per_scan": 239, # Accessing all 239 rows in the table
         "rows_produced_per_join": 11,
         "filtered": "4.76",
         "cost_info": {
           "read_cost": "51.52",
           "eval_cost": "2.28",
           "prefix_cost": "53.80",
           "data_read_per_join": "2K"
         },
         "used_columns": [
           "Code",
           "Name",
           "Continent",
           "Region",
           "SurfaceArea",
           "IndepYear",
           "Population",
           "LifeExpectancy",
           "GNP",
           "GNPOld",
           "LocalName",
           "GovernmentForm",
           "HeadOfState",
           "Capital",
           "Code2"
         ],
         "attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))"
       }
     }
   }
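Note that emphasize-lines renders with a highlighted background by default (Pygments wraps each emphasized line in a span with class hll). If you specifically want bold text, a small CSS override in your theme should do it, something like:

.highlight .hll { font-weight: bold; background: none; }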