Calculating Distance on a MultiValueField Location

In my ElasticSearch index, location is a MultiValueField. When I write a custom scoring formula for my documents involving location, I want the script to pick up on whichever location is the closest to the point in my query.
So, I have this part of my scoring formula:
...
if (!doc['location'].empty && doc['location'].values.length > 1) {
least_distance = 10000;
foreach (loc_index: doc['location'].values) {
temp_distance = loc_index.distance(lat, lng);
if (temp_distance < least_distance) {
least_distance = temp_distance;
}
...
It's not the most elegant (I'm new to MVEL and ES), but conceptually I'm first checking whether doc['location'] has more than one location in it, and if so, going through each location to calculate its distance while keeping track of the minimum distance found so far.
When I do this, ElasticSearch is returning an error:
Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: unable to resolve method: org.elasticsearch.common.geo.GeoPoint.distance(java.lang.Double, java.lang.Double)
which I think means that .distance() can't be called on a GeoPoint, and that a GeoPoint is for some reason different from what I get with doc['location'].
Am I interpreting this situation correctly, and does anybody know of a workaround? Is there a way to just calculate distance (ideally without actually putting all the arithmetic for the distance between two coordinates) using ElasticSearch?

The issue here is that calling .values gives a list of GeoPoint objects, and GeoPoint itself has no distance() method; that helper lives on the field object returned by doc['location']. There is a workaround, although we need to do a bit of extra work to pull in the appropriate Java classes. We need the latitude and longitude of both points.
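import org.elasticsearch.common.geo.GeoDistance;
import org.elasticsearch.common.unit.DistanceUnit;
base_point = doc['base_location'].value;
// start above any possible distance in miles so the first real distance replaces it
least_distance = 100000;
if (!doc['location'].empty) {
// doc['location'].values is a list of GeoPoint objects, one per stored location
foreach (loc_index: doc['location'].values) {
distance = GeoDistance.PLANE.calculate(loc_index.lat, loc_index.lon, base_point.lat, base_point.lon, DistanceUnit.MILES);
// keep the smallest distance seen so far
if (distance < least_distance) {
least_distance = distance;
}
}
}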
We can get the result in other units by passing a different value of the DistanceUnit enum (MILES is used above).
We can also use different calculation methods (such as ARC instead of PLANE), which are the constants of the GeoDistance enum.
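To make the "keep the smallest distance" logic concrete outside Elasticsearch, here is a minimal JavaScript sketch. It is not Elasticsearch's code: it swaps in the haversine great-circle formula (roughly what an arc-style calculation does) and assumes a spherical Earth with a mean radius of 3958.8 miles. Points are plain {lat, lon} objects, and the names haversineMiles and leastDistance are hypothetical.

```javascript
// Assumption: spherical Earth with a mean radius of 3958.8 miles.
// Elasticsearch's GeoDistance implementations may use other formulas/constants.
const EARTH_RADIUS_MILES = 3958.8;

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

// Haversine great-circle distance in miles between two {lat, lon} points.
function haversineMiles(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
  // Math.min guards against h drifting slightly above 1 through rounding
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Smallest distance from `origin` to any of the `locations`
// (Infinity when there are none, instead of a fixed sentinel value).
function leastDistance(origin, locations) {
  let least = Infinity;
  for (const loc of locations) {
    const d = haversineMiles(origin, loc);
    if (d < least) {
      least = d;
    }
  }
  return least;
}
```

Starting from Infinity rather than a fixed number such as 10000 means a document whose closest location is farther away than the sentinel is not silently capped.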

Related

Extracting all children belonging to a specific parent in GraphQL

I am using GraphQL and Java. I need to extract all the children belonging to a specific parent. I have used the approach below, but it only fetches the parent and does not fetch any children.
schema {
query: Query
}
type LearningResource{
id: ID
name: String
type: String
children: [LearningResource]
}
type Query {
fetchLearningResource: LearningResource
}
import graphql.schema.DataFetcher;
import graphql.schema.DataFetchingEnvironment;
import java.util.ArrayList;
import java.util.List;
import org.springframework.stereotype.Component;
@Component
public class LearningResourceDataFetcher implements DataFetcher<LearningResource> {
@Override
public LearningResource get(DataFetchingEnvironment dataFetchingEnvironment) {
LearningResource lr3 = new LearningResource();
lr3.setId("id-03");
lr3.setName("Resource-3");
lr3.setType("Book");
LearningResource lr2 = new LearningResource();
lr2.setId("id-02");
lr2.setName("Resource-2");
lr2.setType("Paper");
LearningResource lr1 = new LearningResource();
lr1.setId("id-01");
lr1.setName("Resource-1");
lr1.setType("Paper");
List<LearningResource> learningResources = new ArrayList<>();
learningResources.add(lr2);
learningResources.add(lr3);
// attach lr2 and lr3 as children of the root resource lr1
lr1.setChildren(learningResources);
return lr1;
}
}
return RuntimeWiring.newRuntimeWiring().type("Query", typeWiring -> typeWiring.dataFetcher("fetchLearningResource", learningResourceDataFetcher)).build();
My controller endpoint:
@RequestMapping(value = "/queryType", method = RequestMethod.POST)
public ResponseEntity query(@RequestBody String query) {
System.out.println(query);
ExecutionResult result = graphQL.execute(query);
System.out.println(result.getErrors());
System.out.println(result.getData().toString());
return ResponseEntity.ok(result.getData());
}
My request looks like this:
{
fetchLearningResource
{
name
}
}
Can anybody help me sort this out?

Because I get asked this question a lot in real life, I'll answer it in detail here so people have an easier time googling (and I have something to point at).
As noted in the comments, the selection for each level has to be explicit and there is no notion of an infinitely recursive query like get everything under a node to the bottom (or get all children of this parent recursively to the bottom).
The reason is mostly that allowing such queries could easily put you in a dangerous situation: a user would be able to request the entire object graph from the server in one easy go! For any non-trivial data size, this would kill the server and saturate the network in no time. Additionally, what would happen once a recursive relationship is encountered?
Still, there is a semi-controlled escape-hatch you could use here. If the scope in which you need everything is limited (and it really should be), you could map the output type of a specific query as a (complex) scalar.
In your case, this would mean mapping LearningResource as a scalar. Then, fetchLearningResource would effectively return a JSON blob, where the blob would happen to be all the children and their children recursively. Query resolution doesn't descend any deeper once a scalar field is reached, because scalars are leaf nodes, so it can't keep resolving the children level by level. This means you'd have to fetch everything recursively in one go, by yourself, as the GraphQL engine can't help you here. It also means sub-selections become impossible (scalars can't have sub-selections; again, they're leaf nodes), so the client would always get all the children and all the fields from each child back. If you still need the ability to limit the selection in certain cases, you can expose two different queries, e.g. fetchLearningResource and fetchAllLearningResources, where the former would be mapped as it is now and the latter would return the scalar as explained.
An object scalar implementation is provided by the graphql-java ExtendedScalars project.
The schema could then look like:
schema {
query: Query
}
scalar Object
type Query {
fetchLearningResource: Object
}
And you'd register that scalar implementation in the runtime wiring:
RuntimeWiring.newRuntimeWiring()
.scalar(ExtendedScalars.Object) //register the scalar impl
.type("Query", typeWiring -> typeWiring.dataFetcher("fetchLearningResource", learningResourceDataFetcher)).build();
Depending on how you process the results of this query, the DataFetcher for fetchLearningResource may need to turn the resulting object into a map-of-maps (JSON-like object) before returning to the client. If you simply JSON-serialize the result anyway, you can likely skip this. Note that you're side-stepping all safety mechanisms here and must take care not to produce enormous results. By extension, if you need this in many places, you're very likely using a completely wrong technology for your problem.
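For the "turn the resulting object into a map-of-maps" step, a depth-limited recursive conversion is one way to keep the blob bounded and to stop on a recursive relationship. This is a language-neutral idea sketched in JavaScript rather than graphql-java code; the field names mirror LearningResource, while toPlainTree and maxDepth are hypothetical.

```javascript
// Convert a LearningResource-like tree into plain nested objects (the JSON-like
// shape an Object scalar can carry), refusing to go deeper than maxDepth and
// refusing to follow a node back into one of its own ancestors.
function toPlainTree(node, maxDepth, depth = 0, ancestors = new Set()) {
  if (depth > maxDepth) {
    throw new Error("tree deeper than " + maxDepth + " levels");
  }
  if (ancestors.has(node)) {
    throw new Error("cycle detected at " + node.id);
  }
  ancestors.add(node);
  const plain = {
    id: node.id,
    name: node.name,
    type: node.type,
    children: (node.children || []).map((child) =>
      toPlainTree(child, maxDepth, depth + 1, ancestors)
    ),
  };
  // only ancestors count as a cycle, so shared subtrees elsewhere are still allowed
  ancestors.delete(node);
  return plain;
}
```

Throwing is one policy; truncating the children at the limit would be another, depending on what the client should see.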
I have not tested this with your code myself, so I might have skipped something important, but this should be enough to get you (or anyone googling) onto the right track (if you're sure this is the right track).
UPDATE: I've seen someone implement a custom Instrumentation that rewrites the query immediately after it's parsed, and adds all fields to the selection set if no field had already been selected, recursively. This effectively allows them to select everything implicitly.
In graphql-java v11 and prior, you could mutate the parsed query (represented by the Document class) directly. As of v12 this is no longer possible, but in turn instrumentations gain the ability to replace the Document explicitly via the new instrumentDocument method.
Of course, this only makes sense if your schema cannot be exploited this way, or if you fully control the client so there's no danger. You could also do it selectively for only some types, but that would be extremely confusing to use.

Stanford OpenIE: How to output dependency path instead of plain text patterns?

I am looking through the Java source code and wondering whether it's easy to modify the system so that the predicate portion of each triple is the dependency path between the two entities instead of the surface form.
Since the natural logic module operates on dependency trees, I suppose there should be an easy tweak for this.
I traced the code in edu.stanford.nlp.naturalli/OpenIE.java to:
// Get the extractions
boolean empty = true;
synchronized (OUTPUT) {
for (CoreMap sentence : ann.get(CoreAnnotations.SentencesAnnotation.class)) {
for (RelationTriple extraction : sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
// Print the extractions
OUTPUT.println(tripleToString(extraction, docid, sentence));
empty = false;
}
}
}
Please point me to the implementation of the following step:
sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class)
Thanks!

Each relation triple actually does store the dependency structure from which it was generated. Take a look at the asDependencyTree() function in RelationTriple.
Note that this tree is not necessarily a subtree of the original sentence -- e.g., it may be that a subject was moved around to produce a relation triple. If you're looking for a dependency path in the original sentence, you can look up tokens by their IndexAnnotation and compute a dependency path from that.
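Once you have token indices, finding the dependency path is a shortest-path search over the parse. Below is a minimal JavaScript sketch of that idea, assuming the parse has been exported as a hypothetical list of {gov, dep, rel} edges with 1-based token indices; it does not use CoreNLP's API (in CoreNLP itself you would work with the sentence's SemanticGraph instead). Relations traversed from dependent up to governor are prefixed with ^.

```javascript
// Shortest dependency path between two tokens, treating the parse as an
// undirected graph. Edge format {gov, dep, rel} is a hypothetical stand-in.
function dependencyPath(edges, from, to) {
  const adjacency = new Map();
  const link = (a, b, label) => {
    if (!adjacency.has(a)) adjacency.set(a, []);
    adjacency.get(a).push({ next: b, label });
  };
  for (const e of edges) {
    link(e.gov, e.dep, e.rel); // walking down: governor -> dependent
    link(e.dep, e.gov, "^" + e.rel); // walking up: mark the relation as inverted
  }
  // breadth-first search, remembering how each token was first reached
  const cameFrom = new Map([[from, null]]);
  const queue = [from];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === to) break;
    for (const { next, label } of adjacency.get(node) || []) {
      if (!cameFrom.has(next)) {
        cameFrom.set(next, { prev: node, label });
        queue.push(next);
      }
    }
  }
  if (!cameFrom.has(to)) return null; // the two tokens are not connected
  // walk back from `to`, then reverse to get the from -> to order
  const path = [];
  for (let node = to; cameFrom.get(node) !== null; node = cameFrom.get(node).prev) {
    path.push(cameFrom.get(node).label);
  }
  return path.reverse();
}
```

For example, for "Obama was born in Hawaii" with edges born→Obama (nsubjpass), born→was (auxpass), born→Hawaii (nmod:in) and Hawaii→in (case), the path from token 1 to token 5 is ["^nsubjpass", "nmod:in"].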

How do I calculate distance on a geospatial query in RethinkDB?

I am trying to run a query that allows me to filter for a specific document, and then check the distance between the coordinates stored there, and the new ones I'm passing in.
I've tried this:
r.db('food').table('fruits')
.hasFields(['origin', 'region'])
.filter({region: 'North America'})
.pluck('gpsLocation')
.distance(r.point(37.759056, 105.015018))
but I get this error: e: Expected type DATUM but found SEQUENCE:
From the docs I see that I need
geometry.distance(geometry[, {geoSystem: 'WGS84', unit: 'm'}])
but I'm not sure how to get my query to return that. gpsLocation is an index on the fruits table if that makes a difference.

You cannot pluck an index. I think you mean a field: you want to extract the data of that field with pluck, and you created an index with the same name from that field. If my assumption is correct, my answer is below. If not, you need to update your question, because you cannot pluck an index.
The problem is a wrong data type. According to https://rethinkdb.com/api/javascript/distance/, the command syntax looks like this:
geometry.distance(geometry[, {geoSystem: 'WGS84', unit: 'm'}]) → number
r.distance(geometry, geometry[, {geoSystem: 'WGS84', unit: 'm'}]) → number
That means distance can be called either on r, passing two geometry objects, or on a geometry object, passing another geometry object as its first parameter.
Your query is returning a STREAM. You can find out its data type by reading the API documentation, or just use typeOf:
r.db('food').table('fruits')
.hasFields(['origin', 'region'])
.filter({region: 'North America'})
.pluck('gpsLocation')
.typeOf()
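So you have to loop over the STREAM and call distance on each document of the stream. Because pluck() leaves each document as an object of the form {gpsLocation: ...}, you also need to read the field out of the document before calling distance on it:
r.db('food').table('fruits')
.hasFields(['origin', 'region'])
.filter({region: 'North America'})
.pluck('gpsLocation')
.map(function(doc) {
return doc('gpsLocation').distance(r.point(37.759056, 105.015018))
})
It's a bit like having an array and calling map in JavaScript to walk over it, running a callback function on each element; here, though, the map function runs on the server.
This assumes that your gpsLocation field contains a geometry object, such as a point (https://rethinkdb.com/api/javascript/point/) with longitude and latitude. Note that r.point() takes the longitude first and the latitude second, and latitudes must lie between -90 and 90, so r.point(37.759056, 105.015018) should be rejected; if 37.759056 is meant to be the latitude, swap the two arguments.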

d3js: Calculating sub-totals, conditional on i=="some stuff"

Warning/Disclaimer: This is a basic JavaScript question, but I've gone through a bunch of iterations in my code, much Googling, and I'm having trouble wrapping my head around how to proceed.
I have data in three columns in a CSV file: a political candidate's name, their party, and their approval rating.
I've created a bubble chart/force layout, similar to this. Candidates are represented as bubbles. Users can select to see candidates organized in one big blob, or they can click to see the bubbles organize themselves by party. I have some <div> elements that pop up under each grouping of same-party candidates. What I'd like to do now is to have each party-specific <div> element display the total approval rating all its same-party candidates get, combined. (So, in Excel, a =SUMIF().)
To do this: I'm creating a function(party) which should, in principle, return that conditional sum. Here's what it looks like when I'm calling it for candidates with no party affiliation:
d3.select("#text-NONE")
.text(label_party("N/A"));
"N/A" is the string found in the CSV file.
And the function itself:
function label_party(party) {
var party_total = 0;
function party(d) {
if (d.party[i] == "party") {
party_total+= d.y2012[i];
};
};
return party_total;
};
Both of the above are happening outside of the d3.csv() call. My main Q: how can I set up a conditional sum over two columns in a CSV? At the moment, it's simply returning 0 - so it's skipping my loop, though I don't know why.
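A few things in label_party likely explain the 0: the inner function party(d) is never called (and it shadows the party argument), the comparison is against the literal string "party" instead of the argument, and i is undefined. On top of that, d3.csv() loads asynchronously, so code outside its callback probably runs before any rows exist. Here is a minimal sketch of the conditional sum, assuming the parsed rows are passed in explicitly and use the party and y2012 columns from the code above (labelParty is a hypothetical name):

```javascript
// SUMIF-style conditional sum over parsed CSV rows.
// d3.csv yields every value as a string, so y2012 is converted with Number().
function labelParty(rows, party) {
  let total = 0;
  for (const row of rows) {
    // compare against the argument, not the literal string "party"
    if (row.party === party) {
      total += Number(row.y2012);
    }
  }
  return total;
}
```

You would then call it where the data is available, for example inside the d3.csv() callback: d3.select("#text-NONE").text(labelParty(rows, "N/A")).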

Sorting CouchDB Views By Value

I'm testing out CouchDB to see how it could handle logging some search results. What I'd like to do is produce a view where I can produce the top queries from the results. At the moment I have something like this:
Example document portion
{
"query": "+dangerous +dogs",
"hits": "123"
}
Map function
(Not exactly what I need/want but it's good enough for testing)
function(doc) {
if (doc.query) {
var split = doc.query.split(" ");
for (var i in split) {
emit(split[i], 1);
}
}
}
Reduce Function
function (key, values, rereduce) {
return sum(values);
}
Now this will get me results in a format where a query term is the key and the count for that term on the right, which is great. But I'd like it ordered by the value, not the key. From the sounds of it, this is not yet possible with CouchDB.
So does anyone have any ideas of how I can get a view where I have an ordered version of the query terms & their related counts? I'm very new to CouchDB and I just can't think of how I'd write the functions needed.

It is true that there is no dead-simple answer. There are several patterns however.
http://wiki.apache.org/couchdb/View_Snippets#Retrieve_the_top_N_tags. I do not personally like this because they acknowledge that it is a brittle solution, and the code is not relaxing-looking.
Avi's answer, which is to sort in-memory in your application.
couchdb-lucene which it seems everybody finds themselves needing eventually!
What I like is what Chris said in Avi's quote. Relax. In CouchDB, databases are lightweight and excel at giving you a unique perspective of your data. These days, the buzz is all about filtered replication which is all about slicing out subsets of your data to put in a separate DB.
Anyway, the basics are simple. You take the .rows from the view output and insert them into a separate DB whose view simply emits them keyed on the count. An additional trick is to write a very simple _list function. Lists "render" the raw CouchDB output into different formats. Your _list function should output
{ "docs":
[ {..view row1...},
{..view row2...},
{..etc...}
]
}
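That formats the view output exactly the way the _bulk_docs API requires it. Now you can pipe curl directly into another curl that posts to _bulk_docs, and then query the sorting view in the second database:
curl 'host:5984/db/_design/myapp/_list/bulkdocs_formatter/query_popularity?group=true' \
| curl -X POST -H 'Content-Type: application/json' -d @- host:5984/popularity_sorter/_bulk_docs
curl 'host:5984/popularity_sorter/_design/myapp/_view/by_count?descending=true'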
In fact, if your list function can handle all the docs, you may just have it sort them itself and return them to the client sorted.
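A minimal sketch of the bulkdocs_formatter list function and a by_count map function for the second database follows. It assumes both are stored as function sources in the respective design documents; getRow(), send(), start() and emit() are supplied by CouchDB's JavaScript query server, and the function names here are only labels for readability.

```javascript
// _list function: wraps each reduced view row ({key, value}) into the
// {"docs": [...]} body that _bulk_docs expects, streaming it row by row.
function bulkdocsFormatter(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  send('{"docs":[');
  var first = true;
  var row;
  while ((row = getRow())) {
    // every view row becomes one new document (CouchDB assigns the _id)
    send((first ? "" : ",") + JSON.stringify({ key: row.key, value: row.value }));
    first = false;
  }
  send("]}");
}

// Map function for the by_count view in the second database:
// keyed on the count, so the index itself is ordered by popularity.
function byCount(doc) {
  emit(doc.value, doc.key);
}
```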

This came up on the CouchDB-user mailing list, and Chris Anderson, one of the primary developers, wrote:
This is a common request, but not supported directly by CouchDB's
views -- to do this you'll need to copy the group-reduce query to
another database, and build a view to sort by value.
This is a tradeoff we make in favor of dynamic range queries and
incremental indexes.
I needed to do this recently as well, and I ended up doing it in my app tier. This is easy to do in JavaScript:
db.view('mydesigndoc', 'myview', {'group':true}, function(err, data) {
if (err) throw new Error(JSON.stringify(err));
data.rows.sort(function(a, b) {
return a.value - b.value;
});
data.rows.reverse(); // optional, depending on your needs
// do something with the data…
});
This example runs in Node.js and uses node-couchdb, but it could easily be adapted to run in a browser or another JavaScript environment. And of course the concept is portable to any programming language/environment.
HTH!

This is an old question, but I feel it still deserves a decent answer (I spent at least 20 minutes searching for the correct one...)
I disapprove of the other suggestions in the answers here and find them unsatisfactory. In particular, I don't like the suggestion to sort the rows in the application layer, as it doesn't scale well and doesn't handle the case where you need to limit the result set in the DB.
The better approach that I came across is suggested in this thread and it posits that if you need to sort the values in the query you should add them into the key set and then query the key using a range - specifying a desired key and loosening the value range. For example if your key is composed of country, state and city:
emit([doc.address.country,doc.address.state, doc.address.city], doc);
Then you query just the country and get free sorting on the rest of the key components:
startkey=["US"]&endkey=["US",{}]
In case you also need to reverse the order, note that simply setting descending=true will not suffice on its own: you also need to swap the start and end keys, i.e.:
descending=true&startkey=["US",{}]&endkey=["US"]
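Why does ["US", {}] work as an end key? In CouchDB's view collation, arrays compare element by element, a shorter array sorts before a longer one with the same prefix, and objects sort after strings. The JavaScript sketch below emulates that ordering for the cases used here. It is a simplification (real CouchDB compares strings with ICU collation and orders objects by their contents), and collate and rangeQuery are hypothetical helpers.

```javascript
// Rank of each JSON type in CouchDB's collation order.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects such as {} sort after everything else
}

function collate(a, b) {
  var ra = typeRank(a);
  var rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  // simplification: plain code-unit comparison instead of ICU collation
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    for (var i = 0; i < Math.min(a.length, b.length); i++) {
      var c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  return 0; // simplification: objects compare as equal
}

// Keys between startkey and endkey (inclusive), in collation order.
function rangeQuery(keys, startkey, endkey) {
  return keys.slice().sort(collate).filter(function (k) {
    return collate(k, startkey) >= 0 && collate(k, endkey) <= 0;
  });
}
```

With keys such as ["US","CA","LA"] and ["FR","IDF","Paris"], the range ["US"] to ["US",{}] keeps only the US keys, already sorted by state and city.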
See more reference at this great source.

I'm unsure about the 1 you have as your returned result, but I'm positive this should do the trick:
emit([doc.hits, split[i]], 1);
The rules of sorting are defined in the docs.

Based on Avi's answer, I came up with this CouchDB list function that worked for my needs, which is simply a report of the most popular events (key=event name, value=attendees).
ddoc.lists.eventPopularity = function(head, req) {
start({ headers : { "Content-Type" : "text/plain" } });
var data = [];
var row;
// collect every view row so they can be sorted by value
while ((row = getRow())) {
data.push(row);
}
// highest attendance first
data.sort(function(a, b) {
return b.value - a.value;
});
for (var i = 0; i < data.length; i++) {
send(data[i].value + ': ' + data[i].key + "\n");
}
};
For reference, here's the corresponding view function:
ddoc.views.eventPopularity = {
map : function(doc) {
if (doc.type == 'user') {
for (var i in doc.events) {
emit(doc.events[i].event_name, 1);
}
}
},
reduce : '_count'
}
And the output of the list function (snipped):
165: Design-Driven Innovation: How Designers Facilitate the Dialog
165: Are Your Customers a Crowd or a Community?
164: Social Media Mythbusters
163: Don't Be Afraid Of Creativity! Anything Can Happen
159: Do Agencies Need to Think Like Software Companies?
158: Customer Experience: Future Trends & Insights
156: The Accidental Writer: Great Web Copy for Everyone
155: Why Everything is Amazing But Nobody is Happy

Every solution above will hurt CouchDB performance, I think. I am very new to this database, but as far as I know, CouchDB views prepare their results before they are queried. It seems we need to prepare the results manually: for example, each search term would live in the database as a document with a hit count. When somebody searches, each of their search terms is looked up and its hit count incremented. When we want to see search-term popularity, a view emits a (hitcount, searchterm) pair.
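The counter-document idea could look roughly like this in CouchDB: an update handler that increments a per-term document, and a view keyed on the hit count. It's a sketch under assumptions: the handler is registered under a hypothetical name (hit) in the design document's updates section and called with the term as the document ID (POST /db/_design/search/_update/hit/<term>), and byHits is a hypothetical view name.

```javascript
// Update handler: CouchDB passes the stored doc (null if no doc with that ID
// exists yet) and the request; the returned [doc, response] saves the doc.
function hit(doc, req) {
  if (!doc) {
    // first time this term is seen: create its counter document
    doc = { _id: req.id, type: "term", term: req.id, hits: 0 };
  }
  doc.hits += 1;
  return [doc, "ok"];
}

// Map function for a by-hits view: keyed on the counter, so the index is
// already ordered by popularity.
function byHits(doc) {
  if (doc.type === "term") {
    emit(doc.hits, doc.term);
  }
}
```

Querying the view with descending=true lists the most-searched terms first. Concurrent searches for the same term may run into update conflicts, which the client would need to retry.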

The link Retrieve_the_top_N_tags seems to be broken, but I found another solution here.
Quoting the dev who wrote that solution:
rather than returning the results keyed by the tag in the map step, I would emit every occurrence of every tag instead. Then in the reduce step, I would calculate the aggregation values grouped by tag using a hash, transform it into an array, sort it, and choose the top 3.
As stated in the comments, the only problem would be in case of a long tail:
Problem is that you have to be careful with the number of tags you obtain; if the result is bigger than 500 bytes, you'll have couchdb complaining about it, since "reduce has to effectively reduce". 3 or 6 or even 20 tags shouldn't be a problem, though.
It worked perfectly for me; check the link to see the code!
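A sketch of the reduce step described in that quote, assuming the map function emits each tag occurrence as the value (for example emit(null, tag)) and the view is queried without grouping. topTagsReduce is a hypothetical name, and N is fixed at 3 inside the function because CouchDB evaluates each reduce function on its own. Because partial results are truncated to the top 3 before being combined in the rereduce step, the counts can be approximate when tags are spread across many partitions.

```javascript
function topTagsReduce(keys, values, rereduce) {
  var TOP_N = 3;
  // prototype-free object so tags like "constructor" count correctly
  var counts = Object.create(null);
  if (!rereduce) {
    // first pass: values are the raw tag strings emitted by the map function
    values.forEach(function (tag) {
      counts[tag] = (counts[tag] || 0) + 1;
    });
  } else {
    // rereduce: values are earlier [tag, count] top lists; merge their counts
    values.forEach(function (partial) {
      partial.forEach(function (pair) {
        counts[pair[0]] = (counts[pair[0]] || 0) + pair[1];
      });
    });
  }
  // hash -> array of [tag, count], most frequent first, keep only TOP_N
  return Object.keys(counts)
    .map(function (tag) { return [tag, counts[tag]]; })
    .sort(function (a, b) { return b[1] - a[1]; })
    .slice(0, TOP_N);
}
```

Keeping only three entries is also what keeps the reduce output small enough to satisfy CouchDB's "reduce has to effectively reduce" check.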
