How to find related related songs or artists using Freebase MQL? - google-api

I have any Freebase mid such as: /m/0mgcr, which is The Offspring.
Whats the best way to use MQL to find related artists?
Or if I have a song mid such as: /m/0l_f7f, which is Original Prankster by The Offspring.
Whats the best way to use MQL to find related songs?

So, the revised question is, given a musical artist, find all other musical artists who share all of the same genres assigned to the first artist.
MQL doesn't have any operators which can work across parts of the query tree, so this can't be done in a single query, but given that you're likely doing this from a programming language, it be done pretty simply in two steps.
First, we'll get all genres for our subject artist, sorted by the number of artists that they contain using this query (although the last part isn't strictly necessary):
[{
"id": "/m/0mgcr",
"name": null,
"/music/artist/genre": [{
"name": null,
"id": null,
"artists": {
"return": "count"
},
"sort": "artists.count"
}]
}]
Then, using the genre with the smallest number of artists for maximum selectivity, we'll add in the other genres to make it even more specific. Here's a version of the query with the artists that match on the three most specific genres (the base genre plus two more):
[{
"id": "/m/0mgcr",
"name": null,
"/music/artist/genre": [{
"name": null,
"id": null,
"artists": {
"return": "count"
},
"sort": "artists.count",
"limit": 1,
"a:artists": [{
"name": null,
"id": null,
"a:genre": {
"id": "/en/ska_punk"
},
"b:genre": {
"id": "/en/melodic_hardcore"
}
}]
}]
}]
Which gives us: Authority Zero, Millencolin, Michael John Burkett, NOFX, Bigwig, Huelga de Hambre, Freygolo, The Vandals
The things to note about this query are that, this fragment:
"sort": "artists.count",
"limit": 1,
limits our initial genre selection to the single genre with the fewest artists (ie Skate Punk), while the prefix notation:
"a:genre": {"id": "/en/ska_punk"},
"b:genre": {"id": "/en/melodic_hardcore"}
is to get around the JSON limitation on not having more than one key with the same name. The prefixes are ignored and just need to be unique (this is the same reason for the a:artists elsewhere in the query.
So, having worked through that whole little exercise, I'll close by saying that there are probably better ways of doing this. Instead of an absolute match, you may get better results with a scoring function that looks at % overlap for the most specific genres or some other metric. Things like common band members, collaborations, contemporaneous recording history, etc, etc, could also be factored into your scoring. Of course this is all beyond the capabilities of raw MQL and you'd probably want to load the Freebase data for the music domain (or some subset) into a graph database to run these scoring algorithms.
In point of fact, both last.fm and Google think a better list would include bands like Sum 41, blink-182, Bad Religion, Green Day, etc.

Related

Maps vs Lists in Elasticsearch for optimized query performance

I have some data I will be putting into Elasticsearch, and want to decide on a format that will optimize query performance. The query will be in words: "Is ID X in category Y?". I have a fixed number of categories (small, say, 5), and possibly a large number of IDs to put into each category (currently in the dozens, but of indeterminate size in the future). Each ID will be in at most one category (possibly none).
Format 1:
{
"field1": "value1",
...
"categories": {
"category1": ["id10", "id24", "id38",...],
...
"category5": ["id62", "id19", "id82" ...]
}
}
or
Format 2:
{
"field1": "value1",
...
"categories": {
"id1": "category4",
"id2": "category2",
"id3": "category1",
...
}
}
Which data format would be preferred? The latter format has linear lookup time, but possibly many keys.
I think method 1 is better, Id will be more in the future, if you press method 2, then you may need to close the categories index or increase the number of index fields, and using method 1 can be more convenient to determine the type of a single id (indeOf).There are pros and cons. Maybe there's a better way.

how to breakdown seearch result with elasticsearch?

I have documents in my elasticsearch that represent suppliers, each document is a supplier and each supplier have branches as well, it looks like this:
{
"id": 1,
"supplierName": "John Flower Shop",
"supplierAddress": "107 main st, Los Angeles",
"branches": [
{
"branchId": 11,
"branchName": "John Flower Shop New York",
"branchAddress": "34 5th Ave, New York"
},
{
"branchId": 12,
"branchName": "John Flower Shop Miami",
"branchAddress": "56 ragnar st, Miami"
}
]
}
currently I exposed api to allow search in fields: supplierName, supplierAddress, branchName and branchAddress.
the use case is a search box in my website, that perform a call to the backend, and pur the result in a dropdown for the user to choose the supplier.
my issue is, given the example document above, if you search for "John Flower Shop Miami", the answer will be the whole document, and what will be presented is the top level supplier name.
what I want is to present "John Flower Shop Miami", and im not sure how to understand what part of the result is what hit the search....
does someone had to do something like this before?
Handling relationship in elasticsearch is a bit of work but you can do it. I recommend you to read the ES guide's chapter handling relationships to have the big picture.
Then my advice is to index your branches as nested documents. Thus they will be stored as distinct documents in your index.
It will require you to change your query syntax to use nested queries that can be a pain in the a... but in exchange, you will be granted with inner_hits functionality.
It will allow you to know which subdocument ( nested document ) matched your query.

Elasticsearch data size optimization

I was wondering, would it be good practice to optimize data like this in Elasticsearch?
Old data
{
"user_id": 1,
"firstname": "name",
"lastname": "name",
"email": "email"
}
New data
{
"uid": 1,
"f": "name",
"l": "name",
"e": "email"
}
Lets say, I have billions of documents with long named keys, would it save alot space if I used short named keys instead?
Or does elasticsearch compress data by default, so I don't need to worry about this?
I prefer to have data more readable, but if it could save a lot space, then its whole different thing.
This question is asked five years ago and it had only one answer, so would be nice to have more comments about this.
You can read it here:
Elasticsearch scheme optimization
Any thoughts from experienced elasticsearch developers?

Nesting relationship with ElasticSearch

I am trying to find revenue per actor in a movie. It is pretty straightforward, but here's an example of what I have now:
// without actor
{
"ID": 1,
"Timestamp": "2014-01-01 00:02:12",
"Title": "Great White Shark",
"Amount": 4.99
}
It is not an issue if I have, for example, 100M entries in financials and I ask for the aggregate where the title=GreatWhiteShark.
However, when I add in an Actor, the structure becomes extremely verbose, and probably increases my storage size by 10x --
{
"ID": 1,
"Timestamp": "2014-01-01 00:02:12",
"Title": "Great White Shark",
"Amount": 4.99,
"Actors": [Christopher Plummer,Andrew Garfield,Heath Ledger,
Lily Cole,Jude Law,Verne Troyer,Johnny Depp,
Tom Waits,George MacKay,Tom Holland,Saoirse Ronan,
Seymour Cassel,Sofia Milos]
}
This is so I can ask a question such as "How much money did movies with Christopher Plummer make in 2011?".
Is there a better way to do the above structure? My main concern is performance, and secondary would be storage size.
Performance should be very good, Elasticsearch will build an inverted index for actors array anyway. Querying an actor will return all associated movies instantly.
As for space reduction, you can try encoding each actor name to an integer id instead of actor slug. But you should try slug variant first as this does not destroy readability and integrations to Kibana, etc.
Your proposed structure is perfectly suited for Elasticsearch by all means.

Grouping non null fields together in Kibana

Given the following three User entries in an ElasticSearch index:
"user": [
{
"userId": "100",
"hobby": "chess"
}
"user": [
{
"userId": "200",
"hobby": "music"
}
"user": [
{
"userId": "300",
"hobby": ""
}
I want to create a vertical bar chart to compare the number of users who have a hobby as opposed to those who do not. Individual hobbies should not be shown separately, but grouped together.
If split along the Y axis, one block would take up two thirds of the height (the two users with hobbies) and one block one third of the height (the one user with no hobbies).
How could one achieve this grouping in Kibana?
Thanks
You'll need to choose Split Bars and then Filters aggregation. Once you have that selected you should see Query 1 with * in it. Change the * to hobby:*. Next hit Add Filter and put in NOT hobby:*
The filters aggregation lets you bucket things pretty much any way you can search for things.

Resources