Elasticsearch: how to rank on skill rating in full-text search?

I have quite a simple case but can't find a good way to solve it:
I have People whose Skills are rated. They also have other information attached (e.g. city). All of this is in my Elasticsearch index.
Example:
John
Paris
Python: 7/10
Boris
Paris
Python: 3/10
Mike
Frankfurt
Python: 7/10
I would like to find people using only a text search.
If I type "Python", the better rated someone is, the higher they should rank.
If I type "Python Paris", it should return all people in Paris sorted by Python rating.
Here is an example of a person document in the ES index:
{
"_index": "senso",
"_type": "talent",
"_id": "12469",
"_version": 1,
"found": true,
"_source": {
"id": 12469,
"nickname": "Roger",
"first_name": "Moore",
"last_name": "Bond",
"companyName": null,
"email": "example@example.org",
"city": "Marseille",
"region": "Provence-Alpes-Côte d'Azur",
"internalGlobalRating": 5,
"declaredDailyPrice": 650,
"declaredAnnualSalaryTarget": null,
"boughtDailyPrice": null,
"soldDailyPrice": null,
"skillsRatings": [
{
"skillName": "Direction Artistique Web",
"skillId": 1298,
"rating": 9
},
{
"skillName": "UX Design",
"skillId": 1295,
"rating": 9
},
{
"skillName": "Identité Visuelle",
"skillId": 1319,
"rating": 8
},
{
"skillName": "Illustrator",
"skillId": 1425,
"rating": 9
},
{
"skillName": "Photoshop",
"skillId": 1427,
"rating": 9
},
{
"skillName": "InDesign",
"skillId": 1426,
"rating": 9
}
],
"expertises": [
{
"name": "Direction Artistique Web",
"id": 1298
},
{
"name": "UX Design",
"id": 1295
},
{
"name": "Identité Visuelle",
"id": 1319
}
],
"missionTypes": [
{
"name": "Freelance sur place",
"id": 2
},
{
"name": "Freelance en télétravail",
"id": 3
},
{
"name": "Forfait",
"id": 4
}
],
"tools": [
{
"name": "Illustrator",
"id": 1425
},
{
"name": "Photoshop",
"id": 1427
},
{
"name": "InDesign",
"id": 1426
}
],
"themes": [],
"medias": [],
"organizationType": {
"id": 2,
"name": "Studio"
},
"source": {
"id": 2
},
"spokenLanguages": [
{
"id": 2
},
{
"id": 3
}
],
"mainLanguage": {
"id": 1,
"name": "Français"
},
"created": "2011-10-05T20:17:52+02:00",
"updated": "2017-07-03T15:59:11+02:00",
"applicationDate": "2011-10-05T20:17:52+02:00",
"portfolio": {
"id": 95,
"visible": true,
"submissionTime": "2017-01-13T18:20:31+01:00",
"isDisplayed": 1,
"isPublic": 1
}
}
}
I wonder which approach I should choose: tweaking at index time, custom queries, or both?
Any clue on how to tackle this problem would be appreciated.
Thank you.
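One possible direction, sketched under assumptions rather than offered as a definitive answer: if skillsRatings is mapped as a nested field (an assumption, since the mapping is not shown in the question), a nested query wrapping a function_score can turn the rating of the matching skill into the document score. The helper name skill_ranked_query is hypothetical, and so is the idea of splitting "Python Paris" into a skill and a city in application code before building the query.

```python
def skill_ranked_query(skill, city=None):
    """Build a search body that ranks people by their rating for `skill`.

    Assumes `skillsRatings` is mapped with type `nested` (not shown in the
    question) and that the caller has already split the user input into a
    skill name and an optional city.
    """
    rated_skill = {
        "nested": {
            "path": "skillsRatings",
            # Keep the best-scoring nested entry as the person's score.
            "score_mode": "max",
            "query": {
                "function_score": {
                    "query": {"match": {"skillsRatings.skillName": skill}},
                    # Use the numeric rating as the score of the nested entry.
                    "field_value_factor": {
                        "field": "skillsRatings.rating",
                        "missing": 0,
                    },
                    # Ignore the text relevance score, keep only the rating.
                    "boost_mode": "replace",
                }
            },
        }
    }
    bool_query = {"must": [rated_skill]}
    if city is not None:
        # Filter context: restricts the results without changing the score.
        bool_query["filter"] = [{"match": {"city": city}}]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    import json
    print(json.dumps(skill_ranked_query("Python", city="Paris"), indent=2))
```

With boost_mode set to replace, the score of each hit becomes the rating of the matched skill, so the default sort by _score orders people by rating. The city clause sits in filter so that it narrows the results without affecting that order.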

Related

ElasticSearch - Combine filters & Composite Query to get unique fields combinations

Well, I am quite new to ES, and when it comes to aggregations, there are no words in the dictionary to describe my level :p
Today I am facing an issue where I am trying to create a query that does something similar to a SQL DISTINCT, but combined with filters. Given this document (an abstraction of the real situation, of course):
{
"id": "1",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 1,
"name": "a_name_for_id_1"
},
"structure": {
"material": "cartoon",
"thickness": 5
},
"shared": true,
"objective": "stackoverflow"
}
Although most of the data in the above document can vary, some values repeat across documents, such as classification.id, kind and structure.material.
So, to fulfill my requirements, I would like to "group by" these 3 fields in order to get each unique combination. Going further, with the following data, I should get the following possibilities:
[{
"id": "1",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 1,
"name": "a_name_for_id_1"
},
"structure": {
"material": "cartoon",
"thickness": 5
},
"shared": true,
"objective": "stackoverflow"
},
{
"id": "2",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 2,
"name": "a_name_for_id_2"
},
"structure": {
"material": "iron",
"thickness": 3
},
"shared": true,
"objective": "linkedin"
},
{
"id": "3",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": false,
"kind": "document",
"classification": {
"id": 2,
"name": "a_name_for_id_2"
},
"structure": {
"material": "paper",
"thickness": 1
},
"shared": false,
"objective": "tiktok"
},
{
"id": "4",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 3,
"name": "a_name_for_id_3"
},
"structure": {
"material": "cartoon",
"thickness": 5
},
"shared": false,
"objective": "snapchat"
},
{
"id": "5",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 3,
"name": "a_name_for_id_3"
},
"structure": {
"material": "paper",
"thickness": 1
},
"shared": true,
"objective": "twitter"
},
{
"id": "6",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": false,
"kind": "document",
"classification": {
"id": 3,
"name": "a_name_for_id_3"
},
"structure": {
"material": "iron",
"thickness": 3
},
"shared": true,
"objective": "facebook"
}
]
based on the above, I should get the following results in the "buckets":
document 1 cartoon
document 2 iron
document 2 paper
document 3 cartoon
document 3 paper
document 3 iron
Of course, for the sake of this example (and to make it easier), I don't have any duplicates yet.
However, on top of that, I need some "pre-filters", as I only want:
Documents that are available: isAvailable=true
Documents whose structure thickness is between 2 and 4 inclusive: 2 <= structure.thickness <= 4
Documents that are shared: shared=true
I should then get only the following combinations, compared to the first set of results:
document 1 cartoon -> not a valid result, thickness > 4
document 2 iron
document 2 paper -> not a valid result, isAvailable != true
document 3 cartoon -> not a valid result, thickness > 4
document 3 paper -> not a valid result, thickness < 2
document 3 iron -> not a valid result, isAvailable != true
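As a sanity check on the expected result, here is a small plain-Python sketch that applies the three pre-filters to trimmed copies of the six sample documents and collects the distinct combinations. It does not talk to Elasticsearch; the function names are made up for illustration.

```python
# Trimmed copies of the six sample documents (only the fields used here).
DOCS = [
    {"id": "1", "isAvailable": True, "kind": "document", "shared": True,
     "classification": {"id": 1}, "structure": {"material": "cartoon", "thickness": 5}},
    {"id": "2", "isAvailable": True, "kind": "document", "shared": True,
     "classification": {"id": 2}, "structure": {"material": "iron", "thickness": 3}},
    {"id": "3", "isAvailable": False, "kind": "document", "shared": False,
     "classification": {"id": 2}, "structure": {"material": "paper", "thickness": 1}},
    {"id": "4", "isAvailable": True, "kind": "document", "shared": False,
     "classification": {"id": 3}, "structure": {"material": "cartoon", "thickness": 5}},
    {"id": "5", "isAvailable": True, "kind": "document", "shared": True,
     "classification": {"id": 3}, "structure": {"material": "paper", "thickness": 1}},
    {"id": "6", "isAvailable": False, "kind": "document", "shared": True,
     "classification": {"id": 3}, "structure": {"material": "iron", "thickness": 3}},
]


def passes_filters(doc):
    """isAvailable=true, shared=true and 2 <= structure.thickness <= 4."""
    return (
        doc["isAvailable"]
        and doc["shared"]
        and 2 <= doc["structure"]["thickness"] <= 4
    )


def combination(doc):
    """The (kind, classification.id, structure.material) grouping key."""
    return (doc["kind"], doc["classification"]["id"], doc["structure"]["material"])


def unique_combinations(docs):
    """Distinct grouping keys among the documents that pass the filters."""
    return sorted({combination(doc) for doc in docs if passes_filters(doc)})


if __name__ == "__main__":
    print(unique_combinations(DOCS))  # [('document', 2, 'iron')]
```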
If you're still reading, well.. thanks! xD
So, as you can see, I need all the possible combinations of these fields, following the static pattern kind <> classification_id <> structure_material, that match the filters on isAvailable, thickness and shared.
Regarding the output, the hits don't matter to me, as I don't need the documents but only the combinations kind <> classification_id <> structure_material :)
Thanks for any help :)
Max
You can go with the cardinality aggregation together with your existing filters. Please check this URL and let me know if you have any questions.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
Thanks to a colleague, I could finally get it working as expected!
QUERY
GET index-latest/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"isAvailable": true
}
},
{
"range": {
"structure.thickness": {
"gte": 2,
"lte": 4
}
}
},
{
"term": {
"shared": true
}
}
]
}
},
"aggs": {
"my_agg_example": {
"composite": {
"size": 10,
"sources": [
{
"kind": {
"terms": {
"field": "kind.keyword",
"order": "asc"
}
}
},
{
"classification_id": {
"terms": {
"field": "classification.id",
"order": "asc"
}
}
},
{
"structure_material": {
"terms": {
"field": "structure.material.keyword",
"order": "asc"
}
}
}
]
}
}
}
}
The given result is then:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"my_agg_example": {
"after_key": {
"kind": "document",
"classification_id": 2,
"structure_material": "iron"
},
"buckets": [
{
"key": {
"kind": "document",
"classification_id": 2,
"structure_material": "iron"
},
"doc_count": 1
}
]
}
}
}
So, as we can see, we get the following bucket:
{
"key": {
"kind": "document",
"classification_id": 2,
"structure_material": "iron"
},
"doc_count": 1
}
Note: Be careful regarding the type of your field. Putting .keyword on classification.id resulted in no buckets at all. .keyword should be used only on string fields (as far as I understood, correct me if I am wrong).
As expected, we have the following result (compared to the initial question):
document 2 iron
Note: Be careful, the order of the elements within the aggs.<name>.composite.sources does play a role in the returned results.
Thanks!
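The response above contains an after_key because a composite aggregation returns its buckets in pages of at most "size" entries. When there are more combinations than fit in one page, the next page is requested by sending that key back as the "after" parameter of the composite aggregation. A minimal sketch of that loop follows. The search argument is a hypothetical callable that sends a request body to the index and returns the parsed JSON response; it stands in for whichever client is actually in use.

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation, page by page.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the parsed search response (dict).
    """
    # Work on a copy so the caller's request body is left untouched.
    body = copy.deepcopy(body)
    composite = body["aggs"][agg_name]["composite"]
    while True:
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            return
        yield from buckets
        after_key = result.get("after_key")
        if after_key is None:
            return
        # Ask for the page that starts right after the last bucket seen.
        composite["after"] = after_key
```

Each bucket's "key" holds one kind / classification_id / structure_material combination, so collecting the keys from this generator gives the full list of distinct combinations.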

Square API: Create Checkout API error

When using the sample POSTMAN request:
{
"idempotency_key": "74ae1696-b1e3-4328-af6d-f1e04d947a13",
"order": {
"reference_id": "my-order-001",
"line_items": [
{
"name": "line-item-1",
"quantity": "1",
"base_price_money": {
"amount": 1599,
"currency": "USD"
}
},
{
"name": "line-item-2",
"quantity": "2",
"base_price_money": {
"amount": 799,
"currency": "USD"
}
}]
},
"ask_for_shipping_address": true,
"merchant_support_email": "merchant+support@website.com",
"pre_populate_buyer_email": "buyer@email.com",
"pre_populate_shipping_address": {
"address_line_1": "500 Electric Ave",
"address_line_2": "Suite 600",
"locality": "New York",
"administrative_district_level_1": "NY",
"postal_code": "10003",
"first_name": "Jane",
"last_name": "Doe"
},
"redirect_url": "https://merchant.website.com/order-confirm"
}
I'm getting the following response:
{
"errors": [
{
"category": "INVALID_REQUEST_ERROR",
"code": "INVALID_VALUE",
"detail": "The order must have at least one line item.",
"field": "line_items"
}]
}
This is simply executing the sample POSTMAN requests available via https://docs.connect.squareup.com/api/connect/v2/#runningpostman
I was having the same issue and reached out to Tristan, who replied that there was a bug that Square development had to fix. I have confirmed that the Create Checkout API is now working properly, so this issue should be resolved.
Are you using your sandbox or production access tokens? I was able to generate a checkout form with the example postman request:
{
"idempotency_key": "73ae1696-b1e3-4328-af6d-f1e04d947a13",
"order": {
"reference_id": "my-order-001",
"line_items": [
{
"name": "line-item-1",
"quantity": "1",
"base_price_money": {
"amount": 1599,
"currency": "USD"
}
},
{
"name": "line-item-2",
"quantity": "2",
"base_price_money": {
"amount": 799,
"currency": "USD"
}
}
]
},
"ask_for_shipping_address": true,
"merchant_support_email": "merchant+support@website.com",
"pre_populate_buyer_email": "buyer@email.com",
"pre_populate_shipping_address": {
"address_line_1": "500 Electric Ave",
"address_line_2": "Suite 600",
"locality": "New York",
"administrative_district_level_1": "NY",
"postal_code": "10003",
"first_name": "Jane",
"last_name": "Doe"
},
"redirect_url": "https://merchant.website.com/order-confirm"
}

How to get Order and OrderLineItem for a transaction using Connect V2?

I am writing a wrapper to retrieve transaction data from the Square Connect API V2. I am able to retrieve the transactions, but the Order data is missing.
I am getting the following response using the All Transaction and Retrieve Transaction API:
{
"transactions": [
{
"id": "mYziFkYv2QK7e2kb2vyIhegeV",
"location_id": "75S3K9Z9KSVYK",
"created_at": "2017-04-17T11:00:51Z",
"tenders": [
{
"id": "2qeDw6CmCs299m9w0RY7KQB",
"location_id": "75S3K9Z9KSVYK",
"transaction_id": "mYziFkYv2QK7e2kb2vyIhegeV",
"created_at": "2017-04-17T11:00:51Z",
"amount_money": {
"amount": 10000,
"currency": "INR"
},
"processing_fee_money": {
"amount": 0,
"currency": "INR"
},
"type": "OTHER"
}
],
"product": "REGISTER",
"client_id": "75S3K9Z9KSVYK-a776-4377-84f5-75S3K9Z9KSVYK"
},
{
"id": "UJsg9IdIv9WWvqT1h2VkbxgeV",
"location_id": "75S3K9Z9KSVYK",
"created_at": "2017-04-17T11:00:37Z",
"tenders": [
{
"id": "UVuQghb8RTF8OUcmAsaXKQB",
"location_id": "75S3K9Z9KSVYK",
"transaction_id": "UJsg9IdIv9WWvqT1h2VkbxgeV",
"created_at": "2017-04-17T11:00:37Z",
"amount_money": {
"amount": 0,
"currency": "INR"
},
"processing_fee_money": {
"amount": 0,
"currency": "INR"
},
"type": "NO_SALE"
}
],
"product": "REGISTER",
"client_id": "75S3K9Z9KSVYK-a751-4434-a041-75S3K9Z9KSVYK"
}
]}
Is there any way to get order (line item) details?
If you are looking for itemizations, you can use the v1 transactions endpoints.
See here: https://docs.connect.squareup.com/api/connect/v1/#get-paymentid

How to perform a multi level logical query in elasticsearch

Let's say that we have a query such as
(a or b or c) and (x or y or z) and (f or g or h) ...
How can we perform such a query using Elasticsearch?
I tried the bool query, but I am still confused about the must, should... clauses and how to place different conditions in these clauses to obtain a logical result.
Here is a sample of my data (courses type):
[{
"_index": "training-hub",
"_type": "courses",
"_id": "58ad8090604aff26df131971",
"_score": 1,
"_source": {
"title": "Getting Started with Vue.js",
"description": "In this course, you will learn the basics of Vue.js framework and how to create amazing web applications using this framework",
"slug": "hzrthtrjhyjyt--by--Jemli-Fathi",
"format": [
"Intra-entreprise"
],
"duration": 6,
"language": "EN",
"requirements": [
"Basics of web development"
],
"tags": [],
"audience": [
"Web developers"
],
"price": 600,
"cover": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487840598/training-hub/k1h0tzciyjflqvtlyr2l.png",
"scheduleUrl": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487765627/training-hub/sozofm68nrwxhta3ga3u.png",
"trainer": {
"firstname": "Jemli",
"lastname": "Fathi",
"photo": "https://ucarecdn.com/5923a1bb-3e77-47d0-bd5b-7d07b0f559fe/16487460_1269388943138870_3235853817844339449_o.jpg"
},
"courseCategory": {
"name": "Game Development"
},
"createdAt": "2017-02-22T12:14:08.186Z",
"updatedAt": "2017-02-24T10:39:22.896Z"
}
},
{
"_index": "training-hub",
"_type": "courses",
"_id": "58ad81c0604aff26df131973",
"_score": 1,
"_source": {
"title": "Create your first Laravel application today!",
"description": "In this course, you're gonna learn how create fancy web applications using Laravel PHP Framework ...",
"slug": "sdvrehtrthrth--by--Jemli-Fathi",
"format": [
"Intra-entreprise",
"Inter-entreprise"
],
"duration": 6,
"language": "EN",
"requirements": [
"Basics of Web development",
"Basics of PHP language"
],
"tags": [],
"audience": [
"Web developers",
"PHP developers"
],
"price": 600,
"cover": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487841464/training-hub/dpgqbchualnfc78n69gs.png",
"scheduleUrl": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487765627/training-hub/sozofm68nrwxhta3ga3u.png",
"trainer": {
"firstname": "Jemli",
"lastname": "Fathi",
"photo": "https://ucarecdn.com/5923a1bb-3e77-47d0-bd5b-7d07b0f559fe/16487460_1269388943138870_3235853817844339449_o.jpg"
},
"courseCategory": {
"name": "Web Development"
},
"createdAt": "2017-02-22T12:19:12.376Z",
"updatedAt": "2017-02-23T09:39:23.414Z"
}
},
{
"_index": "training-hub",
"_type": "courses",
"_id": "58aead4faecfc31e4559d49b",
"_score": 1,
"_source": {
"title": "Getting Started with Docker",
"description": "After taking this course, you will be able to use Docker in real life projects and you can bootstrap your projects into different containers",
"slug": "regrehgreh--by--Jemli-Fathi",
"format": [
"Intra-entreprise"
],
"duration": 5,
"language": "EN",
"requirements": [
"Basic Linux Shell skills",
"Basic knowledge of Linux environment"
],
"tags": [],
"audience": [
"Dev-Ops",
"Web Developers"
],
"price": 999,
"cover": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487939101/training-hub/vkqupkiqerq0kgjgd0km.png",
"scheduleUrl": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487842509/training-hub/r9y1qisyfelseeuzgtt3.png",
"trainer": {
"firstname": "Jemli",
"lastname": "Fathi",
"photo": "https://ucarecdn.com/5acb1b5f-b550-4560-b085-2d75384e5ec8/13567067_1064268623650904_3773193220506255422_n.jpg"
},
"courseCategory": {
"name": "Web Development"
},
"createdAt": "2017-02-23T09:37:19.758Z",
"updatedAt": "2017-02-24T12:31:32.078Z"
}
}]
What I need exactly is to filter courses like this:
(category1 or category2 ...) and (format1 or format2...) and
(language1 or language2)...
You may use a bool query combined with terms queries for the desired behavior. The clauses inside must are ANDed together, while the values listed inside each terms query are ORed (VALUE1 or VALUE2, FORMAT1 or FORMAT2, LANG1 or LANG2):
GET _search
{
"query" : {
"bool": {
"must": [
{
"terms": {
"courseCategory.name": [
"VALUE1",
"VALUE2"
]
}
},
{
"terms": {
"format": [
"FORMAT1",
"FORMAT2"
]
}
},
{
"terms": {
"language": [
"LANG1",
"LANG2"
]
}
}
]
}
}
}
Also, please make sure that the fields used for term matching are not_analyzed (in Elasticsearch 5 and later, that means mapped as keyword).
Please read up on mappings starting here: http://www.elasticsearch.org/guide/reference/mapping/.
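The same "AND of ORs" shape can be generated from application code. Below is a minimal sketch, assuming the selected values arrive as a mapping from field name to a list of accepted values; the function name and_of_ors is made up for illustration. It places the terms clauses in filter rather than must: both require every clause to match, but filter clauses do not contribute to the relevance score, which suits a pure yes/no filter like this one.

```python
def and_of_ors(groups):
    """Build a bool query: AND across fields, OR across each field's values.

    `groups` maps a field name to the list of values accepted for it.
    """
    clauses = [
        {"terms": {field: list(values)}}
        for field, values in groups.items()
        if values  # skip fields for which nothing was selected
    ]
    return {"query": {"bool": {"filter": clauses}}}


if __name__ == "__main__":
    import json
    body = and_of_ors({
        "courseCategory.name": ["Web Development", "Game Development"],
        "format": ["Intra-entreprise", "Inter-entreprise"],
        "language": ["EN"],
    })
    print(json.dumps(body, indent=2))
```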
Elasticsearch's Query String Query does what you are looking for.
Look at the following code snippet, taken from the official docs:
GET /_search
{
"query": {
"query_string": {
"query": "(content:this OR name:this) AND (content:that OR name:that)"
}
}
}
Hope this helps!
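For the course filter in the question, such a query string could be assembled along these lines. This is only a sketch: the function names are hypothetical, it assumes at least one field has values, and it only escapes backslashes and double quotes inside values. Note that query_string analyzes its input, so it can behave differently from a terms query on exact values.

```python
def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def and_of_ors_query_string(groups):
    """Build a query_string body of the form field:(a OR b) AND field2:(x OR y)."""
    parts = []
    for field, values in groups.items():
        if values:  # skip fields for which nothing was selected
            alternatives = " OR ".join(quote(v) for v in values)
            parts.append(f"{field}:({alternatives})")
    return {"query": {"query_string": {"query": " AND ".join(parts)}}}


if __name__ == "__main__":
    body = and_of_ors_query_string({
        "language": ["EN", "FR"],
        "format": ["Intra-entreprise"],
    })
    print(body["query"]["query_string"]["query"])
```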

ArrowDB dashboard upload photo to user

When I upload a photo to an existing user using the ArrowDB dashboard on https://platform.appcelerator.com,
Cloud.Users.query only shows the photo_id.
But when I create a new user using the dashboard and attach a photo, the photo shows up in Cloud.Users.query.
e.g. photo uploaded after user created
{
"id": "563019f18cb04aede69e2111",
"first_name": "store1",
"last_name": "123",
"created_at": "2015-10-28T00:42:25+0000",
"updated_at": "2016-01-22T08:59:44+0000",
"external_accounts": [],
"confirmed_at": "2015-10-28T00:42:25+0000",
"username": "user",
"admin": "false",
"stats": {
"photos": {
"total_count": 0
},
"storage": {
"used": 0
}
},
"photo_id": "56a1dc083a654d090d126792",
"friend_counts": {
"requests": 0,
"friends": 0
}
}
eg. photo uploaded while creating user
{
"id": "56a1f0333a65234234390d7",
"first_name": "qqqq",
"last_name": "wwwe",
"created_at": "2016-01-22T09:02:43+0000",
"updated_at": "2016-01-22T09:07:18+0000",
"external_accounts": [],
"confirmed_at": "2016-01-22T09:02:43+0000",
"username": "qwe",
"admin": "false",
"stats": {
"photos": {
"total_count": 0
},
"storage": {
"used": 0
}
},
"photo": {
"id": "56a1f0333a654d090d0390d8",
"filename": "userPhoto.jpg",
"size": 25394,
"md5": "e20f4fcadf6cde9fccfb458dd11951d4",
"created_at": "2016-01-22T09:02:43+0000",
"updated_at": "2016-01-22T09:02:43+0000",
"processed": true,
"urls": {
"original": "https://s3-us-west-1.amazonaws.com/storage-platform.cloud.appcelerator.com/xmqh1djNEIChtQFP6d37HNH5DQNCXQoX/photos/51/d4/56a1f0333a654d090d0390d9/userPhoto_original.jpg"
},
"content_type": "image/jpeg",
"user": {
"id": "56a1f0333a65234234390d7",
"first_name": "qqqq",
"last_name": "wwwe",
"created_at": "2016-01-22T09:02:43+0000",
"updated_at": "2016-01-22T09:07:18+0000",
"external_accounts": [],
"confirmed_at": "2016-01-22T09:02:43+0000",
"username": "qwe",
"admin": "false",
"stats": {
"photos": {
"total_count": 0
},
"storage": {
"used": 0
}
},
"photo_id": "56a1f0333a654d090d0390d8",
"friend_counts": {
"requests": 0,
"friends": 0
}
}
},
"friend_counts": {
"requests": 0,
"friends": 0
}
}
Basically, the user whose photo was uploaded during creation shows this extra info:
"photo": {
"id": "56a1f0333a654d090d0390d8",
"filename": "userPhoto.jpg",
"size": 25394,
"md5": "e20f4fcadf6cde9fccfb458dd11951d4",
"created_at": "2016-01-22T09:02:43+0000",
"updated_at": "2016-01-22T09:02:43+0000",
"processed": true,
"urls": {
"original": "https://s3-us-west-1.amazonaws.com/storage-platform.cloud.appcelerator.com/xmqh1djNEIChtQFP6d37HNH5DQNCXQoX/photos/51/d4/56a1f0333a654d090d0390d9/userPhoto_original.jpg"
},
"content_type": "image/jpeg",
"user": {
"id": "56a1f0333a65234234390d7",
"first_name": "qqqq",
"last_name": "wwwe",
"created_at": "2016-01-22T09:02:43+0000",
"updated_at": "2016-01-22T09:07:18+0000",
"external_accounts": [],
"confirmed_at": "2016-01-22T09:02:43+0000",
"username": "qwe",
"admin": "false",
"stats": {
"photos": {
"total_count": 0
},
"storage": {
"used": 0
}
},
I've also been experiencing this kind of problem, and I've noticed that the images won't show up in the ArrowDB user interface (not sure if related).
The API returns only an empty object when you query the model:
photo: {}
I've created a ticket for the ArrowDB user interface: https://jira.appcelerator.org/browse/API-1277.
sachinmw, did you create a ticket for the initial problem yet?
A workaround could be to use the photo_id and run a separate query to retrieve the photo model, but this is not good for network optimisation.
EDIT
OK, after dealing with Appcelerator directly about the empty photo: {} object being returned, the answer is pretty simple:
Whenever you use the query() function, like Cloud.Objects.query() from ti.cloud, on any ArrowDB object, there is a parameter called response_json_depth which is set to 1 by default, so only one level of the JSON object returned by the API is included.
Without touching that parameter I was seeing:
{
"Vehicle": [
{
"name": "foo",
"photo": {}
}
]
}
By setting response_json_depth to 3 I managed to have:
{
"Vehicle": [
{
"name": "foo",
"photo": {
"urls": {
"original": "http://bar.com"
}
}
}
]
}
Hope that helps someone. This also applies to the Cloud.Objects.show() method for any ArrowDB object.
