Firebase many-to-many performance

I'm wondering about the performance of Firebase when making n + 1 queries. Let's consider the example in this article https://www.firebase.com/blog/2013-04-12-denormalizing-is-normal.html where a link has many comments. If I want to get all of the comments for a link I have to:
1. Make one query to get the index of comments under the link.
2. For each comment ID, make a query to get that comment.
Here's the sample code from that article that fetches all comments belonging to a link:
var commentsRef = new Firebase("https://awesome.firebaseio-demo.com/comments");
var linkRef = new Firebase("https://awesome.firebaseio-demo.com/links");
var linkCommentsRef = linkRef.child(LINK_ID).child("comments");
linkCommentsRef.on("child_added", function(snap) {
  commentsRef.child(snap.key()).once("value", function(commentSnap) {
    // Render the comment on the link page.
  });
});
I'm wondering whether this is a performance concern compared to the equivalent query in a SQL database, where I could fetch all comments in a single query: SELECT * FROM comments WHERE link_id = LINK_ID.
Imagine I have a link with 1000 comments. In SQL this would be a single query, but in Firebase this would be 1001 queries. Should I be worried about the performance of this?

One thing to keep in mind is that Firebase works over WebSockets (where available), so while there may be 1001 round trips, there is only one connection that needs to be established. Also, a lot of the round trips will happen in parallel. So you might be surprised at how little time this takes.
Should I worry about this?
In general, people overestimate the amount of use they'll get. So (again: in general) I recommend that you don't worry about it until you actually have that many comments. But from day 1, ensure that nothing you do today precludes optimizing later.
One way to optimize is to further denormalize your data. If you already know that you need all comments every time you render an article, you can also consider duplicating the comments into the article.
A fairly common scenario:
/users
  twitter:4784
    name: "Frank van Puffelen"
    otherData: ....
/messages
  -J4377684
    text: "Hello world"
    uid: "twitter:4784"
    name: "Frank van Puffelen"
  -J4377964
    text: "Welcome to StackOverflow"
    uid: "twitter:4784"
    name: "Frank van Puffelen"
So in the above data snippet I store both the user's uid and their name for every message. While I could look up the name from the uid, having the name in the messages means I can display the messages without the lookup. I'm also keeping the uid, so that I can provide a link to the user's profile page (or their other messages).
We recently had a good question about this, where I wrote more about the approaches I consider for keeping the derived data up to date: How to write denormalized data in Firebase

Related

Number of restaurants with specific cuisine in each country

I am trying to figure out how many restaurants in each country serve a specific cuisine (seafood). I have looked at the Google Places API and the TripAdvisor API, but cannot find these numbers. I don't need the list of restaurants, only the number of restaurants. I found OpenStreetMap, which looked very promising. I downloaded the data for Norway, but the numbers are not correct: running osmium tags-filter norway-latest.osm.pbf cuisine=seafood gives 62, which is way too low.
Any suggestions for how and where I can find what I am looking for?
Extrapolate.
You won't get an accurate answer; how do you even define what a seafood restaurant is?
Find out roughly how many restaurants there are in the area you are interested in and then decide what percentage of them might be seafood restaurants, as in the rough sketch below.
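As a rough illustration (all numbers here are made up):
# Back-of-the-envelope extrapolation with made-up numbers.
total_restaurants = 10_000  # rough count of restaurants in the region
seafood_share = 0.05        # your estimate of the seafood fraction
print(int(total_restaurants * seafood_share))  # ~500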
You can use this approach to extract the data from OpenStreetMap:
https://gis.stackexchange.com/questions/363474/aggregate-number-of-features-by-country-in-overpass
You can run the query on http://overpass-turbo.eu/ (go to the settings and choose the kumi-systems server).
The query could look like this:
// Define fields for csv output
[out:csv(name, total)][timeout:2500];
// All countries
area["admin_level"=2];
// Count in each area
foreach->.regio(
  // Collect all nodes, ways and relations with cuisine=seafood in the current area
  ( node(area.regio)[cuisine=seafood];
    way(area.regio)[cuisine=seafood];
    rel(area.regio)[cuisine=seafood];);
  // assemble the output
  make count name = regio.set(t["name:en"]),
             total = count(nodes) + count(ways) + count(relations);
  out;
);
This query can take a long time (at the time of writing, mine had not yet finished).
You can also run the query via curl on some server and have the results mailed to you: curl ....... | mail -s "Overpass Result" yourmail@example.com. You can get the curl command from the browser's network tab via "Copy as cURL".
I also considered Taginfo (https://taginfo.openstreetmap.org/tags/cuisine=seafood), but it cannot break the numbers down by country.

How to handle spelling mistakes (typos) in entity extraction in Rasa NLU?

I have a few intents in my training set (nlu_data.md file) with a sufficient number of training examples under each intent.
The following is an example:
##intent: SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
I have added multiple sentences like this.
At the time of testing, all sentences in the training file work fine. But if an input query contains a spelling mistake, e.g. hotol/hetel/hotele for the hotel keyword, then Rasa NLU is unable to extract it as an entity.
I want to resolve this issue.
I am allowed to change only the training data, and I am also restricted from writing any custom component for this.
To handle spelling mistakes like this in entities, you should add these examples to your training data. So something like this:
##intent: SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
- looking for a [hotol](place) in Chennai
- [hetel](place) in Berlin please
Once you've added enough examples, the model should be able to generalise from the sentence structure.
If you're not using it already, it also makes sense to use the character-level CountVectorsFeaturizer. That should already be part of the default pipeline described in the Rasa docs.
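If you are on a recent Rasa version, a character-level setup might look roughly like this in config.yml (a sketch; exact option names depend on your version):
pipeline:
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4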
One thing I would highly suggest is using look-up tables with fuzzywuzzy matching. If you have a limited number of entities (like country names), look-up tables are quite fast, and fuzzy matching catches typos when the entity exists in your look-up table (searching for typo variations of those entities). There's a whole blog post about it on the Rasa blog.
There's a working implementation of fuzzywuzzy as a custom component:
from fuzzywuzzy import process
from nltk.corpus import stopwords
from rasa.nlu.components import Component  # rasa_nlu.components on older Rasa versions
import json
import os

# STOP_WORDS is just a set of stop words from NLTK.
STOP_WORDS = set(stopwords.words('english'))

class FuzzyExtractor(Component):
    name = "FuzzyExtractor"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["en"]
    threshold = 90

    def __init__(self, component_config=None, *args):
        super(FuzzyExtractor, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        pass

    def process(self, message, **kwargs):
        entities = list(message.get('entities'))

        # Get file path of lookup table in json format
        cur_path = os.path.dirname(__file__)
        if os.name == 'nt':
            partial_lookup_file_path = '..\\data\\lookup_master.json'
        else:
            partial_lookup_file_path = '../data/lookup_master.json'
        lookup_file_path = os.path.join(cur_path, partial_lookup_file_path)

        with open(lookup_file_path, 'r') as file:
            lookup_data = json.load(file)['data']

        tokens = message.get('tokens')

        for token in tokens:
            # Skip stop words so common words are not fuzzy-matched.
            if token.text not in STOP_WORDS:
                fuzzy_results = process.extract(
                    token.text,
                    lookup_data,
                    processor=lambda a: a['value'] if isinstance(a, dict) else a,
                    limit=10)
                for result, confidence in fuzzy_results:
                    if confidence >= self.threshold:
                        entities.append({
                            "start": token.offset,
                            "end": token.end,
                            "value": token.text,
                            "fuzzy_value": result["value"],
                            "confidence": confidence,
                            "entity": result["entity"]
                        })

        message.set("entities", entities, add_to_output=True)
I didn't implement it myself; it was implemented and validated on the Rasa forum.
Then you just add it to your NLU pipeline in the config.yml file.
It's a strange request that they ask you not to change the code or write custom components.
The approach you would have to take would be to use entity synonyms. A slight edit of a previous answer:
##intent: SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
- looking for a [hotol](place:hotel) in Chennai
- [hetel](place:hotel) in Berlin please
This way, even if the user enters a typo, the correct entity will be extracted. If you want this to be foolproof, I do not recommend hand-editing the intents. Use some kind of automated tool for generating the training data, e.g. one that generates misspelled words (typos); a sketch of such a generator follows.
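For illustration, a minimal sketch of such a generator (typo_variants is a hypothetical helper, not an existing tool); it emits extra training lines in the Rasa markdown format with entity synonyms:
import random

def typo_variants(word, n=5, seed=42):
    # Generate simple typo variants of a word: character deletions,
    # adjacent swaps and vowel substitutions.
    random.seed(seed)
    variants = set()
    vowels = "aeiou"
    while len(variants) < n:
        chars = list(word)
        op = random.choice(["delete", "swap", "substitute"])
        i = random.randrange(len(chars) - 1)
        if op == "delete":
            del chars[i]
        elif op == "swap":
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        else:
            chars[i] = random.choice(vowels)
        variant = "".join(chars)
        if variant != word:
            variants.add(variant)
    return sorted(variants)

# Emit training examples that map each typo back to "hotel".
for variant in typo_variants("hotel"):
    print("- looking for a [%s](place:hotel) nearby" % variant)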
First of all, add samples for the most common typos for your entities, as advised above.
Beyond this, you need a spellchecker.
I am not sure whether there is a single library that can be used in the pipeline, but if not, you need to create a custom component. Otherwise, dealing with only the training data is not feasible: you can't create samples for every typo.
Using fuzzywuzzy is one of the ways, but it is generally slow and it doesn't solve all the issues.
Universal Encoder is another solution.
There should be more options for spell correction, but you will need to write code in any case.
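If you are allowed a preprocessing step, here is a minimal sketch of pre-correcting user input before it reaches the NLU pipeline, assuming the pyspellchecker package (pip install pyspellchecker):
from spellchecker import SpellChecker

spell = SpellChecker()
# Make sure domain words are known so they are not "corrected" away.
spell.word_frequency.load_words(["hotel", "hostel", "mumbai", "chennai"])

def correct_text(text):
    corrected = []
    for word in text.split():
        # correction() returns the most likely candidate, or None
        # if no candidate is found, in which case we keep the word.
        corrected.append(spell.correction(word) or word)
    return " ".join(corrected)

print(correct_text("find a good hotol in mumbai"))  # e.g. "find a good hotel in mumbai"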

Kendo UI grid (web) row height, fixed cell height

One of the columns in my Kendo grid is the Notes column (which holds at least 3000 characters).
The problem I'm facing is that the grid cell (along with the row) expands to fit the characters in the cell. It makes my cell huge.
I would like to make the grid cell a single line with a fixed number of characters and have a tooltip on the cell.
I'm not sure whether I can achieve this.
Please let me know a possible solution for the above case.
I have tried some CSS changes:
.k-grid tbody tr { height: 38px; } /* not working */
Sample data in a cell in the Notes column:
17/11/2010 - Not received many enquiries, uses Extra Sure and Holmans. Finds our Medical Screening too lengthy. Took her through system and printed off Mes Screen questions, will use us more than N J Heritage. She will aim to use us for new enquiries.16/02/11 - L/M - R/C.08/03/2011 - JCO - Spoke to Matthew Salmon, not finding us competitive been using ExtraSure who are a lot cheaper. Advised our USP's and our benefits. Will use us for next travel enquiry.16/06/11 - Jane spoke to Richard and will be taking him through the system, SunWorld Plus.12/10/11-I Have spoken to Richard, I have emailed across details of Sunworld Extra & medical screening along with Username & Password. MP01/11/2011 - JCO spoke to Richard, Richard issuing a quotation today, likes that we don't have any age limits and restricted to 85 on AMT policies. First time to use us, usually use Citybond, likes the look of our product. JCO advised to contact me if requires detailed explanation of system. 25/05/12 - MW - Spoke to Richard, he is very nice, I thanked them for their continued support in using SunWorld and I am sending him the email with the Special Features and the SunWorld Extra info.. He said he rates SunWorld 8 and a half out of ten because he would like to see the option to increase the single article limit and also the rates have gone up quite a bit recently.. He said they don't really do that much travel but they are going to be pushing it over the next year because he said he thinks people are getting fed up of going on the internet to get insurance and realising they aren't actually covered for anything... He is generally happy with everything.08/08/12 - MW - Spoke to Luke, I asked why they hadn't used us since June and he said it was just because of a slow down in enquiries. Travel is not something they push, they just offer it to accommodate their existing clients. He said the only use us and one other provider so any enquiries they do get they always quote with us, he has done some quotes this week but they haven't come back. They are very happy with everything. No problems etc. I am sending him the Special Features for 2012 and also SunWorld Extra info,16/08/12 - MW - Spoke to Richard, asked if they would be interested in having a poster, he said it would not really be of any use to them as they are not a high street broker, they are in an office and not customer facing, he said they don't really do much travel, they are mainly a commercial broker but they are happy with any travel business they can do, what they would like is a flyer as opposed to a poster so they can email it out to their customers. He said he thinks the product is great, likes the age limits and limits etc,14/11/12 - MW - Spoke to Luke, told him about Snowman Cover and Broker Survey.04/12/12 - MW - Spoke to Luke, he said they are really quiet at the moment. Only using SunWorld but just not getting the enquiries. he is happy with SunWorld though and I have told him about the Changes for 2013.08/02/13 - MW - I can see that they said back in August that they would like some leaflets so I am sending them some out.28/02/13 - MW - Luke has sent this email - "Sorry, not sure if you are still doing this but can we havesome leaflets to send with our renewals to try and offer your services J thanks" , So I am sending them out some more leaflets.01/05/13 - MW - Spoke to Richard, he said the main person who does the travel, Luke, is on holiday in Turkey for the rest of the week and will be back on Tuesday. 
I have made a note in my diary to give him a call back on Wednesday.08/05/13 - MW - Spoke to Luke, he said SunWorld are their main travel provider, they have not really had many enquiries for travel lately. He said our rates are competitive for annual but people can get travel insurance so cheap online now that he thinks they have just been doing that. I said we will reduce the rates for him and he said that would be good. They are also set up to use us via the AXA route which he said is fine to be deactivated and they will carry on using this one as they have been. He said he would like some leaflets as he never received the last lot so I have checked his address and I am sending out 20 more. I told him about the new product and I am sending him the email with the Underwriting Changes and Special Features for 2013. RATES REDUCED01/07/13 - MW - Spoke to Luke, he said they are only using SunWorld so they must have not had any enquiries and that's why they haven't used us in the last month. He said they only really offer travel insurance to accommodate their existing clients, they don't really push for it. He said they have got the leaflets and they will be sending them out with renewals etc. As soon as they get the enquiries, we will be getting the business.13/09/13 - MW - Spoke to Richard, told him about the new product going live on the 1st October. I am sending him the email with the Underwriting Changes and Special Features for 2013. He said they would like some leaflets so I have confirmed their address and I am sending out 20.01/11/13 - MW - Spoke to Richard, he said they are very quiet at the moment, their customers are not going away and that's why they haven't issued anything. Luke is the main person who deals with this and he will be our contact going forward because he is the one who deals with it most of the time. I told Richard about the video tutorial and he is going to tell Luke and he will let us know if he has any queries. I am sending the email with the New Special Features and further information on the changes we have made to the website. luke.robson#aifltd.co.uk02/01/14 - MW - Spoke to Luke, I have confirmed all the contact information is correct and I have added his email address to the spreadsheet for the 2014 mailer and then he will forward it around to all the others. He said they only use SunWorld and it is the easiest to use, they just haven't had any enquiries for travel. I am sending him the email with the New Special Features and information about the changes we have made to the system. He said he has asked for some leaflets before but he has not received them. He really wants some to send out with all his renewals so I am sending him out 60 leaflets.
In addition to limiting the height of the row, you have to specify that the excess text should be hidden.
Try adding the following style:
.k-grid tbody tr td {
    overflow: hidden;
    text-overflow: ellipsis;
    white-space: nowrap;
}
See it here: http://jsfiddle.net/OnaBai/8U6rg/1/
For showing a tooltip, you need to use a template that includes the value of the cell both as the title and as the content. Example of the column definition for a field called name:
{
    field: "name",
    title: "Name",
    template: "<span title='${name}'>${name}</span>"
}
See it here: http://jsfiddle.net/OnaBai/8U6rg/3
EDIT: If you want to show HTML-formatted text in the tooltip, you cannot use standard browser tooltips; you will have to use Kendo UI tooltips.
To do so, you have to do the following.
In order to show the content of the cell as HTML, you should add the option encoded set to false, something like:
{
    field: "name",
    title: "Name",
    encoded: false
}
Next, to use a Kendo UI tooltip for this, we are going to create a Kendo UI Tooltip widget for the cells. You should do this once the grid is rendered, so I do it in the Grid's dataBound handler:
dataBound: function() {
    $("#grid").kendoTooltip({
        ...
    });
}
To control which cells get a tooltip, I'm going to mark the cells with the CSS class onabai, so now my column definition is:
{
    field: "name",
    title: "Name",
    encoded: false,
    template: "<span class='onabai'>#=name#</span>"
}
And the Tooltip in dataBound is:
dataBound: function() {
    $("#grid").kendoTooltip({
        filter: ".onabai",
        position: "left",
        width: 200,
        ...
    });
}
But we still have to say what goes into the content of the tooltip. To do so, we define a content property as a function that returns the content of the cell in the Grid, using e.target.html():
dataBound: function() {
    $("#grid").kendoTooltip({
        filter: ".onabai",
        position: "left",
        width: 200,
        content: function(e) {
            return e.target.html();
        }
    });
}
You can see this running here: http://jsfiddle.net/OnaBai/8U6rg/8/

How to create a unique ID in the format xx-123 in Rails

Is it possible to create unique IDs like this for articles in Rails?
For example, the first article would get the ID aa-001,
the second aa-002,
...
article #999 aa-999,
article #1000 ab-001, and so on?
Thanks in advance for your help!
The following method gives the next id in the sequence, given the one before:
def next_id(id, limit = 3, separator = '-')
  if id[/[0-9]+\z/] == ?9 * limit
    "#{id[/\A[a-z]+/i].next}#{separator}#{?0 * (limit - 1)}1"
  else
    id.next
  end
end
> next_id("aa-009")
=> "aa-010"
> next_id("aa-999")
=> "ab-001"
The limit parameter specifies the number of digits. You can use as many prefix characters as you want.
Which means you could use it like this in your application:
> Post.last.special_id
=> "bc-999"
> next_id(Post.last.special_id)
=> "bd-001"
However, I'm not sure I'd advise you to do it like this. Databases have smart mechanisms to avoid race conditions when ids are created for concurrently created entries; in Postgres, for example, that is exactly why sequences don't guarantee gapless ids.
This approach has no such mechanism, which could potentially lead to race conditions. However, if that is extremely unlikely to happen, such as in a case where you are the only one writing articles, you could do it anyway. I'm not exactly sure what you want to use this for, but you might also want to look into to_param.
You may want to look into the FriendlyId gem. There’s also a Railscast on this topic which covers a manual approach as well as the usage of FriendlyId.

Sorting by counting the intersection of two lists in MongoDB

We have a post-analysis requirement: for a specific post, we need to return a list of the posts most related to it, where the logic is comparing the count of common tags between the posts. For example:
postA = {"author": "abc",
         "title": "blah blah",
         "tags": ["japan", "japanese style", "england"],
        }
There may be other posts with tags like:
postB:["japan", "england"]
postC:["japan"]
postD:["joke"]
So basically, postB scores 2 and postC scores 1 when compared to the tags of postA; postD scores 0 and will not be included in the result.
My understanding so far is to use map/reduce to produce the result. I understand the basic usage of map/reduce, but I can't figure out a solution for this specific purpose.
Any help? Or is there a better way, like a custom sorting function, to work it out? I'm currently using pymongo, as I'm a Python developer.
You should create an index on tags:
db.posts.ensure_index([('tags', 1)])
and search for posts that share at least one tag with postA:
posts = list(db.posts.find({'_id': {'$ne': postA['_id']}, 'tags': {'$in': postA['tags']}}))
and finally, sort by intersection in Python:
key = lambda post: len([tag for tag in post['tags'] if tag in postA['tags']])
posts.sort(key=key, reverse=True)
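Putting the pieces together, a minimal sketch (related_posts and its limit parameter are illustrative names, not a pymongo API):
def related_posts(db, post, limit=10):
    # Find candidate posts sharing at least one tag, then sort them
    # in Python by the size of the tag intersection.
    candidates = db.posts.find({
        '_id': {'$ne': post['_id']},
        'tags': {'$in': post['tags']},
    })
    tags = set(post['tags'])
    return sorted(candidates,
                  key=lambda p: len(tags & set(p['tags'])),
                  reverse=True)[:limit]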
Note that if postA shares at least one tag with a large number of other posts this won't perform well, because you'll send so much data from Mongo to your application; unfortunately there's no way to sort and limit by the size of the intersection using Mongo itself.
