I have a simple, custom-rolled chat here ( http://ninjawars.net ): essentially an AJAX chat with a PHP backend, a JavaScript listing of chat messages, and input from logged-in users only. It suffers from being easy to spam. What are some simple systems to prevent spamming of a chat?
One thing (lowest level of protection) that I have already implemented:
Ignore consecutive duplicate messages from the same user.
Other ideas that I have:
Merge consecutive messages from the same user into a single message line instead of creating a separate line for each. (relatively simple to implement; reduces the effect of spam but doesn't prevent it)
Block further messages from new users after a certain number of consecutive messages from one user. (relatively simple to implement)
Chat moderation by trusted users (complex to implement).
Are there any simple systems/algorithms to prevent chat message spamming that I should know about?
Put an increasing delay on how fast a user can reply. After each message post, store next_reply_time as a timestamp of NOW + 1 second. If they reply before that time has passed, ignore the message, give a "Reply too fast" warning, and set next_reply_time to NOW + 2 seconds, and so on. This way, if they stack up messages too fast, you'll ignore them for longer and longer periods of time. The delay can of course also be based on reputation.
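A minimal sketch of that idea in TypeScript (the in-memory map and the names nextReplyTime / penaltySeconds are made up for illustration; the question's PHP backend would keep this in the session or database instead):

```typescript
// Increasing-delay throttle: each too-early post lengthens the wait for the next one.
interface ThrottleState {
  nextReplyTime: number;   // epoch milliseconds before which posts are rejected
  penaltySeconds: number;  // grows each time the user posts too early
}

const throttle = new Map<number, ThrottleState>();

function tryPostMessage(userId: number, now: number = Date.now()): boolean {
  const state = throttle.get(userId) ?? { nextReplyTime: 0, penaltySeconds: 1 };

  if (now < state.nextReplyTime) {
    // Posted too fast: reject and lengthen the delay for the next attempt.
    state.penaltySeconds += 1;
    state.nextReplyTime = now + state.penaltySeconds * 1000;
    throttle.set(userId, state);
    return false; // caller shows the "Reply too fast" warning
  }

  // Accept the message and start a fresh 1-second window.
  throttle.set(userId, { nextReplyTime: now + 1000, penaltySeconds: 1 });
  return true;
}
```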
We have a chat bot that seems to be receiving messages from another bot. We'd like to ignore these messages, as responding to them leads to an infinite ping-pong loop between the two bots.
We were hoping to rely on activity.from.role as documented here, but it seems that field is never set.
activity.from.id looks something like 28:app:00000000-dfae-4fe1-a068-80fe8fc61f2b_62b732f7-fc71-40bc-b27d-35efcb000000, and we think the only way to identify the account as a bot is to detect the :app: in these IDs. This is sub-optimal, as this ID format is not part of the official API and could change at any time.
That said, how should we detect whether an activity event is coming from a bot?
If you have to deal with potential bots from outside your organisation, a simple approach could be to keep a dictionary of the last few text exchanges, indexed by the userId or UserName in the Activity object. Then, on each POST your bot receives, check whether the received text fully matches one of the previous message entries in this dictionary. If it does, mark the related userId/UserName as a candidate for the bot role, but continue checking further exchanges in case a non-bot user just said "hi" twice.
If the next few exchanges no longer meet the full-match requirement, unmark the userId/UserName as a potential bot. If a userId/UserName is still marked as a candidate, apply the bot role to it once there are no further exchanges past the fully matched entry, or after a delay of your choice. For the latter, it may be useful to provoke one last text exchange after the delay before deciding.
For the Watson/Eliza kind of bots, I'd recommend checking the speed of the exchanges; as far as I know, no human being can exchange more than twenty messages per second.
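A rough sketch of both checks (the thresholds, the in-memory map, and the function names are placeholders, not part of the Bot Framework API):

```typescript
// Flag senders whose messages exactly repeat something we sent recently,
// or who post implausibly fast. Thresholds are illustrative, not tuned values.
interface SenderState {
  recentTexts: string[]; // last few texts we sent to this sender
  matchCount: number;    // consecutive full matches seen
  timestamps: number[];  // arrival times of their recent messages (ms)
}

const senders = new Map<string, SenderState>();
const MAX_HISTORY = 5;
const MATCHES_BEFORE_FLAG = 3;
const MAX_MESSAGES_PER_SECOND = 20;

function looksLikeBot(fromId: string, receivedText: string, now = Date.now()): boolean {
  const s = senders.get(fromId) ?? { recentTexts: [], matchCount: 0, timestamps: [] };

  // Echo check: did they send back exactly what we sent them earlier?
  if (s.recentTexts.includes(receivedText)) {
    s.matchCount += 1;
  } else {
    s.matchCount = 0; // a non-matching message clears the suspicion
  }

  // Rate check: count their messages within the last second.
  s.timestamps = s.timestamps.filter(t => now - t < 1000);
  s.timestamps.push(now);

  senders.set(fromId, s);
  return s.matchCount >= MATCHES_BEFORE_FLAG || s.timestamps.length > MAX_MESSAGES_PER_SECOND;
}

// When our bot replies, remember what it said so echoes can be recognised later.
function rememberOutgoing(toId: string, text: string): void {
  const s = senders.get(toId) ?? { recentTexts: [], matchCount: 0, timestamps: [] };
  s.recentTexts.push(text);
  if (s.recentTexts.length > MAX_HISTORY) s.recentTexts.shift();
  senders.set(toId, s);
}
```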
I'm working on architecting a micro-service solution where most code will be C#, most likely with Angular for any front end. My question is about message chaining. I am still figuring out which message broker to use: Azure Service Bus, RabbitMQ, etc. There is a concept I haven't found much about.
How do I handle cases where I want to fire a message once a specific set of messages has fired? An example (not part of my actual solution): I want to notify someone when they pay a bill. We send a message "PAIDBILL",
which fires off microservices that each process it independently:
FinanceService: debit the ledger and fire "PaymentPosted"
EmailService: email the customer thanking them for paying the bill, then fire "CustomerPaymentEmailSent"
DiscountService: check whether they get a discount for paying on time, then send "CustomerCanGetPaymentDiscount"
If all three messages have fired for the same PAIDBILL ("PaymentPosted", "CustomerPaymentEmailSent", "CustomerCanGetPaymentDiscount"),
then I want to email the customer that they will get a discount on their next bill. It must happen AFTER all three have triggered, and the order doesn't matter. How do I schedule a new "EmailNextTimeDiscount" message to be sent without having to poll for which messages have fired every minute, hour, or day?
All I can think of is a SQL table that marks each one as complete (by locking the table); when the last one is filled in, send off the message. Would this be a good solution? It feels like an anti-pattern for the microservice and message-queue design.
If you're using messages (e.g. Service Bus / RabbitMQ), then I think the solution you have described is the best one. This type of design - where services have knowledge about the other domains in the system - is typically known as choreography.
You'll want to pick a service which will be responsible for this business logic. That service will need to receive all the preceding types of messages so that it can determine when (and if) all have been met, which it probably wants to do by recording in a database which of the gates have already been passed.
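A minimal sketch of that gate-tracking service (the publish callback and the in-memory map stand in for your real broker client and SQL table; in production you'd persist the gates and make the final publish idempotent):

```typescript
// The three messages that must all arrive for a given bill before the follow-up fires.
const REQUIRED = ["PaymentPosted", "CustomerPaymentEmailSent", "CustomerCanGetPaymentDiscount"];

// billId -> set of message types already seen (a real service would use a table instead).
const gates = new Map<string, Set<string>>();

function handle(
  messageType: string,
  billId: string,
  publish: (type: string, billId: string) => void
): void {
  if (!REQUIRED.includes(messageType)) return;

  const seen = gates.get(billId) ?? new Set<string>();
  seen.add(messageType);
  gates.set(billId, seen);

  // Once every required message has arrived for this bill, fire the follow-up once.
  // No polling is needed because the check runs on each arrival.
  if (REQUIRED.every(t => seen.has(t))) {
    gates.delete(billId);
    publish("EmailNextTimeDiscount", billId);
  }
}
```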
One alternative you could consider is chaining the business processes instead of doing them in parallel. So...
"PAIDBILL" causes FinanceService to debit the ledger and fire "PaymentPosted"
"PaymentPosted" causes EmailService to email the customer thanking them for paying the bill, then broadcast "CustomerPaymentEmailSent"
"CustomerPaymentEmailSent" causes DiscountService to check whether they get a discount for paying on time, then send "CustomerCanGetPaymentDiscount"
The email you want to send is just triggered by "CustomerCanGetPaymentDiscount".
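A sketch of what that chaining looks like in code, assuming a generic bus client with subscribe/publish methods (the Bus interface and the service functions are placeholders, not a specific broker's API):

```typescript
// Each handler does its own work, then publishes the next message in the chain.
interface Bus {
  subscribe(type: string, handler: (billId: string) => Promise<void>): void;
  publish(type: string, billId: string): Promise<void>;
}

function wireChain(bus: Bus): void {
  bus.subscribe("PAIDBILL", async billId => {
    await debitLedger(billId);                 // FinanceService
    await bus.publish("PaymentPosted", billId);
  });

  bus.subscribe("PaymentPosted", async billId => {
    await sendThankYouEmail(billId);           // EmailService
    await bus.publish("CustomerPaymentEmailSent", billId);
  });

  bus.subscribe("CustomerPaymentEmailSent", async billId => {
    if (await qualifiesForDiscount(billId)) {  // DiscountService
      await bus.publish("CustomerCanGetPaymentDiscount", billId);
    }
  });
}

// Placeholders for the real service calls.
declare function debitLedger(billId: string): Promise<void>;
declare function sendThankYouEmail(billId: string): Promise<void>;
declare function qualifiesForDiscount(billId: string): Promise<boolean>;
```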
If I'm honest, I would switch around the dependency model you're using at this last stage. So, instead of some component listening for "CustomerCanGetPaymentDiscount" events from DiscountService and sending an email, I think I would instead have the DiscountService tell some other component to send an email. It seems natural to me for something that calculates discounts to know that an email should be sent. It seems less natural for something that sends emails to know about discounts (and everything else that needs emails sent). This is why I don't like architectures where the assumption is that every message should be an event and every action should be triggered by an event: it removes a lot of decisions about where domain logic can live, because the message receiver always has to know about the domain of the message sender, never vice versa.
This is more of a hypothetical question, so I can't really show any code examples. Imagine if a site like Twitter wanted to live-update stats on a Tweet via web sockets/Socket.io. In terms of performance, which of these would be the best approach?
Each action (like, retweet, reply) sends a message to the server, which then gets emitted to all clients, and the client is responsible for updating the appropriate tweet.
Each tweet the client loads is connected to a different room so that it only emits and receives messages relevant to itself.
Other?
Or perhaps it's dependent on the scale of the application? Maybe 1 is better if you have a Twitter clone with only a few users, whereas I would think 2 is better in Twitter's case, because it's a matter of hundreds of "rooms" vs. millions of signals per second? And if that's the case, at what point is one approach preferred over the other?
At scale, you do not want to be sending messages to clients that they did not ask for and have no use for. Imagine a Twitter client that was receiving every single tweet being sent in real time. That could overwhelm the client, and it would mean the server would be delivering every single tweet to every single connected client. That obviously doesn't scale on either the server side or the client side.
So option 1 is out.
The appropriate solution has the server send each client only the messages that it has a particular interest in seeing. This works just fine at any scale. I can't tell whether your option 2 is that or not, since rooms are just a tool for making groups of connections that you can send the same message to; they don't really decide who gets what message. That logic must be baked into your server code.
For a Twitter-like service, it seems you're going to need a system where your server can easily tell which users have an interest in a particular new message. A user might be interested for a number of reasons: they follow the author, they follow a hashtag present in the message, they are mentioned in the message, and so on. That is server-side logic, not just simple rooms.
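As a sketch of the "rooms as interest groups" approach with socket.io (the subscribe/unsubscribe event names and the stats payload are made up; the real interest logic would live server-side as described above):

```typescript
import { Server } from "socket.io";

const io = new Server(3000);

io.on("connection", socket => {
  // Hypothetical events: the client tells the server which tweets it currently has
  // on screen, and the server joins/leaves the matching rooms.
  socket.on("subscribe", (tweetIds: string[]) => {
    for (const id of tweetIds) socket.join(`tweet:${id}`);
  });
  socket.on("unsubscribe", (tweetIds: string[]) => {
    for (const id of tweetIds) socket.leave(`tweet:${id}`);
  });
});

// Called by application code whenever a tweet's stats change; only clients that
// joined that tweet's room receive the update.
function broadcastStats(
  tweetId: string,
  stats: { likes: number; retweets: number; replies: number }
): void {
  io.to(`tweet:${tweetId}`).emit("stats", { tweetId, ...stats });
}
```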
Currently I'm working on a SaaS with support for multiple tenants that can enable push notifications for their user-bases.
I'm thinking of using a message queue to store all pushes and send them with a separate service. That new service would need to read from the queue and send the push notifications.
My question now is: Do I need to come up with a complex sending strategy? I know that GCM has a limit of 1000 devices per request, so this needs to be considered. I also can't wait for x pushes to accumulate, as this might delay a previous push from being sent. My next thought was to create a global array and fill it with pushes from the queue. A loop would then read that array every second or so and send the pushes. This way pushes would get sent for sure, and I wouldn't exceed the 1000-device limit.
So, although this might work, I'm not sure an infinite loop is the best way to go. I'm wondering whether GCM/FCM even has a request limit. If not, I wouldn't need to aggregate the pushes in the first place and I could ditch the loop, simply firing a request for each push that gets pulled from the queue.
Any enlightenment on this topic or improvement of my prototypical algorithm would be great!
Do I need to come up with a complex sending strategy?
Not really. GCM/FCM is simple enough. Just send the message to the GCM/FCM server; it will queue the message on its own and then (as per its behavior) send it as soon as possible.
I know that GCM has a limit of 1000 devices per request, so this needs to be considered.
I think you're misreading the 1000-devices-per-request limit. It refers to the number of registration tokens you can put in the list when using the registration_ids parameter:
This parameter specifies a list of devices (registration tokens, or IDs) receiving a multicast message. It must contain at least 1 and at most 1000 registration tokens.
This means you can send the same message payload to at most 1000 devices in a single request (you can then batch requests, 1000 tokens per request, if you need to reach more).
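A sketch of that batching (sendToGcm is a placeholder for whatever HTTP call or SDK method you use to hit the GCM/FCM send endpoint; it is not a real library function):

```typescript
// Split the full token list into chunks of at most 1000 and send the same payload
// to each chunk in its own request.
async function sendToAllDevices(
  tokens: string[],
  payload: object,
  sendToGcm: (registrationIds: string[], payload: object) => Promise<void>
): Promise<void> {
  const BATCH_SIZE = 1000; // documented maximum for registration_ids

  for (let i = 0; i < tokens.length; i += BATCH_SIZE) {
    const batch = tokens.slice(i, i + BATCH_SIZE);
    await sendToGcm(batch, payload); // same payload, next 1000 tokens
  }
}
```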
I'm wondering if GCM / FCM even has a request limit?
AFAIK, there is no such limit. Ditch the loop. Whenever you successfully send a message to the GCM/FCM server, it will enqueue the message and keep it until it is able to deliver it.
I'm coding a new website that will need users to enter their mobile phone number. The problem I'm facing is that I need to make sure the user is in fact the owner of (or in this case, has access to) the mobile number.
The solution I've come up with: upon number submission, I send an SMS with a token and ask the user to enter the token on my website, much like Google Calendar does. However, I'm on a tight budget and need to make sure user A doesn't submit 100,000 mobile numbers; if that happens I'll be out of business in no time, since each SMS costs me about 0.10 USD.
So far, I've come up with the following solutions:
use a CAPTCHA (keeps some users away, and it is still vulnerable to manual registrations)
limit the number of tokens a given IP address can request (dynamic IPs, proxies, etc.)
limit the number of tokens sent to a given mobile number (an attacker could request tokens for all available numbers, and when the real user tries to request a legitimate token, their number will already be blocked)
None of these solutions are perfect, how do you suggest I approach this problem?
In a recent project, we were associating SMS numbers with a user account. Each account needed a CAPTCHA and email activation. The user could activate SMS via token, like you are using.
You could rate-limit IP addresses (rather than setting a total limit): no more than 10 requests from an IP within 5 minutes, or something like that.
And/or you could limit outstanding SMS requests: after an IP address requests an SMS token, that token must be submitted before the IP can request one for another number, or allow no more than 10 outstanding SMS tokens per IP per day.
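A minimal sketch of that per-IP sliding-window limit (the limits and the in-memory map are illustrative; a real deployment would more likely use a shared store such as Redis):

```typescript
// Allow at most MAX_REQUESTS token requests per IP within a rolling 5-minute window.
const WINDOW_MS = 5 * 60 * 1000;
const MAX_REQUESTS = 10;

const requestLog = new Map<string, number[]>(); // ip -> timestamps of recent requests

function allowTokenRequest(ip: string, now: number = Date.now()): boolean {
  // Drop timestamps that have fallen out of the window.
  const recent = (requestLog.get(ip) ?? []).filter(t => now - t < WINDOW_MS);

  if (recent.length >= MAX_REQUESTS) {
    requestLog.set(ip, recent);
    return false; // over the limit: don't send another SMS for this IP yet
  }

  recent.push(now);
  requestLog.set(ip, recent);
  return true;
}
```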
Also, like @Alan said, we put a cap on our SMS messages per month.
I would use a combination of a CAPTCHA and limiting the requests for a given mobile number.
In addition, you should be able to specify a preset monthly limit with your SMS aggregator. After you reach that limit, service is shut off. That way, if you are the victim of an attack, you will only be liable for a limited amount of money.
Instead of SMS, you can use an automated service that calls a phone number and speaks a one-time password (via text-to-speech). These services are similar in pricing to SMS and less likely to be spam-abused, as there is more overhead.
Twilio costs $0.03 a minute, or in this case $0.03 a call.
You could do what Twitter does, which is have the user text you the token (rather than you texting it to them).
This will require you to find a provider that lets you receive texts for free (or close to it), but that might be easier.
Why is SMS costing you a dime? Use the email address that is associated with every SMS system (at least here in the U.S.).
http://www.sms411.net/2006/07/how-to-send-email-to-phone.html
If someone tries their best to abuse a system, they will more than likely find a way to do it.
Using a combination of the techniques you've already come up with is likely the best way to thwart most malicious users.
Limit what people can do (no more than 10 requests from one IP in 10 minutes, one phone number can only receive 3 texts a week, a CAPTCHA before number entry), but more importantly, if people have no control over the content of the message, there's no real reason to exploit it.