More specific way to track companies on Google Analytics - events

I'm job searching right now, and I created a resume website, which is a good way for me to track which companies are interested in me. I can also sometimes tell which specific companies have looked at me by going to the "Service Provider" dimension. However, sometimes the visit shows up under a generic network (like Verizon or Charter), so I don't know. Is there another dimension, or another way, for me to try to figure out which specific company is looking at me? In your answer, you don't have to be super specific, because my question is pretty broad.

Related

What are the steps for using historical chat data in Rasa?

There's a crucial part in the process that says the best place for the chatbot to learn is from real users. What if I already have that data and would like to test the model on it?
Think of Interactive Learning, but at scale and possibly automated. Does such a feature already exist within Rasa?
Think of Interactive Learning, but at scale and possibly automated.
Are you referring to something like reinforcement learning? Unfortunately, something like that currently doesn't exist. Measuring the success of conversations is a tough problem (e.g. some users might give you positive feedback when the bot solved their problem, while others would simply leave the conversation). Something like external business metrics could do the trick (e.g. whether the user ended up buying something from you within the next 24 hours), but it's still hard. Another problem is that you probably want to have some degree of control over how your chatbot interacts with your users. Training the bot on user conversations without any double-checking could potentially lead to problems (e.g. Microsoft once had an AI trained on Twitter data, which didn't turn out well).
Rasa offers Rasa X for learning from real conversations. The Community Edition is a free, closed-source product which helps you monitor and annotate real user conversations quickly.
Disclaimer: I am a software engineer working at Rasa.

Scale of designing an IP Geolocation Algorithm

Just a bit of background: I'm quite new to software development, so I'm not very good at assessing how big a project is. Recently, I've been asked to look into how to geolocate an IP address without using any existing databases.
I looked through some research papers and learned a bit about delay-based geolocation techniques. It seems that if I want to geolocate an IP address without using an existing database, I would basically need to:
1) Get access to a bunch of servers whose physical locations I know, and be able to send pings from them to both the target IP address and a set of other servers whose locations I also know.
2) Compare the delays measured to the target IP address against the delays measured to the known-location servers, and pick the best match.
It seems to me that this is quite a massive project to set up, though there are a bunch of research papers that use these techniques, so I'm not sure if the scale is really as big as I'm expecting. Anyway, my main question is: is setting up something like this as big a task as I originally suspected, or is it actually quite simple if you have the servers available?
Almost every geolocation-related question falls back to using a database. The main reason I'm looking into an algorithm that doesn't is that we already have a database; we want this as a sort of backup or validator, and I want to figure out whether it's worth pursuing.
There has been a lot of research on this technique since 2000. You can search for "Topology-based Geolocation" on Google Scholar and read the research papers. You can add value to the research by not reinventing the wheel.
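To make the comparison step concrete, here is a minimal sketch of the delay-vector matching idea, in C#. All landmark names, coordinates, and delay values are made up for illustration: each landmark and the target are pinged from the same set of probe servers, and the target is guessed to be near the landmark whose delay vector matches best.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only: landmarks, coordinates, and delays are invented.
class DelayGeolocationSketch
{
    static void Main()
    {
        // Round-trip times (ms) measured from the same three probe servers
        // to each landmark whose physical location is known.
        var landmarks = new Dictionary<string, (double Lat, double Lon, double[] Delays)>
        {
            ["nyc"]    = (40.71, -74.01, new[] { 12.0, 80.0, 95.0 }),
            ["london"] = (51.51,  -0.13, new[] { 78.0, 10.0, 35.0 }),
            ["berlin"] = (52.52,  13.40, new[] { 92.0, 28.0,  9.0 }),
        };

        // Round-trip times from the same probes to the target IP address.
        double[] target = { 75.0, 14.0, 31.0 };

        // Pick the landmark whose delay vector is closest in Euclidean distance;
        // the target is assumed to be near that landmark.
        var best = landmarks
            .OrderBy(kv => Math.Sqrt(kv.Value.Delays
                .Zip(target, (a, b) => (a - b) * (a - b))
                .Sum()))
            .First();

        Console.WriteLine($"Estimate: near {best.Key} ({best.Value.Lat}, {best.Value.Lon})");
    }
}
```

The matching step itself is simple; the real scale of the project is in operating enough well-placed probes and landmarks for the delay vectors to be informative, which is why the published work typically runs on shared measurement testbeds.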

What is pump.io?

Recently I have been looking into the development of social networks, and I often find references to pump.io. There is, however, very limited information available on what pump.io actually is. The official website says nothing more than: "It's a stream server that does most of what people really want from a social network." I found some more information on this website (http://slid.es/evanp/understanding-pumpio/fullscreen#/), but that still doesn't say a lot to me.
Could someone please provide an elaborate discussion on what pump.io actually is (and does) to someone who does not know anything about (activity) stream servers? Maybe the better question is: "What is an activity stream server?"
Yeah, the term is one a lot of people are unfamiliar with and it makes a couple of distinctions that aren't immediately obvious even if you use and post to a pump.io site.
pump.io, as it is distributed, is really two programs with different sets of functions. One is the Activity Stream Server and the other is the Web Client.
At the risk of being pedantic, let me define each of the words. I know you know what the words mean, but I hope the specific contexts/usage will help:
Server: a program which distributes information (usually) across a network.
Stream: a (usually) chronological series of pieces of information of some sort.
Activity: a description or depiction of something someone is doing.
The Activity Stream Server is a program which distributes (server) a chronological series (stream) of posts about stuff people do (activities).
The distinction is important because the website part of a pump.io website is a client for the pump server—essentially no different from a desktop or smartphone pump.io client. It listens to the pump's stream of posts and sends new posts to the pump using the same API and data formats that standalone applications—or other pumps—do.
You could actually totally decouple the Web Client and have a fully-functioning pump.io instance without any website. Users on other pump sites could see your posts and you could see theirs, and you could comment back and forth. It would make no difference.
ActivityStreams is a JSON-based data format for describing "activities". The ActivityStreams 2.0 specification can be found at https://www.w3.org/TR/activitystreams-core/ and the vocabulary of activities at https://www.w3.org/TR/activitystreams-vocabulary/. To get a feel for what the data format looks like, have a look at the examples at https://www.w3.org/TR/activitystreams-core/#examples. More examples can be found throughout the two specifications.
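For instance, a minimal activity might look like the following. This is a sketch modeled on the spec's examples (the person, note URL, and summary are invented), shown here being parsed from a small C# program:

```csharp
using System;
using System.Text.Json;

class ActivityStreamsExample
{
    static void Main()
    {
        // A minimal ActivityStreams 2.0 activity: "Sally liked a note".
        // Names and URLs are invented, modeled on the spec's examples.
        string activityJson = @"{
            ""@context"": ""https://www.w3.org/ns/activitystreams"",
            ""summary"": ""Sally liked a note"",
            ""type"": ""Like"",
            ""actor"": { ""type"": ""Person"", ""name"": ""Sally"" },
            ""object"": ""http://example.org/notes/1""
        }";

        using JsonDocument doc = JsonDocument.Parse(activityJson);
        Console.WriteLine(doc.RootElement.GetProperty("type").GetString()); // Like
    }
}
```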
pump.io is an activity stream server that does most of what people really want a social network server to do.
That's a pretty packed sentence, I understand, but I can try to unwind it a little.
"Activities" are the things we do in our on-line or off-line life—waking up in the morning, going for a run, tasting a beer, uploading a photo, adding a friend, eating a burrito, joining a group, liking a blog post.
pump.io uses a simple JSON format to represent all these kinds of activities and many more. It organizes activities into streams—time-ordered lists of activities, with the newest first. Most streams are organized by theme, like: all the things that my friends did, or all the things that I did, or all the things anyone has done to this picture.
Programmers use a simple API to connect to a pump.io server and add new activities. pump.io automatically organizes the activities into streams and makes sure the activities get to the people who are interested in them.
And, really, that's what we want from a social network.
Behrenshausen, B. (2013). 'Interview with Evan Prodromou, lead developer of pump.io'. Retrieved from: https://opensource.com/life/13/7/pump-io
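To illustrate the "streams" idea from the interview above, here is a toy sketch in C#. The activities and names are invented; it just shows a theme-filtered, newest-first list, which is all a stream is conceptually:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of an activity; real pump.io activities are richer JSON objects.
record Activity(string Actor, string Verb, string Target, DateTime Published);

class StreamSketch
{
    static void Main()
    {
        var activities = new List<Activity>
        {
            new("alice", "post",   "a note",   new DateTime(2013, 7, 1,  9,  0, 0)),
            new("bob",   "like",   "the note", new DateTime(2013, 7, 1, 10, 30, 0)),
            new("alice", "follow", "bob",      new DateTime(2013, 7, 1,  8, 15, 0)),
        };

        // "All the things that alice did", newest first.
        var alicesStream = activities
            .Where(a => a.Actor == "alice")
            .OrderByDescending(a => a.Published);

        foreach (var a in alicesStream)
            Console.WriteLine($"{a.Published:u} {a.Actor} {a.Verb} {a.Target}");
    }
}
```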
If you peer a few centimeters down the page on the official website, you'll see:
What's it for? I post something and my followers see it. That's the rough idea behind the pump.
There's an API defined in the API.md file. It uses activitystrea.ms JSON as the main data and command format.
You can post almost anything that can be represented with activity streams -- short or long text, bookmarks, images, video, audio, events, geo checkins. You can follow friends, create lists of people, and so on.
The software is useful for at least these scenarios:
Mobile-first social networking
Activity stream functionality for an existing app
Experimenting with social software
Those last 3 items hopefully answer your question.
Currently, you can:
install the nodejs-based pump.io server
(or) sign up for an account on a public service
post notes and pictures with configurable permissions
log in to web and client applications using your webfinger ID

How do I block bad bots from my site without interfering with real users?

I want to keep no-good scrapers (a.k.a. bad bots that, by definition, ignore robots.txt) from stealing content and consuming bandwidth on my site. At the same time, I do not want to interfere with the user experience of legitimate human users, or stop well-behaved bots (such as Googlebot) from indexing the site.
The standard method for dealing with this has already been described here: Tactics for dealing with misbehaving robots. However, the solution presented and upvoted in that thread is not what I am looking for.
Some bad bots connect through Tor or botnets, which means that their IP addresses are ephemeral and may well belong to human beings using compromised computers.
I've therefore been thinking about how to improve the industry-standard method by letting the "false positives" (i.e. humans) that have their IPs blacklisted get access to my website again. One idea is to stop blocking these IPs outright and instead ask them to pass a CAPTCHA before being allowed access. While I consider CAPTCHAs a PITA for legitimate users, vetting suspected bad bots with a CAPTCHA seems to be a better solution than blocking these IPs completely. By tracking the sessions of users that complete the CAPTCHA, I should be able to determine whether they are human (and should have their IP removed from the blacklist), or robots smart enough to solve a CAPTCHA, placing them on an even blacker list.
However, before I go ahead and implement this idea, I want to ask the good people here if they foresee any problems or weaknesses (I am already aware that some CAPTCHAs have been broken, but I think I shall be able to handle that).
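To make the idea concrete, here is a minimal sketch of the gate I have in mind, in C#. The class and method names are placeholders, not any real library, and the actual CAPTCHA rendering and verification are left out:

```csharp
using System.Collections.Generic;

// Placeholder sketch of the greylist-plus-CAPTCHA gate described above.
enum Verdict { Allow, Challenge, Block }

class BotGate
{
    // IPs suspected of being bots (e.g. from rate limiting or spam lists).
    private readonly HashSet<string> greylist = new HashSet<string>();
    // IPs that solved the CAPTCHA but kept behaving like bots: the "even blacker list".
    private readonly HashSet<string> blacklist = new HashSet<string>();

    public Verdict Check(string ip)
    {
        if (blacklist.Contains(ip)) return Verdict.Block;
        if (greylist.Contains(ip)) return Verdict.Challenge; // serve a CAPTCHA, not content
        return Verdict.Allow;
    }

    // Called after a greylisted visitor solves the CAPTCHA and we have
    // observed their subsequent session behaviour.
    public void OnCaptchaSolved(string ip, bool behavedLikeHuman)
    {
        if (behavedLikeHuman)
            greylist.Remove(ip);   // false positive: restore normal access
        else
            blacklist.Add(ip);     // CAPTCHA-solving bot
    }
}
```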
The question, I believe, is whether or not there are foreseeable problems with CAPTCHAs. Before I dive into that, I also want to address how you plan on catching bots to challenge them with a CAPTCHA. Tor and proxy nodes change regularly, so that IP list will need to be constantly updated. You can use MaxMind for a decent list of proxy addresses as your baseline. You can also find services that track the addresses of all the Tor nodes. But not all bad bots come from those two vectors, so you need to find other ways of catching bots. If you add in rate limiting and spam lists, you should catch over 50% of the bad bots. Other tactics really have to be custom built around your site.
Now to talk about problems with CAPTCHAs. First, there are services like http://deathbycaptcha.com/. I don't know if I need to elaborate on that one, but it kind of renders your approach useless. Many of the other ways people get around CAPTCHAs involve OCR software: the better the CAPTCHA is at beating OCR, the harder it is going to be on your users. Also, many CAPTCHA systems use client-side cookies that someone can solve once and then upload to all their bots.
Most famous, I think, is Karl Groves's list of 28 ways to beat CAPTCHA: http://www.karlgroves.com/2013/02/09/list-of-resources-breaking-captcha/
For full disclosure, I am a cofounder of Distil Networks, a SaaS solution to block bots. I often pitch our software as a more sophisticated system than simply using CAPTCHA and building it yourself, so my opinion of the effectiveness of your solution is biased.

Getting started with Single Sign-On / Windows Authentication

First off, The Problem:
We have a web app with a Flash front-end that talks to our ASP.NET web service via SOAP, which then deals with all of our server-side code (C#).
Right now, we implement a simple user sign on in our application, storing the info in our MSSQL DB.
A client has requested what I understand to be Windows authentication through our application using the currently logged in user.
So, I have been tasked with investigating this. Nobody, including myself, has any experience in this area.
I have been reading up on some basic Active Directory information, and some simple tutorials. I understand how to get access to the directory using ADSI through code. What I'm really interested in seeing is how the entire thing should be architected. I don't want to throw together a hacky solution.
Does anyone know of a good tutorial for this kind of thing or have any advice on getting started? More importantly, does this even sound viable?
I know I haven't given much information, but feel free to ask and I will provide answers.
Thanks.
Edit:
Will, to give you an idea of the scope of this, the network will include every computer in a large hospital. So yes, this is huge. Clearly I need to start small. I would like to come up with something that will work at my office first. Maybe ~10 Windows computers on a single domain. One Domain Controller.
I am also open to any good books on the subject.
If you are going to tie into Active Directory, you will want to take a look at the System.DirectoryServices namespace. The implementations can vary wildly depending on your system architecture, but this should give you a good starting point.
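For example, a minimal directory lookup might look like this sketch; the LDAP path and account name are placeholders for your own domain:

```csharp
using System;
using System.DirectoryServices;

class AdLookupSketch
{
    static void Main()
    {
        // Placeholder LDAP path and account name; adjust for your domain.
        using (var root = new DirectoryEntry("LDAP://DC=example,DC=local"))
        using (var searcher = new DirectorySearcher(root))
        {
            searcher.Filter = "(&(objectClass=user)(sAMAccountName=jdoe))";
            SearchResult result = searcher.FindOne();
            if (result != null)
                Console.WriteLine(result.Properties["displayName"][0]);
        }
    }
}
```

For the "currently logged in user" part of the request, ASP.NET's built-in Windows authentication (authentication mode="Windows" in web.config) can identify the caller without a separate login form; you can then look that account up in the directory as above.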
Enjoy!
