I want to figure out a way who have talked {view, like and comments) about a specific post on our Facebook's fan page. we want to examine the influence of the daily activities of fans on the timing and amount of a specific post. Influencers and the occurrence time of every "like" or "comment" should be collected. Is there any tool for modeling and clustering of post's data?
Thanks in advance.
This is definitely possible. I know this is not the forum to advertise a specific product, but I literally wrote this exact thing over the last couple months for our company. Check it out http://napkinlabs.com/apps/ it is called SuperFans.
Related
I have an RSS feed that outputs around 100 articles per day. I wish to filter it to include only the more popular links, perhaps filter it to 50 or less. Back in the day, I believe you could use "postrank" to do this, but is now defunct after Google acquisition.
Anyone know how I can filter a specific RSS feed to include only the more popular outputs?
Thank you!
Your question is quite open, but one way to measure "popularity" of a post would be to use social networks.
Each post is probably associated with a certain URL, which can referred to on social networks. On facebook, for instance, one can share or like it. You could now get the number of shares for each post, order your list based on them and pop the top 50 ones.
This SO question asks for such share stats and contains answers for several networks.
Caution: Be aware that you'd need to do at least one API call per article. On the one hand, this may take some time, on the other hand, you'd have be careful to not exceed quotas. You're probably best off by locally caching fetched posts and only refresh every few hours or once a day.
I am looking for algorithms that allow text extraction from websites. I do not mean "strip html", or any of the hundreds of libraries that allow this.
So for example for a news article I would like to identify the heading and all the text, but not the comments section and so on.
Are there any algorithms for that out there? Thank you!
In computer science literature this problem is usually referred to as the page segmentation or boiler plate detection problem. See the report Boilerplate Detection using Shallow Text Features and its related blog post. Also, I have a few reports and software sites bookmarked that address the problem. Also, see this stackoverflow question.
there are a few open source tools available that do similar article extraction tasks.
https://github.com/jiminoc/goose which was open source by Gravity.com
It has info on the wiki as well as the source you can view. There are dozens of unit tests that show the text extracted from various articles.
"Content extraction" is a very difficult topic. There are no common standards to identify the "main-article" content (there are several approaches to make HTML easier readably for crawlers, e.g. schema.org, but none of these is very popularly used).
So it turns out, that if you want good results, its probably best to define your own XPath selectors for each (news) website you want to scrape. Although there are some APIs for HTML content extraction, but as I said its very hard to develop an algorithm which works for every site.
Some APIs you could use:
alchemyapi.com
diffbot.com
boilerpipe-web.appspot.com
aylien.com
textracto.com
What you're trying to do is called "content extraction". It turns out to be a surprisingly hard problem to solve well, and many naive solutions do quite badly.
Instapaper and Readability both have to solve this, and you may learn something from looking at their solutions. They also both provide services that you may be able to take advantage of - perhaps you can outsource your problem to them and let their API take care of it. :)
Failing that, a search for "html content extraction" returns a great deal of useful results, including a number of papers on the subject.
I compared a few different libraries, and had really great luck with Mozilla's Readability library (Node), or its Python wrapper.
For example, take this CNN article: https://edition.cnn.com/2022/06/01/tech/elon-musk-tesla-ends-work-from-home/index.html
Readability successfully returns only the relevant data:
New York (CNN Business) Elon Musk is demanding that Tesla office workers return to in-person work or leave the company. The policy, disclosed in leaked emails Musk sent to Tesla's executive staff Tuesday, was first reported by electric vehicle news site Electrek. "Anyone who wishes to do remote work must be in the office for a minimum (and I mean *minimum*) of 40 hours per week or depart Tesla. This is less than we ask of factory workers," Musk wrote, adding that the office must be the employee's primary workplace where the other workers they regularly interact with are based — "not a remote branch office unrelated to the job duties." Musk said he would personally review any request for exemption from the policy, but that for the most part, "If you don't show up, we will assume you have resigned."
etc.
I think your best shoot is study what information can you get from the metadata and write a good html parser, oEmbed could be a good standard =)
https://oembed.com/#section7
I want to make a little free calendar program to help me and others calculate how much time we have got left in a project. I mean real working time, not just time. Time in a raw form is not saying much.
Typically when my boss tells me that I have time until 05-05-2011 it doesn't tell me really how much time I have to do my job.
You know...so many things stop me from work:
A) beeing at home, not at work (so called "free time" or "spare time"). That is in my case I work exactly 8 hours a day and then the cleaning ladies throw me out of the office with their incredible loud industrial vacuum cleaners every evening (my boss accepts that as an excuse to go home in time, regularly).
B) weekends, or more precisely saturdays and sundays
C) official holiday rescuing me from having to go to work.
what I want to do is make a little utility which tells me how many working hours I really have in a given time period.
The first two things A and B are pretty easy to implement. But the last thing C scares my pants off. Holidays. OOOHHH man. You know what that means. Chaos. Pure chaos.
The huge question is: HOW TO CALCULATE HOLIDAYS?!
Since I want my program to be useful for anyone anywhere in the world, I can't just hardcode all holidays for my little town.
So which options do I have?
I) I could hand-craft downloadable lists of holidays. Users search them within the application and download them from an webserver. Or I ship all of them in the package. But I would get very, very old if I tried that by myself for every country, state and town.
II) I make an initial data sheet with holidays for my town, and don't care about the rest. However, I make that sheet with an how-to public, so that everyone who feels like beeing very nice can provide holiday data for his country / region / whatever. Those are made public on a webserver and everyone can get the data packages he/she needs for the app.
III) ?
I care a lot about usability. I don't want to make an ugly linux hack style hard to use app that only computer freaks can use.
So you need to tell me more about holiday science. I was never really clever at this. I assume every single country in the world has it's own set of holidays. In every country there may be several states. For example the US has some, and Germany has also some states. Holidays vary from state to state. But I know from an good programmer he told me never assume anything. So the questions about holiday science are:
Which categories do I need to make holiday-data-packs searchable? A guy from India should find quickly his holiday data pack, and a guy from Sillicon Valley should find his pack as equally fast. It makes most sense to me to filter for COUNTRY > STATE > WHATEVER. Like a drill-down-search. Did I miss something?
What would be the best data format to hold holiday information? A holiday has a start and end date and a name. That should be enough. Would I put all this stuff in thousands of XML files?
How would you go about this? Any hint / help is highly welcome! Thanks to everyone!
We use a table.
It should not be that hard. If you look at your corporate holiday schedule you should be able to calculate the list of around 10 days. The only problem is that many of these are arbitrary. i.e Christmas falls on a Saturday so give the Friday before off.
Have you looked at this site to calculate the list of known Holidays ?
Many organizations post their holiday schedule on the web, it might be possible to read that and get the schedule ?
In this case, I would suggest that you are encountering less of an engineering problem and more of a data collection problem.
Rather than define a "definitive set of holidays" for each possible user, allow the user to easily setup his holidays. By offering a (usable, quick, easy) way for users to select holidays, you do not make any assumptions.
You could even make it "social" by allowing users to upload their selections - imagine your HR department takes 10 minutes to setup and upload a set of holidays for all your company employees. Now you just need to provide a way for everyone to find that set.
On another topic, I would suggest using a common format, like iCal to store your calendar data. Here's a page with some example iCal files.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I currently work in a small business (15-20 employees, 5 programmers) where most projects are custom built CMS and a few web applications products.
Since I started working there, I have worked on many projects, but specifications for each project vary a lot. Sometimes we get a little detail, a Word document telling what the client wants, and what we are suggesting (suggested form fields, a short description of display, etc.). Sometimes almost nothing except "do what you think is the best approach for this project/module/request".
My question to you guys, who might work in different kind of businesses, is: How (huge pile of paper? Word docs? Visios?) and what kind of information do you get from your superiors, managers, teamates when starting a project (plenty of analysis, drawings, etc.)? How much detail do you get on this?
Hope my question is clear enough, thank you.
Specs..that's kind of funny...how about never :(.
Seriously a lot of companies assume specs aren't needed, its absolutely unacceptable but this is how it is in a LOT of companies. They assume a one liner and the programmer knows what the program should do, the inputs / outputs and so on.
Unfortunately in my case I have to actually help write the specs..and Im the programmer :(.
I mostly get a lot of verbal direction and I use a voice recorder to record the conversation and transcribe it when I am done. I write my own specs from my customers' words.
Then, as a good consultant should, I take the writeup back to the customer and verify it, and get a signature and build it, and they live happily every after! (no they dont, they change their mind a 100 times)
It can vary depending on what group the work falls under:
Support request - If the change will take a short period of time and is fixing something broken, there is this group. This could be as simple as, "Add Bob to the list of authorized users for that ancient form" where the form is something written years ago and aside from adding and removing users, it isn't touched for fear of breaking things.
Service Advisory Committee request - Items that are up to a few days are in this group as these are kind of like mini-projects as the request may be to create a new form or portal for a group. This could be upgrading some 3rd party software where we have some customizations that make the upgrade not necessarily a simple thing for Operations to do.
Project - In this case there are usually a few Word documents and/or e-mail threads that help nail down requirements in terms of scope, budget, and time. These can take months though there is something to be said for having a prototype to change rather than creating the initial prototype to tell if requirements are really met or not. Course my current project is over a year old, still has a few more months to the timeline and already has a successor coming after it is done,i.e. there is a Phase II to go after Phase I.
Uber project - These merit their own group of documentation and are the million dollar, multiple company projects that usually try to document everything up front rarely works out well here. Thus, there is some adoptioon of agile for these but there are still some growing pains to go through as how we use agile matures. Think installing a dozen modules of some off-the-shelf software that requires both internal and external developers to customize the suite for our specific needs as the software is supposed to be very robust, flexible and help save lots of time and money on how people otherwise do their jobs generally. Think ERP or CRM for a couple of examples here.
We are a 16-person company that creates and supports customized software for small retail shop owners.
The projects we get fall into three general categories (as related to specs):
"Here, automate this form." A sales person explains that our customer only wants this form to appear where they can fill it out and print it to make it look professional to their customer. Our specs is a single piece of paper that looks something like an order form or report. This is always false; they want pop-up lookups, automatic updating from other sources, and "while you're at it" add-ons that more than double the time. These, we've learned to just live in the moment and let the project take its course. By the time we're done, the program doesn't look anything like their original form.
Small changes. Like a simple e-mail explaining that the background color is stale, or a request to sort a report by a different column. These, we just do as time allows.
Big company integrations, where we're tasked with making our software work with some big outfit like Intuit (QuickBook) or FedEx (shipping rates). These often have well thought out documentation and sample code. We get 100's of pages in word documents or pdfs. The problem with these is when their specs are wrong. We find out about inaccuracies when we try to test or certify our integration. In these instances, we usually take longer in certification than we did to originally develop the processes.
In all cases, the real trouble is when a sales person promises a solution to the customer before even asking a programmer what it would take. As recently as 2 weeks ago, a sales person got into real trouble and had to issue a refund (that person is no longer with the company).
None - at least not from management.
Instead, as a developer (and particularly one leading a software project right now), I'm expected to contact my users/customers/etc and work directly with them to come up with our specifications and requirements. The documentation I do request from my team is only what will be useful to the team. I am lucky in that management rarely requests a document that doesn't make sense or won't provide some use to our project.
I currently have a half-dozen or so specs each 60-80 pages. One of them is 80 pages with no table of contents. Good times.
Our Product Managers and senior engineers prepare three planning docs for our data management software projects.
High-level requirements: 1-to-3 sentence descriptions of hardware/software supported or specific feature for this project. (10-15 pages of Excel-like grids)
Technical details: Engineering implementation of each high-level requirements. Up to a page for each, depending on amount of detail. (30-40 pages of filled-in feature details)
Business agreement: Summary of 1 & 2 with engineering schedule and Product Mgmt's market analysis. Everyone signs off on this. (5 pages analysis, 20 technical)
I haven't seen work flows or other Visio-like details in our specs. The prioritized requirements and schedule prove critical, so we understand when to lop things off to save development and testing time.
I definitely appreciate a good interface and as a developer, I try to create them for my users. But appreciating a good interface and designing one are a different thing. I'm looking for good interfaces (such as IMHO StackOverflow, Gmail) as examples of good UI from which I can model my own UI's.
I personally think that Netflix has an excellent web UI. Responsive, easy to navigate. Not mutch CRUD going on, but I find it very comfortable.
Pretty much anything by google, really. They're all very simple and to the point, focusing on usability.
You should get yourself a copy of both Don't Make Me Think and The Non-Designer's Design Book for your base knowledge/insight.
From there, it's much easier for you to dissect and analyze the layouts you already know and like, and recreate them for your own amusement.
edit: To mitigate misunderstanding, the point I'm trying to make is that you probably don't need as many good examples of nice layouts, if you know what to look for. For example, I can be shown a thousand haute couture dresses, and I still couldn't make one myself, because I don't know what to look for.
My favorites
Stack Overflow: This is a WIKI so it's not a rep point grab. I just really love the interface on this site. Been to too many crappy Q/A sites
Google Reader
MSDN: It's gotten a ton better in recent years and is a great way to grab little esoteric details about various APIs
iStockPhoto.com it's simple, effective and handles a large amount of information and data without getting bogged down. It also doesn't get in the way of the info you are looking for.
A good user interface fulfills a specific need of its users effectively.
As an example, here is a site (translation) that I have created for finding out what food is available in the cafeterias of the University of Helsinki. The typical use case is that when a student is hungry, he needs to know what food is available in the neighborhood student cafeterias (which are cheap for students), so that he can choose where to eat and what. He knows where each of those cafeterias is, but does not know what food they have today.
That site shows all the needed information at once. Because the students typically have a couple of cafeterias where they go, they can either bookmark the page with those cafeterias selected, or save the selection as a cookie. After that they can reach their goal without any navigation on the web site.
I don't use it on a day-to-day basis, but I'm very impressed with the Perseus Project digital library.
Here's a link to a poem from Catullus' Carmina in Latin as an example of the interface. Some features that I really like:
Click on the bar near the top to jump to any poem in the work. Larger chunks of the bar represent larger sections of the work (poems, chapters, however that particular work is logically broken up by the author).
Click on a Latin word in the poem to bring up a window (be patient; it seems to take a while) with lexicon entries, user voting and statistics on the word form (i.e. what the inflection means in the context of the sentence; it can be ambiguous in Latin) and so forth.
There are a number of resources down the right column, including various English translations, notes, references, etc. Any of them can be either shown in the right column, or swapped out with whatever is in the main content area in the center.
One of my personal favs: newspond.com