Blog

Posts from the category "Technology"

Here at Scout Labs, we know there are some big corporations out there that are still standardized on IE6. We work with some of them, which is why to date we’ve supported IE6. We didn’t want to cut off IE6 users or subject them to a substandard application experience. We know most of them are stuck on IE6 because IT admins, who overinvested in proprietary apps that ONLY work on IE6, have NEVER been updated, and never will be, are purposefully holding them back.

But the time has come: Support for IE6 is officially over. Not only is IE6 subpar with respect to speed, stability, and security, it also limits our options for developing new functionality that relies on more modern, standards-based browsers, specifically JavaScript-dependent interactions. As of our Feb 2010 release, we will finally have hit the wall with IE6: it just doesn’t support the JavaScript-dependent interactions that our new Assignments functionality, and to a lesser extent upgrades to our graphing and collaboration features, require.

This is a decision point that old skool internet companies like Yahoo and Web 2.0 companies like Facebook and bellwethers in the SAAS space like Salesforce have already gone past. Hell, 37signals phased out IE6 support in October of 2008, which is the Internet equivalent of the Nixon era. Even Europe is following suit. But for those of you still using IE6, here are some options:


  • If you have the necessary permissions on your computer, install and use any browser more modern than Internet Explorer 6. You can download Firefox or Chrome for free. As of Feb 2010 Scout Labs officially supports IE7, IE8, Firefox, Safari and Chrome.

  • Upgrade to Internet Explorer 7 or 8. Even IE 7 is faster, more reliable, and better supported by Microsoft than IE 6. Though we’d pick IE8 over IE7 any day.

  • If you don’t have the necessary permissions on your computer, find the person who does. If they won’t help you, send them this link: http://www.ie6nomore.com/ Or this one: http://www.stoplivinginthepast.com/ Or this one, from a Microsoft employee: http://www.hanselman.com/blog/IE6WarningStopLivingInThePastGetOffOfIE6.aspx Or… you get the picture.

  • If you are denied permission to upgrade past IE6, go find a company executive who believes in the future. This is a great way for some up-and-comer to make everyone in the company more productive via upgraded internet tools and experiences, and themselves wildly popular (with all non-IT personnel) in the process.

The cool part is, now we get to support Chrome, which is a fun browser, and great news for users of Microsoft OS products of a more recent vintage. And for all you network admins who just can’t seem to get everyone off IE6 and Win2000? Better hurry up, before every SAAS app your workforce relies on becomes standard equipment on the corporate smartphone, and no one gives a hoot about that big old box with a ten-year-old browser on it anyway.

Here at Scout Labs we figured the best holiday gift we could give our users would be some new features. So here’s a quick recap of recently released features, including some we deployed just last night:


  • New search OVERVIEW page. Everyone wanted a single-screen dashboard that would aggregate the most telling graphs, the leading indicators, and the most important social media content. Welcome to the new OVERVIEW page. Instead of clicking from tab to tab within your search, you can now get a snapshot of buzz volume, sentiment trend, and top stories from Twitter, blogs, and everywhere else on a single page.

Search Overview with Border 12 09.png
  • Interactive graphs. We were as disappointed as all of you when we had to pull back from our earlier interactive graphs implementation, which used Flash technologies not universally supported by corporate-sanctioned browsers, and rely on an image-based solution that was not clickable. But now interactive graphs are back, and they’re bigger and better than before. You can hover over a particular day to see the counts; click into spikes to read what happened; and of course still customize your date range within the last 6 months or export the data as a .csv file.
    Interactive Graph 12 09.png
    One thing we did lose in the transition was the ability to export graphs as a .png. We’ll eventually bring it back for you, Steve Majewski, but in the meantime, take a screenshot: the new graphs are much better looking than their PNG predecessors!

    CC Alert 12 09.png
  • Ability to sign your colleagues up to receive email alerts. Many of you asked for this feature because you wanted us to send your favorite email alerts directly to other team members, instead of having to forward them yourselves. Now, instead of forwarding Scout Labs alerts, you can simply CC other users on your alerts. And opting out is as simple as clicking on a link within the email. So now you can sign your teammates up for alerts for your brand, a competitor’s campaign, or whatever else you might be tracking.

  • Links to source included in exports. Now the number of links to each source is included in the export files. Mike Arauz and Spencer Waldron, that one was for you guys.

There will be even more great new features coming out in the New Year. Bring on 2010!

The internet is a messy place, and that is especially true of the blogosphere, with its plentiful spam blogs, link farms, comment spam, and other data that our users don’t want. At Scout Labs we do quite a bit of analysis on the data we index in our system and use a number of tools to keep the quality of text content high. Our main tool in fighting spam is machine learning, which we use to identify “spammy” documents and suppress them in our results.

Our first goal has been to identify and suppress keyword spam. This narrow focus has allowed us to make rapid progress, and in initial tests our spam model catches over 70% of keyword spam (recall) while misclassifying less than 5% of non-spam as spam (that is, a false positive rate below 5%). We now suppress results that appear as real blog posts on real blogs but look like this:

KeywordSpam.png
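
For readers who want the two figures above pinned down, here is a minimal sketch of how recall and false positive rate fall out of a labeled test set. The counts are invented for illustration and are not the numbers from our tests; the function names are hypothetical too.

```python
def recall(true_positives, false_negatives):
    """Share of actual spam that the model caught."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Share of legitimate (non-spam) documents wrongly flagged as spam."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical counts from a labeled test set (illustrative only).
spam_caught, spam_missed = 72, 28   # 100 spam documents
ham_flagged, ham_passed = 4, 96     # 100 non-spam documents

spam_recall = recall(spam_caught, spam_missed)
fpr = false_positive_rate(ham_flagged, ham_passed)
print(f"recall = {spam_recall:.0%}, false positive rate = {fpr:.0%}")
```

Note that the two rates use different denominators: recall is measured against all real spam, while the false positive rate is measured against all legitimate documents.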

How do we do it? We use a set of processes and algorithms called machine learning to build a spam predictor. We start out with a large set of documents that people have identified as spam or non-spam. We created this initial set either by judging documents ourselves or by using Amazon’s Mechanical Turk. We then use machine learning to create a program (or model) that predicts whether a person would identify a document as spam or non-spam.
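
To make the idea of learning from labeled documents concrete, here is a toy sketch. It uses a naive Bayes classifier purely as a stand-in; this post does not say which algorithm our production model uses, and the training texts, labels, and function names below are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Build per-label word counts from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ok": Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ok"])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the higher log-probability under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny hand-labeled training set (invented for illustration).
training = [
    ("cheap pills cheap pills buy cheap pills online", "spam"),
    ("buy cheap watches online best price buy now", "spam"),
    ("we tried the new phone and the camera is great", "ok"),
    ("my review of the restaurant we visited last night", "ok"),
]
model = train(training)
print(predict(model, "buy cheap pills now"))
print(predict(model, "the camera on my phone is great"))
```

A real training set needs thousands of human judgments rather than four, which is where sources like Mechanical Turk come in.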

When we import documents, we save the judgment of the machine together with the document in our system, which allows us to do things like filter them from our search results or simply rank them lower than non-spam results, graph the number of spam vs. non-spam results, and do any number of other interesting things. For instance, we might discover new spam blogs by looking at what sources spammy documents come from.
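
As a rough sketch of what saving the judgment with the document makes possible, the snippet below filters, re-ranks, and counts spam by source. The `Document` class, its fields, and the sample sources are hypothetical and do not reflect our actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    source: str      # blog or site the document came from
    text: str
    is_spam: bool    # the classifier's judgment, saved at import time


# Invented sample documents; the sources are placeholders.
docs = [
    Document("blog-a.example", "honest product review", False),
    Document("blog-b.example", "cheap pills cheap pills", True),
    Document("blog-b.example", "buy watches buy watches", True),
    Document("blog-c.example", "launch day impressions", False),
]

# Option 1: filter spam out of the results entirely.
filtered = [d for d in docs if not d.is_spam]

# Option 2: keep everything, but rank spam below non-spam.
# sorted() is stable, so the original order is kept within each group.
ranked = sorted(docs, key=lambda d: d.is_spam)

# Spot likely spam blogs by counting spammy documents per source.
spam_by_source = Counter(d.source for d in docs if d.is_spam)
print(spam_by_source.most_common(1))
```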

At Scout Labs we are working hard to make our algorithms better and get closer to human quality, but we understand that machines get it wrong at times. With the release of our new “show spam” feature, users can see what text results we classified as spam and re-classify them as legitimate for their account, if they so choose. With the concurrent release of the “mark as spam” feature, we likewise enable users to make spam disappear from results, so users can improve the data everyone on their team is seeing.

Best of all, these features help us to create a focused set of training data that improves our spam classifier in our next round of training. What we especially like about this approach is that, over time, our system will be increasingly trained to identify as spam exactly what Scout Labs users think is spam because we are learning directly from them. So if you’re debating whether or not it’s worth it to mark a result as spam, just think of the benefits you and every other user will reap when you do, and click that button!

Spammers are always devising new tricks, so the task of suppressing spam is never finished, but we are getting some really promising feedback on how the quality of our data stacks up against the competition (“I’ve been trying to get good clean results out of ‘competitive product’ for months now, and with Scout Labs the right stuff just pops out.”). However, we really want to hear how you think we are doing, so please drop us a line, and be sure to hit that “mark as spam” button liberally!

We’ve fielded a lot of great user questions since launch, and the number one area we’ve fielded them in is sentiment. This may be too much information for some of you, but if you really want the details, read on!

The sentiment feature in the Scout Labs application is the machine’s ability to judge whether the author of a story is expressing a positive or negative attitude towards a specific word or phrase. For those companies with only a few posts per day that they can judge for themselves, this feature is a nice-to-have. But for brand and product marketers looking at a significant volume of posts, this feature is essential to understanding changes in consumer opinion.

So how do we do it? How accurate is it? And how should you use it?

How we do it

Scout Labs’ sentiment is “entity specific”. Some products that produce “machine generated sentiment” simply count happy words vs. sad words in a news article, and the “tone” of the article is given by the happy word count. Consider “I love baseball. My happiest memories in life are from sitting in the bleachers at Fenway. It’s the greatest game on earth. But guys like Bonds and A-Rod are bringing it down.” Despite the high “happy” word count, this does not express a positive opinion about Barry Bonds or Alex Rodriguez.

In the Scout Labs application, we don’t count happy words. We evaluate sentiment for each particular word or phrase you search for. We can tell that the sentiment is positive for baseball but negative for Bonds and Rodriguez. This is done via part-of-speech tagging: parsing the underlying grammatical structure of a sentence and determining which emotion words apply to the key word. Emotion words come from dictionaries of standard English words and have been augmented with phrases and slang to better map to the world of social media. So Scout Labs’ sentiment is entity-specific, which is very important.
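
The difference between counting happy words and scoring per entity can be seen in a small sketch. To be clear about the assumptions: the tiny lexicon is invented, and matching emotion words to an entity by shared sentence is a crude stand-in for the part-of-speech tagging described above, not how the application works. All names here are hypothetical.

```python
import re

# Tiny hand-made lexicon (illustrative; real dictionaries are far larger).
POSITIVE = ["love", "happiest", "greatest"]
NEGATIVE = ["hate", "worst", "bringing it down"]


def count_terms(text, terms):
    """Count whole-word occurrences of each term or phrase in the text."""
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", text)) for term in terms
    )


def document_tone(text):
    """The word-counting approach: positive hits minus negative hits."""
    text = text.lower()
    return count_terms(text, POSITIVE) - count_terms(text, NEGATIVE)


def entity_sentiment(text, entity):
    """Score only the sentences that mention the entity."""
    score = 0
    for sentence in re.split(r"[.!?]+", text.lower()):
        if re.search(r"\b" + re.escape(entity.lower()) + r"\b", sentence):
            score += count_terms(sentence, POSITIVE)
            score -= count_terms(sentence, NEGATIVE)
    return score


post = (
    "I love baseball. My happiest memories in life are from sitting in the "
    "bleachers at Fenway. It's the greatest game on earth. But guys like "
    "Bonds and A-Rod are bringing it down."
)
print(document_tone(post))                 # whole-document word count
print(entity_sentiment(post, "baseball"))  # score for one entity
print(entity_sentiment(post, "Bonds"))
```

With this lexicon the whole post scores above zero, baseball scores above zero, and Bonds scores below zero, which is the distinction a document-level word count cannot make.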

Scout Labs’ sentiment can be changed by users. We use confidence intervals to decide whether something is positive or negative, but if we get it wrong (more on that below), you can change the score, immediately updating that item for yourself and the rest of your team. Charts and graphs update immediately as well. And the really cool part is that every time a user changes a sentiment value, that item becomes a labeled piece of data that we can use to abstract out additional rules and add words and phrases to our dictionary. So our ability to detect sentiment just gets better over time.
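
One simple way to picture a confidence-gated machine label combined with a user override is sketched below. The score and confidence inputs, the 0.6 threshold, and the function names are all assumptions made for illustration; the actual decision rule is not spelled out in this post.

```python
def machine_label(score, confidence, min_confidence=0.6):
    """Return 'positive' or 'negative' only when confident; else 'neutral'."""
    if confidence < min_confidence or score == 0:
        return "neutral"
    return "positive" if score > 0 else "negative"


def final_label(post_id, score, confidence, overrides):
    """A user's override always wins over the machine's guess."""
    return overrides.get(post_id, machine_label(score, confidence))


overrides = {}                                 # post_id -> label set by a user
print(final_label(17, -0.8, 0.9, overrides))   # machine is confident
print(final_label(18, 0.4, 0.3, overrides))    # borderline, so neutral
overrides[18] = "positive"                     # a teammate corrects post 18
print(final_label(18, 0.4, 0.3, overrides))
```

Each entry in `overrides` is also a human-labeled example, which is what makes user corrections useful as training data.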

Scout Labs can “backfill” sentiment data for the previous 3 months in less than a day. We have 3 months of live data in our app for our users right now (6 months soon). We can go backward and score all the posts from the last 3 months in less than 24 hours. So you will have a complete sentiment trend, both going forward and going backward, within a day of creating a search. (All other graphs, such as buzz and share of voice, are real time and have no lag at all.)

Does it work?

Yes. We have done extensive human vs. machine testing and it’s accurate in the 70-80% range, meaning our algorithm agrees with humans’ scores 70-80% of the time. That is only slightly lower than the rate at which humans agree with each other. Some other insights and findings from our testing:

  • College-educated people with business experience agree on the sentiment ratings for a blog post about 85% of the time. We were surprised that we couldn’t get that rate higher. Using less qualified people, such as you might find in a random Mechanical Turk experiment, produces lower rates of agreement. Some of the discrepancy stems from the human tendency to equate negative opinions and negative information: “I hate Coke” is a negative opinion; “Merrill Lynch just downgraded Coca-Cola” is negative information.
  • The Scout Labs sentiment feature agrees with college-educated people about 75% of the time. We try to pad that a little by being conservative about what we call positive or negative: we call things neutral if they’re borderline.
  • The Scout Labs sentiment feature sucks at detecting irony and sarcasm. Posts that are heavy on the irony often end up classed as “neutral” because the machine can’t even guess. Consider “Another winner from the almighty Microsoft.” That’s a tough one.
  • Machines don’t understand business context. Perhaps you work for Apple and every mention of an unlocked iPhone is negative because people shouldn’t unlock their iPhones. An algorithm that uses grammar- and vocabulary-based rules cannot classify this post as negative about iPhone: “I love my iphone. My boyfriend unlocked it for me last night.”
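
The agreement percentages in the list above are plain percent agreement: the share of posts on which two raters chose the same label. A minimal sketch, using invented ratings and a hypothetical function name:

```python
def agreement_rate(ratings_a, ratings_b):
    """Fraction of posts on which two raters gave the same label."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must score the same posts")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Invented ratings for eight posts (illustrative only).
human = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
machine = ["pos", "neg", "neu", "neu", "neg", "neu", "pos", "pos"]
print(agreement_rate(human, machine))
```

The same function works for human vs. human comparisons, which is how a ceiling like the 85% figure can be measured.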

So the Sentiment feature produces a pretty good guess, about what you’d get if you collected a half dozen ratings from Mechanical Turk and chose the rating most humans agreed on. (See this useful paper from the Dolores Labs blog about how to use Mechanical Turk to get reliable human judgments.) And our best guess plus your team’s efforts to quickly change the things we miss or get wrong means really high accuracy levels for you and your team with a minimum amount of work and expense.
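
Choosing the rating most humans agreed on amounts to a majority vote. Here is a small sketch with invented ratings; returning `None` on a tie is our own assumption about how to handle a split panel.

```python
from collections import Counter


def majority_rating(ratings):
    """Return the most common rating, or None if empty or tied for first."""
    ranked = Counter(ratings).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Six invented ratings for a single post.
six_ratings = ["negative", "negative", "neutral", "negative", "positive", "negative"]
print(majority_rating(six_ratings))
```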

How you should use the sentiment feature

  • To find the top positive and negative posts. Click on “Sentiment” and filter for positive or negative posts. You’ll get immediate insight into some forceful opinions about what is wrong — or what is right — about the product or brand you are searching for.
  • As a starting point for your own sentiment analysis. Any user can change the sentiment rating for any post. If you work for Apple and you want all those unlocked iPhone posts marked “negative,” you can do that. Just click on the sentiment icon and make the change. These changes will carry through to all graph data, so you can create accurate data sets to view in the application or to export. We use your rating changes as machine learning inputs, but your specific ratings are proprietary and confidential to your workspace.
  • To get insight into consumer opinion via alerts. When you set up a daily, weekly or monthly alert for a search, you’ll get buzz, top news, new words, recent tweets, and the top positive and negative posts pushed to your inbox via a text email. It’s a great way to stay informed and know when to invest more attention.
  • To compare sentiment between brands or products. Do consumers like Symantec or Norton? The LeBron 6 or the KD1? Embarq or Comcast? Sentiment Trend graphs can help you see trends, spot spikes, and make comparisons.

We have heard over and over again from our users that an affordable, reliable way to assess sentiment, with user override built in, is critical to getting insight into social media, so we continue to work on this feature. We hope you’ll let us know how you want it to evolve in the future. We’ve already got a slew of new feature requests to work on, including more metrics, visualizations, and customizations. Get your ideas into the mix at support <at> scoutlabs <dot> com.