July 17th, 2012 was the last time we were able to successfully update our browser extension on AMO, Mozilla’s extension directory. That was version 3.4.3. Many things have obviously happened since then, including new features, bug fixes and optimizations. We’re therefore now happy to announce that Surf Canyon version 5.4.0, fully compatible with Firefox 35.0 and capable of being installed without having to restart the browser, was approved on January 14th.
What took so long?
Firefox has recently been undergoing a massive architectural change intended to improve its security and performance. “Electrolysis” (e10s) is a new, multi-process architecture, coming in a future release of Firefox, that separates the browser’s core from open websites. An unfortunate trade-off is that older add-ons aren’t guaranteed to be supported. Surf Canyon would no longer run on the newly envisioned Firefox; nor would some of our favorite add-ons, such as “Greasemonkey.”
While older add-ons weren’t supported, anything written using the Firefox AddOn SDK (formerly known as “Jetpack”) was to remain unaffected. In order to stay on Firefox, we rewrote the add-on using the SDK. Moving to a new add-on framework, however, meant learning new tricks. For example, the SDK makes Ajax calls somewhat tricky, since only restricted parts of the add-on are able to access the Ajax APIs. We implemented a “callback registry” to solve this problem. If you would like to see the code, we’ve posted it on Pastebin.
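The Pastebin code is not reproduced here. Purely as an illustration of the general idea, a callback registry can be sketched as follows: the part of the add-on that cannot make Ajax calls registers a callback and receives an ID, sends that ID along with its request to the part that can, and runs the callback when a response tagged with the same ID comes back. The add-on itself is JavaScript; this sketch is in Java, and the names `CallbackRegistry`, `register` and `resolve` are hypothetical rather than taken from the posted code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: maps request IDs to callbacks that await a response. */
final class CallbackRegistry {
    private final Map<Integer, Consumer<String>> pending = new HashMap<>();
    private int nextId = 1;

    /** Stores the callback and returns the ID to send along with the request. */
    int register(Consumer<String> callback) {
        int id = nextId++;
        pending.put(id, callback);
        return id;
    }

    /**
     * Called when a response message arrives. Runs the matching callback once
     * and forgets it. Returns false if the ID is unknown or already resolved.
     */
    boolean resolve(int id, String responseBody) {
        Consumer<String> callback = pending.remove(id);
        if (callback == null) {
            return false;
        }
        callback.accept(responseBody);
        return true;
    }

    /** Number of requests still waiting for a response. */
    int pendingCount() {
        return pending.size();
    }
}
```

Removing the entry before running the callback means a duplicate response for the same ID is ignored rather than handled twice.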
Tags: Announcements · Code
From the very beginning, Surf Canyon has been using the query “dolphins” as an illustration of how real-time contextualization can significantly enhance the user’s search experience. In the field of information retrieval this query is a classic example of ambiguity: is the user looking for the football team or the animal? Queries like “bears” (again, football team or animal), “SVM” (support vector machine or Silvercorp Metals) and “java” (programming language, coffee or island) are a few other classic examples.
What’s nice about these classic examples, and why they are often used, is that it is easy to categorize results into one intent or another. While Surf Canyon’s real-time contextualization generally delivers the most user value with relatively homogeneous result sets, during demonstrations it is helpful to be able to easily “see” the effects, or lack thereof, of a particular ranking function on disambiguation.
Now, however, thanks to the Onion, we have a result for the query “dolphins” that could potentially span the two otherwise distinct and separate possible user intents: Florida Resort Allows Guests to Swim with the Miami Dolphins. Are users who select this result interested in the football team or the animal? Hard to say, although they are most certainly looking for a laugh.
Picture from the Onion: Florida Resort Allows Guests To Swim With Miami Dolphins
Tags: Contextualization · Discovery · Fun
Surf Canyon CEO Mark Cramer was honored to be interviewed by Robert Scoble on May 27th. While we always enjoy talking about how our technology dramatically improves the search experience, it is especially exciting to do so with such a renowned blogger and evangelist, who has a studio with three cameras. Here is a table of contents:
0m40s – Introduction with Apple ][ nostalgia.
1m40s – Elevator pitch.
5m50s – Live demonstration.
8m10s – Discussion of real-time contextualization for search.
9m20s – Why doesn’t Google do this? Hard to say given that it works so well.
10m55s – Speculation regarding impact on advertisements.
12m35s – Review of business.
13m23s – Mobile discussion.
15m50s – Review of funding.
16m10s – What’s next?
Thank you, Robert!
Tags: Demonstration · Media · Presentations
When doing something that has never been done before (as we do), it can be challenging to describe it using familiar terminology in a way that doesn’t create confusion while still conveying the newness of the idea. Large companies are sometimes capable of creating new terminology that is then adopted by others, but this can be particularly difficult for small entities. As such, for the sake of clarity, we describe how we have referred to our technology over the years and how we have now settled on “real-time contextualized search.”
When Surf Canyon launched its ground-breaking technology for dynamically re-ordering search results in response to real-time behavioral signals, we called our product a “Discovery Engine for Search.” The technology was referred to as “real-time implicit personalization” or simply “real-time personalization.” Unfortunately, this created a bit of confusion with some people in the search community:
- The term “real-time” is often abused and misunderstood. Technically, “real-time computing” refers to guaranteeing a response “within strict time constraints.” More generally it is used to refer to a system that responds very quickly. “Quickly,” however, is subject to interpretation, which is why we then sometimes referred to our technology as “instant” or “immediate personalization.”
- Additionally, the term “personalization” may also lead to misunderstanding. In information retrieval, “personalization” generally refers to collecting data about an individual over an extended period of time, building models of that person’s long-term preferences, and then using those models to modestly adjust relevance scores for future queries. Despite efforts to clarify the distinction with “real-time personalization,” the term “personalization” can lead to premature interpretation.
In 2009 the team at Surf Canyon authored a paper entitled “Demonstration of Improved Search Result Relevancy Using Real-Time Implicit Relevance Feedback.” The paper was subsequently published by SIGIR after Professor Thorsten Joachims at Cornell University offered a glowing review. “Real-time implicit relevance feedback” is a mouthful but seems to alleviate misconceptions caused by the term “personalization.”
The next year, a team of researchers at Cornell, led by Professor Joachims, published a paper called “Dynamic Ranked Retrieval” which built upon our SIGIR research by running tests using labeled results to compute relevance metrics. The results were not real-world, but they were very impressive and their paper was selected as one of the six best at WSDM 2011. We found “dynamic ranked retrieval” to be punchier than “real-time implicit relevance feedback.”
Nevertheless, while “dynamic ranked retrieval” has its appeal, a recent post in Search Engine Watch regarding Yahoo!’s interests in search offered this:
Contextual search works by algorithmically trying to determine what you really mean to search for, such as picking up cues from the immediate preceding searches, and presenting results based on that. [Emphasis added]
Algorithmically determining user intent by observing user interactions (“cues”) is what Surf Canyon has been doing since the very beginning. Our contextual search, however, is taken one large and very important step further – rather than waiting for subsequent searches in order to exploit user behavior signals, our technology immediately re-orders the result set in response to every user action that imparts additional understanding of the at-the-moment intent. As such, we henceforth declare that we develop Real-time Contextual Search.
Tags: - Top Posts - · Contextualization · Discovery · Personalization · Recommendations
Beware the malware that disguises itself as anti-malware.
tl;dr – What You Need to Know
If you are one of our users, here is what you need to know: we have never developed and will never develop malware (you can see our latest results from virustotal – image below), and you should never, ever, ever install another piece of software in order to remove one of our applications. We provide very easy to follow removal instructions for Surf Canyon and are always happy to be of assistance should anyone wish to contact us.
Avoiding potentially malicious programs while navigating the sea of computer software has never been easy, especially in the age of internet-fueled applications that hijack browsers, generate pop-ups, insert advertisements, track behavior and steal personal information in addition to many other unsavory things. Frustrated users seeking relief will inevitably search for anti-malware applications to rid their computers of such afflictions, which is a perfect opportunity for malware claiming to be anti-malware to attack an unsuspecting user.
Often called “scareware” or “ransomware”, these programs will purportedly scan a user’s computer for malware, report hundreds if not thousands of “infections” (which we have seen on completely clean machines) and then offer to “remove” them for a fee. Here is a brand new machine:
Programs & Features Window from the Control Panel of a completely clean machine
When we scanned it with a popular anti-malware application, the scan reported 199 “potential threats!”:
199 “Potential Threats Detected!” on a completely clean machine
The download and scan were free but their goal is clearly to get users to “Buy Premium.”
Distribution is achieved through aggressive SEO: the distributors go through directories of software applications and build a webpage for each one with “removal instructions,” which inevitably involve installing their anti-malware malware. To optimize the pages for search engines they label every software application as a “virus” or “malware”, even when it is not, and then insert “while technically not a virus…” language to avoid libel claims. Even if the anti-malware software is legitimate, this is still a winning strategy for distribution. Surf Canyon has naturally been a victim of this, but so have Yahoo’s toolbar and Bing’s toolbar, both of which are obviously neither viruses nor malware. (We’re not going to link to any anti-malware malware sites for fear of further increasing their popularity.)
Certainly there exists real malware in the world, and unfortunately quite a bit of it. Furthermore, it’s often difficult to remove these programs and so having anti-virus or malware protection offers real value. Like many things on the internet, users need to be wary; fear of malware can be exploited as easily as ignorance of it.
Surf Canyon is Perfectly Clean
February 14th, 2014 · 1 Comment
Anyone familiar with Etsy knows that it is a fantastic website for finding handmade and vintage items, and a wonderful resource for gift-giving. Now, thanks to their search API and Surf Canyon’s dynamic ranking technology, there is an even better way to search through the millions of items for sale on Etsy. Click over to the Surf Canyon – Etsy Demo to try for yourself.
If you run a search for “gloves” on Etsy you’ll be presented with almost 70,000 results. There are facets on the left for drilling down, and of course users have the option of reformulating their queries to something more precise, but Surf Canyon has always been about alleviating the cognitive load by automatically and immediately assisting users with finding what they need.
As such, a query for “gloves” on the Surf Canyon – Etsy Demo will produce a page that looks like this:
As it so happens, tomorrow is Valentine’s Day, so perhaps the very first result strikes your fancy, and so you select it. If your shopping is done, congratulations. It’s very rare indeed to find something to purchase with only a single click! More likely, however, you’ll return to the search page to check out more, which is when you’ll helpfully be provided with real-time recommendations from Surf Canyon’s dynamic ranking engine:
As you can see, these Valentine’s Day glove results are coming from pages 35, 37 and 41. It is hard to believe that anyone would ever dig that deep for a search result, but here they are, helpfully and automatically brought to page 1.
Every selection you make and every result you skip gives Surf Canyon’s dynamic search engine more information about your information need, enabling a superior ranking every step of the way. Clicking “More Results” on page 1 will produce more re-ranked results on page 1. Moving over to page 2 will then produce a second page of results tailored in real time to your needs.
In this particular example, you get more of what you want – gloves for Valentine’s Day:
The results are coming from pages 29, 23 and, somewhat amazingly, 88. Clicking a result on page 2 produces yet more re-ranked results, presented as “recommendations,” and the process continues.
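To make the idea of re-ranking on clicks and skips concrete, here is a deliberately simple sketch. It is not Surf Canyon’s ranking function, which is not described in this post; it swaps in a toy stand-in that scores each unseen result by the word overlap (Jaccard similarity) of its title with clicked titles, minus its overlap with skipped titles. The class `ClickReRanker` and all of its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration only: re-ranks unseen results by title-word overlap. */
final class ClickReRanker {

    /** Lower-cases a title and splits it into a set of words. */
    static Set<String> words(String title) {
        Set<String> set = new HashSet<>(Arrays.asList(title.toLowerCase().split("\\W+")));
        set.remove(""); // split() can yield an empty token for leading punctuation
        return set;
    }

    /** Jaccard similarity between the word sets of two titles (0.0 to 1.0). */
    static double similarity(String a, String b) {
        Set<String> wordsA = words(a);
        Set<String> wordsB = words(b);
        Set<String> union = new HashSet<>(wordsA);
        union.addAll(wordsB);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(wordsA);
        common.retainAll(wordsB);
        return (double) common.size() / union.size();
    }

    /** Score = similarity to clicked titles minus similarity to skipped titles. */
    static double score(String candidate, List<String> clicked, List<String> skipped) {
        double total = 0.0;
        for (String title : clicked) {
            total += similarity(candidate, title);
        }
        for (String title : skipped) {
            total -= similarity(candidate, title);
        }
        return total;
    }

    /** Returns the unseen results, highest score first; ties keep their original order. */
    static List<String> reRank(List<String> unseen, List<String> clicked, List<String> skipped) {
        List<String> ranked = new ArrayList<>(unseen);
        ranked.sort(Comparator.comparingDouble(
                (String title) -> score(title, clicked, skipped)).reversed());
        return ranked;
    }
}
```

Calling `reRank` again after each new click or skip mirrors the loop described above: every interaction changes the scores, so results that started deep in the list can rise to the top.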
Surf Canyon’s technology has been proven to deliver dramatic improvement in relevance, so give it a try!
Tags: Demonstration · Personalization · Recommendations
The College Humor video below is NSFW (it contains some profanity and mild sexual situations), but we’re posting it here because it is not only amusing but also an interesting insight into the relationship between users and search engines.
People struggle to find the “magic” set of keywords to describe their information need. As we first mentioned back in 2007:
In their paper entitled “Beyond the Commons: Investigating the Value of Personalizing Web Search,” Teevan et al. state that, “Web queries are very short, and it is unlikely that a two- or three-word query can unambiguously describe a user’s informational goal.”
Simply put, it can be difficult for the user to accurately express what he or she is looking for with just a few words. Even with many words, depending on the type of query, it can be difficult to express what is ultimately desired from the results, regardless of whether the user is an expert in the domain or not. How many times in real life, speaking with real people, do people need to employ many words, over many sentences, to express a thought or need?
As a result, people will often ultimately end up fumbling and guessing with their search engine, which is humorously displayed in the video below.
Tags: Fun · Reformulation
To put together a demonstration of Surf Canyon’s search technology with Etsy’s products, we needed some Java code that would search Etsy and return a list of matching products. We assumed that this could probably be done in about 100 lines of Java code.
We also figured someone must have done this before, but the code we found online was much longer and more complicated than we had expected, mainly because it accessed parts of the Etsy API that require OAuth authentication. A simple product search doesn’t require any authentication, however, so we went ahead and wrote some much simpler code that did only what we needed.
The Java code, written by Mike Wertheim, is about 100 lines long and can be found at http://www.surfcanyon.com/EtsyListingFetcher.jsp.
The code makes use of Jackson (an open source Java library that handles JSON). To compile and run, download the two Jackson jar files (http://repo1.maven.org/maven2/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar and http://repo1.maven.org/maven2/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar) and put them in your Java CLASSPATH. Then copy the code from the web page (http://www.surfcanyon.com/EtsyListingFetcher.jsp) to the clipboard and paste it into a file called EtsyListingFetcher.java. (If you download the code directly, instead of doing copy and paste, you will probably run into problems with HTML entities.)
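For readers who only want a feel for the request, the sketch below builds a keyword-search URL using nothing but the Java standard library. It is not an excerpt from EtsyListingFetcher: the endpoint `https://openapi.etsy.com/v2/listings/active` and the parameter names `api_key`, `keywords`, `limit` and `offset` are our assumptions about Etsy’s public API and should be checked against Etsy’s documentation, and the class `EtsySearchUrl` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a keyword-search URL for active Etsy listings. */
final class EtsySearchUrl {
    // Assumed endpoint for unauthenticated listing search; verify before use.
    static final String BASE = "https://openapi.etsy.com/v2/listings/active";

    /** Returns the request URL with the key and keywords URL-encoded. */
    static String build(String apiKey, String keywords, int limit, int offset) {
        return BASE
                + "?api_key=" + encode(apiKey)
                + "&keywords=" + encode(keywords)
                + "&limit=" + limit
                + "&offset=" + offset;
    }

    /** URL-encodes a query parameter value as UTF-8 (spaces become '+'). */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The response to such a request is JSON, which is where a library like Jackson comes in.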
If you have any questions or feedback, please post them in the comments below or contact us.
Henry Feild in the Computer Science Department at Endicott College and James Allan in the Center for Intelligent Information Retrieval at the University of Massachusetts, Amherst, gave a quick mention to Surf Canyon in their paper entitled “Using CrowdLogger for In Situ Information Retrieval System Evaluation”:
… a popular tool called Surf Canyon modifies SERPs for major search engines by surfacing as-yet unseen search results from deeper in the rankings every time the user clicks on a result link (the goal is for the surfaced results to be similar to the clicked result).
Their paper discusses how CrowdLogger, an open-source browser extension for Firefox and Google Chrome, can be used as an in situ evaluation platform for “evaluating retrieval systems in the wild.” This is something Surf Canyon has been doing with its own retrieval system for many years.
In a previous post we mentioned the research on Dynamic Ranked Retrieval conducted by Professor Thorsten Joachims, Christina Brandt, Yisong Yue and Jacob Bank at Cornell University and that their paper was accepted for publication and then selected as one of the six Best Paper Candidates by WSDM 2011. While this is a bit belated, we are happy to have discovered the video of Professor Joachims presenting that paper.
From the 1m40s mark to the 3m20s mark a screen shot of Surf Canyon is used to demonstrate dynamic ranking in action. Professor Joachims then offers DCG analysis of a sample search scenario and concludes, at the 5m50s mark, that, “by being dynamic, and adaptive, you can gain a lot of retrieval performance.” Surf Canyon is then mentioned at the 12m00s mark as an Interactive Information Retrieval Model. The Adaptivity Gain, defined previously as the increase in retrieval performance offered by dynamic ranking over traditional static ranking, was calculated from empirical studies done on two collections of TREC queries labeled for multiple intents. It is presented at the 14m10s mark and described as “quite substantial,” with NDCG going from 55% to 70%.
“When you think about how much effort search engines are spending to get a 1% improvement in NDCG, this is a lot and could potentially change the upper bound of how good you can get with a single ranking.” – Professor Thorsten Joachims
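For readers unfamiliar with the metrics mentioned above, here is a minimal sketch of DCG and NDCG. It assumes one common formulation, in which the result at rank i (counting from 1) contributes rel_i / log2(i + 1); other variants exist, and we are not claiming this is the exact formula used in the talk. As a simplification, the ideal ordering is taken over the same list of results. The class name `Ndcg` is hypothetical.

```java
import java.util.Arrays;

/** Sketch of DCG and NDCG under the rel / log2(rank + 1) formulation. */
final class Ndcg {

    /** DCG: sum over 1-based ranks i of relevance[i] / log2(i + 1). */
    static double dcg(double[] relevance) {
        double total = 0.0;
        for (int i = 0; i < relevance.length; i++) {
            int rank = i + 1;
            total += relevance[i] / (Math.log(rank + 1) / Math.log(2));
        }
        return total;
    }

    /** NDCG: DCG of the ranking divided by the DCG of its best possible ordering. */
    static double ndcg(double[] relevance) {
        double[] ideal = relevance.clone();
        Arrays.sort(ideal); // ascending
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            double tmp = ideal[i]; // reverse in place to get descending order
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal);
        return idealDcg == 0.0 ? 0.0 : dcg(relevance) / idealDcg;
    }
}
```

Under this formulation a ranking that already places the most relevant results first scores 1.0, and pushing a relevant result further down the page lowers the score.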
Tags: - Top Posts - · Presentations · Research