By Jason Wilson
I first met Mike Dahn, who is now SVP Marketing & New Initiative Development at Thomson Reuters Legal, last January in Eagan during a beta preview of WestlawNext (WLN), and since then, we have kept up with one another over emails and the occasional beer when travel schedules permit. I recently caught up with him at AALL in Philadelphia, and we visited a while about business (we do share a common interest in legal publishing after all), WestlawNext, and (at the time) some of the recent criticisms of the platform. Afterwards, I felt like there were some unresolved issues that my readers might like to see addressed. So I asked Mike if he would be willing to answer a few questions by email, and he agreed. What follows is the result of several months' worth of email exchanges and discussion. I think you'll find what he has to say rather illuminating on the subject of WLN and WestSearch itself. As always, I encourage you to comment. I have no doubt that Mike and his team will be reading them.
JW: Can you tell the readers a little bit about your background, and your responsibilities at Thomson Reuters?
MD: Prior to joining West / Thomson Reuters, my background was in law librarianship and web development. I got a law degree and an MLIS and then worked at a law school for a couple years as a reference librarian and web developer. I then became the head of libraries and intranet development for a (then) 200-attorney law firm with six offices in Florida. At the firm, in addition to managing information services and intranet development, I negotiated vendor contracts and focused a lot on the cost recovery of our electronic research costs from clients. From there, I joined West in product development a little over eleven years ago.
My team and I spend nearly all of our time on the product development and marketing of WestlawNext and westlaw.com. In product development, we work closely with customers to understand what they’re trying to accomplish and why, and then we develop new products or make improvements to our existing products and services to help customers achieve their business goals better, faster, or both. In doing so, we consider both organizational and individual goals.
At the organizational level, we’re constantly asking ourselves: How can we help our customers get more business? Reduce their costs? Improve their profitability? For individuals, we keep asking: How can we get them to the answer faster? How can we help them understand the research issues better? How can we reduce the chance of missing something important?
In marketing, we work on communicating the value of our products and services to the marketplace in ways that showcase our unique value, how that value translates into business benefits for our customers, and why our solutions are more cost-effective than anything else on the market. Lately we’ve been working on a campaign that explains how modern law firms, corporate law departments, and other organizations are using WestlawNext to deliver better, faster legal services at lower costs. You can see examples of this at customers.westlawnext.com.
JW: What do you think the most common misconceptions among Thomson Reuters Legal customers are, whether related to the business of publishing or something else?
MD: Our customers are pretty savvy. I don’t think there are a lot of misconceptions among them, but there are two misconceptions that come up on a regular basis that are worth discussing.
- WestSearch “crowdsourcing” means I’ll only get popular materials and not the esoteric content I might need (or that my opponent will get the benefit of my genius)
- Westlaw Classic = Boolean Search and WestlawNext = Natural Language Search
Misconception 1: “Crowdsourcing”
We actually don’t crowdsource any part of the WestSearch algorithm. By that, I just mean to clear up an issue with the nomenclature quickly before we get into how we use customer usage patterns to produce superior search results. The Oxford English Dictionary defines crowdsourcing as “the practice whereby an organization enlists a number of freelancers, paid or unpaid, to work on a specific task or problem.”
For instance, Wikipedia uses crowdsourcing – many unpaid freelancers contribute full articles to the site, as well as edit the articles of others. Other famous examples of this are InnoCentive, the Netflix Prize, Idea Bounty, reCAPTCHA, and Amazon’s Mechanical Turk.
We don’t actually ask our customers – or make open calls to the public – to work on WestSearch algorithm improvements. I’m not trying to denounce crowdsourcing as a concept. It’s just not what we do with WestSearch.
To dramatically improve search beyond what standard keyword-based search engines can do, our WestSearch algorithms primarily rely on our editorial enhancements, things like the Key Number System, KeyCite, Headnotes, Statutes Notes of Decision, and the language correlations we have in our proprietary indices – like “see also” references. We’ve literally been building up this collection of editorial enhancements for over a hundred years, and it provides both extraordinary search results and a significant competitive advantage over what others can do in the marketplace.
In addition to these editorial enhancements, our algorithms also consider customer usage patterns in the system. This, of course, is not an original idea. Google’s algorithms have been doing this for a long time, and it is a standard feature of modern search engines today.
Regarding how we employ these usage patterns in WestSearch, it’s important to keep three things in mind:
- Customer usage patterns are just a feature of our algorithms – they don’t represent the core of the algorithms.
- Because WestlawNext is a full-featured research application, we have a much richer source of usage information than most search engines, including Google.
- Since usage information has been used in search engines for well over a decade, the science for dealing with popularity issues is well developed.
With this in mind, the questions I typically get about the algorithm are:
What if I’m looking for something unpopular? I don’t always want to just get what’s popular.
And a related concern I get is:
I’m a good researcher, and I don’t want my research to be messed up by a bunch of bad researchers.
These are valid questions and concerns. In fact, we had the same concerns when we set out to build WestSearch. One of our concerns was about user experience – we wanted researchers to get very noticeably better results – better enough to pay a premium for our new product. It couldn’t just be arguably better – it had to be noticeably better. Our other concern was a competitive one. We were investing a lot of money in WestlawNext, and in the search engine specifically. If employing usage data drove most of the benefits in terms of precision and recall, then our competitors could turn around quickly and do similar things. We needed to find out what mattered most and why.
To address these concerns, we tested rigorously with literally hundreds of queries in a wide variety of practice areas and jurisdictions. We tested a variety of query types: simple and complex, one-word, few words, many words, common terms and rare terms, factual terms, legal terms, and a mix of the two. We also tested a variety of issue types: substantive and procedural, common and uncommon, simple and complex, state and federal, well-settled and emerging. And we tested our algorithms with and without actual customer usage information.
What we found was that our editorial enhancements drove the vast majority of the improvements in precision and recall, and that the customer usage information provided a substantial boost to what our editorial enhancements could do alone.
That was good news, but our results were just a snapshot in time. We worried about how the results would change over time. On the one hand, we assumed we could make the results even better over time as we collected more and more usage data (our original experiments were with Westlaw Classic usage data, since we didn’t yet have customer data on WestlawNext, and WestlawNext could provide even more information than Westlaw Classic, given additional interactions like folder usage). On the other hand, we wanted to make sure that new documents that were important but didn’t yet have much usage would be properly included and ranked appropriately, and we also wanted to account for potential snowball effects (“Matthew Effects”), where popular documents get to the top, and because of that, they get more usage, which makes them more likely to get to the top next time.
So we tested specifically for those issues. And when I say “we tested,” I mean our team of PhD research scientists with specialization in AI, natural language processing, data mining, and machine learning, and our team of attorneys, painstakingly analyzed thousands of results through many iterations.
What we found was that the problems were mostly accounted for by our editorial enhancements, but there were definitely some remaining issues regarding current documents and the snowball effects of popularity. To deal with those, we reduced the weight of the customer usage in the algorithms, and we increased the threshold for when the usage patterns would be considered. In other words, to impact the algorithm, substantial numbers of other users would have needed to have provided strong relevance indications in their usage patterns, and even when they did, they couldn’t override what our editorial enhancements were telling us; they could just add some documents to the result or improve the rank of documents slightly.
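The weighting-and-threshold idea described here can be sketched roughly as follows. Every name, weight, and threshold in this sketch is hypothetical and purely illustrative; it is not the actual WestSearch algorithm, which is proprietary:

```python
# Illustrative sketch of a thresholded, capped usage boost.
# All weights, thresholds, and names are hypothetical -- they do not
# reflect the real WestSearch algorithm.

MIN_USAGE_EVENTS = 100   # below this, usage patterns are ignored entirely
MAX_USAGE_BOOST = 0.15   # usage can only nudge a score, never override it

def score(editorial_score: float, text_score: float,
          strong_usage_events: int) -> float:
    """Combine editorial and text-similarity signals with a small,
    thresholded boost from aggregated customer usage."""
    # Editorial enhancements and text similarity carry most of the weight.
    base = 0.7 * editorial_score + 0.3 * text_score
    if strong_usage_events < MIN_USAGE_EVENTS:
        return base  # too little usage data to be trustworthy
    # Diminishing-returns boost, capped so popularity can't snowball:
    # even unbounded usage adds at most MAX_USAGE_BOOST to the score.
    boost = MAX_USAGE_BOOST * (1 - 1 / (1 + strong_usage_events / 1000))
    return base + boost
```

The two design points the sketch captures are the ones Mike names: a floor before usage counts at all, and a cap (with diminishing returns) so that usage can reorder documents slightly but never outvote the editorial signals.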
Beyond that, we make special accommodations for date and other factors. I can’t reveal precisely how we do that, but for an example of how researchers at Carnegie Mellon have dealt with this common issue, see here.
For those worried about bad researchers, we worried about them, too, and we accounted for them in two ways. First, we tested the usage pattern impact with and without law student usage. Turns out the algorithms performed better without the law student usage. This made sense. I wouldn’t say that all law students are bad researchers, but as a group, they’re still learning. In addition, there appears to be a class assignment effect that can skew results.
Next, we narrowed our focus to very strong indications of relevance from substantial numbers of users. By strong indications of relevance, I’m talking about things like printing and storing document snippets in folders. These are way stronger indications of relevance than things like clicking on items in a search result, which are generally much less significant. Researchers click on irrelevant items all the time. It’s relatively rare, though, that they put an irrelevant document in a research folder or print an irrelevant document individually (as opposed to batch printing).
When substantial numbers of users give us good indications of relevance after running similar queries, it matters. Good researchers tend to find the same good documents for their similar research issues, but bad researchers tend to find different bad documents. So, in the usage patterns for similar queries, the bad researcher usage patterns tend to fall to the long tail of the power law distribution and push those bad results down, while the good researcher patterns congregate in the head – but to have a tall head, we need large numbers of similar indications of relevance.
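A toy version of this signal-weighting idea can be expressed in a few lines. The signal names and weights below are invented for illustration and are not WestSearch's actual signals or values:

```python
# Illustrative aggregation of relevance signals across many users who ran
# similar queries. Signal names and weights are hypothetical, not the
# actual WestSearch signals.
from collections import Counter

SIGNAL_WEIGHTS = {
    "folder_save": 3.0,   # strong: the user filed the document for a matter
    "print_single": 2.5,  # strong: printed individually, not batch-printed
    "click": 0.2,         # weak: researchers click irrelevant items all the time
}

def aggregate_relevance(events):
    """Sum weighted signals per document across users.

    Documents that many researchers independently saved or printed pile up
    in the head of the distribution; one-off bad picks stay in the long tail.
    """
    scores = Counter()
    for doc_id, signal in events:
        scores[doc_id] += SIGNAL_WEIGHTS.get(signal, 0.0)
    return scores.most_common()  # ranked best-first

# Hypothetical usage events from several users running similar queries.
events = [("case_A", "folder_save"), ("case_A", "print_single"),
          ("case_A", "folder_save"), ("case_B", "click"),
          ("case_C", "click"), ("case_A", "click")]
```

Here case_A accumulates strong signals from multiple users and rises to the head, while case_B and case_C each drew only a stray click and stay in the tail, which is the power-law behavior described above.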
A final concern I hear about this is:
I don’t want the other side in this litigation to be finding the right documents because I found the best stuff and then printed it.
We anticipated this during algorithm development as well, and it’s dealt with by our thresholds, where large numbers of other users needed to have found the same things. If a hundred other researchers found what you did, then it’s likely the other side will too. We’re just helping them find it a little faster, and probably not because you found it, but because a hundred others did.
To summarize, we make sure researchers don’t just get popular documents by placing substantially more weight on our editorial enhancements and text similarity than on customer usage patterns. We also account for date and other factors in manners similar to what other research scientists have done in this well-developed field.
And we make sure bad researchers don’t skew results by removing law student usage and requiring substantial numbers of similarities in the usage patterns.
That said, usage patterns are an important and useful part of our system. Collectively, they represent the importance of a document from the perspective of others. In a similar fashion, the set of all user queries that resulted in meaningful interactions with a document represents the meaning of that document to those legal researchers – but this meaning is written in query language, which is often different from the precise language of the documents. A system that relies too heavily on citation networks and usage patterns is going to produce poor results. State of the art search engines, including WestSearch, adopt a much more holistic approach to search. Google is much more than PageRank, and WestSearch is much more than citation networks and usage behavior.
One last point to keep in mind about this: The WestSearch algorithm improvements don’t result in perfection – they’re just dramatically better than what existing solutions can offer. If you run enough searches in any modern search engine, you can always find examples of things that should have been ranked differently – even with Google. Finding examples like that does not prove that a search engine sucks. What you want to do is test a large sample against alternatives to figure out which is better. For WestSearch, we literally tested hundreds of different research issues with teams of different researchers, and we cross-checked for quality issues. For individuals, we’re not suggesting you do the same – we’re just suggesting you try it for a couple weeks. You’ll see. For scholars doing a serious academic review, we do suggest large samples, though we understand this can be costly and time-consuming.
In either case, the relevant question for the researcher is NOT, “are there any examples where the results of a search in WestlawNext are not as stellar as they could be?” The relevant question is, “on average, am I going to get to answers faster and/or develop a better understanding of the law if I use WestlawNext?”
Misconception 2: Westlaw Classic for Boolean and WestlawNext for Natural Language?
In my visits with customers, I’ve heard some say that they use Westlaw Classic for Boolean Terms & Connectors searching and WestlawNext for Natural Language searching. It makes me groan every time.
When we built WestlawNext, we knew that if we left out features that customers relied on in Westlaw Classic, it’d be a non-starter. We also believe that there are certain research issues for which a Boolean search makes the most sense – sometimes you need to be very precise – and there are other times when a plain language search makes the most sense, because you want to pull in documents with alternate terminology. Because of that, WestlawNext fully supports all of the Boolean Terms and Connectors capabilities of Westlaw Classic – and we’ve even added some new power searcher capabilities.
Compared with Westlaw Classic searching, the biggest search experience improvement in WestlawNext is with plain language searches, but even if researchers were to conduct 100% of their searches in WestlawNext as Boolean Terms & Connectors searches, they would still get the following substantial benefits over Westlaw Classic:
- WestSearch algorithm ranking
- Additional ranking options (date, most cited, most used)
- New list indicators for what’s been viewed before for the same project
- New filtering options (like reported/unreported, viewed for the matter, court, etc.)
- No need to choose a database for searching
- Suggested primary law documents that use similar terminology
- Ability to put documents in folders, add notes and highlighting, and share with clients, co-counsel, outside counsel, etc.
- New case summaries in search results
- New KeyCite information on case displays
So, if you still have a huge preference for Boolean searches, or simply agree with us that some research issues require the precision of Boolean Terms & Connectors searching, you can save a lot of time by doing those searches on WestlawNext. I'd also urge you to consider trying some plain language searching. We’ve found that even experts in a practice area can’t always anticipate all of the language variety that document authors can employ to express similar concepts, and the precision of Boolean Terms & Connectors searching can sometimes hurt more than help.
[Part 2 of the interview can be found here.]