IN-DEPTH: Alta Plana’s Seth Grimes on how text analytics is expected to shape up in 2010
No single solution provider dominates text analytics. According to Seth Grimes, president of Alta Plana, no single provider dominates even one significant text-analytics market segment.
“This is good news for current and prospective users,” Grimes, an industry expert and Conference Chair of the upcoming 6th Annual Text Analytics Summit, wrote recently.
To learn more about the latest trends and issues, Text Analytics News’ Ritesh Gupta recently spoke to Grimes. Excerpts:
Publishers, media portals, social-network and forum sites: they all realise that intelligent content tagging, conceptual search and semantic integration (capabilities supported by text analytics and related semantic technologies) are key to information findability and to a rich and satisfying user experience. Last year, you told me the use of these technologies was on a fast track, a major growth area for text analytics and semantics. How do you assess the situation today?
Seth Grimes: There’s strong uptake on the publishing side, where organisations seek to make their information more findable and usable (and profitable), and even stronger uptake on the consumer side, where organisations analyse and integrate content-extracted information for a gamut of business needs.
One of the most interesting developments on the publishing side is the emergence of a wide range of application programming interfaces (APIs) that allow functions such as tagging, topic classification, and content enrichment (with semantically associated information) to be built into publishing processes.
And on the information-consumption side, yes, there’s semantic search, and there’s also semantically supported content integration that enables real-time information aggregation: essentially, aggregation and analysis dashboards that range from “listening platforms” to interfaces for BI-style analysis of text-sourced data.
Can you offer some insight into text-analytics technologies such as categorisation, clustering and named-entity extraction in the context of e-discovery? What new developments have you witnessed in this arena?
Seth Grimes: E-discovery is going strong; after all, given legal processes and compliance mandates, corporations have a lot at stake. Fortunately, requirements are pretty well understood and there are some great solutions now.
The most interesting developments I’ve seen are on the edges of e-discovery, in early case assessment and in what are essentially investigative functions. Call these areas “making the case”. They are areas where text analytics can really shine, by applying information extraction and data mining technologies to discover actionable or litigation-relevant information in corporate electronically stored information. In other contexts, these functions would be called “knowledge discovery”. In the legal context, they’re smart lawyering.
Recently, Clarabridge stated that it saw its customers integrating multiple data sources as they deployed to multiple user communities across their organisations. As this trend continues, the application infrastructure must support enterprise security, authentication, and shielding of personally identifiable information (as needed) to ensure it meets corporate data standards. What do you make of enterprise adoption of text analytics?
Seth Grimes: The need to extend corporate security and confidentiality practices to text-mining sources and outputs is definitely underappreciated. It’s a very good and timely question, and one without any systematic or standard answer: approaches need to be formalised. None of this is holding back enterprise adoption, however, nor should it.
Coming at it from another angle, information that surfaces in text-analysis systems is typically "open source" (in the sense of freely available) or the product of routine interactions with stakeholders. You do have potential for controversy when a company mines open sources such as Twitter microblogs or TripAdvisor reviews, especially when mined information is then linked across sources to create an aggregate profile of individuals that might unite and reveal personal information not present in any single source.
There is increasing evidence that enterprises recognise the differences between web search and enterprise search, although it is still an uphill battle. The industry has also seen the success of tools that emphasise exploratory search as a core value proposition. But the general feeling is that there’s still a long way to go. How do you assess the situation?
Seth Grimes: I have to differ with you. The difference between Web search and enterprise search, conventionally defined, is well understood. I added "conventionally defined" because the majority of searching within the enterprise, and of enterprise information, is carried out on or via the Web with Web search tools, although few industry figures other than myself would label this "enterprise search". Perhaps the boundary that others see will erode as the (valid) perception emerges that the balance of enterprise-relevant information is "outside the firewall."
Search as a whole, however, on the Web and within the enterprise, has morphed into "information access," providing a direct connection to relevant information and not just a list of documents containing key terms. These capabilities have crept into the leading search engines without a lot of fanfare. Submit "Sylvia's New York" to Google and you'll see results that reflect recognition that you're likely searching for information about a restaurant.
Exploratory interfaces (faceted navigation, results clustering, topic maps and other visualisations) are becoming much more common and will, within a few years, become part of the mainstream search experience.
One interesting development is that increasing participation in social media, particularly LinkedIn and Twitter, is facilitating conversations about enterprise search that bring together people from enterprises, vendors, and even academia. Do you see these public conversations as great venues for catalysing education, substantive debate, and hopefully productive output?
Seth Grimes: The Web is a huge commons for a myriad of public conversations about all things technical, societal, and personal, so the answer is, of course! Conversations are great, but in the end, to be productive there have to be individual initiatives, so let's look at very specific examples of how the particular services you named enable the conversation by opening themselves up.
LinkedIn and Twitter have each participated in and prompted search innovations. LinkedIn, for instance, several years ago embraced microformats for content tagging, which facilitate data portability and searchability. And Twitter, with its published and freely usable API, which supports both open access to tweets and a form of real-time search, has enabled the incorporation and analysis of microblogging content for a very broad set of business initiatives that include brand and reputation management, customer support, and social marketing.
Last year you told me: “Automating sentiment analysis is a challenging problem, but hard work done over the last few years in academia and industry is paying off in the form of tools that can actually make good sense, at the entity, concept, and topic level, of attitudinal information.” What sort of progress have you witnessed in this arena?
Seth Grimes: The greatest progress has been in market awareness (not a technical point) and in the surfacing of sentiment capabilities in a broad variety of tools. Market education, by the way, remains one of the greatest challenges: in particular, helping current and prospective sentiment-analysis users understand that monitoring isn't measurement, that measurement isn't analysis, and that you need to be doing analysis, optimally in conjunction with traditional analytics, to get the greatest business value out of attitudinal information.
But that kind of education is what the Text Analytics Summit and similar events are all about.
Companies intend to incorporate social media into the same analytical models they use for their internal data and, more importantly, to plug social media into business processes. You also mentioned that there's been a lot of hype about the application of text analytics to social media. What are your expectations in this arena?
Seth Grimes: Because analytical information that can be harvested from social media is quite different from that in longer-established BI and data-mining sources (I'm talking about transactional and operational databases), most organisations analyse information from the two types of source with separate systems. That's perfectly appropriate, although in the longer term they'll want to integrate these systems and their analyses, and to expand their analytical models, for both operational and predictive purposes, to encompass information from the broad variety of sources.
This integration is happening right now, although not systematically, for intelligence, law enforcement, customer experience, financial services, and other applications. There are some fascinating examples out there, for instance the Thomson Reuters NewsScope sentiment engine, which supports the use of media-extracted sentiment in automated algorithmic-trading applications. NewsScope is only one of a spectrum of examples across business domains.
6th Annual Text Analytics Summit
The 6th Annual Text Analytics Summit is scheduled to take place in Boston on May 25-26. For more information, visit:
http://www.textanalyticsnews.com/text-mining-conference/view-the-agenda.shtml
or contact:
Ben Satchwell
Text Analytics News
T: +44 (0) 207 375 7163