Semantic search for enterprise systems – interview with Daniel Tunkelang of Endeca
FCM: What are some of the areas, in your view, that need improvement in enterprise search?
DT: Many people have raised the prospect of social search in the enterprise: specifically, the idea that people will tag content within the enterprise and benefit from each other’s tagging. The reality of social search, however, has not lived up to the vision.
In order for social search to succeed, enterprise workers need to supply their proprietary knowledge through a process that is not only as painless as possible but also demonstrates a return on investment. We believe that our work at Endeca on bootstrapping knowledge bases can help bring about effective social search in the enterprise.
The other major area that comes to mind is federation. As much as an enterprise may value its internal content, much of the content that its workers need resides outside the enterprise. An effective enterprise search tool needs to give users access to all of these content sources while preserving the value and context of each.
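A minimal sketch of the federation idea, assuming each content source can be wrapped as a simple callable that maps a query to (title, score) pairs. The names `federated_search`, `make_source`, and `Hit`, and the toy sources, are hypothetical and not taken from Endeca's product. The sketch keeps results grouped by source rather than merging them into one ranked list, because scores produced by independent engines are generally on different scales; this is one simple way to preserve each source's context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a source maps a query to (title, score) pairs,
# where the score is on that source's own, source-specific scale.
Source = Callable[[str], List[Tuple[str, float]]]


@dataclass
class Hit:
    source: str   # which content source produced this hit (its context)
    title: str
    score: float  # native score; not comparable across sources


def federated_search(query: str, sources: Dict[str, Source],
                     per_source: int = 3) -> Dict[str, List[Hit]]:
    """Send the query to every source and keep the results grouped by source,
    each group in that source's own ranking order, instead of interleaving
    scores that were computed on different scales."""
    grouped: Dict[str, List[Hit]] = {}
    for name, search in sources.items():
        hits = [Hit(name, title, score) for title, score in search(query)]
        hits.sort(key=lambda h: h.score, reverse=True)
        grouped[name] = hits[:per_source]
    return grouped


def make_source(docs: Dict[str, float]) -> Source:
    """Toy stand-in for a real search backend: case-insensitive substring
    match on titles, returning each matching title with a fixed score."""
    def search(query: str) -> List[Tuple[str, float]]:
        return [(t, s) for t, s in docs.items() if query.lower() in t.lower()]
    return search


if __name__ == "__main__":
    sources = {
        "intranet": make_source({"Sales plan": 0.9, "Sales travel policy": 0.4}),
        "web": make_source({"Industry sales report": 12.5, "Weather today": 3.0}),
    }
    for name, hits in federated_search("sales", sources).items():
        print(name, [h.title for h in hits])
```

Running it prints the intranet hits and the web hits as two separate groups. Presenting groups side by side is only one design choice; a real federation layer might instead normalize scores or interleave results, at the cost of blurring where each result came from.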
FCM: What impact will semantic search have on enterprise search, and what are you exploring in that area?
DT: Semantic search means different things to different people, but it broadly falls into two categories: using linguistic and statistical approaches to derive meaning from unstructured text, and using semantic web approaches to represent meaning in content and query structure. Endeca embraces both of these aspects of semantic search.
From early on, we have developed an extensible framework for enriching content through linguistic and statistical information extraction. We have developed some groundbreaking tools ourselves, but we have achieved even better results by combining other vendors’ document analysis tools with our unique ability to improve their results through corpus analysis.
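To make "improving extractor results through corpus analysis" concrete, here is a toy sketch of the general idea. It is not Endeca's method, which the interview does not describe. It assumes a third-party extractor has already produced a set of candidate entities per document (the input below is hypothetical), and it uses one simple corpus statistic, document frequency, to merge surface variants and discard candidates seen in too few documents. The threshold `min_docs` is an illustrative assumption.

```python
from collections import Counter
from typing import Dict, Set


def normalize(entity: str) -> str:
    """Collapse case and internal whitespace so that surface variants such as
    'Acme  Corp' and 'acme corp' are counted as the same entity."""
    return " ".join(entity.lower().split())


def refine_with_corpus(extracted: Dict[str, Set[str]],
                       min_docs: int = 2) -> Dict[str, Set[str]]:
    """Post-process per-document extractor output using corpus statistics:
    an entity is kept only if its normalized form occurs in at least
    `min_docs` documents; rarer candidates are treated as likely noise."""
    per_doc = {doc: {normalize(e) for e in ents} for doc, ents in extracted.items()}
    # Document frequency: in how many documents each normalized entity appears.
    doc_freq = Counter(e for ents in per_doc.values() for e in ents)
    return {doc: {e for e in ents if doc_freq[e] >= min_docs}
            for doc, ents in per_doc.items()}


if __name__ == "__main__":
    # Hypothetical output of a third-party extractor, one set per document.
    extracted = {
        "d1": {"Acme Corp", "Q3 forecast"},
        "d2": {"acme corp", "Jon Smtih"},  # misspelled name seen only once
        "d3": {"ACME  Corp", "Q3 Forecast"},
    }
    print(refine_with_corpus(extracted))
```

The point of the sketch is that a single document gives an extractor little evidence, while the whole corpus does. A pure frequency threshold also drops rare but genuine entities, so a production system would likely combine several corpus-level signals rather than rely on this one.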
The growing prevalence of structured data (e.g., RDF) with well-formed ontologies (e.g., OWL) is very valuable to Endeca, since our flexible data model is ideal for incorporating heterogeneous, semi-structured content. We have done this in major applications for the financial industry, media/publishing, and the federal government.
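As a rough illustration of why a flexible data model suits RDF, the sketch below groups already-parsed triples into schema-free records, where each entity carries only the attributes its own triples supply and any attribute may be multi-valued. It assumes the triples have already been extracted, for example with an RDF library such as rdflib. The function names and the example.org URIs are hypothetical. It deliberately ignores OWL semantics (no inference), datatypes, language tags, and blank nodes, and shortening URIs to local names can collide across namespaces.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), already parsed


def local_name(term: str) -> str:
    """Shorten a URI such as 'http://example.org/ont#sector' to 'sector';
    literals (anything not starting with http:// or https://) are unchanged."""
    if not term.startswith(("http://", "https://")):
        return term
    return term.rstrip("/").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def triples_to_records(triples: Iterable[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Group triples by subject into schema-free records. Each record gets only
    the attributes its own triples supply, and any attribute may hold several
    values, so differently shaped entities can sit side by side."""
    records: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        records[local_name(s)][local_name(p)].append(local_name(o))
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {subj: dict(attrs) for subj, attrs in records.items()}


if __name__ == "__main__":
    EX = "http://example.org/"  # hypothetical namespace
    triples = [
        (EX + "co/acme", EX + "ont#sector", "Manufacturing"),
        (EX + "co/acme", EX + "ont#sector", "Defense"),
        (EX + "co/acme", EX + "ont#ticker", "ACME"),
        (EX + "doc/42", EX + "ont#title", "Annual report"),
        (EX + "doc/42", EX + "ont#mentions", EX + "co/acme"),
    ]
    for subject, attrs in triples_to_records(triples).items():
        print(subject, attrs)
```

The company record and the document record end up with entirely different attribute sets, yet both can be stored, searched, and faceted in the same collection without a shared schema.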
It is also important to recognize that semantic search is not just about the data. In the popular conception of semantic search, the computer is wholly responsible for deriving meaning from the unstructured input. Endeca’s philosophy, in line with the HCIR (human-computer information retrieval) vision, is that humans determine meaning, and that our job is to give them clues using all of the structure we can provide.
Note the last paragraph: humans determine the meaning, and the engine then gives them clues. How does that work, and how is it different from what is done today?
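One possible illustration of "giving clues" is faceted navigation, a widely used HCIR technique. The engine shows how the current results break down by structured attributes, and the user picks a refinement. The sketch below is a hypothetical toy: the record fields, `search`, and `facet_clues` are invented for illustration and do not describe Endeca's implementation. It contrasts with a conventional engine that returns a single relevance-ranked list and so must implicitly guess which sense of an ambiguous query the user meant.

```python
from collections import Counter
from typing import Dict, List, Optional

Doc = Dict[str, str]  # hypothetical record: free "text" plus facet attributes


def search(docs: List[Doc], query: str,
           refinements: Optional[Dict[str, str]] = None) -> List[Doc]:
    """Naive keyword match on the text, narrowed by any facet values the user
    has chosen (e.g. {"type": "animal"})."""
    refinements = refinements or {}
    return [d for d in docs
            if query.lower() in d["text"].lower()
            and all(d.get(f) == v for f, v in refinements.items())]


def facet_clues(results: List[Doc], facets: List[str]) -> Dict[str, Counter]:
    """For each facet, count how many current results carry each value.
    The engine does not guess what the query 'means'; it shows how the
    results break down and leaves the choice to the user."""
    return {f: Counter(d[f] for d in results if f in d) for f in facets}


DOCS: List[Doc] = [
    {"text": "Jaguar XF road test", "type": "car", "source": "reviews"},
    {"text": "Jaguar dealer locations", "type": "car", "source": "intranet"},
    {"text": "Jaguar habitat loss in the Amazon", "type": "animal", "source": "news"},
    {"text": "Jaguar sightings rise", "type": "animal", "source": "news"},
    {"text": "Leopard conservation plan", "type": "animal", "source": "news"},
]

if __name__ == "__main__":
    results = search(DOCS, "jaguar")
    # The query is ambiguous; the clues expose both readings (car vs. animal).
    print(facet_clues(results, ["type", "source"]))
    # The user, not the engine, resolves the ambiguity by picking a value.
    for d in search(DOCS, "jaguar", {"type": "animal"}):
        print(d["text"])
```

In this toy, the query "jaguar" matches two car documents and two animal documents. Instead of ranking one sense above the other, the engine reports both counts, and the refinement the user chooses is what settles the meaning.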