Web Mining Framework

The Ontotext Web Mining Framework (WMF) is a comprehensive platform for building web search and intelligence-gathering applications that need to crawl and understand the web.

Get the Data:

  • Crawl full web pages, collecting and processing HTML, XML, RDF and other formats
  • Crawl specific sections or selected information from web pages (focused crawling)
  • Scrape structured online data with high precision (e.g. descriptions of products, services or events)

Process and Store the Data:

  • Text mining and normalization of page content
  • Data extraction, transformation, merging and de-duplication
  • Natural-language processing using KIM for semantic annotation and search
  • Extensions to Ontotext’s OWLIM triple store for storage and inference of structured data

WMF is an enterprise-class data-collection platform with advanced data-handling features:

  • Optimized to support large volumes of data
  • Configurable and extendable workflow management
  • Performs independent, continuous data collection 24/7
  • Provides configurable post-processing options for data normalization and integration
  • Offers a good balance of options for domain-specific and broad-topic tasks
  • Distributed processing with advanced task schedule management
  • Detailed monitoring and reporting of performance statistics
  • Coverage of the whole life cycle of web mining components: building, executing, monitoring and maintaining