Web Mining Framework
Ontotext’s Web Mining Framework (WMF) is a comprehensive platform for building web search and intelligence-gathering applications that need to crawl and understand the web.
Get the Data:
- Crawl full web pages, collecting and processing HTML, XML, RDF and other formats
- Crawl specific sections or selected information from web pages (focused crawling)
- Scrape structured online data with high precision (e.g. descriptions of products, services or events)
Process and Store the Data:
- Text mining and normalization of page content
- Data extraction, transformation, merging and de-duplication
- Natural language processing using KIM for semantic annotation and search
- Extensions to Ontotext’s OWLIM triple store for storage and inference of structured data
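As a rough illustration of the normalization and de-duplication steps listed above, here is a minimal Python sketch. It is not WMF's API or implementation: the function names `normalize_text` and `deduplicate`, the page dictionaries and the example URLs are all hypothetical, and exact-match hashing of normalized text is only one simple way to detect duplicates.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Normalize Unicode form, fold case, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.casefold()
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(pages: list[dict]) -> list[dict]:
    """Keep the first page seen for each distinct normalized text body."""
    seen = set()
    unique = []
    for page in pages:
        # Hash the normalized text so the comparison ignores trivial differences.
        normalized = normalize_text(page["text"])
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


# Hypothetical crawled pages; the first two differ only in case and spacing.
pages = [
    {"url": "http://example.com/a", "text": "Blue  Widget\n in stock"},
    {"url": "http://example.com/b", "text": "blue widget in stock"},
    {"url": "http://example.com/c", "text": "Red widget, sold out"},
]
unique_pages = deduplicate(pages)
```

In this sketch the second page is dropped because its normalized text matches the first. Near-duplicate detection (pages that differ by a few words) would need a different technique, such as shingling or similarity hashing.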
WMF is an enterprise-class data-collection platform with advanced data-handling features:
- Optimized to support large volumes of data
- Configurable and extendable workflow management
- Performs independent, continuous data collection 24/7
- Provides configurable post-processing options for data normalization and integration
- Offers a good balance of options for domain-specific and broad-topic tasks
- Distributed processing with advanced task schedule management
- Detailed monitoring and reporting of performance statistics
- Coverage of the whole life cycle of web mining components: building, executing, monitoring and maintaining