WWWGrab is a configurable, highly flexible pattern recognition, extraction and manipulation utility (a parser) for web pages, emails and files. It lets the user define patterns, combine them in any sequence, and perform actions when the patterns are recognized. It accepts input from web pages (via HTTP), stored emails (via MAPI) and/or files, and it can generate database tables (via ODBC) and/or files. It can be configured to parse web pages or emails (header fields and text bodies), or to perform transformations on files.
WWWGrab uses Set Machine, which can perform a wide variety of tasks because its design recognizes that many transformation tasks (parsing, extraction, conversion, searches, etc.) involve the same basic repetitive process:
- recognition of patterns in the input, and
- transition to another 'state' based on recognition of the next pattern in the input.
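The recognize-then-transition loop can be illustrated with a small table-driven state machine. This is a minimal Python sketch of the general idea, not WWWGrab's configuration format or Set Machine's internals (neither is documented here); the `Rule` class, the `run` function and the state names are hypothetical. Each state owns a list of patterns; the first one that matches at the current position fires its optional action and moves the machine to its next state.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical rule: in a given state, a recognized pattern fires an action and a transition."""
    pattern: re.Pattern
    next_state: str
    action: Optional[Callable[[re.Match], None]] = None


def run(rules: Dict[str, List[Rule]], start: str, text: str) -> str:
    """Scan text left to right; in each state, try that state's patterns in order."""
    state, pos = start, 0
    while pos < len(text):
        for rule in rules.get(state, []):
            m = rule.pattern.match(text, pos)  # match anchored at the current position
            if m:
                if rule.action is not None:
                    rule.action(m)
                state = rule.next_state
                # Consume the match; step one character on an empty match to avoid looping forever.
                pos = m.end() if m.end() > pos else pos + 1
                break
        else:
            pos += 1  # nothing recognized here: skip one character
    return state


# Usage: collect the text of every <title> element.
titles: List[str] = []
rules = {
    "outside": [Rule(re.compile(r"<title>", re.I), "in_title")],
    "in_title": [
        Rule(re.compile(r"([^<]*)</title>", re.I), "outside",
             lambda m: titles.append(m.group(1).strip())),
    ],
}
final_state = run(rules, "outside", "<html><title> Hello </title><title>World</title></html>")
print(titles, final_state)  # ['Hello', 'World'] outside
```

Keeping the rules in a table rather than in code mirrors the split described here: the engine is generic, and the user supplies the patterns, actions and states.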
Internally, WWWGrab/Set Machine is very general and abstract; the user defines the details of each transformation task. As a result, WWWGrab/Set Machine is very flexible, but configuring it can be challenging.
Features:
* Recursive capabilities (enabling parsing of nested HTML/XML tags, comments, etc.)
* Wide-string (Unicode) input/output
* Stored email (MAPI) interface
* ODBC interface that makes database layout information (table and field names) available to the configuration developer
* ODBC interface that can generate arbitrary SQL statements built from a combination of user-defined data and parsed data
* User-defined function interface for executing custom DLL code
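Recursion is what makes nested structures tractable: each opening tag starts a sub-parse that ends at its matching closing tag. The Python sketch below illustrates that idea only; it is not WWWGrab's recursion mechanism, and the `parse` and `parse_document` helpers and the simplified tag syntax (no attributes, comments or self-closing tags) are assumptions made for brevity.

```python
import re

# A token is an opening tag, a closing tag, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w-]*)>|([^<]+)")


def parse(tokens, i, tag=None):
    """Recursively collect children until the closing tag for `tag` is found.

    Returns (children, next_index). Each child is either a text string
    or a (tag_name, children) tuple.
    """
    children = []
    while i < len(tokens):
        closing, name, text = tokens[i]
        if text:
            children.append(text)
            i += 1
        elif closing:
            if name != tag:
                raise ValueError(f"unexpected </{name}>")
            return children, i + 1  # matching close: this level is done
        else:
            # Opening tag: recurse to parse everything up to its matching close.
            sub, i = parse(tokens, i + 1, name)
            children.append((name, sub))
    if tag is not None:
        raise ValueError(f"unclosed <{tag}>")
    return children, i


def parse_document(text):
    tokens = [m.groups() for m in TOKEN.finditer(text)]
    tree, _ = parse(tokens, 0)
    return tree


print(parse_document("<ul><li>a</li><li><b>b</b></li></ul>"))
# [('ul', [('li', ['a']), ('li', [('b', ['b'])])])]
```

Each level of nesting gets its own call frame, so arbitrarily deep nesting needs no extra bookkeeping beyond the call stack.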
WWWGrab/Set Machine can be configured to:
* Parse web pages / HTML
* Parse emails
* Search for (and replace) text
* Repair data
* Generate C/C++ code, HTML, XML, and other formats from various sources (emails, C/C++ code, HTML, XML, etc.)
* Parse C/C++ source code
* Generate and execute SQL
* Count words/keywords
* Count lines
* Swap bytes
These are only examples; WWWGrab/Set Machine can be configured to perform a wide variety of other tasks.
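Counting words, keywords and lines fits the same recognize-and-transition model: the scanner is either inside a word or outside one, and each inside-to-outside transition means one complete word has been recognized. A minimal Python sketch, assuming a "word" is a run of letters, digits or underscores; `count_text` is a hypothetical name, and WWWGrab would express these patterns in its own configuration instead.

```python
from typing import Dict, Iterable, Tuple


def count_text(text: str, keywords: Iterable[str]) -> Tuple[int, int, Dict[str, int]]:
    """Return (lines, words, keyword_hits) from a single two-state scan."""
    hits = {k: 0 for k in keywords}
    lines = words = 0
    word = []        # characters of the word currently being recognized
    in_word = False  # current state: inside a word or not

    for ch in text + "\0":  # sentinel forces the last word to be flushed
        if ch.isalnum() or ch == "_":
            word.append(ch)
            in_word = True
        elif in_word:
            # Transition inside -> outside: one complete word was recognized.
            w = "".join(word)
            words += 1
            if w in hits:
                hits[w] += 1
            word.clear()
            in_word = False
        if ch == "\n":
            lines += 1
    if text and not text.endswith("\n"):
        lines += 1  # count a final line that lacks a trailing newline
    return lines, words, hits


print(count_text("int main() {\n  return 0;\n}", ["int", "return"]))
# (3, 4, {'int': 1, 'return': 1})
```

Because `str.isalnum()` is Unicode-aware, the same scan also works on wide-string input.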