jdevelop edited this page Dec 3, 2012 · 3 revisions

v0

  • process HTML pages, extract links from tag attributes:
    • src [x]
    • href [x]
  • provide storage for holding links:
    • queued
    • processed
    • in process
    • in-memory implementation [x]
    • BDB JE (Berkeley DB Java Edition) implementation [!]
    • MongoDB implementation [!]
  • spawn several threads and control link processing, using actors [x]
  • provide configurable filters to include or exclude links
  • provide content storage options
    • filesystem [!]
    • JDBC-based
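
The first two items above (link extraction from `src`/`href` and an in-memory link store with queued, in-process, and processed states) could be sketched roughly as follows. This is only an illustration under assumptions: the names `CrawlerSketch`, `extractLinks`, and `InMemoryLinkStore` are hypothetical and not taken from the project, plain Java is used instead of actors, and a regular expression stands in for whatever HTML parsing the project actually performs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; class and method names are illustrative only.
class CrawlerSketch {

    // Matches quoted src="..." / href='...' values, case-insensitively.
    // The lookbehind skips attributes such as data-src. A regex is a
    // simplification: a real HTML parser would be more robust.
    private static final Pattern LINK_ATTR = Pattern.compile(
        "(?<![\\w-])(?:src|href)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
        Pattern.CASE_INSENSITIVE);

    // Returns the non-empty src/href values in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK_ATTR.matcher(html);
        while (m.find()) {
            String value = m.group(1) != null ? m.group(1) : m.group(2);
            if (!value.isEmpty()) {
                links.add(value);
            }
        }
        return links;
    }

    // In-memory storage tracking the three link states from the list above.
    static final class InMemoryLinkStore {
        private final Set<String> queued = new LinkedHashSet<>(); // keeps FIFO order
        private final Set<String> inProcess = new HashSet<>();
        private final Set<String> processed = new HashSet<>();

        // Queues a link unless it is already known in any state.
        synchronized boolean enqueue(String link) {
            if (inProcess.contains(link) || processed.contains(link)) {
                return false;
            }
            return queued.add(link);
        }

        // Moves the oldest queued link to "in process" and returns it.
        synchronized Optional<String> take() {
            Iterator<String> it = queued.iterator();
            if (!it.hasNext()) {
                return Optional.empty();
            }
            String link = it.next();
            it.remove();
            inProcess.add(link);
            return Optional.of(link);
        }

        // Moves a link from "in process" to "processed"; ignores unknown links.
        synchronized void markProcessed(String link) {
            if (inProcess.remove(link)) {
                processed.add(link);
            }
        }

        synchronized int queuedCount() { return queued.size(); }
        synchronized int processedCount() { return processed.size(); }
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">a</a><img src='/logo.png'>";
        InMemoryLinkStore store = new InMemoryLinkStore();
        for (String link : extractLinks(html)) {
            store.enqueue(link);
        }
        store.take().ifPresent(store::markProcessed);
        System.out.println("queued=" + store.queuedCount()
            + " processed=" + store.processedCount());
    }
}
```

The store methods are `synchronized` so that several worker threads could share one instance; an actor-based design, as the list mentions, would more likely keep this state inside a single actor and need no locking.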
