This repository was archived by the owner on Jun 22, 2020. It is now read-only.
Between the Clinton emails and the Podesta leak, it seems to me that many document sets include a ton of copy-pasted news articles. By themselves, these are really boring and can obscure more interesting stuff. It'd be neat to classify or rank documents by whether they consist mostly of boilerplate (signatures, disclaimers) and news articles, and are therefore boring.
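
To make the idea concrete, here is a rough sketch of one way such a ranking could work. It is not an existing feature of this project. It assumes a document is "boring" in proportion to the share of its lines that either match a signature/disclaimer pattern or also show up in another document (a crude stand-in for copy-pasted news articles). The pattern list and the names `BOILERPLATE_PATTERNS`, `normalized_lines`, `boring_scores` and `rank_by_interest` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for signatures and disclaimers; a real corpus
# would need its own, tuned list.
BOILERPLATE_PATTERNS = [
    re.compile(r"confidential", re.IGNORECASE),
    re.compile(r"sent from my", re.IGNORECASE),
    re.compile(r"intended recipient", re.IGNORECASE),
    re.compile(r"^(best|regards|thanks|sincerely)\b", re.IGNORECASE),
]


def normalized_lines(text):
    """Return the non-empty lines of a document, lowercased, whitespace collapsed."""
    lines = (" ".join(line.split()).lower() for line in text.splitlines())
    return [line for line in lines if line]


def boring_scores(documents):
    """Map each document name to the fraction of its lines that look boring.

    A line counts as boring if it matches a boilerplate pattern or if the
    same line appears in at least one other document.
    """
    lines_by_doc = {name: normalized_lines(text) for name, text in documents.items()}

    # Count how many distinct documents contain each line.
    doc_frequency = Counter()
    for lines in lines_by_doc.values():
        doc_frequency.update(set(lines))

    scores = {}
    for name, lines in lines_by_doc.items():
        if not lines:
            scores[name] = 0.0
            continue
        boring = sum(
            1
            for line in lines
            if doc_frequency[line] > 1
            or any(p.search(line) for p in BOILERPLATE_PATTERNS)
        )
        scores[name] = boring / len(lines)
    return scores


def rank_by_interest(documents):
    """Return document names sorted from least to most boring."""
    scores = boring_scores(documents)
    return sorted(scores, key=lambda name: scores[name])
```

Calling `rank_by_interest({"a.txt": text_a, "b.txt": text_b})` would list the least boilerplate-heavy documents first. Exact line matching will miss articles that were re-wrapped or lightly edited, so something like shingling or fuzzy matching would probably be needed on real leaks.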