A .NET Standard library to extract the main content of a web page.
Remove the clutter
SmartReader gives you a clean article without ads, sidebars, and similar clutter. The result is available both as HTML and as lightly formatted text.
SmartReader can (usually) find all the metadata you need: author, publication date, site name, language, an excerpt of the article, the featured image, a list of the images found (it can optionally download them and store them as data URIs), and an estimate of the time needed to read the article.
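As a rough illustration of the features above, here is a minimal sketch of extracting an article and reading its metadata. It assumes the SmartReader NuGet package is installed and that the `Reader` and `Article` members used here match the version you have; the URL and the inline HTML are placeholders made up for this example, so check the names against the API reference before relying on them.

```csharp
using System;
using SmartReader;

class Program
{
    static void Main()
    {
        // Placeholder page: in practice this would be the HTML of a real article.
        string html = "<html><head><title>Sample</title></head>" +
                      "<body><article><p>Some article text.</p></article></body></html>";

        // Assumption: a Reader can be built from a URI plus already-downloaded HTML,
        // which avoids a network request. The URI is a made-up example.
        var reader = new Reader("https://example.com/sample-article", html);
        Article article = reader.GetArticle();

        if (article.IsReadable)
        {
            // Clean content, as HTML and as lightly formatted text.
            Console.WriteLine(article.Content);
            Console.WriteLine(article.TextContent);

            // Metadata: any of these may be empty if the page does not provide it.
            Console.WriteLine($"Title: {article.Title}");
            Console.WriteLine($"Author: {article.Author}");
            Console.WriteLine($"Published: {article.PublicationDate}");
            Console.WriteLine($"Site: {article.SiteName}");
            Console.WriteLine($"Language: {article.Language}");
            Console.WriteLine($"Excerpt: {article.Excerpt}");
            Console.WriteLine($"Featured image: {article.FeaturedImage}");
            Console.WriteLine($"Time to read: {article.TimeToRead}");
        }
        else
        {
            // A page this short will probably not be judged readable.
            Console.WriteLine("No readable article was found.");
        }
    }
}
```

Checking `IsReadable` before using the result reflects the "(usually)" above: extraction is heuristic, so a page may yield no article or only partial metadata.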
Well-tested algorithm
The core algorithm is a port of Mozilla's Readability library, which is used in Firefox by millions of people.