Changelog

All notable changes to this project will be documented in this file.

0.10.0 - 2025/02/02

  • Added fixes from latest updates of Readability up until January 2025
  • Improved checking and assignment of byline
  • Improve parsing of JSON-LD element
  • Keep OL and UL tags in lists
  • Add support for automatic language identification

0.9.6 - 2024/10/09

  • Added fixes from latest updates of Readability up until August 2024
  • Allow option to modify link density value
  • Small performance improvements
  • Fix issue #64 by reducing the value of comment containers in different languages (Dutch, Spanish, French) (thanks to PeterHagen)
  • Add support for finding articles in alternative languages (thanks to Andrea Bondanini)
  • Fix vulnerability by updating dependency System.Text.Json to 8.0.5
  • Update dependency AngleSharp to 1.1.2

0.9.5 - 2024/06/02

  • Added fixes from latest updates of Readability up until May 2024
  • Fix parsing of JSON-LD element
  • Add support for Parsely metadata
  • Ensure short links of legitimate contents are preserved
  • Make sure elements are not deleted if they contain a data table
  • Fix issue #60, unexpected exception thrown for forbidden content (thanks to doggy8088)
  • Added suggested performance improvements to the conversion to plain text (thanks to malv007)

0.9.4 - 2023/08/27

  • Fix issue #58, data URIs in IMG SRC not preserved, treated as relative URL (thanks to Acidus)
  • Added fixes from latest updates of Readability up until August 2023
  • Expanded comma detection to non-Latin commas

0.9.3 - 2023/04/15

  • Fix issue #55, error when parsing certain URLs for date detection (thanks to Ian Smirlis)
  • Fix issue #56, Readability.CleanTitle() did not escape the string variable siteName with Regex.Escape() before applying it (thanks to Ian Smirlis)

0.9.2 - 2023/02/05

  • Added fixes from latest updates of Readability up until January 2023
  • Allow lists of images to remain
  • Fix articles showing cookie information in reader mode
  • Fix bug in TextSimilarity method
  • Fix issue #53, error when parsing certain Style attributes (thanks to Ian Smirlis)
  • Fix issue #54, error when cleaning certain invalid attribute names (thanks to Ian Smirlis)
  • Add settings AncestorsDepth and ParagraphThreshold to customize algorithm

0.9.1 - 2022/10/23

0.9.0 - 2022/08/28

  • Improved recognition of visibility in style attribute (thanks to Sander Schutten)
  • Added use of the encoding/charset suggested in the response header. Added a setting to force the encoding/charset, overriding the AngleSharp heuristics, which could ignore the suggested encoding (thanks to marhyno)
  • Changed setting MinContentLengthReaderable from simple integer field to Dictionary with language-based keys (thanks to Ivan Icin)
  • Fixes issue #45, error when parsing articles with noscript tag in head (thanks to Ward Boumans)

0.8.1 - 2022/06/29

  • Fixes issue #41, SmartReader.UriExtensions.ToAbsoluteURI throws exception when uriToCheck = "" (Thanks to mininmaxim)
  • Parse other JSON-LD elements if the first one is not of a recognized type
  • Updated IsProbablyReaderable to also check article tags
  • Added fixes from latest updates of Readability up until June 2022
  • Fixes issue #42, AngleSharp parsing XML attributes (Thanks to prestonkell)

0.8.0 - 2021/10/21

  • Huge thanks to Jason Nelson for big improvements in optimizing and updating the quality of the code to the latest C# best practices
  • Improved code quality (thanks to Jason Nelson)
  • Updated to support .NET Standard 2.1 (thanks to Jason Nelson)
  • Improved performance (thanks to Jason Nelson)
  • Added settings for determining whether the document contains an article, before attempting to extract it
  • Improvements to header and title detection
  • Improvements to handling of link density and added support for hash links
  • Added improvements from latest updates of Readability up until April 2021
  • Updated Demo project to .NET 5
  • Removed MimeMappings dependency

0.7.5 - 2020/10/31

  • Fix bug where Reader throws DivideByZeroException when articleTitle is empty (Thanks to DanielEgbers)
  • Added improvements from latest updates of Readability
  • Added functionality to unwrap images that are meant to be lazy loaded
  • Remove nodes with role complementary
  • Fix lazy-loaded images not visible on Kinja sites
  • Added function to serialize HTML content in article
  • Added support to look up metadata in JSON-LD object
  • Improved byline parsing

0.7.4 - 2020/09/07

0.7.3 - 2020/09/05

0.7.2 - 2020/05/09

  • Improved documentation
  • The original body of the source is now also passed to the LoggerDelegate during Debug
  • Fixed visibility of internal methods
  • Moved RegularExpressions enum outside of the Reader class for consistency
  • Updated demo application
  • Updated dependencies of console example application
  • Updated AngleSharp dependency to 0.13. This should also fix issue #18

0.7.1 - 2020/03/08

  • Added Readability update to preserve children when removing javascript: links
  • Added Readability update to add exception to probably readable for Wikimedia Math images
  • Added function to download images using the data URI scheme
  • Added function to use a custom HttpClient
  • Improved extraction of text content

0.7.0 - 2019/10/29

  • Added Readability update to fix missing Wikipedia content
  • Added Readability update to remove aria-hidden nodes
  • Added Readability update of adding 'content' as an indicator of readable content
  • Applied remaining suggestion in issue #6
  • Improved organization of code
  • Merged pull-request #12 for dealing with problems when retrieving content (Thanks to LatisVlad)
  • Improved testing

0.6.3 - 2019/08/18

0.6.2 - 2019/05/25

  • Fixed issue #9
  • Added Readability update to transform lazy images
  • Added Readability update regarding share elements

0.6.1 - 2019/04/20

  • Fixed bug in dependency listing for the nuget package

0.6.0 - 2019/04/20

  • Updated AngleSharp dependency. Now the minimum version is .NETStandard 2.0 (this is because of AngleSharp.Css)
  • Added improvements from latest updates of Readability
  • Fixed bug for property recognition
  • Changed minimum time to read from 0 to 1 minute
  • Improved tests

0.5.2 - 2019/01/12

  • Added metadata for site name
  • Fixed bugs for recognition of title and author metadata
  • Added improvements from latest updates of Readability
  • Improved documentation

0.5.1 - 2018/08/27

  • Added support for custom operations before processing
  • Added fix to preserve CSS classes when removing a DIV with only one P
  • Improved testing
  • Added improvements from August updates of Readability

0.5.0 - 2018/08/13

  • Added support for custom operations (Thanks to Gábor Gergely)
  • Added support to modify regular expressions used to determine what is part of the article and what is discarded (Thanks to Gábor Gergely)
  • Added improvements from latest updates of Readability

0.4.0 - 2018/04/01

  • Fixed issue #7
  • Added support for the xml:lang attribute in language detection (Thanks to Gábor Gergely)
  • Added new test pages for language detection
  • Added improvements from March updates of Readability

0.3.1 - 2018/03/03

  • Fixed issue #5
  • Added improvements from February updates of Readability
  • Added new test page
  • Fixed comparison bugs in readability scores

0.3.0 - 2018/02/17

  • Cleanup of the code and naming issues (Thanks to jamie-lord)
  • Improved testing
  • Added improvement from January update of Readability
  • Fixed bug for the detection of the readability of article
  • Fixed bug for the fixing of relative URIs
  • Fixed bug in elimination of certain nodes
  • Added detection of featured image and images found in the article

0.2.0 - 2018/01/15

  • Added improvements from December updates of Readability
  • Solved issue #2 (Thanks to Yasindn)
  • Breaking changes to the API method names to improve clarity and solve issue #2. The Parse() method is now private; if you were using it, use the GetArticle/GetArticleAsync method instead. If you were using the ParseArticle method, you can keep using it or switch to the async version, ParseArticleAsync.
  • Merged pull-request #3 for the caching of HttpClient (Thanks to DanRigby)

0.1.3 - 2017/11/27

  • Added improvements from November updates of Readability
  • Added reading of itemprop properties for metadata extraction
  • Integrated tests from Readability

0.1.2 - 2017/10/17

  • Improved the accuracy of the calculation for reading time

0.1.1 - 2017/09/26

  • Release based on September updates of Readability.

0.1.0 - 2017/08/09

  • Initial release, based on a February release of Readability.