Class Reader
- Namespace
- SmartReader
- Assembly
- SmartReader.dll
The main Reader class
public class Reader
- Inheritance
- Object
- Reader
- Inherited Members
- Object.Equals(Object)
- Object.Equals(Object, Object)
- Object.GetHashCode()
- Object.GetType()
- Object.MemberwiseClone()
- Object.ReferenceEquals(Object, Object)
- Object.ToString()
Remarks
This code is based on a port of the readability library of Firefox Reader View, available at https://github.com/mozilla/readability, which is in turn heavily based on Arc90's readability.js (1.7f.1) script, available at http://code.google.com/p/arc90labs-readability
Constructors
Reader(String)
Reads content from the given URI.
public Reader(string uri)
Parameters
uri
- String
A string representing the URI from which to extract the content.
Reader(String, Stream)
Reads content from the given stream. It needs the uri to make some checks.
public Reader(string uri, Stream source)
Parameters
uri
- String
A string representing the original URI of the article.
source
- Stream
A stream from which to extract the article.
Reader(String, String)
Reads content from the given text. It needs the uri to make some checks.
public Reader(string uri, string text)
Parameters
uri
- String
A string representing the original URI of the article.
text
- String
A string from which to extract the article.
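As a minimal sketch of the text-based constructor, the snippet below builds a Reader from an in-memory HTML string and extracts the article with GetArticle(). The URL and HTML are hypothetical placeholders, and `IsReadable` and `Title` are assumed members of Article, which is not documented on this page.

```csharp
using System;
using SmartReader;

// Hypothetical inline HTML; a real page would normally be much longer.
string html = "<html><head><title>Sample</title></head>" +
              "<body><article><p>Some text.</p></article></body></html>";

// The HTML is supplied directly; per the constructor description,
// the URI is only needed to make some checks.
var reader = new Reader("https://example.com/sample", html);
Article article = reader.GetArticle();

// IsReadable and Title are assumed Article members (not listed on this page).
Console.WriteLine(article.IsReadable ? article.Title : "No article found");
```

A page this short may well fall under CharThreshold, so checking the result before using it is a sensible habit.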
Fields
TagsToScore
Element tags to score by default.
public string[] TagsToScore
Field Value
- String[]
Default: "SECTION", "H2", "H3", "H4", "H5", "H6", "P", "TD", "PRE"
Properties
CharThreshold
The default number of characters an article must have in order to return a result
public int CharThreshold { get; set; }
Property Value
- Int32
Default: 500
ClassesToPreserve
The classes that must be preserved
public string[] ClassesToPreserve { get; set; }
Property Value
- String[]
Default: "page"
ContinueIfNotReadable
The library tries to determine whether it will find an article before actually trying to extract it. This option decides whether to continue if the library heuristics fail. This value is ignored if Debug is set to true.
public bool ContinueIfNotReadable { get; set; }
Property Value
- Boolean
Default: true
Debug
Enables the Debug option and writes the debug data with the logger
public bool Debug { get; set; }
Property Value
- Boolean
Default: false
DisableJSONLD
The library looks at JSON-LD first to determine metadata. This setting gives you the option of disabling that
public bool DisableJSONLD { get; set; }
Property Value
- Boolean
Default: false
KeepClasses
Whether to preserve classes
public bool KeepClasses { get; set; }
Property Value
- Boolean
Default: false
LoggerDelegate
The action that will log any message
public Action<string> LoggerDelegate { get; set; }
Property Value
- Action<String>
Default: empty action
Logging
Set the amount of information written to the logger
public ReportLevel Logging { get; set; }
Property Value
- ReportLevel
Default: ReportLevel.Issue
MaxElemsToParse
Max number of nodes supported by this parser
public int MaxElemsToParse { get; set; }
Property Value
- Int32
Default: 0 (no limit)
NTopCandidates
The number of top candidates to consider when analysing how tight the competition is among candidates
public int NTopCandidates { get; set; }
Property Value
- Int32
Default: 5
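The properties above can be set with an object initializer before extraction. The following is a sketch only: the values are illustrative rather than recommended, and the URL, HTML and message prefix are hypothetical.

```csharp
using System;
using SmartReader;

var reader = new Reader("https://example.com/sample",
                        "<html><body><p>Short text.</p></body></html>")
{
    CharThreshold = 200,                    // accept shorter articles than the default 500
    ContinueIfNotReadable = false,          // stop early if the heuristics fail
    KeepClasses = true,                     // preserve class attributes
    ClassesToPreserve = new[] { "page", "caption" },
    NTopCandidates = 5,
    MaxElemsToParse = 0,                    // 0 means no limit
    Logging = ReportLevel.Issue,
    // Route log messages to the console with a hypothetical prefix.
    LoggerDelegate = message => Console.WriteLine($"[SmartReader] {message}")
};

Article article = reader.GetArticle();
```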
Methods
AddCustomOperationEnd(Action<IElement>)
Add a custom operation to be performed after the article is parsed
public Reader AddCustomOperationEnd(Action<IElement> operation)
Parameters
operation
- Action<IElement>
The operation that will receive the final article
Returns
- Reader
AddCustomOperationStart(Action<IElement>)
Add a custom operation to be performed before the article is parsed
public Reader AddCustomOperationStart(Action<IElement> operation)
Parameters
operation
- Action<IElement>
The operation that will receive the HTML content before any operation
Returns
- Reader
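A sketch of how the two custom-operation hooks might be used together. It assumes that IElement is the AngleSharp.Dom interface and that the start hook receives the root element of the page. The delegates are kept in variables so the same instances can later be passed to the corresponding Remove methods.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;
using SmartReader;

// Runs before parsing: strip every <img> from the incoming HTML.
Action<IElement> removeImages = element =>
{
    // ToList() takes a snapshot so removal does not disturb the iteration.
    foreach (var img in element.QuerySelectorAll("img").ToList())
        img.Remove();
};

// Runs after parsing: mark every link in the final article as nofollow.
Action<IElement> markLinks = element =>
{
    foreach (var link in element.QuerySelectorAll("a"))
        link.SetAttribute("rel", "nofollow");
};

// Hypothetical URL and HTML.
var reader = new Reader("https://example.com/sample",
                        "<html><body><article><p>Text <a href=\"/x\">link</a></p></article></body></html>");

// Both Add methods return the Reader, so the calls can be chained.
Article article = reader
    .AddCustomOperationStart(removeImages)
    .AddCustomOperationEnd(markLinks)
    .GetArticle();
```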
AddOptionToRegularExpression(RegularExpressions, String)
Adds an option to one of the default regular expressions
public void AddOptionToRegularExpression(RegularExpressions expression, string option)
Parameters
expression
- RegularExpressions
A RegularExpressions value indicating the expression to change
option
- String
A string representing the new option
GetArticle()
Read and parse the article from the given URI.
public Article GetArticle()
Returns
- Article
An Article object with all the data extracted
GetArticleAsync()
Read and parse the article asynchronously from the given URI.
public Task<Article> GetArticleAsync()
Returns
- Task<Article>
A task that resolves to an Article object with all the data extracted
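For the URI-based constructor, the asynchronous variant avoids blocking the calling thread during the download. This is a minimal sketch with a hypothetical URL; `IsReadable`, `Title` and `TextContent` are assumed Article members that are not documented on this page.

```csharp
using System;
using System.Threading.Tasks;
using SmartReader;

// Hypothetical URL; the page is downloaded when the article is requested.
var reader = new Reader("https://example.com/some-article");

Article article = await reader.GetArticleAsync();

// IsReadable, Title and TextContent are assumed Article members.
if (article.IsReadable)
{
    Console.WriteLine(article.Title);
    Console.WriteLine(article.TextContent);
}
```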
ParseArticle(String, Stream)
Read and parse the article from the given stream. It needs the uri to make some checks.
public static Article ParseArticle(string uri, Stream source)
Parameters
uri
- String
A string representing the original URI of the article.
source
- Stream
A stream from which to extract the article.
Returns
- Article
An article object with all the data extracted
ParseArticle(String, String)
Read and parse the article from the given URI.
public static Article ParseArticle(string uri, string userAgent = null)
Parameters
uri
- String
A string representing the original URI to extract the content from.
userAgent
- String
A string representing a custom user agent.
Returns
- Article
An Article object with all the data extracted
ParseArticle(String, String, String)
Read and parse the article from the given text. It needs the uri to make some checks.
public static Article ParseArticle(string uri, string text, string userAgent = null)
Parameters
uri
- String
A string representing the original URI of the article.
text
- String
A string from which to extract the article.
userAgent
- String
A string representing a custom user agent.
Returns
- Article
An article object with all the data extracted
ParseArticleAsync(String, String)
Read and parse the article asynchronously from the given URI.
public static Task<Article> ParseArticleAsync(string uri, string userAgent = null)
Parameters
uri
- String
A string representing the original URI to extract the content from.
userAgent
- String
A string representing a custom user agent.
Returns
- Task<Article>
A task that resolves to an Article object with all the data extracted
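One detail of the static overloads deserves a sketch. As the signatures are listed here, a call with two positional strings matches both ParseArticle(uri, userAgent) and ParseArticle(uri, text, userAgent). Under C#'s overload resolution rules, the overload that needs no default argument substituted is preferred, so such a call would appear to bind to the (uri, userAgent) form and fetch the page from the web. Naming the `text` argument selects the text-based overload explicitly. The URL and HTML are hypothetical, and `IsReadable` is an assumed Article member.

```csharp
using System;
using SmartReader;

string uri = "https://example.com/sample";
string html = "<html><body><article><p>Some text.</p></article></body></html>";

// Naming the argument rules out the (uri, userAgent) overload,
// which has no parameter called 'text'.
Article article = Reader.ParseArticle(uri, text: html);

// IsReadable is an assumed Article member (not listed on this page).
Console.WriteLine(article.IsReadable);
```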
RemoveAllCustomOperations()
Remove all custom operations
public Reader RemoveAllCustomOperations()
Returns
- Reader
RemoveAllCustomOperationsEnd()
Remove all custom operations to be performed after the article is parsed
public Reader RemoveAllCustomOperationsEnd()
Returns
- Reader
RemoveAllCustomOperationsStart()
Remove all custom operations to be performed before the article is parsed
public Reader RemoveAllCustomOperationsStart()
Returns
- Reader
RemoveCustomOperationEnd(Action<IElement>)
Remove a custom operation to be performed after the article is parsed
public Reader RemoveCustomOperationEnd(Action<IElement> operation)
Parameters
operation
- Action<IElement>
The operation to remove
Returns
- Reader
RemoveCustomOperationStart(Action<IElement>)
Remove a custom operation to be performed before the article is parsed
public Reader RemoveCustomOperationStart(Action<IElement> operation)
Parameters
operation
- Action<IElement>
The operation to remove
Returns
- Reader
ReplaceRegularExpression(RegularExpressions, String)
Replaces one of the default regular expressions
public void ReplaceRegularExpression(RegularExpressions expression, string newExpression)
Parameters
expression
- RegularExpressions
A RegularExpressions value indicating the expression to change
newExpression
- String
A string representing the new expression
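The two regular-expression methods might be used as sketched below. The members of the RegularExpressions enum are not listed on this page, so `RegularExpressions.Positive` and `RegularExpressions.Negative` are assumptions, as are the class names passed in. The URL and HTML are hypothetical.

```csharp
using SmartReader;

var reader = new Reader("https://example.com/sample",
                        "<html><body><article><p>Some text.</p></article></body></html>");

// Extend a default expression with one more alternative.
// RegularExpressions.Positive is an assumed enum member; "noticia" is a hypothetical class name.
reader.AddOptionToRegularExpression(RegularExpressions.Positive, "noticia");

// Replace a default expression entirely. The defaults for that expression are lost,
// so the new pattern has to list everything that should still match.
// RegularExpressions.Negative is an assumed enum member.
reader.ReplaceRegularExpression(RegularExpressions.Negative, "sidebar|footer|comment");

Article article = reader.GetArticle();
```

Adding an option is the gentler choice when a site uses one unusual class name. Replacement suits cases where the default pattern itself misfires.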
SetBaseHttpClientHandler(HttpMessageHandler)
Sets a custom message handler for the HttpClient used by the library
public static void SetBaseHttpClientHandler(HttpMessageHandler clientHandler)
Parameters
clientHandler
- HttpMessageHandler
The new message handler for all web requests made by this library
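A sketch of installing a custom handler, here a standard HttpClientHandler with automatic decompression enabled. Since the method is static, the handler presumably applies to every Reader in the process, so it is best set once at startup. The handler settings are illustrative only.

```csharp
using System.Net;
using System.Net.Http;
using SmartReader;

// HttpClientHandler derives from HttpMessageHandler, so it can be passed directly.
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
    AllowAutoRedirect = true
};

// Static call: configure once, before any Reader downloads a page.
Reader.SetBaseHttpClientHandler(handler);
```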
SetCustomUserAgent(String)
Sets a custom user agent
public Reader SetCustomUserAgent(string userAgent)
Parameters
userAgent
- String
A string indicating the User Agent used for web requests made by this library
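Because SetCustomUserAgent returns the Reader, it can be chained before the article is requested. This is a minimal sketch in which the URL and the user agent string are hypothetical.

```csharp
using System.Threading.Tasks;
using SmartReader;

// Hypothetical URL and user agent string.
Article article = await new Reader("https://example.com/some-article")
    .SetCustomUserAgent("MyCrawler/1.0 (+https://example.com/bot)")
    .GetArticleAsync();
```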