Class Reader

Namespace
SmartReader
Assembly
SmartReader.dll

The main Reader class

public class Reader
Inheritance
Object
Reader
Inherited Members
Object.Equals(Object)
Object.Equals(Object, Object)
Object.GetHashCode()
Object.GetType()
Object.MemberwiseClone()
Object.ReferenceEquals(Object, Object)
Object.ToString()

Remarks

This code is based on a port of the readability library of Firefox Reader View, available at https://github.com/mozilla/readability, which is in turn heavily based on Arc90's readability.js (1.7f.1) script, available at http://code.google.com/p/arc90labs-readability

Constructors

Reader(String)

Reads content from the given URI.

public Reader(string uri)

Parameters

uri String

A string representing the URI from which to extract the content.

Reader(String, Stream)

Reads content from the given stream. It needs the uri to make some checks.

public Reader(string uri, Stream source)

Parameters

uri String

A string representing the original URI of the article.

source Stream

A stream from which to extract the article.

Reader(String, String)

Reads content from the given text. It needs the uri to make some checks.

public Reader(string uri, string text)

Parameters

uri String

A string representing the original URI of the article.

text String

A string from which to extract the article.
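
As a rough illustration of the text-based constructor, the sketch below parses HTML that is already in memory. The sample markup and the example.com URI are placeholders, and Article.IsReadable is assumed from the Article class, which is not documented on this page.

```csharp
using System;
using System.Linq;
using SmartReader;

// Placeholder markup; real input would come from a file, a cache, or an HTTP response.
string body = string.Join(" ", Enumerable.Repeat("This is a sentence of sample article text.", 40));
string html = $"<html><head><title>Sample post</title></head><body><article><p>{body}</p></article></body></html>";

// The URI says where the content originally came from; the text itself is what gets parsed.
var reader = new Reader("https://example.com/post", html);
Article article = reader.GetArticle();

// IsReadable is assumed to be a member of Article (not shown on this page).
Console.WriteLine($"Readable: {article.IsReadable}");
```

The stream-based constructor should follow the same pattern, with a Stream passed in place of the string.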

Fields

TagsToScore

Element tags to score by default.

public string[] TagsToScore

Field Value

String[]

Default: "SECTION", "H2", "H3", "H4", "H5", "H6", "P", "TD", "PRE"

Properties

CharThreshold

The default number of characters an article must have in order to return a result

public int CharThreshold { get; set; }

Property Value

Int32

Default: 500

ClassesToPreserve

The classes that must be preserved

public string[] ClassesToPreserve { get; set; }

Property Value

String[]

Default: "page"

ContinueIfNotReadable

The library tries to determine whether it will find an article before actually trying to extract one. This option decides whether to continue if the library's heuristics fail. This value is ignored if Debug is set to true

public bool ContinueIfNotReadable { get; set; }

Property Value

Boolean

Default: true

Debug

Enable the Debug option and write the debug data with the logger

public bool Debug { get; set; }

Property Value

Boolean

Default: false

DisableJSONLD

The library looks first at JSON-LD to determine metadata. This setting gives you the option of disabling it

public bool DisableJSONLD { get; set; }

Property Value

Boolean

Default: false

KeepClasses

Whether to preserve classes

public bool KeepClasses { get; set; }

Property Value

Boolean

Default: false

LoggerDelegate

The action that will log any message

public Action<string> LoggerDelegate { get; set; }

Property Value

Action<String>

Default: empty action

Logging

Set the amount of information written to the logger

public ReportLevel Logging { get; set; }

Property Value

ReportLevel

Default: ReportLevel.Issue
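
The three logging-related properties (Debug, LoggerDelegate, Logging) are typically set together. A minimal sketch, assuming the placeholder markup below and collecting messages in a list rather than printing them:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using SmartReader;

// Placeholder markup used only for illustration.
string body = string.Join(" ", Enumerable.Repeat("This is a sentence of sample article text.", 40));
string html = $"<html><body><article><p>{body}</p></article></body></html>";

// Collect log messages in memory; Console.WriteLine would work as the delegate too.
var messages = new List<string>();

var reader = new Reader("https://example.com/post", html)
{
    Debug = true,                 // turn on debug output
    Logging = ReportLevel.Issue,  // the documented default level, set explicitly here
    LoggerDelegate = messages.Add // any Action<string> is accepted
};

Article article = reader.GetArticle();
Console.WriteLine($"Logged {messages.Count} message(s)");
```

How many messages are produced depends on the input and on the report level, so the count is not predictable in general.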

MaxElemsToParse

Max number of nodes supported by this parser

public int MaxElemsToParse { get; set; }

Property Value

Int32

Default: 0 (no limit)

NTopCandidates

The number of top candidates to consider when analysing how tight the competition is among candidates

public int NTopCandidates { get; set; }

Property Value

Int32

Default: 5
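
Since every property above has a public setter, a Reader can be configured with an object initializer. The sketch below is only an illustration: the chosen values and the "caption" class name are hypothetical, not recommendations.

```csharp
using System;
using System.Linq;
using SmartReader;

// Placeholder markup used only for illustration.
string body = string.Join(" ", Enumerable.Repeat("This is a sentence of sample article text.", 40));
string html = $"<html><body><article><p class=\"caption\">{body}</p></article></body></html>";

var reader = new Reader("https://example.com/post", html)
{
    CharThreshold = 200,                             // lower than the default of 500
    ClassesToPreserve = new[] { "page", "caption" }, // "caption" is a hypothetical class name
    ContinueIfNotReadable = false,                   // stop early if the heuristics find no article
    MaxElemsToParse = 10000,                         // 0, the default, means no limit
    NTopCandidates = 5                               // the documented default, set explicitly
};

Article article = reader.GetArticle();
```

Debug is left at its default here because, as noted above, ContinueIfNotReadable is ignored when Debug is true.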

Methods

AddCustomOperationEnd(Action<IElement>)

Add a custom operation to be performed after the article is parsed

public Reader AddCustomOperationEnd(Action<IElement> operation)

Parameters

operation Action<IElement>

The operation that will receive the final article

Returns

Reader

AddCustomOperationStart(Action<IElement>)

Add a custom operation to be performed before the article is parsed

public Reader AddCustomOperationStart(Action<IElement> operation)

Parameters

operation Action<IElement>

The operation that will receive the HTML content before any operation

Returns

Reader
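
Both Add methods return the Reader, so they can be chained. The following sketch assumes IElement is AngleSharp's AngleSharp.Dom.IElement (this page does not state the namespace) and uses hypothetical operations: one strips aside elements before parsing, the other marks images as lazy-loaded afterwards.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;
using SmartReader;

// Placeholder markup used only for illustration.
string body = string.Join(" ", Enumerable.Repeat("This is a sentence of sample article text.", 40));
string html = "<html><body><aside>Related links</aside><article><p>" + body
    + "</p><img src=\"figure.png\"></article></body></html>";

// Keep the delegates in variables so the same instances can be passed
// to the Remove methods later.
Action<IElement> dropAsides = root =>
{
    // Copy to a list first so elements can be removed safely while iterating.
    foreach (var aside in root.QuerySelectorAll("aside").ToList())
        aside.Remove();
};

Action<IElement> lazyImages = content =>
{
    foreach (var img in content.QuerySelectorAll("img"))
        img.SetAttribute("loading", "lazy");
};

var reader = new Reader("https://example.com/post", html)
    .AddCustomOperationStart(dropAsides)  // runs on the HTML before parsing
    .AddCustomOperationEnd(lazyImages);   // runs on the final article

Article article = reader.GetArticle();
```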

AddOptionToRegularExpression(RegularExpressions, String)

Adds an option to one of the default regular expressions

public void AddOptionToRegularExpression(RegularExpressions expression, string option)

Parameters

expression RegularExpressions

A RegularExpressions value indicating the expression to change

option String

A string representing the new option
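
A call might look like the sketch below. The members of the RegularExpressions enum are not listed on this page, so RegularExpressions.Positive is an assumption, and "main-story" is a hypothetical class or id fragment.

```csharp
using SmartReader;

var reader = new Reader("https://example.com/post");

// Assumption: the enum has a Positive member for patterns that mark likely content.
// "main-story" is a made-up token added as one more option to that expression.
reader.AddOptionToRegularExpression(RegularExpressions.Positive, "main-story");
```

Unlike the custom-operation methods, this one returns void, so it cannot be chained.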

GetArticle()

Read and parse the article from the given URI.

public Article GetArticle()

Returns

Article

An Article object with all the data extracted

GetArticleAsync()

Read and parse the article asynchronously from the given URI.

public Task<Article> GetArticleAsync()

Returns

Task<Article>

A Task<Article> whose result is an Article object with all the data extracted
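
The asynchronous variant is awaited like any other Task-returning method. A minimal sketch, using top-level statements (C# 9 or later) and placeholder markup; Article.IsReadable is assumed from the Article class, which is not documented on this page.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using SmartReader;

// Placeholder markup used only for illustration.
string body = string.Join(" ", Enumerable.Repeat("This is a sentence of sample article text.", 40));
string html = $"<html><body><article><p>{body}</p></article></body></html>";

var reader = new Reader("https://example.com/post", html);

// Await the task instead of blocking on it.
Article article = await reader.GetArticleAsync();

// IsReadable is assumed to be a member of Article (not shown on this page).
Console.WriteLine($"Readable: {article.IsReadable}");
```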

ParseArticle(String, Stream)

Read and parse the article from the given stream. It needs the uri to make some checks.

public static Article ParseArticle(string uri, Stream source)

Parameters

uri String

A string representing the original URI of the article.

source Stream

A stream from which to extract the article.

Returns

Article

An article object with all the data extracted

ParseArticle(String, String)

Read and parse the article from the given URI.

public static Article ParseArticle(string uri, string userAgent = null)

Parameters

uri String

A string representing the original URI to extract the content from.

userAgent String

A string representing a custom user agent.

Returns

Article

An Article object with all the data extracted

ParseArticle(String, String, String)

Read and parse the article from the given text. It needs the uri to make some checks.

public static Article ParseArticle(string uri, string text, string userAgent = null)

Parameters

uri String

A string representing the original URI of the article.

text String

A string from which to extract the article.

userAgent String

A string representing a custom user agent.

Returns

Article

An article object with all the data extracted
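
One detail worth illustrating: judging by the signatures listed on this page, a two-string call such as ParseArticle(uri, html) also matches the ParseArticle(uri, userAgent) overload, and standard C# overload resolution prefers the candidate that needs no default value filled in. Naming the text argument is one way to select the text overload. The markup and URI below are placeholders.

```csharp
using System;
using System.Linq;
using SmartReader;

// Placeholder markup used only for illustration.
string body = string.Join(" ", Enumerable.Repeat("This is a sentence of sample article text.", 40));
string html = $"<html><body><article><p>{body}</p></article></body></html>";

// The named argument selects ParseArticle(uri, text, userAgent).
// Without it, the two strings would bind to ParseArticle(uri, userAgent),
// which reads from the URI rather than from the supplied text.
Article article = Reader.ParseArticle("https://example.com/post", text: html);
```

Passing all three arguments explicitly would resolve the same way.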

ParseArticleAsync(String, String)

Asynchronously read and parse the article from the given URI.

public static Task<Article> ParseArticleAsync(string uri, string userAgent = null)

Parameters

uri String

A string representing the original URI to extract the content from.

userAgent String

A string representing a custom user agent.

Returns

Task<Article>

A Task<Article> whose result is an Article object with all the data extracted

RemoveAllCustomOperations()

Remove all custom operations

public Reader RemoveAllCustomOperations()

Returns

Reader

RemoveAllCustomOperationsEnd()

Remove all custom operations to be performed after the article is parsed

public Reader RemoveAllCustomOperationsEnd()

Returns

Reader

RemoveAllCustomOperationsStart()

Remove all custom operations to be performed before the article is parsed

public Reader RemoveAllCustomOperationsStart()

Returns

Reader

RemoveCustomOperationEnd(Action<IElement>)

Remove a custom operation to be performed after the article is parsed

public Reader RemoveCustomOperationEnd(Action<IElement> operation)

Parameters

operation Action<IElement>

The operation to remove

Returns

Reader

RemoveCustomOperationStart(Action<IElement>)

Remove a custom operation to be performed before the article is parsed

public Reader RemoveCustomOperationStart(Action<IElement> operation)

Parameters

operation Action<IElement>

The operation to remove

Returns

Reader

ReplaceRegularExpression(RegularExpressions, String)

Replaces one of the default regular expressions

public void ReplaceRegularExpression(RegularExpressions expression, string newExpression)

Parameters

expression RegularExpressions

A RegularExpressions value indicating the expression to change

newExpression String

A string representing the new expression

SetBaseHttpClientHandler(HttpMessageHandler)

Sets a custom HttpMessageHandler for the HttpClient used by the library

public static void SetBaseHttpClientHandler(HttpMessageHandler clientHandler)

Parameters

clientHandler HttpMessageHandler

The new HttpMessageHandler for all web requests made by this library
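
Because the parameter type is HttpMessageHandler, any derived handler can be supplied. The sketch below passes a standard HttpClientHandler; the particular settings are illustrative choices, not requirements of the library.

```csharp
using System.Net;
using System.Net.Http;
using SmartReader;

// HttpClientHandler derives from HttpMessageHandler, so it fits the parameter type.
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
    AllowAutoRedirect = true
};

// Static call: it affects all web requests made by the library, not one Reader instance.
Reader.SetBaseHttpClientHandler(handler);
```

Since the setting is shared, it is probably best applied once at startup, before any Reader fetches content.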

SetCustomUserAgent(String)

Sets a custom user agent

public Reader SetCustomUserAgent(string userAgent)

Parameters

userAgent String

A string indicating the User Agent used for web requests made by this library

Returns

Reader
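
Since the method returns the Reader, it can be chained directly onto the constructor. The sketch below fetches over the network, so the URI and the user agent string are placeholders, and the broad catch is an assumption: this page does not document which exceptions a failed request raises.

```csharp
using System;
using System.Threading.Tasks;
using SmartReader;

// "ExampleBot/1.0" is a hypothetical user agent string.
var reader = new Reader("https://example.com/post")
    .SetCustomUserAgent("ExampleBot/1.0");

try
{
    // The URI-only constructor means the content is downloaded when the article is requested.
    Article article = await reader.GetArticleAsync();
    Console.WriteLine(article != null ? "Article retrieved" : "No article returned");
}
catch (Exception ex)
{
    // The exception type for network failures is not documented here, hence the broad catch.
    Console.WriteLine($"Request failed: {ex.Message}");
}
```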