ILIAS  Release_4_0_x_branch Revision 61816
HTMLPurifier_Lexer_PH5P Class Reference

Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.


Public Member Functions

 tokenizeHTML ($html, $config, $context)
 Lexes an HTML string into tokens.
- Public Member Functions inherited from HTMLPurifier_Lexer_DOMLex
 __construct ()
 muteErrorHandler ($errno, $errstr)
 An error handler that mutes all errors.
 callbackUndoCommentSubst ($matches)
 Callback function for undoing escaping of stray angled brackets in comments.
 callbackArmorCommentEntities ($matches)
 Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them.
- Public Member Functions inherited from HTMLPurifier_Lexer
 parseData ($string)
 Parses special entities into the proper characters.
 normalize ($html, $config, $context)
 Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting the document body where applicable, and similar clean-up.
 extractBody ($html)
 Takes a string of HTML (fragment or document) and returns the content.
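parseData() and extractBody() are plain string transformations. The PHP sketch below shows one plausible way to implement them. The function names parse_special_entities() and extract_body() and the exact contents of the entity table are assumptions for illustration, not HTML Purifier's code. It handles only the common "special" entities named in the member list; rarer entities are out of scope here.

```php
<?php
// Hypothetical standalone helpers illustrating what parseData() and
// extractBody() are documented to do. Names and table contents are assumed.

// Common "special" entities mapped to raw characters (same idea as
// $_special_entity2str; the exact entries are an assumption).
const SPECIAL_ENTITY_TO_STR = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
    '&#039;' => "'",
    '&#x27;' => "'",
];

function parse_special_entities(string $data): string
{
    // strtr() with an array tries the longest key first and never rescans
    // its own output, so "&amp;lt;" decodes once to "&lt;", not to "<".
    return strtr($data, SPECIAL_ENTITY_TO_STR);
}

function extract_body(string $html): string
{
    // Return what sits between <body ...> and </body>; without a body
    // element, treat the input as a fragment and return it unchanged.
    if (preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches)) {
        return $matches[1];
    }
    return $html;
}

echo parse_special_entities('a &lt; b &amp;&amp; c'), "\n"; // a < b && c
echo extract_body('<html><body class="x"><p>Hi</p></body></html>'), "\n"; // <p>Hi</p>
```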

Additional Inherited Members

- Static Public Member Functions inherited from HTMLPurifier_Lexer
static create ($config)
 Retrieves or sets the default Lexer as a Prototype Factory.
- Data Fields inherited from HTMLPurifier_Lexer
 $tracksLineNumbers = false
 Whether or not this lexer implements line-number/column-number tracking.
- Protected Member Functions inherited from HTMLPurifier_Lexer_DOMLex
 tokenizeDOM ($node, &$tokens, $collect=false)
 Recursive function that tokenizes a node, putting it into an accumulator.
 transformAttrToAssoc ($node_map)
 Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
 wrapHTML ($html, $config, $context)
 Wraps an HTML fragment in the necessary HTML.
- Static Protected Member Functions inherited from HTMLPurifier_Lexer
static escapeCDATA ($string)
 Translates CDATA sections into regular sections (through escaping).
static escapeCommentedCDATA ($string)
 Special CDATA case that is especially convoluted for <script>.
static CDATACallback ($matches)
 Callback function for escapeCDATA() that does the work.
- Protected Attributes inherited from HTMLPurifier_Lexer
 $_special_entity2str
 Most common entity to raw value conversion table for special entities.
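escapeCDATA() and CDATACallback() work as a pair: a regular expression finds each CDATA section and the callback escapes its contents, so later stages see ordinary text. A minimal sketch of that idea, with hypothetical function names escape_cdata() and cdata_callback(), assuming UTF-8 input and htmlspecialchars() as the escaping step:

```php
<?php
// Hypothetical sketch of the escapeCDATA()/CDATACallback() idea: every
// <![CDATA[ ... ]]> section is replaced by its contents with HTML special
// characters escaped.

function cdata_callback(array $matches): string
{
    // $matches[1] is the raw text inside the CDATA section.
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

function escape_cdata(string $html): string
{
    // Non-greedy (.+?) stops at the first "]]>"; the /s modifier lets a
    // section span several lines.
    return preg_replace_callback('/<!\[CDATA\[(.+?)\]\]>/s', 'cdata_callback', $html);
}

echo escape_cdata('<p><![CDATA[1 < 2 & "ok"]]></p>'), "\n";
// <p>1 &lt; 2 &amp; &quot;ok&quot;</p>
```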

Detailed Description

Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.

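Occupies space in the HTML5 pseudo-namespace (for example, the global HTML5 class instantiated in tokenizeHTML()), which may conflict with other code that defines the same names.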

Note
Recent changes to PHP's DOM extension have caused fatal errors with the original version of PH5P. Pending a fix, this lexer punts to DirectLex if DOM throws an exception.
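The fallback described in the note is a try/catch around the HTML5 parse. The sketch below isolates that pattern. TokenizerSketch, FailingHtml5Tokenizer, SimpleFallbackTokenizer and FallbackTokenizer are hypothetical names, an ArrayObject stands in for the HTMLPurifier context, and the regex splitter is only a placeholder for DirectLex.

```php
<?php
// Hypothetical sketch of the "punt to a fallback lexer" pattern.
// None of these names are HTML Purifier's API.

interface TokenizerSketch
{
    /** @return string[] */
    public function tokenize(string $html, ArrayObject $context): array;
}

final class FailingHtml5Tokenizer implements TokenizerSketch
{
    public function tokenize(string $html, ArrayObject $context): array
    {
        // Simulates the DOM extension rejecting the document.
        throw new DOMException('simulated DOM failure');
    }
}

final class SimpleFallbackTokenizer implements TokenizerSketch
{
    public function tokenize(string $html, ArrayObject $context): array
    {
        // Crude split into tags and text runs; a placeholder for DirectLex.
        $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        return $parts === false ? [] : $parts;
    }
}

final class FallbackTokenizer implements TokenizerSketch
{
    private $primary;
    private $fallback;

    public function __construct(TokenizerSketch $primary, TokenizerSketch $fallback)
    {
        $this->primary = $primary;
        $this->fallback = $fallback;
    }

    public function tokenize(string $html, ArrayObject $context): array
    {
        try {
            return $this->primary->tokenize($html, $context);
        } catch (DOMException $e) {
            // Record the failure so callers can detect it, then retry the
            // fallback on the original, untouched input.
            $context['PH5PError'] = $e;
            return $this->fallback->tokenize($html, $context);
        }
    }
}

$context = new ArrayObject();
$lexer = new FallbackTokenizer(new FailingHtml5Tokenizer(), new SimpleFallbackTokenizer());
print_r($lexer->tokenize('<b>hi</b>', $context));
echo isset($context['PH5PError']) ? "fell back\n" : "primary ok\n"; // fell back
```

Handing the fallback the original input, not the normalized and wrapped string, mirrors the "use original HTML" comment in tokenizeHTML().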

Definition at line 13 of file PH5P.php.

Member Function Documentation

HTMLPurifier_Lexer_PH5P::tokenizeHTML ($html, $config, $context)

Lexes an HTML string into tokens.

Parameters
$html    String of HTML to lex.
$config    HTMLPurifier_Config object.
$context    HTMLPurifier_Context object.
Returns
Array of HTMLPurifier_Token objects representing the HTML.

Reimplemented from HTMLPurifier_Lexer_DOMLex.

Definition at line 15 of file PH5P.php.

References $config, HTMLPurifier_Lexer\normalize(), HTMLPurifier_Lexer_DOMLex\tokenizeDOM(), and HTMLPurifier_Lexer_DOMLex\wrapHTML().

{
    $new_html = $this->normalize($html, $config, $context);
    $new_html = $this->wrapHTML($new_html, $config, $context);
    try {
        $parser = new HTML5($new_html);
        $doc = $parser->save();
    } catch (DOMException $e) {
        // Uh oh, it failed. Punt to DirectLex.
        $lexer = new HTMLPurifier_Lexer_DirectLex();
        $context->register('PH5PError', $e); // save the error, so we can detect it
        return $lexer->tokenizeHTML($html, $config, $context); // use original HTML
    }
    $tokens = array();
    $this->tokenizeDOM(
        $doc->getElementsByTagName('html')->item(0)-> // <html>
        getElementsByTagName('body')->item(0)->       // <body>
        getElementsByTagName('div')->item(0)          // <div> added by wrapHTML()
        , $tokens);
    return $tokens;
}
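The body above follows a fixed sequence: normalize, wrap the fragment, parse, then descend from <html> to <body> to the wrapper <div> and tokenize that div's subtree. The sketch below reproduces the flow without HTML Purifier. Assumptions: PHP's DOMDocument::loadHTML() replaces the PH5P parser, normalization is skipped, the wrapper markup is simplified, and tokens are plain arrays rather than HTMLPurifier_Token objects. All function names are hypothetical.

```php
<?php
// Hypothetical sketch of the tokenizeHTML() flow. DOMDocument::loadHTML()
// stands in for PH5P; array tokens stand in for HTMLPurifier_Token objects.

function wrap_fragment(string $html): string
{
    // Simplified wrapper (assumed); the meta tag tells libxml to read UTF-8.
    return '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>';
}

function attrs_to_assoc(DOMNamedNodeMap $map): array
{
    // Same idea as transformAttrToAssoc(): DOMAttr list -> name => value.
    $assoc = [];
    foreach ($map as $attr) {
        $assoc[$attr->name] = $attr->value;
    }
    return $assoc;
}

function tokenize_node(DOMNode $node, array &$tokens, bool $collect = false): void
{
    if ($node instanceof DOMText) {            // also covers CDATA sections
        $tokens[] = ['text', $node->data];
    } elseif ($node instanceof DOMComment) {
        $tokens[] = ['comment', $node->data];
    } elseif ($node instanceof DOMElement) {
        if (!$collect) {
            // The wrapper <div> itself is not emitted, only its children.
            foreach ($node->childNodes as $child) {
                tokenize_node($child, $tokens, true);
            }
            return;
        }
        $attrs = attrs_to_assoc($node->attributes);
        if (!$node->hasChildNodes()) {
            $tokens[] = ['empty', $node->tagName, $attrs];
            return;
        }
        $tokens[] = ['start', $node->tagName, $attrs];
        foreach ($node->childNodes as $child) {
            tokenize_node($child, $tokens, true);
        }
        $tokens[] = ['end', $node->tagName];
    }
    // Other node types are ignored in this sketch.
}

function tokenize_html(string $html): array
{
    $doc = new DOMDocument();
    // Mute libxml warnings during the parse, much like muteErrorHandler().
    set_error_handler(function () { return true; });
    try {
        $doc->loadHTML(wrap_fragment($html));
    } finally {
        restore_error_handler();
    }
    $div = $doc->getElementsByTagName('html')->item(0)
        ->getElementsByTagName('body')->item(0)
        ->getElementsByTagName('div')->item(0);
    $tokens = [];
    tokenize_node($div, $tokens);
    return $tokens;
}

print_r(tokenize_html('<p class="x">Hi <br>there</p>'));
```

Skipping the wrapper <div> through the $collect flag is what lets the fragment come back without the scaffolding added by the wrapping step.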


The documentation for this class was generated from the following file: PH5P.php