ILIAS  release_5-4 Revision v5.4.26-12-gabc799a52e6
HTMLPurifier_Lexer_PH5P Class Reference

Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.


Public Member Functions

 tokenizeHTML ($html, $config, $context)
 
- Public Member Functions inherited from HTMLPurifier_Lexer_DOMLex
 __construct ()
 
 tokenizeHTML ($html, $config, $context)
 
 muteErrorHandler ($errno, $errstr)
 An error handler that mutes all errors.
 
 callbackUndoCommentSubst ($matches)
 Callback function for undoing escaping of stray angled brackets in comments.
 
 callbackArmorCommentEntities ($matches)
 Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them.
 
- Public Member Functions inherited from HTMLPurifier_Lexer
 __construct ()
 
 parseText ($string, $config)
 
 parseAttr ($string, $config)
 
 parseData ($string, $is_attr, $config)
 Parses special entities into the proper characters.
 
 tokenizeHTML ($string, $config, $context)
 Lexes an HTML string into tokens.
 
 normalize ($html, $config, $context)
 Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
 
 extractBody ($html)
 Takes a string of HTML (fragment or document) and returns the content.
 

Additional Inherited Members

- Static Public Member Functions inherited from HTMLPurifier_Lexer
static create ($config)
 Retrieves or sets the default Lexer as a Prototype Factory.
 
- Data Fields inherited from HTMLPurifier_Lexer
 $tracksLineNumbers = false
 Whether or not this lexer implements line-number/column-number tracking.
 
- Protected Member Functions inherited from HTMLPurifier_Lexer_DOMLex
 tokenizeDOM ($node, &$tokens, $config)
 Iterative function that tokenizes a node, putting it into an accumulator.
 
 getTagName ($node)
 Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6.
 
 getData ($node)
 Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6.
 
 createStartNode ($node, &$tokens, $collect, $config)
 
 createEndNode ($node, &$tokens)
 
 transformAttrToAssoc ($node_map)
 Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
 
 wrapHTML ($html, $config, $context, $use_div=true)
 Wraps an HTML fragment in the necessary HTML.
 
- Static Protected Member Functions inherited from HTMLPurifier_Lexer
static escapeCDATA ($string)
 Translates CDATA sections into regular sections (through escaping).
 
static escapeCommentedCDATA ($string)
 Special CDATA case that is especially convoluted for <script>.
 
static removeIEConditional ($string)
 Special Internet Explorer conditional comments should be removed.
 
static CDATACallback ($matches)
 Callback function for escapeCDATA() that does the work.
 
- Protected Attributes inherited from HTMLPurifier_Lexer
 $_special_entity2str
 Most common entity to raw value conversion table for special entities.
 

Detailed Description

Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.

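Occupies space in the HTML5 pseudo-namespace: the bundled parser declares global classes such as HTML5 (the class instantiated in tokenizeHTML()), which may conflict with other code that declares the same class names.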
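
A minimal sketch of how a caller might check for such a conflict before this lexer's file is loaded. The helper name ph5pConflictingClasses() is hypothetical and not part of HTML Purifier or ILIAS; the only class name taken from this page is HTML5, which tokenizeHTML() instantiates.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): reports which of the given
// global class names are already declared. Call it BEFORE PH5P.php is loaded,
// otherwise the lexer's own HTML5 class would be reported as a conflict.
function ph5pConflictingClasses(array $names = ['HTML5'])
{
    $taken = array_filter($names, function ($name) {
        // Second argument false: do not trigger autoloading while checking.
        return class_exists($name, false);
    });
    return array_values($taken);
}
```

An empty result suggests the class names are still free; a non-empty result lists the names that another library has already claimed.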

Note
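Recent changes to PHP's DOM extension have resulted in some fatal error conditions with the original version of PH5P. Until those are resolved, this lexer falls back to HTMLPurifier_Lexer_DirectLex whenever the DOM extension throws a DOMException, and registers the exception in the context under the key 'PH5PError' so that callers can detect the fallback.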
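
The fallback described in this note can be illustrated with a self-contained sketch. Every class and function below is a hypothetical stand-in, not the HTML Purifier API: SketchContext only mirrors the register/exists/get method names of a context object, SketchParseException stands in for DOMException so the sketch runs without the DOM extension, and the "lexers" do no real HTML parsing.

```php
<?php
// Stand-in for DOMException (assumption: used so the sketch needs no ext/dom).
class SketchParseException extends Exception
{
}

// Minimal stand-in for a context object: a keyed registry.
class SketchContext
{
    private $store = [];

    public function register($name, $value)
    {
        $this->store[$name] = $value;
    }

    public function exists($name)
    {
        return array_key_exists($name, $this->store);
    }

    public function get($name)
    {
        return $this->store[$name];
    }
}

// Stand-in for the DOM-based parser: always fails, to exercise the fallback.
class FailingPrimaryLexer
{
    public function tokenizeHTML($html)
    {
        throw new SketchParseException('simulated DOM failure');
    }
}

// Stand-in for DirectLex: splits on whitespace instead of real lexing.
class SimpleFallbackLexer
{
    public function tokenizeHTML($html)
    {
        return preg_split('/\s+/', trim($html), -1, PREG_SPLIT_NO_EMPTY);
    }
}

// Try the primary lexer; on failure, record the error and use the fallback
// on the ORIGINAL input, as the note describes.
function tokenizeWithFallback($html, SketchContext $context, $primary, $fallback)
{
    try {
        return $primary->tokenizeHTML($html);
    } catch (SketchParseException $e) {
        $context->register('PH5PError', $e); // lets the caller detect the fallback
        return $fallback->tokenizeHTML($html);
    }
}

$context = new SketchContext();
$tokens = tokenizeWithFallback(
    '<b>bold</b> text',
    $context,
    new FailingPrimaryLexer(),
    new SimpleFallbackLexer()
);
```

After the call, a caller can test $context->exists('PH5PError') to learn whether the primary parser failed and inspect the stored exception for diagnostics.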

Definition at line 13 of file PH5P.php.

Member Function Documentation

◆ tokenizeHTML()

HTMLPurifier_Lexer_PH5P::tokenizeHTML ( $html, $config, $context )
Parameters
string $html
HTMLPurifier_Config $config
HTMLPurifier_Context $context
Returns
HTMLPurifier_Token[]

Definition at line 21 of file PH5P.php.

References $config, $context, $html, $parser, HTMLPurifier_Lexer\normalize(), HTMLPurifier_Lexer_DOMLex\tokenizeDOM(), and HTMLPurifier_Lexer_DOMLex\wrapHTML().

    public function tokenizeHTML($html, $config, $context)
    {
        $new_html = $this->normalize($html, $config, $context);
        $new_html = $this->wrapHTML($new_html, $config, $context, false /* no div */);
        try {
            $parser = new HTML5($new_html);
            $doc = $parser->save();
        } catch (DOMException $e) {
            // Uh oh, it failed. Punt to DirectLex.
            $lexer = new HTMLPurifier_Lexer_DirectLex();
            $context->register('PH5PError', $e); // save the error, so we can detect it
            return $lexer->tokenizeHTML($html, $config, $context); // use original HTML
        }
        $tokens = array();
        $this->tokenizeDOM(
            $doc->getElementsByTagName('html')->item(0)-> // <html>
                getElementsByTagName('body')->item(0) // <body>
            ,
            $tokens, $config
        );
        return $tokens;
    }

The documentation for this class was generated from the following file: