ILIAS  release_5-3 Revision v5.3.23-19-g915713cf615
HTMLPurifier_EntityParser Class Reference

Handles referencing and dereferencing character entities.

Public Member Functions

 __construct ()
 
 substituteTextEntities ($string)
 Substitute entities with the parsed equivalents.
 
 substituteAttrEntities ($string)
 Substitute entities with the parsed equivalents.
 
 substituteNonSpecialEntities ($string)
 Substitutes non-special entities with their parsed equivalents.
 
 substituteSpecialEntities ($string)
 Substitutes only special entities with their parsed equivalents.
 

Protected Member Functions

 entityCallback ($matches)
 Callback function for substituteTextEntities() and substituteAttrEntities() that does the work.
 
 nonSpecialEntityCallback ($matches)
 Callback function for substituteNonSpecialEntities() that does the work.
 
 specialEntityCallback ($matches)
 Callback function for substituteSpecialEntities() that does the work.
 

Protected Attributes

 $_entity_lookup
 Reference to entity lookup table.
 
 $_textEntitiesRegex
 Callback regex string for entities in text.
 
 $_attrEntitiesRegex
 Callback regex string for entities in attributes.
 
 $_semiOptionalPrefixRegex
 Regex that tests whether a string begins with an entity whose trailing semicolon is optional.
 
 $_substituteEntitiesRegex
 Callback regex string for parsing entities.
 
 $_special_dec2str
 Decimal to parsed string conversion table for special entities.
 
 $_special_ent2dec
 Stripped entity names to decimal conversion table for special entities.
 

Detailed Description

Handles referencing and dereferencing character entities.

Definition at line 10 of file EntityParser.php.

Constructor & Destructor Documentation

◆ __construct()

HTMLPurifier_EntityParser::__construct ( )

Definition at line 36 of file EntityParser.php.

{
    // From
    // http://stackoverflow.com/questions/15532252/why-is-reg-being-rendered-as-without-the-bounding-semicolon
    $semi_optional = "quot|QUOT|lt|LT|gt|GT|amp|AMP|AElig|Aacute|Acirc|Agrave|Aring|Atilde|Auml|COPY|Ccedil|ETH|Eacute|Ecirc|Egrave|Euml|Iacute|Icirc|Igrave|Iuml|Ntilde|Oacute|Ocirc|Ograve|Oslash|Otilde|Ouml|REG|THORN|Uacute|Ucirc|Ugrave|Uuml|Yacute|aacute|acirc|acute|aelig|agrave|aring|atilde|auml|brvbar|ccedil|cedil|cent|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml|frac12|frac14|frac34|iacute|icirc|iexcl|igrave|iquest|iuml|laquo|macr|micro|middot|nbsp|not|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde|ouml|para|plusmn|pound|raquo|reg|sect|shy|sup1|sup2|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml|uuml|yacute|yen|yuml";

    // NB: three empty captures to put the fourth match in the right
    // place
    $this->_semiOptionalPrefixRegex = "/&()()()($semi_optional)/";

    $this->_textEntitiesRegex =
        '/&(?:'.
        // hex
        '[#]x([a-fA-F0-9]+);?|'.
        // dec
        '[#]0*(\d+);?|'.
        // string (mandatory semicolon)
        // NB: order matters: match semicolon preferentially
        '([A-Za-z_:][A-Za-z0-9.\-_:]*);|'.
        // string (optional semicolon)
        "($semi_optional)".
        ')/';

    $this->_attrEntitiesRegex =
        '/&(?:'.
        // hex
        '[#]x([a-fA-F0-9]+);?|'.
        // dec
        '[#]0*(\d+);?|'.
        // string (mandatory semicolon)
        // NB: order matters: match semicolon preferentially
        '([A-Za-z_:][A-Za-z0-9.\-_:]*);|'.
        // string (optional semicolon)
        // don't match if trailing is equals or alphanumeric (URL
        // like)
        "($semi_optional)(?![=;A-Za-z0-9])".
        ')/';

}
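
The only difference between the two patterns built in the constructor is the negative lookahead on the semicolon-optional branch of the attribute regex. The following standalone sketch illustrates the effect on a URL-like attribute value. It does not load HTML Purifier; the shortened `$semi_optional` list and the variable names are assumptions made for brevity.

```php
<?php
// Shortened stand-in for the full semi-optional list (illustration only).
$semi_optional = 'amp|copy|lt|gt|reg';

// Common prefix of both patterns: hex, decimal, and named (mandatory semicolon).
$head = '/&(?:[#]x([a-fA-F0-9]+);?|[#]0*(\d+);?|([A-Za-z_:][A-Za-z0-9.\-_:]*);|';

// Same shape as _textEntitiesRegex: bare semi-optional names always match.
$text_regex = $head . "($semi_optional))/";

// Same shape as _attrEntitiesRegex: a bare name is rejected when it is
// followed by '=', ';' or an alphanumeric character.
$attr_regex = $head . "($semi_optional)(?![=;A-Za-z0-9]))/";

$url = 'page.php?a=1&copy=2';

$text_hit = preg_match($text_regex, $url, $m_text);
$attr_hit = preg_match($attr_regex, $url, $m_attr);

var_dump($text_hit, $attr_hit);
```

With the text pattern, `&copy` inside the query string is treated as an entity (the name lands in capture group 4); with the attribute pattern the lookahead sees `=` and leaves the URL parameter alone.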

Member Function Documentation

◆ entityCallback()

HTMLPurifier_EntityParser::entityCallback (   $matches)
protected

Callback function for substituteTextEntities() and substituteAttrEntities() that does the work.

Parameters
array $matches PCRE matches array, with 0 the entire match, and index 1, 2 or 3 set with a hex value, dec value, or string (respectively); index 4 holds the entity name when it was matched without a trailing semicolon.
Returns
string Replacement string.

Definition at line 116 of file EntityParser.php.

References HTMLPurifier_EntityLookup\instance() and HTMLPurifier_Encoder\unichr().

{
    $entity = $matches[0];
    $hex_part = @$matches[1];
    $dec_part = @$matches[2];
    $named_part = empty($matches[3]) ? @$matches[4] : $matches[3];
    if ($hex_part !== NULL && $hex_part !== "") {
        return HTMLPurifier_Encoder::unichr(hexdec($hex_part));
    } elseif ($dec_part !== NULL && $dec_part !== "") {
        return HTMLPurifier_Encoder::unichr((int) $dec_part);
    } else {
        if (!$this->_entity_lookup) {
            $this->_entity_lookup = HTMLPurifier_EntityLookup::instance();
        }
        if (isset($this->_entity_lookup->table[$named_part])) {
            return $this->_entity_lookup->table[$named_part];
        } else {
            // exact match didn't match anything, so test if
            // any of the semicolon optional match the prefix.
            // Test that this is an EXACT match is important to
            // prevent infinite loop
            if (!empty($matches[3])) {
                return preg_replace_callback(
                    $this->_semiOptionalPrefixRegex,
                    array($this, 'entityCallback'),
                    $entity
                );
            }
            return $entity;
        }
    }
}
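
The dispatch order in this callback (hex, then decimal, then named lookup, then pass-through) can be tried out without the library. The sketch below is a simplified approximation, not the library's implementation: `sketch_unichr()` is a hypothetical stand-in for HTMLPurifier_Encoder::unichr(), the two-entry `$table` replaces the real lookup table, and the semicolon-optional branch with its prefix fallback is left out.

```php
<?php
// Hypothetical stand-in for HTMLPurifier_Encoder::unichr(): encodes a code
// point as UTF-8 without any range or validity checks.
function sketch_unichr(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Illustrative two-entry table; the real lookup table is far larger.
$table = array('copy' => "\u{A9}", 'eacute' => "\u{E9}");

// Hex, decimal, and named (mandatory semicolon) branches only.
$regex = '/&(?:[#]x([a-fA-F0-9]+);?|[#]0*(\d+);?|([A-Za-z_:][A-Za-z0-9.\-_:]*);)/';

$decode = function (array $m) use ($table): string {
    if (isset($m[1]) && $m[1] !== '') {
        return sketch_unichr((int) hexdec($m[1]));   // hex reference
    }
    if (isset($m[2]) && $m[2] !== '') {
        return sketch_unichr((int) $m[2]);           // decimal reference
    }
    $name = $m[3] ?? '';
    return $table[$name] ?? $m[0];                   // unknown names pass through
};

$out = preg_replace_callback($regex, $decode, 'caf&#xE9; &#169; &copy; &bogus;');
echo $out, "\n";
```

Unknown names such as `&bogus;` are returned unchanged, mirroring the final `return $entity;` in the listing above.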

◆ nonSpecialEntityCallback()

HTMLPurifier_EntityParser::nonSpecialEntityCallback (   $matches)
protected

Callback function for substituteNonSpecialEntities() that does the work.

Parameters
array $matches PCRE matches array, with 0 the entire match, and either index 1, 2 or 3 set with a hex value, dec value, or string (respectively).
Returns
string Replacement string.

Definition at line 211 of file EntityParser.php.

References HTMLPurifier_EntityLookup\instance() and HTMLPurifier_Encoder\unichr().

{
    // replaces all but big five
    $entity = $matches[0];
    $is_num = (@$matches[0][1] === '#');
    if ($is_num) {
        $is_hex = (@$entity[2] === 'x');
        $code = $is_hex ? hexdec($matches[1]) : (int) $matches[2];
        // abort for special characters
        if (isset($this->_special_dec2str[$code])) {
            return $entity;
        }
        return HTMLPurifier_Encoder::unichr($code);
    } else {
        if (isset($this->_special_ent2dec[$matches[3]])) {
            return $entity;
        }
        if (!$this->_entity_lookup) {
            $this->_entity_lookup = HTMLPurifier_EntityLookup::instance();
        }
        if (isset($this->_entity_lookup->table[$matches[3]])) {
            return $this->_entity_lookup->table[$matches[3]];
        } else {
            return $entity;
        }
    }
}

◆ specialEntityCallback()

HTMLPurifier_EntityParser::specialEntityCallback (   $matches)
protected

Callback function for substituteSpecialEntities() that does the work.

This callback has the same syntax as nonSpecialEntityCallback().

Parameters
array $matches PCRE-style matches array, with 0 the entire match, and either index 1, 2 or 3 set with a hex value, dec value, or string (respectively).
Returns
string Replacement string.

Definition at line 267 of file EntityParser.php.

{
    $entity = $matches[0];
    $is_num = (@$matches[0][1] === '#');
    if ($is_num) {
        $is_hex = (@$entity[2] === 'x');
        $int = $is_hex ? hexdec($matches[1]) : (int) $matches[2];
        return isset($this->_special_dec2str[$int]) ?
            $this->_special_dec2str[$int] :
            $entity;
    } else {
        return isset($this->_special_ent2dec[$matches[3]]) ?
            $this->_special_dec2str[$this->_special_ent2dec[$matches[3]]] :
            $entity;
    }
}
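
To see which entities this callback touches, the logic can be reproduced outside the class. The following is a minimal sketch, assuming the regex and the two tables documented under Field Documentation are copied into local variables; the closure and variable names are hypothetical and the class itself is not used.

```php
<?php
// Local copies of the documented tables and regex (copied for illustration).
$special_dec2str = array(34 => '"', 38 => '&', 39 => "'", 60 => '<', 62 => '>');
$special_ent2dec = array('quot' => 34, 'amp' => 38, 'lt' => 60, 'gt' => 62);
$regex = '/&(?:[#]x([a-fA-F0-9]+)|[#]0*(\d+)|([A-Za-z_:][A-Za-z0-9.\-_:]*));?/';

$callback = function (array $m) use ($special_dec2str, $special_ent2dec): string {
    $entity = $m[0];
    if (isset($entity[1]) && $entity[1] === '#') {
        // Numeric reference: decide between hex and decimal.
        $is_hex = isset($entity[2]) && $entity[2] === 'x';
        $code = $is_hex ? (int) hexdec($m[1]) : (int) $m[2];
        return $special_dec2str[$code] ?? $entity;
    }
    // Named reference: only the special names are substituted.
    $name = $m[3];
    return isset($special_ent2dec[$name])
        ? $special_dec2str[$special_ent2dec[$name]]
        : $entity;
};

$out = preg_replace_callback($regex, $callback, '&lt;b&gt; &#38; &#x27; &eacute;');
echo $out, "\n";
```

Only the special characters are decoded; `&eacute;` is left for the non-special pass.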

◆ substituteAttrEntities()

HTMLPurifier_EntityParser::substituteAttrEntities (   $string)

Substitute entities with the parsed equivalents.

Use this on attribute contents in documents.

Parameters
string $string String to have entities parsed.
Returns
string Parsed string.

Definition at line 98 of file EntityParser.php.

{
    return preg_replace_callback(
        $this->_attrEntitiesRegex,
        array($this, 'entityCallback'),
        $string
    );
}

◆ substituteNonSpecialEntities()

HTMLPurifier_EntityParser::substituteNonSpecialEntities (   $string)

Substitutes non-special entities with their parsed equivalents.

Since running this every time character data is parsed would be wasteful, we run it once before everything else.

Parameters
string $string String to have non-special entities parsed.
Returns
string Parsed string.

Definition at line 192 of file EntityParser.php.

{
    // it will try to detect missing semicolons, but don't rely on it
    return preg_replace_callback(
        $this->_substituteEntitiesRegex,
        array($this, 'nonSpecialEntityCallback'),
        $string
    );
}

◆ substituteSpecialEntities()

HTMLPurifier_EntityParser::substituteSpecialEntities (   $string)

Substitutes only special entities with their parsed equivalents.

We try to avoid calling this function because otherwise, it would have to be called a lot (for every parsed section).

Parameters
string $string String to have special entities parsed.
Returns
string Parsed string.

Definition at line 248 of file EntityParser.php.

{
    return preg_replace_callback(
        $this->_substituteEntitiesRegex,
        array($this, 'specialEntityCallback'),
        $string
    );
}

◆ substituteTextEntities()

HTMLPurifier_EntityParser::substituteTextEntities (   $string)

Substitute entities with the parsed equivalents.

Use this on textual data in an HTML document (as opposed to attributes).

Parameters
string $string String to have entities parsed.
Returns
string Parsed string.

Definition at line 82 of file EntityParser.php.

{
    return preg_replace_callback(
        $this->_textEntitiesRegex,
        array($this, 'entityCallback'),
        $string
    );
}

Field Documentation

◆ $_attrEntitiesRegex

HTMLPurifier_EntityParser::$_attrEntitiesRegex
protected

Callback regex string for entities in attributes.

string

Definition at line 29 of file EntityParser.php.

◆ $_entity_lookup

HTMLPurifier_EntityParser::$_entity_lookup
protected

Reference to entity lookup table.

HTMLPurifier_EntityLookup

Definition at line 17 of file EntityParser.php.

◆ $_semiOptionalPrefixRegex

HTMLPurifier_EntityParser::$_semiOptionalPrefixRegex
protected

Regex that tests whether a string begins with an entity whose trailing semicolon is optional.

Definition at line 34 of file EntityParser.php.
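
The three empty capture groups in this pattern exist so that the entity name lands at index 4, the same slot entityCallback() reads for semicolon-optional names. A small sketch of that behaviour, using a shortened name list that is an assumption made for brevity:

```php
<?php
// Shortened stand-in for the full semi-optional list (illustration only).
$semi_optional = 'amp|copy|reg';

// Same shape as _semiOptionalPrefixRegex: groups 1-3 are deliberately empty.
$prefix_regex = "/&()()()($semi_optional)/";

// '&copyright' is not a known entity, but it starts with '&copy'.
$hit = preg_match($prefix_regex, '&copyright', $m);

print_r($m);
```

Here `$m[0]` is `&copy`, groups 1 to 3 are empty strings, and `$m[4]` is `copy`, so a callback that checks the hex and decimal slots first falls through to the named branch.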

◆ $_special_dec2str

HTMLPurifier_EntityParser::$_special_dec2str
protected
Initial value:
= array(
    34 => '"',
    38 => '&',
    39 => "'",
    60 => '<',
    62 => '>'
)

Decimal to parsed string conversion table for special entities.

array

Definition at line 163 of file EntityParser.php.

◆ $_special_ent2dec

HTMLPurifier_EntityParser::$_special_ent2dec
protected
Initial value:
= array(
    'quot' => 34,
    'amp' => 38,
    'lt' => 60,
    'gt' => 62
)

Stripped entity names to decimal conversion table for special entities.

array

Definition at line 176 of file EntityParser.php.

◆ $_substituteEntitiesRegex

HTMLPurifier_EntityParser::$_substituteEntitiesRegex
protected
Initial value:
=
'/&(?:[#]x([a-fA-F0-9]+)|[#]0*(\d+)|([A-Za-z_:][A-Za-z0-9.\-_:]*));?/'

Callback regex string for parsing entities.

string

Definition at line 155 of file EntityParser.php.

◆ $_textEntitiesRegex

HTMLPurifier_EntityParser::$_textEntitiesRegex
protected

Callback regex string for entities in text.

string

Definition at line 23 of file EntityParser.php.


The documentation for this class was generated from the following file: