ILIAS  release_8 Revision v8.19
Sanitizer.php File Reference


Data Structures

class  Sanitizer
 

Functions

 codepointToUtf8 ($codepoint)
 

Variables

const MW_CHAR_REFS_REGEX '/&([A-Za-z0-9\x80-\xff]+); |&\#([0-9]+); |&\#x([0-9A-Za-z]+); |&\#X([0-9A-Za-z]+); |(&)/x'
 Regular expression that matches the various types of character references; used by Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.
 
 $attrib = '[A-Za-z0-9]'
 Character class for a single attribute-name character; interpolated into MW_ATTRIBS_REGEX, the regular expression that matches HTML/XML attribute pairs within a tag.
 
 $space = '[\x09\x0a\x0d\x20]'
 
const MW_ATTRIBS_REGEX "/(?:^|$space)($attrib+) ($space*=$space* (?: # The attribute value: quoted or alone \"([^<\"]*)\" | '([^<']*)' | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+) | (\#[0-9a-fA-F]+) # Technically wrong, but lots of # colors are specified like this. # We'll be normalizing it. ) )?(?=$space|\$)/sx"
 
global $wgHtmlEntities
 List of all named character entities defined in HTML 4.01 (http://www.w3.org/TR/html4/sgml/entities.html).
 
global $wgHtmlEntityAliases
 Character entity aliases accepted by MediaWiki.
 

Function Documentation

◆ codepointToUtf8()

codepointToUtf8 (   $codepoint)

Definition at line 327 of file Sanitizer.php.

Referenced by Sanitizer\hexCharReference().

function codepointToUtf8($codepoint)
{
    // 1 byte: U+0000..U+007F
    if ($codepoint < 0x80) {
        return chr($codepoint);
    }
    // 2 bytes: U+0080..U+07FF
    if ($codepoint < 0x800) {
        return chr($codepoint >> 6 & 0x3f | 0xc0) .
            chr($codepoint & 0x3f | 0x80);
    }
    // 3 bytes: U+0800..U+FFFF
    if ($codepoint < 0x10000) {
        return chr($codepoint >> 12 & 0x0f | 0xe0) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    // 4 bytes: U+10000..U+10FFFF
    if ($codepoint < 0x110000) {
        return chr($codepoint >> 18 & 0x07 | 0xf0) .
            chr($codepoint >> 12 & 0x3f | 0x80) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    // The listing is cut off here; handling of code points >= 0x110000 is not shown.
}
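To make the bit arithmetic in codepointToUtf8() concrete, the sketch below re-implements the four branches from the listing under a hypothetical name, sketchCodepointToUtf8(), and prints the resulting bytes as hex. The fallback for out-of-range code points (returning U+FFFD) is an assumption made only so the sketch is complete; the listing does not show what Sanitizer.php does in that case.

```php
<?php
// Hypothetical stand-in that mirrors the branches of codepointToUtf8().
function sketchCodepointToUtf8(int $codepoint): string
{
    if ($codepoint < 0x80) {
        // 0xxxxxxx
        return chr($codepoint);
    }
    if ($codepoint < 0x800) {
        // 110xxxxx 10xxxxxx
        return chr($codepoint >> 6 & 0x3f | 0xc0) .
            chr($codepoint & 0x3f | 0x80);
    }
    if ($codepoint < 0x10000) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        return chr($codepoint >> 12 & 0x0f | 0xe0) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    if ($codepoint < 0x110000) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return chr($codepoint >> 18 & 0x07 | 0xf0) .
            chr($codepoint >> 12 & 0x3f | 0x80) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    // Assumption for this sketch only: out-of-range input yields U+FFFD.
    return "\xef\xbf\xbd";
}

echo bin2hex(sketchCodepointToUtf8(0x41)), "\n";    // 41
echo bin2hex(sketchCodepointToUtf8(0xE9)), "\n";    // c3a9
echo bin2hex(sketchCodepointToUtf8(0x20AC)), "\n";  // e282ac
echo bin2hex(sketchCodepointToUtf8(0x1F600)), "\n"; // f09f9880
```

Each branch shifts the code point right in 6-bit steps, masks the payload bits, and ORs in the UTF-8 lead or continuation marker, which is why the byte count grows at 0x80, 0x800 and 0x10000.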

Variable Documentation

◆ $attrib

$attrib = '[A-Za-z0-9]'

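Character class for a single attribute-name character. It is interpolated into MW_ATTRIBS_REGEX, the regular expression that matches HTML/XML attribute pairs within a tag.

The resulting pattern is deliberately lenient. It is used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes.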

Definition at line 42 of file Sanitizer.php.

Referenced by SurveyImportParser\handlerBeginTag().

◆ $space

$space = '[\x09\x0a\x0d\x20]'

Definition at line 43 of file Sanitizer.php.

◆ $wgHtmlEntities

$wgHtmlEntities
private

List of all named character entities defined in HTML 4.01 (http://www.w3.org/TR/html4/sgml/entities.html).

Definition at line 63 of file Sanitizer.php.

Referenced by Sanitizer\hexCharReference().

◆ $wgHtmlEntityAliases

$wgHtmlEntityAliases
Initial value:
= array(
'רלמ' => 'rlm',
'رلم' => 'rlm',
)

Character entity aliases accepted by MediaWiki.

Definition at line 321 of file Sanitizer.php.

Referenced by Sanitizer\hexCharReference().
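As a rough illustration of how an alias table like this can be consulted, the sketch below maps an entity name to its canonical form before any further lookup. The helper resolveEntityAlias() is hypothetical and not part of Sanitizer.php; only the two array entries are taken from the initial value above.

```php
<?php
// Copy of the initial value shown above.
$aliases = [
    'רלמ' => 'rlm',
    'رلم' => 'rlm',
];

// Hypothetical helper: return the canonical entity name for an alias,
// or the name itself when no alias is registered.
function resolveEntityAlias(string $name, array $aliases): string
{
    return $aliases[$name] ?? $name;
}

echo resolveEntityAlias('רלמ', $aliases), "\n";  // rlm
echo resolveEntityAlias('nbsp', $aliases), "\n"; // nbsp
```

The non-ASCII alias names can appear in a character reference in the first place because the named-entity branch of MW_CHAR_REFS_REGEX accepts bytes in the range \x80-\xff.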

◆ MW_ATTRIBS_REGEX

const MW_ATTRIBS_REGEX
"/(?:^|$space)($attrib+)
  ($space*=$space*
    (?:
      # The attribute value: quoted or alone
      \"([^<\"]*)\"
      | '([^<']*)'
      | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
      | (\#[0-9a-fA-F]+) # Technically wrong, but lots of
                         # colors are specified like this.
                         # We'll be normalizing it.
    )
  )?(?=$space|\$)/sx"

Definition at line 44 of file Sanitizer.php.
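The following sketch shows one way a pattern of this shape can be applied to the attribute portion of a tag. It rebuilds the expression locally from $attrib and $space with shortened comments; the function sketchDecodeAttributes() is hypothetical and only approximates what a caller such as Sanitizer::decodeTagAttributes might do. It takes the first non-empty value group rather than assuming which alternative matched.

```php
<?php
$attrib = '[A-Za-z0-9]';
$space = '[\x09\x0a\x0d\x20]';

// Local rebuild of the pattern for illustration (extended mode, so
// whitespace and # comments inside the pattern are ignored).
$attribsRegex = "/(?:^|$space)($attrib+)
  ($space*=$space*
    (?:
      # double-quoted, single-quoted, or unquoted value
      \"([^<\"]*)\"
      | '([^<']*)'
      | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
      | (\#[0-9a-fA-F]+)
    )
  )?(?=$space|\$)/sx";

// Hypothetical helper: map attribute names to their raw values.
function sketchDecodeAttributes(string $text, string $regex): array
{
    $result = [];
    preg_match_all($regex, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $value = '';
        // Groups 3..6 hold the value alternatives; use the first non-empty one.
        foreach ([3, 4, 5, 6] as $group) {
            if (isset($m[$group]) && $m[$group] !== '') {
                $value = $m[$group];
                break;
            }
        }
        $result[$m[1]] = $value;
    }
    return $result;
}

print_r(sketchDecodeAttributes(
    'class="foo" id=\'bar\' width=100 bgcolor=#ffffff disabled',
    $attribsRegex
));
```

An attribute with no value, such as disabled above, still matches because the whole value group is optional; in this sketch it is reported with an empty string.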

◆ MW_CHAR_REFS_REGEX

const MW_CHAR_REFS_REGEX
'/&([A-Za-z0-9\x80-\xff]+);
 |&\#([0-9]+);
 |&\#x([0-9A-Za-z]+);
 |&\#X([0-9A-Za-z]+);
 |(&)/x'

Regular expression that matches the various types of character references; used by Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.

Definition at line 30 of file Sanitizer.php.

Referenced by Sanitizer\hexCharReference().
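To show which capture group fires for each kind of reference, the sketch below applies a local copy of the pattern to a short string and labels every match. The constant SKETCH_CHAR_REFS_REGEX and the function classifyCharRefs() are hypothetical names introduced for this illustration; they are not part of Sanitizer.php.

```php
<?php
// Local copy of the pattern; group 1 = named entity, 2 = decimal,
// 3 and 4 = hexadecimal (lower/upper case x), 5 = bare ampersand.
const SKETCH_CHAR_REFS_REGEX = '/&([A-Za-z0-9\x80-\xff]+);
    |&\#([0-9]+);
    |&\#x([0-9A-Za-z]+);
    |&\#X([0-9A-Za-z]+);
    |(&)/x';

// Hypothetical helper: label each character reference found in $text.
function classifyCharRefs(string $text): array
{
    $labels = ['named', 'decimal', 'hex', 'hex', 'bare'];
    $found = [];
    preg_match_all(SKETCH_CHAR_REFS_REGEX, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        for ($group = 1; $group <= 5; $group++) {
            if (isset($m[$group]) && $m[$group] !== '') {
                $found[] = $labels[$group - 1] . ':' . $m[$group];
                break;
            }
        }
    }
    return $found;
}

print_r(classifyCharRefs('&amp; &#233; &#xE9; &#XE9; AT&T'));
// named:amp, decimal:233, hex:E9, hex:E9, bare:&
```

Note that the hexadecimal branches accept any letter, not only a-f, so a string such as &#xZZ; still matches the pattern; any validity check has to happen in the code that consumes the match.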