ILIAS  release_8 Revision v8.24
Sanitizer.php File Reference


Data Structures

class  Sanitizer
 

Functions

 codepointToUtf8 ($codepoint)
 

Variables

const MW_CHAR_REFS_REGEX '/&([A-Za-z0-9\x80-\xff]+); |&\#([0-9]+); |&\#x([0-9A-Za-z]+); |&\#X([0-9A-Za-z]+); |(&)/x'
 Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.
 
 $attrib = '[A-Za-z0-9]'
 Regular expression to match HTML/XML attribute pairs within a tag.
 
 $space = '[\x09\x0a\x0d\x20]'
 
const MW_ATTRIBS_REGEX "/(?:^|$space)($attrib+)
  ($space*=$space*
    (?:
        \"([^<\"]*)\"
      | '([^<']*)'
      | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
      | (\#[0-9a-fA-F]+) # Technically wrong, but lots of
    )
  )?(?=$space|\$)/sx"
 
global $wgHtmlEntities
 List of all named character entities defined in HTML 4.01 http://www.w3.org/TR/html4/sgml/entities.html.
 
global $wgHtmlEntityAliases
 Character entity aliases accepted by MediaWiki.
 

Function Documentation

◆ codepointToUtf8()

codepointToUtf8 (   $codepoint)

Definition at line 327 of file Sanitizer.php.

function codepointToUtf8($codepoint)
{
    // 1 byte: U+0000 to U+007F (ASCII)
    if ($codepoint < 0x80) {
        return chr($codepoint);
    }
    // 2 bytes: U+0080 to U+07FF
    if ($codepoint < 0x800) {
        return chr($codepoint >> 6 & 0x3f | 0xc0) .
            chr($codepoint & 0x3f | 0x80);
    }
    // 3 bytes: U+0800 to U+FFFF
    if ($codepoint < 0x10000) {
        return chr($codepoint >> 12 & 0x0f | 0xe0) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    // 4 bytes: U+10000 to U+10FFFF
    if ($codepoint < 0x110000) {
        return chr($codepoint >> 18 & 0x07 | 0xf0) .
            chr($codepoint >> 12 & 0x3f | 0x80) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    // The extracted listing ends here; the handling of code points
    // at or above 0x110000 is not shown.
}
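
The branches above correspond to the one- to four-byte forms of UTF-8. As a rough check of that bit arithmetic, the sketch below applies the same shift-and-mask scheme (with explicit parentheses) under a hypothetical name, sketchCodepointToUtf8, which is not part of Sanitizer.php, and prints the resulting bytes in hex. The exception thrown for out-of-range input is an assumption; the listing does not show what the original function does in that case.

```php
<?php
// Hypothetical re-implementation for illustration; not the ILIAS function itself.
function sketchCodepointToUtf8(int $codepoint): string
{
    // Assumption: reject values outside the Unicode range with an exception.
    if ($codepoint < 0 || $codepoint >= 0x110000) {
        throw new InvalidArgumentException("Code point out of range: $codepoint");
    }
    if ($codepoint < 0x80) {
        // Single byte, identical to ASCII.
        return chr($codepoint);
    }
    if ($codepoint < 0x800) {
        // 110xxxxx 10xxxxxx
        return chr(($codepoint >> 6) | 0xc0)
            . chr(($codepoint & 0x3f) | 0x80);
    }
    if ($codepoint < 0x10000) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(($codepoint >> 12) | 0xe0)
            . chr((($codepoint >> 6) & 0x3f) | 0x80)
            . chr(($codepoint & 0x3f) | 0x80);
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(($codepoint >> 18) | 0xf0)
        . chr((($codepoint >> 12) & 0x3f) | 0x80)
        . chr((($codepoint >> 6) & 0x3f) | 0x80)
        . chr(($codepoint & 0x3f) | 0x80);
}

// One sample code point per encoded length.
foreach ([0x41, 0xE9, 0x20AC, 0x1F600] as $cp) {
    printf("U+%04X -> %s\n", $cp, bin2hex(sketchCodepointToUtf8($cp)));
}
```

Run as a script, this prints 41, c3a9, e282ac and f09f9880 for the four sample code points, one for each encoded length.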

Variable Documentation

◆ $attrib

$attrib = '[A-Za-z0-9]'

Regular expression to match HTML/XML attribute pairs within a tag.

Allows some... latitude. Used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes.

Definition at line 42 of file Sanitizer.php.

Referenced by SurveyImportParser\handlerBeginTag().

◆ $space

$space = '[\x09\x0a\x0d\x20]'

Definition at line 43 of file Sanitizer.php.

◆ $wgHtmlEntities

$wgHtmlEntities
private

List of all named character entities defined in HTML 4.01 http://www.w3.org/TR/html4/sgml/entities.html.

Definition at line 63 of file Sanitizer.php.

◆ $wgHtmlEntityAliases

$wgHtmlEntityAliases
Initial value:
= array(
'רלמ' => 'rlm',
'رلم' => 'rlm',
)

Character entity aliases accepted by MediaWiki.

Definition at line 321 of file Sanitizer.php.
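
This page does not show how Sanitizer consults the alias table. As a minimal sketch, assuming an alias is simply swapped for its canonical entity name before any further lookup, a resolver could look like the following. The names $sketchAliases and sketchResolveEntityName are hypothetical and exist only for this illustration.

```php
<?php
// Alias table copied from the initial value shown above.
$sketchAliases = [
    'רלמ' => 'rlm',
    'رلم' => 'rlm',
];

// Hypothetical helper: return the canonical entity name for an alias,
// or the name unchanged when it is not listed as an alias.
function sketchResolveEntityName(string $name, array $aliases): string
{
    return $aliases[$name] ?? $name;
}

echo sketchResolveEntityName('רלמ', $sketchAliases), "\n"; // alias, resolves to rlm
echo sketchResolveEntityName('amp', $sketchAliases), "\n"; // not an alias, unchanged
```

Both aliases consist of non-ASCII bytes, which is consistent with the \x80-\xff range that MW_CHAR_REFS_REGEX accepts in entity names.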

◆ MW_ATTRIBS_REGEX

const MW_ATTRIBS_REGEX "/(?:^|$space)($attrib+)
  ($space*=$space*
    (?:
        \"([^<\"]*)\"
      | '([^<']*)'
      | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
      | (\#[0-9a-fA-F]+) # Technically wrong, but lots of
    )
  )?(?=$space|\$)/sx"

Definition at line 56 of file Sanitizer.php.
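
To illustrate what the capture groups of this pattern hold, here is a small sketch that rebuilds the expression from the $attrib and $space values documented on this page and collects name/value pairs with preg_match_all. The function name sketchParseAttributes is hypothetical, and treating a valueless attribute as an empty string is an assumption of this example, not necessarily what Sanitizer::decodeTagAttributes does.

```php
<?php
// Hypothetical helper for illustration; not part of Sanitizer.php.
function sketchParseAttributes(string $text): array
{
    // Building blocks as documented on this page.
    $attrib = '[A-Za-z0-9]';
    $space  = '[\x09\x0a\x0d\x20]';

    // Same structure as MW_ATTRIBS_REGEX. Groups: 1 = name, 2 = "=value" part,
    // 3 = double-quoted, 4 = single-quoted, 5 = unquoted, 6 = "#hex" value.
    $regex = "/(?:^|$space)($attrib+)
      ($space*=$space*
        (?:
            \"([^<\"]*)\"
          | '([^<']*)'
          | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
          | (\\#[0-9a-fA-F]+)
        )
      )?(?=$space|\$)/sx";

    preg_match_all($regex, $text, $matches, PREG_SET_ORDER);

    $attributes = [];
    foreach ($matches as $m) {
        // Take the first non-empty value group. Trailing unmatched groups are
        // absent from $m, hence the null-coalescing default.
        $value = '';
        for ($i = 3; $i <= 6; $i++) {
            if (($m[$i] ?? '') !== '') {
                $value = $m[$i];
                break;
            }
        }
        $attributes[$m[1]] = $value;
    }
    return $attributes;
}

print_r(sketchParseAttributes("class=\"a b\" id='x' width=100 disabled"));
```

For the sample input, the result maps class to "a b", id to "x", width to "100" and disabled to an empty string.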

◆ MW_CHAR_REFS_REGEX

const MW_CHAR_REFS_REGEX '/&([A-Za-z0-9\x80-\xff]+); |&\#([0-9]+); |&\#x([0-9A-Za-z]+); |&\#X([0-9A-Za-z]+); |(&)/x'

Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.

Definition at line 35 of file Sanitizer.php.
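
As a rough illustration of how the five alternatives of this pattern divide the work, the sketch below runs it over a sample string and reports which capture group matched each reference. SKETCH_CHAR_REFS_REGEX and sketchClassifyCharRefs are hypothetical names introduced here; the pattern text is the one documented above, reflowed across lines, which the /x modifier permits because it ignores literal whitespace.

```php
<?php
// Pattern copied from MW_CHAR_REFS_REGEX; only the line breaks are added.
const SKETCH_CHAR_REFS_REGEX = '/&([A-Za-z0-9\x80-\xff]+);
    |&\#([0-9]+);
    |&\#x([0-9A-Za-z]+);
    |&\#X([0-9A-Za-z]+);
    |(&)/x';

// Hypothetical helper: label each character reference found in $text.
function sketchClassifyCharRefs(string $text): array
{
    preg_match_all(SKETCH_CHAR_REFS_REGEX, $text, $matches, PREG_SET_ORDER);

    $kinds = [];
    foreach ($matches as $m) {
        // Unmatched groups are empty strings or missing, so default to ''.
        if (($m[1] ?? '') !== '') {
            $kinds[] = 'named:' . $m[1];
        } elseif (($m[2] ?? '') !== '') {
            $kinds[] = 'decimal:' . $m[2];
        } elseif (($m[3] ?? '') !== '') {
            $kinds[] = 'hex:' . $m[3];
        } elseif (($m[4] ?? '') !== '') {
            $kinds[] = 'hex:' . $m[4];
        } else {
            // Group 5: an ampersand that does not start a reference.
            $kinds[] = 'bare-ampersand';
        }
    }
    return $kinds;
}

print_r(sketchClassifyCharRefs('&amp; &#65; &#x41; &#X41; & x'));
```

The last alternative, (&), is what lets a caller find stray ampersands; a normalizer would presumably escape those, although that step is not shown on this page.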