ILIAS  trunk Revision v12.0_alpha-377-g3641b37b9db
Sanitizer.php File Reference

Data Structures

class  Sanitizer
 

Functions

 codepointToUtf8 ($codepoint)
 

Variables

const MW_CHAR_REFS_REGEX '/&([A-Za-z0-9\x80-\xff]+); |&\#([0-9]+); |&\#x([0-9A-Za-z]+); |&\#X([0-9A-Za-z]+); |(&)/x'
 Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.
 
 $attrib = '[A-Za-z0-9]'
 Character class for attribute names, used to build the regular expression that matches HTML/XML attribute pairs within a tag.
 
 $space = '[\x09\x0a\x0d\x20]'
 
const MW_ATTRIBS_REGEX "/(?:^|$space)($attrib+) ($space*=$space* (?: \"([^<\"]*)\" | '([^<']*)' | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+) | (\#[0-9a-fA-F]+) # Technically wrong, but lots of ) )?(?=$space|\$)/sx"
 
global $wgHtmlEntities
 List of all named character entities defined in HTML 4.01 (http://www.w3.org/TR/html4/sgml/entities.html).
 
global $wgHtmlEntityAliases
 Character entity aliases accepted by MediaWiki.
 

Function Documentation

◆ codepointToUtf8()

codepointToUtf8($codepoint)

Definition at line 320 of file Sanitizer.php.

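{
    if ($codepoint < 0x80) {
        // 1 byte: plain ASCII
        return chr($codepoint);
    }
    if ($codepoint < 0x800) {
        // 2 bytes: lead byte 110xxxxx, then one continuation byte 10xxxxxx
        return chr($codepoint >> 6 & 0x3f | 0xc0) .
            chr($codepoint & 0x3f | 0x80);
    }
    if ($codepoint < 0x10000) {
        // 3 bytes: lead byte 1110xxxx, then two continuation bytes
        return chr($codepoint >> 12 & 0x0f | 0xe0) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    if ($codepoint < 0x110000) {
        // 4 bytes: lead byte 11110xxx, then three continuation bytes
        return chr($codepoint >> 18 & 0x07 | 0xf0) .
            chr($codepoint >> 12 & 0x3f | 0x80) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    // The generated listing ends here; handling of code points >= 0x110000 is not shown.
}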
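The branch structure above can be checked against known UTF-8 byte sequences. What follows is a minimal, self-contained sketch rather than the ILIAS function itself: the name `codepointToUtf8Sketch` is hypothetical, and throwing an `InvalidArgumentException` for out-of-range input is an assumption, since the listing does not show what happens for code points of 0x110000 and above.

```php
<?php
// Hypothetical stand-alone sketch for illustration; not the ILIAS function itself.
function codepointToUtf8Sketch(int $codepoint): string
{
    if ($codepoint < 0 || $codepoint >= 0x110000) {
        // Assumption: the documented listing does not show its out-of-range behaviour.
        throw new InvalidArgumentException("Code point out of range: $codepoint");
    }
    if ($codepoint < 0x80) {
        // 1 byte: plain ASCII
        return chr($codepoint);
    }
    if ($codepoint < 0x800) {
        // 2 bytes
        return chr($codepoint >> 6 & 0x3f | 0xc0) .
            chr($codepoint & 0x3f | 0x80);
    }
    if ($codepoint < 0x10000) {
        // 3 bytes
        return chr($codepoint >> 12 & 0x0f | 0xe0) .
            chr($codepoint >> 6 & 0x3f | 0x80) .
            chr($codepoint & 0x3f | 0x80);
    }
    // 4 bytes
    return chr($codepoint >> 18 & 0x07 | 0xf0) .
        chr($codepoint >> 12 & 0x3f | 0x80) .
        chr($codepoint >> 6 & 0x3f | 0x80) .
        chr($codepoint & 0x3f | 0x80);
}
```

For example, `bin2hex(codepointToUtf8Sketch(0x20AC))` yields `e282ac`, the UTF-8 encoding of the euro sign. Like the listing, the sketch does not reject surrogate code points (0xD800 to 0xDFFF).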

Variable Documentation

◆ $attrib

$attrib = '[A-Za-z0-9]'

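Character class for a single attribute-name character. It is interpolated into MW_ATTRIBS_REGEX, the regular expression that matches HTML/XML attribute pairs within a tag.

Allows some... latitude. Used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes.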

Definition at line 35 of file Sanitizer.php.

Referenced by SurveyImportParser\handlerBeginTag().

◆ $space

$space = '[\x09\x0a\x0d\x20]'

◆ $wgHtmlEntities

$wgHtmlEntities
private

List of all named character entities defined in HTML 4.01 (http://www.w3.org/TR/html4/sgml/entities.html).

Definition at line 56 of file Sanitizer.php.

◆ $wgHtmlEntityAliases

$wgHtmlEntityAliases
Initial value:
= array(
'רלמ' => 'rlm',
'رلم' => 'rlm',
)

Character entity aliases accepted by MediaWiki.

Definition at line 314 of file Sanitizer.php.
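
To illustrate how an alias table of this shape might be applied, here is a small sketch. The helper `resolveEntityName` and both local variables are hypothetical. The one-entry entity table stands in for $wgHtmlEntities and assumes that table maps entity names to code points (rlm is U+200F, decimal 8207). How the Sanitizer class itself consults these tables is not shown on this page.

```php
<?php
// Alias table copied from the initial value documented above.
$sketchEntityAliases = [
    'רלמ' => 'rlm',
    'رلم' => 'rlm',
];

// Assumption: a one-entry excerpt standing in for $wgHtmlEntities (name => code point).
$sketchEntities = ['rlm' => 8207];

// Hypothetical helper: map an entity name to its canonical name, or null if unknown.
function resolveEntityName(string $name, array $aliases, array $entities): ?string
{
    // Replace an alias with its target; other names pass through unchanged.
    $canonical = $aliases[$name] ?? $name;
    return array_key_exists($canonical, $entities) ? $canonical : null;
}
```

The aliases are non-ASCII names, which is why the named-entity branch of MW_CHAR_REFS_REGEX accepts bytes in the range \x80-\xff.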

◆ MW_ATTRIBS_REGEX

const MW_ATTRIBS_REGEX
"/(?:^|$space)($attrib+)
  ($space*=$space*
    (?:
        \"([^<\"]*)\"
      | '([^<']*)'
      | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
      | (\#[0-9a-fA-F]+) # Technically wrong, but lots of
    )
  )?(?=$space|\$)/sx"

Definition at line 49 of file Sanitizer.php.
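
The capture groups of this pattern can be hard to read in its flattened form, so the following sketch rebuilds it from $attrib and $space and applies it with `preg_match_all`. Group 1 is the attribute name, group 2 the optional `=value` part, and groups 3 to 6 the double-quoted, single-quoted, unquoted, and `#hex` value forms. The function `parseAttributePairs` and the variable `$sketchAttribsRegex` are hypothetical, the truncated inline comment is left out, and returning null for valueless attributes is an assumption of this sketch, not documented Sanitizer behaviour.

```php
<?php
$attrib = '[A-Za-z0-9]';
$space = '[\x09\x0a\x0d\x20]';

// Same pattern as MW_ATTRIBS_REGEX, laid out across lines (the x flag ignores whitespace).
$sketchAttribsRegex = "/(?:^|$space)($attrib+)
    ($space*=$space*
      (?:
          \"([^<\"]*)\"
        | '([^<']*)'
        | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
        | (\#[0-9a-fA-F]+)
      )
    )?(?=$space|\$)/sx";

// Hypothetical helper: return attribute name => value (null when no value is given).
function parseAttributePairs(string $text, string $regex): array
{
    preg_match_all($regex, $text, $matches, PREG_SET_ORDER);
    $pairs = [];
    foreach ($matches as $m) {
        if (!isset($m[2]) || $m[2] === '') {
            // No "=value" part matched.
            $pairs[$m[1]] = null;
            continue;
        }
        // Take the first value group that captured something; empty quotes give ''.
        $value = '';
        foreach ([3, 4, 5, 6] as $group) {
            if (($m[$group] ?? '') !== '') {
                $value = $m[$group];
                break;
            }
        }
        $pairs[$m[1]] = $value;
    }
    return $pairs;
}
```

Because the unquoted-value class in group 5 already contains `#`, a value such as `#fff` is captured by group 5 before group 6 is tried.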

◆ MW_CHAR_REFS_REGEX

const MW_CHAR_REFS_REGEX '/&([A-Za-z0-9\x80-\xff]+); |&\#([0-9]+); |&\#x([0-9A-Za-z]+); |&\#X([0-9A-Za-z]+); |(&)/x'

Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.

This file is part of ILIAS, a powerful learning management system published by ILIAS open source e-Learning e.V.

ILIAS is licensed with the GPL-3.0 (see https://www.gnu.org/licenses/gpl-3.0.en.html). You should have received a copy of said license along with the source code, too.

If this is not the case or you just want to try ILIAS, you'll find us at: https://www.ilias.de https://github.com/ILIAS-eLearning

Definition at line 28 of file Sanitizer.php.
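
The five alternatives of this pattern each use their own capture group: 1 for named references, 2 for decimal, 3 and 4 for hexadecimal with a lowercase or uppercase `x`, and 5 for a bare ampersand. The sketch below classifies the references in a string on that basis. The constant `SKETCH_CHAR_REFS_REGEX` and the function `classifyCharRefs` are hypothetical names local to this example; what normalizeCharReferences and decodeCharReferences do with each match is not shown on this page.

```php
<?php
// Pattern copied from the documentation above; the constant name is local to this sketch.
const SKETCH_CHAR_REFS_REGEX = '/&([A-Za-z0-9\x80-\xff]+);
    |&\#([0-9]+);
    |&\#x([0-9A-Za-z]+);
    |&\#X([0-9A-Za-z]+);
    |(&)/x';

// Hypothetical helper: list each reference as [kind, captured text].
function classifyCharRefs(string $text): array
{
    preg_match_all(SKETCH_CHAR_REFS_REGEX, $text, $matches, PREG_SET_ORDER);
    $found = [];
    foreach ($matches as $m) {
        if (($m[1] ?? '') !== '') {
            $found[] = ['named', $m[1]];
        } elseif (($m[2] ?? '') !== '') {
            $found[] = ['decimal', $m[2]];
        } elseif (($m[3] ?? '') !== '') {
            $found[] = ['hex', $m[3]];
        } elseif (($m[4] ?? '') !== '') {
            $found[] = ['hex', $m[4]];
        } else {
            // Only the bare-ampersand alternative is left.
            $found[] = ['bare', '&'];
        }
    }
    return $found;
}
```

Note that the hexadecimal groups accept any ASCII letter, not only `a` to `f`, so a caller would still need to validate the captured digits before converting them.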