ILIAS
release_5-3 Revision v5.3.23-19-g915713cf615
HTMLPurifier_Encoder Class Reference

A UTF-8 specific character encoder that handles cleaning and transforming.
Static Public Member Functions
static muteErrorHandler()
    Error handler that mutes errors; an alternative to the shut-up operator (@).
static unsafeIconv($in, $out, $text)
    iconv wrapper which mutes errors, but doesn't work around bugs.
static iconv($in, $out, $text, $max_chunk_size = 8000)
    iconv wrapper which mutes errors and works around bugs.
static cleanUTF8($str, $force_php = false)
    Cleans a UTF-8 string for well-formedness and SGML validity.
static unichr($code)
    Translates a Unicode codepoint into its corresponding UTF-8 character.
static iconvAvailable()
static convertToUTF8($str, $config, $context)
    Converts a string to UTF-8 based on configuration.
static convertFromUTF8($str, $config, $context)
    Converts a string from UTF-8 based on configuration.
static convertToASCIIDumbLossless($str)
    Lossless (character-wise) conversion of HTML to ASCII.
static testIconvTruncateBug()
    glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly.
static testEncodingSupportsASCII($encoding, $bypass = false)
    This expensive function tests whether or not a given character encoding supports ASCII.
Data Fields
const ICONV_OK = 0
    No bugs detected in iconv.
const ICONV_TRUNCATES = 1
    Iconv truncates output if converting from UTF-8 to another character set with //IGNORE, and a non-encodable character is found.
const ICONV_UNUSABLE = 2
    Iconv does not support //IGNORE, making it unusable for transcoding purposes.
Private Member Functions
__construct()
    The constructor throws a fatal error if you attempt to instantiate the class.
Detailed Description

A UTF-8 specific character encoder that handles cleaning and transforming.
Definition at line 7 of file Encoder.php.
Constructor & Destructor Documentation

private HTMLPurifier_Encoder::__construct()
The constructor throws a fatal error if you attempt to instantiate the class.
Definition at line 13 of file Encoder.php.
Member Function Documentation

static HTMLPurifier_Encoder::cleanUTF8($str, $force_php = false)
Cleans a UTF-8 string for well-formedness and SGML validity.
It parses the input as UTF-8 and returns a valid UTF-8 string, with non-SGML codepoints excluded.
Specifically, it permits \x{9}, \x{A}, \x{D}, \x{20}-\x{7E}, \x{A0}-\x{D7FF}, \x{E000}-\x{FFFD} and \x{10000}-\x{10FFFF}. This is the XML Char production minus the control range \x{7F}-\x{9F}. Source: https://www.w3.org/TR/REC-xml/#NT-Char
Arguably, this function should be modernized to the HTML5 set of allowed characters (https://www.w3.org/TR/html5/syntax.html#preprocessing-the-input-stream), which simultaneously expands and restricts the set of allowed characters.
Parameters:
    string $str  The string to clean
    bool $force_php
Definition at line 134 of file Encoder.php.
Referenced by HTMLPurifier_Printer\escape(), HTMLPurifier_AttrDef\expandCSSEscape(), and HTMLPurifier_Lexer\normalize().
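The filtering rule above can be illustrated with a minimal PHP sketch. The helper names `is_permitted_codepoint_sketch()` and `clean_utf8_sketch()` are hypothetical, and this is not HTMLPurifier's implementation. It assumes that ill-formed byte sequences, overlong forms, surrogates and codepoints outside the permitted set should simply be dropped.

```php
<?php
// Hypothetical sketch, not HTMLPurifier's cleanUTF8(): keep only well-formed
// UTF-8 sequences whose codepoints fall in the permitted set.

function is_permitted_codepoint_sketch(int $cp): bool
{
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0x7E)
        || ($cp >= 0xA0 && $cp <= 0xD7FF)       // surrogates D800-DFFF excluded
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

function clean_utf8_sketch(string $str): string
{
    // Fast path: with /u, preg_match fails on ill-formed UTF-8, so a match
    // means the string is already well-formed and contains only allowed chars.
    if (preg_match('/^[\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*$/Du', $str) === 1) {
        return $str;
    }

    $out = '';
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $lead = ord($str[$i]);
        if ($lead < 0x80) {
            $need = 0; $cp = $lead; $min = 0;
        } elseif (($lead & 0xE0) === 0xC0) {
            $need = 1; $cp = $lead & 0x1F; $min = 0x80;
        } elseif (($lead & 0xF0) === 0xE0) {
            $need = 2; $cp = $lead & 0x0F; $min = 0x800;
        } elseif (($lead & 0xF8) === 0xF0) {
            $need = 3; $cp = $lead & 0x07; $min = 0x10000;
        } else {
            $i++; // stray continuation byte or invalid lead byte: drop it
            continue;
        }

        // Consume up to $need continuation bytes (10xxxxxx).
        $j = 1;
        while ($j <= $need && $i + $j < $len && (ord($str[$i + $j]) & 0xC0) === 0x80) {
            $cp = ($cp << 6) | (ord($str[$i + $j]) & 0x3F);
            $j++;
        }
        if ($j <= $need) {
            $i += $j; // truncated sequence: drop the bytes consumed so far
            continue;
        }

        $seq = substr($str, $i, $need + 1);
        $i += $need + 1;
        // Drop overlong encodings and codepoints outside the permitted set.
        if ($cp >= $min && is_permitted_codepoint_sketch($cp)) {
            $out .= $seq;
        }
    }
    return $out;
}
```

The regex fast path mirrors the observation that most input is already clean. The byte-by-byte decoder runs only when that check fails.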
static HTMLPurifier_Encoder::convertFromUTF8($str, $config, $context)
Converts a string from UTF-8 based on configuration.
Parameters:
    string $str  The string to convert
    HTMLPurifier_Config $config
    HTMLPurifier_Context $context
Definition at line 426 of file Encoder.php.
References $config, convertToASCIIDumbLossless(), iconv(), iconvAvailable(), and testEncodingSupportsASCII().
Referenced by HTMLPurifier\purify().
static HTMLPurifier_Encoder::convertToASCIIDumbLossless($str)
Lossless (character-wise) conversion of HTML to ASCII.
Parameters:
    string $str  UTF-8 string to be converted to ASCII
Definition at line 480 of file Encoder.php.
Referenced by convertFromUTF8().
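This page does not spell out the output format. The minimal sketch below assumes a common approach: ASCII bytes are left untouched and every other character is replaced with a decimal numeric character reference, so é becomes &#233;. The function name is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical sketch of a lossless UTF-8 -> ASCII conversion: ASCII characters
// are kept as-is, every other character becomes a decimal numeric character
// reference (&#N;). Not HTMLPurifier's convertToASCIIDumbLossless().

function to_ascii_lossless_sketch(string $str): string
{
    // With the u modifier, preg_split yields one element per codepoint and
    // returns false if the input is not valid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    foreach ($chars as $ch) {
        $b = array_values(unpack('C*', $ch)); // the 1-4 bytes of one character
        switch (count($b)) {
            case 1:
                $cp = $b[0];
                break;
            case 2:
                $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
                break;
            case 3:
                $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
                break;
            default:
                $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                    | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
        $out .= $cp < 0x80 ? $ch : '&#' . $cp . ';';
    }
    return $out;
}
```

The conversion is lossless because the original characters can be recovered by any HTML parser that decodes numeric character references.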
static HTMLPurifier_Encoder::convertToUTF8($str, $config, $context)
Convert a string to UTF-8 based on configuration.
Parameters:
    string $str  The string to convert
    HTMLPurifier_Config $config
    HTMLPurifier_Context $context
Definition at line 378 of file Encoder.php.
References $config, iconvAvailable(), testIconvTruncateBug(), and unsafeIconv().
Referenced by HTMLPurifier\purify().
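How convertToUTF8() chooses a conversion path depends on the configuration and on iconvAvailable(), and neither is detailed here. One case, however, needs no iconv at all. In ISO-8859-1 every byte equals its Unicode codepoint (U+0000 to U+00FF), so each byte can be re-encoded directly. The sketch below covers only that special case, and the function name is hypothetical.

```php
<?php
// Hypothetical sketch: ISO-8859-1 -> UTF-8 without iconv. Each Latin-1 byte
// is its own codepoint (U+0000-U+00FF), so it needs at most two UTF-8 bytes.

function latin1_to_utf8_sketch(string $str): string
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($str[$i]);
        if ($b < 0x80) {
            $out .= $str[$i]; // ASCII is unchanged
        } else {
            // Two-byte form: 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}
```

Other source encodings have no such direct byte-to-codepoint mapping, which is why a general converter such as iconv is needed for them.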
static HTMLPurifier_Encoder::iconv($in, $out, $text, $max_chunk_size = 8000)
iconv wrapper which mutes errors and works around bugs.
Parameters:
    string $in  Input encoding
    string $out  Output encoding
    string $text  The text to convert
    int $max_chunk_size
Definition at line 48 of file Encoder.php.
References $code, $i, $in, $out, $r, $text, testIconvTruncateBug(), and unsafeIconv().
Referenced by convertFromUTF8(), and unsafeIconv().
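The $max_chunk_size parameter relates to the chunking workaround described under testIconvTruncateBug(). Below is a minimal sketch of the splitting step, assuming UTF-8 input. It cuts the text into pieces of at most $max_chunk_size bytes, backs off so that no multi-byte sequence is split, and passes each piece to a converter. The $convert callable is a hypothetical stand-in for a muted iconv call. This page does not say whether HTMLPurifier splits in exactly this way.

```php
<?php
// Hypothetical sketch of chunked conversion: split UTF-8 text into pieces of
// at most $max_chunk_size bytes without cutting a multi-byte sequence,
// convert each piece with $convert, and concatenate the results.
// Returns false if the converter returns false for any piece.

function convert_in_chunks_sketch(string $text, callable $convert, int $max_chunk_size = 8000)
{
    if ($max_chunk_size < 4) {
        throw new InvalidArgumentException('Chunk size must fit a 4-byte UTF-8 sequence');
    }
    $out = '';
    $len = strlen($text);
    $start = 0;
    while ($start < $len) {
        $end = min($start + $max_chunk_size, $len);
        // If the byte at $end is a continuation byte (10xxxxxx), move the cut
        // back to the lead byte so the whole sequence goes into the next chunk.
        while ($end < $len && $end > $start && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        if ($end === $start) {
            $end = min($start + $max_chunk_size, $len); // ill-formed input: cut anyway
        }
        $piece = $convert(substr($text, $start, $end - $start));
        if ($piece === false) {
            return false; // propagate a hard failure from the converter
        }
        $out .= $piece;
        $start = $end;
    }
    return $out;
}
```

Cutting only at character boundaries matters because a split sequence would itself look like an invalid character to the converter.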
static HTMLPurifier_Encoder::iconvAvailable()
Definition at line 362 of file Encoder.php.
References ICONV_UNUSABLE, and testIconvTruncateBug().
Referenced by convertFromUTF8(), and convertToUTF8().
static HTMLPurifier_Encoder::muteErrorHandler()
Error handler that mutes errors; an alternative to the shut-up operator (@).
Definition at line 21 of file Encoder.php.
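A handler like this is usually installed with set_error_handler() around a single call and removed afterwards with restore_error_handler(), which avoids the @ operator. unsafeIconv() is documented as muting errors, and a wrapper of this shape is one way to achieve that. The names mute_error_handler_sketch() and call_muted_sketch() are hypothetical, and the real method's signature is not shown on this page.

```php
<?php
// Hypothetical sketch of muting errors around one call instead of using @.

function mute_error_handler_sketch(int $errno, string $errstr): bool
{
    // Returning true tells PHP the error has been handled, so the standard
    // error handler does not report it.
    return true;
}

function call_muted_sketch(callable $fn, ...$args)
{
    set_error_handler('mute_error_handler_sketch');
    try {
        return $fn(...$args);
    } finally {
        // Always restore the previous handler, even if $fn throws.
        restore_error_handler();
    }
}
```

Unlike @, the handler is active only for the wrapped call and is restored even if that call throws.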
static HTMLPurifier_Encoder::testEncodingSupportsASCII($encoding, $bypass = false)
This expensive function tests whether or not a given character encoding supports ASCII.
7/8-bit encodings like Shift_JIS will fail this test, and require special processing. Variable width encodings shouldn't ever fail.
Parameters:
    string $encoding  Encoding name to test, as per iconv format
    bool $bypass  Whether or not to bypass the precompiled arrays
Definition at line 571 of file Encoder.php.
References $i, $r, $ret, and unsafeIconv().
Referenced by convertFromUTF8().
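The idea behind the test can be sketched without iconv. Each printable ASCII character is round-tripped through the target encoding, and the ones that come back changed are recorded. In Shift_JIS, for example, byte 0x5C is commonly read as a yen sign and 0x7E as an overline. In the sketch below, the $encode and $decode callables are hypothetical stand-ins for muted iconv calls. The returned map (decoded character => original ASCII character) is an assumption made for illustration, not the documented return value.

```php
<?php
// Hypothetical sketch: find printable ASCII characters that do not survive a
// round trip through some encoding. $encode maps UTF-8 -> target bytes,
// $decode maps target bytes -> UTF-8.

function ascii_round_trip_failures_sketch(callable $encode, callable $decode): array
{
    $failures = [];
    for ($i = 0x20; $i <= 0x7E; $i++) {
        $c = chr($i);
        $back = $decode($encode($c));
        if ($back !== $c) {
            // Map what the encoding actually yields back to the intended ASCII char.
            $failures[$back] = $c;
        }
    }
    return $failures;
}
```

For an encoding that fully supports ASCII the map is empty. For a Shift_JIS-like encoding it would list the yen sign and the overline.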
static HTMLPurifier_Encoder::testIconvTruncateBug()
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly.
In particular, rather than ignoring such characters, it returns EILSEQ after consuming some number of characters and expects the caller to restart iconv as if the error were E2BIG. Old versions of PHP did not respect the errno and returned the converted fragment, so iconv appeared to truncate output mysteriously. This can be worked around by manually chopping the input into segments of about 8000 characters, as long as PHP ignores the error code. If PHP starts paying attention to the error code, iconv becomes unusable.
Definition at line 537 of file Encoder.php.
References $code, $r, ICONV_OK, ICONV_TRUNCATES, ICONV_UNUSABLE, and unsafeIconv().
Referenced by convertToUTF8(), iconv(), and iconvAvailable().
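One way to detect the bug is to convert, with //IGNORE, a long string that starts with a character the target cannot encode, and then check how much output survives. The sketch below sorts a converter into the three categories documented under Data Fields. The probe string, the 9000-byte length and the $convert callable (a stand-in for a muted iconv call) are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of classifying an iconv-like converter. The constants
// mirror HTMLPurifier_Encoder::ICONV_OK / ICONV_TRUNCATES / ICONV_UNUSABLE.

const ICONV_OK_SKETCH = 0;
const ICONV_TRUNCATES_SKETCH = 1;
const ICONV_UNUSABLE_SKETCH = 2;

function classify_iconv_sketch(callable $convert): int
{
    // U+03B1 (GREEK SMALL LETTER ALPHA) cannot be encoded in ASCII, so with
    // //IGNORE a correct converter drops it and keeps all 9000 'a' bytes.
    $probe = "\xCE\xB1" . str_repeat('a', 9000);
    $r = $convert('UTF-8', 'ASCII//IGNORE', $probe);
    if ($r === false) {
        return ICONV_UNUSABLE_SKETCH;  // //IGNORE not honoured at all
    }
    if (strlen($r) < 9000) {
        return ICONV_TRUNCATES_SKETCH; // output cut short after the bad character
    }
    if (strlen($r) > 9000) {
        return ICONV_UNUSABLE_SKETCH;  // unexpected extra output (assumption: treat as unusable)
    }
    return ICONV_OK_SKETCH;
}
```

Putting the non-encodable character first makes truncation easy to detect: a buggy converter stops early, so the length check fails.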
static HTMLPurifier_Encoder::unichr($code)
Translates a Unicode codepoint into its corresponding UTF-8 character.
Definition at line 315 of file Encoder.php.
References $code, $ret, $w, $x, and $y.
Referenced by HTMLPurifier_EntityParser\entityCallback(), HTMLPurifier_AttrDef\expandCSSEscape(), and HTMLPurifier_EntityParser\nonSpecialEntityCallback().
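The translation follows the standard UTF-8 bit layout: 1 byte up to U+007F, 2 bytes up to U+07FF, 3 bytes up to U+FFFF and 4 bytes up to U+10FFFF. Below is a minimal sketch with a hypothetical name. Returning an empty string for surrogates and out-of-range values is an assumption of this sketch.

```php
<?php
// Hypothetical sketch of codepoint -> UTF-8 encoding (not HTMLPurifier's unichr()).
// Returns '' for negative values, UTF-16 surrogates and values above U+10FFFF.

function unichr_sketch(int $code): string
{
    if ($code < 0 || $code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
        return '';
    }
    if ($code < 0x80) {      // 0xxxxxxx
        return chr($code);
    }
    if ($code < 0x800) {     // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6))
            . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {   // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($code >> 12))
            . chr(0x80 | (($code >> 6) & 0x3F))
            . chr(0x80 | ($code & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($code >> 18))
        . chr(0x80 | (($code >> 12) & 0x3F))
        . chr(0x80 | (($code >> 6) & 0x3F))
        . chr(0x80 | ($code & 0x3F));
}
```

For example, U+20AC (the euro sign) encodes to the three bytes E2 82 AC.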
static HTMLPurifier_Encoder::unsafeIconv($in, $out, $text)
iconv wrapper which mutes errors, but doesn't work around bugs.
Parameters:
    string $in  Input encoding
    string $out  Output encoding
    string $text  The text to convert
Definition at line 32 of file Encoder.php.
References $in, $out, $r, $text, and iconv().
Referenced by convertToUTF8(), iconv(), testEncodingSupportsASCII(), and testIconvTruncateBug().
Field Documentation

const HTMLPurifier_Encoder::ICONV_OK = 0
No bugs detected in iconv.
Definition at line 513 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_TRUNCATES = 1
Iconv truncates output if converting from UTF-8 to another character set with //IGNORE, and a non-encodable character is found.
Definition at line 517 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_UNUSABLE = 2
Iconv does not support //IGNORE, making it unusable for transcoding purposes.
Definition at line 521 of file Encoder.php.
Referenced by iconvAvailable(), and testIconvTruncateBug().