ILIAS
release_4-3 Revision
|
A UTF-8 specific character encoder that handles cleaning and transforming. More...
Static Public Member Functions | |
static | muteErrorHandler () |
Error-handler that mutes errors, alternative to shut-up operator. | |
static | unsafeIconv ($in, $out, $text) |
iconv wrapper which mutes errors, but doesn't work around bugs. | |
static | iconv ($in, $out, $text, $max_chunk_size=8000) |
iconv wrapper which mutes errors and works around bugs. | |
static | cleanUTF8 ($str, $force_php=false) |
Cleans a UTF-8 string for well-formedness and SGML validity. | |
static | unichr ($code) |
Translates a Unicode codepoint into its corresponding UTF-8 character. | |
static | iconvAvailable () |
static | convertToUTF8 ($str, $config, $context) |
Converts a string to UTF-8 based on configuration. | |
static | convertFromUTF8 ($str, $config, $context) |
Converts a string from UTF-8 based on configuration. | |
static | convertToASCIIDumbLossless ($str) |
Lossless (character-wise) conversion of HTML to ASCII. | |
static | testIconvTruncateBug () |
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly. | |
static | testEncodingSupportsASCII ($encoding, $bypass=false) |
This expensive function tests whether or not a given character encoding supports ASCII. |
Data Fields | |
const | ICONV_OK = 0 |
No bugs detected in iconv. | |
const | ICONV_TRUNCATES = 1 |
Iconv truncates output if converting from UTF-8 to another character set with //IGNORE, and a non-encodable character is found. | |
const | ICONV_UNUSABLE = 2 |
Iconv does not support //IGNORE, making it unusable for transcoding purposes. |
Private Member Functions | |
__construct () | |
Constructor throws fatal error if you attempt to instantiate class. |
A UTF-8 specific character encoder that handles cleaning and transforming.
Definition at line 7 of file Encoder.php.
|
private |
Constructor throws fatal error if you attempt to instantiate class.
Definition at line 13 of file Encoder.php.
|
static |
Cleans a UTF-8 string for well-formedness and SGML validity.
It will parse according to UTF-8 and return a valid UTF8 string, with non-SGML codepoints excluded.
Definition at line 109 of file Encoder.php.
Referenced by HTMLPurifier_Printer\escape(), HTMLPurifier_AttrDef\expandCSSEscape(), and HTMLPurifier_Lexer\normalize().
|
static |
Converts a string from UTF-8 based on configuration.
Definition at line 366 of file Encoder.php.
References convertToASCIIDumbLossless(), iconv(), iconvAvailable(), and testEncodingSupportsASCII().
Referenced by HTMLPurifier\purify().
|
static |
Lossless (character-wise) conversion of HTML to ASCII.
$str | UTF-8 string to be converted to ASCII |
Definition at line 413 of file Encoder.php.
References $result.
Referenced by convertFromUTF8().
|
static |
Converts a string to UTF-8 based on configuration.
Definition at line 336 of file Encoder.php.
References iconvAvailable(), and unsafeIconv().
Referenced by HTMLPurifier\purify().
|
static |
iconv wrapper which mutes errors and works around bugs.
Definition at line 35 of file Encoder.php.
References $in, $out, testIconvTruncateBug(), and unsafeIconv().
Referenced by convertFromUTF8(), and unsafeIconv().
|
static |
Definition at line 325 of file Encoder.php.
References ICONV_UNUSABLE, and testIconvTruncateBug().
Referenced by convertFromUTF8(), and convertToUTF8().
|
static |
Error-handler that mutes errors, alternative to shut-up operator.
Definition at line 20 of file Encoder.php.
|
static |
This expensive function tests whether or not a given character encoding supports ASCII.
7/8-bit encodings like Shift_JIS will fail this test, and require special processing. Variable width encodings shouldn't ever fail.
string | $encoding | Encoding name to test, as per iconv format |
bool | $bypass | Whether or not to bypass the precompiled arrays. |
Definition at line 498 of file Encoder.php.
References $ret, and unsafeIconv().
Referenced by convertFromUTF8().
|
static |
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly.
In particular, rather than ignore characters, it will return an EILSEQ after consuming some number of characters, and expect you to restart iconv as if it were an E2BIG. Old versions of PHP did not respect the errno, and returned the fragment, so as a result you would see iconv mysteriously truncating output. We can work around this by manually chopping our input into segments of about 8000 characters, as long as PHP ignores the error code. If PHP starts paying attention to the error code, iconv becomes unusable.
Definition at line 469 of file Encoder.php.
References ICONV_OK, ICONV_TRUNCATES, ICONV_UNUSABLE, and unsafeIconv().
Referenced by iconv(), and iconvAvailable().
|
static |
Translates a Unicode codepoint into its corresponding UTF-8 character.
Definition at line 288 of file Encoder.php.
References $ret.
Referenced by HTMLPurifier_AttrDef\expandCSSEscape(), and HTMLPurifier_EntityParser\nonSpecialEntityCallback().
|
static |
iconv wrapper which mutes errors, but doesn't work around bugs.
Definition at line 25 of file Encoder.php.
References $in, $out, and iconv().
Referenced by convertToUTF8(), iconv(), testEncodingSupportsASCII(), and testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_OK = 0 |
No bugs detected in iconv.
Definition at line 445 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_TRUNCATES = 1 |
Iconv truncates output if converting from UTF-8 to another character set with //IGNORE, and a non-encodable character is found.
Definition at line 449 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_UNUSABLE = 2 |
Iconv does not support //IGNORE, making it unusable for transcoding purposes.
Definition at line 453 of file Encoder.php.
Referenced by iconvAvailable(), and testIconvTruncateBug().