ILIAS
Release_5_0_x_branch Revision 61816
|
A UTF-8 specific character encoder that handles cleaning and transforming. More...
Static Public Member Functions | |
static | muteErrorHandler () |
Error-handler that mutes errors, alternative to shut-up operator. | |
static | unsafeIconv ($in, $out, $text) |
iconv wrapper which mutes errors, but doesn't work around bugs. | |
static | iconv ($in, $out, $text, $max_chunk_size=8000) |
iconv wrapper which mutes errors and works around bugs. | |
static | cleanUTF8 ($str, $force_php=false) |
Cleans a UTF-8 string for well-formedness and SGML validity. | |
static | unichr ($code) |
Translates a Unicode codepoint into its corresponding UTF-8 character. | |
static | iconvAvailable () |
static | convertToUTF8 ($str, $config, $context) |
Convert a string to UTF-8 based on configuration. | |
static | convertFromUTF8 ($str, $config, $context) |
Converts a string from UTF-8 based on configuration. | |
static | convertToASCIIDumbLossless ($str) |
Lossless (character-wise) conversion of HTML to ASCII. | |
static | testIconvTruncateBug () |
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly. | |
static | testEncodingSupportsASCII ($encoding, $bypass=false) |
This expensive function tests whether or not a given character encoding supports ASCII. |
Data Fields | |
const | ICONV_OK = 0 |
No bugs detected in iconv. | |
const | ICONV_TRUNCATES = 1 |
Iconv truncates output if converting from UTF-8 to another character set with //IGNORE, and a non-encodable character is found. | |
const | ICONV_UNUSABLE = 2 |
Iconv does not support //IGNORE, making it unusable for transcoding purposes. |
Private Member Functions | |
__construct () | |
Constructor throws fatal error if you attempt to instantiate class. |
A UTF-8 specific character encoder that handles cleaning and transforming.
Definition at line 7 of file Encoder.php.
|
private |
Constructor throws fatal error if you attempt to instantiate class.
Definition at line 13 of file Encoder.php.
|
static |
Cleans a UTF-8 string for well-formedness and SGML validity.
It will parse according to UTF-8 and return a valid UTF8 string, with non-SGML codepoints excluded.
string | $str | The string to clean |
bool | $force_php |
Definition at line 127 of file Encoder.php.
Referenced by HTMLPurifier_Printer\escape(), HTMLPurifier_AttrDef\expandCSSEscape(), and HTMLPurifier_Lexer\normalize().
|
static |
Converts a string from UTF-8 based on configuration.
string | $str | The string to convert |
HTMLPurifier_Config | $config | |
HTMLPurifier_Context | $context |
Definition at line 420 of file Encoder.php.
References convertToASCIIDumbLossless(), iconv(), iconvAvailable(), and testEncodingSupportsASCII().
Referenced by HTMLPurifier\purify().
|
static |
Lossless (character-wise) conversion of HTML to ASCII.
string | $str | UTF-8 string to be converted to ASCII |
Definition at line 474 of file Encoder.php.
References $result.
Referenced by convertFromUTF8().
|
static |
Convert a string to UTF-8 based on configuration.
string | $str | The string to convert |
HTMLPurifier_Config | $config | |
HTMLPurifier_Context | $context |
Definition at line 372 of file Encoder.php.
References iconvAvailable(), testIconvTruncateBug(), and unsafeIconv().
Referenced by HTMLPurifier\purify().
|
static |
iconv wrapper which mutes errors and works around bugs.
string | $in | Input encoding |
string | $out | Output encoding |
string | $text | The text to convert |
int | $max_chunk_size |
Definition at line 48 of file Encoder.php.
References $in, $out, testIconvTruncateBug(), and unsafeIconv().
Referenced by convertFromUTF8(), and unsafeIconv().
|
static |
Definition at line 356 of file Encoder.php.
References ICONV_UNUSABLE, and testIconvTruncateBug().
Referenced by convertFromUTF8(), and convertToUTF8().
|
static |
Error-handler that mutes errors, alternative to shut-up operator.
Definition at line 21 of file Encoder.php.
|
static |
This expensive function tests whether or not a given character encoding supports ASCII.
7/8-bit encodings like Shift_JIS will fail this test, and require special processing. Variable width encodings shouldn't ever fail.
string | $encoding | Encoding name to test, as per iconv format |
bool | $bypass | Whether or not to bypass the precompiled arrays. |
Definition at line 565 of file Encoder.php.
References $ret, and unsafeIconv().
Referenced by convertFromUTF8().
|
static |
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly.
In particular, rather than ignore characters, it will return an EILSEQ after consuming some number of characters, and expect you to restart iconv as if it were an E2BIG. Old versions of PHP did not respect the errno, and returned the fragment, so as a result you would see iconv mysteriously truncating output. We can work around this by manually chopping our input into segments of about 8000 characters, as long as PHP ignores the error code. If PHP starts paying attention to the error code, iconv becomes unusable.
Definition at line 531 of file Encoder.php.
References ICONV_OK, ICONV_TRUNCATES, ICONV_UNUSABLE, and unsafeIconv().
Referenced by convertToUTF8(), iconv(), and iconvAvailable().
|
static |
Translates a Unicode codepoint into its corresponding UTF-8 character.
Definition at line 309 of file Encoder.php.
References $ret.
Referenced by HTMLPurifier_AttrDef\expandCSSEscape(), and HTMLPurifier_EntityParser\nonSpecialEntityCallback().
|
static |
iconv wrapper which mutes errors, but doesn't work around bugs.
string | $in | Input encoding |
string | $out | Output encoding |
string | $text | The text to convert |
Definition at line 32 of file Encoder.php.
References $in, $out, and iconv().
Referenced by convertToUTF8(), iconv(), testEncodingSupportsASCII(), and testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_OK = 0 |
No bugs detected in iconv.
Definition at line 507 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_TRUNCATES = 1 |
Iconv truncates output if converting from UTF-8 to another character set with //IGNORE, and a non-encodable character is found.
Definition at line 511 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_UNUSABLE = 2 |
Iconv does not support //IGNORE, making it unusable for transcoding purposes.
Definition at line 515 of file Encoder.php.
Referenced by iconvAvailable(), and testIconvTruncateBug().