Definition at line 45 of file CleanUpTest.php.
◆ doTestBytes()
CleanUpTest::doTestBytes ( $head, $tail )
Todo: document
Definition at line 141 of file CleanUpTest.php.
References $i, $tail, $x, UtfNormal\cleanUp(), and UTF8_REPLACEMENT.
Referenced by testAllBytes().
143 for ($i = 0x0; $i < 256; $i++) {
144 $char = $head . chr($i) . $tail;
146 $x = sprintf("%02X", $i);
150 ($i > 0x001f && $i < 0x80)) {
154 "ASCII byte $x should be intact"
156 if ($char != $clean) {
164 "Forbidden byte $x should be rejected"
166 if ($norm != $clean) {
static cleanUp($string)
The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C...
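The fragments above suggest that each single byte is either expected to survive unchanged or to be replaced by U+FFFD. A minimal sketch of that expectation follows. The helper name `sketchExpectedSingleByte` and the constant `SKETCH_REPLACEMENT` are hypothetical and not part of CleanUpTest.php. The treatment of tab, line feed and carriage return as allowed control characters is an assumption, because the listing above omits that part of the condition.

```php
<?php
// Hypothetical helper for illustration; not part of CleanUpTest or UtfNormal.
const SKETCH_REPLACEMENT = "\xef\xbf\xbd"; // UTF-8 encoding of U+FFFD

function sketchExpectedSingleByte(int $i): string {
    // Assumption: tab, line feed and carriage return are the allowed controls.
    $allowedControl = ($i === 0x09 || $i === 0x0a || $i === 0x0d);
    // Range taken from the fragment at source line 150.
    $printableAscii = ($i > 0x1f && $i < 0x80);
    return ($allowedControl || $printableAscii) ? chr($i) : SKETCH_REPLACEMENT;
}

// Print the expected bytes (as hex) for a few sample inputs.
foreach ([0x00, 0x1a, 0x41, 0x80, 0xff] as $i) {
    printf("%02X -> %s\n", $i, bin2hex(sketchExpectedSingleByte($i)));
}
```

Bytes at or above 0x80 are never valid on their own in UTF-8, which is why the sketch maps all of them to the replacement character.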
◆ doTestDoubleBytes()
CleanUpTest::doTestDoubleBytes ( $head, $tail )
Todo: document
Definition at line 185 of file CleanUpTest.php.
References $tail, $x, UtfNormal\cleanUp(), UtfNormal\NFC(), and UTF8_REPLACEMENT.
Referenced by testDoubleBytes().
187 for ($first = 0xc0; $first < 0x100; $first++) {
188 for ($second = 0x80; $second < 0x100; $second++) {
189 $char = $head . chr($first) . chr($second) . $tail;
191 $x = sprintf("%02X,%02X", $first, $second);
199 "Pair $x should be intact"
201 if ($norm != $clean) {
204 } elseif ($first > 0xfd || $second > 0xbf) {
205 # fe and ff are not legal head bytes -- expect two replacement chars
210 "Forbidden pair $x should be rejected"
212 if ($norm != $clean) {
220 "Forbidden pair $x should be rejected"
222 if ($norm != $clean) {
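The condition that decides whether a pair "should be intact" is not visible in the listing above. The sketch below shows one plausible form of it, based on the UTF-8 rules for two-byte sequences and on the condition labelled "Valid 2-byte" in doTestTripleBytes(). The function name `sketchIsValidTwoByte` is hypothetical and the code is not taken from CleanUpTest.php.

```php
<?php
// Hypothetical helper for illustration; not taken from CleanUpTest.php.
function sketchIsValidTwoByte(int $first, int $second): bool {
    // Lead bytes 0xc0 and 0xc1 could only encode overlong forms of ASCII,
    // and 0xe0 and above start longer sequences.
    $validLead = ($first > 0xc1 && $first < 0xe0);
    // A continuation byte lies in the range 0x80..0xbf.
    $validTail = ($second >= 0x80 && $second < 0xc0);
    return $validLead && $validTail;
}

var_dump(sketchIsValidTwoByte(0xc3, 0xa9)); // the bytes of U+00E9 used in testLatin()
var_dump(sketchIsValidTwoByte(0xc1, 0xa6)); // the overlong pair from testOverlongRegression()
```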
◆ doTestTripleBytes()
CleanUpTest::doTestTripleBytes ( $head, $tail )
Todo: document
Definition at line 240 of file CleanUpTest.php.
References $tail, $x, UtfNormal\cleanUp(), UtfNormal\NFC(), UTF8_REPLACEMENT, and UTF8_SURROGATE_FIRST.
Referenced by testTripleBytes().
242 for ($first = 0xc0; $first < 0x100; $first++) {
243 for ($second = 0x80; $second < 0x100; $second++) {
244 #for( $third = 0x80; $third < 0x100; $third++ ) {
245 for ($third = 0x80; $third < 0x81; $third++) {
246 $char = $head . chr($first) . chr($second) . chr($third) . $tail;
248 $x = sprintf("%02X,%02X,%02X", $first, $second, $third);
249 if ($first >= 0xe0 &&
253 if ($first == 0xe0 && $second < 0xa0) {
257 "Overlong triplet $x should be rejected"
259 } elseif ($first == 0xed &&
264 "Surrogate triplet $x should be rejected"
270 "Triplet $x should be intact"
273 } elseif ($first > 0xc1 && $first < 0xe0 && $second < 0xc0) {
277 "Valid 2-byte $x + broken tail"
279 } elseif ($second > 0xc1 && $second < 0xe0 && $third < 0xc0) {
283 "Broken head + valid 2-byte $x"
285 } elseif (($first > 0xfd || $second > 0xfd) &&
286 (($second > 0xbf && $third > 0xbf) ||
287 ($second < 0xc0 && $third < 0xc0) ||
290 # fe and ff are not legal head bytes -- expect three replacement chars
294 "Forbidden triplet $x should be rejected"
296 } elseif ($first > 0xc2 && $second < 0xc0 && $third < 0xc0) {
300 "Forbidden triplet $x should be rejected"
306 "Forbidden triplet $x should be rejected"
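The triplet test distinguishes overlong, surrogate and intact three-byte sequences. The sketch below illustrates those categories using the standard UTF-8 byte ranges. The function `sketchClassifyTriplet` and its return labels are hypothetical. The surrogate check (`$second >= 0xa0`) is an assumption, since the listing above cuts that condition off. The noncharacter branch reflects the U+FFFE and U+FFFF cases marked illegal in testBomRegression() and testForbiddenRegression(). Normalization is ignored, so "intact" here only means the bytes form a well-formed, permitted sequence.

```php
<?php
// Hypothetical classifier for illustration; not the logic of doTestTripleBytes().
function sketchClassifyTriplet(int $first, int $second, int $third): string {
    // A continuation byte lies in the range 0x80..0xbf.
    $isTail = function (int $b): bool {
        return $b >= 0x80 && $b < 0xc0;
    };
    // Three-byte sequences start with 0xe0..0xef and need two continuation bytes.
    if ($first < 0xe0 || $first > 0xef || !$isTail($second) || !$isTail($third)) {
        return 'malformed';
    }
    if ($first === 0xe0 && $second < 0xa0) {
        return 'overlong';      // would encode a code point below U+0800
    }
    if ($first === 0xed && $second >= 0xa0) {
        return 'surrogate';     // U+D800..U+DFFF
    }
    if ($first === 0xef && $second === 0xbf && $third >= 0xbe) {
        return 'noncharacter';  // U+FFFE and U+FFFF
    }
    return 'intact';
}

echo sketchClassifyTriplet(0xed, 0xb4, 0x96), "\n"; // bytes from testSurrogateRegression()
echo sketchClassifyTriplet(0xed, 0x9c, 0xaf), "\n"; // Hangul bytes from testHangulRegression()
```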
◆ setUp()
◆ tearDown()
CleanUpTest::tearDown ( )
◆ testAllBytes()
CleanUpTest::testAllBytes ( )
◆ testAscii()
CleanUpTest::testAscii ( )
Todo: document
Definition at line 58 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
60 $text = 'This is plain ASCII text.';
◆ testBomRegression()
CleanUpTest::testBomRegression ( )
Todo: document
Definition at line 421 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
423 $text = "\xef\xbf\xbe" . # U+FFFE, illegal char
427 $expect = "\xef\xbf\xbd" .
◆ testChunkRegression()
CleanUpTest::testChunkRegression ( )
Todo: document
Definition at line 315 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
317 # Check for regression against a chunking bug
318 $text = "\x46\x55\xb8" .
325 $expect = "\x46\x55\xef\xbf\xbd" .
◆ testDoubleBytes()
CleanUpTest::testDoubleBytes ( )
◆ testForbiddenRegression()
CleanUpTest::testForbiddenRegression ( )
Todo: document
Definition at line 438 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
440 $text = "\xef\xbf\xbf"; # U+FFFF, illegal char
441 $expect = "\xef\xbf\xbd";
◆ testHangulRegression()
CleanUpTest::testHangulRegression ( )
Todo: document
Definition at line 449 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
451 $text = "\xed\x9c\xaf" . # Hangul char
452 "\xe1\x87\x81"; # followed by another final jamo
453 $expect = $text; # Should *not* change.
◆ testInterposeRegression()
CleanUpTest::testInterposeRegression ( )
Todo: document
Definition at line 340 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
356 $expect = "\x4e\x30" .
◆ testLatin()
CleanUpTest::testLatin ( )
Todo: document
Definition at line 76 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
78 $text = "L'\xc3\xa9cole";
◆ testLatinNormal()
CleanUpTest::testLatinNormal ( )
Todo: document
Definition at line 83 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
85 $text = "L'e\xcc\x81cole";
86 $expect = "L'\xc3\xa9cole";
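The input and expected strings in testLatinNormal() differ only in how the accented letter is encoded: a base letter plus a combining accent versus one precomposed character. The sketch below prints both byte sequences to make the difference visible. It does not call UtfNormal. The final check uses PHP's `Normalizer` class from the optional intl extension, which is an assumption about the environment and is skipped when the extension is missing.

```php
<?php
$decomposed = "L'e\xcc\x81cole"; // "e" followed by U+0301 COMBINING ACUTE ACCENT
$composed   = "L'\xc3\xa9cole";  // precomposed U+00E9

// Show length and raw bytes of each form.
printf("%d bytes: %s\n", strlen($decomposed), bin2hex($decomposed));
printf("%d bytes: %s\n", strlen($composed), bin2hex($composed));

// Assumption: the intl extension is installed; otherwise this check is skipped.
if (class_exists('Normalizer')) {
    var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed);
}
```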
◆ testNull()
CleanUpTest::testNull ( )
Todo: document
Definition at line 65 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
67 $text = "a \x00 null";
68 $expect = "a \xef\xbf\xbd null";
◆ testOverlongRegression()
CleanUpTest::testOverlongRegression ( )
Todo: document
Definition at line 377 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
380 "\x1a" . # forbidden ascii
382 "\xc1\xa6" . # overlong sequence
384 "\x1c" . # forbidden ascii
◆ testSurrogateRegression()
CleanUpTest::testSurrogateRegression ( )
Todo: document
Definition at line 404 of file CleanUpTest.php.
References $text, and UtfNormal\cleanUp().
406 $text = "\xed\xb4\x96" . # surrogate 0xDD16
410 $expect = "\xef\xbf\xbd" .
◆ testTripleBytes()
CleanUpTest::testTripleBytes ( )
◆ XtestAllChars()
CleanUpTest::XtestAllChars ( )
This test is very expensive!
Todo: document
Definition at line 94 of file CleanUpTest.php.
References $i, $utfCanonicalComp, $utfCanonicalDecomp, $x, UtfNormal\cleanUp(), codepointToUtf8(), UtfNormal\NFC(), UNICODE_MAX, UNICODE_SURROGATE_FIRST, UNICODE_SURROGATE_LAST, and UTF8_REPLACEMENT.
101 $x = sprintf("%04X", $i);
102 if ($i % 0x1000 == 0) {
110 ($i > 0xffff && $i <= UNICODE_MAX)) {
111 if (isset($utfCanonicalComp[$char]) || isset($utfCanonicalDecomp[$char])) {
116 "U+$x should be decomposed"
122 "U+$x should be intact"
126 $this->assertEquals(bin2hex($rep), bin2hex($clean), $x);
codepointToUtf8($codepoint)
Return UTF-8 sequence for a given Unicode code point.
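To illustrate what "UTF-8 sequence for a given Unicode code point" means, here is a minimal sketch of the standard encoding rules. The function `sketchCodepointToUtf8` is a hypothetical stand-in. The real codepointToUtf8() is not shown on this page and may differ, for example in how it handles out-of-range input. The sketch deliberately does not reject surrogates, so it can reproduce the ill-formed bytes used as test input in testSurrogateRegression().

```php
<?php
// Hypothetical stand-in for illustration; not the library's codepointToUtf8().
function sketchCodepointToUtf8(int $cp): string {
    if ($cp < 0 || $cp > 0x10ffff) {
        throw new InvalidArgumentException("Code point out of range: $cp");
    }
    if ($cp < 0x80) {
        return chr($cp);                           // 1 byte: 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xc0 | ($cp >> 6))              // 2 bytes: 110xxxxx 10xxxxxx
            . chr(0x80 | ($cp & 0x3f));
    }
    if ($cp < 0x10000) {
        return chr(0xe0 | ($cp >> 12))             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            . chr(0x80 | (($cp >> 6) & 0x3f))
            . chr(0x80 | ($cp & 0x3f));
    }
    return chr(0xf0 | ($cp >> 18))                 // 4 bytes: 11110xxx followed by three 10xxxxxx
        . chr(0x80 | (($cp >> 12) & 0x3f))
        . chr(0x80 | (($cp >> 6) & 0x3f))
        . chr(0x80 | ($cp & 0x3f));
}

echo bin2hex(sketchCodepointToUtf8(0xfffd)), "\n"; // the replacement character U+FFFD
echo bin2hex(sketchCodepointToUtf8(0xdd16)), "\n"; // the surrogate from testSurrogateRegression()
```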
The documentation for this class was generated from the following file: