Public Member Functions
	setUp ()

	tearDown ()

	testAscii ()

	testNull ()

	testLatin ()

	testLatinNormal ()

	XtestAllChars ()
	This test is very expensive!

	testAllBytes ()

	doTestBytes ($head, $tail)

	testDoubleBytes ()

	doTestDoubleBytes ($head, $tail)

	testTripleBytes ()

	doTestTripleBytes ($head, $tail)

	testChunkRegression ()

	testInterposeRegression ()

	testOverlongRegression ()

	testSurrogateRegression ()

	testBomRegression ()

	testForbiddenRegression ()

	testHangulRegression ()

Detailed Description

Definition at line 45 of file CleanUpTest.php.

Member Function Documentation

◆ doTestBytes()

CleanUpTest::doTestBytes ( $head, $tail )

Todo: document

Definition at line 141 of file CleanUpTest.php.

References $i, $tail, $x, UtfNormal\cleanUp(), and UTF8_REPLACEMENT.

Referenced by testAllBytes().

     {
         for ($i = 0x0; $i < 256; $i++) {
             $char = $head . chr($i) . $tail;
             $clean = UtfNormal::cleanUp($char);
             $x = sprintf("%02X", $i);
             if ($i == 0x0009 ||
                 $i == 0x000a ||
                 $i == 0x000d ||
                 ($i > 0x001f && $i < 0x80)) {
                 $this->assertEquals(
                     bin2hex($char),
                     bin2hex($clean),
                     "ASCII byte $x should be intact"
                 );
                 if ($char != $clean) {
                     return;
                 }
             } else {
                 $norm = $head . UTF8_REPLACEMENT . $tail;
                 $this->assertEquals(
                     bin2hex($norm),
                     bin2hex($clean),
                     "Forbidden byte $x should be rejected"
                 );
                 if ($norm != $clean) {
                     return;
                 }
             }
         }
     }
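
The branch condition in doTestBytes() encodes which single bytes cleanUp() must pass through unchanged: tab, LF, CR and 0x20 to 0x7F. Every other value, including a lone byte of 0x80 or above, is expected to become UTF8_REPLACEMENT. As a minimal sketch of that rule only, here it is as a standalone predicate. The function name isPassThroughByte() is hypothetical and not part of UtfNormal.

```php
<?php
// Hypothetical helper mirroring the rule checked by doTestBytes():
// a lone byte survives cleanUp() only if it is tab, LF, CR or 0x20..0x7F.
// Every other value (C0 controls, and any byte >= 0x80 standing alone)
// is expected to become U+FFFD (UTF8_REPLACEMENT).
function isPassThroughByte(int $b): bool
{
    return $b === 0x09
        || $b === 0x0a
        || $b === 0x0d
        || ($b > 0x1f && $b < 0x80);
}

// Count how many of the 256 byte values are expected to pass through.
$kept = 0;
for ($i = 0; $i < 256; $i++) {
    if (isPassThroughByte($i)) {
        $kept++;
    }
}
echo "$kept of 256 byte values pass through\n"; // prints "99 of 256 byte values pass through"
```

The count is 3 whitespace controls plus the 96 values from 0x20 to 0x7F; note that DEL (0x7F) is kept.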

◆ doTestDoubleBytes()

CleanUpTest::doTestDoubleBytes ( $head, $tail )

Todo: document

Definition at line 185 of file CleanUpTest.php.

References $tail, $x, UtfNormal\cleanUp(), UtfNormal\NFC(), and UTF8_REPLACEMENT.

Referenced by testDoubleBytes().

     {
         for ($first = 0xc0; $first < 0x100; $first++) {
             for ($second = 0x80; $second < 0x100; $second++) {
                 $char = $head . chr($first) . chr($second) . $tail;
                 $clean = UtfNormal::cleanUp($char);
                 $x = sprintf("%02X,%02X", $first, $second);
                 if ($first > 0xc1 &&
                     $first < 0xe0 &&
                     $second < 0xc0) {
                     $norm = UtfNormal::NFC($char);
                     $this->assertEquals(
                         bin2hex($norm),
                         bin2hex($clean),
                         "Pair $x should be intact"
                     );
                     if ($norm != $clean) {
                         return;
                     }
                 } elseif ($first > 0xfd || $second > 0xbf) {
                     # fe and ff are not legal head bytes -- expect two replacement chars
                     $norm = $head . UTF8_REPLACEMENT . UTF8_REPLACEMENT . $tail;
                     $this->assertEquals(
                         bin2hex($norm),
                         bin2hex($clean),
                         "Forbidden pair $x should be rejected"
                     );
                     if ($norm != $clean) {
                         return;
                     }
                 } else {
                     $norm = $head . UTF8_REPLACEMENT . $tail;
                     $this->assertEquals(
                         bin2hex($norm),
                         bin2hex($clean),
                         "Forbidden pair $x should be rejected"
                     );
                     if ($norm != $clean) {
                         return;
                     }
                 }
             }
         }
     }
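
doTestDoubleBytes() accepts a pair as intact only when the lead byte is strictly between 0xC1 and 0xE0 and the second byte is a continuation byte (below 0xC0). The lower bound exists because the leads 0xC0 and 0xC1 can only produce overlong encodings of ASCII code points. A small decoder makes this visible; decodeTwoByte() is a hypothetical helper for illustration, not a UtfNormal function.

```php
<?php
// Hypothetical helper: decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy)
// to its code point, or return null if the bytes do not have that shape.
function decodeTwoByte(int $first, int $second): ?int
{
    if (($first & 0xe0) !== 0xc0 || ($second & 0xc0) !== 0x80) {
        return null;
    }
    return (($first & 0x1f) << 6) | ($second & 0x3f);
}

// Leads 0xC0 and 0xC1 always decode below U+0080, i.e. to code points that
// already have a one-byte form. Such overlong pairs must be rejected.
foreach ([[0xc0, 0x80], [0xc1, 0xbf], [0xc2, 0x80], [0xdf, 0xbf]] as [$a, $b]) {
    $cp = decodeTwoByte($a, $b);
    printf("%02X %02X -> U+%04X%s\n", $a, $b, $cp, $cp < 0x80 ? " (overlong)" : "");
}
// C0 80 -> U+0000 (overlong)
// C1 BF -> U+007F (overlong)
// C2 80 -> U+0080
// DF BF -> U+07FF
```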

◆ doTestTripleBytes()

CleanUpTest::doTestTripleBytes ( $head, $tail )

Todo: document

Definition at line 240 of file CleanUpTest.php.

References $tail, $x, UtfNormal\cleanUp(), UtfNormal\NFC(), UTF8_REPLACEMENT, and UTF8_SURROGATE_FIRST.

Referenced by testTripleBytes().

     {
         for ($first = 0xc0; $first < 0x100; $first++) {
             for ($second = 0x80; $second < 0x100; $second++) {
                 #for( $third = 0x80; $third < 0x100; $third++ ) {
                 for ($third = 0x80; $third < 0x81; $third++) {
                     $char = $head . chr($first) . chr($second) . chr($third) . $tail;
                     $clean = UtfNormal::cleanUp($char);
                     $x = sprintf("%02X,%02X,%02X", $first, $second, $third);
                     if ($first >= 0xe0 &&
                         $first < 0xf0 &&
                         $second < 0xc0 &&
                         $third < 0xc0) {
                         if ($first == 0xe0 && $second < 0xa0) {
                             $this->assertEquals(
                                 bin2hex($head . UTF8_REPLACEMENT . $tail),
                                 bin2hex($clean),
                                 "Overlong triplet $x should be rejected"
                             );
                         } elseif ($first == 0xed &&
                             (chr($first) . chr($second) . chr($third)) >= UTF8_SURROGATE_FIRST) {
                             $this->assertEquals(
                                 bin2hex($head . UTF8_REPLACEMENT . $tail),
                                 bin2hex($clean),
                                 "Surrogate triplet $x should be rejected"
                             );
                         } else {
                             $this->assertEquals(
                                 bin2hex(UtfNormal::NFC($char)),
                                 bin2hex($clean),
                                 "Triplet $x should be intact"
                             );
                         }
                     } elseif ($first > 0xc1 && $first < 0xe0 && $second < 0xc0) {
                         $this->assertEquals(
                             bin2hex(UtfNormal::NFC($head . chr($first) . chr($second)) . UTF8_REPLACEMENT . $tail),
                             bin2hex($clean),
                             "Valid 2-byte $x + broken tail"
                         );
                     } elseif ($second > 0xc1 && $second < 0xe0 && $third < 0xc0) {
                         $this->assertEquals(
                             bin2hex($head . UTF8_REPLACEMENT . UtfNormal::NFC(chr($second) . chr($third) . $tail)),
                             bin2hex($clean),
                             "Broken head + valid 2-byte $x"
                         );
                     } elseif (($first > 0xfd || $second > 0xfd) &&
                                 (($second > 0xbf && $third > 0xbf) ||
                                   ($second < 0xc0 && $third < 0xc0) ||
                                   ($second > 0xfd) ||
                                   ($third > 0xfd))) {
                         # fe and ff are not legal head bytes -- expect three replacement chars
                         $this->assertEquals(
                             bin2hex($head . UTF8_REPLACEMENT . UTF8_REPLACEMENT . UTF8_REPLACEMENT . $tail),
                             bin2hex($clean),
                             "Forbidden triplet $x should be rejected"
                         );
                     } elseif ($first > 0xc2 && $second < 0xc0 && $third < 0xc0) {
                         $this->assertEquals(
                             bin2hex($head . UTF8_REPLACEMENT . $tail),
                             bin2hex($clean),
                             "Forbidden triplet $x should be rejected"
                         );
                     } else {
                         $this->assertEquals(
                             bin2hex($head . UTF8_REPLACEMENT . UTF8_REPLACEMENT . $tail),
                             bin2hex($clean),
                             "Forbidden triplet $x should be rejected"
                         );
                     }
                 }
             }
         }
     }
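
Inside its well-formed three-byte branch (lead 0xE0 to 0xEF followed by two continuation bytes), doTestTripleBytes() expects two kinds of rejection: overlong forms, which all start E0 80..9F, and UTF-16 surrogates U+D800 to U+DFFF, which all start ED A0..BF. The test detects surrogates by comparing raw bytes against UTF8_SURROGATE_FIRST. This sketch assumes that constant holds the encoding of U+D800 ("\xed\xa0\x80") and reaches the same verdict by decoding the code point instead. classifyThreeByte() is a hypothetical name.

```php
<?php
// Hypothetical classifier for a three-byte sequence 1110xxxx 10yyyyyy 10zzzzzz,
// mirroring the branches of doTestTripleBytes(): 'overlong', 'surrogate',
// 'valid', or 'malformed' when the bytes do not have the three-byte shape.
function classifyThreeByte(int $first, int $second, int $third): string
{
    if (($first & 0xf0) !== 0xe0
        || ($second & 0xc0) !== 0x80
        || ($third & 0xc0) !== 0x80) {
        return 'malformed';
    }
    $cp = (($first & 0x0f) << 12) | (($second & 0x3f) << 6) | ($third & 0x3f);
    if ($cp < 0x800) {
        return 'overlong';   // only E0 80..9F xx lands here
    }
    if ($cp >= 0xd800 && $cp <= 0xdfff) {
        return 'surrogate';  // only ED A0..BF xx lands here
    }
    return 'valid';
}

echo classifyThreeByte(0xe0, 0x9f, 0x80), "\n"; // overlong (U+07C0)
echo classifyThreeByte(0xe0, 0xa0, 0x80), "\n"; // valid (U+0800)
echo classifyThreeByte(0xed, 0x9f, 0x80), "\n"; // valid (U+D7C0)
echo classifyThreeByte(0xed, 0xa0, 0x80), "\n"; // surrogate (U+D800)
```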

◆ setUp()

CleanUpTest::setUp ( )

Todo: document

Definition at line 48 of file CleanUpTest.php.

     {
     }

◆ tearDown()

CleanUpTest::tearDown ( )

Todo: document

Definition at line 53 of file CleanUpTest.php.

     {
     }

◆ testAllBytes()

CleanUpTest::testAllBytes ( )

Todo: document

Definition at line 132 of file CleanUpTest.php.

References doTestBytes().

     {
         $this->doTestBytes('', '');
         $this->doTestBytes('x', '');
         $this->doTestBytes('', 'x');
         $this->doTestBytes('x', 'x');
     }

◆ testAscii()

CleanUpTest::testAscii ( )

Todo: document

Definition at line 58 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = 'This is plain ASCII text.';
         $this->assertEquals($text, UtfNormal::cleanUp($text));
     }

◆ testBomRegression()

CleanUpTest::testBomRegression ( )

Todo: document

Definition at line 421 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "\xef\xbf\xbe" . # U+FFFE, illegal char
                   "\xb2" . # bad tail
                   "\xef" . # bad head
                   "\x59";
         $expect = "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\x59";
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }
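
The test name refers to the first character of its input, U+FFFE. That value is what the byte order mark U+FEFF turns into when UTF-16 bytes are read with the wrong byte order, which is why Unicode reserves it as a noncharacter and why cleanUp() replaces it. A short check with PHP's built-in pack() and unpack() shows the byte swap; the variable names are illustrative only.

```php
<?php
// U+FEFF (the byte order mark) written as a big-endian 16-bit unit: bytes FE FF.
$bytes = pack('n', 0xfeff);
// Reading the same two bytes as a little-endian 16-bit unit gives 0xFFFE,
// the noncharacter whose UTF-8 form EF BF BE opens the testBomRegression() input.
$swapped = unpack('v', $bytes)[1];
printf("%s read little-endian is U+%04X\n", bin2hex($bytes), $swapped);
// prints "feff read little-endian is U+FFFE"
```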

◆ testChunkRegression()

CleanUpTest::testChunkRegression ( )

Todo: document

Definition at line 315 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         # Check for regression against a chunking bug
         $text = "\x46\x55\xb8" .
                   "\xdc\x96" .
                   "\xee" .
                   "\xe7" .
                   "\x44" .
                   "\xaa" .
                   "\x2f\x25";
         $expect = "\x46\x55\xef\xbf\xbd" .
                   "\xdc\x96" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\x44" .
                   "\xef\xbf\xbd" .
                   "\x2f\x25";
 
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }
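
The regression tests in this class feed cleanUp() a byte string and compare hex dumps of the result. To see how an expected string like the one above is derived, the sketch below implements only the validation half of cleanUp() and skips NFC normalization. Its replacement rules are inferred from this class's expectations: each stray continuation byte, lone lead byte, 0xFE/0xFF byte, or truncated sequence becomes one U+FFFD, and so does each complete sequence that is overlong, a surrogate, or U+FFFE/U+FFFF. Treating 0xF8 to 0xFD as leads of 5- and 6-byte sequences, and rejecting code points above U+10FFFF, are assumptions consistent with doTestDoubleBytes() and doTestTripleBytes(). cleanUpSketch() is a hypothetical name and this is not the UtfNormal implementation.

```php
<?php
// Hypothetical sketch of the validation half of UtfNormal::cleanUp(), with no
// NFC step. Rules are inferred from CleanUpTest expectations, not from the library.
const REPLACEMENT_CHAR = "\xef\xbf\xbd"; // U+FFFD, same bytes as UTF8_REPLACEMENT

function cleanUpSketch(string $s): string
{
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            // ASCII: keep tab, LF, CR and 0x20..0x7F; other controls are replaced.
            $ok = $b === 0x09 || $b === 0x0a || $b === 0x0d || $b > 0x1f;
            $out .= $ok ? $s[$i] : REPLACEMENT_CHAR;
            $i++;
            continue;
        }
        // Sequence length announced by the lead byte; 0 = cannot start a sequence.
        if ($b < 0xc0 || $b > 0xfd) {
            $need = 0;          // stray continuation byte, or 0xFE / 0xFF
        } elseif ($b < 0xe0) {
            $need = 2;
        } elseif ($b < 0xf0) {
            $need = 3;
        } elseif ($b < 0xf8) {
            $need = 4;
        } elseif ($b < 0xfc) {
            $need = 5;          // assumption: obsolete 5/6-byte forms are consumed whole
        } else {
            $need = 6;
        }
        if ($need === 0) {
            $out .= REPLACEMENT_CHAR;
            $i++;
            continue;
        }
        // Count the continuation bytes (10xxxxxx) that actually follow.
        $n = 1;
        while ($n < $need && $i + $n < $len && (ord($s[$i + $n]) & 0xc0) === 0x80) {
            $n++;
        }
        if ($n < $need) {
            // Truncated: the lead byte and its partial tail become one U+FFFD.
            $out .= REPLACEMENT_CHAR;
            $i += $n;
            continue;
        }
        // Decode the complete sequence and reject forbidden code points.
        $cp = $b & (0x7f >> $need);
        for ($k = 1; $k < $need; $k++) {
            $cp = ($cp << 6) | (ord($s[$i + $k]) & 0x3f);
        }
        $shortest = [2 => 0x80, 3 => 0x800, 4 => 0x10000, 5 => 0x200000, 6 => 0x4000000];
        $ok = $cp >= $shortest[$need]                 // not overlong
            && $cp <= 0x10ffff                        // inside the Unicode range
            && ($cp < 0xd800 || $cp > 0xdfff)         // not a UTF-16 surrogate
            && $cp !== 0xfffe && $cp !== 0xffff;      // not U+FFFE / U+FFFF
        $out .= $ok ? substr($s, $i, $need) : REPLACEMENT_CHAR;
        $i += $need;
    }
    return $out;
}

// Input and expected output of testChunkRegression().
$text = "\x46\x55\xb8\xdc\x96\xee\xe7\x44\xaa\x2f\x25";
$expect = "\x46\x55\xef\xbf\xbd\xdc\x96\xef\xbf\xbd\xef\xbf\xbd\x44\xef\xbf\xbd\x2f\x25";
echo bin2hex(cleanUpSketch($text)) === bin2hex($expect) ? "match\n" : "mismatch\n"; // prints "match"
```

Worked by hand, the same function also yields the expected strings of testBomRegression(), testInterposeRegression(), testOverlongRegression(), testSurrogateRegression(), testNull() and testForbiddenRegression(), which makes it a convenient way to reason about new regression vectors before adding them.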

◆ testDoubleBytes()

CleanUpTest::testDoubleBytes ( )

Todo: document

Definition at line 174 of file CleanUpTest.php.

References doTestDoubleBytes().

     {
         $this->doTestDoubleBytes('', '');
         $this->doTestDoubleBytes('x', '');
         $this->doTestDoubleBytes('', 'x');
         $this->doTestDoubleBytes('x', 'x');
     }

◆ testForbiddenRegression()

CleanUpTest::testForbiddenRegression ( )

Todo: document

Definition at line 438 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "\xef\xbf\xbf"; # U+FFFF, illegal char
         $expect = "\xef\xbf\xbd";
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }

◆ testHangulRegression()

CleanUpTest::testHangulRegression ( )

Todo: document

Definition at line 449 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "\xed\x9c\xaf" . # Hangul char
                 "\xe1\x87\x81";  # followed by another final jamo
         $expect = $text;         # Should *not* change.
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }
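
The input here is U+D72F (bytes ED 9C AF), a precomposed Hangul syllable, followed by U+11C1 (bytes E1 87 81), a trailing consonant jamo. Under the Hangul composition rules in the Unicode Standard (section 3.12, Conjoining Jamo Behavior), a trailing jamo composes only with an LV syllable, i.e. one whose offset from U+AC00 is a multiple of 28. U+D72F already carries a final consonant, so NFC must leave the pair unchanged. The arithmetic as a sketch; hangulComposesWithTrailing() is a hypothetical helper.

```php
<?php
// Constants from the Hangul syllable composition algorithm (Unicode Standard, 3.12).
const S_BASE = 0xAC00;
const T_BASE = 0x11A7;
const T_COUNT = 28;
const S_COUNT = 11172;

// Hypothetical helper: can syllable $s canonically compose with trailing jamo $t?
function hangulComposesWithTrailing(int $s, int $t): bool
{
    $sIndex = $s - S_BASE;
    $isSyllable = $sIndex >= 0 && $sIndex < S_COUNT;
    $isTrailingJamo = $t > T_BASE && $t < T_BASE + T_COUNT;
    // Only LV syllables (no final consonant yet) accept a trailing jamo.
    return $isSyllable && $isTrailingJamo && $sIndex % T_COUNT === 0;
}

$s = 0xD72F;   // ED 9C AF
$t = 0x11C1;   // E1 87 81
printf("S index %d, T index %d, composes: %s\n",
    $s - S_BASE, ($s - S_BASE) % T_COUNT, hangulComposesWithTrailing($s, $t) ? 'yes' : 'no');
// prints "S index 11055, T index 23, composes: no"
```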

◆ testInterposeRegression()

CleanUpTest::testInterposeRegression ( )

Todo: document

Definition at line 340 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "\x4e\x30" .
                   "\xb1" .              # bad tail
                   "\x3a" .
                   "\x92" .              # bad tail
                   "\x62\x3a" .
                   "\x84" .              # bad tail
                   "\x43" .
                   "\xc6" .              # bad head
                   "\x3f" .
                   "\x92" .              # bad tail
                   "\xad" .              # bad tail
                   "\x7d" .
                   "\xd9\x95";
 
         $expect = "\x4e\x30" .
                   "\xef\xbf\xbd" .
                   "\x3a" .
                   "\xef\xbf\xbd" .
                   "\x62\x3a" .
                   "\xef\xbf\xbd" .
                   "\x43" .
                   "\xef\xbf\xbd" .
                   "\x3f" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\x7d" .
                   "\xd9\x95";
 
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }

◆ testLatin()

CleanUpTest::testLatin ( )

Todo: document

Definition at line 76 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "L'\xc3\xa9cole";
         $this->assertEquals($text, UtfNormal::cleanUp($text));
     }

◆ testLatinNormal()

CleanUpTest::testLatinNormal ( )

Todo: document

Definition at line 83 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "L'e\xcc\x81cole";
         $expect = "L'\xc3\xa9cole";
         $this->assertEquals($expect, UtfNormal::cleanUp($text));
     }
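
testLatinNormal() expects cleanUp() to apply canonical composition: "e" followed by U+0301 COMBINING ACUTE ACCENT (bytes CC 81) becomes the single code point U+00E9 (bytes C3 A9). For comparison only, PHP's intl extension exposes the same NFC transformation through Normalizer::normalize(). This sketch assumes the extension may be absent and checks for it before calling it; it says nothing about how UtfNormal itself is implemented.

```php
<?php
$decomposed = "L'e\xcc\x81cole"; // e + U+0301 COMBINING ACUTE ACCENT
$composed   = "L'\xc3\xa9cole";  // precomposed U+00E9

if (class_exists('Normalizer')) {
    // NFC replaces the base letter + combining mark pair with U+00E9.
    $nfc = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    echo $nfc === $composed ? "NFC matches testLatinNormal()\n" : "NFC differs\n";
} else {
    echo "intl extension not loaded; skipping\n";
}
```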

◆ testNull()

CleanUpTest::testNull ( )

Todo: document

Definition at line 65 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "a \x00 null";
         $expect = "a \xef\xbf\xbd null";
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }

◆ testOverlongRegression()

CleanUpTest::testOverlongRegression ( )

Todo: document

Definition at line 377 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "\x67" .
                   "\x1a" . # forbidden ascii
                   "\xea" . # bad head
                   "\xc1\xa6" . # overlong sequence
                   "\xad" . # bad tail
                   "\x1c" . # forbidden ascii
                   "\xb0" . # bad tail
                   "\x3c" .
                   "\x9e";  # bad tail
         $expect = "\x67" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\x3c" .
                   "\xef\xbf\xbd";
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }

◆ testSurrogateRegression()

CleanUpTest::testSurrogateRegression ( )

Todo: document

Definition at line 404 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

     {
         $text = "\xed\xb4\x96" . # surrogate 0xDD16
                   "\x83" . # bad tail
                   "\xb4" . # bad tail
                   "\xac";  # bad tail
         $expect = "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd" .
                   "\xef\xbf\xbd";
         $this->assertEquals(
             bin2hex($expect),
             bin2hex(UtfNormal::cleanUp($text))
         );
     }

◆ testTripleBytes()

CleanUpTest::testTripleBytes ( )

Todo: document

Definition at line 231 of file CleanUpTest.php.

References doTestTripleBytes().

     {
         $this->doTestTripleBytes('', '');
         $this->doTestTripleBytes('x', '');
         $this->doTestTripleBytes('', 'x');
         $this->doTestTripleBytes('x', 'x');
     }

◆ XtestAllChars()

CleanUpTest::XtestAllChars ( )

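This test is very expensive: it runs cleanUp() on every code point below UNICODE_MAX and prints a progress line every 0x1000 code points. Because its name does not start with "test", PHPUnit does not collect it as a test method, so it only runs when invoked explicitly.

Todo: document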

Definition at line 94 of file CleanUpTest.php.

References $i, $utfCanonicalComp, $utfCanonicalDecomp, $x, UtfNormal\cleanUp(), codepointToUtf8(), UtfNormal\NFC(), UNICODE_MAX, UNICODE_SURROGATE_FIRST, UNICODE_SURROGATE_LAST, and UTF8_REPLACEMENT.

     {
         $rep = UTF8_REPLACEMENT;
         global $utfCanonicalComp, $utfCanonicalDecomp;
         for ($i = 0x0; $i < UNICODE_MAX; $i++) {
             $char = codepointToUtf8($i);
             $clean = UtfNormal::cleanUp($char);
             $x = sprintf("%04X", $i);
             if ($i % 0x1000 == 0) {
                 echo "U+$x\n";
             }
             if ($i == 0x0009 ||
                 $i == 0x000a ||
                 $i == 0x000d ||
                 ($i > 0x001f && $i < UNICODE_SURROGATE_FIRST) ||
                 ($i > UNICODE_SURROGATE_LAST && $i < 0xfffe) ||
                 ($i > 0xffff && $i <= UNICODE_MAX)) {
                 if (isset($utfCanonicalComp[$char]) || isset($utfCanonicalDecomp[$char])) {
                     $comp = UtfNormal::NFC($char);
                     $this->assertEquals(
                         bin2hex($comp),
                         bin2hex($clean),
                         "U+$x should be decomposed"
                     );
                 } else {
                     $this->assertEquals(
                         bin2hex($char),
                         bin2hex($clean),
                         "U+$x should be intact"
                     );
                 }
             } else {
                 $this->assertEquals(bin2hex($rep), bin2hex($clean), $x);
             }
         }
     }
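
XtestAllChars() encodes each code point with codepointToUtf8() before cleaning it. The sketch below is a hypothetical stand-in, not the library function, showing the standard one- to four-byte UTF-8 bit layout. Because the loop above needs ill-formed input to test rejection, the sketch also encodes surrogate code points; that behavior is an assumption about what the test requires.

```php
<?php
// Hypothetical stand-in for codepointToUtf8(): encodes 0..0x10FFFF with the
// standard 1- to 4-byte UTF-8 layout. Surrogates (U+D800..U+DFFF) are encoded
// too, yielding the ill-formed ED A0..BF xx bytes that cleanUp() must reject.
function codepointToUtf8Sketch(int $cp): string
{
    if ($cp < 0 || $cp > 0x10ffff) {
        throw new InvalidArgumentException(sprintf('Out of range: 0x%X', $cp));
    }
    if ($cp < 0x80) {
        return chr($cp);                                   // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xc0 | ($cp >> 6))                      // 110xxxxx
            . chr(0x80 | ($cp & 0x3f));                    // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xe0 | ($cp >> 12))                     // 1110xxxx
            . chr(0x80 | (($cp >> 6) & 0x3f))
            . chr(0x80 | ($cp & 0x3f));
    }
    return chr(0xf0 | ($cp >> 18))                         // 11110xxx
        . chr(0x80 | (($cp >> 12) & 0x3f))
        . chr(0x80 | (($cp >> 6) & 0x3f))
        . chr(0x80 | ($cp & 0x3f));
}

foreach ([0x41, 0xe9, 0xfffd, 0x1f600] as $cp) {
    printf("U+%04X -> %s\n", $cp, bin2hex(codepointToUtf8Sketch($cp)));
}
// U+0041 -> 41
// U+00E9 -> c3a9
// U+FFFD -> efbfbd
// U+1F600 -> f09f9880
```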

The documentation for this class was generated from the following file:

include/Unicode/CleanUpTest.php
