Public Member Functions
	setUp ()

	tearDown ()

	testAscii ()

	testNull ()

	testLatin ()

	testLatinNormal ()

	XtestAllChars ()
	This test is very expensive!

	testAllBytes ()

	doTestBytes ( $head, $tail)

	testDoubleBytes ()

	doTestDoubleBytes ( $head, $tail)

	testTripleBytes ()

	doTestTripleBytes ( $head, $tail)

	testChunkRegression ()

	testInterposeRegression ()

	testOverlongRegression ()

	testSurrogateRegression ()

	testBomRegression ()

	testForbiddenRegression ()

	testHangulRegression ()

Detailed Description

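Unit tests for UtfNormal::cleanUp(). Going by the assertions documented below, cleanUp() is expected to pass well-formed text through (normalized to NFC) and to replace disallowed control characters, as well as malformed or forbidden UTF-8 sequences, with U+FFFD, written as UTF8_REPLACEMENT (the bytes EF BF BD).

Definition at line 45 of file CleanUpTest.php.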

Member Function Documentation

◆ doTestBytes()

CleanUpTest::doTestBytes ( $head, $tail )

Todo: document

Definition at line 127 of file CleanUpTest.php.

References $x, UtfNormal\cleanUp(), and UTF8_REPLACEMENT.

Referenced by testAllBytes().

                                              {
                 for( $i = 0x0; $i < 256; $i++ ) {
                         $char = $head . chr( $i ) . $tail;
                         $clean = UtfNormal::cleanUp( $char );
                         $x = sprintf( "%02X", $i );
                         if( $i == 0x0009 ||
                             $i == 0x000a ||
                             $i == 0x000d ||
                             ($i > 0x001f && $i < 0x80) ) {
                                 $this->assertEquals(
                                         bin2hex( $char ),
                                         bin2hex( $clean ),
                                         "ASCII byte $x should be intact" );
                                 if( $char != $clean ) return;
                         } else {
                                 $norm = $head . UTF8_REPLACEMENT . $tail;
                                 $this->assertEquals(
                                         bin2hex( $norm ),
                                         bin2hex( $clean ),
                                         "Forbidden byte $x should be rejected" );
                                 if( $norm != $clean ) return;
                         }
                 }
         }
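
The if/else in doTestBytes() boils down to a rule for lone bytes. As a minimal sketch, assuming nothing beyond what the listing asserts, the rule can be written as a standalone predicate; sketchLoneByteSurvives() is a hypothetical name, not part of UtfNormal:

```php
<?php
// Hypothetical helper mirroring the condition in doTestBytes():
// a lone byte survives cleanUp() only if it is TAB, LF, CR or in 0x20-0x7F.
// Every other byte (C0 controls, stray continuation bytes, lone head
// bytes, 0xFE/0xFF) is expected to become U+FFFD.
function sketchLoneByteSurvives(int $byte): bool {
    return $byte === 0x09
        || $byte === 0x0a
        || $byte === 0x0d
        || ($byte > 0x1f && $byte < 0x80);
}

// Count how many of the 256 possible byte values the test expects to be kept.
$kept = 0;
for ($i = 0; $i < 256; $i++) {
    if (sketchLoneByteSurvives($i)) {
        $kept++;
    }
}
echo "$kept of 256 byte values are kept\n"; // prints: 99 of 256 byte values are kept
```

So for each of the four head/tail contexts used by testAllBytes(), 99 byte values are expected back unchanged and the other 157 as U+FFFD.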

◆ doTestDoubleBytes()

CleanUpTest::doTestDoubleBytes ( $head, $tail )

Todo: document

Definition at line 163 of file CleanUpTest.php.

References $x, UtfNormal\cleanUp(), UtfNormal\NFC(), and UTF8_REPLACEMENT.

Referenced by testDoubleBytes().

                                                    {
                 for( $first = 0xc0; $first < 0x100; $first++ ) {
                         for( $second = 0x80; $second < 0x100; $second++ ) {
                                 $char = $head . chr( $first ) . chr( $second ) . $tail;
                                 $clean = UtfNormal::cleanUp( $char );
                                 $x = sprintf( "%02X,%02X", $first, $second );
                                 if( $first > 0xc1 &&
                                     $first < 0xe0 &&
                                     $second < 0xc0 ) {
                                         $norm = UtfNormal::NFC( $char );
                                         $this->assertEquals(
                                                 bin2hex( $norm ),
                                                 bin2hex( $clean ),
                                                 "Pair $x should be intact" );
                                         if( $norm != $clean ) return;
                                 } elseif( $first > 0xfd || $second > 0xbf ) {
                                         # fe and ff are not legal head bytes -- expect two replacement chars
                                         $norm = $head . UTF8_REPLACEMENT . UTF8_REPLACEMENT . $tail;
                                         $this->assertEquals(
                                                 bin2hex( $norm ),
                                                 bin2hex( $clean ),
                                                 "Forbidden pair $x should be rejected" );
                                         if( $norm != $clean ) return;
                                 } else {
                                         $norm = $head . UTF8_REPLACEMENT . $tail;
                                         $this->assertEquals(
                                                 bin2hex( $norm ),
                                                 bin2hex( $clean ),
                                                 "Forbidden pair $x should be rejected" );
                                         if( $norm != $clean ) return;
                                 }
                         }
                 }
         }
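
doTestDoubleBytes() sorts every head/second-byte combination into three outcomes. The sketch below restates those branches as a function returning how many U+FFFD characters the test expects for the pair; it is a hypothetical helper that mirrors the listing, not code from UtfNormal:

```php
<?php
// Hypothetical helper: how many U+FFFD characters doTestDoubleBytes()
// expects for the two bytes $first.$second (head 0xC0-0xFF, second byte
// 0x80-0xFF). 0 means the pair is expected to be kept, modulo NFC.
function sketchExpectedReplacements(int $first, int $second): int {
    if ($first > 0xc1 && $first < 0xe0 && $second < 0xc0) {
        return 0; // well-formed 2-byte sequence, U+0080..U+07FF
    }
    if ($first > 0xfd || $second > 0xbf) {
        // 0xFE/0xFF never occur in UTF-8, and a second byte >= 0xC0 is not
        // a continuation byte, so each byte is rejected on its own.
        return 2;
    }
    // 0xC0/0xC1 + continuation is an overlong pair; 0xE0-0xFD + one
    // continuation is a truncated longer sequence. Both bytes go together.
    return 1;
}

printf("C3,A9 -> %d\n", sketchExpectedReplacements(0xc3, 0xa9)); // 0
printf("C0,80 -> %d\n", sketchExpectedReplacements(0xc0, 0x80)); // 1
printf("C3,C3 -> %d\n", sketchExpectedReplacements(0xc3, 0xc3)); // 2
```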

◆ doTestTripleBytes()

CleanUpTest::doTestTripleBytes ( $head, $tail )

Todo: document

Definition at line 207 of file CleanUpTest.php.

References $x, UtfNormal\cleanUp(), UtfNormal\NFC(), UTF8_REPLACEMENT, and UTF8_SURROGATE_FIRST.

Referenced by testTripleBytes().

                                                    {
                 for( $first = 0xc0; $first < 0x100; $first++ ) {
                         for( $second = 0x80; $second < 0x100; $second++ ) {
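                                 # Only 0x80 is tried as the third byte; the full 0x80..0xff loop
                                 # is commented out, presumably to keep the run time manageable.
                                 #for( $third = 0x80; $third < 0x100; $third++ ) {
                                 for( $third = 0x80; $third < 0x81; $third++ ) {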
                                         $char = $head . chr( $first ) . chr( $second ) . chr( $third ) . $tail;
                                         $clean = UtfNormal::cleanUp( $char );
                                         $x = sprintf( "%02X,%02X,%02X", $first, $second, $third );
                                         if( $first >= 0xe0 &&
                                                 $first < 0xf0 &&
                                                 $second < 0xc0 &&
                                                 $third < 0xc0 ) {
                                                 if( $first == 0xe0 && $second < 0xa0 ) {
                                                         $this->assertEquals(
                                                                 bin2hex( $head . UTF8_REPLACEMENT . $tail ),
                                                                 bin2hex( $clean ),
                                                                 "Overlong triplet $x should be rejected" );
                                                 } elseif( $first == 0xed &&
                                                         ( chr( $first ) . chr( $second ) . chr( $third ))  >= UTF8_SURROGATE_FIRST ) {
                                                         $this->assertEquals(
                                                                 bin2hex( $head . UTF8_REPLACEMENT . $tail ),
                                                                 bin2hex( $clean ),
                                                                 "Surrogate triplet $x should be rejected" );
                                                 } else {
                                                         $this->assertEquals(
                                                                 bin2hex( UtfNormal::NFC( $char ) ),
                                                                 bin2hex( $clean ),
                                                                 "Triplet $x should be intact" );
                                                 }
                                         } elseif( $first > 0xc1 && $first < 0xe0 && $second < 0xc0 ) {
                                                 $this->assertEquals(
                                                         bin2hex( UtfNormal::NFC( $head . chr( $first ) . chr( $second ) ) . UTF8_REPLACEMENT . $tail ),
                                                         bin2hex( $clean ),
                                                         "Valid 2-byte $x + broken tail" );
                                         } elseif( $second > 0xc1 && $second < 0xe0 && $third < 0xc0 ) {
                                                 $this->assertEquals(
                                                         bin2hex( $head . UTF8_REPLACEMENT . UtfNormal::NFC( chr( $second ) . chr( $third ) . $tail ) ),
                                                         bin2hex( $clean ),
                                                         "Broken head + valid 2-byte $x" );
                                         } elseif( ( $first > 0xfd || $second > 0xfd ) &&
                                                     ( ( $second > 0xbf && $third > 0xbf ) ||
                                                       ( $second < 0xc0 && $third < 0xc0 ) ||
                                                       ( $second > 0xfd ) ||
                                                       ( $third > 0xfd ) ) ) {
                                                 # fe and ff are not legal head bytes -- expect three replacement chars
                                                 $this->assertEquals(
                                                         bin2hex( $head . UTF8_REPLACEMENT . UTF8_REPLACEMENT . UTF8_REPLACEMENT . $tail ),
                                                         bin2hex( $clean ),
                                                         "Forbidden triplet $x should be rejected" );
                                         } elseif( $first > 0xc2 && $second < 0xc0 && $third < 0xc0 ) {
                                                 $this->assertEquals(
                                                         bin2hex( $head . UTF8_REPLACEMENT . $tail ),
                                                         bin2hex( $clean ),
                                                         "Forbidden triplet $x should be rejected" );
                                         } else {
                                                 $this->assertEquals(
                                                         bin2hex( $head . UTF8_REPLACEMENT . UTF8_REPLACEMENT . $tail ),
                                                         bin2hex( $clean ),
                                                         "Forbidden triplet $x should be rejected" );
                                         }
                                 }
                         }
                 }
         }
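
The overlong and surrogate branches of doTestTripleBytes() are byte-level shortcuts for code point checks. Decoding the triplet makes the link explicit: with head byte 0xE0, a second byte below 0xA0 yields a code point below U+0800 (overlong), and with head byte 0xED, a second byte of 0xA0 or more lands in U+D800..U+DFFF (surrogates). Since byte-wise comparison of equal-length UTF-8 sequences follows code point order, the listing's string comparison against UTF8_SURROGATE_FIRST expresses the same check, assuming that constant holds the UTF-8 encoding of U+D800 as its name suggests. The helpers below are hypothetical and only handle well-formed three-byte patterns:

```php
<?php
// Hypothetical helper: decode a three-byte sequence (head 0xE0-0xEF,
// two continuation bytes) into its code point, with no validity checks.
function sketchDecode3(int $b1, int $b2, int $b3): int {
    return (($b1 & 0x0f) << 12) | (($b2 & 0x3f) << 6) | ($b3 & 0x3f);
}

// Hypothetical classifier matching the overlong/surrogate branches.
function sketchClassify3(int $b1, int $b2, int $b3): string {
    $cp = sketchDecode3($b1, $b2, $b3);
    if ($cp < 0x800) {
        return 'overlong';  // same as: $b1 == 0xE0 && $b2 < 0xA0
    }
    if ($cp >= 0xd800 && $cp <= 0xdfff) {
        return 'surrogate'; // same as: $b1 == 0xED && $b2 >= 0xA0
    }
    return 'valid';
}

printf("U+%04X %s\n", sketchDecode3(0xe0, 0x9f, 0xbf), sketchClassify3(0xe0, 0x9f, 0xbf)); // U+07FF overlong
printf("U+%04X %s\n", sketchDecode3(0xed, 0xa0, 0x80), sketchClassify3(0xed, 0xa0, 0x80)); // U+D800 surrogate
printf("U+%04X %s\n", sketchDecode3(0xe2, 0x82, 0xac), sketchClassify3(0xe2, 0x82, 0xac)); // U+20AC valid
```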

◆ setUp()

CleanUpTest::setUp ( )

Todo: document

Definition at line 47 of file CleanUpTest.php.

{
}

◆ tearDown()

CleanUpTest::tearDown ( )

Todo: document

Definition at line 51 of file CleanUpTest.php.

{
}

◆ testAllBytes()

CleanUpTest::testAllBytes ( )

Todo: document

Definition at line 119 of file CleanUpTest.php.

References doTestBytes().

                                 {
                 $this->doTestBytes( '', '' );
                 $this->doTestBytes( 'x', '' );
                 $this->doTestBytes( '', 'x' );
                 $this->doTestBytes( 'x', 'x' );
         }

◆ testAscii()

CleanUpTest::testAscii ( )

Todo: document

Definition at line 55 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                              {
                 $text = 'This is plain ASCII text.';
                 $this->assertEquals( $text, UtfNormal::cleanUp( $text ) );
         }

◆ testBomRegression()

CleanUpTest::testBomRegression ( )

Todo: document

Definition at line 371 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                      {
                 $text   = "\xef\xbf\xbe" . # U+FFFE, illegal char
                           "\xb2" . # bad tail
                           "\xef" . # bad head
                           "\x59";
                 $expect = "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\x59";
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }

◆ testChunkRegression()

CleanUpTest::testChunkRegression ( )

Todo: document

Definition at line 273 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                        {
                 # Check for regression against a chunking bug
                 $text   = "\x46\x55\xb8" .
                           "\xdc\x96" .
                           "\xee" .
                           "\xe7" .
                           "\x44" .
                           "\xaa" .
                           "\x2f\x25";
                 $expect = "\x46\x55\xef\xbf\xbd" .
                           "\xdc\x96" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\x44" .
                           "\xef\xbf\xbd" .
                           "\x2f\x25";
 
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }
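
Taken together, the regression tests pin down how many U+FFFD characters each malformed run should produce. A rough model that reproduces those expectations: ASCII controls other than TAB, LF and CR become U+FFFD; a stray continuation byte or 0xFE/0xFF becomes one U+FFFD each; a head byte swallows up to as many continuation bytes as its length calls for, and the whole group becomes a single U+FFFD if it is truncated, overlong, a surrogate, U+FFFE/U+FFFF, or beyond U+10FFFF. This is a sketch under stated assumptions: sketchCleanUp() is hypothetical, it is not UtfNormal's implementation, it was derived only from the expected outputs in this class, and it skips NFC normalization entirely (so it cannot reproduce testLatinNormal()).

```php
<?php
// Hypothetical model of the replacement rules asserted in CleanUpTest.
// NOT UtfNormal::cleanUp(): no NFC normalization, no lookup tables.
const SKETCH_REPLACEMENT = "\xef\xbf\xbd"; // U+FFFD encoded as UTF-8

function sketchCleanUp(string $s): string {
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            // ASCII: keep TAB, LF, CR and 0x20-0x7F; replace other controls.
            $keep = $b === 0x09 || $b === 0x0a || $b === 0x0d || $b >= 0x20;
            $out .= $keep ? $s[$i] : SKETCH_REPLACEMENT;
            $i++;
            continue;
        }
        if ($b < 0xc0 || $b > 0xfd) {
            // Stray continuation byte, or 0xFE/0xFF: one U+FFFD per byte.
            $out .= SKETCH_REPLACEMENT;
            $i++;
            continue;
        }
        // Head byte: the number of leading 1 bits gives the declared length.
        if ($b < 0xe0)     { $need = 1; $cp = $b & 0x1f; }
        elseif ($b < 0xf0) { $need = 2; $cp = $b & 0x0f; }
        elseif ($b < 0xf8) { $need = 3; $cp = $b & 0x07; }
        elseif ($b < 0xfc) { $need = 4; $cp = $b & 0x03; }
        else               { $need = 5; $cp = $b & 0x01; }
        // Consume up to $need continuation bytes (10xxxxxx).
        $j = $i + 1;
        while ($j < $len && $j - $i <= $need && (ord($s[$j]) & 0xc0) === 0x80) {
            $cp = ($cp << 6) | (ord($s[$j]) & 0x3f);
            $j++;
        }
        $minimum = [1 => 0x80, 2 => 0x800, 3 => 0x10000];
        $valid = $j - $i - 1 === $need        // sequence is complete
            && $need <= 3                     // at most 4 bytes in total
            && $cp >= $minimum[$need]         // not overlong
            && ($cp < 0xd800 || $cp > 0xdfff) // not a surrogate
            && $cp !== 0xfffe && $cp !== 0xffff
            && $cp <= 0x10ffff;
        // A well-formed sequence is copied; otherwise the head byte and the
        // continuation bytes it swallowed become a single U+FFFD.
        $out .= $valid ? substr($s, $i, $j - $i) : SKETCH_REPLACEMENT;
        $i = $j;
    }
    return $out;
}

// Input of testChunkRegression():
echo bin2hex(sketchCleanUp("\x46\x55\xb8\xdc\x96\xee\xe7\x44\xaa\x2f\x25")), "\n";
// 4655efbfbddc96efbfbdefbfbd44efbfbd2f25
```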

◆ testDoubleBytes()

CleanUpTest::testDoubleBytes ( )

Todo: document

Definition at line 153 of file CleanUpTest.php.

References doTestDoubleBytes().

                                    {
                 $this->doTestDoubleBytes( '', '' );
                 $this->doTestDoubleBytes( 'x', '' );
                 $this->doTestDoubleBytes( '', 'x' );
                 $this->doTestDoubleBytes( 'x', 'x' );
         }

◆ testForbiddenRegression()

CleanUpTest::testForbiddenRegression ( )

Todo: document

Definition at line 386 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                            {
                 $text   = "\xef\xbf\xbf"; # U+FFFF, illegal char
                 $expect = "\xef\xbf\xbd";
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }

◆ testHangulRegression()

CleanUpTest::testHangulRegression ( )

Todo: document

Definition at line 395 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                         {
                 $text = "\xed\x9c\xaf" . # Hangul char
                         "\xe1\x87\x81";  # followed by another final jamo
                 $expect = $text;         # Should *not* change.
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }
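
Why must this input survive untouched? In the Unicode Standard's algorithmic Hangul composition, a trailing consonant jamo (U+11A8..U+11C2) can only be absorbed by an LV syllable, i.e. one whose syllable index is a multiple of 28. U+D72F already carries a final consonant (its index 11055 leaves remainder 23 when divided by 28), so U+11C1 has to stay a separate character. A sketch of that rule, using the constants from section 3.12 of the Unicode Standard; sketchComposeLvT() is a hypothetical helper, not UtfNormal code:

```php
<?php
// Hypothetical helper illustrating LV + T composition of Hangul syllables.
// Constants follow the Unicode Standard, section 3.12.
const HANGUL_S_BASE = 0xAC00;
const HANGUL_T_BASE = 0x11A7;
const HANGUL_T_COUNT = 28;
const HANGUL_S_COUNT = 11172;

// Returns the composed LVT syllable, or null if the pair does not compose.
function sketchComposeLvT(int $s, int $t): ?int {
    $sIndex = $s - HANGUL_S_BASE;
    $isLvSyllable = $sIndex >= 0 && $sIndex < HANGUL_S_COUNT
        && $sIndex % HANGUL_T_COUNT === 0;
    $isTrailingJamo = $t > HANGUL_T_BASE && $t < HANGUL_T_BASE + HANGUL_T_COUNT;
    return ($isLvSyllable && $isTrailingJamo) ? $s + ($t - HANGUL_T_BASE) : null;
}

var_dump(sketchComposeLvT(0xAC00, 0x11A8)); // int(44033), i.e. U+AC01
var_dump(sketchComposeLvT(0xD72F, 0x11C1)); // NULL: U+D72F already has a final jamo
```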

◆ testInterposeRegression()

CleanUpTest::testInterposeRegression ( )

Todo:: document

Definition at line 296 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                            {
                 $text   = "\x4e\x30" .
                           "\xb1" .              # bad tail
                           "\x3a" .
                           "\x92" .              # bad tail
                           "\x62\x3a" .
                           "\x84" .              # bad tail
                           "\x43" .
                           "\xc6" .              # bad head
                           "\x3f" .
                           "\x92" .              # bad tail
                           "\xad" .              # bad tail
                           "\x7d" .
                           "\xd9\x95";
 
                 $expect = "\x4e\x30" .
                           "\xef\xbf\xbd" .
                           "\x3a" .
                           "\xef\xbf\xbd" .
                           "\x62\x3a" .
                           "\xef\xbf\xbd" .
                           "\x43" .
                           "\xef\xbf\xbd" .
                           "\x3f" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\x7d" .
                           "\xd9\x95";
 
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }

◆ testLatin()

CleanUpTest::testLatin ( )

Todo: document

Definition at line 70 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                              {
                 $text = "L'\xc3\xa9cole";
                 $this->assertEquals( $text, UtfNormal::cleanUp( $text ) );
         }

◆ testLatinNormal()

CleanUpTest::testLatinNormal ( )

Todo: document

Definition at line 76 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                    {
                 $text = "L'e\xcc\x81cole";
                 $expect = "L'\xc3\xa9cole";
                 $this->assertEquals( $expect, UtfNormal::cleanUp( $text ) );
         }
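
testLatinNormal() hinges on two spellings of "é": the input uses U+0065 followed by U+0301 COMBINING ACUTE ACCENT, while the NFC form expected from cleanUp() is the single code point U+00E9. A small code point lister makes the difference visible; sketchCodepoints() is hypothetical and assumes well-formed input:

```php
<?php
// Hypothetical helper: list the code points of a well-formed UTF-8 string.
// There is no error handling; malformed input is out of scope here.
function sketchCodepoints(string $s): array {
    $cps = [];
    $i = 0;
    $len = strlen($s);
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80)     { $n = 0; $cp = $b; }
        elseif ($b < 0xe0) { $n = 1; $cp = $b & 0x1f; }
        elseif ($b < 0xf0) { $n = 2; $cp = $b & 0x0f; }
        else               { $n = 3; $cp = $b & 0x07; }
        // Fold in the continuation bytes, 6 payload bits each.
        for ($k = 1; $k <= $n; $k++) {
            $cp = ($cp << 6) | (ord($s[$i + $k]) & 0x3f);
        }
        $cps[] = sprintf('U+%04X', $cp);
        $i += $n + 1;
    }
    return $cps;
}

echo implode(' ', sketchCodepoints("L'e\xcc\x81cole")), "\n";
// U+004C U+0027 U+0065 U+0301 U+0063 U+006F U+006C U+0065
echo implode(' ', sketchCodepoints("L'\xc3\xa9cole")), "\n";
// U+004C U+0027 U+00E9 U+0063 U+006F U+006C U+0065
```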

◆ testNull()

CleanUpTest::testNull ( )

Todo: document

Definition at line 61 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                             {
                 $text = "a \x00 null";
                 $expect = "a \xef\xbf\xbd null";
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }

◆ testOverlongRegression()

CleanUpTest::testOverlongRegression ( )

Todo: document

Definition at line 331 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                           {
                 $text   = "\x67" .
                           "\x1a" . # forbidden ascii
                           "\xea" . # bad head
                           "\xc1\xa6" . # overlong sequence
                           "\xad" . # bad tail
                           "\x1c" . # forbidden ascii
                           "\xb0" . # bad tail
                           "\x3c" .
                           "\x9e";  # bad tail
                 $expect = "\x67" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\x3c" .
                           "\xef\xbf\xbd";
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }

◆ testSurrogateRegression()

CleanUpTest::testSurrogateRegression ( )

Todo: document

Definition at line 356 of file CleanUpTest.php.

References $text, and UtfNormal\cleanUp().

                                            {
                 $text   = "\xed\xb4\x96" . # surrogate 0xDD16
                           "\x83" . # bad tail
                           "\xb4" . # bad tail
                           "\xac";  # bad tail
                 $expect = "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd" .
                           "\xef\xbf\xbd";
                 $this->assertEquals(
                         bin2hex( $expect ),
                         bin2hex( UtfNormal::cleanUp( $text ) ) );
         }

◆ testTripleBytes()

CleanUpTest::testTripleBytes ( )

Todo: document

Definition at line 199 of file CleanUpTest.php.

References doTestTripleBytes().

                                    {
                 $this->doTestTripleBytes( '', '' );
                 $this->doTestTripleBytes( 'x', '' );
                 $this->doTestTripleBytes( '', 'x' );
                 $this->doTestTripleBytes( 'x', 'x' );
         }

◆ XtestAllChars()

CleanUpTest::XtestAllChars ( )

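This test is very expensive! It encodes every code point from U+0000 up to (but not including) UNICODE_MAX and runs cleanUp() on each one. The X prefix presumably keeps it out of normal runs, since test runners such as PHPUnit only collect methods whose names start with "test".

Todo: document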

Definition at line 86 of file CleanUpTest.php.

References $utfCanonicalComp, $utfCanonicalDecomp, $x, UtfNormal\cleanUp(), codepointToUtf8(), UtfNormal\NFC(), UNICODE_MAX, UNICODE_SURROGATE_FIRST, UNICODE_SURROGATE_LAST, and UTF8_REPLACEMENT.

                                  {
                 $rep = UTF8_REPLACEMENT;
                 global $utfCanonicalComp, $utfCanonicalDecomp;
                 for( $i = 0x0; $i < UNICODE_MAX; $i++ ) {
                         $char = codepointToUtf8( $i );
                         $clean = UtfNormal::cleanUp( $char );
                         $x = sprintf( "%04X", $i );
                         if( $i % 0x1000 == 0 ) echo "U+$x\n";
                         if( $i == 0x0009 ||
                             $i == 0x000a ||
                             $i == 0x000d ||
                             ($i > 0x001f && $i < UNICODE_SURROGATE_FIRST) ||
                             ($i > UNICODE_SURROGATE_LAST && $i < 0xfffe ) ||
                             ($i > 0xffff && $i <= UNICODE_MAX ) ) {
                                 if( isset( $utfCanonicalComp[$char] ) || isset( $utfCanonicalDecomp[$char] ) ) {
                                         $comp = UtfNormal::NFC( $char );
                                         $this->assertEquals(
                                                 bin2hex( $comp ),
                                                 bin2hex( $clean ),
                                                 "U+$x should be decomposed" );
                                 } else {
                                         $this->assertEquals(
                                                 bin2hex( $char ),
                                                 bin2hex( $clean ),
                                                 "U+$x should be intact" );
                                 }
                         } else {
                                 $this->assertEquals( bin2hex( $rep ), bin2hex( $clean ), $x );
                         }
                 }
         }
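
XtestAllChars() builds its input with codepointToUtf8(), which is referenced but not documented on this page. For illustration only, here is a hypothetical stand-in that encodes a single code point with the standard UTF-8 bit layout; it is not the project's implementation, and it does not reject surrogates, leaving that to the caller:

```php
<?php
// Hypothetical stand-in for codepointToUtf8(): encodes one code point
// in 0..0x10FFFF as UTF-8. Surrogates are encoded as-is, not rejected.
function sketchCodepointToUtf8(int $cp): string {
    if ($cp < 0 || $cp > 0x10ffff) {
        throw new InvalidArgumentException("Not a code point: $cp");
    }
    if ($cp < 0x80) {
        return chr($cp);                                   // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xc0 | ($cp >> 6))                      // 110xxxxx
            . chr(0x80 | ($cp & 0x3f));                    // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xe0 | ($cp >> 12))                     // 1110xxxx
            . chr(0x80 | (($cp >> 6) & 0x3f))
            . chr(0x80 | ($cp & 0x3f));
    }
    return chr(0xf0 | ($cp >> 18))                         // 11110xxx
        . chr(0x80 | (($cp >> 12) & 0x3f))
        . chr(0x80 | (($cp >> 6) & 0x3f))
        . chr(0x80 | ($cp & 0x3f));
}

echo bin2hex(sketchCodepointToUtf8(0xe9)), "\n";   // c3a9
echo bin2hex(sketchCodepointToUtf8(0xfffd)), "\n"; // efbfbd, the bytes the tests expect for U+FFFD
```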

The documentation for this class was generated from the following file:

include/Unicode/CleanUpTest.php
