ILIAS  release_4-3 Revision
UtfNormalUtil.php File Reference

Namespaces

namespace  UtfNormal
 Unicode normalization routines for working with UTF-8 strings.

Functions

 codepointToUtf8 ($codepoint)
 Return UTF-8 sequence for a given Unicode code point.
 hexSequenceToUtf8 ($sequence)
 Take a series of space-separated hexadecimal numbers representing Unicode code points and return a UTF-8 string composed of those characters.
 utf8ToHexSequence ($str)
 Take a UTF-8 string and return a space-separated series of hex numbers representing Unicode code points.
 utf8ToCodepoint ($char)
 Determine the Unicode codepoint of a single-character UTF-8 sequence.
 escapeSingleString ($string)
 Escape a string for inclusion in a PHP single-quoted string literal.

Function Documentation

codepointToUtf8 ($codepoint)

Return UTF-8 sequence for a given Unicode code point.

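Calls die() if the code point is above 0x10FFFF. Other out-of-range input, such as negative values or surrogate code points (0xD800 to 0xDFFF), is not rejected.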

Parameters
int $codepoint
Returns
string (public)

Definition at line 38 of file UtfNormalUtil.php.

Referenced by hexSequenceToUtf8() and CleanUpTest\XtestAllChars().

{
    if($codepoint < 0x80) return chr($codepoint);
    if($codepoint < 0x800) return chr($codepoint >> 6 & 0x3f | 0xc0) .
                                  chr($codepoint & 0x3f | 0x80);
    if($codepoint < 0x10000) return chr($codepoint >> 12 & 0x0f | 0xe0) .
                                    chr($codepoint >> 6 & 0x3f | 0x80) .
                                    chr($codepoint & 0x3f | 0x80);
    if($codepoint < 0x110000) return chr($codepoint >> 18 & 0x07 | 0xf0) .
                                     chr($codepoint >> 12 & 0x3f | 0x80) .
                                     chr($codepoint >> 6 & 0x3f | 0x80) .
                                     chr($codepoint & 0x3f | 0x80);
    die("Asked for code outside of range ($codepoint)\n");
}
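To make the bit arithmetic concrete, here is a worked example of the three-byte branch applied to U+20AC (EURO SIGN). It is only an illustration of the shifts and masks in the listing above, with explicit parentheses added for readability; the variable names are ours and do not appear in UtfNormalUtil.php.

```php
<?php
// Worked example: encode U+20AC by hand using the three-byte layout
// 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits in total).
$codepoint = 0x20AC;

$byte1 = (($codepoint >> 12) & 0x0f) | 0xe0; // top 4 bits    -> 0xe2
$byte2 = (($codepoint >> 6) & 0x3f) | 0x80;  // middle 6 bits -> 0x82
$byte3 = ($codepoint & 0x3f) | 0x80;         // low 6 bits    -> 0xac

$utf8 = chr($byte1) . chr($byte2) . chr($byte3);
echo bin2hex($utf8), "\n"; // prints "e282ac"
```

The other branches follow the same pattern, with one lead byte carrying the length marker and each continuation byte carrying six payload bits.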

escapeSingleString ($string)

Escape a string for inclusion in a PHP single-quoted string literal.

Parameters
string $string
Returns
string (public)

Definition at line 133 of file UtfNormalUtil.php.

{
    return strtr( $string,
        array(
            '\\' => '\\\\',
            '\'' => '\\\''
        ));
}
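A short usage sketch may help show what the escaping produces. The function below is a copy of the listing under a hypothetical name (escapeForSingleQuotes) so that the example runs on its own; the sample string is made up.

```php
<?php
// Hypothetical stand-alone copy of the documented function.
function escapeForSingleQuotes(string $string): string
{
    // Backslash and single quote are the only characters that need
    // escaping inside a PHP single-quoted literal.
    return strtr($string, array(
        '\\' => '\\\\',
        '\'' => '\\\'',
    ));
}

$raw = "O'Reilly \\ Sons"; // the 15 characters: O'Reilly \ Sons
$literal = "'" . escapeForSingleQuotes($raw) . "'";
echo $literal, "\n"; // prints 'O\'Reilly \\ Sons'
```

For plain strings like this one, the result should match what PHP's built-in var_export() generates, which can serve as a cross-check when producing PHP source files.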

hexSequenceToUtf8 ($sequence)

Take a series of space-separated hexadecimal numbers representing Unicode code points and return a UTF-8 string composed of those characters.

Used by UTF-8 data generation and testing routines.

Parameters
string $sequence
Returns
string (private)

Definition at line 62 of file UtfNormalUtil.php.

References $n and codepointToUtf8().

{
    $utf = '';
    foreach( explode( ' ', $sequence ) as $hex ) {
        $n = hexdec( $hex );
        $utf .= codepointToUtf8( $n );
    }
    return $utf;
}
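Because the listing splits on single spaces and passes every token to hexdec(), an empty token (from an empty string, a doubled space, or a trailing space) appears to decode to code point 0 and so contributes a NUL byte. The sketch below shows one way to tokenize more defensively before encoding. parseHexSequence is a hypothetical helper, not part of UtfNormalUtil.php, and it returns the code points rather than the encoded string.

```php
<?php
// Hypothetical helper: turn "0041 20ac" into array(0x41, 0x20ac).
function parseHexSequence(string $sequence): array
{
    $codepoints = array();
    // Split on any run of whitespace and drop empty tokens.
    foreach (preg_split('/\s+/', $sequence, -1, PREG_SPLIT_NO_EMPTY) as $hex) {
        // Reject anything that is not purely hexadecimal digits.
        if (!preg_match('/^[0-9a-fA-F]+$/', $hex)) {
            throw new InvalidArgumentException("Not a hex code point: $hex");
        }
        $codepoints[] = hexdec($hex);
    }
    return $codepoints;
}

var_dump(parseHexSequence('0041  20ac ') === array(0x41, 0x20ac)); // bool(true)
var_dump(parseHexSequence('') === array());                        // bool(true)
```

Each returned value could then be passed to codepointToUtf8() and the results concatenated, as the original loop does.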

utf8ToCodepoint ($char)

Determine the Unicode codepoint of a single-character UTF-8 sequence.

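Does not validate the input beyond one check: if the string length differs from the sequence length implied by the first byte, the function returns false. Continuation bytes are not checked.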

Parameters
string $char
Returns
int (public), or false if the string length does not match the length implied by the first byte

Definition at line 93 of file UtfNormalUtil.php.

{
    # Find the length
    $z = ord( $char{0} );
    if ( $z & 0x80 ) {
        $length = 0;
        while ( $z & 0x80 ) {
            $length++;
            $z <<= 1;
        }
    } else {
        $length = 1;
    }
    if ( $length != strlen( $char ) ) {
        return false;
    }
    if ( $length == 1 ) {
        return ord( $char );
    }
    # Mask off the length-determining bits and shift back to the original location
    $z &= 0xff;
    $z >>= $length;
    # Add in the free bits from subsequent bytes
    for ( $i=1; $i<$length; $i++ ) {
        $z <<= 6;
        $z |= ord( $char{$i} ) & 0x3f;
    }
    return $z;
}
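The listing indexes the string with curly braces ($char{0}), a syntax that was deprecated in PHP 7.4 and removed in PHP 8.0. The following is a minimal sketch of the same decoding steps using square-bracket offsets. The name decodeSingleChar is hypothetical, and returning false for an empty string is our assumption rather than documented behaviour.

```php
<?php
// Hypothetical re-statement of the decoding logic for current PHP versions.
function decodeSingleChar(string $char)
{
    if ($char === '') {
        return false; // assumption: treat empty input as a length mismatch
    }
    $lead = ord($char[0]);

    // Count the leading 1 bits of the first byte: that is the sequence length.
    $length = 0;
    for ($bits = $lead; $bits & 0x80; $bits = ($bits << 1) & 0xff) {
        $length++;
    }
    if ($length === 0) {
        $length = 1; // plain ASCII byte
    }
    if ($length !== strlen($char)) {
        return false;
    }
    if ($length === 1) {
        return $lead;
    }

    // Keep only the payload bits of the lead byte.
    $codepoint = $lead & (0xff >> ($length + 1));
    // Append six payload bits from each following byte.
    for ($i = 1; $i < $length; $i++) {
        $codepoint = ($codepoint << 6) | (ord($char[$i]) & 0x3f);
    }
    return $codepoint;
}

printf("%04x\n", decodeSingleChar("\xE2\x82\xAC")); // prints "20ac"
```

As in the original, continuation bytes are not validated, so callers that handle untrusted input would need a separate validity check.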

utf8ToHexSequence ($str)

Take a UTF-8 string and return a space-separated series of hex numbers representing Unicode code points.

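Intended for debugging. The implementation relies on the /e modifier of preg_replace(), which was deprecated in PHP 5.5 and removed in PHP 7.0.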

Parameters
string $str
Returns
string (private)

Definition at line 79 of file UtfNormalUtil.php.

{
    return rtrim( preg_replace( '/(.)/uSe',
        'sprintf("%04x ", utf8ToCodepoint("$1"))',
        $str ) );
}
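On PHP 7 and later, the same result can be obtained with preg_replace_callback() in place of the /e modifier. The sketch below is one possible equivalent, not the project's code: the name utf8ToHexSequenceSketch is hypothetical, and the code point is decoded inline so that the block runs without utf8ToCodepoint().

```php
<?php
// Hypothetical PHP 7+ equivalent built on preg_replace_callback().
function utf8ToHexSequenceSketch(string $str): string
{
    // With the /u modifier, "." matches one whole UTF-8 character
    // (newlines are not matched and pass through unchanged).
    $out = preg_replace_callback('/./u', function (array $m): string {
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes);
        // Single byte: ASCII. Otherwise strip the length marker from the lead byte.
        $cp = ($n === 1) ? $bytes[0] : ($bytes[0] & (0xff >> ($n + 1)));
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3f);
        }
        return sprintf('%04x ', $cp);
    }, $str);

    // preg_replace_callback() returns null on invalid UTF-8; cast to ''.
    return rtrim((string) $out);
}

echo utf8ToHexSequenceSketch("A\xE2\x82\xAC"), "\n"; // prints "0041 20ac"
```

Using a closure avoids evaluating a string as code, which was the reason the /e modifier was dropped from PHP.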