ILIAS  release_5-1 Revision 5.0.0-5477-g43f3e3fab5
UtfNormal

Additional tests for the UtfNormal::cleanUp() function, including regression checks for known problems.

Some of these functions are adapted from places in MediaWiki.

Implements the conformance test at: http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt.
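
The invariants that NormalizationTest.txt asks an implementation to satisfy can be sketched as follows. This is a minimal illustration in Python, using the standard `unicodedata` module as a stand-in for UtfNormal; the helper names `parse_fields` and `check_line` are hypothetical, and the sample line is a single entry in the file's column format, not the full test file.

```python
import unicodedata

def parse_fields(line):
    """Split one data line into the five strings c1..c5 (text after '#' is a comment)."""
    data = line.split("#", 1)[0].strip()
    columns = data.split(";")[:5]
    # Each column is a space-separated list of hexadecimal code points.
    return ["".join(chr(int(cp, 16)) for cp in col.split()) for col in columns]

def check_line(line):
    """Return True if the line satisfies the NFC/NFD/NFKC/NFKD invariants."""
    c1, c2, c3, c4, c5 = parse_fields(line)
    nfc = lambda s: unicodedata.normalize("NFC", s)
    nfd = lambda s: unicodedata.normalize("NFD", s)
    nfkc = lambda s: unicodedata.normalize("NFKC", s)
    nfkd = lambda s: unicodedata.normalize("NFKD", s)
    return (
        c2 == nfc(c1) == nfc(c2) == nfc(c3)
        and c4 == nfc(c4) == nfc(c5)
        and c3 == nfd(c1) == nfd(c2) == nfd(c3)
        and c5 == nfd(c4) == nfd(c5)
        and all(nfkc(c) == c4 for c in (c1, c2, c3, c4, c5))
        and all(nfkd(c) == c5 for c in (c1, c2, c3, c4, c5))
    )

# Illustrative entry in the file's format: source; NFC; NFD; NFKC; NFKD
SAMPLE = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"
print(check_line(SAMPLE))
```

A full run would apply `check_line` to every data line of the file and report the lines that fail.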

This script generates UniNormalData.inc from the Unicode Character Database and supplementary files.

Approximate benchmark for some basic operations.

Unicode normalization routines for working with UTF-8 strings.

Runs the UTF-8 decoder test at: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt.
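
As a rough illustration of what a decoder test of this kind exercises, the sketch below feeds a few malformed byte sequences to Python's built-in UTF-8 decoder with replacement enabled. It is not UtfNormal::cleanUp() itself: the function name `clean_up` is hypothetical, and the number of U+FFFD characters emitted per bad sequence may differ between implementations.

```python
def clean_up(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for anything malformed."""
    return raw.decode("utf-8", errors="replace")

# A few classes of input that a decoder stress test typically covers.
CASES = {
    "valid two-byte sequence": b"\xce\xba",
    "lone continuation byte": b"\x80",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded UTF-16 surrogate": b"\xed\xa0\x80",
}

for label, raw in CASES.items():
    print(label, [hex(ord(ch)) for ch in clean_up(raw)])
```

The property worth checking is that malformed input never passes through unchanged: an overlong encoding must not decode to the character it imitates.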

Test feeds random 16-byte strings to both the pure PHP and ICU-based UtfNormal::cleanUp() code paths, and checks to see if there's a difference.

Requires PHPUnit.

Runs indefinitely until it finds a difference or you kill it.
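
The differential approach described here (random 16-byte inputs fed to two code paths, stopping at the first disagreement) can be sketched as below. This is an assumption-laden Python illustration, not the PHPUnit test: `path_a` and `path_b` are hypothetical stand-ins for the ICU-based and pure PHP code paths, and the loop is bounded by `iterations` instead of running forever.

```python
import random
import unicodedata

def path_a(raw: bytes) -> str:
    """Stand-in for one code path: decode with replacement, then normalize to NFC."""
    return unicodedata.normalize("NFC", raw.decode("utf-8", errors="replace"))

def path_b(raw: bytes) -> str:
    """Stand-in for the other code path: same result, with an ASCII shortcut."""
    text = raw.decode("utf-8", errors="replace")
    if text.isascii():
        return text
    return unicodedata.normalize("NFC", text)

def find_difference(impl_a, impl_b, iterations=10000, length=16, seed=0):
    """Return the first random input on which the two paths disagree, else None."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(iterations):
        raw = bytes(rng.randrange(256) for _ in range(length))
        if impl_a(raw) != impl_b(raw):
            return raw
    return None

print(find_difference(path_a, path_b, iterations=2000))
```

Seeding the generator is a deliberate choice: when a mismatch turns up, the same seed reproduces the offending input.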

Currently assumes that input strings are valid UTF-8!

Not as fast as I'd like, but should be usable for most purposes. UtfNormal::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.
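
The early-exit idea can be sketched as follows. This is a Python illustration of the general technique, assuming Python 3.8 or later for `unicodedata.is_normalized`; the function name `to_nfc` is hypothetical and the checks UtfNormal::toNFC() actually performs may differ.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Normalize to NFC, skipping the full pass when it is clearly unnecessary."""
    # Pure ASCII text is unchanged by every normalization form.
    if text.isascii():
        return text
    # Quick check: already-normalized text can be returned as-is.
    if unicodedata.is_normalized("NFC", text):
        return text
    # Slow path: full decomposition and recomposition.
    return unicodedata.normalize("NFC", text)

print(to_nfc("plain ASCII"))
print(to_nfc("e\u0301") == "\u00e9")
```

Most real-world input is either ASCII or already in NFC, so the two cheap checks avoid the slow path in the common case.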

All functions can be called statically.

See description of forms at http://www.unicode.org/reports/tr15/
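
For a quick sense of how the four forms differ, the sketch below applies each of them to a two-character string. It uses Python's standard `unicodedata` module as a stand-in for UtfNormal; the helper name `all_forms` is hypothetical.

```python
import unicodedata

def all_forms(text: str) -> dict:
    """Return the text in each of the four Unicode normalization forms."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

# U+00E9 (e with acute) followed by U+FB01 (the "fi" ligature).
sample = "\u00e9\ufb01"

for form, result in all_forms(sample).items():
    print(form, [f"U+{ord(ch):04X}" for ch in result])
```

The canonical forms (NFC, NFD) leave the ligature alone and only compose or decompose the accented letter; the compatibility forms (NFKC, NFKD) additionally replace the ligature with the plain letters "f" and "i".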

Should probably merge them for consistency.