Healthy Code


Check for ClassID in new class hierarchies

Check that there are no "poor man's RTTI" methods in new class hierarchies.

After ICU 50: New class hierarchies should not declare getDynamicClassID() at all. UObject has a dummy implementation for it.

ICU 4.6..50: New class hierarchies used UOBJECT_DEFINE_NO_RTTI_IMPLEMENTATION. See Normalizer2 for an example declaration and implementation that satisfies the virtual-function override without adding new ClassID support.

We do need to keep and add "poor man's RTTI" in old classes, and in new classes extending existing class hierarchies (where parent classes have @stable RTTI functions). (For example, a new Format subclass.)

One easy way to check for this is to search through the API change report and look for "UClassID" or similar.


Check for non-ascii characters in ICU4C source files [obsolete]

Note:  ICU4C and ICU4J source files are UTF-8.  The ASCII check is no longer appropriate for them.

cd icu4c/source
find . \( -name "*.[ch]" -or -name "*.cpp" \) -exec grep -PHn [^[:ascii:]] {} \;

Check svn properties, valid UTF-8 and text file line endings

Verify that all source and text files in the svn repository have plain LF line endings, and have the svn properties set as specified by svn:auto-props

To do this on Linux, In an up-to-date svn working copy of icu,

cd trunk/icu4c/source
tools/icu-svnprops-check.py          # reports problems
tools/icu-svnprops-check.py  --fix   # fixes problems

The same python script from the icu4c tools will fix icu4j

cd trunk/icu4j
../icu4c/source/tools/icu-svnprops-check.py

svn commit any changes made by the script.  The commit is needed both for line ending changes and svn property changes.  Copyright dates do not need to be changed for these.

To double-check the line endings, the following grep will find all text files containing \r characters. Do not run from Windows, where \r\n line endings are expected.

cd trunk
grep -rPIl "\r" *


Check UTF-8 file properties

Note: As of ICU 59, ICU4C source files are UTF-8 encoded, and have the svn mime-type property "text/plain;charset=utf-8". They must not have a BOM.

This is checked by the above task, Check svn properties, valid UTF-8 and text file line endings.


The bomfix.py script, formerly used for this task, must not be run over the ICU4C sources

Note: This task is only applicable to ICU4C. ICU4J .java source files are encoded by UTF-8, but must be without UTF-8 BOM.

Check that the following match: Files marked as UTF-8 vs. Files beginning with the UTF-8 signature byte sequence ("BOM").

Run:

cd {icu}/icu4c

python ../tools/release/c/bomfix.py

Clean up import statements

The Eclipse IDE provides a feature which allow you to organize import statements for multiple files. Right click on projects/source folders/files, you can select [Source] - [Organize Imports] which resolve all wildcard imports and sort the import statements in a consistent order. (Note: You may experience OOM problem when your run this for projects/folders which contain many files. In this case, you may need to narrow a selection per iteration.)

Check library dependencies

We want to keep dependencies between .c/.cpp/.o files reasonable, both between and inside ICU's libraries.

On Linux, run source/test/depstest/depstest.py, for example:

~/svn.icu/trunk/src/icu4c/source/test/depstest$ ./depstest.py ~/svn.icu/trunk/dbg/icu4c

Do this twice: Once for a release build (optimized) and once for a debug build (unoptimized). They pull in slightly different sets of standard library symbols (see comments in dependencies.txt).

If everything is fine, the test will print "OK: Specified and actual dependencies match." If not...

Get changes reviewed by others, including Markus; including changes in dependencies.txt, .py scripts, or ICU4C code.

At first, the test will likely complain about .o files not listed in its dependencies.txt or, if files were removed, the other way around. Try to add them to groups or create new groups as appropriate.

As a rule, smaller groups with fewer dependencies are preferable. If public API (e.g., constructing a UnicodeString via conversion) is not needed inside ICU (e.g., unistr_cnv), make its library depend on its new group.

If the test prints "Info:  group icuplug  does not need to depend on  platform" then the plug-in code is disabled, as is the default since ICU 56. Consider enabling it for dependency checking, but make sure to revert that before you commit changes!

The test might complain "is this a new system symbol?" We should be careful about adding those. For example, we must not call printf() from library code, nor the global operator new.

The test might complain that some .o file "imports  icu_48::UnicodeString::UnicodeString(const char *)  but does not depend on  unistr_cnv.o". This probably means that someone passes a simple "string literal" or a char* into a function that takes a UnicodeString, which invokes the default-conversion constructor. We do not want that! In most cases, such code should be fixed, like in changeset 30186. Only implementations of API that require conversion should depend on it; for example, group formattable_cnv depends on group unistr_cnv, but then nothing inside ICU depends on that.

Verify proper memory allocation functions

Verify the following for library code (common, i18n, layout, ustdio). The requirement is for ICU's memory management to be customizable by changing cmemory.h and the common base class.

Note: The requirement remains. The techniques to fix issues are valid. For testing, see the section "Check library dependencies" above.

  • No raw malloc/free/realloc but their uprv_ versions.
  • All C++ classes must inherit the common base class UObject or UMemory
    • But not for mixin/interface classes (pure virtual, no data members, no constructors) because that would break the single-inheritance model.
    • Also not for pure-static classes (used for name scoping) declare but don't implement a private default constructor to prevent instantiation.
    • Simple struct-like C++ classes (and structs) that do not have constructors, destructors, and virtual methods, need not inherit the base class but must then be allocated with uprv_malloc.
  • All simple types (UChar, int32_t, pointers(!), ...) must be allocated with uprv_malloc and released with uprv_free.
  • For Testing that this is the case, on Linux
    • run the Perl script ICUMemCheck.pl. Follow the instructions in the script. The script is in in the ICU4C sources at source/tools/memcheck/ICUMemCheck.pl.
    • also, depstest.py will show an error message if the global operator new is used (see section about checking dependencies)
  • For testing that this is the case, on Windows:
    • Don't bother, as of Nov, 2004. Failures appear in many dozens of files from the mixin class destructors. Do the check on Linux. But, for reference, here are the old instructions.
      • Make sure that in uobject.h UObject::new and delete are defined by default. Currently, this means to grep to see that U_OVERRIDE_CXX_ALLOCATION is defined to 1 (in pwin32.h for Windows).
      • Check the *.obj's for linkage to the global (non-UObject::) operators new/delete; see uobject.cpp for details.
      • Global new must never be imported. Global delete will be imported and used by empty-virtual destructors in interface/mixin classes. However, they are not called because implementation classes always derive from UMemory. No other functions must use global delete.
      • There are now (2002-dec-17) definitions in utypes.h for global new/delete, with inline implementations that will always crash. These global new/delete operators are only defined for code inside the ICU4C libraries (but must be there for all of those). See ticket #2581.
      • If a global new/delete is used somewhere, then change the class inheritance and/or use uprv_malloc/free until no global new/delete are used in the libraries (and the tests still pass...). See the User Guide Coding Guidelines for details.

Run static code analysis tools

(Purify, Boundary Checker, valgrind...)

Make sure we fixed all the memory leak problems that were discovered when running these tools.

Build ICU with debug information. On Linux,

runConfigureICU --enable-debug --disable-release Linux

Run all of the standard tests under valgrind.  For intltest, for example,

cd <where ever>/source/test/intltest
LD_LIBRARY_PATH=../../lib:../../stubdata:../../tools/ctestfw:$LD_LIBRARY_PATH  valgrind  ./intltest

You can grab the command line for running the tests from the output from "make check", and then just insert "valgrind" before the executable.

Check the code coverage numbers

Our goal is that all releases go out to the public with 100% API test and at least 85% code coverage.

Test ICU4C headers

Test ICU4C public headers

Testing external dependencies in header files:

(on Unixes) Prerequisite: Configure with --prefix (../trunk/source/runConfigureICU Linux --prefix=/some/temp/folder) and do 'make install'. Then set the PATH so that the installed icu-config script can be found. (export PATH=/some/temp/folder/bin:$PATH)

Then go to the 'icu4c/test/hdrtst' directory (note: not 'source/test/hdrtst') and do 'make check'. This will attempt to compile against each header file individually on C and C++, to make sure there aren't any hidden include ordering problems. Output looks like this, if no error springs up all is in order. If a C++ file fails to compile as a C file, add it to the 'cxxfiles.txt' located in the hdrtst directory. Run this test with all the uconfig.h variations (see below).

ctest unicode/docmain.h
ctest unicode/icudataver.h
ctest unicode/icuplug.h

Test ICU4C internal headers

Run the following script straight from the source tree (from inside the "source" folder, not on the top level), no need to build nor install.

For a new release, also look for new tools and tests and add their folders to the script. You can ignore messages stating that no '*.h' files were found in a particular directory.

The command line is simply

~/svn.icu/trunk/src/source$ test/hdrtst/testinternalheaders.sh

See http://bugs.icu-project.org/trac/ticket/12141 "every header file should include all other headers if it depends on definitions from them"

Test uconfig.h variations

Test ICU completely, and run the header test (above) with:

  1. none of the 'UCONFIG_NO_XXX' switches turned on (i.e., the normal case)
  2. all of the 'UCONFIG_NO_XXX' switches turned on (set to 1)
  3. For each switch, test once with just that one switch on. But note the exception regarding UCONFIG_NO_CONVERSION test below.

(See common/unicode/uconfig.h for more documentation.)

There is a script available which will automatically test ICU in this way on UNIXes, it lives in: tools/release/c/uconfigtest.sh. See docs at top of script for information.

Test C++ Namespace Use

Verify that ICU builds without enabling the default use of the ICU namespace. To test on Linux,

./runConfigureICU Linux CXXFLAGS="-DU_USING_ICU_NAMESPACE=0"
make check

Any problems will show up as compilation errors.

When definitions outside the ICU C++ namespace refer to ICU C++ classes, those need to be qualified with "icu::", as in "icu::UnicodeString". In rare cases, a C++ type is also visible in C code (e.g., ucol_imp.h has definitions that are visible to cintltst) and then we use U_NAMESPACE_QUALIFIER which is defined to be empty when compiling for C.

The automated build system should have a machine that sets both -DU_USING_ICU_NAMESPACE=0 and -DU_CHARSET_IS_UTF8=1.

Test UCONFIG_NO_CONVERSION

Make sure that the ICU4C common and i18n libraries build with UCONFIG_NO_CONVERSION set to 1. We cannot do this as part of "Test uconfig.h variations" because the test suites cannot be built like this, but the library code must support it.

The simplest is to take an ICU4C workspace, modify uconfig.h temporarily by changing the value of UCONFIG_NO_CONVERSION to 1, and do "make" (not "make check" or "make tests"). Verify that the stubdata, common & i18n libraries build fine; layout should build too but toolutil will fail, that's expected.

Fix any stubdata/common/i18n issues, revert the UCONFIG_NO_CONVERSION value, and verify that it still works with the normal setting.

If this breaks, someone probably inadvertently uses the UnicodeString(const char *) constructor. See the "Check library dependencies" section and example fixes in changeset 30186.

Test U_CHARSET_IS_UTF8

Verify that ICU builds with default charset hardcoded to UTF-8. To test on Linux,

./runConfigureICU Linux CPPFLAGS="-DU_CHARSET_IS_UTF8=1"
make check

Any problems will show up as compilation or test errors.

Rather than setting the CPPFLAGS, you can also temporarily add  #define U_CHARSET_IS_UTF8 1  in unicode/platform.h before it gets its default definition, or modify the default definition there. (In ICU 4.8 and earlier, this flag was in unicode/utypes.h.)

This works best on a machine that is set to use UTF-8 as its system charset, which is not possible on Windows.

The automated build system should have a machine that sets both -DU_USING_ICU_NAMESPACE=0 and -DU_CHARSET_IS_UTF8=1.

Test U_OVERRIDE_CXX_ALLOCATION=0

Verify that ICU builds with U_OVERRIDE_CXX_ALLOCATION=0 on Linux. Problems will show as build failures.

CPPFLAGS="-DU_OVERRIDE_CXX_ALLOCATION=0./runConfigureICU Linux
make clean
make -j12 check

Test ICU_USE_THREADS=0

Only necessary up to ICU4C 49.

  • ICU 50m1 removes ICU_USE_THREADS from the runtime code (ticket #9010).
  • It is still possible to build and test intltest with ICU_USE_THREADS=0 but not nearly as important.
  • In ICU 50m1, the --disable-threads configure option is gone. If you want to test with ICU_USE_THREADS=0 then temporarily change this flag in intltest.h or in the intltest Makefile.

Verify that ICU builds and tests with threading disabled.  To test on Linux,

./runConfigureICU Linux --disable-threads
make check

Fix MisTicketed Changesets

Fix wrong ticket numbers in changeset commit statements. See http://bugs.icu-project.org/trac/wiki/MisTicketted


Comments