NAME

docs/pdds/pdd07_codingstd.pod - Conventions and Guidelines for Parrot Source Code

ABSTRACT

This document describes the rules, guidelines, and advice for those wishing to contribute to the source code of Parrot, in such areas as code structure, naming conventions, and comments.

DESCRIPTION

One of the criticisms of Perl 5 is that its source code is impenetrable to newcomers, due to such things as inconsistent or obscure variable naming conventions, lack of comments in the source code, and so on. We don't intend to make the same mistake when writing Parrot. Hence this document.

We define three classes of conventions. Those that say "must" are mandatory, and code will not be accepted (apart from in exceptional circumstances) unless it follows these rules. Those that say "should" are strong guidelines that should normally be followed unless there is a sensible reason to do otherwise. Finally, those that say "may" are tentative suggestions to be used at your discretion.

Note that this particular PDD makes some recommendations that are specific to the C programming language. This does not preclude Parrot (or Perl 6) being implemented in other languages, but in that case additional PDDs may need to be authored for the extra language-specific features.

IMPLEMENTATION

Coding style

The following must apply:

The following should apply:

To enforce the spacing, indenting, and bracing guidelines mentioned above, the following arguments to GNU Indent should be used:

   -kr -nce -sc -cp0 -l79 -lc79 -psl -nut -cdw -ncs -lps

This expands out to:

Please note that it is also necessary to include all typedef types with the "-T" option to ensure that everything is formatted properly.

A script (tools/dev/run_indent.pl) is provided that runs indent correctly for you.

Naming conventions

Code comments

The importance of good code documentation cannot be stressed enough. To make your code understandable by others (and indeed by yourself when you come to make changes a year later :-), the following conventions apply to all source files.

Extensibility

If Perl 5 is anything to go by, the lifetime of Perl 6 will be at least seven years. During this period, the source code will undergo many major changes never envisaged by its original authors - cf. threads and Unicode in Perl 5. To this end, your code should balance the assumptions that make things possible, fast, or small against the assumptions that make it difficult to change things in future. This is especially important for parts of the code which are exposed through APIs - the requirements of source or binary compatibility for such things as extensions can make it very hard to change things later on.

For example, if you define suitable macros to set/test flags in a struct, then you can later add a second word of flags to the struct without breaking source compatibility. (Although you might still break binary compatibility if you're not careful.) Of the following two methods of setting a common combination of flags, the second doesn't assume that all the flags are contained within a single field:

    foo->flags |= (FOO_int_FLAG | FOO_num_FLAG | FOO_str_FLAG);
    FOO_valid_value_SETALL(foo);
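
As a minimal sketch of how such macros might look, assume a hypothetical struct foo whose flag bits currently fit in one word. The flag names and FOO_valid_value_SETALL come from the example above; the struct layout, the bit values, FOO_valid_value_MASK, and FOO_valid_value_TESTALL are illustrative assumptions, not Parrot definitions.

```c
/* Hypothetical layout: all flag bits currently live in one word. */
struct foo {
    unsigned int flags;
};

/* Illustrative bit assignments (assumed, not taken from Parrot). */
#define FOO_int_FLAG 0x01u
#define FOO_num_FLAG 0x02u
#define FOO_str_FLAG 0x04u

#define FOO_valid_value_MASK (FOO_int_FLAG | FOO_num_FLAG | FOO_str_FLAG)

/* Callers use only these macros, so the bits could later be moved to a
 * second flags word by editing the macro definitions alone. */
#define FOO_valid_value_SETALL(f) \
    ((f)->flags |= FOO_valid_value_MASK)
#define FOO_valid_value_TESTALL(f) \
    (((f)->flags & FOO_valid_value_MASK) == FOO_valid_value_MASK)
```

Code written against FOO_valid_value_SETALL(foo) would only need recompiling if the flags were later split across two fields, whereas every direct use of foo->flags would have to be found and edited by hand.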

Similarly, avoid using a char* (or {char*,length}) if it is feasible to later use a PMC* at the same point: cf. UTF-8 hash keys in Perl 5.

Of course, private code hidden behind an API can play faster and looser than code which gets exposed.

Portability

Related to extensibility is portability. Perl runs on many, many platforms, and will no doubt be ported to ever more bizarre and obscure ones over time. You should never assume an operating system, processor architecture, endian-ness, word size, or whatever. In particular, don't fall into any of the following common traps:

Internal data types and their utility functions (especially for strings) should be used over a bare char * whenever possible. Ideally there should be no char * in the source anywhere, and no use of C's standard string library.

Don't assume GNU C, and don't use any GNU extensions unless protected by #ifdefs for non-GNU-C builds.
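
One way to protect a GNU extension is sketched below, using GCC's __builtin_expect branch hint. The macro names HINT_LIKELY and HINT_UNLIKELY and the function safe_div are hypothetical, chosen only for illustration; the point is that non-GNU compilers see plain standard C.

```c
/* Hypothetical macro names; only the GNU branch uses an extension. */
#ifdef __GNUC__
#  define HINT_LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define HINT_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define HINT_LIKELY(cond)   (!!(cond))
#  define HINT_UNLIKELY(cond) (!!(cond))
#endif

/* Returns the quotient, or 0 when the divisor is zero. */
static int safe_div(int num, int den)
{
    if (HINT_UNLIKELY(den == 0)) {
        return 0;
    }
    return num / den;
}
```

Both branches of the #ifdef yield the same truth value, so the program behaves identically everywhere and the extension affects only code generation.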

TBC ... Any contributions welcome !!!

Performance

We want Perl to be fast. Very fast. But we also want it to be portable and extensible. Based on the 90/10 principle (or 80/20, or 95/5, depending on who you speak to), most performance is gained or lost in a few small but critical areas of code. Concentrate your optimization efforts there.

Note that the most overwhelmingly important factor in performance is in choosing the correct algorithms and data structures in the first place. Any subsequent tweaking of code is secondary to this. Also, any tweaking that is done should as far as possible be platform independent, or at least likely to cause speed-ups in a wide variety of environments, and do no harm elsewhere. Only in exceptional circumstances should assembly ever even be considered, and then only if generic fallback code is made available that can still be used by all other non-optimized platforms.

Probably the dominant factor (circa 2001) that affects processor performance is the cache. Processor clock rates have increased far in excess of main memory access rates, and the only way for the processor to proceed without stalling is for most of the data items it needs to be found to hand in the cache. It is reckoned that even a 2% cache miss rate can cause a slowdown in the region of 50%. It is for this reason that algorithms and data structures must be designed to be 'cache-friendly'.

A typical cache may have a block size of anywhere between 4 and 256 bytes. When a program attempts to read a word from memory and the word is already in the cache, then processing continues unaffected. Otherwise, the processor is typically stalled while a whole contiguous chunk of main memory is read in and stored in a cache block. Thus, after incurring the initial time penalty, you then get all the memory adjacent to the initially read data item for free. Algorithms that make use of this fact can experience quite dramatic speedups. For example, the following pathological code ran four times faster on my machine simply by swapping i and j in the array subscript.

    int a[1000][1000];

    ... (a gets populated) ...

    int i, j, k = 0;
    for (i=0; i<1000; i++) {
        for (j=0; j<1000; j++) {
            k += a[j][i];
        }
    }
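
The two traversal orders can be put side by side so that each can be timed separately. This is a sketch only: the names DIM, grid, fill, sum_by_rows, and sum_by_columns are hypothetical, and any speed ratio will depend on the machine and compiler.

```c
#include <stddef.h>

#define DIM 1000

static int grid[DIM][DIM];

/* Fill the array with small known values. */
static void fill(void)
{
    size_t i, j;
    for (i = 0; i < DIM; i++) {
        for (j = 0; j < DIM; j++) {
            grid[i][j] = (int)((i + j) % 7);
        }
    }
}

/* Inner loop walks along a row: consecutive addresses. */
static long sum_by_rows(void)
{
    size_t i, j;
    long k = 0;
    for (i = 0; i < DIM; i++) {
        for (j = 0; j < DIM; j++) {
            k += grid[i][j];
        }
    }
    return k;
}

/* Inner loop walks down a column: each step jumps DIM ints ahead. */
static long sum_by_columns(void)
{
    size_t i, j;
    long k = 0;
    for (i = 0; i < DIM; i++) {
        for (j = 0; j < DIM; j++) {
            k += grid[j][i];
        }
    }
    return k;
}
```

Both functions return the same total; wrapping each call in clock() from <time.h> is one simple way to compare them on a given platform.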

This all boils down to: keep things near to each other that get accessed at around the same time. (This is why the important optimizations occur in data structure and algorithm design rather than in the detail of the code.) This rule applies both to the layout of different objects relative to each other, and to the relative positioning of individual fields within a single structure.
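
For field positioning within a single structure, one possible arrangement is to group the frequently accessed fields together at the front. The struct entry and the function find below are hypothetical and are not taken from the Parrot source; which fields count as frequently accessed is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical record: 'key' and 'next' are read on every lookup,
 * while 'debug_name' and 'created' are rarely needed. */
struct entry {
    /* frequently accessed fields, kept adjacent */
    unsigned long  key;
    struct entry  *next;
    /* rarely accessed fields follow */
    char           debug_name[64];
    long           created;
};

/* Walks the chain, reading only 'key' and 'next' of each entry. */
static struct entry *find(struct entry *head, unsigned long key)
{
    struct entry *e;
    for (e = head; e != NULL; e = e->next) {
        if (e->key == key) {
            return e;
        }
    }
    return NULL;
}
```

Had debug_name been declared between key and next, each step of the search would read two locations more than 64 bytes apart within every entry, which is less likely to be served by a single cache block.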

If you do put an optimization in, time it on as many architectures as you can, and be suspicious of it if it slows down on any of them! Perhaps it will be slow on other architectures too (current and future). Perhaps it wasn't so clever after all? If the optimization is platform specific, you should probably put it in a platform-specific function in a platform-specific file, rather than cluttering the main source with zillions of #ifdefs.

And remember to document it.

Loosely speaking, Perl tends to optimize for speed rather than space, so you may want to code for speed first, then tweak to reclaim some space while not affecting performance.

REFERENCES

The section on coding style is based on Perl 5's Porting/patching.pod by Daniel Grisinger. The section on naming conventions grew from some suggestions by Paolo Molaro <lupus@lettere.unipd.it>. Other snippets came from various P5Pers. The rest of it is probably my fault.

VERSION

CURRENT

   Maintainer: Dave Mitchell <davem@fdgroup.com>
   Class: Internals
   PDD Number: 7
   Version: 1
   Status: Developing
   Last Modified: 6 August 2001
   PDD Format: 1
   Language: English

HISTORY

Based on an earlier draft which covered only code comments.

CHANGES

None. First version.