This file is part of the Perl 6 Archive

To see what is currently happening visit http://www.perl6.org/

NAME

docs/pdds/pdd09_gc.pod - Garbage Collection Subsystems

ABSTRACT

This PDD describes how DOD/GC systems work, and what's required of PMC classes.

DESCRIPTION

Doing DOD takes a bit of work--we need to make sure that everything is findable from the root set, and that we don't go messing up data shared between interpreters.

GC TERMS

GC Schemes

There are basically three general schemes to achieve garbage collection.

  • Reference counting
  • All objects have a count, how often they are refered to by other objects. If that count reaches zero, the object's space can be reclaimed. This scheme can't cope with reference loops, i.e, a loop of dead objects, all referencing one another but not reachable from elsewhere, never gets collected.

  • Mark and Sweep
  • Starting from the root set (Parrot registers, stacks, internal structures) all reachable objects (and objects reachable from these) are marked being alive.

    Ojbects not reached are considered being dead and get collected by a sweep through the objects arenas.

  • Copying collection
  • Live objects are copied into a new memory region. The old memory space can then be reclaimed.

GC Variants

  • stop-the-world
  • During a GC cycle the processing of user code is stopped. Normal operation continues only after the whole GC cycle is performed. This can lead to arbitrary long pauses during program execution.

  • incremental
  • GC is done in small increments intermittent with normal program operation.

  • real-time
  • The pauses caused by GC don't exceed a certain limit.

  • concurrent
  • The GC system runs as a separate thread and really concurrently on a multi-processor machine.

  • parallel
  • Multiple threads are participating in GC.

  • generational
  • The object space is divided between a young generation (short-lived temporaries) and one or more old generations. By not scanning the old generation this can considerably speed up GC.

GC SUBSYSTEMS

Fix-sized Headers

All managed objects (PMCs, Strings, Buffers) inside Parrot are subject to garbage collection. As these objects aren't allowed to move after creation, garbage collection is done by a non-copying scheme. Further: as we have to cope with pointers to objects on the C stack and in CPU registers, the garbage collection scheme is a conservative one.

DOD/GC is normally triggered by allocation of new objects, which happens usually from some stack nesting below the run-loop. There is a small chance that an integer on the C stack is misinterpreted as a pointer to an object. This object would kept alive in such a case.

The live-ness information gained by dead object detection (DOD) is also the base for collecting variable sized-data that may hang off buffers.

Variable-sized Buffer Memory

Variable-sized memory like string memory gets collected, when the associated header isn't found to be alive during DOD. While a copying collection could basically[1] be done at any time, its inefficient to copy buffers of objects that are non yet detected being dead. This implies that before a collection in the memory pools is run, a DOD run for fixed-sized headers is triggered.

[1] Dead objects stay dead, there is no possibility of a reusal of dead objects.

IMPLEMENTATION OF FIXED-SIZED HEADER GC

General Notes

GC subsystems are rather independent. The goal for Parrot is just to provide new object headers in the fasted possible way. How that is achieved can be considered as an implementation detail.

While GC subsystems are independent they may share some code to reduce Parrot memory footprint. E.g. stop-the-world mark and sweep and incremental mark and sweep use the same arena structures and share arena creation and DOD routines.

Initialization

Currently only one GC system is active (selected at configure or compile time). But future versions might support switching GC systems during runtime to accomodate for different work loads.

  • void Parrot_gc_XXX_init(Interp *)
  • Initialize GC system named XXX.

    Called from src/memory.c:mem_setup_allocator() after creating arena_base. The initialization code is responsible for the creation of the header pools and has to fill the following function pointer slots in arena_base:

Arena_base function pointers

  • void (*do_dod_run) (Interp *, int flags)
  • Trigger or perform a DOD/GC run.

    Flags is one of:

    • DOD_trace_normal | DOD_trace_stack_FLAG
    • Run a normal GC cycle. This is normally called by resource shortage in the buffer memory pools before a collection is run. The bit named DOD_trace_stack_FLAG indicates that the C-stack (and other system areas like the processor registers) have to be traced too.

      The implementation might or might not actually run a full GC cycle. If e.g an incremental GC system just has finished the mark phase, it would do nothing. OTOH if no objects are currently marked live, the implementation should run the mark phase, so that copying of dead objects is avoided.

    • DOD_lazy_FLAG
    • Do a timely destruction run. The goal is to either detect all objects that need timely destruction or to do a full collection. In the former case the collection can be interrupted or postponed. This is called from the Parrot run-loop. No system areas have to be traced.

    • DOD_finish_FLAG
    • Called during interpreter destruction. The GC subsystem must clear the live state of all objects and perform a sweep in the PMC header pool, so that destructors and finalizers get called.

    • DOD_no_trace_volatile_roots
    • Trace root set except volatile roots. That is skip e.g. registers.

  • void (*de_init_gc_system) (Interp*)
  • Called during interpreter destruction. Free used resources and memory pools.

  • void (*mark_object) (Interp*, Pobj*)
  • Mark the object being live. This function gets invoked by the following macro:

  • Parrot_pobj_lives(Interp*, PObj *)
  • which might do nothing if the object is already marked live.

Object allocation

Each header pool provides one function pointer to get a new object from that pool.

  • PObj * (*get_free_object) (Interp*, struct Small_Object_Pool*)
  • It should return one free object from the given pool. Object flags are returned clear, except flags that are used by the garbage collector itself. If the pool is a buffer header pool, all other object memory is zeroed.

Write Barrier

The GC subsystem has to provide these (possibly empty) macros:

  • DOD_WRITE_BARRIER(Interp*, PMC *agg, PMC *old, PMC *new)
  • This macro is invoked when in aggregate agg the element old is getting overritten by new. Both old and new may be NULL.

  • DOD_WRITE_BARRIER_KEY(Interp*, PMC *agg, PMC *old, PObj *old_key, PMC *new, PObj *new_key)
  • Like above. Invoked when a hash key is inserted, possibly replacing and old key.

The Arena_base structure

The arena_base holds the mentioned function pointers, pointers to the header pools, some statistic counters, and a pointer void *gc_private reserved for the GC subsystem.

The GC subsystem is responsible for updating the appropriate statistic fields of the structure.

Blocking GC

Being able to block GC and DOD is important--you'd hate to have the newly allocated Buffers or PMCs you've got yanked out from underneath you. That'd be no fun. Use the following routines to control GC:

  • Parrot_block_DOD(Interp *interpreter)
  • Block DOD for the passed interpreter. (But not GC)

  • Parrot_block_GC(Interp *interpreter)
  • Block GC for the passed interpreter. (But not DOD)

  • Parrot_unblock_DOD(Interp *interpreter)
  • Unblock DOD for the passed interpreter. (But not GC)

  • Parrot_unblock_GC(Interp *interpreter)
  • Unblock GC for the passed interpreter. (But not DOD)

Note that the blocking is recursive--if you call Parrot_block_DOD() three times in succession, you need to call Parrot_unblock_DOD() three times to re-enable DOD.

Important flags

For PMCs and Buffers to be collected properly, you must get the flags set on them properly. Otherwise Bad Things Will Happen.

Note: don't manipulate these flags directly. Always use the macros defined in include/parrot/pobj.h.

  • PObj_active_destroy_FLAG
  • The PMC has some sort of active destructor, and will have that destructor called when the PMC is destroyed.

  • PObj_custom_mark_FLAG
  • The mark vtable slot will be called during DOD. The mark function must call Parrot_pobj_lives for all non-NULL objects that PMC refers too.

    Please note that Parrot_pobj_lives may be a macro.

  • PObj_data_is_PMC_array_FLAG
  • Set if the data pointer points to an array of objects. The length of the array is PMC_int_val(SELF).

  • PObj_external_FLAG
  • Set if the buffer points to memory that came from outside Parrot's memory system.

  • PObj_sysmem_FLAG
  • Set if the memory came from the system malloc. When the buffer is considered dead, the memory will be freed back to the system.

  • PObj_COW_FLAG
  • The buffer's memory is copy on write. Any changes to the buffer must first have the buffer's memory copied. The COW flag should then be removed.

The following flags can be used by the GC subsystem:

  • PObj_live_FLAG
  • The system considers the object to be alive for collection purposes.

  • PObj_on_free_list_FLAG
  • The object is unused, and on the free list for later allocation.

  • PObj_custom_GC_FLAG
  • DWIM.

ATTACHMENTS

None.

FOOTNOTES

None.

REFERENCES

None.

VERSION

CURRENT

    Maintainer: Dan Sugalski
    Class: Internals
    PDD Number: 9
    Version: 1.2
    Status: Developing
    Last Modified: 26 August 2004
    PDD Format: 1
    Language: English

HISTORY

  • Version 1.2
  • 26 August 2004

  • Version 1.1
  • 26 February 2002

  • version 1
  • None. First version

CHANGES

  • Version 1.2
  • Removed old flags. Documented GC schemes and subsystem interface.

  • Version 1.1
  • Started documenting the internal routines

  • Version 1.0
  • None. First version