As I mentioned, I’ll probably be stuck forever as a n00b, so I have absolutely no intention of comparing the real merits of languages. It’s just that I finally got to learn Ruby (and then Python) for real, and they’re so different from C++ and Java that I think the differences are worth documenting.

I don’t have any emotion for or against C++ since it’s just too big. I’ve used Java for just over 2 years and like its relative simplicity. I’ve been learning Ruby for a day, and would take

a = { 1 => "blah" }


over

Map<Integer, String> map = new HashMap<Integer, String>();
map.put(1, "blah");

any day.
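Since Python gets a mention too, its dict literal is just as terse (a quick sketch of my own):

```python
# Python's equivalent of the Ruby hash literal above
a = {1: "blah"}
print(a[1])  # -> blah
```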






|  | C++ | Java | Ruby | Python |
|---|---|---|---|---|
| inheritance |  | single base class, multiple interfaces | single base class, multiple mixins |  |
| exceptions | throw, try, catch | throw, try, catch, finally | raise, begin, rescue, ensure; catch/throw (with label, not exception) | raise, try, except, finally |
| varargs |  | Object … |  | *args, **kwargs |
| closures |  | inner class, anonymous class |  | lambda (limited) |
| constants |  | final (not quite) |  |  |
| enum | int only | full class, but no more inheritance | N/A (type-safety is moot) | N/A (type-safety is moot) |
| goodies | pointer, iterator | JRE, javadoc, IDE | yield, block/closure | list comprehension |
| built-ins |  |  | hash table, regexp, thread | hash table, regexp, thread |
| gotchas | complex and cryptic | String == | elsif, no ++ | elif, no switch/case |
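The “single base class, multiple mixins” cell is worth a sketch; all module and class names below are my own invention:

```ruby
# Two mixins providing behavior; a Ruby class can include any number of modules
module Walkable
  def walk; "walking"; end
end

module Swimmable
  def swim; "swimming"; end
end

# A single base class...
class Animal; end

# ...plus multiple mixins
class Duck < Animal
  include Walkable
  include Swimmable
end

d = Duck.new
puts d.walk  # -> walking
puts d.swim  # -> swimming
```

The modules end up in the ancestor chain, so a mixin can even be overridden like an inherited method.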

This seems to explain things pretty clearly.

| Definition | Task Manager | Process Explorer | vadump -s |
|---|---|---|---|
| Physical memory in use | Mem Usage | Working Set |  |
| Private (no DLL) VM allocated/committed | VM Size | Private Bytes | PagefileUsage |
| Total VM (including mmap, dll, etc.) | N/A | Virtual Size (Image + Priv + Mapped) | Commitment + Dynamic Reserved Memory |

On 32-bit Windows, max address space (virtual size) is 2GB.

The 3 types of commitment in vadump -s are:

  • Image: process executable code
  • Mapped: memory mapped stuff like files
  • Private: process heap (and stack?)

vadump -so has a section that breaks down the working set. Two entries are important: Heap is the Windows native heap, and Other Data covers things like CLR and JVM allocations.

No, it’s not about religion, or reality TV, or a random rant.

Those are names in Java’s memory management model.

In the beginning, Java gc was simple and stupid: run when heap is full. So your app happily gobbles up memory until… a… lo…ng pau…se.

Then the Java guys found a common pattern among most apps: most objects die young (used only for a short time), but those that survive live long. The x-axis of the lifetime graph measures object life span not in time, but in the number of bytes allocated between an object’s birth and death.

Therefore Java memory is now divided into 3 generations:

  1. Young
    1. eden
    2. two survivor spaces
  2. Tenured
  3. Permanent (and code cache): stores JVM’s own stuff

Heap = young + tenured. It starts at physical memory / 64, and the max is min(memory / 4, 1GB), unless you specify -Xms and -Xmx. Default perm size is 64MB (-XX:MaxPermSize); default code cache is 32MB (-XX:ReservedCodeCacheSize).
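For example, a command line overriding those defaults might look like this (the sizes and the class name are made up):

```shell
# Start with a 256MB heap, cap it at 1GB, raise the permanent
# generation cap to 128MB, and log each collection to gc.log.
java -Xms256m -Xmx1g -XX:MaxPermSize=128m -Xloggc:gc.log MyApp
```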

Now object life cycle is like this:

  1. Objects are always allocated to eden.
  2. When eden fills up, a fast but not comprehensive gc (minor collection) is run over the young generation only.
  3. All survivors are moved into one survivor space, plus everything from the other survivor space (survivors from the previous minor collection).
  4. When objects in a survivor space are old enough (or the survivor space fills up), they are moved to tenured.
  5. When tenured fills up, a major collection is run that is comprehensive: all heap, all objects.
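A sketch of my own to make the pattern concrete: most allocations below become garbage immediately (they die young in eden), while one in a hundred is kept and will age toward tenured. Run it with -verbose:gc to watch minor collections as eden fills.

```java
import java.util.ArrayList;
import java.util.List;

public class GcDemo {
    // Allocates lots of short-lived garbage and returns the long-lived survivors.
    static List<byte[]> churn(int iterations) {
        List<byte[]> longLived = new ArrayList<byte[]>();
        for (int i = 0; i < iterations; i++) {
            byte[] buf = new byte[16 * 1024]; // allocated in eden
            if (i % 100 == 0) {
                longLived.add(buf);           // 1 in 100 survives, ages toward tenured
            }
            // otherwise buf becomes garbage right away -- it "dies young"
        }
        return longLived;
    }

    public static void main(String[] args) {
        System.out.println("long-lived blocks: " + churn(10000).size());
    }
}
```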

Run java with -verbose:gc (or -Xloggc:file) and it prints stuff like this:

[GC 15081K->14088K(20988K), 0.0110810 secs]
[Full GC 15078K->13996K(20988K), 0.1845024 secs]

GC = minor collection and Full GC = major. The numbers are heap used before gc -> after gc, with total committed heap in parentheses, followed by the pause time.

I was asked this question during an interview, and later found Meyers & Alexandrescu’s great article C++ and the Perils of Double-Checked Locking, one of the references on the broader Wikipedia page. DCLP is listed as an anti-pattern.

The essence of why DCLP doesn’t work is that the compiler may reorder instructions, so another thread can see the singleton pointer assigned before the singleton is fully constructed. On a multiprocessor platform, cache coherence can cause the same problem: a thread on another processor gets a bad singleton.
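For reference, the classic pattern looks like this sketch (the class name is mine, and I use C++11’s std::mutex for brevity; the commented line is where the reordering bites):

```cpp
#include <mutex>

class Singleton {
public:
    static Singleton* instance() {
        if (pInstance == 0) {                      // first check, unsynchronized
            std::lock_guard<std::mutex> lock(mtx);
            if (pInstance == 0) {                  // second check, under the lock
                // The compiler may store pInstance before the constructor
                // finishes, so a thread passing the first check can use a
                // half-built object.
                pInstance = new Singleton;
            }
        }
        return pInstance;
    }
private:
    static Singleton* pInstance;
    static std::mutex mtx;
};

Singleton* Singleton::pInstance = 0;
std::mutex Singleton::mtx;
```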

Meyers offers several solutions:

  • Use a multithreading library, not simple synchronization constructs like a mutex.
  • Have client code cache the singleton instance locally.
  • Use eager initialization for the singleton, i.e. instantiate it at startup.
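A modern aside, not from the article (which predates C++11): a function-local static is now guaranteed to be initialized thread-safely exactly once, which sidesteps DCLP entirely. A minimal sketch, with a made-up Config class:

```cpp
class Config {
public:
    // C++11 guarantees this initialization runs exactly once, even when
    // several threads call instance() concurrently ("magic static").
    static Config& instance() {
        static Config cfg;
        return cfg;
    }
    int value = 42;  // placeholder state for the sketch
private:
    Config() {}      // callers go through instance()
};
```

Callers just write `Config::instance().value`; there is no pointer to double-check.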

DDJ has an article about a nice and simple C++ message processing technique.

class MessageHandlerBase {
public: virtual ~MessageHandlerBase() {}
};

template <class MessageType>
class MessageHandler : public virtual MessageHandlerBase {
public: virtual void process(MessageType*) = 0;
};

class MessageBase {
public: virtual void dispatch(MessageHandlerBase* handler) = 0;
protected:
    template <class MessageType>
    void dynamicDispatch(MessageHandlerBase* handler, MessageType* self) {
        dynamic_cast<MessageHandler<MessageType>*>(handler)->process(self); // should test against NULL from dynamic_cast
    }
};

class Message1 : public MessageBase {
    void dispatch(MessageHandlerBase* handler) { dynamicDispatch(handler, this); }
};

class SpecificMessageHandler : public MessageHandler<Message1>, public MessageHandler<Message2> {
    void process(Message1*);
    void process(Message2*);
};

Only the handler of specific message types needs to include the message declarations.

Double dispatch refers to calling handler.process from message.dispatch; dynamic refers to the dynamic_cast. message.dispatch can be a macro to save copy-and-paste.
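The technique can be exercised end to end with a small self-contained sketch (Message2’s body and the counting handler are my own additions):

```cpp
class MessageHandlerBase {
public:
    virtual ~MessageHandlerBase() {}
};

template <class MessageType>
class MessageHandler : public virtual MessageHandlerBase {
public:
    virtual void process(MessageType*) = 0;
};

class MessageBase {
public:
    virtual ~MessageBase() {}
    virtual void dispatch(MessageHandlerBase* handler) = 0;
protected:
    template <class MessageType>
    void dynamicDispatch(MessageHandlerBase* handler, MessageType* self) {
        // Downcast to the handler interface for this concrete message type.
        MessageHandler<MessageType>* h =
            dynamic_cast<MessageHandler<MessageType>*>(handler);
        if (h) h->process(self);  // silently skip handlers that don't know this type
    }
};

class Message1 : public MessageBase {
public:
    void dispatch(MessageHandlerBase* handler) { dynamicDispatch(handler, this); }
};

class Message2 : public MessageBase {
public:
    void dispatch(MessageHandlerBase* handler) { dynamicDispatch(handler, this); }
};

// One handler class implementing both message-specific interfaces.
class CountingHandler : public MessageHandler<Message1>,
                        public MessageHandler<Message2> {
public:
    int ones, twos;
    CountingHandler() : ones(0), twos(0) {}
    void process(Message1*) { ++ones; }
    void process(Message2*) { ++twos; }
};
```

Dispatching `m->dispatch(&handler)` through a `MessageBase*` lands in the right `process` overload for the concrete message type, which is the double dispatch.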
