Programming


As I mentioned, I’ll probably be stuck forever as a n00b, so I have absolutely no intention of comparing the real merits of these languages. It’s just that I finally got to learn Ruby (and then Python) for real, and they’re so different from C++ and Java that I think it’s worth documenting.

I don’t have any emotion for or against C++ since it’s just too big. I’ve used Java for just over 2 years and like its relative simplicity. I’ve learned Ruby for a day, and would take

a = { 1 => "blah" }

over

Map map = new HashMap(); map.put(1, "blah");

any day.

| | C++ | Java | Ruby | Python |
|---|---|---|---|---|
| Inheritance | multiple | single base class, multiple interfaces | single base class, multiple mixins | multiple |
| Syntax | throw try catch | throw try catch finally | raise begin rescue ensure; catch throw (with label, not exception) | raise try except finally |
| | #include | import | require/load | import |
| | dynamic_cast/typeid | instanceof | kind_of? | isinstance |
| | varargs | Object... | *args | *args, **args |
| | NULL | null | nil | None |
| | inner class | anonymous class | block | lambda (limited) |
| | const | final (not quite) | object.freeze | N/A |
| | enum: int only | enum: full class, but no more inheritance | N/A (type-safety is moot) | N/A |
| Hits | pointer, iterator | JRE, javadoc, IDE | yield, block/closure | list comprehension |
| Misses | hash table, regexp, thread | verbose | performance? | |
| Idiosyncrasy | complex and cryptic | String == | elsif, no ++ | elif, no switch/case |

This seems to explain things pretty clearly.

| Definition | Task Manager | Process Explorer | vadump -s |
|---|---|---|---|
| Physical memory in use | Mem Usage | Working Set | WorkingSetSize |
| Private (no DLL) VM allocated/committed | VM Size | Private Bytes | PagefileUsage |
| Total VM (including mmap, dll, etc) | N/A | Virtual Size | (Image + Priv + Mapped) Commitment + Dynamic Reserved Memory |

On 32-bit Windows, the maximum user-mode address space (virtual size) of a process is 2GB by default.

The 3 types of commitment in vadump -s are:

  • Image: process executable code
  • Mapped: memory mapped stuff like files
  • Private: process heap (and stack?)

vadump -so has a section that breaks down the working set. Two entries are important: Heap is the Windows native heap, and Other Data would be, e.g., CLR and JVM stuff.

No, it’s not about religion, or reality TV, or random rant.

Those are names in Java’s memory management model.

In the beginning, Java gc was simple and stupid: run when the heap is full. So your app happily gobbles up memory until… a… lo…ng pau…se.

Then the Java guys found a common pattern among most apps: most objects die young (used only for a short time), but those that survive live long. The X axis of the graph is object life span, measured not in time but in the number of bytes allocated between an object’s birth and death.

Therefore Java memory is now divided into 3 generations:

  1. Young
    1. eden
    2. two survivor spaces
  2. Tenured
  3. Permanent (and code cache): stores JVM’s own stuff

Heap = young + tenured. It starts at physical memory / 64, and max is min(mem/4, 1GB), unless you specify -Xms and -Xmx. Default perm size is 64MB (-XX:MaxPermSize). Default code cache is 32MB (-XX:ReservedCodeCacheSize).
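The default sizing rule is plain arithmetic, so here is a minimal sketch of it. It assumes exactly the figures quoted above (they differ between JVM versions and machine classes), and the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>

const std::uint64_t kMB = 1024ULL * 1024ULL;
const std::uint64_t kGB = 1024ULL * kMB;

// Initial heap when -Xms is not given: physical memory / 64.
std::uint64_t defaultInitialHeap(std::uint64_t physicalBytes) {
    return physicalBytes / 64;
}

// Maximum heap when -Xmx is not given: min(physical memory / 4, 1GB).
std::uint64_t defaultMaxHeap(std::uint64_t physicalBytes) {
    return std::min(physicalBytes / 4, kGB);
}
```

For a 2GB machine this gives a 32MB initial heap and a 512MB maximum; from 4GB of physical memory upward the maximum stays capped at 1GB.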

Now object life cycle is like this:

  1. Objects are always allocated to eden.
  2. When eden fills up, a fast but not comprehensive gc (minor collection) is run over the young generation only.
  3. All survivors are moved into one survivor space, plus everything from the other survivor space (survivors from the previous minor collection).
  4. When objects in survivor space are old enough (or the survivor space fills up), they are moved to tenured.
  5. When tenured fills up, a major collection is run that is comprehensive: all heap, all objects.
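Steps 1 to 4 can be sketched as a toy model. This is not how the JVM is implemented: a `live` flag stands in for reachability, capacities are object counts rather than bytes, promotion is by age only, and the major collection of step 5 is left out. `ToyHeap`, `Obj` and `tenureAge` are made-up names.

```cpp
#include <cstddef>
#include <vector>

struct Obj {
    int id;
    int age;    // minor collections survived so far
    bool live;  // false = garbage at the next collection
};

class ToyHeap {
public:
    ToyHeap(std::size_t edenCapacity, int tenureAge)
        : edenCapacity_(edenCapacity), tenureAge_(tenureAge) {}

    // Step 1: every new object goes to eden.
    // Step 2: a full eden triggers a minor collection first.
    void allocate(int id, bool live) {
        if (eden_.size() >= edenCapacity_) minorCollect();
        Obj o = {id, 0, live};
        eden_.push_back(o);
    }

    // Steps 3-4: live objects from eden and from the old survivor space are
    // copied into the other survivor space; old enough ones go to tenured.
    void minorCollect() {
        std::vector<Obj> to;
        evacuate(eden_, to);
        evacuate(survivor_, to);
        eden_.clear();
        survivor_.swap(to);
    }

    const std::vector<Obj>& eden() const { return eden_; }
    const std::vector<Obj>& survivor() const { return survivor_; }
    const std::vector<Obj>& tenured() const { return tenured_; }

private:
    void evacuate(const std::vector<Obj>& from, std::vector<Obj>& to) {
        for (std::size_t i = 0; i < from.size(); ++i) {
            Obj o = from[i];
            if (!o.live) continue;  // dead objects are simply not copied
            ++o.age;
            if (o.age >= tenureAge_) tenured_.push_back(o);
            else to.push_back(o);
        }
    }

    std::size_t edenCapacity_;
    int tenureAge_;
    std::vector<Obj> eden_, survivor_, tenured_;
};
```

With `ToyHeap h(2, 2)`, allocating objects 1 to 5 where 2 and 4 are dead runs two minor collections: object 1 ends up in tenured, object 3 in the survivor space, and object 5 is alone in eden.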

Run java with -verbose:gc (or -Xloggc:file) and it prints stuff like this:


[GC 15081K->14088K(20988K), 0.0110810 secs]
[Full GC 15078K->13996K(20988K), 0.1845024 secs]

GC = minor collection and Full GC = major. The numbers are heap in use before gc -> after gc (total committed heap), followed by the pause time.
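If you want to pull the numbers out of such a log, something like the following might do. It is a sketch that only understands the exact line shape shown above (other collectors and JVM versions print different formats), and `GcEvent` and `parseGcLine` are made-up names.

```cpp
#include <regex>
#include <string>

struct GcEvent {
    bool full;         // true for "Full GC" (major), false for "GC" (minor)
    long beforeKB;     // heap in use before the collection
    long afterKB;      // heap in use after the collection
    long committedKB;  // total committed heap
    double seconds;    // time the collection took
};

// Parses one log line; returns false if the line has a different shape.
bool parseGcLine(const std::string& line, GcEvent& out) {
    static const std::regex pattern(
        R"(\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([0-9.]+) secs\])");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) return false;
    out.full = (m[1].str() == "Full GC");
    out.beforeKB = std::stol(m[2].str());
    out.afterKB = std::stol(m[3].str());
    out.committedKB = std::stol(m[4].str());
    out.seconds = std::stod(m[5].str());
    return true;
}
```

For the first sample line, `beforeKB - afterKB` is 993, i.e. that minor collection reclaimed just under 1MB.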

I was asked the question during an interview, and later found Meyers & Alexandrescu’s great article C++ and the Perils of Double-Checked Locking, one of the references in this broader Wikipedia page. DCLP is listed there as an anti-pattern.

The essence of why DCLP doesn’t work is that the compiler may reorder instructions, so the pointer can be assigned before the constructor has finished, and another thread may then use a not-fully-initialized singleton. On a multiprocessor platform, cache coherence can cause the same problem: a thread on another processor may see the pointer before it sees the initialized object.

Meyers offers several solutions:

  • Use a multithreading library, not a simple synchronization construct like a mutex.
  • Have client code cache the singleton instance locally.
  • Use eager initialization for the singleton, i.e. instantiate it at startup.
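For what it’s worth, here is a sketch of a different technique from the ones listed above. It assumes a C++11 or later compiler, which postdates the article: since C++11 the language guarantees that a function-local static is initialized exactly once, even when several threads make the first call at the same time. `Registry` and its counter are made-up names for illustration.

```cpp
// Hypothetical singleton, for illustration only.
class Registry {
public:
    static Registry& instance() {
        // C++11 and later: initialized exactly once, even if several
        // threads reach this line concurrently on the first call.
        static Registry theInstance;
        return theInstance;
    }

    // How many times the constructor has run (expected to stay at 1).
    static int constructed() { return constructed_; }

    Registry(const Registry&) = delete;
    Registry& operator=(const Registry&) = delete;

private:
    Registry() { ++constructed_; }
    static int constructed_;
};

int Registry::constructed_ = 0;
```

Client code can still follow the caching advice by holding on to the result, e.g. `Registry& r = Registry::instance();`, instead of calling `instance()` in a loop.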

DDJ has an article about a nice and simple C++ message processing technique.

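class MessageHandlerBase
{
public:
    virtual ~MessageHandlerBase() {} // must be polymorphic for dynamic_cast to work
};
template<class MessageType> class MessageHandler : public virtual MessageHandlerBase
{
public:
    virtual void process(MessageType*)=0;
};

class Message
{
public:
    virtual ~Message() {}
    virtual void dispatch(MessageHandlerBase* handler)=0;
protected:
    template<class MessageType> void dynamicDispatch(MessageHandlerBase* handler,MessageType* self)
    {
        // cross-cast to the handler interface for this particular message type
        MessageHandler<MessageType>* h = dynamic_cast<MessageHandler<MessageType>*>(handler);
        if (h) // null means the handler does not accept MessageType
            h->process(self);
    }
};
class Message1 : public Message
{
public:
    void dispatch(MessageHandlerBase* handler)
    {
        dynamicDispatch(handler, this);
    }
};
class Message2 : public Message
{
public:
    void dispatch(MessageHandlerBase* handler)
    {
        dynamicDispatch(handler, this);
    }
};

class SpecificMessageHandler : public MessageHandler<Message1>, public MessageHandler<Message2>
{
public:
    void process(Message1*);
    void process(Message2*);
};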

Only the handlers of specific message types need to include the declarations of those messages.

Double dispatching refers to calling handler.process from message.dispatch. Dynamic refers to the dynamic_cast. message.dispatch can be a macro to save copy-n-paste.
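To make the macro remark concrete, here is a self-contained sketch of the technique. `DEFINE_DISPATCH`, `Message3` and the `log` member are names I made up, and `dispatch` returns bool here so the caller can tell when a handler does not accept a message; none of that is claimed to be from the DDJ article.

```cpp
#include <string>
#include <vector>

class MessageHandlerBase {
public:
    virtual ~MessageHandlerBase() {}  // polymorphic, so dynamic_cast works
};

template <class MessageType>
class MessageHandler : public virtual MessageHandlerBase {
public:
    virtual void process(MessageType*) = 0;
};

class Message {
public:
    virtual ~Message() {}
    // Returns false when the handler does not accept this message type.
    virtual bool dispatch(MessageHandlerBase* handler) = 0;

protected:
    template <class MessageType>
    bool dynamicDispatch(MessageHandlerBase* handler, MessageType* self) {
        MessageHandler<MessageType>* h =
            dynamic_cast<MessageHandler<MessageType>*>(handler);
        if (!h) return false;
        h->process(self);  // second dispatch: chosen by the message's static type
        return true;
    }
};

// Hypothetical macro that stamps out the identical dispatch() override.
#define DEFINE_DISPATCH() \
    bool dispatch(MessageHandlerBase* handler) { return dynamicDispatch(handler, this); }

class Message1 : public Message { public: DEFINE_DISPATCH() };
class Message2 : public Message { public: DEFINE_DISPATCH() };
class Message3 : public Message { public: DEFINE_DISPATCH() };

// Accepts Message1 and Message2 only, and records what it processed.
class SpecificMessageHandler : public MessageHandler<Message1>,
                               public MessageHandler<Message2> {
public:
    void process(Message1*) { log.push_back("Message1"); }
    void process(Message2*) { log.push_back("Message2"); }
    std::vector<std::string> log;
};
```

Dispatching a `Message1` and a `Message2` to a `SpecificMessageHandler` fills `log` in that order, while dispatching a `Message3` returns false and leaves the handler untouched.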

I was wondering whether enum has internal or external linkage, and this article explains it very well. An enumerator has no storage and no linkage. It’s the same as a macro, only safer: it has a type and obeys scope.
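A tiny sketch of the difference, with made-up names (`MAX_RETRIES_MACRO`, `net::kMaxRetries`, `RetryLog`):

```cpp
#define MAX_RETRIES_MACRO 3    // plain text substitution, ignores scope

namespace net {
    enum { kMaxRetries = 3 };  // a typed constant that lives inside net
}

// An enumerator is a compile-time constant, so it can size an array.
// It has no storage, though: &net::kMaxRetries would not compile.
struct RetryLog {
    int attempts[net::kMaxRetries];
};
```

Both spellings yield the constant 3, but only the enumerator can be tucked away in a namespace or class.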

There was an email thread between my colleagues Mike and Joe on this.
In C++, all 3 swaps work. Only the 3rd one doesn’t incur a temporary object, but it swaps the pointers pa and pb rather than the original objects a and b.

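class CTest
{
public:
    int a;
};

// 1. By reference: swaps the objects' contents via a temporary copy.
void swap(CTest& ra, CTest& rb)
{
    CTest rc = ra;
    ra = rb;
    rb = rc;
}

// 2. By pointer: same as above, with explicit dereferencing.
void swap(CTest* ra, CTest* rb)
{
    CTest rc;
    rc = *ra;
    *ra = *rb;
    *rb = rc;
}

// 3. By pointer to pointer: swaps the pointers only; no CTest is copied.
void swap(CTest** ra, CTest** rb)
{
    CTest* rc = *ra;
    *ra = *rb;
    *rb = rc;
}

int main(int argc, char* argv[])
{
    CTest a, b;
    CTest* pa = &a; // non-const: a const CTest** does not convert to CTest**
    CTest* pb = &b;
    a.a = 1;
    b.a = 2;
    swap(a, b);     // a.a == 2, b.a == 1
    swap(&a, &b);   // a.a == 1, b.a == 2
    swap(&pa, &pb); // pa == &b, pb == &a; a and b are unchanged
}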

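In Java, nothing works! Java passes object references by value, so a swap function only exchanges its own local copies of the references. You just can’t write a simple swap function in Java. You’d have to swap all members of the two objects explicitly (or use reflection).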
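The Java situation can be mimicked in C++ by passing plain pointers by value, which is roughly what a Java method receives. This is an analogy written in C++, not Java code, and `javaStyleSwap` and `swapMembers` are made-up names.

```cpp
struct CTest { int a; };

// Mimics Java: the two "references" are copied into the function,
// so exchanging them only rearranges the local copies.
void javaStyleSwap(CTest* x, CTest* y) {
    CTest* tmp = x;
    x = y;
    y = tmp;
}

// Mimics the member-by-member workaround: the objects themselves change.
void swapMembers(CTest* x, CTest* y) {
    int tmp = x->a;
    x->a = y->a;
    y->a = tmp;
}
```

After `javaStyleSwap(pa, pb)` the caller’s pointers and objects are exactly as before; only `swapMembers` has a visible effect.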
