Erlang Ring Benchmark

Back to home page

Background

Having spent the last 25 years writing programs in imperative languages such as Assembler, PL/1, Visual Basic, ABAP and Java, I decided it was high time to give my brain a new challenge.

Several factors were involved here during the search for a new language, these being my interests in:

Now that the concepts of the Semantic Web are becoming more and more of a reality, I thought it appropriate to look at a language that has the built-in capabilities needed to implement these concepts.

After hunting around for a while, I discovered that back in the mid-eighties, Ericson developed a cool language called Erlang to run telephone exchanges. Ericson's design requirements were to create a language that would:

Because computers are now being shipped with multi-core processors, it has become practical to re-examine programming languages designed for concurrency - hence my decision to learn Erlang.

When I started looking at Erlang, the syntax was easy enough to learn, but I rapidly discovered that functional languages are based on a fundamentally different way of thinking, and use design principles that are very different from those used by imperative languages. Therefore, I first had to learn how to think in Erlang before I could write even the simplest programs.

Since Erlang is based on the concepts of iteration and the Lambda Calculus, I found I could call on very little of my previous programming experience to act as a guide.


The Problem

In his book "Programming Erlang", Joe Armstrong (the author of the language) casually throws out the following programming challenge...

Write a ring benchmark. Create N processes in a ring.
Send a message round the ring M times so that a total of N * M messages get sent.

Time how long this takes for different values of N and M.


The Solution

I put this explanation of the solution together as a means of explaining to myself what is going in an Erlang program - and since it helped me, I venture to say that it will help you also!  :-)

Since I am still quite new to thinking in Erlang, I figured that the easiest way to arrive at an answer was to reverse engineer the answers other have created. Therefore, I hunted around the internet for existing solutions, found one that I thought looked suitable and took it to pieces. In the process of doing so, I discovered I could implement my own solution in a slightly more efficient manner (not by much though...).

So using Rogier Peters' solution as a starting point, I took it to bits, renamed the functions in a way that made more sense to me, and tweaked a couple of things here and there to help me understand what was going on. You'll notice that I have defined many macros to trace how each step of the solution is being executed. These statements can be switched on and off using a compiler option.

 
001-module(ring_benchmark).
002-export([start/2, newLink/3]).
003%%-------------------------------------------------------------------------------------------------
004%%
005%% Define various macros
006%%
007%%-------------------------------------------------------------------------------------------------
008-ifdef(trace).
009  -define(TraceLinkCreation(Pid,N),
010            io:format("~w is link ~w~n",[Pid, N])).
011  -define(TraceLastLink(Pid),
012            io:format("~w is the last link in the chain.~n~n", [Pid])).
013  -define(TraceLap(N),
014            io:format("Passing Go, ~p more laps to go~n", [N-1])).
015  -define(TraceLastLap,
016            io:format("Passing Go, 1 more lap to go~n", [])).
017  -define(TraceWaitToPassMsgOn(Pid, NextLink),
018            io:format("~w is waiting to send next message to ~w~n",[Pid, NextLink])).
019  -define(TracePassMsgOn(Pid, Msg, NextLink),
020            io:format("~w received message ~p, passing it on to ~w~n", [Pid, Msg, NextLink])).
021  -define(TraceRingComplete(Pid, NextLink),
022            io:format("~w received connect message from ~w~nAll links in the chain now form a ring.~n",[Pid, NextLink])).
023  -define(TraceSendingMsg(Msg, Pid, NextLink),
024            io:format("Sending message ~p from ~w to ~p~n",[Msg, Pid, NextLink])).
025  -define(TraceEstablishConnection(Pid, NextLink),
026            io:format("~w received connect message from ~w~n",[Pid, NextLink])).
027-else.
028  -define(TraceLinkCreation(Pid,N),                void).
029  -define(TraceLastLink(Pid),                      void).
030  -define(TraceLap(N),                             void).
031  -define(TraceLastLap,                            void).
032  -define(TraceWaitToPassMsgOn(Pid, NextLink),     void).
033  -define(TracePassMsgOn(Pid, Msg, NextLink),      void).
034  -define(TraceRingComplete(Pid, NextLink),        void).
035  -define(TraceSendingMsg(Msg, Pid, NextLink),     void).
036  -define(TraceEstablishConnection(Pid, NextLink), void).
037-endif.
038
039%%-------------------------------------------------------------------------------------------------
040%%
041%% Entry point
042%%
043%%-------------------------------------------------------------------------------------------------
044start(N, M) ->
045% Reset timers
046  statistics(runtime),
047  statistics(wall_clock),
048
049% Define the message to be sent around the ring
050  Blah = "BOO!",
051
052% Start ring creation
053  io:format("Creating a ring of ~p processes.~n",[N]),
054  ?TraceLinkCreation(self(),N),
055  spawn(ring_benchmark, newLink, [N-1, self(), self()]),
056
057% Wait for connect message from the next link in the chain
058% then start the message going
059  receive
060    {connect, NextLink} ->
061      ?TraceRingComplete(self(), NextLink),
062
063% How long did it take to create the processes?
064      {_, CPUTime_CreateProcesses} = statistics(runtime),
065      {_, ElapseTime_CreateProcesses} = statistics(wall_clock),
066
067      io:format("~nCPU Time (process creation) = ~p ms~nElapse time (process creation) = ~p ms~n",
068                [CPUTime_CreateProcesses, ElapseTime_CreateProcesses]),
069      io:format("~nStarting message round with message ~p~n",[Blah]),
070
071      ?TraceSendingMsg(Blah, self(), NextLink),
072
073      NextLink ! {M, Blah},
074      lapCounter(NextLink)
075  end,
076
077% Find out how long that lot took
078  {_, CPUTime_SendMessage} = statistics(runtime),
079  {_, ElapseTime_SendMessage} = statistics(wall_clock),
080
081  io:format("~nCPU Time (sending message) = ~p ms~nElapse Time (sending message) = ~p ms~n",
082            [CPUTime_SendMessage, ElapseTime_SendMessage]),
083  io:format("~nCPU Time (total) = ~p ms~nElapse Time (total) = ~p ms~n",
084            [CPUTime_CreateProcesses + CPUTime_SendMessage,
085            ElapseTime_CreateProcesses + ElapseTime_SendMessage]).
086
087%%-------------------------------------------------------------------------------------------------
088%%
089%% Create the last link in the chain
090%%
091%%-------------------------------------------------------------------------------------------------
092newLink(1, PrevLink, Main) ->
093  ?TraceLastLink(self()),
094
095% Send connect message to the previous link
096  PrevLink ! {connect, self()},
097
098% Get ready to complete the lap by passing the message back to the first link
099  passMsgOn(Main);
100
101%%-------------------------------------------------------------------------------------------------
102%%
103%% Create the next link in the chain
104%%
105%%-------------------------------------------------------------------------------------------------
106newLink(N, PrevLink, Main) ->
107  ?TraceLinkCreation(self(),N),
108
109% Create a new link.
110% Tell it where its predecessor is (i.e. this process) and also which process
111% is at the start of the chain (this value is only used when the last link
112% in the chain is created)
113  NextLink = spawn(ring_benchmark, newLink, [N-1, self(), Main]),
114
115% Wait for connect message from the next link in the chain
116% This will identify the process to which the message should be sent
117  receive
118    {connect, NextLink} ->
119      ?TraceEstablishConnection(self(), NextLink),
120
121% Send a connect message to previous link
122      PrevLink ! {connect, self()},
123
124% Get ready to pass message on to the next link
125      passMsgOn(NextLink)
126  end.
127
128%%-------------------------------------------------------------------------------------------------
129%%
130%% Pass message to the next link in the chain, then restart so
131%% that the message can be sent around the ring multiple times
132%%
133%%-------------------------------------------------------------------------------------------------
134passMsgOn(NextLink) ->
135  ?TraceWaitToPassMsgOn(self(), NextLink),
136
137% Wait for message
138  receive
139    Msg ->
140% Pass message on
141      ?TracePassMsgOn(self(), Msg, NextLink),
142      NextLink ! Msg,
143      passMsgOn(NextLink)
144  end.
145
146%%-------------------------------------------------------------------------------------------------
147%%
148%% Control how many laps of the ring are performed
149%%
150%%-------------------------------------------------------------------------------------------------
151lapCounter(Pid) ->
152% Wait for whatever message is being passed around.
153% This message will only ever be sent by the start process
154% because it is the first link in the chain
155  receive
156    {1, _} ->
157      io:format("Message round complete~n", []);
158
159    {2, Msg} ->
160      ?TraceLastLap,
161      Pid ! {1, Msg},
162      lapCounter(Pid);
163
164    {N, Msg} ->
165      ?TraceLap(N),
166      Pid ! {N-1, Msg},
167      lapCounter(Pid)
168  end.

How It Works

Erlang (BEAM) emulator version 5.6.5 [smp:2] [async-threads:0]

consulting .erlang in "C:/Program Files/erl5.6.5/bin"
c:/Erlang/code
Now in:"c:/Erlang/code"
Eshell V5.6.5  (abort with ^G)
1> c(ring_benchmark, {d, trace}).
{ok,ring_benchmark}
        

After the Erlang shell starts, the .erlang config file instructs the shell to change into the directory where the source code lives.

To compile the program with tracing switched on, use the function c/2 and enter c(ring_Benchmark, {d, trace}). at the ERL (or WERL) command line. Otherwise, compile as normal using c/1.

The second parameter is a tuple that defines compiler options. In this case {d, trace} causes the value of a flag called trace to be set to true. If this compiler option is omitted, then although the trace flag has not been explicitly defined, it acquires the implicit value of false.

If the trace flag is on, the macros defined by the -define() statements will be translated to calls to io:format(), else they will just become the atom void.

2> ring_benchmark:start(3,3).
Creating a ring of 3 processes.

The ring benchmark program is run by calling the start/2 function with the parameters 3 and 3. This specifies that the chain should contain three links and that the message should be passed around the loop three times.

Function start/2 operates in the following way:

 
001-module(ring_benchmark).
002-export([start/2, newLink/3]).
...
...%% Trace statements snipped
044start(N, M) ->
045% Reset timers
046  statistics(runtime),
047  statistics(wall_clock),
048
049% Define the message to be sent around the ring
050  Blah = "BOO!",
051
052% Start ring creation
053  io:format("Creating a ring of ~p processes.~n",[N]),
054  ?TraceLinkCreation(self(),N),
055  spawn(ring_benchmark, newLink, [N-1, self(), self()]),
056
057% Wait for connect message from the next link in the chain
058% then start the message going
059  receive
060    {connect, NextLink} ->
061      ?TraceRingComplete(self(), NextLink),
062
063% How long did it take to create the processes?
064      {_, CPUTime_CreateProcesses} = statistics(runtime),
065      {_, ElapseTime_CreateProcesses} = statistics(wall_clock),
066
067      io:format("~nCPU Time (process creation) = ~p ms~nElapse time (process creation) = ~p ms~n",
068                [CPUTime_CreateProcesses, ElapseTime_CreateProcesses]),
069      io:format("~nStarting message round with message ~p~n",[Blah]),
070
071      ?TraceSendingMsg(Blah, self(), NextLink),
072
073      NextLink ! {M, Blah},
074      lapCounter(NextLink)
075  end,
076
077% Find out how long that lot took
078  {_, CPUTime_SendMessage} = statistics(runtime),
079  {_, ElapseTime_SendMessage} = statistics(wall_clock),
080
081  io:format("~nCPU Time (sending message) = ~p ms~nElapse Time (sending message) = ~p ms~n",
082            [CPUTime_SendMessage, ElapseTime_SendMessage]),
083  io:format("~nCPU Time (total) = ~p ms~nElapse Time (total) = ~p ms~n",
084            [CPUTime_CreateProcesses + CPUTime_SendMessage,
085            ElapseTime_CreateProcesses + ElapseTime_SendMessage]).
086



First, the start/2 function resets two system timers to measure CPU time and elapse time.



The message to be passed around the ring is then bound to the variable Blah.


Then a new process instance of the newLink/3 function is spawned. The parameters it passes are the number of processes yet to be created, a reference to the previous link in the chain and a reference to the start link in the chain. Since this is the first link to be created, the value of the previous link and the starting link are the same.

The start/2 process then sits and waits for a message to arrive. The only message it will respond to is one that matches the pattern {connect, NextLink}. In otherwords, the message must start with the atom connect followed by a single value. Since the variable NextLink is unbound at this point in time, it doesn't actually matter at what the second value is. Whatever value is received will therefore be bound to the variable NextLink.

When this message is received, the process creation phase of the program has completed. So the current CPU and elapse timer values are stored and the message round phase of the program begins.

The message round phase is started by sending the next link in the chain the message {M, Blah} - where the variable M is bound to the start parameter defining the number of laps, and the variable Blah is bound to the message value. Function passMsgOn/1 will respond to this message.

The lapCounter/1 function is then called with the process id of the next link in the chain. This function receives a message at the end of each lap, and is responsible for invoking the next lap. It continues to do this until the lap counter reaches 1, at which point all the laps have been completed and the program terminates.

The final step is to output the elapse and CPU time values for the message round phase of the program.

The following trace information shows the steps involved whilst the chain is being created.

2> ring_benchmark:start(3,3).
Creating a ring of 3 processes.
<0.29.0> is link 3
<0.36.0> is link 2
<0.37.0> is the last link in the chain.

<0.37.0> is waiting to send next message to <0.29.0>
<0.36.0> received connect message from <0.37.0>
<0.36.0> is waiting to send next message to <0.37.0>
<0.29.0> received connect message from <0.36.0>
All links in the chain now form a ring.

CPU Time (process creation) = 0 ms
Elapse time (process creation) = 0 ms

The newLink/3 function is where the interesting stuff happens as far as process creation is concerned.

087%%-------------------------------------------------------------------------------------------------
088%%
089%% Create the last link in the chain
090%%
091%%-------------------------------------------------------------------------------------------------
092newLink(1, PrevLink, Main) ->
093  ?TraceLastLink(self()),
094
095% Send connect message to the previous link
096  PrevLink ! {connect, self()},
097
098% Get ready to complete the lap by passing the message back to the first link
099  passMsgOn(Main);
100
101%%-------------------------------------------------------------------------------------------------
102%%
103%% Create the next link in the chain
104%%
105%%-------------------------------------------------------------------------------------------------
106newLink(N, PrevLink, Main) ->
107  ?TraceLinkCreation(self(),N),
108
109% Create a new link.
110% Tell it where its predecessor is (i.e. this process) and also which process
111% is at the start of the chain (this value is only used when the last link
112% in the chain is created)
113  NextLink = spawn(ring_benchmark, newLink, [N-1, self(), Main]),
114
115% Wait for connect message from the next link in the chain
116% This will identify the process to which the message should be sent
117  receive
118    {connect, NextLink} ->
119      ?TraceEstablishConnection(self(), NextLink),
120
121% Send a connect message to previous link
122      PrevLink ! {connect, self()},
123
124% Get ready to pass message on to the next link
125      passMsgOn(NextLink)
126  end.
127



When the newLink/3 function is first called, the value of parameter N will be greater than 1, therefore the version of this function seen below will be called.

When the newLink/3 function is finally called with parameter N set to 1, it means this is the last link in the chain. At this point we start a message propagating backwards through the links in the chain. When this message arrives back at the first link (see function start/2, this is the trigger that starts the message round pahse of the program.

The chain is wrapped around into a loop by the fact that function passMsgOn/1 connects back to the id of the start/2 process.



As long as parameter N has some value other than 1, this version of function newLink/3 will be invoked.



newLink/3 iteratively spawns a separate instance of itself using a decremented link counter value.


Then, it sits and waits for a message to arrive. As with function start/2, the only message newLink/3 will respond to is one that matches the pattern {connect, NextLink}. However, the value of NextLink has already been bound by the previous call to spawn/3. Therefore, the only message this process will respond to is one containing the atom connect from precisely the process that is next in the chain. When the connect message is received, it is propagated back to the previous link in the chain.

Finally, function passMsgOn/1 is called. This is the process that performs the actually message passing during the message round phase of the program.

The message round phase of the program is performed by function passMsgOn/1.

128%%-------------------------------------------------------------------------------------------------
129%%
130%% Pass message to the next link in the chain, then restart so
131%% that the message can be sent around the ring multiple times
132%%
133%%-------------------------------------------------------------------------------------------------
134passMsgOn(NextLink) ->
135  ?TraceWaitToPassMsgOn(self(), NextLink),
136
137% Wait for message
138  receive
139    Msg ->
140% Pass message on
141      ?TracePassMsgOn(self(), Msg, NextLink),
142      NextLink ! Msg,
143      passMsgOn(NextLink)
144  end.
145





Function passMsgOn/1 is a simple function that simply listens for a message... any message!

Rather that explicitly looking for an atom followed by a value (as newLink/3 does), it simply takes whatever it receives as a single tuple and passes it unparsed to the next link in the chain. Then it calls itself and waits for the next message.

Message propagation continues like this until we get to the instance of passMsgOn/1 that was created by newLink(1, PrevLink, Main). At this point, the instance of lapCounter/1 invoked by start/2 responds, and we know that we have reached the end of a lap.

The laps are counted off by function lapCounter/1.

146%%-------------------------------------------------------------------------------------------------
147%%
148%% Control how many laps of the ring are performed
149%%
150%%-------------------------------------------------------------------------------------------------
151lapCounter(Pid) ->
152% Wait for whatever message is being passed around.
153% This message will only ever be sent by the start process
154% because it is the first link in the chain
155  receive
156    {1, _} ->
157      io:format("Message round complete~n", []);
158
159    {2, Msg} ->
160      ?TraceLastLap,
161      Pid ! {1, Msg},
162      lapCounter(Pid);
163
164    {N, Msg} ->
165      ?TraceLap(N),
166      Pid ! {N-1, Msg},
167      lapCounter(Pid)
168  end.







All other links in the chain pass messages to each other using function passMsgOn/1. But, in order to count the number of laps we've done, we need a different function to do that. Function lapCounter/1 is invoked by function start/2 and only responds to the message sent from the last link in the chain.

Here, the lap counter is decremented, another lap of the circuit is invoked and lapCounter/1 calls itself in order to respond to the message that will be received once the current lap finishes.

This sequence of processing keeps happening until the lap counter gets down to 1. At this point we know we've just finished the last lap, so processing can terminate and control passes back to function start/2.

Once the message round phase starts, the following trace output is generated:

Starting message round with message "BOO!"
Sending message "BOO!" from <0.29.0> to <0.36.0>
<0.36.0> received message {3,"BOO!"}, passing it on to <0.37.0>
<0.36.0> is waiting to send next message to <0.37.0>
<0.37.0> received message {3,"BOO!"}, passing it on to <0.29.0>
<0.37.0> is waiting to send next message to <0.29.0>
Passing Go, 2 more laps to go
<0.36.0> received message {2,"BOO!"}, passing it on to <0.37.0>
<0.36.0> is waiting to send next message to <0.37.0>
<0.37.0> received message {2,"BOO!"}, passing it on to <0.29.0>
<0.37.0> is waiting to send next message to <0.29.0>
Passing Go, 1 more lap to go
<0.36.0> received message {1,"BOO!"}, passing it on to <0.37.0>
<0.36.0> is waiting to send next message to <0.37.0>
<0.37.0> received message {1,"BOO!"}, passing it on to <0.29.0>
<0.37.0> is waiting to send next message to <0.29.0>
Message round complete

CPU Time (sending message) = 0 ms
Elapse Time (sending message) = 0 ms

CPU Time (total) = 0 ms
Elapse Time (total) = 0 ms
ok

Benchmark test results

I ran the following tests on an IBM Thinkpad T60 with a dual core T7200 processor running @ 2GHz and 2Gb of RAM. The test was configured such that only one of the two processor cores was used.

The Erlang virtual machine was start with the switch +P 500000 to cause the system process limit to be raised from the default value of 32768 to 500,000. This seems to be maximum possible value on a Windoze implementation, but the Linux implementation allows for 2^27 - 1 = 134,217,727 processes.

Since the Erlang runtime itself uses several processes, the final test had to be run with slightly fewer processes than the system limit would indicate. This avoids a system crash!

Processes Creation
Time (ms)
1,000 0
5,000 15
10,000 47
50,000 187
100,000 390
200,000 594
300,000 1,031
400,000 1,344
499,900 1,703
 
Message Laps
100 500 1,000 5,000 10,000
Processes 1,000 172 610 1,016 5,031 9,844
5,000 641 3,047 5,985 30,032 59,344
10,000 1,313 6,531 13,078 65,469 130,984
50,000 6,531 32,657 65,922 330,610 670,125
100000 12,937 65,781 131,844 665,782 1,364,092
200,000 26,281 132,093 267,219 1,413,618 2,717,368
300,000 39,125 198,484 402,125 2,057,991 4,101,583
400,000 52,343 267,422 536,187 2,720,232 5,451,293
499,900 65,531 334,562 666,953 3,438,453 6,976,977

I was really surprised to see the rate at which Erlang can spawn new processes! The average rate from the above tests show that Erlang can consistently create just under 300 processes per millisecond.

Once the process ring had been created, the second part of the test was to send a small message for a certain number of laps around the ring. This is where the CPU time really starts to be consumed.

As can be seen from the table above, the runtime stays very linear as both the number of message laps and the size of the ring increases

The final test shows that the message is passed for 10,000 laps around a 499,900 process ring in just under 7000 seconds. Switching to "bucket maths" mode, this is roughly the same as passing a single message along a chain 5 billion links in length in just under 2 hours. In other words, the message was being passed at a rate of 716 links per millisecond.

Pretty impressive really!

3D graph of performance test results for Erlang Ring Benchmark

The above graph shows that as the number of processes in the ring and the number of laps rises, the runtime remains very nearly linear.

The X axis shows the number of processes in the ring, the Z axis shows hows many times the message was passed around the ring, and the Y axis shows the overall runtime in milliseconds.


Running the program with tracing switched off and a more realistic number of processes in the ring, gives the following results.

Here, the program is told to pass the message 100 times around a ring containing 10,000 processes.

1> ring_benchmark:start(10000,100).
Creating a ring of 10000 processes.

CPU Time (process creation) = 31 ms
Elapse time (process creation) = 63 ms

Starting message round with message "BOO!"
Message round complete

CPU Time (sending message) = 1359 ms
Elapse Time (sending message) = 1360 ms

CPU Time (total) = 1390 ms
Elapse Time (total) = 1423 ms
ok
2>

If you're learning Erlang (and you've read this far down the page!), then I hope I have helped increase your understanding of how Erlang programs work.



© Chris Whealy, 2009