Having spent the last 25 years writing programs in imperative languages such as Assembler, PL/1, Visual Basic, ABAP and Java, I decided it was high time to give my brain a new challenge.
Several factors were involved here during the search for a new language, these being my interests in:
Now that the concepts of the Semantic Web are becoming more and more of a reality, I thought it appropriate to look at a language that has the built-in capabilities needed to implement these concepts.
After hunting around for a while, I discovered that back in the mid-eighties, Ericson developed a cool language called Erlang to run telephone exchanges. Ericson's design requirements were to create a language that would:
Because computers are now being shipped with multi-core processors, it has become practical to re-examine programming languages designed for concurrency - hence my decision to learn Erlang.
When I started looking at Erlang, the syntax was easy enough to learn, but I rapidly discovered that functional languages are based on a fundamentally different way of thinking, and use design principles that are very different from those used by imperative languages. Therefore, I first had to learn how to think in Erlang before I could write even the simplest programs.
Since Erlang is based on the concepts of iteration and the Lambda Calculus, I found I could call on very little of my previous programming experience to act as a guide.
In his book "Programming Erlang", Joe Armstrong (the author of the language) casually throws out the following programming challenge...
Write a ring benchmark. Create N processes in a ring.
Send a message round the ring M times so that a total of N * M messages get sent.
Time how long this takes for different values of N and M.
I put this explanation of the solution together as a means of explaining to myself what is going in an Erlang program - and since it helped me, I venture to say that it will help you also! :-)
Since I am still quite new to thinking in Erlang, I figured that the easiest way to arrive at an answer was to reverse engineer the answers other have created. Therefore, I hunted around the internet for existing solutions, found one that I thought looked suitable and took it to pieces. In the process of doing so, I discovered I could implement my own solution in a slightly more efficient manner (not by much though...).
So using Rogier Peters' solution as a starting point, I took it to bits, renamed the functions in a way that made more sense to me, and tweaked a couple of things here and there to help me understand what was going on. You'll notice that I have defined many macros to trace how each step of the solution is being executed. These statements can be switched on and off using a compiler option.
| 001 | -module(ring_benchmark). |
| 002 | -export([start/2, newLink/3]). |
| 003 | %%------------------------------------------------------------------------------------------------- |
| 004 | %% |
| 005 | %% Define various macros |
| 006 | %% |
| 007 | %%------------------------------------------------------------------------------------------------- |
| 008 | -ifdef(trace). |
| 009 | -define(TraceLinkCreation(Pid,N), |
| 010 | io:format("~w is link ~w~n",[Pid, N])). |
| 011 | -define(TraceLastLink(Pid), |
| 012 | io:format("~w is the last link in the chain.~n~n", [Pid])). |
| 013 | -define(TraceLap(N), |
| 014 | io:format("Passing Go, ~p more laps to go~n", [N-1])). |
| 015 | -define(TraceLastLap, |
| 016 | io:format("Passing Go, 1 more lap to go~n", [])). |
| 017 | -define(TraceWaitToPassMsgOn(Pid, NextLink), |
| 018 | io:format("~w is waiting to send next message to ~w~n",[Pid, NextLink])). |
| 019 | -define(TracePassMsgOn(Pid, Msg, NextLink), |
| 020 | io:format("~w received message ~p, passing it on to ~w~n", [Pid, Msg, NextLink])). |
| 021 | -define(TraceRingComplete(Pid, NextLink), |
| 022 | io:format("~w received connect message from ~w~nAll links in the chain now form a ring.~n",[Pid, NextLink])). |
| 023 | -define(TraceSendingMsg(Msg, Pid, NextLink), |
| 024 | io:format("Sending message ~p from ~w to ~p~n",[Msg, Pid, NextLink])). |
| 025 | -define(TraceEstablishConnection(Pid, NextLink), |
| 026 | io:format("~w received connect message from ~w~n",[Pid, NextLink])). |
| 027 | -else. |
| 028 | -define(TraceLinkCreation(Pid,N), void). |
| 029 | -define(TraceLastLink(Pid), void). |
| 030 | -define(TraceLap(N), void). |
| 031 | -define(TraceLastLap, void). |
| 032 | -define(TraceWaitToPassMsgOn(Pid, NextLink), void). |
| 033 | -define(TracePassMsgOn(Pid, Msg, NextLink), void). |
| 034 | -define(TraceRingComplete(Pid, NextLink), void). |
| 035 | -define(TraceSendingMsg(Msg, Pid, NextLink), void). |
| 036 | -define(TraceEstablishConnection(Pid, NextLink), void). |
| 037 | -endif. |
| 038 | |
| 039 | %%------------------------------------------------------------------------------------------------- |
| 040 | %% |
| 041 | %% Entry point |
| 042 | %% |
| 043 | %%------------------------------------------------------------------------------------------------- |
| 044 | start(N, M) -> |
| 045 | % Reset timers |
| 046 | statistics(runtime), |
| 047 | statistics(wall_clock), |
| 048 | |
| 049 | % Define the message to be sent around the ring |
| 050 | Blah = "BOO!", |
| 051 | |
| 052 | % Start ring creation |
| 053 | io:format("Creating a ring of ~p processes.~n",[N]), |
| 054 | ?TraceLinkCreation(self(),N), |
| 055 | spawn(ring_benchmark, newLink, [N-1, self(), self()]), |
| 056 | |
| 057 | % Wait for connect message from the next link in the chain |
| 058 | % then start the message going |
| 059 | receive |
| 060 | {connect, NextLink} -> |
| 061 | ?TraceRingComplete(self(), NextLink), |
| 062 | |
| 063 | % How long did it take to create the processes? |
| 064 | {_, CPUTime_CreateProcesses} = statistics(runtime), |
| 065 | {_, ElapseTime_CreateProcesses} = statistics(wall_clock), |
| 066 | |
| 067 | io:format("~nCPU Time (process creation) = ~p ms~nElapse time (process creation) = ~p ms~n", |
| 068 | [CPUTime_CreateProcesses, ElapseTime_CreateProcesses]), |
| 069 | io:format("~nStarting message round with message ~p~n",[Blah]), |
| 070 | |
| 071 | ?TraceSendingMsg(Blah, self(), NextLink), |
| 072 | |
| 073 | NextLink ! {M, Blah}, |
| 074 | lapCounter(NextLink) |
| 075 | end, |
| 076 | |
| 077 | % Find out how long that lot took |
| 078 | {_, CPUTime_SendMessage} = statistics(runtime), |
| 079 | {_, ElapseTime_SendMessage} = statistics(wall_clock), |
| 080 | |
| 081 | io:format("~nCPU Time (sending message) = ~p ms~nElapse Time (sending message) = ~p ms~n", |
| 082 | [CPUTime_SendMessage, ElapseTime_SendMessage]), |
| 083 | io:format("~nCPU Time (total) = ~p ms~nElapse Time (total) = ~p ms~n", |
| 084 | [CPUTime_CreateProcesses + CPUTime_SendMessage, |
| 085 | ElapseTime_CreateProcesses + ElapseTime_SendMessage]). |
| 086 | |
| 087 | %%------------------------------------------------------------------------------------------------- |
| 088 | %% |
| 089 | %% Create the last link in the chain |
| 090 | %% |
| 091 | %%------------------------------------------------------------------------------------------------- |
| 092 | newLink(1, PrevLink, Main) -> |
| 093 | ?TraceLastLink(self()), |
| 094 | |
| 095 | % Send connect message to the previous link |
| 096 | PrevLink ! {connect, self()}, |
| 097 | |
| 098 | % Get ready to complete the lap by passing the message back to the first link |
| 099 | passMsgOn(Main); |
| 100 | |
| 101 | %%------------------------------------------------------------------------------------------------- |
| 102 | %% |
| 103 | %% Create the next link in the chain |
| 104 | %% |
| 105 | %%------------------------------------------------------------------------------------------------- |
| 106 | newLink(N, PrevLink, Main) -> |
| 107 | ?TraceLinkCreation(self(),N), |
| 108 | |
| 109 | % Create a new link. |
| 110 | % Tell it where its predecessor is (i.e. this process) and also which process |
| 111 | % is at the start of the chain (this value is only used when the last link |
| 112 | % in the chain is created) |
| 113 | NextLink = spawn(ring_benchmark, newLink, [N-1, self(), Main]), |
| 114 | |
| 115 | % Wait for connect message from the next link in the chain |
| 116 | % This will identify the process to which the message should be sent |
| 117 | receive |
| 118 | {connect, NextLink} -> |
| 119 | ?TraceEstablishConnection(self(), NextLink), |
| 120 | |
| 121 | % Send a connect message to previous link |
| 122 | PrevLink ! {connect, self()}, |
| 123 | |
| 124 | % Get ready to pass message on to the next link |
| 125 | passMsgOn(NextLink) |
| 126 | end. |
| 127 | |
| 128 | %%------------------------------------------------------------------------------------------------- |
| 129 | %% |
| 130 | %% Pass message to the next link in the chain, then restart so |
| 131 | %% that the message can be sent around the ring multiple times |
| 132 | %% |
| 133 | %%------------------------------------------------------------------------------------------------- |
| 134 | passMsgOn(NextLink) -> |
| 135 | ?TraceWaitToPassMsgOn(self(), NextLink), |
| 136 | |
| 137 | % Wait for message |
| 138 | receive |
| 139 | Msg -> |
| 140 | % Pass message on |
| 141 | ?TracePassMsgOn(self(), Msg, NextLink), |
| 142 | NextLink ! Msg, |
| 143 | passMsgOn(NextLink) |
| 144 | end. |
| 145 | |
| 146 | %%------------------------------------------------------------------------------------------------- |
| 147 | %% |
| 148 | %% Control how many laps of the ring are performed |
| 149 | %% |
| 150 | %%------------------------------------------------------------------------------------------------- |
| 151 | lapCounter(Pid) -> |
| 152 | % Wait for whatever message is being passed around. |
| 153 | % This message will only ever be sent by the start process |
| 154 | % because it is the first link in the chain |
| 155 | receive |
| 156 | {1, _} -> |
| 157 | io:format("Message round complete~n", []); |
| 158 | |
| 159 | {2, Msg} -> |
| 160 | ?TraceLastLap, |
| 161 | Pid ! {1, Msg}, |
| 162 | lapCounter(Pid); |
| 163 | |
| 164 | {N, Msg} -> |
| 165 | ?TraceLap(N), |
| 166 | Pid ! {N-1, Msg}, |
| 167 | lapCounter(Pid) |
| 168 | end. |
Erlang (BEAM) emulator version 5.6.5 [smp:2] [async-threads:0]
consulting .erlang in "C:/Program Files/erl5.6.5/bin"
c:/Erlang/code
Now in:"c:/Erlang/code"
Eshell V5.6.5 (abort with ^G)
1> c(ring_benchmark, {d, trace}).
{ok,ring_benchmark}
|
After the Erlang shell starts, the .erlang config file instructs the shell to change into the directory where the source code lives.
To compile the program with tracing switched on, use the function c/2 and enter c(ring_Benchmark, {d, trace}). at the ERL (or WERL) command line. Otherwise, compile as normal using c/1.
The second parameter is a tuple that defines compiler options. In this case {d, trace} causes the value of a flag called trace to be set to true. If this compiler option is omitted, then although the trace flag has not been explicitly defined, it acquires the implicit value of false.
If the trace flag is on, the macros defined by the -define() statements will be translated to calls to io:format(), else they will just become the atom void.
2> ring_benchmark:start(3,3). Creating a ring of 3 processes. |
The ring benchmark program is run by calling the start/2 function with the parameters 3 and 3. This specifies that the chain should contain three links and that the message should be passed around the loop three times.
Function start/2 operates in the following way:
|
First, the start/2 function resets two system timers to measure CPU time and elapse time. The message to be passed around the ring is then bound to the variable Blah. Then a new process instance of the newLink/3 function is spawned. The parameters it passes are the number of processes yet to be created, a reference to the previous link in the chain and a reference to the start link in the chain. Since this is the first link to be created, the value of the previous link and the starting link are the same. The start/2 process then sits and waits for a message to arrive. The only message it will respond to is one that matches the pattern {connect, NextLink}. In otherwords, the message must start with the atom connect followed by a single value. Since the variable NextLink is unbound at this point in time, it doesn't actually matter at what the second value is. Whatever value is received will therefore be bound to the variable NextLink. When this message is received, the process creation phase of the program has completed. So the current CPU and elapse timer values are stored and the message round phase of the program begins. The message round phase is started by sending the next link in the chain the message {M, Blah} - where the variable M is bound to the start parameter defining the number of laps, and the variable Blah is bound to the message value. Function passMsgOn/1 will respond to this message. The lapCounter/1 function is then called with the process id of the next link in the chain. This function receives a message at the end of each lap, and is responsible for invoking the next lap. It continues to do this until the lap counter reaches 1, at which point all the laps have been completed and the program terminates. The final step is to output the elapse and CPU time values for the message round phase of the program. |
The following trace information shows the steps involved whilst the chain is being created.
2> ring_benchmark:start(3,3). Creating a ring of 3 processes. <0.29.0> is link 3 <0.36.0> is link 2 <0.37.0> is the last link in the chain. <0.37.0> is waiting to send next message to <0.29.0> <0.36.0> received connect message from <0.37.0> <0.36.0> is waiting to send next message to <0.37.0> <0.29.0> received connect message from <0.36.0> All links in the chain now form a ring. CPU Time (process creation) = 0 ms Elapse time (process creation) = 0 ms |
The newLink/3 function is where the interesting stuff happens as far as process creation is concerned.
|
When the newLink/3 function is first called, the value of parameter N will be greater than 1, therefore the version of this function seen below will be called. When the newLink/3 function is finally called with parameter N set to 1, it means this is the last link in the chain. At this point we start a message propagating backwards through the links in the chain. When this message arrives back at the first link (see function start/2, this is the trigger that starts the message round pahse of the program. The chain is wrapped around into a loop by the fact that function passMsgOn/1 connects back to the id of the start/2 process. As long as parameter N has some value other than 1, this version of function newLink/3 will be invoked. newLink/3 iteratively spawns a separate instance of itself using a decremented link counter value. Then, it sits and waits for a message to arrive. As with function start/2, the only message newLink/3 will respond to is one that matches the pattern {connect, NextLink}. However, the value of NextLink has already been bound by the previous call to spawn/3. Therefore, the only message this process will respond to is one containing the atom connect from precisely the process that is next in the chain. When the connect message is received, it is propagated back to the previous link in the chain. Finally, function passMsgOn/1 is called. This is the process that performs the actually message passing during the message round phase of the program. |
The message round phase of the program is performed by function passMsgOn/1.
|
Function passMsgOn/1 is a simple function that simply listens for a message... any message! Rather that explicitly looking for an atom followed by a value (as newLink/3 does), it simply takes whatever it receives as a single tuple and passes it unparsed to the next link in the chain. Then it calls itself and waits for the next message. Message propagation continues like this until we get to the instance of passMsgOn/1 that was created by newLink(1, PrevLink, Main). At this point, the instance of lapCounter/1 invoked by start/2 responds, and we know that we have reached the end of a lap. |
The laps are counted off by function lapCounter/1.
|
All other links in the chain pass messages to each other using function passMsgOn/1. But, in order to count the number of laps we've done, we need a different function to do that. Function lapCounter/1 is invoked by function start/2 and only responds to the message sent from the last link in the chain. Here, the lap counter is decremented, another lap of the circuit is invoked and lapCounter/1 calls itself in order to respond to the message that will be received once the current lap finishes. This sequence of processing keeps happening until the lap counter gets down to 1. At this point we know we've just finished the last lap, so processing can terminate and control passes back to function start/2. |
Once the message round phase starts, the following trace output is generated:
Starting message round with message "BOO!"
Sending message "BOO!" from <0.29.0> to <0.36.0>
<0.36.0> received message {3,"BOO!"}, passing it on to <0.37.0>
<0.36.0> is waiting to send next message to <0.37.0>
<0.37.0> received message {3,"BOO!"}, passing it on to <0.29.0>
<0.37.0> is waiting to send next message to <0.29.0>
Passing Go, 2 more laps to go
<0.36.0> received message {2,"BOO!"}, passing it on to <0.37.0>
<0.36.0> is waiting to send next message to <0.37.0>
<0.37.0> received message {2,"BOO!"}, passing it on to <0.29.0>
<0.37.0> is waiting to send next message to <0.29.0>
Passing Go, 1 more lap to go
<0.36.0> received message {1,"BOO!"}, passing it on to <0.37.0>
<0.36.0> is waiting to send next message to <0.37.0>
<0.37.0> received message {1,"BOO!"}, passing it on to <0.29.0>
<0.37.0> is waiting to send next message to <0.29.0>
Message round complete
CPU Time (sending message) = 0 ms
Elapse Time (sending message) = 0 ms
CPU Time (total) = 0 ms
Elapse Time (total) = 0 ms
ok |
I ran the following tests on an IBM Thinkpad T60 with a dual core T7200 processor running @ 2GHz and 2Gb of RAM. The test was configured such that only one of the two processor cores was used.
The Erlang virtual machine was start with the switch +P 500000 to cause the system process limit to be raised from the default value of 32768 to 500,000. This seems to be maximum possible value on a Windoze implementation, but the Linux implementation allows for 2^27 - 1 = 134,217,727 processes.
Since the Erlang runtime itself uses several processes, the final test had to be run with slightly fewer processes than the system limit would indicate. This avoids a system crash!
|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I was really surprised to see the rate at which Erlang can spawn new processes! The average rate from the above tests show that Erlang can consistently create just under 300 processes per millisecond. Once the process ring had been created, the second part of the test was to send a small message for a certain number of laps around the ring. This is where the CPU time really starts to be consumed. |
As can be seen from the table above, the runtime stays very linear as both the number of message laps and the size of the ring increases The final test shows that the message is passed for 10,000 laps around a 499,900 process ring in just under 7000 seconds. Switching to "bucket maths" mode, this is roughly the same as passing a single message along a chain 5 billion links in length in just under 2 hours. In other words, the message was being passed at a rate of 716 links per millisecond. Pretty impressive really! | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The above graph shows that as the number of processes in the ring and the number of laps rises, the runtime remains very nearly linear. The X axis shows the number of processes in the ring, the Z axis shows hows many times the message was passed around the ring, and the Y axis shows the overall runtime in milliseconds. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Running the program with tracing switched off and a more realistic number of processes in the ring, gives the following results.
Here, the program is told to pass the message 100 times around a ring containing 10,000 processes.
1> ring_benchmark:start(10000,100). Creating a ring of 10000 processes. CPU Time (process creation) = 31 ms Elapse time (process creation) = 63 ms Starting message round with message "BOO!" Message round complete CPU Time (sending message) = 1359 ms Elapse Time (sending message) = 1360 ms CPU Time (total) = 1390 ms Elapse Time (total) = 1423 ms ok 2> |
If you're learning Erlang (and you've read this far down the page!), then I hope I have helped increase your understanding of how Erlang programs work.
© Chris Whealy, 2009