Exploring the Erlang NIF with Murmur3: Part 1

Yesterday I read about calling C libraries from Haskell, especially how the
Haskell foreign function interface works. Having never had to do this in Erlang,
I decided it would be a nice exercise, so here we go, wrapping Peter Scott's
Murmur3. The end result is available on GitHub for anybody interested. Also, I
am not an Erlang expert, so take everything with a grain of salt.

Looking for FFI in Erlang

There are multiple ways to call out from Erlang. You can use Ports and
communicate with an external process by passing messages, which is nice and
safe, or you can implement wrappers in C and use them as NIFs. NIF stands for
Native Implemented Function: the library is dynamically loaded into the Erlang
VM, which can be dangerous since an error in the library can crash the whole VM,
but it is also the fastest way to call out to C land. Since I was interested in
the closest thing to an FFI, I decided to create a NIF wrapper.

Exploring Murmur3

The C library for the Murmur3 hash function consists of three functions, two
optimized for x86 and one for x64. Since they all work on any architecture, it
made sense to wrap all of them. All three share the same signature.

void MurmurHash3_x86_32 (const void *key, int len, uint32_t seed, void *out);
void MurmurHash3_x86_128 (const void *key, int len, uint32_t seed, void *out);
void MurmurHash3_x64_128 (const void *key, int len, uint32_t seed, void *out);

The first argument, key, points to the data to be hashed, and len is its length
in bytes. The hash is seeded, so passing a seed value is mandatory. When the
function returns, the result has been written to out, which must have room for
either 32 bits (4 bytes) or 128 bits (16 bytes), depending on the variant.
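
To make the calling convention concrete, here is a minimal sketch of hashing a string with the 32-bit variant. So that it compiles on its own, the block carries a compact transcription of the public-domain MurmurHash3_x86_32 reference algorithm; in the actual project you would include murmur3.h and build c_src/murmur3.c instead. The helper hash32_of_string is a name I made up for illustration and is not part of the library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit value left by r bits (0 < r < 32). */
static uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

/* Compact transcription of the reference 32-bit algorithm, included only
 * to keep this sketch standalone. The project uses c_src/murmur3.c. */
void MurmurHash3_x86_32(const void *key, int len, uint32_t seed, void *out) {
    const uint8_t *data = (const uint8_t *)key;
    const int nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
    uint32_t h = seed;

    /* Body: mix in the input four bytes at a time (host byte order). */
    for (int i = 0; i < nblocks; i++) {
        uint32_t k;
        memcpy(&k, data + 4 * i, sizeof k);
        k *= c1; k = rotl32(k, 15); k *= c2;
        h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64u;
    }

    /* Tail: mix in the remaining 0 to 3 bytes. */
    const uint8_t *tail = data + 4 * nblocks;
    uint32_t k = 0;
    switch (len & 3) {
    case 3: k ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k ^= tail[0];
            k *= c1; k = rotl32(k, 15); k *= c2;
            h ^= k;
    }

    /* Finalization: fold in the length, then avalanche the bits. */
    h ^= (uint32_t)len;
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;

    memcpy(out, &h, sizeof h);  /* out must have room for 4 bytes */
}

/* Hypothetical helper (not part of the library): hash a NUL-terminated
 * string with the given seed and return the 32-bit result. */
uint32_t hash32_of_string(const char *s, uint32_t seed) {
    uint32_t out = 0;  /* 4-byte output buffer, zeroed before use */
    assert(s != NULL);
    MurmurHash3_x86_32(s, (int)strlen(s), seed, &out);
    return out;
}
```

A call like hash32_of_string("hello", 42) hands the library the key, its length, the seed, and a 4-byte buffer, which is exactly what the NIF wrapper will have to do later. The two 128-bit variants are called the same way but need a 16-byte out buffer.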

Setting up the project

Using rebar has proved to be the easiest way to set up a project.

$ mkdir murmerl
$ cd murmerl
$ wget http://cloud.github.com/downloads/basho/rebar/rebar && chmod u+x rebar
$ mkdir src
$ touch src/murmerl.app.src
$ touch src/murmerl.erl

murmerl.app.src is just a very basic app file containing only the following.

http://gist-it.appspot.com/github/sideshowcoder/murmerl/blob/6eee6fb4a1f26890a86dbdbbfc6fef72c34d5a4a/src/murmerl.app.src?footer=0

Now all we need is a place to store the C code, and to put the sources in.

$ mkdir c_src
$ cp <Download>/murmur3.c c_src/
$ cp <Download>/murmur3.h c_src/
$ touch c_src/murmur3_nif.c

After telling rebar about the C sources the project can be compiled and run.

http://gist-it.appspot.com/github/sideshowcoder/murmerl/blob/6eee6fb4a1f26890a86dbdbbfc6fef72c34d5a4a/rebar.conf?footer=0

$ rebar compile

This creates priv/murmerl_drv.so as well as the beam and app files in ebin.

NIF in the Erlang world

NIFs behave like any other function from a caller's perspective, so we need to
define a module for them which is in charge of loading the library and of
providing a fallback if loading fails. The fallbacks are ordinary Erlang
functions with the same name and arity as the NIFs, which are replaced by the
native versions once the library has loaded.

http://gist-it.appspot.com/github/sideshowcoder/murmerl/blob/6eee6fb4a1f26890a86dbdbbfc6fef72c34d5a4a/src/murmerl.erl?footer=0&slice=32:35

To make this work we need to load the library via the -on_load attribute of the
Erlang module, and make sure all the functions are exported as well.

http://gist-it.appspot.com/github/sideshowcoder/murmerl/blob/6eee6fb4a1f26890a86dbdbbfc6fef72c34d5a4a/src/murmerl.erl?footer=0&slice=0:30

That's it for the Erlang side of things, now for C.

NIF in the C world

Everything needed to bind things together is declared in erl_nif.h, which needs
to be included. Each of our functions looks really similar: it returns an Erlang
term (ERL_NIF_TERM) and takes the current environment, the argument count, and
the argument array. Pulling the arguments apart has to be done inside the
function, so this is the first thing to do.

http://gist-it.appspot.com/github/sideshowcoder/murmerl/blob/6eee6fb4a1f26890a86dbdbbfc6fef72c34d5a4a/c_src/murmur3_nif.c?footer=0&slice=20:42

Erlang provides a lot of functions for this. Here we expect something
string-like as the first argument, so we pull the first argument into the in
struct and extract the second as an int. The actual return value will be a
binary, so we need to allocate one of the right size: 4 bytes for the 32-bit
hash. Since this is just a piece of memory, it is best to zero it out before
using it. Now all there is left to do is let MurmurHash3_x86_32 write to the
allocated chunk and return it as an Erlang binary for everybody to use.

To make the functions available to Erlang we need to tell the VM about them,
which is done here:

http://gist-it.appspot.com/github/sideshowcoder/murmerl/blob/6eee6fb4a1f26890a86dbdbbfc6fef72c34d5a4a/c_src/murmur3_nif.c?footer=0&slice=88:1000

This exports the nif_funcs array, which contains a structure of the form

{ name_in_erlang, arity, c_function }

for each of the exported functions. And that's it: the functions can now be
called from Erlang like any normal Erlang function in the module. For the full
code see the GitHub repository.

Where to go from here?

Currently all the module does is forward the calls as-is to the C library, which
does not feel very Erlang-like, so building a nicer API is up next.

Resources

More examples can be found in davisp's GitHub repository. The Erlang NIF
documentation page can also be of great help, but you need to know what to look
for.
