SpinLock v.s. Mutex v.s. Atomics

Spinlock v.s. Mutex

Both manage a limited resource. I’ll first describe difference between binary semaphore (mutex) and spin lock.

Spin locks perform a busy wait – i.e. it keeps running loop:

while (try_acquire_resource ());
...
release();

It performs very lightweight locking/unlocking but if the locking thread will be preempted by other which will try to access the same resouce the second one will simply try to acquitre resource untill it run out of it CPU quanta.

On the other hand mutex behave more like:

if (!try_lock()) {
    add_to_waiting_queue ();
    wait();
}
...
process *p = get_next_process_from_waiting_queue ();
p->wakeUp ();   

Hence if the thread will try to acquire blocked resource it will be suspended till it will be avaible for it. Locking/unlocking is much more heavy but the waiting is ‘free’ and ‘fair’.

Semaphore is a lock that is allowed to be used multiple (known from initialization) number of times – for example 3 threads are allowed to simultainusly hold the resource but no more. It is used for example in producer/consumer problem or in general in queues:

P(resources_sem)
resource = resources.pop()
...
resources.push(resources)
V(resources_sem)

 

Comparing the performance of atomic, spinlock and mutex

I was curious in benchmark of different synchronization mechanisms: atomic, spinlock, mutex.

Without synchronization

#include <future>
#include <iostream>

volatile int value = 0;

int loop(bool inc, int limit) {
  std::cout << "Started " << inc << " " << limit << std::endl;
  for (int i = 0; i < limit; ++i) {
    if (inc) { 
      ++value;
    } else {
      --value;
    }
  }
  return 0;
}

int main() {
  auto f = std::async(std::launch::async, std::bind(loop, true, 20000000));
  loop(false, 10000000);
  f.wait();
  std::cout << value << std::endl;
}

Compiling via clang:

clang++ -std=c++11 -stdlib=libc++ -O3 -o test test.cpp && time ./test

Run:

SSttaarrtteedd  10  2100000000000000

11177087

real    0m0.070s
user    0m0.089s
sys 0m0.002s

Obviously, the increment and decrement operations aren’t atomic, and the value variable contains garbage at the end.

LOCK

#include <future>
#include <iostream>

volatile int value = 0;

int loop(bool inc, int limit) {
  std::cout << "Started " << inc << " " << limit << std::endl;
  for (int i = 0; i < limit; ++i) {
    if (inc) { 
      asm("LOCK");
      ++value;
    } else {
      asm("LOCK");
      --value;
    }
  }
  return 0;
}

int main() {
  auto f = std::async(std::launch::async, std::bind(loop, true, 20000000));
  loop(false, 10000000);
  f.wait();
  std::cout << value << std::endl;
} 

Run:

SSttaarrtteedd  10  2000000100000000

10000000

real    0m0.481s
user    0m0.779s
sys 0m0.005s

Now value has the correct value at the end, but, of course, this is a dirty non-portable hack for x86 only. For example, this code works for me only if compiled with -O3. Otherwise it crashes with “illegal instruction” because the compiler injects extra stuff between the LOCK instruction and the following increment or decrement.

Atomic

#include <future>
#include <iostream>
#include "boost/interprocess/detail/atomic.hpp"

using namespace boost::interprocess::ipcdetail;

volatile boost::uint32_t value = 0;

int loop(bool inc, int limit) {
  std::cout << "Started " << inc << " " << limit << std::endl;
  for (int i = 0; i < limit; ++i) {
    if (inc) { 
      atomic_inc32(&value);
    } else {
      atomic_dec32(&value);
    }
  }
  return 0;
}

int main() {
  auto f = std::async(std::launch::async, std::bind(loop, true, 20000000));
  loop(false, 10000000);
  f.wait();
  std::cout << atomic_read32(&value) << std::endl;
}

Run:

SSttaarrtteedd  10  2100000000000000

10000000

real    0m0.457s
user    0m0.734s
sys 0m0.004s

The result is correct, and the time is almost the same as with LOCK. Not surprisingly, atomic under cover also uses LOCK but in a portable and guaranteed way.

Spinlock

#include <future>
#include <iostream>
#include "boost/smart_ptr/detail/spinlock.hpp"

boost::detail::spinlock lock;
volatile int value = 0;

int loop(bool inc, int limit) {
  std::cout << "Started " << inc << " " << limit << std::endl;
  for (int i = 0; i < limit; ++i) {
    std::lock_guard<boost::detail::spinlock> guard(lock);
    if (inc) { 
      ++value;
    } else {
      --value;
    }
  }
  return 0;
}

int main() {
  auto f = std::async(std::launch::async, std::bind(loop, true, 20000000));
  loop(false, 10000000);
  f.wait();
  std::cout << value << std::endl;
}

Run:

SSttaarrtteedd  10  2100000000000000

10000000

real    0m0.541s
user    0m0.675s
sys 0m0.089s

A bit slower but not much.

Mutex

#include <future>
#include <iostream>

std::mutex mutex;
volatile int value = 0;

int loop(bool inc, int limit) {
  std::cout << "Started " << inc << " " << limit << std::endl;
  for (int i = 0; i < limit; ++i) {
    std::lock_guard<std::mutex> guard(mutex);
    if (inc) { 
      ++value;
    } else {
      --value;
    }
  }
  return 0;
}

int main() {
  auto f = std::async(std::launch::async, std::bind(loop, true, 20000000));
  loop(false, 10000000);
  f.wait();
  std::cout << value << std::endl;
}

Run:

SSttaarrtteedd  10  2010000000000000

10000000

real    0m25.229s
user    0m7.011s
sys 0m22.667s

Now it works much slower, really.

Benchmark

Method Time (sec.)
No synchronization 0.070
LOCK 0.481
Atomic 0.457
Spinlock 0.541
Mutex 22.667

Of course, the result depends really on the platform and the compiler (I tested on Mac Air and clang). But for me it was quite interesting to see that spinlock, in spite of its more sophisticated implementation comparing to atomics, works not much slower.

 

Mutex v/s Semaphore v/s Spinlock

Similarity

– All of these are used for synchronization

Difference

Mutex provides one person to access a single resource at a time, others must wait in a queue. Once this person is done, the guy next in the queue acquire the resource.
So access is serial, one guy after other. Too aggressive.

Semaphore are useful if multiple instances (N) of a resource is to be shared among a set of users. As soon as all N resources are acquired, any new requester has to wait. Since there is no single lock to hold, there is as such no ownership of a semaphore.

Spinlock is an aggressive mutex. In mutex, if you find that resource is locked by someone else, you (the thread/process) switch the context and start to wait (non-blocking).
Whereas spinlocks do not switch context and keep spinning. As soon as resource is free, they go and grab it. In this process of spinning, they consume many CPU cycles. Also, on a uni-processor machine they are useless and perform very badly (do I need to explain that?).

 

 

 

 

 

 

 

 

Ref.:

http://demin.ws/blog/english/2012/05/05/atomic-spinlock-mutex/

http://unix.stackexchange.com/questions/5214/what-is-the-difference-between-spin-locks-and-semaphores

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s