Skip to content

ThreadSanitizerGoManual

Alexander Potapenko edited this page Aug 31, 2015 · 1 revision

Introduction

Note: The data race detector is not yet available as part of any release. To use it you need to build Go tip from sources. It will be available in the Go 1.1 release.

Data races are one of the most common and hardest to debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable w/o proper synchronization and at least one of the accesses is write.

Here is an example of a data race on map variable that can lead to crashes and memory corruptions:

func main() {
	c := make(chan bool)
	m := make(map[string]string)
	go func() {
		m["1"] = "a"  // First conflicting access.
		c <- true
	}()
	m["2"] = "b"  // Second conflicting access.
	<-c
	for k, v := range m {
		fmt.Println(k, v)
	}
}

Usage

Fortunately, Go includes built-in data race detector. The usage is very simple -- you just need to add -race flag to go command:

$ go test -race mypkg // to test the package
$ go run -race mysrc.go // to run the source file
$ go build -race mycmd // to build the command
$ go install -race mypkg // to install the package

Report Format

When the race detector finds a data race in the program, it prints an informative report. The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created. You may see an example below:

WARNING: DATA RACE
Read by goroutine 185:
  net.(*pollServer).AddFD()
      src/pkg/net/fd_unix.go:89 +0x398
  net.(*pollServer).WaitWrite()
      src/pkg/net/fd_unix.go:247 +0x45
  net.(*netFD).Write()
      src/pkg/net/fd_unix.go:540 +0x4d4
  net.(*conn).Write()
      src/pkg/net/net.go:129 +0x101
  net.func·060()
      src/pkg/net/timeout_test.go:603 +0xaf

Previous write by goroutine 184:
  net.setWriteDeadline()
      src/pkg/net/sockopt_posix.go:135 +0xdf
  net.setDeadline()
      src/pkg/net/sockopt_posix.go:144 +0x9c
  net.(*conn).SetDeadline()
      src/pkg/net/net.go:161 +0xe3
  net.func·061()
      src/pkg/net/timeout_test.go:616 +0x3ed

Goroutine 185 (running) created at:
  net.func·061()
      src/pkg/net/timeout_test.go:609 +0x288

Goroutine 184 (running) created at:
  net.TestProlongTimeout()
      src/pkg/net/timeout_test.go:618 +0x298
  testing.tRunner()
      src/pkg/testing/testing.go:301 +0xe8

Options

You can pass some options to the race detector by means of GORACE environment variable. The format is:

GORACE="option1=val1 option2=val2"

The options are:

  • log_path: Tells race detector to write reports to 'log_path.pid' file. The special values are 'stdout' and 'stderr'. The default is 'stderr'.
  • strip_path_prefix: Allows to strip beginnings of file paths in reports to make them more concise.
  • history_size: Per-goroutine history size, controls how many previous memory accesses are remembered per goroutine. Possible values are [0..7]. history_size=0 amounts to 32K memory accesses. Each next value doubles the amount of memory accesses, up to history_size=7 that amounts to 4M memory accesses. The default value is 1 (64K memory accesses). Try to increase this value when you see "failed to restore the stack" in reports. However, it can significantly increase memory consumption.

Example:

$ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race

How To Use

You may start with just running your tests under the race detector (go test -race). However sometimes tests have limited coverage, especially with respect to concurrency. The race detector finds only races that actually happen in the execution, it can't find races in code paths that were not executed. So it may be beneficial to run the whole program built with -race under a realistic workload, frequently it discovers much more bugs than tests.

Typical Data Races

Here are some example of typical data races. All of them can be automatically detected with the race detector.

Race on loop counter

func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i < 5; i++ {
		go func() {
			fmt.Println(i)  // Not the 'i' you are looking for.
			wg.Done()
		}()
	}
	wg.Wait()
}

Closures capture variables by reference rather than by value, so the reads of the 'i' variable in the goroutines race with 'i' increment in the loop statement. Such program typically outputs 55555 instead of expected 01234. The program can be fixed by explicitly making a copy of the loop counter:

func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i < 5; i++ {
		go func(j int) {
			fmt.Println(j)  // Good. Read local copy of the loop counter.
			wg.Done()
		}(i)
	}
	wg.Wait()
}

Accidentally shared variable

// ParallelWrite writes data to file1 and file2, returns the errors.
func ParallelWrite(data []byte) chan error {
	res := make(chan error, 2)
	f1, err := os.Create("file1")
	if err != nil {
		res <- err
	} else {
		go func() {
			// This err is shared with the main goroutine,
			// so the write races with the write below.
			_, err = f1.Write(data)
			res <- err
			f1.Close()
		}()
	}
	f2, err := os.Create("file2")  // The second conflicting write to err.
	if err != nil {
		res <- err
	} else {
		go func() {
			_, err = f2.Write(data)
			res <- err
			f2.Close()
		}()
	}
	return res
}

The fix is simple, one just needs to introduce new variables in the goroutines (note :=):

			_, err := f1.Write(data)
			...
			_, err := f2.Write(data)

Unprotected global variable

If the following code is called from several goroutines, it leads to bad races on the services map.

var services map[string]net.Addr

func RegisterService(name string, addr net.Addr) {
	services[name] = addr
}

func GetService(name string) net.Addr {
	return services[name]
}

It can be fixed by protecting the accesses with a mutex:

var services map[string]net.Addr
var mu sync.Mutex

func RegisterService(name string, addr net.Addr) {
	mu.Lock()
	defer mu.Unlock()
	services[name] = addr
}

func GetService(name string) net.Addr {
	mu.Lock()
	defer mu.Unlock()
	return services[name]
}

Primitive unprotected variable

Data races can happen on variables of primitive types as well (bool, int, int64), like in the following example:

type Watchdog struct { last int64 }

func (w *Watchdog) KeepAlive() {
	w.last = time.Now().UnixNano()  // First conflicting access.
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			// Second conflicting access.
			if w.last < time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}

Even such "innocent" data races can lead to hard to debug problems caused by (1) non-atomicity of the memory accesses, (2) interference with compiler optimizations and (3) processor memory access reordering issues.

To fix such data race one can use (aside from chan and sync.Mutex) package sync/atomic, which provides atomic operations on primitive types. sync/atomic functions solve all of the above issues.

type Watchdog struct { last int64 }

func (w *Watchdog) KeepAlive() {
	atomic.StoreInt64(&w.last, time.Now().UnixNano())
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}

Supported Platforms

Supported platforms are darwin/amd64, linux/amd64 and windows/amd64.

Runtime Overheads

The data race detector significantly increases both memory consumption and execution time. The concrete numbers highly dependent on the particular program, but some reference numbers would be: memory consumption ~5-10x, execution time ~2-20x.

Comments/Questions?

Send comments/questions to [email protected]

Clone this wiki locally