Backgrounding long-running tasks is a classic and ubiquitous problem for web applications.
For example, something triggers the need to download and manipulate a file, but we don't want to hold up the main thread responsible for sending a response back to the client.
Most likely you'd want to offload this task to a background worker or another service, but sometimes it's nice to be able to just handle the processing right then and there.
Go makes concurrency incredibly simple with the go keyword. So simple, in fact, that you might quickly run into problems if you handle files this way and just spin up a new goroutine for everything.
File descriptors
Herein lies the problem. Most systems don’t have an unlimited number of file descriptors for a process to use. Probing the two machines within my current reach yields the following:
```
$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.11.4
BuildVersion:   15E65
$ ulimit -n
256
```

```
~$ cat /etc/issue
Ubuntu 14.04.4 LTS \n \l

~$ ulimit -n
1024
```
Evidently there aren't all that many available on either system. Opening a large number of sockets and files at once will inevitably lead to errors such as "too many open files" (EMFILE).
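The same limit can also be read from inside a Go program. The sketch below is a minimal, Unix-only example (Linux and macOS) using syscall.Getrlimit with RLIMIT_NOFILE; openFileLimit is a hypothetical helper name. One caveat, assuming Go 1.19 or later: programs that import package os raise their own soft limit towards the hard limit at startup on many Unix systems, so the soft value reported here may be higher than what ulimit -n shows in the shell.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit (hypothetical helper) returns the soft and hard limits on
// open file descriptors for the current process. Unix-like systems only.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return rl.Cur, rl.Max, nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	// The soft limit is the one that triggers "too many open files".
	// Since Go 1.19 it may already have been raised by the runtime.
	fmt.Printf("soft limit: %d, hard limit: %d\n", soft, hard)
}
```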
Limitation
Go's concurrency primitives draw on the ideas Tony Hoare laid out in Communicating Sequential Processes (CSP), and as such Go has the concept of channels.
Channels provide the necessary "blocking" mechanism that allows a collection of goroutines to share a common workload. Instead of creating a lot of goroutines all at once, a handful can be created, each waiting to handle incoming items on a channel.
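To make the "blocking" part concrete, here is a small sketch. trySend is a hypothetical helper that uses select with a default case to attempt a send without blocking; it reports false exactly when a normal send would have blocked. An unbuffered channel with no receiver blocks immediately, while a buffered channel only blocks once its buffer is full.

```go
package main

import "fmt"

// trySend (hypothetical helper) attempts a send without blocking. It returns
// false when the send would have blocked: no receiver ready, no free buffer slot.
func trySend(ch chan string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan string)
	buffered := make(chan string, 1)

	// Nobody is receiving, so a send on the unbuffered channel would block.
	fmt.Println(trySend(unbuffered, "a")) // false
	// The buffered channel accepts a value while it has a free slot...
	fmt.Println(trySend(buffered, "a")) // true
	// ...and blocks once its single slot is taken.
	fmt.Println(trySend(buffered, "b")) // false
}
```

This blocking is what lets a fixed set of goroutines pace the producer: a send only completes when one of them is ready for more work.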
Below is a very simple example in which an unbuffered channel* is created as a work queue. Five goroutines are then started, each waiting for incoming work on the queue. The main goroutine then sends the work onto the queue one item at a time; because the channel is unbuffered, each send blocks until one of the workers is ready to receive it.
Once all the work has been sent, the channel is closed, which is Go's way of telling the consuming goroutines that nothing more will appear on the channel. This works in tandem with the range keyword, which keeps receiving from the channel until it is closed and empty.
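The close-and-range behaviour described above can be isolated in a few lines. This is a minimal sketch, separate from the full program; drain is a hypothetical helper. Values sent before close are still delivered, range stops once the channel is closed and empty, and a plain receive from a closed, empty channel returns the zero value with ok set to false.

```go
package main

import "fmt"

// drain (hypothetical helper) receives from ch until it is closed and empty.
func drain(ch <-chan string) []string {
	var got []string
	for v := range ch { // the loop ends once ch is closed and drained
		got = append(got, v)
	}
	return got
}

func main() {
	ch := make(chan string, 3)
	ch <- "a"
	ch <- "b"
	close(ch) // signal that no more values will be sent

	fmt.Println(drain(ch)) // [a b]

	// Receiving from a closed, empty channel does not block.
	v, ok := <-ch
	fmt.Printf("%q %v\n", v, ok) // "" false
}
```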
```go
package main

import (
	"log"
	"sync"
	"time"
)

// Limit is the number of worker goroutines processing the queue.
const Limit = 5

func main() {
	log.SetFlags(log.Ltime) // format log output hh:mm:ss

	wg := sync.WaitGroup{}
	queue := make(chan string) // unbuffered work queue

	// doWork simulates a slow task, such as processing a file.
	doWork := func(i int, j string) {
		time.Sleep(2 * time.Second)
		log.Printf("Worker %d working on %s\n", i, j)
	}

	// Start a fixed number of workers.
	for worker := 0; worker < Limit; worker++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			// range blocks waiting for work and ends when queue is closed.
			for work := range queue {
				doWork(worker, work)
			}
		}(worker)
	}

	// Enqueue 15 items, "a" through "o".
	for j := 0; j < 15; j++ {
		work := string(rune(97 + j))
		log.Printf("Work %s enqueued\n", work)
		queue <- work // blocks until a worker is ready to receive
	}
	close(queue) // no more work: lets the workers' range loops end
	wg.Wait()    // wait for all workers to finish
}
```
Note that the WaitGroup is another concurrency construct: a counter that allows the main goroutine to wait for all the worker goroutines to finish. Each Add increments the counter, each Done decrements it, and Wait blocks until it reaches zero.
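The WaitGroup on its own can be sketched as follows. runAll is a hypothetical helper that starts n goroutines and returns only after every one of them has called Done; the atomic counter is there just to show that all of them actually ran before Wait returned.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runAll (hypothetical helper) starts n goroutines and waits for all of them.
func runAll(n int) int64 {
	var wg sync.WaitGroup
	var finished int64
	for i := 0; i < n; i++ {
		wg.Add(1) // increment the counter before starting the goroutine
		go func() {
			defer wg.Done() // decrement the counter when this goroutine returns
			atomic.AddInt64(&finished, 1)
		}()
	}
	wg.Wait() // blocks until the counter is back to zero
	return atomic.LoadInt64(&finished)
}

func main() {
	fmt.Println(runAll(10)) // 10
}
```

Calling Add before the go statement, rather than inside the goroutine, matters: otherwise Wait could run before some goroutines have registered themselves.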
Running the program produces the following output:
```
$ go run conc.go
10:25:37 Work a enqueued
10:25:37 Work b enqueued
10:25:37 Work c enqueued
10:25:37 Work d enqueued
10:25:37 Work e enqueued
10:25:37 Work f enqueued
10:25:39 Worker 1 working on a
10:25:39 Worker 3 working on d
10:25:39 Work g enqueued
10:25:39 Work h enqueued
10:25:39 Worker 2 working on b
10:25:39 Work i enqueued
10:25:39 Worker 4 working on e
10:25:39 Work j enqueued
10:25:39 Worker 0 working on c
10:25:39 Work k enqueued
10:25:41 Worker 0 working on j
10:25:41 Worker 3 working on g
10:25:41 Worker 2 working on h
10:25:41 Worker 1 working on f
10:25:41 Worker 4 working on i
10:25:41 Work l enqueued
10:25:41 Work m enqueued
10:25:41 Work n enqueued
10:25:41 Work o enqueued
10:25:43 Worker 0 working on k
10:25:43 Worker 1 working on n
10:25:43 Worker 3 working on l
10:25:43 Worker 2 working on m
10:25:43 Worker 4 working on o
```
Go's concurrency primitives are that rare combination of easy and powerful, making it effortless to write concurrent code.
They don't, however, shield you from the inherent limitations of the host system, which is a good thing. Awareness of what the code actually does on the machine is a virtue to strive for.
*Edit: Thanks to Jemma for pointing out that buffering the channel isn't needed after all.
This post was included in the Go Newsletter issue 110.