This post follows https://galler.dev/getting-background-tasks-to-play-nicely-with-deployments/ and discusses different Kotlin code styles we can use for stopping a long-running task. It is a summary of a discussion we had on a PR in which we wanted to terminate an execution loop over a collection.
Iterating over a collection in Kotlin
There are many ways to iterate over collections in Kotlin. Assuming you also want to collect results as you iterate, you can run the gamut with syntax styles ranging from the very Java-esque to super streamlined functional code.
Naive iteration on a collection
Let’s take an example of a loop that receives a list of customer IDs and sends them an email. Our boilerplate will look like this:
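    import java.lang.Thread.sleep
    import kotlin.random.Random

    class Iterations {
        // 100 random customer IDs, each a hex string
        private val customerIds = List(100) { Integer.toHexString(Random.nextInt(256 * 256)) }

        // Simulates a 500ms call to the email provider; returns a random success flag
        private fun sendEmail(customerId: String): Boolean {
            println("Sending email to customer $customerId")
            sleep(500)
            return Random.nextInt(2) == 0
        }
    }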
We have a list of 100 customer ID strings and a function that calls our email provider. The function takes 500ms to complete and returns a boolean success indicator.
Our first pass at creating the send loop is a style that you might see with Java developers making the transition to Kotlin, with little experience in functional programming.
Create a list of responses, iterate with a for loop, and add each response to the result list. Return the statuses.
    fun sendEmails1(): List<Boolean> {
        val statuses = mutableListOf<Boolean>()
        for (customerId in customerIds) {
            val response = sendEmail(customerId)
            statuses.add(response)
        }
        return statuses
    }
This is, of course, very verbose, and also makes unnecessary use of a mutable list. We can improve on this quite easily:
    fun sendEmails2(): List<Boolean> {
        val statuses = mutableListOf<Boolean>()
        customerIds.forEach { customerId ->
            val response = sendEmail(customerId)
            statuses.add(response)
        }
        return statuses
    }
We’re still using a mutable list, but now with a forEach on the collection. Not much of an improvement. Let’s keep going:
    fun sendEmails3(): List<Boolean> {
        val statuses = customerIds.map { customerId ->
            val response = sendEmail(customerId)
            response
        }
        return statuses
    }
We’ve made the jump from forEach to map, which saves us the trouble of allocating a list of results. Now to clean up and streamline the code:
    fun sendEmails4() = customerIds.map { customerId -> sendEmail(customerId) }
The function now has an expression body. The response variable has been removed. We’ve even removed the explicit return type for brevity. From 6 lines of code we are down to 1.
As a side-note, implicit return types and putting everything in one line are not necessarily Best Practice. I recommend watching Putting Down the Golden Hammer from KotlinConf 2019 for an interesting discussion on that. This is just an example taken to the absolute extreme for the sake of discussion.
As for performance, all implementations will take ~50 seconds to run (100 x 500ms).
A wrench in the gears
Now comes the twist: We have to be able to stop mid-way. We are shutting down our process and we need to interrupt our run.
Let’s expand our boilerplate to include a shutdown flag:
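    import java.lang.Thread.sleep
    import java.util.concurrent.atomic.AtomicBoolean
    import kotlin.random.Random

    class Iterations {
        private val customerIds = List(100) { Integer.toHexString(Random.nextInt(256 * 256)) }
        private val shouldShutDown = AtomicBoolean(false)

        // Simulates a flag check that costs 200ms (e.g. a DB or API call)
        private fun isShuttingDown(): AtomicBoolean {
            sleep(200)
            return shouldShutDown
        }

        // Raises the shutdown flag from a background thread after 3 seconds
        init {
            Thread {
                sleep(3000)
                shouldShutDown.set(true)
            }.start()
        }

        private fun sendEmail(customerId: String): Boolean {
            println("Sending email to customer $customerId")
            sleep(500)
            return Random.nextInt(2) == 0
        }
    }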
This shutdown flag comes with a cost: 200ms per check. In the example, the flag is in-memory, but it could also be a call to the DB or an external API. The list of customers could be 100K items long, not 100. Therefore we are simulating a time penalty for the flag check. We also start a background thread that will raise the shutdown flag after 3 seconds. Total time for each iteration will be 700ms (500ms for sending the email + 200ms for checking the flag).
How can we add the check to our code?
Let’s try and make a minimal change:
    fun sendWithInterrupt1(): List<Boolean> = customerIds.mapNotNull { customerId ->
        if (isShuttingDown().get()) return@mapNotNull null
        sendEmail(customerId)
    }
We’re still using map. Sort of. We just modified it to mapNotNull. We’ve made the minimal change, and the code is still relatively clean and readable. But mapNotNull will evaluate the entire collection. After 3 seconds, we have processed 5 customers (3000ms / 700ms rounded up). We will then pass through 95 more redundant iterations that will cost us about 19 seconds (95 calls x 200ms to check the flag). This is unacceptable. We should stop immediately.
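To make the "evaluates the entire collection" point concrete, here is a minimal sleep-free sketch. `FlagCounter` and `processWithMapNotNull` are hypothetical stand-ins (not part of the PR): the flag flips after a fixed number of checks instead of after 3 seconds, and a string result stands in for the email status.

```kotlin
// Hypothetical stand-in for the shutdown flag: it reports "shutting down"
// once it has been checked more than `stopAfter` times, and counts every check.
class FlagCounter(private val stopAfter: Int) {
    var checks = 0
        private set

    fun isShuttingDown(): Boolean = ++checks > stopAfter
}

// Same shape as sendWithInterrupt1: returning null drops the element,
// but mapNotNull still invokes the lambda for every remaining element.
fun processWithMapNotNull(ids: List<String>, flag: FlagCounter): List<String> =
    ids.mapNotNull { id -> if (flag.isShuttingDown()) null else "sent:$id" }
```

Calling it with 10 IDs and `FlagCounter(2)` keeps only the first two results, yet `checks` ends at 10: every element still paid for a flag check.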
The solution that’s staring us in the face is to abandon our map and go back to using forEach:
    fun sendWithInterrupt2(): List<Boolean> {
        val statuses = mutableListOf<Boolean>()
        customerIds.forEach { customerId ->
            if (isShuttingDown().get()) return statuses
            val response = sendEmail(customerId)
            statuses.add(response)
        }
        return statuses
    }
This is ugly. We have a return in two places. We’re back to a mutable list. But we will stop after 3.5 seconds. We have traded readability and elegance for performance.
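For what it's worth, the imperative style can be tidied a little. The sketch below is an assumption on my part rather than something from the PR: it uses the standard library's `buildList` so the mutable list never escapes, and a `for` loop with `break` so there is a single exit. The `shouldStop` and `send` parameters are hypothetical, introduced only to keep the example self-contained.

```kotlin
// Imperative loop with one exit point; the mutable list lives only inside buildList.
fun sendUntilStopped(
    ids: List<String>,
    shouldStop: () -> Boolean, // hypothetical stand-in for isShuttingDown().get()
    send: (String) -> Boolean, // hypothetical stand-in for sendEmail
): List<Boolean> = buildList {
    for (id in ids) {
        if (shouldStop()) break // check the flag before each send
        add(send(id))
    }
}
```

It is still a loop with a flag check in the middle, so the question of a more functional style remains.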
Is there a way to eat our cake and have it too (or is it having your cake and eating it too)? Anyway, we want to stick to a functional style, but also be able to stop mid-evaluation. Turns out that there is a way to do this in Kotlin: Sequences.
Quick reminder here: Kotlin collections are eagerly evaluated by default. Sequences are lazily evaluated.
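A small trace makes the difference visible. This is an illustrative sketch with made-up function names (`eagerTrace`, `lazyTrace`): both run the same map and takeWhile steps and record the order in which each element is touched.

```kotlin
// Eager: each step runs over the whole list before the next step starts.
fun eagerTrace(items: List<Int>): List<String> {
    val trace = mutableListOf<String>()
    items
        .map { trace.add("map $it"); it * 2 }
        .takeWhile { trace.add("check $it"); it < 4 }
    return trace
}

// Lazy: each element flows through all steps before the next element is pulled,
// and nothing runs until the terminal toList() call.
fun lazyTrace(items: List<Int>): List<String> {
    val trace = mutableListOf<String>()
    items.asSequence()
        .map { trace.add("map $it"); it * 2 }
        .takeWhile { trace.add("check $it"); it < 4 }
        .toList()
    return trace
}
```

For `listOf(1, 2, 3)`, the eager trace is map 1, map 2, map 3, check 2, check 4, while the lazy trace is map 1, check 2, map 2, check 4. In the lazy version the third element is never mapped at all.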
Let’s see what we can do:
    fun sendWithInterrupt3() = customerIds
        .asSequence()
        .map { customerId -> sendEmail(customerId) }
        .takeWhile { !isShuttingDown().get() }
        .toList()
We added asSequence before map. The flag check is in the takeWhile section. This acts as our stop criterion, a while condition. We wrap it up with toList().
Now, the only terminal operation here is toList. All other operations return an intermediate sequence that is not immediately evaluated. When we run this, total running time is about 3.5 seconds, the time it takes to process 5 email sends and flag checks.
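One subtlety worth checking in this version: because map comes before takeWhile, each email is sent first and the flag is checked afterwards, so the last email sent before shutdown has its status dropped from the result. The sketch below is a sleep-free illustration comparing the two orderings. The function names and the `stopAfterChecks` counter are hypothetical, not part of the original code.

```kotlin
// Returns the kept results paired with the number of sends actually performed.
// The flag is simulated: it reports "shutting down" after `stopAfterChecks` checks.
fun mapThenTakeWhile(ids: List<String>, stopAfterChecks: Int): Pair<List<String>, Int> {
    var checks = 0
    var sends = 0
    val kept = ids.asSequence()
        .map { id -> sends++; "sent:$id" }         // send first...
        .takeWhile { ++checks <= stopAfterChecks } // ...then check the flag
        .toList()
    return kept to sends
}

fun takeWhileThenMap(ids: List<String>, stopAfterChecks: Int): Pair<List<String>, Int> {
    var checks = 0
    var sends = 0
    val kept = ids.asSequence()
        .takeWhile { ++checks <= stopAfterChecks } // check the flag first...
        .map { id -> sends++; "sent:$id" }         // ...then send
        .toList()
    return kept to sends
}
```

With 5 IDs and `stopAfterChecks = 2`, both return two results, but the first ordering performs 3 sends and the second performs 2. Whether that extra send matters depends on the task; putting takeWhile first matches the check-then-send order of the forEach version.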
What’s the right way to go? (A.K.A. Tradeoffs)
Looking at this final version, I can’t say wholeheartedly that it is the ideal solution. Yes, the style is very functional and looks like a pipeline, but on the other hand, another developer looking at this code may not be familiar with these lesser-known operators (takeWhile, asSequence). We are adding a slight learning curve.
We did manage to avoid mutable lists and intermediate variables, keeping the code tight and leaving less room for future modifications that might lead us down a dark path.
The code is definitely more Kotlin-y and more idiomatic, but some might argue that the KISS principle should guide us to stick with the mutable, non-functional version.
Overall, my personal preference would be to go with this last version, but my recommendation to the engineer who was responsible for the change was to make the call himself according to his preference. There are more important hills to die on ¯\_(ツ)_/¯