Preventing software antipatterns with Detekt

Detekt is a static code analyzer for Kotlin. This post talks about how I used Detekt for non-trivial, systematic improvements to code quality, using custom Detekt rules.

Source files can be found here: https://github.com/ygaller/detekt-antipatterns

Antipatterns, antipatterns everywhere

I review a lot of pull requests on a regular basis. During reviews, antipatterns show up all the time. Many of them are repetitive and addressing them is important, but does not involve deep insight into the code. In fact, repetitive nitpicking in code reviews is an antipattern in itself. Repeating the same points over and over is exasperating, and they keep popping up in PRs coming from different developers.

Sometimes you see an antipattern that is so bad that it sends you on a gardening task to weed out all instances of it in the codebase. One such example is the instantiation of Jackson’s ObjectMapper.

Antipattern example – misusing ObjectMapper

Instantiating an ObjectMapper is intensive as far as object allocation and memory use. If you’re not careful and causally use it in code that is called frequently, it will cause memory churn – the intensive allocation and release of objects, leading to frequent garbage collection. This can directly contribute to the instability of a JVM process.

Let’s have a look at an example of bad usage of the ObjectMapper: We have a CustomerRecord class which represents a DB record. This record has a json string field. When we fetch the record we want to convert the record into a DTO. This requires us to deserialize the string so that we can access the serialized object.

data class CustomerRecord(
    val firstName: String,
    val lastName: String,
    val emailAddress: String,
    val preferences: String // json
) {
 
  fun toDto(): CustomerDto {
    val preferences = ObjectMapper().registerModule(KotlinModule()).readValue<CustomerPreferences>(preferences)
    return CustomerDto(
        firstName,
        lastName,
        emailAddress,
        preferences.language,
        preferences.currency
    )
  }
}

The problem here is that with each invocation of toDto(), we are creating a new instance of ObjectMapper.

How bad is it? Let’s create a test and compare performance. For comparison, version #2 of this code has a statically instantiated ObjectMapper:

data class CustomerRecord(
    val firstName: String,
    val lastName: String,
    val emailAddress: String,
    val preferences: String // json
) {
  companion object {
    private val objectMapper = ObjectMapper()
        .registerModule(KotlinModule()) // To enable no default constructor
  }

  fun toDtoStatic(): CustomerDto {
    val preferences = objectMapper.readValue<CustomerPreferences>(preferences)
    return CustomerDto(
        firstName,
        lastName,
        emailAddress,
        preferences.language,
        preferences.currency
    )
  }
}

Let’s compare performance:

IterationsNew Duration
(Deserializations / s)
Static Duration
(Deserializations / s)
Ratio
10K767ms (13,037)44ms (227,272)x17
100K4675ms (21,390)310ms (322,580)x15
1M37739ms (26,504)1556ms (642,673)x24
New – creating ObjectMapper() in each iteration
Static – creating ObjectMapper() once

This is an amazing result! Extracting the ObjectMapper to a static field and preventing an instance from being created each time has improved performance by a factor of over 20 in some cases. This is the ratio of time that initializing an ObjectMapper takes vs. what the actual derserialization takes.

The problem with the one-time gardener

A large codebase could have hundreds of cases of bad ObjectMapper usages. Even worse, a month after cleaning up the codebase and getting rid of all these cases, some new engineers will join the group. They are not familiar with the best practice of static ObjectMappers and inadvertently reintroduce this antipattern into the code.

The solution here is to proactively prevent engineers from committing such code. It should not be up to me, the reviewer to catch such errors. After all, I should be focusing on deeper insights into the code, not on being a human linter.

Detekt to our aid

To catch these antipatterns in Kotlin, I used a framework called Detekt. Detekt is a static code analyzer for Kotlin. It provides many code smell rules out of the box. A lesser known feature is that it can be extended with custom rules.

Let’s first try and figure out what would valid uses of ObjectMapper look like. An ObjectMapper would be valid if instantiated:

  • Statically, in a companion class
  • In a top level object which is a singleton itself
  • In a private variable in the top-level of a file, outside of a class but not accessible from outside the file

What would an ObjectMapper code smell look like? When it’s instantiated:

  • As a member of a class that may be instantiated many times
  • As a member of an object that is not a companion object within a class (highly unlikely but still…)

I created a rule to detect the bad usages and permit the good ones. In order to do this, I extended Detekt’s Rule abstract class.

Detekt uses the visitor pattern, exposing a semantic tree of the code via dozens of functions that can be overridden in Rule. Each function is invoked for a different component in the code. At first glance, it’s hard to understand which is the correct method to override. It took quite a bit of trial and error to figure out that the correct function to override is visitCallExpression. This function is invoked for all expressions in the code, including the instantiation of ObjectMapper.

An example of how we examine the expression can be found in class JsonMapperAntipatternRule. We search the expression’s parents for the first class & object instances. Depending on what we find, we examine the expression in the context of a class or of a file.

    val objectIndex = expression.parents.indexOfFirst { it is KtObjectDeclaration }
    val classIndex = expression.parents.indexOfFirst { it is KtClass }

    val inFileScope = objectIndex == -1 && classIndex == -1
    if (inFileScope) {
      validateFileScope(expression)
    } else {
      validateObjectScope(objectIndex, classIndex, expression)
    }

Test coverage can be found in JsonMapperAntipatternRuleTest. Tests include positive and negative examples and can be added as we encounter more edge cases that we may not have thought of.

The positive test cases we want to allow:

// Private ObjectMapper in class' companion object
class TestedUnit {

  companion object {
    private val objectMapper = ObjectMapper()
  }
}
// ObjectMapper in object
object TestedObject {
    private val objectMapper = ObjectMapper()
}
// Private ObjectMapper in file (top level)
private val objectMapper = ObjectMapper()

class TestedObject {
}
// Part of a more complex expression
private val someObject = ObjectMapper().readValue("{}")

class TestedObject {
}

On the other hand, cases which we want to prevent:

// ObjectMapper not in companion class
class TestedUnit {

  private val objectMapper = ObjectMapper()
}
// Within a non-companion object
class External {
  object Internal {
      val notGood = ObjectMapper()
  }
}
// Not private in top level of file
val notGood = ObjectMapper()

class External {
}

The detekt-custom-rules module is a minimal Detekt plugin module. Once built, the result is a jar that can be used interactively within IntelliJ, or as a plugin when building the rest of the project.

Using the plugin with IntelliJ

After installing the Detekt plugin in IntelliJ, we enable it and point it to the plugin jar like so:

The result is that we now have syntax highlighting for this issue:

Using the plugin with a Maven build

To use is as part of the build, we need to point the maven plugin to the jar:

            <plugin>
                <groupId>com.github.ozsie</groupId>
                <artifactId>detekt-maven-plugin</artifactId>
                <version>${detekt.version}</version>
                <executions>
                    <execution>
                        <id>detekt-check</id>
                        <phase>verify</phase>
                        <goals>
                            <goal>check</goal>
                        </goals>
                    </execution>
                </executions>

                <configuration>
                    <config>config/detekt.yml</config>
                    <plugins>
                        <plugin>detekt-custom-rules/target/detekt-custom-rules-1.0-SNAPSHOT.jar</plugin>
                    </plugins>
                </configuration>
            </plugin>

With the plugin in place we receive a warning during the build:

custom-rules - 5min debt
        JsonMapperAntipatternRule - [preferences] at detekt-antipatterns/customers/src/main/java/com/ygaller/detekt/CustomerRecord.kt:18:23

In detekt.yml it’s possible to configure the build to either fail or just warn on the custom rule.

Conclusion

Using Detekt and the custom rule, I managed to screen for a non-trivial antipattern. Adding it to IntelliJ and to the build means that it is much less likely to pop up in future code reviews.

Moreover, it decenteralizes the knowledge of this issue and enables any engineer working on our code in the future to be alerted to the fact that he is introducing an inefficiency into the system. One that can easily be avoided.

Good Enough Architecture – Separating bounded contexts

Let’s talk about different options for separating services and domains, in the context of synchronous vs. event driven architectures.

Many times, we find ourselves with a need for a change in architecture. There are several alternatives with various tradeoffs between them. We are going to review one such case today.

Acmetrade – The direct trading company

Our story is about Acmetrade, a company providing a direct stock trading platform for individual investors.

At Acmitrade, an individual investor can create an account, transfer funds and start trading right away. It’s easy, simple and customers love it.

The platform that Acmetrade provides is so good that a while back, investment managers (IMs) started requesting to manage their customers’ portfolios on the platform. Management identified this opportunity for increased revenue and decided to create an Investment Manager offering based on the original platform. This new product has been an unqualified success, driving significant revenue.

Acmetrade’s R&D

Within the R&D organization, the accounts domain is maintained by the Accounts team. Its mandate is anything related to managing the account’s lifecycle – account creation, deposits, withdrawals, etc. The accounts team is responsible for the Accounts microservice.

To support the investment management domain, a new team was formed – the IM (Investment Management) team. Its purpose is to develop the investment management offering. The main service which they maintain is the IM microservice. It manages all the data and exposes all the functions that are needed for working with investment managers.

Acmetrade’s Architecture

Acmetrade provides customers and investment managers with a web portal.

Since the accounts domain and the IM domain are mostly accessed by these portals, the team provides a REST HTTP api for the webapps to call. The communication model is synchronous HTTP API calls.

The core domain that should not depend on any other is accounts. It should not be aware of whether it is managed by an investment manager. The IM domain does have a dependency on the accounts domain and holds references to account entities that are under IM management.

This image has an empty alt attribute; its file name is pp740x7b5on92r5ury5w.png

What’s the problem?

When the IM functionality was introduced, it was clear that the accounts team would have to incorporate IM functionality into various account business flows. The IM team went about making modifications to the account service, under supervision of the accounts team – the owners of the accounts domain.

While this was a trickle and harmless at first, the rapid growth of IM functionality now means that anything that is done in the accounts team must also factor the effect on the IM domain.

It’s a classic case of feature creep. Any new feature, in this case investment managers, must be supported from now on in each and every flow. But that’s ok, right? The domains are separate and there are different teams addressing the different concerns. Well, turns out things have gotten out of hand lately and the boundaries between the domains have become blurred. Code that is the responsibility of the IM team is located in the accounts service. For good reason – there’s temporal coupling between actions in the account and IM actions.

Some examples of IM -> accounts dependencies are obvious:

  • There’s a need to calculate investment managers’ commissions based on activity of accounts under management
  • The IM portal provides a summary of accounts under management

The problem is that over time, some backward dependencies have formed. These are cases where accounts started “being aware” of the existence of investment managers:

  1. When an account is created, the accounts service lets the IM service know that the new account should be affiliated with an IM.
  2. When the customer deposits funds, the investment manager is notified of the deposit via email. The account service queries the IM service for the managing investment manager and adds him as a CC recipient to the confirmation email that is sent to the customer.
  3. The AccountData DTO used by the UI contains a section with information about the assigned IM. This data is requested from the IM service.
Backward dependencies

The accounts domain has been “infected” with IM data, and we need to take action in order to prevent the domains from becoming entangled even further.

Why do we care? Well, if they become fully entangled, it will be impossible for the teams to work independently. Responsibility for the code will cease to be well defined. The IM team will be knee deep in accounts code and vice versa. This has already started to affect development velocity as well as code quality. The accounts team now need to understand the IM domain in addition to their domain. This is distracting them and making reasoning about the system much more difficult.

The ideal solution

Ideal solution

Given unlimited time and engineers, the right way to go about solving this would be to eliminate the account service’s dependency on the IM service. IM work within the account service should move to where it really belongs – the IM service.

What’s the best way to eliminate the dependencies? Let’s look at the cases:

For cases #1 and #2 in the previous section, moving to an event architecture would make sense. The account service would publish an “account created” event and a “funds deposited” event. The IM service would listen to these events and act accordingly. This is great! The account service no longer needs to be aware of the existence of the IM service. It just has to declare that an event has taken place.

As for #3, we are abusing the account service by also serving as a “portal gateway” service for composing the DTO that is returned to the portal. The solution here would be to create a portal gateway service that would be in charge of the composition of the response to the frontend. It would depend on the accounts service and on the IM service. The accounts service would no longer depend on the IM service.

Why is the ideal not ideal?

There is, however, a major problem. When asked to give an effort estimate for the work involved, the teams gave a “large” t-shirt size estimate for the work. At least 2 weeks, maybe more. Right now, the targets for the quarter are already set and doing this would derail other important work. Given that this is not an immediate fire that needs to be put out, work can start next quarter at the earliest. In addition, there are various drawbacks to changing the architecture:

Pubsub drawbacks
  • Creating a pubsub infrastructure involves writing a lot of new code. This code will take time to stabilize.
  • We are adding new components to the system. These components – queues & topics, may break. For the change to go live, the team needs to have monitoring and automated testing in place.
  • Actions that were atomic are now separate – we need to consider states of inconsistency and how to remedy them in case of failure.
  • People who are used to working with a synchronous model now need to think with a different, event driven, mindset.
Gateway service drawbacks
  • Creating a new service means additional complexity in monitoring and deployment.
  • The gateway service will be the responsibility of a third team – The front-end team which is responsible for such services. More communication overhead!

The Tradeoff

The team realizes that if they go for the ideal solution, it will never reach maturity within a reasonable time span. Meanwhile, the complexity of the system continues to grow, and refactoring becomes even harder.

Are there any short-term low cost alternatives that they can develop as a stop-gap until the team is able to provide a robust solution to the issues?

Alternatives

The IM Gateway
IM Gateway

To start with, the accounts team could start treating the IM service as a proper third party. Up until now, they teams, which have worked closely, have treated each other’s code as familiar codebases, instead of code that is opaque.

When accessing a third party API, it is good practice to wrap the access to that API with a gateway component. Privately, this component manages calls to the external API. Publicly, the component exposes a set of calls so that the rest of the accounts service only needs to know about the component. Let’s call this the IM Gateway.

Why is this important? What is the benefit here?

  • It limits the change in the accounts service to within the gateway. Any changes in the IM service will only reflect on a single point – the IM Gateway. The rest of the accounts service is isolated from changes.
  • It makes it easy to understand the extent of use of the IM API within the accounts service. A simple “find usage” search on the IM Gateway will show all the places in the accounts service that refer to IM logic.
  • It improves service testability – Once you have a central IM Gateway component, mocking it is much simpler. Mock a single component and all your tests can work with it, instead of having to intercept calls to the IM service all across the codebase.
  • The team’s effort estimate to create this gateway is only 2 days! They can definitely squeeze it into their schedule.
The IM Module
IM Module

The IM Gateway was a great step forward. But the IM business logic still remains in place and is dispersed throughout the service. The next step in isolating IM dependencies is to concentrate the IM business logic into a single module – contrast this with just passing all calls to the IM service via a gateway as in the previous step.

Once this module is complete, it will be much easier to “pick it up” and move it to the IM service. It is essentially doing all the IM work, except that we are still inside the accounts service. All calls to the module should pass an object, equivalent in structure to the external event that we would publish to the pubsub infrastructure.

The team estimates the work required for this step as one week of work.

The module would provide:

  • Account linking – The module receives an “event” object. It calls internal endpoints in the accounts service to collect information, just as the IM service would were it to receive such an event. It then calls the IM service in order to link the account to an IM.
  • Notifications – Up until now, we sent a single email with a “to” recipient (the account owner) and a “cc” recipient (the IM). Moving the logic of sending the IM email into the module would mean sending two separate emails. This means two actions instead of one, which means managing intermediate states and failures.
The Portal Gateway package

To deal with composing the DTO that we return to the frontend portal, the team creates a portal gateway package in the codebase. This isolates the access to IM information for display purposes to a single package. We remove references to the IM module from core account entities where they were present up to now. When we decide to create a proper portal gateway service, all we will need to do is move this package between services.

The effort estimate here is only 1 day.

The end state

The steps the team took have brought us to a much better state:

  • In total, the change took 2 weeks of immediate work, instead of taking at least a month, starting sometime in the next quarter. Moreover, this work is not throwaway work, but rather an intermediate step to build upon towards the ideal solution.
  • IM code is now much less pervasive in the accounts service’s codebase
  • The IM team can now work in a safer environment with almost all of the changes happening in a single area (the module).
  • The accounts team does not need to supervise work done by the IM team in the accounts service as closely.
  • The accounts team has less fear of unexpected changes. There’s less communication overhead due to increased decoupling.

Conclusion

The work that the team did puts us in a better position to reach the ideal, event based solution. In fact, if it untangles the code and sets boundaries, it may buy the teams months of breathing room and may ultimately prove to be a sustainable status quo, saving a lot of premature and unnecessary optimization.

While not the perfect solution, it might just be good enough and is a good example of a tradeoff between the unattainable correct solution and real life constraints.