Architecting a flexible and purely functional Scala back-end using Slick and Tagless Final

May 15, 2020

The Panaseer Team

Hi, my name is Chris and I am a Data Platform Engineer at Panaseer. For the past couple of months I’ve had the chance to dive deep into Scala and have helped to architect, implement and refactor parts of our data platform. At Panaseer, we use Scala as a back-end language to run our complex data pipelines. Data is read from data sources via Kafka topics and then passes through many Spark ETL jobs, such as normalisation and entity resolution. Finally, inventory tables are populated and we run metrics and control checks against them. The data is then copied from Hive to Phoenix for fast, random access by an API which serves our web client.

In addition to the data pipelines described above, we use Scala with the Play framework to build a highly scalable web server which is used to store metadata, such as control check definitions, which are accessible via a CRUD API to the data pipeline and other system components.

This article describes some of the design decisions made when developing and refactoring this web service, with a focus on the repository layer. I’ll show how functional programming patterns can simplify how components interact in a multilayer architecture, with the added benefits of improved flexibility, testability and maintainability. Some functional programming knowledge is assumed, but I will explain important terms and provide links for further reading where appropriate.

Introduction to Slick and Futures

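For persistence in our application, we use Slick. It integrates easily with the Play framework, and its type-safe query DSL gives us compile-time query checking. With Doobie, by contrast, queries are written as SQL strings, and these can only be checked against the database schema at runtime or in tests.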

When using Slick, you start by composing queries. Queries can be combined with other queries through operations such as filter and join, and then turned into actions (for example by calling .result). Query composition does not require a database or execute any part of the query; it only builds a description of what to execute at a later point. This is a common functional pattern in Scala: separating the description of an operation from its execution.

Actions, referred to in Slick as DBIOAction (aliased as DBIO), describe operations to execute in sequential order on a database. Related actions can themselves be composed with combinators such as andThen and DBIO.sequence. By calling .transactionally on the result, the composed action is executed as a single atomic transaction. If an action depends on the result of a previous action, you can sequence them with map and flatMap, or more readably with a for-comprehension. Finally, any action can be run with db.run to obtain a Future that is eventually completed with the result once the asynchronous execution has finished.
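Here is a toy model of the split between description and execution that does not depend on Slick. Action, selectUserName and insertAudit are hypothetical names invented for this sketch and are not part of Slick's API; in Slick, DBIO and db.run play the corresponding roles. Composing actions in a for-comprehension only builds a bigger description, and the log stays empty until unsafeRun is called.

```scala
import scala.collection.mutable.ListBuffer

// Toy "description" type (not Slick's API): it wraps a thunk, and
// map/flatMap build new descriptions without executing anything
final case class Action[A](unsafeRun: () => A) {
  def map[B](f: A => B): Action[B] = Action(() => f(unsafeRun()))
  def flatMap[B](f: A => Action[B]): Action[B] = Action(() => f(unsafeRun()).unsafeRun())
}

// Hypothetical "queries" that record the moment they actually execute
val log = ListBuffer.empty[String]
def selectUserName(id: Long): Action[String] =
  Action(() => { log += s"select user $id"; s"user-$id" })
def insertAudit(msg: String): Action[Int] =
  Action(() => { log += s"insert audit '$msg'"; 1 })

// Composition only builds a larger description
val program: Action[Int] = for {
  name <- selectUserName(42L)
  rows <- insertAudit(s"looked up $name")
} yield rows

val logSizeBeforeRun = log.size  // nothing has executed yet
val result = program.unsafeRun() // now both steps run, in order
```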

Futures are a powerful tool for asynchronous programming. Future behaves as a monad, which lets us work with asynchronous operations, such as reading from a database, as if they were synchronous. Through map and flatMap (the monadic operations), we can chain Futures together into a seemingly sequential series of steps.

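import scala.concurrent.{ExecutionContext, Future}

def findUser(userId: Long): Future[User]
def findAddress(user: User): Future[Address]
def findCity(address: Address): Future[City]

// Each step starts only once the previous Future has completed
def findCityOfUser(userId: Long)(implicit ec: ExecutionContext): Future[City] =
  for {
    user    <- findUser(userId)
    address <- findAddress(user)
    city    <- findCity(address)
  } yield city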
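Below is a self-contained, runnable version of the lookup chain. It uses hypothetical in-memory stubs and made-up User, Address and City case classes in place of real database calls.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical domain types
final case class User(id: Long, addressId: Long)
final case class Address(id: Long, cityId: Long)
final case class City(id: Long, name: String)

implicit val ec: ExecutionContext = ExecutionContext.global

// Stubs standing in for asynchronous database lookups
def findUser(userId: Long): Future[User] = Future(User(userId, addressId = 7L))
def findAddress(user: User): Future[Address] = Future(Address(user.addressId, cityId = 3L))
def findCity(address: Address): Future[City] = Future(City(address.cityId, "London"))

// flatMap chaining via a for-comprehension reads like sequential code
def findCityOfUser(userId: Long): Future[City] =
  for {
    user    <- findUser(userId)
    address <- findAddress(user)
    city    <- findCity(address)
  } yield city

// Blocking only so the script can observe the result
val city = Await.result(findCityOfUser(1L), 5.seconds)
```

Await is used here only to observe the result in a script. In application code, the Future would normally be passed on rather than blocked on.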

Due to their practicality and their inclusion in the standard library, Futures are ubiquitous in the Scala ecosystem and are used in a number of popular libraries such as Spark, Akka and of course, Slick. Despite the many benefits, Futures can cause issues when not used correctly. Consider the following code snippet of a service-level method with a call to a database access method which returns Future[Option[User]].

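import cats.data.OptionT
import cats.implicits._ // provides Functor[Future], which OptionT needs
import scala.concurrent.{ExecutionContext, Future}

def userMethod(repo: UserRepository, id: Long)(implicit ec: ExecutionContext) =
  for {
    user <- OptionT(repo.findUser(id)) // repo.findUser returns Future[Option[User]]
  } yield someBusinessLogic(user)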

At first sight, this code looks reasonable. It follows good functional style, using a monad transformer (OptionT) to deal with the nested Future[Option[User]] context and to retrieve the user for some business logic. However, if we take a step back and consider how this method will be used, tested and maintained, there are two main issues with the code.

Firstly, because the repository layer runs the DBIO itself and returns a Future, a transaction cannot span multiple repository calls. Once an action has been converted to a Future, it has already been submitted for execution, so there is no way to combine it with other actions in one transaction. Instead of combining DBIOs through simple combinators into a single transaction in our service layer, we push the complexity down into the repository layer. There we end up with complex queries with potentially many joins.

The second issue is that we have committed to a specific interpretation of side effects (i.e. Future) quite early. We do not need a Future to express this particular business logic; it is an implementation detail of the user repository which has leaked into the service layer. This matters because, if we moved to a data access technology that doesn’t work with Futures, such as synchronous JDBC, we would have to modify our service-level code. In short, we have tied our business logic to a specific database implementation. That goes against separation of concerns, one of the cornerstones of functional programming.

Tagless Final

As a solution to these problems, and as an architectural pattern for our back-end, we use tagless final, a technique that uses type classes to model effects.

A quick aside:

A type class is a tool in Scala for ad-hoc polymorphism: depending on the type, a different implementation of an operation is invoked. Method overloading is another, more familiar form of ad-hoc polymorphism. A type class C defines some behaviour in the form of operations that a type T must support to be a member of C. A developer declares that a type is a member of a type class by providing implementations of those operations, known as a type class instance. In short, type classes let you extend existing code with your own custom types, and give existing types new behaviour, without modifying either.
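As a minimal illustration, here is a hypothetical Show type class with instances for Int and a custom User type. Cats ships a similar type class, but this sketch defines its own. The describe function works for any type that has an instance in scope.

```scala
// The type class: "types that can be rendered as a String"
trait Show[A] {
  def show(a: A): String
}

// Instances: declaring that Int and a custom type are members of Show
implicit val intShow: Show[Int] = new Show[Int] {
  def show(a: Int): String = s"Int($a)"
}

final case class User(id: Long, name: String)

implicit val userShow: Show[User] = new Show[User] {
  def show(u: User): String = s"User #${u.id}: ${u.name}"
}

// Generic code: the compiler selects the instance from the argument's type
def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)

val shownInt  = describe(42)
val shownUser = describe(User(1L, "Ada"))
```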

An effect, or effect type, is whatever the monad handles: Option models the effect of optionality, while Future models latency as an effect. Put simply, an effectful function is a function that returns F[A] rather than A, and the effect type is whatever F is.
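For example, the two hypothetical functions below are effectful in this sense. Both return F[A] rather than a bare A, with Option as the effect in one case and Either with a String error in the other.

```scala
// Effect = optionality: the result may be absent
def safeDivide(a: Int, b: Int): Option[Int] =
  if (b == 0) None else Some(a / b)

// Effect = failure carrying an error message
def parsePort(s: String): Either[String, Int] =
  s.toIntOption.filter(p => p > 0 && p < 65536).toRight(s"invalid port: $s")
```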

In tagless final, we create a type class in the form of a domain-specific language (DSL) which describes some functionality and is parameterised over the effect type. For example, we create the following type class to describe some persistence capabilities over an effect DB.

trait UserRepository[DB[_]] {
  def findUser(id: Long): DB[Option[User]]
  def addUser(user: User): DB[User]
  def deleteUser(id: Long): DB[User]
}

DB[_] is a higher-kinded type parameter. DB is a type constructor (it takes a type such as Option[User] and produces a new type), and we abstract over it. We could have used F here, or any other letter(s). DB is helpful, though, because it makes clear that these operations describe some form of side effect (interacting with a database) rather than computing a pure value.

Because such programs are polymorphic in the effect type DB[_], we can instantiate these polymorphic programs to any concrete effect type. In functional programming, this implementation of the DSL is called an interpreter. For example, for our Slick interpretation we could have:

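import slick.dbio.DBIO

class SlickUserRepository extends UserRepository[DBIO] {
  // doFindUser builds (but does not run) a Slick query action
  override def findUser(id: Long): DBIO[Option[User]] =
    doFindUser(id)
  // addUser and deleteUser are implemented the same way (omitted for brevity)
}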

However, in our service layer we do not code to a specific implementation. Instead, we write programs which are polymorphic in the effect type, and we use tagless final to give capabilities to the effect type. For example, we want to use DB to sequence computations in for-comprehensions, which is what Monad gives us.

class Service[DB[_]: Monad](repo: UserRepository[DB])

This DB[_]: Monad is known as a context bound and is just syntax sugar for implicit parameter lists:

class Service[DB[_]](repo: UserRepository[DB])(implicit monad: Monad[DB])

Note we could have also passed our user repository DSL as an implicit parameter here but we prefer to pass our business logic dependencies explicitly.
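The two forms are interchangeable. This dependency-free sketch defines a minimal Monad trait as a stand-in for cats.Monad, so that it runs without cats. It then writes the same hypothetical method both ways and instantiates it with the identity effect Id.

```scala
// Minimal stand-in for cats.Monad (assumption: keeps the sketch dependency-free)
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
}

type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](fa: Id[A])(f: A => Id[B]): Id[B] = f(fa)
}

// Context-bound form: Monad[DB] is an anonymous implicit parameter
def pairWithBound[DB[_]: Monad](a: DB[Int], b: DB[Int]): DB[(Int, Int)] = {
  val m = implicitly[Monad[DB]]
  m.flatMap(a)(x => m.flatMap(b)(y => m.pure((x, y))))
}

// Desugared form: the same requirement as an explicit implicit parameter list
def pairDesugared[DB[_]](a: DB[Int], b: DB[Int])(implicit m: Monad[DB]): DB[(Int, Int)] =
  m.flatMap(a)(x => m.flatMap(b)(y => m.pure((x, y))))

val viaBound     = pairWithBound[Id](1, 2)
val viaDesugared = pairDesugared[Id](1, 2)
```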

The power here is that tagless final lets us separate our code from the decision of which effect type to use. Rather than picking one concrete effect type, we write effect-type-agnostic code, which can be instantiated to any concrete effect type such as Future, Cats IO or ZIO. We then declare what functionality the effect needs in a particular class or method via implicit parameters, as in the code above. This is the crux of tagless final: we use our own DSLs to describe our business logic while remaining completely agnostic to the interpreter that is provided and the chosen effect type!
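Here is a dependency-free sketch of a single piece of business logic instantiated to more than one effect type. Monad is a minimal stand-in for cats.Monad. The greeting method, the Id alias and both repositories are hypothetical. The same Service runs synchronously with Id and asynchronously with Future.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Minimal stand-in for cats.Monad (assumption: no cats dependency)
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap(fa)(a => pure(f(a)))
}

type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](fa: Id[A])(f: A => Id[B]): Id[B] = f(fa)
}

implicit val ec: ExecutionContext = ExecutionContext.global
implicit val futureMonad: Monad[Future] = new Monad[Future] {
  def pure[A](a: A): Future[A] = Future.successful(a)
  def flatMap[A, B](fa: Future[A])(f: A => Future[B]): Future[B] = fa.flatMap(f)
}

final case class User(id: Long, name: String)

trait UserRepository[DB[_]] {
  def findUser(id: Long): DB[Option[User]]
}

// Business logic written once, for any DB that has a Monad instance
class Service[DB[_]: Monad](repo: UserRepository[DB]) {
  def greeting(id: Long): DB[String] =
    implicitly[Monad[DB]].map(repo.findUser(id)) {
      case Some(u) => s"Hello, ${u.name}"
      case None    => "Unknown user"
    }
}

// Two hypothetical interpreters of the same DSL
val users = Map(1L -> User(1L, "Ada"))
object IdRepo extends UserRepository[Id] {
  def findUser(id: Long): Id[Option[User]] = users.get(id)
}
object FutureRepo extends UserRepository[Future] {
  def findUser(id: Long): Future[Option[User]] = Future(users.get(id))
}

val syncGreeting  = new Service[Id](IdRepo).greeting(1L)
val asyncGreeting = Await.result(new Service[Future](FutureRepo).greeting(2L), 5.seconds)
```

Only the interpreter and the chosen type argument differ between the two calls; Service itself is untouched.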

Also notice that in the repository implementation code we have not run the DBIO, we are just describing the database interaction that will take place, to be run at a later time. This execution deferral means that we can combine a number of related repository calls into a single transaction before finally running the action to return a Future.

Database Manager

In order to manage how we compose DBIOs together and finally run the composed action in a transaction, we introduce another type class which we call DatabaseManager.

trait DatabaseManager[F[_], DB[_]] {
  def execute[A](action: DB[A]): F[A]
  def executeTransactionally[A](action: DB[A]): F[A]
  def sequence[A](actions: Seq[DB[A]]): DB[Seq[A]]
}

As the type signatures show, the main responsibility of this type class is to convert between effect types. One represents the description of our interaction with the database (DB), and the other represents the result of running that interaction (F). The sequence helper stays within DB, turning a collection of descriptions into a single description.

Here is our Slick type class instance, where the conversion is from a DBIO (DB) to a Future (F):

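import scala.concurrent.Future
import slick.jdbc.PostgresProfile.api._ // assumption: use your own profile; provides Database, DBIO, .transactionally

class SlickDatabaseManager(db: Database) extends DatabaseManager[Future, DBIO] {
  override def execute[A](action: DBIO[A]): Future[A] =
    db.run(action)
  override def executeTransactionally[A](action: DBIO[A]): Future[A] =
    db.run(action.transactionally) // commits or rolls back as a whole
  override def sequence[A](actions: Seq[DBIO[A]]): DBIO[Seq[A]] =
    DBIO.sequence(actions)
}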
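A non-Slick interpreter makes the split between F and DB easier to see. In this sketch, Query (a function from a connection to a result), FakeConnection and ToyDatabaseManager are hypothetical stand-ins for DBIO, a JDBC connection and SlickDatabaseManager. BEGIN, COMMIT and ROLLBACK are simply logged. Two "repository calls" are composed with sequence and then run once, inside a single transaction.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical connection that only records the statements it receives
final class FakeConnection {
  val log: ListBuffer[String] = ListBuffer.empty[String]
  def exec(sql: String): Unit = log += sql
}

// Toy description of database work (stand-in for DBIO): runs only when given a connection
final case class Query[A](run: FakeConnection => A)

trait DatabaseManager[F[_], DB[_]] {
  def execute[A](action: DB[A]): F[A]
  def executeTransactionally[A](action: DB[A]): F[A]
  def sequence[A](actions: Seq[DB[A]]): DB[Seq[A]]
}

// Interpreter: turns a Query description (DB) into a Future result (F)
final class ToyDatabaseManager(conn: FakeConnection)(implicit ec: ExecutionContext)
    extends DatabaseManager[Future, Query] {
  def execute[A](action: Query[A]): Future[A] = Future(action.run(conn))

  def executeTransactionally[A](action: Query[A]): Future[A] = Future {
    conn.exec("BEGIN")
    try {
      val result = action.run(conn)
      conn.exec("COMMIT")
      result
    } catch {
      case NonFatal(e) =>
        conn.exec("ROLLBACK")
        throw e
    }
  }

  // Stays within DB: many descriptions become one description
  def sequence[A](actions: Seq[Query[A]]): Query[Seq[A]] =
    Query(c => actions.map(_.run(c)))
}

implicit val ec: ExecutionContext = ExecutionContext.global
val conn = new FakeConnection
val dbManager = new ToyDatabaseManager(conn)

// Hypothetical "repository call": describes an insert, runs nothing yet
def insert(name: String): Query[Int] = Query { c => c.exec(s"INSERT $name"); 1 }

val composed = dbManager.sequence(Seq(insert("a"), insert("b")))
val logBeforeRun = conn.log.size
val inserted = Await.result(dbManager.executeTransactionally(composed), 5.seconds)
```

Nothing executes until executeTransactionally is called, so both inserts share one BEGIN/COMMIT pair. This is exactly what returning a Future from each repository call would have prevented.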

We do this to make the separation of concerns as explicit as possible. If we were to change to a different effect type such as Cats IO, we would only need to add a new DatabaseManager interpreter (along with the new repository implementations), and our service code could continue to be used as is. With tagless final, we have separated the description of our program (DBIO) from the result of running it (Future), and the database manager handles the relationship between the two.

One additional benefit we haven’t discussed yet is improved testability: we can easily provide a separate DSL interpreter for testing, for example one that uses a local data store instead of accessing an external system. This approach is comparable to implementing mocks, with the difference that our interpreter is a full-featured implementation of the DSL.
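Here is a minimal sketch of such a test interpreter. It assumes the synchronous Id effect (type Id[A] = A) and a mutable Map as the local store; InMemoryUserRepository is a hypothetical name.

```scala
import scala.collection.mutable

final case class User(id: Long, name: String)

trait UserRepository[DB[_]] {
  def findUser(id: Long): DB[Option[User]]
  def addUser(user: User): DB[User]
  def deleteUser(id: Long): DB[User]
}

type Id[A] = A

// Hypothetical test interpreter: a full implementation of the DSL backed by
// a mutable Map, so tests need neither a database nor a mocking library
final class InMemoryUserRepository extends UserRepository[Id] {
  private val store = mutable.Map.empty[Long, User]

  def findUser(id: Long): Id[Option[User]] = store.get(id)
  def addUser(user: User): Id[User] = { store.update(user.id, user); user }
  def deleteUser(id: Long): Id[User] =
    store.remove(id).getOrElse(throw new NoSuchElementException(s"no user $id"))
}

val repo = new InMemoryUserRepository
repo.addUser(User(1L, "Ada"))
val found       = repo.findUser(1L)
val deleted     = repo.deleteUser(1L)
val afterDelete = repo.findUser(1L)
```

Because Id runs each call immediately, this interpreter has no transaction semantics. It suits testing business logic rather than transactional behaviour.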

Example & Architecture Summary

I will now give a simplified mock example of a service layer implementation to demonstrate what we have just learned.

import cats.Monad
import cats.syntax.all._ // map/flatMap syntax for DB in the for-comprehension

class ServiceImpl[F[_]: Monad, DB[_]: Monad](
    userRepository: UserRepository[DB],
    addressRepository: AddressRepository[DB],
    dbManager: DatabaseManager[F, DB])
  extends Service[F] {

  // One transaction spans both repository calls
  def updateUserAddress(userId: Long, adr: Address): F[Address] =
    dbManager.executeTransactionally(findAndUpdate(userId, adr))

  // validateUser (not shown) turns an Option[User] into a DB[User]
  private def findAndUpdate(id: Long, adr: Address): DB[Address] =
    for {
      user      <- userRepository.findUser(id)
      validUser <- validateUser(user)
      address   <- addressRepository.updateAddress(validUser, adr)
    } yield address
}
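The example above relies on Slick and cats, so here is a self-contained, runnable variant built on stated assumptions. MonadError is a minimal stand-in for cats.MonadError[F, Throwable], needed so that validateUser can fail the action. DB is a toy Query description over an in-memory Store, and F is scala.util.Try. The repositories, validateUser and the snapshot-based rollback are all hypothetical.

```scala
import scala.collection.mutable
import scala.util.{Success, Try}

// Minimal stand-in for cats.MonadError[F, Throwable] (assumption: no cats dependency)
trait MonadError[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def raiseError[A](e: Throwable): F[A]
}
// for-comprehension syntax for any F with a MonadError instance
implicit class MonadSyntax[F[_], A](fa: F[A])(implicit M: MonadError[F]) {
  def flatMap[B](f: A => F[B]): F[B] = M.flatMap(fa)(f)
  def map[B](f: A => B): F[B] = M.flatMap(fa)(a => M.pure(f(a)))
}

final case class User(id: Long, addressId: Long)
final case class Address(id: Long, street: String)

trait UserRepository[DB[_]] { def findUser(id: Long): DB[Option[User]] }
trait AddressRepository[DB[_]] { def updateAddress(user: User, adr: Address): DB[Address] }
trait DatabaseManager[F[_], DB[_]] { def executeTransactionally[A](action: DB[A]): F[A] }

// The service: written once, agnostic to both DB and F
class ServiceImpl[F[_], DB[_]: MonadError](
    userRepository: UserRepository[DB],
    addressRepository: AddressRepository[DB],
    dbManager: DatabaseManager[F, DB]) {

  def updateUserAddress(userId: Long, adr: Address): F[Address] =
    dbManager.executeTransactionally(findAndUpdate(userId, adr))

  // Hypothetical validation: fail the whole action if the user does not exist
  private def validateUser(user: Option[User]): DB[User] = user match {
    case Some(u) => implicitly[MonadError[DB]].pure(u)
    case None    => implicitly[MonadError[DB]].raiseError(new NoSuchElementException("user not found"))
  }

  private def findAndUpdate(id: Long, adr: Address): DB[Address] =
    for {
      user      <- userRepository.findUser(id)
      validUser <- validateUser(user)
      address   <- addressRepository.updateAddress(validUser, adr)
    } yield address
}

// Toy interpreters: DB = Query (a description), F = Try (the result)
final class Store {
  val users     = mutable.Map(1L -> User(1L, addressId = 10L))
  val addresses = mutable.Map(10L -> Address(10L, "Old Street"))
}
final case class Query[A](run: Store => A)

implicit val queryMonadError: MonadError[Query] = new MonadError[Query] {
  def pure[A](a: A): Query[A] = Query(_ => a)
  def flatMap[A, B](fa: Query[A])(f: A => Query[B]): Query[B] = Query(s => f(fa.run(s)).run(s))
  def raiseError[A](e: Throwable): Query[A] = Query(_ => throw e)
}

object ToyUserRepository extends UserRepository[Query] {
  def findUser(id: Long): Query[Option[User]] = Query(_.users.get(id))
}
object ToyAddressRepository extends AddressRepository[Query] {
  def updateAddress(user: User, adr: Address): Query[Address] = Query { s =>
    val updated = adr.copy(id = user.addressId)
    s.addresses.update(updated.id, updated)
    updated
  }
}

// Runs the whole description at once; on failure restores a snapshot (a crude rollback)
final class ToyDatabaseManager(store: Store) extends DatabaseManager[Try, Query] {
  def executeTransactionally[A](action: Query[A]): Try[A] = {
    val snapshot = store.addresses.clone()
    val result = Try(action.run(store))
    if (result.isFailure) { store.addresses.clear(); store.addresses ++= snapshot }
    result
  }
}

val store   = new Store
val service = new ServiceImpl[Try, Query](ToyUserRepository, ToyAddressRepository, new ToyDatabaseManager(store))
val updatedAddress = service.updateUserAddress(1L, Address(0L, "New Street"))
val missing        = service.updateUserAddress(99L, Address(0L, "Nowhere"))
```

ServiceImpl never mentions Query or Try. Swapping in Slick's DBIO and Future would only change the interpreters and the two type arguments at the wiring site.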

What’s great about this code is that it is completely declarative: it describes the database access logic while the implementation details are hidden away. What’s more, if we were to change database access technology and use a different effect type, this code would not have to change at all! It is a pure representation of our business logic, no more, no less.

To keep our architecture as simple as possible, we have used a single repository class for each schema class which models one table in our underlying database.

1 table = 1 schema = 1 repository class.

As a result, the repository layer is kept as thin as possible. Our service layer takes on the role of combining multiple DBIOs, performing validation and creating a single DTO (Data Transfer Object) to be passed to our API layer.

We use the tagless final architectural pattern via generic DSLs to decouple implementation details from our application code. This allows generic programs to be reused with different interpreters for different purposes. Capabilities such as Monad are passed implicitly, while business dependencies such as repositories are passed explicitly. The instantiation of a polymorphic tagless-final value to a concrete effect type is deferred as long as possible, preferably to the entry points of the application or test suite.

The end result is application code which uses Scala’s strong type system to its maximum whilst being more flexible, testable and maintainable.

Thanks for reading!
I hope that you learned something. Please let me know about your experience building Scala backends, either in the comments below or by finding us on Twitter.