
Pure functional HTTP APIs in Scala

Discover the pure functional side of HTTP API programming in Scala.

by Jens Grassel (author)
122 pages



Foreword

Thank you for your interest in this book. I hope you’ll have a good time reading it and learn something from it.

About this book

This book is intended for the intermediate Scala programmer who is interested in functional programming and works mainly on the web service backend side. Ideally she has experience with libraries like Akka HTTP and Slick, which are in heavy use in that area.

However, maybe you have wondered if we can't do better, even though the aforementioned projects are battle-tested and proven.

The answer to this can be found in this book, which is intended to be read from cover to cover in the given order. Within the book the following libraries will be used: Cats[1], Cats Effect[2], http4s[3], Doobie[4], Refined[5], fs2[6], tapir[7], Monocle[8] and probably others. ;-)

This edition includes a chapter about migrating the project to Scala 3, which covers all the nasty issues that we tend to run into when we touch code after a longer time.

Code and book source can be found in the following repository: https://github.com/jan0sch/pfhais

Copyleft Notice

This book uses the Creative Commons Attribution ShareAlike 4.0 International (CC BY-SA 4.0) license[9]. The code snippets in this book are licensed under CC0[10], which means you can use them without restriction. Excerpts from libraries maintain their license.

Imprint / Impressum

Bear with me, this is obligatory under German law.

Author: Jens Grassel, Wegtam GmbH, c/o J. Grassel, Neuer Markt 17, 18055 Rostock
Cover design, illustration: Jens Grassel
ISBN: 978-3-7521-4129-0

  1. https://typelevel.org/cats/
  2. https://typelevel.org/cats-effect/
  3. https://http4s.org/
  4. https://tpolecat.github.io/doobie/
  5. https://github.com/fthomas/refined
  6. https://fs2.io/
  7. https://github.com/softwaremill/tapir
  8. https://github.com/julien-truffaut/Monocle
  9. https://creativecommons.org/licenses/by-sa/4.0/legalcode
  10. https://wiki.creativecommons.org/wiki/CC0

Thanks

I would like to thank my beloved wife and family who bear with me and make all of this possible. Also I send a big thank you to all the nice people from the Scala community whom I've had the pleasure to meet. Special thanks go to Adam Warski (tapir), Frank S. Thomas (refined), Julien Truffaut (Monocle) and Ross A. Baker (http4s) for their help, advice and patience.

Our use case

For a better understanding we will implement a small use case in both an impure and a pure way. The following section will outline the specification.

Service specification

First we need to specify the exact scope and API of our service. We’ll design a service with a minimal API to keep things simple. It shall fulfil the following requirements.

The service shall provide HTTP API endpoints for:

  1. the creation of a product data type identified by a unique id
  2. adding translations for a product name by language code and unique id
  3. returning the existing translations for a product
  4. returning a list of all existing products with their translations

Data model

We will keep the model very simple to avoid going overboard with the implementation.

  1. A language code shall be defined by ISO 639-1 (i.e. a two-letter code).
  2. A translation shall contain a language code and a product name (non-empty string).
  3. A product shall contain a unique id (UUID version 4) and a list of translations.

Database

The data will be stored in a relational database (RDBMS). Therefore we need to define the tables and relations within the database.

The products table

The table products must contain only the unique id which is also the primary key.

The names table

The table names must contain a column for the product id, one for the language code and one for the name. Its primary key is the combination of the product id and the language code. All columns must not be null. The relation to the products is realised by a foreign key constraint to the products table via the product id.

HTTP API

The HTTP API shall provide the following endpoints on the given paths:

Path             HTTP method   Function
/products        POST          Create a product.
/products        GET           Get all products and translations.
/product/{UUID}  PUT           Add translations.
/product/{UUID}  GET           Get all translations for the product.

The data shall be encoded in JSON using the following specification:

JSON for a translation
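The actual snippet is not included in this excerpt; a plausible shape, assuming the field names lang and name, could be:

{
  "lang": "de",
  "name": "Kartoffel"
}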

JSON for a product
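Also omitted from the excerpt; a sketch matching the data model above (field names assumed):

{
  "id": "9d90090d-8b21-4d6b-a4cb-4b1cbb6fa935",
  "names": [
    { "lang": "de", "name": "Kartoffel" },
    { "lang": "en", "name": "Potato" }
  ]
}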

This should be enough to get us started.

Maybe there is another way

If we want referential transparency, we must push the side effects to the boundaries of our system (program) which can be done by using lazy evaluation. Let’s repeat the previous example in a different way.

IO example 1
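The listing itself is missing from this excerpt; a minimal sketch of the idea with Cats Effect could look like this:

import cats.effect.IO

// Nothing is printed here: the side effect is only described, not executed.
val effect: IO[Unit] = IO(println("This is a side effect!"))

// Only running the IO produces the output, e.g. in the REPL:
// effect.unsafeRunSync()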

The above code will produce no output. Only if we evaluate the variable effect, which is of type IO[Unit], will the output be generated (try effect.unsafeRunSync in the REPL). Also, the second approach works as expected.

IO example 2
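Also omitted here; assuming the "second approach" of the earlier impure example evaluated the expression twice, a sketch could be:

import cats.effect.IO

val effect: IO[Unit] = IO(println("This is a side effect!"))

// Composing the description twice really executes the effect twice,
// because every run re-executes the deferred code.
val twice: IO[Unit] = effect.flatMap(_ => effect)
// twice.unsafeRunSync() // prints the line two times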

What have we gained?

Suddenly we can reason about our code much more easily! And why is that? Well, we don't have unexpected side effects caused by code running even when it doesn't need to. This is a sneak peek at what pure code looks like. Now we only need to implement pure libraries for our use, or do we?

Luckily for us, there are meanwhile several pure options available in the Scala ecosystem. We will stick to the Cats family of libraries, namely http4s and Doobie, as replacements for Akka-HTTP and Slick. They build upon the Cats Effect library, which is an implementation of an IO monad for Scala. Some other options exist, but we'll stick to the one from Cats.

To be able to contrast both ways of implementing a service we will first implement it using Akka-HTTP and Slick and will then migrate to http4s and Doobie.

Impure implementation

We’ll be using the following libraries for the impure version of the service:

  1. Akka (including Akka-HTTP and Akka-Streams)
  2. Slick (as database layer)
  3. Flyway for database migrations (or evolutions)
  4. Circe for JSON codecs and akka-http-json as wrapper
  5. Refined for using refined types
  6. the PostgreSQL JDBC driver

I’ll spare you the sbt setup as you can look that up in the code repository (i.e. the impure folder in the book repo).

Models

First we'll implement our models, which are simple and straightforward. At first we need a class to store our translations, or better, a single translation.
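The snippet is not part of this excerpt; the naive version criticised below presumably looked roughly like this (field names assumed):

final case class Translation(lang: String, name: String)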

So what is wrong with that approach?

Technically it is okay, but we have a bad feeling about it. Using Option[String] is of no use because both fields have to be set. But a String can always be null and contain a lot of unexpected stuff (literally anything).

This is the moment when refined types come to your rescue!

So let us define some refined types which we can use later on. At first we need a language code which obeys the restrictions of ISO 639-1, and we need a stronger definition for a product name. For the former we use a regular expression and for the latter we simply expect a string which is not empty.

Refined types for models
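A sketch of such definitions, assuming the Scala 2 syntax of the refined library:

import eu.timepit.refined.W
import eu.timepit.refined.api.Refined
import eu.timepit.refined.collection.NonEmpty
import eu.timepit.refined.string.MatchesRegex

// An ISO 639-1 language code: exactly two lower case letters.
type LanguageCode = String Refined MatchesRegex[W.`"^[a-z]{2}$"`.T]

// A product name: any non-empty string.
type ProductName = String Refined NonEmpty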

Now we can give our translation model another try.

Translation model using refined types
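A sketch of the model, including the Circe codecs mentioned below (the semi-automatic derivation style is an assumption):

import io.circe.{ Decoder, Encoder }
import io.circe.generic.semiauto._
import io.circe.refined._ // codecs for refined types

final case class Translation(lang: LanguageCode, name: ProductName)

object Translation {
  implicit val decode: Decoder[Translation] = deriveDecoder[Translation]
  implicit val encode: Encoder[Translation] = deriveEncoder[Translation]
}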

Much better. And while we're at it, we can also write the JSON codecs using the refined module of the Circe library. We put them into the companion object of the model.

Now onwards to the product model. Because we already know about refined types, we can use them from the start here.

Now what is wrong about this?

If we look closely we realise that a List may be empty. This is valid for the list but not for our product, because we need at least one entry. Luckily for us the Cats library has us covered with the NonEmptyList data type. Including the JSON codecs this leads us to our final implementation. Last but not least, we really should be using the existing UUID data type instead of rolling our own refined string version, even when it is cool. ;-)

Product model using UUID type and NeL
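A sketch, assuming ProductId is a plain type alias for UUID:

import cats.data.NonEmptyList
import java.util.UUID

type ProductId = UUID

final case class Product(id: ProductId, names: NonEmptyList[Translation])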

We kept the type name ProductId by using a type alias. This is convenient, but remember that a type alias does not add extra type safety (e.g. type Foo = String will still be a String).

Why do I have a bad feeling about this?

Well, maybe because a list may contain duplicate entries, but the database surely will not, because of its unique constraints! So let's switch to a NonEmptySet, which is also provided by Cats.

Product model using UUID and NeS
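Only the collection type changes; a sketch:

import cats.data.NonEmptySet

final case class Product(id: ProductId, names: NonEmptySet[Translation])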

Now we have the models covered and can move on to the database layer.

Database layer

The database layer should provide programmatic access to the database, but it should also manage changes to the database schema. The latter is usually called migrations or evolutions. From the available options we chose Flyway as the tool to manage our database schema.

Migrations

Flyway uses raw SQL scripts which have to be put into a certain location, /db/migration (under the resources folder) in our case. Also the files have to be named like VXX__some_name.sql (XX being a number), starting with V1. Please note that there are two underscores between the version prefix and the rest of the name! Because our database schema is very simple, we're done quickly:

Flyway migration for creating the database
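A sketch of what a V1__create_tables.sql might contain, derived from the data model above (table and column names are assumptions):

CREATE TABLE "products" (
  "id" UUID NOT NULL,
  CONSTRAINT "products_pk" PRIMARY KEY ("id")
);

CREATE TABLE "names" (
  "product_id" UUID       NOT NULL,
  "lang_code"  VARCHAR(2) NOT NULL,
  "name"       TEXT       NOT NULL,
  CONSTRAINT "names_pk" PRIMARY KEY ("product_id", "lang_code"),
  CONSTRAINT "names_product_id_fk" FOREIGN KEY ("product_id")
    REFERENCES "products" ("id") ON DELETE CASCADE
);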

In the code you'll see that we additionally set comments, which are omitted from the code snippet above. This might be overkill here, but it is a very handy feature to have and I advise you to use it for more complicated database schemas. The right comment (read: information) in the right place might save a lot of time when trying to understand things.

Next we move on to the programmatic part, which at first needs a configuration of our database connection. With Slick you have a multitude of options, but we'll use the "Typesafe Config"[1] approach.

Database configuration in application.conf

database {
  profile = "slick.jdbc.PostgresProfile$"
  db {
    connectionPool = "HikariCP"
    dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
    properties {
      serverName = "localhost"
      portNumber = "5432"
      databaseName = "impure"
      user = "impure"
      password = "secret"
    }
    numThreads = 10
  }
}

After we have this in place we can run the migrations via the API of Flyway. For this we have to load the configuration (we do it by creating an actor system), extract the needed information, create a JDBC URL and use that together with username and password to obtain a Flyway instance. On that instance we simply call the method migrate() which will do the right thing: it checks if the schema exists and decides to either create it, apply pending migrations or simply do nothing. The method returns the number of applied migrations.

Apply database migrations via Flyway
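A sketch of the idea, assuming a Flyway version whose migrate() still returns an Int, with the connection values hard-coded here instead of being extracted from the configuration:

import org.flywaydb.core.Flyway

val url      = "jdbc:postgresql://localhost:5432/impure" // from the configuration
val user     = "impure"                                  // from the configuration
val password = "secret"                                  // from the configuration

val flyway: Flyway = Flyway.configure().dataSource(url, user, password).load()
// Creates the schema, applies pending migrations or does nothing.
val appliedMigrations: Int = flyway.migrate()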

Let us continue to dive into the Slick table definitions.

Slick tables

Slick offers several options for approaching the database. For our example we will be using the lifted embedding, but if needed Slick also provides the ability to perform plain SQL queries. For the lifted embedding we have to define our tables in a way Slick can understand. While this can be tricky under certain circumstances, our simple model is straightforward to implement.

Slick product table definition
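A sketch, assuming the table and column names from the migration:

import java.util.UUID
import slick.jdbc.PostgresProfile.api._

final class Products(tag: Tag) extends Table[UUID](tag, "products") {
  def id = column[UUID]("id", O.PrimaryKey)

  def * = id
}

val productsTable = TableQuery[Products]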

As you can see above, we're using simple data types (not the refined ones) to keep the Slick implementation easier. However, we could also use refined types at the price of either using the slick-refined library or writing custom column mappers. Next we'll implement the table for the translations, which will also need some constraints.

Slick translations table definition
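A sketch, reusing the productsTable from the previous listing for the foreign key:

import java.util.UUID
import slick.jdbc.PostgresProfile.api._

final class Names(tag: Tag)
    extends Table[(UUID, String, String)](tag, "names") {
  def productId = column[UUID]("product_id")
  def langCode  = column[String]("lang_code")
  def name      = column[String]("name")

  def pk = primaryKey("names_pk", (productId, langCode))
  def product = foreignKey("names_product_id_fk", productId, productsTable)(
    _.id,
    onDelete = ForeignKeyAction.Cascade
  )

  def * = (productId, langCode, name)
}

val namesTable = TableQuery[Names]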

As you can see the definition of constraints is also pretty simple. Now our repository needs some functions for a more convenient access to the data.

Slick repository functions
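A sketch of two of the functions described below, assuming a db: Database handle and the tables defined above:

import scala.concurrent.{ ExecutionContext, Future }
import slick.jdbc.PostgresProfile.api._

def loadProduct(id: java.util.UUID): Future[Seq[(java.util.UUID, String, String)]] = {
  val program = productsTable
    .join(namesTable).on(_.id === _.productId)
    .filter(_._1.id === id)
    .map(_._2)
  db.run(program.result)
}

def updateProduct(p: Product)(implicit ec: ExecutionContext): Future[Unit] = {
  val deleteOld = namesTable.filter(_.productId === p.id).delete
  val insertNew = namesTable ++= p.names.toNonEmptyList.toList.map(t =>
    (p.id, t.lang.value, t.name.value)
  )
  // Delete and insert are composed via andThen and run in one transaction.
  db.run(deleteOld.andThen(insertNew).transactionally).map(_ => ())
}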

The last two functions are helpers that enable us to create load queries which we can compose. They are used in the saveProduct and updateProduct functions to create a list of queries that are executed in bulk, while the call to transactionally ensures that they will run within a transaction. When updating a product we first delete all existing translations to allow the removal of existing translations via an update. To be able to do so we use the andThen helper from Slick. The loadProduct function simply returns a list of database rows from the needed join. Therefore we need a function which builds a Product type out of that.

Helper function to create a Product
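A sketch of such a helper. Note the NonEmptySet.one call: constructing the set requires an implicit cats.Order instance, which is exactly what triggers the compiler error shown next:

import cats.data.NonEmptySet
import cats.syntax.option._
import eu.timepit.refined.api.RefType
import java.util.UUID

def fromDatabase(rows: Seq[(UUID, String, String)]): Option[Product] =
  for {
    (id, c, n) <- rows.headOption
    t <- (for {
      lang <- RefType.applyRef[LanguageCode](c).toOption
      name <- RefType.applyRef[ProductName](n).toOption
    } yield Translation(lang = lang, name = name))
    p <- Product(id = id, names = NonEmptySet.one[Translation](t)).some
    // Merging the remaining rows is omitted in this sketch.
  } yield p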

But oh no! The compiler refuses to build it:

Missing cats.Order

[error] .../impure/models/Product.scala:45:74:
  could not find implicit value for parameter A: 
    cats.kernel.Order[com.wegtam.books.pfhais.impure.models.Translation]
[error] p <- Product(id = id, names = NonEmptySet.one[Translation](t)).some
[error]                                                            ^

It seems we have to provide an instance of Order for our Translation model to make Cats happy. So we have to think of an ordering for our model. A simple approach would be to simply order by the language code. Let's try this:

Providing Order for LanguageCode
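A sketch, using .value as discussed below:

import cats.Order

implicit val orderLanguageCode: Order[LanguageCode] = new Order[LanguageCode] {
  // Compare the underlying String values of the refined type.
  def compare(x: LanguageCode, y: LanguageCode): Int = x.value.compare(y.value)
}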

Please note that you will need to import either the syntax.order._ package or the implicits package from Cats.

You might have noticed the explicit call to .value to get the underlying string instance of our refined type. This is needed because the other option (using x.compare(y)) will compile but bless you with stack overflow errors. The reason is probably that the latter is compiled into code calling OrderOps#compare, which is recursive.

Providing Order for Translation
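A sketch which simply delegates to the ordering of the language code (it requires the Order[LanguageCode] instance from above plus the Cats order syntax):

import cats.Order
import cats.syntax.order._

implicit val orderTranslation: Order[Translation] = new Order[Translation] {
  def compare(x: Translation, y: Translation): Int = x.lang.compare(y.lang)
}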

So far we should have everything in place to make use of our database. Now we need to wire it all together.

Akka-HTTP routes

Defining the routes is pretty simple if you’re used to the Akka-HTTP routing DSL syntax.

Basic routes with Akka-HTTP
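A sketch of the route structure matching the API table from the specification (the handler bodies are filled in later):

import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

val routes: Route =
  path("product" / JavaUUID) { id =>
    get {
      ??? // return all translations for the product
    } ~ put {
      ??? // add translations for the product
    }
  } ~ path("products") {
    get {
      ??? // return all products with translations
    } ~ post {
      ??? // create a product
    }
  }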

We will fill in the details later on. But now for starting the actual server to make use of our routes.

Starting an Akka-HTTP server
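A sketch, assuming the classic bindAndHandle API of Akka-HTTP 10.1 and hard-coded values where the book reads the configuration:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.stream.ActorMaterializer
import scala.io.StdIn

implicit val system: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
import system.dispatcher

val host = "localhost" // from the configuration
val port = 8080        // from the configuration

val binding = Http().bindAndHandle(routes, host, port)
// Run until the user presses enter, then shut everything down.
StdIn.readLine()
binding.flatMap(_.unbind()).onComplete(_ => system.terminate())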

The code will fire up a server using the defined routes and the hostname and port from the configuration. It will run until you press enter and then terminate. Let us now visit the code for each routing endpoint. We will start with the one for returning a single product.

Returning a single product
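A sketch of the endpoint, assuming a repo handle and the fromDatabase helper from before; the JSON marshalling is provided by the akka-http-json import mentioned below:

import scala.concurrent.Future

path("product" / JavaUUID) { id =>
  get {
    complete {
      for {
        rows    <- repo.loadProduct(id)
        product <- Future { fromDatabase(rows) } // wrapped to make the types align
      } yield product
    }
  }
}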

We load the raw product data from the repository and convert it into a proper product model. But to make the types align we have to wrap the second call in a Future, otherwise we would get a compiler error. We don't need to marshal the response because we are using the akka-http-json library, which provides for example an ErrorAccumulatingCirceSupport import that handles this. Unless, of course, you do not have Circe codecs defined for your types.

Updating a single product
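A sketch; entity(as[Product]) relies on a Circe decoder for Product being in scope, and the response handling is an assumption:

import akka.http.scaladsl.model.StatusCodes

path("product" / JavaUUID) { _ =>
  put {
    entity(as[Product]) { p =>
      complete {
        repo.updateProduct(p).map(_ => StatusCodes.NoContent)
      }
    }
  }
}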

The route for updating a product is also very simple. We’re extracting the product entity via the entity(as[T]) directive from the request body and simply give it to the appropriate repository function. Now onwards to creating a new product.

Creating a product

As you can see the function is basically the same except that we’re calling a different function from the repository. Last but not least let us take a look at the return all products endpoint.

Return all products
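A sketch of the in-memory variant criticised below, assuming loadProducts still returns all rows in a single Future:

path("products") {
  get {
    complete {
      repo.loadProducts().map { rows =>
        rows.toList
          .groupBy(_._1)                // group the rows by product id
          .map(t => fromDatabase(t._2)) // build an Option[Product] per group
          .toList
          .flatten                      // drop the Nones at the end
      }
    }
  }
}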

This looks more complicated than the other endpoints. So what exactly are we doing here? Well, first we load the raw product data from the repository. Afterwards we convert it into the proper data model, or to be more exact, into a list of product entities.

What is wrong with that approach?

The first thing that comes to mind is that we're performing operations in memory. This is no different from last time, when we converted the data for a single product. Now, however, we're talking about all products, which may be a lot of data. Another obvious point is that we get a list of Option[Product] which we explicitly flatten at the end.

So, how can we do better?

Maybe we should consider streaming the results. But we still have to group and combine the rows which belong to a single product into a product entity. Can we achieve that with streaming? Well, let's look at our data flow. We receive a list of rows with 3 columns from the database in the following format: product id, language code, name. The tricky part is that multiple rows (list entries) can belong to the same product, recognisable by the same value in the first column (product id). First we should simplify our problem by ensuring that the list is sorted by the product id. This is done by adjusting the function loadProducts in the repository.

Sort the returned list of entries.
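A sketch of the adjusted loadProducts, now sorting by the product id and exposing a reactive-streams publisher for streaming:

import slick.basic.DatabasePublisher
import slick.jdbc.PostgresProfile.api._

def loadProducts(): DatabasePublisher[(java.util.UUID, String, String)] = {
  val program = productsTable
    .join(namesTable).on(_.id === _.productId)
    .sortBy(_._1.id)
    .map(_._2)
  db.stream(program.result)
}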

Now we can rely on the fact that we have seen all entries for one product if the product id in our list changes. Let’s adjust our code in the endpoint to make use of streaming now. Because Akka-HTTP is based on Akka-Streams we can simply use that.

Return all products as stream
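A sketch following the description below (a JSON marshaller for Product is assumed to be in scope):

import akka.http.scaladsl.common.EntityStreamingSupport
import akka.stream.scaladsl.Source
import cats.syntax.option._

implicit val jsonStreaming = EntityStreamingSupport.json()

path("products") {
  get {
    val src = Source
      .fromPublisher(repo.loadProducts())
      .map(row => fromDatabase(Seq(row)))   // one Product per row...
      .collect { case Some(p) => p }        // ...dropping invalid rows
      .groupBy(Int.MaxValue, _.id)          // one substream per product id
      .fold(none[Product]) { (acc, p) =>
        // Merge the translations of all entities of one product.
        acc.fold(p.some)(a => a.copy(names = a.names.union(p.names)).some)
      }
      .mergeSubstreams
      .collect { case Some(p) => p }
    complete(src)
  }
}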

Wow, this may look scary, but let's break it apart piece by piece. First we need an implicit value which provides streaming support for JSON. Next we create a Source from the database stream. Then we implement the processing logic via the high-level streams API. We collect every defined output of our helper function fromDatabase, which leads to a stream of Product entities. But we have created way too many (each product is created as often as it has translations). So we group our stream by the product id, which creates a new stream per product id holding only the entities for that specific product. We fold over each of these streams, merging the lists of translations (names) together. Afterwards we merge the streams back together and run another collect to get a result stream of Product instead of Option[Product]. Last but not least, the stream is passed to the complete function, which will do the right thing.

Problems with the solution

The solution has two problems:

  1. The number of individual streams (and thus products) is limited to Int.MaxValue.
  2. The groupBy operator holds references to these streams in memory, opening up a possible out-of-memory issue.

As the first problem is simply related to the usage of groupBy, we may say that we only have one problem: the usage of groupBy. ;-) For a limited amount of data the proposed solution is perfectly fine, so we will leave it as is for now.

Regarding the state of our service: we have a working solution. So congratulations, and let's move on to the pure implementation.

Pure implementation

Like in the previous section I will spare you the details of the sbt setup. We will be using the following set of libraries:

  1. http4s
  2. Doobie (as database layer)
  3. Flyway for database migrations (or evolutions)
  4. Circe for JSON codecs
  5. Refined for using refined types
  6. the PostgreSQL JDBC driver
  7. pureconfig (for proper configuration loading)

Pure configuration handling

Last time we simply loaded our configuration via the Typesafe Config library, but can't we do a bit better here? The answer is yes: by using the pureconfig[1] library. First we start by implementing the necessary parts of our configuration as data types.

Configuration data types
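A sketch with plain field types (the book also uses refined types here), showing the deriveReader usage described below:

import pureconfig.ConfigReader
import pureconfig.generic.semiauto.deriveReader

final case class ApiConfig(host: String, port: Int)

object ApiConfig {
  implicit val configReader: ConfigReader[ApiConfig] = deriveReader[ApiConfig]
}

final case class DatabaseConfig(url: String, user: String, pass: String)

object DatabaseConfig {
  implicit val configReader: ConfigReader[DatabaseConfig] = deriveReader[DatabaseConfig]
}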

As we can see, the code is pretty simple. The implicits in the companion objects are needed for pureconfig to actually map from a configuration to our data types. As you can see, we are using a function deriveReader which will derive (like in mathematics) the codec (yes, it is similar to a JSON codec, thus the name) for us.

A note on derivation: In general we always want the compiler to derive stuff automatically, because it means less work for us. However, as always there is a cost, and sometimes a rather big one (compile time). Therefore you should not use fully automatic derivation but the semi-automatic variant instead. The latter lets you choose explicitly what to derive. In some circumstances it may even be better to write a codec manually (complex, deeply nested models).

Below is an example of deriving an Order instance using the kittens[2] library. It uses shapeless under the hood and provides automatic and semi-automatic derivation for a lot of type class instances from Cats like Eq, Order, Show, Functor and so on.

Deriving Order via kittens
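A sketch, assuming kittens' semi-automatic API:

import cats.Order
import cats.derived

// Explicitly derive the Order instance; all field types must already
// provide their own Order instances.
implicit val orderTranslation: Order[Translation] = derived.semiauto.order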

Database layer

In general the same applies to the database layer as we have already read in the “impure” section.

Migrations

For the sake of simplicity we will stick to Flyway for our database migrations. However, we will wrap the migration code in a different way (read: encapsulate it properly within an IO to defer side effects). While we're at it, we may just as well write our migration code using the interpreter pattern (which became famous under the name "tagless final" in Scala).

Database migrator base
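A sketch of such a trait; the exact method signature is an assumption:

trait DatabaseMigrator[F[_]] {
  def migrate(url: String, user: String, pass: String): F[Int]
}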

We define a trait which describes the functionality desired from our interpreter, and we use a higher-kinded type parameter to be able to abstract over the effect type. But now let's continue with our Flyway interpreter.

Flyway migrator interpreter
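A sketch of the interpreter for the IO monad from Cats Effect:

import cats.effect.IO
import org.flywaydb.core.Flyway

final class FlywayDatabaseMigrator extends DatabaseMigrator[IO] {
  override def migrate(url: String, user: String, pass: String): IO[Int] =
    IO {
      // The side effects are deferred by the surrounding IO.
      val flyway = Flyway.configure().dataSource(url, user, pass).load()
      flyway.migrate()
    }
}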

As we can see, the implementation is pretty simple: we just wrap our code into an IO monad to constrain the effect. Having the migration code settled, we can move on to the repository.

So what is wrong with our solution?

If we take a closer look at the method definition of Flyway.migrate, we see this:

Method definition of Flyway.migrate
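In the Flyway versions current at the time (before 7.x) it is declared roughly as:

public int migrate() throws FlywayException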

While IO will gladly defer side effects for us, it won't stop the enclosed code from throwing exceptions. This is not that great. So what can we do about it? Having an instance of MonadError in scope, we could just use the .attempt function provided by it. But is this enough, or rather, does this provide a sensible solution for us? Let's play a bit in the REPL.

MonadError on the REPL
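A sketch of such a REPL session (output abbreviated):

scala> import cats.effect.IO
scala> val failing = IO(throw new RuntimeException("boom"))
scala> failing.attempt.unsafeRunSync()
res0: Either[Throwable, Nothing] = Left(java.lang.RuntimeException: boom)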

This looks like we just have to use MonadError then. Hurray, we don't need to change our code in the migrator. As model citizens of the functional programming camp we just defer the responsibility upwards to the call site.

Doobie

As we already started with using a tagless final approach we might as well continue with it and define a base for our repository.

Base trait for the repository
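A sketch; the exact names and signatures are assumptions based on the surrounding text:

trait Repository[F[_]] {
  def loadProduct(id: ProductId): F[Seq[(ProductId, LanguageCode, ProductName)]]
  def loadProducts(): fs2.Stream[F, (ProductId, LanguageCode, ProductName)]
  def saveProduct(p: Product): F[Int]
  def updateProduct(p: Product): F[Int]
}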

There is nothing exciting here, except that we feel brave now and try to use proper refined types in our database functions. This is possible due to the usage of the doobie-refined module. To be able to map the UUID data type (and others) we also need to include the doobie-postgresql module. For convenience we are still using ProductId instead of UUID in our definition. In addition we fix the return type of loadProducts to be an fs2.Stream, because we want to achieve pure functional streaming here. :-) So let's see what a repository using Doobie looks like.

The doobie repository.
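A sketch of the class skeleton; the query functions are shown in the following listings:

import cats.effect.Sync
import doobie._
import doobie.implicits._
import doobie.postgres.implicits._
import doobie.refined.implicits._
import fs2.Stream

final class DoobieRepository[F[_]: Sync](tx: Transactor[F]) extends Repository[F] {
  override def loadProduct(id: ProductId): F[Seq[(ProductId, LanguageCode, ProductName)]] = ???
  override def loadProducts(): Stream[F, (ProductId, LanguageCode, ProductName)] = ???
  override def saveProduct(p: Product): F[Int] = ???
  override def updateProduct(p: Product): F[Int] = ???
}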

We keep our higher-kinded type as abstract as we can, but we want it to be able to suspend our side effects. Therefore we require an implicit Sync.[3] If we look at the detailed function definitions further below, the first big difference is that with Doobie you write plain SQL queries. You can do this with Slick too[4], but with Doobie it is the only way. If you're used to object-relational mapping (ORM) or other forms of query compilers, this may seem strange at first. But: "In data processing it seems, all roads eventually lead back to SQL!"[5] ;-) We won't discuss the benefits or drawbacks here, but in general I also lean towards using the de facto lingua franca for database access, because it was made for this and so far no query compiler has been able to beat hand-crafted SQL in terms of performance. Another benefit is that if you ask a database guru for help, she will be much better able to help you with plain SQL queries than with some meta query which is compiled into something that you have no idea of.

Loading a product.
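A sketch, continuing the class from above (table and column names as in the migration):

override def loadProduct(id: ProductId): F[Seq[(ProductId, LanguageCode, ProductName)]] =
  sql"""SELECT products.id, names.lang_code, names.name
        FROM products
        JOIN names ON products.id = names.product_id
        WHERE products.id = $id"""
    .query[(ProductId, LanguageCode, ProductName)]
    .to[Seq]
    .transact(tx)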

The loadProduct function simply returns all rows for a single product from the database, like its Slick counterpart in the impure variant. The parameter will be correctly interpolated by Doobie, therefore we don't need to worry about SQL injection here. We specify the type of the query, instruct Doobie to transform it into a sequence and hand it to the transactor.

Please note that, in contrast to the Slick variant, the code does not run at this point! While the db.run of Slick will run your code, the transact of Doobie will not. It just provides a free structure (read: free monads) which can be interpreted later on.

Load all products
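A sketch; .stream turns the query into an fs2.Stream which is then interpreted by the transactor:

override def loadProducts(): Stream[F, (ProductId, LanguageCode, ProductName)] =
  sql"""SELECT products.id, names.lang_code, names.name
        FROM products
        JOIN names ON products.id = names.product_id
        ORDER BY products.id"""
    .query[(ProductId, LanguageCode, ProductName)]
    .stream
    .transact(tx)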

Our loadProducts function is equivalent to the first one, but it returns the data for all products, sorted by product and as a stream, using the fs2 library which provides pure functional streaming.

Save a product
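A sketch using a for comprehension over ConnectionIO (it needs the traverse syntax from Cats):

import cats.syntax.traverse._

override def saveProduct(p: Product): F[Int] = {
  val program = for {
    pi <- sql"INSERT INTO products (id) VALUES (${p.id})".update.run
    ni <- p.names.toNonEmptyList.traverse { t =>
            sql"""INSERT INTO names (product_id, lang_code, name)
                  VALUES (${p.id}, ${t.lang}, ${t.name})""".update.run
          }
  } yield pi + ni.toList.sum
  // The whole program runs inside a single database transaction.
  program.transact(tx)
}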

When saving a product we use monadic notation for our program to have it short-circuit in case of failure. Doobie will also put all commands into a database transaction. The function itself will first try to create the "master" entry in the products table and save all translations afterwards.

Update a product

The updateProduct function also uses monadic notation, like the saveProduct function we talked about before. The difference is that it first deletes all known translations before saving the given ones.

http4s routes

The routing DSL of http4s differs from the one of Akka-HTTP. Although I like the latter more, it poses no problem to model out a base for our routes.
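A sketch of such a base, assuming the tagless final repository from before:

import cats.effect.Sync
import org.http4s.HttpRoutes
import org.http4s.dsl.Http4sDsl

final class ProductRoutes[F[_]: Sync](repo: Repository[F]) extends Http4sDsl[F] {

  val routes: HttpRoutes[F] = HttpRoutes.of[F] {
    case GET -> Root / "product" / UUIDVar(id) =>
      ??? // load the product and return it
    case PUT -> Root / "product" / UUIDVar(id) =>
      ??? // decode the request body and update the product
  }
}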

Details

Pages: 122
ISBN (ePUB): 9783752141290
Language: English
Publication date: April 2021
Keywords: rest-api, functional-programming, cats, fs2, postgresql, scala, http4s, http, services, akka

Author

  • Jens Grassel (author)

Jens Grassel is the CTO and founder of Wegtam, a company specialised in data integration and search technology. He has been programming in a plethora of languages and engineering software systems since the fall of the Wall. For some years now his focus has shifted to functional programming languages. He has contributed to numerous free software projects and likes to apply free and open source technologies wherever applicable.