Architecture killed by network latency

In multiple projects using Serverless or Microservice technologies, I have seen architectures that looked good and straightforward but got into trouble because network latency had not been considered. This is such a basic point that it hardly seems worth mentioning, yet architects apparently keep overlooking it.

Let’s start with the requirements. The customer needs a backend for the UI, and the SLA is a response time of half a second, measured from the arrival of the request in AWS.

The API receives a sales order as payload, must validate it, and store it with a new order id in the database. The architecture reflects that and looks absolutely fine. It is a direct translation of the business requirements.
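A minimal sketch of what such an entry point could look like (the field names, the validation checks and the `store_order` helper are my assumptions for illustration, not from the actual project):

```python
import json
import uuid

def lambda_handler(event, context):
    # Hypothetical AWS Lambda handler: validate a sales order and store it.
    order = json.loads(event["body"])

    # Each validation lookup (material, customer) is in reality a call to
    # another service or database, i.e. another network hop.
    if "material_id" not in order or "customer_id" not in order:
        return {"statusCode": 400, "body": "material and customer are mandatory"}

    order_id = str(uuid.uuid4())  # placeholder; later the database guarantees uniqueness
    # store_order(order_id, order)  # the database write, one more network hop

    return {"statusCode": 201, "body": json.dumps({"orderId": order_id})}
```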

Each of these services requires processing time, e.g. finding the material master in the database takes a few milliseconds, executing the business logic inside the runtime another few milliseconds. In this post we do not care how long the processing takes for each service; we look at the network overhead only.

Every line in this diagram represents a network hop. The API Gateway is one service and forwards the request to the Runtime service via the network. The database is another service, accessed via the network. While data inside a single server or within a CPU is exchanged in nanoseconds, the network adds a latency of 10-20 milliseconds. Not necessarily because the payload is huge (that can be a factor too, but a less important one in our case); it is simply the time it takes to communicate.

To be more precise, in the context of this text I define network latency as the time it takes to trigger the remote service and get back the answer, as visible in the logs.

Example execution

| Time | Service | Action |
| --- | --- | --- |
| 2024-06-12T19:07:10.842Z | APIGW | Request received |
| 2024-06-12T19:07:10.977Z | APIGW | Invoke Lambda |
| 2024-06-12T19:07:10.995Z | Lambda | Request received |
| 2024-06-12T19:07:11.866Z | Lambda | Request completed |
| 2024-06-12T19:07:11.868Z | APIGW | Lambda response received |

In this sample execution, the API GW sent the request at 10.977 and got the response at 11.868 (= 0.891s), while the Lambda code executed from 10.995 to 11.866 (= 0.871s). Hence the pure network latency is 0.020s (0.891 - 0.871). So 20 milliseconds is the communication overhead in this run for invoking a single service.
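The same arithmetic spelled out, using the timestamps from the log above:

```python
from datetime import datetime

def ts(s: str) -> datetime:
    # fromisoformat() cannot parse the trailing "Z" in older Python versions
    return datetime.fromisoformat(s.replace("Z", "+00:00"))

round_trip = ts("2024-06-12T19:07:11.868Z") - ts("2024-06-12T19:07:10.977Z")  # as seen by APIGW
execution = ts("2024-06-12T19:07:11.866Z") - ts("2024-06-12T19:07:10.995Z")   # as seen by Lambda

print(round_trip.total_seconds())                # 0.891
print(execution.total_seconds())                 # 0.871
print((round_trip - execution).total_seconds())  # 0.020 -> the pure network latency
```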

As there are 4 lines in the diagram, and assuming 20ms for each of them, 80ms are spent on network access alone; that is 80ms less the code has for the actual work.

Worse, not much can be optimized here. The only option I can see is executing the two lookups for material and customer in parallel, which leaves us with 60ms of network overhead overall.
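A sketch of that parallelization, assuming Python with asyncio and two hypothetical lookup calls of roughly 20ms network latency each:

```python
import asyncio

async def lookup_material(material_id: str) -> dict:
    await asyncio.sleep(0.02)  # stands in for a remote call with ~20ms network latency
    return {"id": material_id}

async def lookup_customer(customer_id: str) -> dict:
    await asyncio.sleep(0.02)
    return {"id": customer_id}

async def validate(order: dict) -> None:
    # Sequentially these two lookups cost ~40ms of latency; in parallel ~20ms.
    material, customer = await asyncio.gather(
        lookup_material(order["material_id"]),
        lookup_customer(order["customer_id"]),
    )
    print(material, customer)

asyncio.run(validate({"material_id": "M-100", "customer_id": "C-200"}))
```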

The architecture does not show everything

What is interesting in the example is the API GW: it spends 135ms between receiving the request and invoking the Lambda.

The API GW is responsible for authenticating the caller and the request. Is this user valid? Is he allowed to create a sales order? A sales order for this customer? So hidden inside this box is another Lambda (or another auth service) with database access, adding network latency. In this example that means another three service invocations.
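What hides inside that box could, for example, be a Lambda authorizer along these lines (a sketch only; `validate_token` and `can_create_order_for` are hypothetical stand-ins for the hidden service and database calls, and the exact event shape depends on the authorizer type):

```python
def validate_token(token):
    # placeholder: in reality a call to an identity service (hidden hop 1)
    return {"id": "user-42"} if token else None

def can_create_order_for(user, customer_id):
    # placeholder: in reality role checks against a database (hidden hops 2 and 3)
    return True

def authorizer_handler(event, context):
    user = validate_token(event.get("authorizationToken"))
    allowed = user is not None and can_create_order_for(user, event.get("customerId"))
    return {
        "principalId": user["id"] if user else "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if allowed else "Deny",
                "Resource": event["methodArn"],
            }],
        },
    }
```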

The write to the order table also takes two service invocations, because it must be guaranteed that the order number is unique.
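With a relational database, those two invocations could be a sequence fetch followed by the insert (a sketch, assuming PostgreSQL, a sequence named `order_number_seq` and an `orders` table; connection details are placeholders):

```python
import json
import psycopg2

conn = psycopg2.connect("dbname=erp user=app host=db")
with conn, conn.cursor() as cur:
    # Round trip 1: draw a guaranteed-unique order number from the database.
    cur.execute("SELECT nextval('order_number_seq')")
    order_number = cur.fetchone()[0]
    # Round trip 2: insert the order under that number.
    cur.execute(
        "INSERT INTO orders (order_no, payload) VALUES (%s, %s)",
        (order_number, json.dumps({"material_id": "M-100"})),
    )
```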

Now we are at 8 times 20ms (partially parallelizable) of pure network overhead.

The time the code has left to do the actual work and still meet the 500ms SLA is shrinking and shrinking.

And now?

It is probably a good idea to compare the above approach with the good ol’ times of client-server architectures and what we did there.

In this architecture a webserver is called, the user authenticates just once at the beginning of the session, all user-related data is stored in the webserver’s memory as a session object, and the servlet code is executed inside the webserver. Not a single network hop for any of that; it all runs at CPU speed.

The servlet code is rather simple and makes a single call to the database, invoking the “create-order” stored procedure, which in turn runs inside the database and does everything: validating the data and inserting the order with a unique order number. All of it running inside the database at CPU speed.
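Calling such a procedure is then one single round trip (a sketch; I use PostgreSQL syntax and the name `create_order`, since a dash as in “create-order” is not a valid SQL identifier, and the connection details are placeholders):

```python
import json
import psycopg2

order = {"material_id": "M-100", "customer_id": "C-200", "quantity": 5}

conn = psycopg2.connect("dbname=erp user=app host=db")
with conn, conn.cursor() as cur:
    # One network round trip: validation, unique order number generation and
    # the insert all run inside the database at CPU speed.
    cur.execute("CALL create_order(%s)", (json.dumps(order),))
```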

So from a network point of view, just a single network hop is required!

This comes with downsides, obviously, mostly around scaling. Some of them can still be solved in this architecture. For failover, a second webserver can be on standby; if the first one fails, the user must log in on the second again and can then continue. Load balancing in an active-active setup is also possible, by routing the same user to the same server instance most of the time and migrating sessions otherwise. All of this is supported by any webserver.

If the database cannot be scaled up, it can make sense to move the stored procedure logic out of the database into a fleet of application servers that sit very close to the database and support sessions as well. Maybe we would have written the code in ABAP and referred to the application servers as SAP ABAP Application Servers? This obviously increases the network latency, but at least it is not in the 20ms range per call, rather around 5ms.

Comparing the two approaches, we have traded scaling against overhead. The serverless architecture is all about unlimited scaling, the client-server architecture all about efficient use of resources and separation by functionality.

A sensible compromise

The simplest enhancement of the serverless architecture is to use stored procedures as well. The business logic is implemented in the database and not in the backend tailored for a frontend. This eliminates the majority of the network hops, but it obviously assumes that the database used supports something like stored procedures (relational databases yes, DynamoDB no) and that all data is stored in a single database.

My customers are all large enterprises with thousands (B2B) to millions (B2C) of end users using their systems. They are not the size of Netflix or Amazon and never will be. Hence adopting an architecture that scales without limits, at all costs, is not necessary. For most customers, the above assumptions are met in 99% of the cases.

Even session handling is supported by serverless architectures. There are multiple options: e.g. sticky sessions, which always route the same user to the same container, which in turn already has the database connection open. Or storing all session information either on the client inside a JWT token or in a distributed cache service (Memcached, Redis). Serverless functions (AWS Lambda) can be used with the latter two approaches. Not optimal from a network overhead perspective, but maybe good enough for your use case.
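As a sketch of the distributed cache variant (the cache endpoint, key schema and header name are assumptions for illustration):

```python
import json
import redis  # assuming session state lives in a Redis/ElastiCache instance

r = redis.Redis(host="session-cache", port=6379)

def lambda_handler(event, context):
    # One extra (but fast) network hop: load the session created at login
    # instead of re-authenticating the user on every request.
    session_id = event["headers"].get("x-session-id", "")
    raw = r.get(f"session:{session_id}")
    if raw is None:
        return {"statusCode": 401, "body": "session expired, please log in again"}
    session = json.loads(raw)
    return {"statusCode": 200, "body": json.dumps({"user": session["user"]})}
```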

And SAP?

Given that my speciality is the SAP environment, can we find one or the other of these thoughts there again?

The SAP ECC system as a 3-tier client-server architecture is the obvious match. The application server owns the user session and the business logic and talks to the database. Both are in close proximity.

SAP ABAP Cloud is again an application server, but it has a greater distance to the database. In the worst case, the instance runs in BTP and talks to the on-premises database via SAP Cloud Connector. Talk about latencies!

SAP Embedded Steampunk is an ABAP application server running next to the S/4 instance. All good, no complaints.

SAP BTP CAP encapsulates the entire business logic in its “service” model and thus has all the network latency issues we talked about.

S/4 ABAP code tries to push down as much as possible to the database, into stored procedures and database views, and thus gets the best performance and the least latency.