What is API Latency and How to Test It

Your API returns the correct data. Your tests pass. Everything works perfectly—until it doesn't. Users start complaining that the app feels slow. Pages take forever to load. Buttons don't respond. The API is working, but it's working slowly.

This is the latency problem, and it's one of the most common yet overlooked issues in application development.

What is API Latency?

Latency is the time between sending a request and receiving the first byte of the response. It's not how long the entire download takes—it's how long you wait before anything starts happening.

Think of it like ordering at a restaurant. Latency is how long until the waiter acknowledges your order, not how long until the food arrives.

In API terms:

  • Low latency (< 100ms): Feels instant. Users don't notice any delay.
  • Moderate latency (100-300ms): Perceptible but acceptable. Users might notice a slight pause.
  • High latency (300ms-1s): Feels slow. Users start getting impatient.
  • Very high latency (> 1s): Frustrating. Users wonder if something is broken.

Why Latency Matters

Latency directly impacts user experience and business metrics.

User perception. Users begin to notice delays above roughly 100ms. At 1 second, they lose their flow of thought. At 10 seconds, they abandon the task entirely. Every additional 100ms of latency can reduce conversion rates by around 1%.

Cascading effects. If your app makes 5 API calls to render a page and each takes 300ms, that's 1.5 seconds of waiting when they run sequentially (see the sketch below). High latency multiplies across your application.
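
To make that cost concrete, here's a minimal TypeScript sketch of the sequential case. The endpoints and the `fetchJson` helper are illustrative placeholders, not a real API:

```typescript
// A minimal sketch of how serial awaits accumulate latency.
// The endpoints are hypothetical placeholders.
async function fetchJson<T>(url: string): Promise<T> {
  const res = await fetch(url);
  return res.json() as Promise<T>;
}

async function renderPageSerially() {
  const start = performance.now();
  // Each await blocks until the previous response arrives,
  // so five 300ms calls cost ~1.5s end to end.
  const user = await fetchJson("/api/user");
  const cart = await fetchJson("/api/cart");
  const recs = await fetchJson("/api/recommendations");
  const promos = await fetchJson("/api/promotions");
  const flags = await fetchJson("/api/feature-flags");
  console.log(`Rendered in ${Math.round(performance.now() - start)}ms`);
  return { user, cart, recs, promos, flags };
}
```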

Mobile and global users. Users on mobile networks or in distant geographic regions experience higher base latency. If your API adds 500ms on top of that, the experience becomes unusable.

Timeout failures. Many HTTP clients default to timeouts of 30 seconds or more. But user patience times out much sooner. An API that technically "works" but takes 10 seconds might as well be broken.

Components of API Latency

Total latency comes from multiple sources:

Network latency. The time for data to travel between client and server. Affected by geographic distance, network congestion, and connection quality. A request from New York to Singapore takes longer than New York to New Jersey.

DNS resolution. Converting the domain name to an IP address. Usually cached, but the first request to a new domain adds 20-120ms.

TLS handshake. Establishing a secure HTTPS connection. Adds 1-2 extra round trips (one for TLS 1.3, two for TLS 1.2), typically 50-150ms. Subsequent requests reuse the connection.

Server processing. The time the server spends handling your request. Database queries, computation, external service calls—all add up.

Response transmission. Sending the response data back. Larger responses take longer, especially on slow connections.
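
In the browser, you can inspect these components for any completed request via the Resource Timing API. A small sketch (note that cross-origin resources only expose detailed timings when the server sends a `Timing-Allow-Origin` header):

```typescript
// Break one request's latency into its components using the
// browser Resource Timing API. All durations are in milliseconds.
function latencyBreakdown(url: string) {
  const entry = performance
    .getEntriesByType("resource")
    .find((e) => e.name.includes(url)) as PerformanceResourceTiming | undefined;
  if (!entry) return;

  console.table({
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    tcpConnect: entry.connectEnd - entry.connectStart,
    // secureConnectionStart is 0 for plain HTTP or reused connections
    tls: entry.secureConnectionStart > 0
      ? entry.connectEnd - entry.secureConnectionStart
      : 0,
    timeToFirstByte: entry.responseStart - entry.requestStart,
    download: entry.responseEnd - entry.responseStart,
  });
}
```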

How to Measure API Latency

You can't improve what you don't measure. Here are key latency metrics:

P50 (median). Half of requests are faster than this, half are slower. Represents the typical user experience.

P90. 90% of requests are faster than this. Shows what slower experiences look like.

P99. 99% of requests are faster than this. Reveals worst-case scenarios that still happen regularly.

P99.9. The extreme tail. Rare but important for high-traffic applications where 0.1% still means thousands of users.

Why percentiles matter: if your average latency is 100ms but P99 is 5 seconds, one in every hundred users has a terrible experience. Averages hide problems.
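
Percentiles are easy to compute from raw samples. A minimal sketch using the nearest-rank method, with made-up numbers that include one slow outlier:

```typescript
// Compute latency percentiles from a sample of response times (ms).
// Nearest-rank method; fine for monitoring-scale sample sizes.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [87, 92, 110, 95, 101, 350, 98, 105, 93, 4800];
console.log(`P50: ${percentile(latencies, 50)}ms`); // typical user
console.log(`P90: ${percentile(latencies, 90)}ms`);
console.log(`P99: ${percentile(latencies, 99)}ms`); // the tail that averages hide
```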

Common Latency Problems

Cold starts. Serverless functions and scaled-down containers take time to initialize. The first request after idle time is much slower than subsequent ones.

Database queries. Unoptimized queries, missing indexes, or N+1 query patterns can add seconds of latency. The API looks slow, but the database is the bottleneck.
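
Here's the N+1 pattern in miniature, sketched against a hypothetical `db.query` helper rather than any particular ORM:

```typescript
// Hypothetical database client for illustration only.
type Db = { query: (sql: string, params?: unknown[]) => Promise<any[]> };

// The N+1 pattern: one query for the list, then one more per row.
// That's 1 + N round trips; at 5ms each, 100 rows adds ~500ms.
async function getOrdersSlow(db: Db) {
  const orders = await db.query("SELECT * FROM orders LIMIT 100");
  for (const order of orders) {
    order.items = await db.query(
      "SELECT * FROM order_items WHERE order_id = ?", [order.id]);
  }
  return orders;
}

// The fix: fetch all items in one batched query, then group in memory.
async function getOrdersFast(db: Db) {
  const orders = await db.query("SELECT * FROM orders LIMIT 100");
  const ids = orders.map((o) => o.id);
  const items = await db.query(
    "SELECT * FROM order_items WHERE order_id IN (?)", [ids]);
  for (const order of orders) {
    order.items = items.filter((i) => i.order_id === order.id);
  }
  return orders;
}
```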

External dependencies. Your API calls another API, which calls another API. Each hop adds latency. One slow dependency slows everything.

Large payloads. Returning more data than necessary increases transmission time. An endpoint returning 5MB when the client needs 5KB wastes everyone's time.

Geographic distance. Servers in one region serving users globally. A user in Tokyo hitting servers in London experiences unavoidable physics-based latency.

How to Test Latency Handling

Here's the challenge: your development environment is fast. Your local network has near-zero latency. The database is on the same machine. Everything feels instant.

Then you deploy to production, and users on 3G networks with high-latency connections have a completely different experience.

You need to simulate latency during development to build resilient applications.

Testing with Mocklantis

Mocklantis makes latency testing straightforward with built-in delay simulation:

Fixed delay. Add a constant delay to every response. Set 2000ms to see how your app handles a consistently slow API. Do your loading spinners work? Does the UI remain responsive?

Random delay. Configure a range (e.g., 100ms to 3000ms) for unpredictable latency. This tests how your app handles variable response times. Does your UI flicker when responses arrive in unexpected order?

Log-normal distribution. The most realistic option. Real-world latency isn't uniform: most requests are fast, but some are much slower. A log-normal distribution with a median of 200ms and a sigma of 0.8 produces roughly this pattern:

  • ~20% of requests: under 100ms
  • ~50% of requests: 100-300ms
  • ~25% of requests: 300-800ms
  • ~5% of requests: 800ms and up, with a rare few past 3 seconds

This "long tail" of slow requests is exactly what happens in production.

What to Test

Loading states. Does your app show appropriate feedback while waiting? Skeleton screens, spinners, or progress indicators should appear quickly.

Timeout handling. What happens when a request takes 30 seconds? Does your app show an error, or does it hang forever? Configure a timeout and test it.
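
A minimal sketch using `AbortSignal.timeout`, which is available in modern browsers and Node 18+; the 5-second deadline is an arbitrary example, not a recommendation:

```typescript
// Fail fast instead of hanging: abort the request after a deadline.
async function fetchWithTimeout(url: string, timeoutMs = 5000): Promise<Response> {
  try {
    return await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  } catch (err) {
    if (err instanceof DOMException && err.name === "TimeoutError") {
      throw new Error(`Request to ${url} timed out after ${timeoutMs}ms`);
    }
    throw err; // network errors, manual aborts, etc.
  }
}
```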

Request cancellation. If a user navigates away while a request is pending, does your app cancel it? Slow responses make this scenario common.
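
One common pattern is to abort the previous in-flight request whenever a new one supersedes it, for example in search-as-you-type. A sketch against a hypothetical `/api/search` endpoint:

```typescript
// Cancel a stale in-flight request when a new one supersedes it.
let inFlight: AbortController | null = null;

async function search(query: string) {
  inFlight?.abort(); // drop the previous request, if any
  inFlight = new AbortController();
  try {
    const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`, {
      signal: inFlight.signal,
    });
    return await res.json();
  } catch (err) {
    if (err instanceof DOMException && err.name === "AbortError") {
      return null; // superseded by a newer request; ignore silently
    }
    throw err;
  }
}
```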

Optimistic updates. If you show changes immediately before the API confirms them, what happens when the slow response finally arrives? Or when it fails?

Retry behavior. Does your app retry failed requests? What if retries are also slow? Test the complete failure and recovery flow.

Concurrent requests. If your app makes multiple API calls, how does it handle them completing in different orders due to variable latency?
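
Cancellation is one answer; another is to tag each request and discard stale results. A sketch with a hypothetical endpoint and render callback:

```typescript
// Guard against out-of-order completions: tag each request with a
// sequence number and only apply the latest one's result.
let latestRequestId = 0;

async function loadResults(query: string, render: (data: unknown) => void) {
  const requestId = ++latestRequestId;
  const res = await fetch(`/api/results?q=${encodeURIComponent(query)}`);
  const data = await res.json();
  // A slower, older request may resolve after a newer one; drop it.
  if (requestId === latestRequestId) {
    render(data);
  }
}
```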

Building for Latency

Once you understand latency, you can design around it:

Show something fast. Render the page structure immediately, then fill in data as it arrives. Users perceive this as faster than waiting for everything.

Set appropriate timeouts. Don't wait 30 seconds for an API that should respond in 200ms. Fail fast and show an error.

Implement retry with backoff. Temporary latency spikes happen. Retry once or twice with increasing delays before giving up.
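
A minimal sketch of retry with exponential backoff and jitter; the retry count, deadline, and base delay are illustrative, not a policy:

```typescript
// Retry transient failures with exponentially increasing delays.
async function fetchWithRetry(url: string, retries = 2, baseDelayMs = 250): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(5000) });
      if (res.status >= 500) throw new Error(`HTTP ${res.status}`); // retry server errors
      return res;
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts; give up
      // 250ms, 500ms, 1s... plus jitter to avoid synchronized retries
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```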

Cache aggressively. The fastest API call is the one you don't make. Cache responses when possible.
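
A cache can be as simple as a map with expiry timestamps. A sketch (in-memory only; real apps may also want HTTP caching headers or a dedicated library):

```typescript
// A minimal in-memory cache with a TTL; a cache hit costs microseconds.
const cache = new Map<string, { data: unknown; expiresAt: number }>();

async function cachedFetch(url: string, ttlMs = 60_000): Promise<unknown> {
  const hit = cache.get(url);
  if (hit && hit.expiresAt > Date.now()) return hit.data; // skip the network
  const res = await fetch(url);
  const data = await res.json();
  cache.set(url, { data, expiresAt: Date.now() + ttlMs });
  return data;
}
```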

Parallelize requests. If you need data from multiple endpoints, request them simultaneously rather than sequentially.
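
With `Promise.all`, the total wait is the slowest call rather than the sum of all of them. A sketch with placeholder endpoints:

```typescript
// Issue independent requests together; three 300ms calls finish
// in ~300ms instead of ~900ms.
async function loadDashboard() {
  const [user, stats, alerts] = await Promise.all([
    fetch("/api/user").then((r) => r.json()),
    fetch("/api/stats").then((r) => r.json()),
    fetch("/api/alerts").then((r) => r.json()),
  ]);
  return { user, stats, alerts };
}
```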

Latency is invisible in development and painful in production. Test it early, test it often, and build applications that stay responsive even when APIs don't.