# 一次Java后端服务间歇性响应慢的问题排查(入门)

分享一个之前在公司内遇到的一个后端服务连接超时问题,问题的表现是服务部署到线上后出现间歇性请求响应非常慢(大于10s),但是后端业务分析业务日志时却没有发现慢请求,另外由于服务pod livenessProbe也出现超时,导致容器出现间歇性重启。

# 复现

该服务基于spring-boot开发,通过spring-mvc框架对外提供一些web接口,业务简化后代码如下:

@Controller
@SpringBootApplication
public class Bootstrap {

    public static void main(String[] args) {
        SpringApplication.run(Bootstrap.class, args);
    }
    
    @GetMapping("/ping")
    public String ping() {
        return "pong";
    }
}

客户端访问该服务(记为backend)的路径为: client => ingress => backend,客户端的代码简化如下,其实就是在一个循环里面持续访问ingress(这里以一个nginx代替):

import time
import requests

while True:
    try:
        start = time.time()
        r = requests.get('http://nginx/ping', timeout=(3, 10))
        spend = int((time.time() - start) * 1000)
        r.raise_for_status()
        print(f'{time.strftime("%Y-%m-%dT%H:%M:%S")} OK {spend}ms {r.content.decode("utf-8")}')
    except requests.HTTPError as err:
        print(f'{time.strftime("%Y-%m-%dT%H:%M:%S")} HTTP error: {err}')
    except Exception as err:
        print(f'{time.strftime("%Y-%m-%dT%H:%M:%S")} Error: {err}')
    time.sleep(0.1)

下面是一个docker-compose文件构造了一个最小可复现的环境:

version: '3'
services:
  backend:
    image: shawyeok/128-slowbackend:backend

  nginx:
    image: shawyeok/128-slowbackend:nginx
    depends_on:
      - backend

  client:
    image: shawyeok/128-slowbackend:client
    depends_on:
      - nginx

通过docker-compose启动后,检查client容器的日志,你将会在client看到间歇性出现read timeout的记录

$ docker-compose up -d
$ docker ps
$ docker logs -f xxx-client-1
2024-05-23T08:02:51 OK 52ms pong
2024-05-23T08:02:51 OK 6ms pong
2024-05-23T08:02:51 OK 3ms pong
2024-05-23T08:02:51 OK 5ms pong
2024-05-23T08:02:51 OK 17ms pong
2024-05-23T08:02:51 OK 14ms pong
2024-05-23T08:02:51 OK 11ms pong
2024-05-23T08:02:51 OK 16ms pong
2024-05-23T08:02:52 OK 7ms pong
2024-05-23T08:02:52 OK 10ms pong
2024-05-23T08:02:52 OK 6ms pong
2024-05-23T08:02:52 OK 8ms pong
2024-05-23T08:03:02 Error: HTTPConnectionPool(host='nginx', port=80): Read timed out. (read timeout=10)
2024-05-23T08:03:12 Error: HTTPConnectionPool(host='nginx', port=80): Read timed out. (read timeout=10)
2024-05-23T08:03:12 OK 15ms pong
2024-05-23T08:03:12 OK 15ms pong
2024-05-23T08:03:12 OK 15ms pong

完整的复现代码在Shawyeok/128-slowbackend (opens new window),读者看到这里可以先尝试通过上面步骤把环境运行起来自己动手分析一下原因。

# 分析

TBD