Job Listings

Help debug latency of Runpod serverless API

Upwork

I’m currently deploying a serverless API via Runpod and facing a challenge with an unexpected spike in latency.

The setup is simple: a script runs on a serverless worker with concurrency set to 32 workers. I send batches of 8 parallel requests, repeated 4 times for a total of 32 requests. The first batch averages around 0.5 seconds, which is the expected latency. From the second batch onward, latency jumps to around 5 seconds and stays there. I’m unsure what causes this behavior.

I've attached the sample inference/ping code. If you have experience in addressing similar issues, please share your hypothesis on the cause and how you would approach solving it.
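
The attachment itself is not reproduced in this listing. As a rough illustration of the test pattern described above, here is a minimal sketch that sends 4 sequential batches of 8 parallel requests and records per-request latency. It is not the attached code: `fake_send` is a hypothetical stand-in for the real HTTP call to the endpoint, and the names `timed_call` and `run_batches` are assumptions made for this example.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, List


async def timed_call(send: Callable[[], Awaitable[None]]) -> float:
    """Return the wall-clock latency of one request, in seconds."""
    start = time.perf_counter()
    await send()
    return time.perf_counter() - start


async def run_batches(
    send: Callable[[], Awaitable[None]],
    batch_size: int = 8,
    num_batches: int = 4,
) -> List[List[float]]:
    """Send num_batches batches one after another; requests within a batch run in parallel."""
    results: List[List[float]] = []
    for _ in range(num_batches):
        latencies = await asyncio.gather(
            *(timed_call(send) for _ in range(batch_size))
        )
        results.append(list(latencies))
    return results


async def fake_send() -> None:
    # Hypothetical placeholder: replace with the actual request to the endpoint.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    per_batch = asyncio.run(run_batches(fake_send))
    for i, latencies in enumerate(per_batch, start=1):
        mean = statistics.mean(latencies)
        print(f"batch {i}: mean={mean:.3f}s max={max(latencies):.3f}s")
```

Printing the mean and maximum per batch makes it easy to see whether the slowdown begins exactly at the second batch and whether every request in a batch is affected or only some of them.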

Location: Anywhere

Posted: Oct. 8, 2024, 7:53 p.m.
