Resilience and Latency

Latency and percentiles, timeouts and deadline propagation, retries with backoff and jitter, circuit breaker, bulkhead, backpressure, load shedding, hedged requests and request coalescing — keeping a Go service fast and stable under load and partial failure.

A service that is fast on an idle test bench lives in a different world in production: requests fan out to dozens of dependencies, one of them slows down or falls over, load spikes, and the question is not whether partial failure happens but whether the service turns it into controlled degradation or a collapse. This topic covers the techniques that keep a Go service fast and stable under exactly these conditions: how to measure latency, how to bound call time, how to retry and when to stop calling a dead dependency, how to isolate resources, and how to shed excess load without the whole service going down.

The core trap of the topic is optimistic thinking: "the average is fine, we'll add a timeout later, on an error we'll just retry." The average hides the tail, and under fan-out that tail reaches a significant share of users; without timeouts and deadline propagation, stuck goroutines pile up; a naive retry without backoff and jitter finishes off a recovering dependency with a synchronized retry storm. Beyond that, it all comes down to failure discipline: a circuit breaker that stops calling a dead dependency and gives it room to recover; a bulkhead that isolates resources so one dependency can't sink everything; backpressure and load shedding that slow the source down and return 503/429 fast instead of dying of a silent OOM; and finally hedged requests and request coalescing, which cut the tail and deduplicate identical calls. The topic is laid out in layers, from measuring latency to protecting a hot key.
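To make "the average hides the tail" concrete, here is a minimal Go sketch. It assumes a nearest-rank percentile (real systems usually aggregate latencies into histograms rather than sorting raw samples), made-up latency numbers, and independent calls for the fan-out estimate; the helper names `mean`, `percentile` and `fanOutSlowProb` are illustrative, not taken from any library.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic average of the samples.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of a
// non-empty sample. It sorts a copy, so the caller's slice is untouched.
func percentile(xs []float64, p float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	rank := int(math.Ceil(p * float64(len(s)) / 100)) // 1-based rank
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

// fanOutSlowProb is the probability that a request fanning out to n
// independent calls waits on at least one call slower than the p-th
// percentile of a single call: 1 - (p/100)^n.
func fanOutSlowProb(p float64, n int) float64 {
	return 1 - math.Pow(p/100, float64(n))
}

func main() {
	// Hypothetical sample: 98 calls at 10 ms and 2 stragglers at 1000 ms.
	lat := make([]float64, 0, 100)
	for i := 0; i < 98; i++ {
		lat = append(lat, 10)
	}
	lat = append(lat, 1000, 1000)

	// Prints: mean=29.8ms p50=10ms p99=1000ms
	fmt.Printf("mean=%.1fms p50=%.0fms p99=%.0fms\n",
		mean(lat), percentile(lat, 50), percentile(lat, 99))

	// With 100 backends, roughly 63% of requests wait on someone's p99.
	fmt.Printf("fan-out 100: %.0f%% of requests see a p99 call\n",
		100*fanOutSlowProb(99, 100))
}
```

Two stragglers out of a hundred barely move the mean, yet they are the p99; and once a request touches 100 backends, a 1% per-call tail turns into a majority of slow requests.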

Topic map

Latency and percentiles — latency versus throughput and concurrency (Little's law), measuring p50/p95/p99/p999 percentiles instead of the average, and tail dominance under fan-out.
Timeouts and deadlines — every outbound call is bounded by a timeout, while a deadline in context.Context propagates down the chain, and each hop subtracts its own time without resetting the budget.
Retries, backoff and jitter — retry only idempotent work on a transient error, with exponential backoff and jitter against a synchronized storm, capped by a retry budget.
Circuit breaker — a closed → open → half-open machine that trips open when the error rate climbs and fails fast locally, periodically probing the dependency's recovery.
Bulkhead — isolating resources per dependency (separate goroutine and connection pools, bounded queues) so one misbehaving dependency can't exhaust all the service's resources.
Backpressure — under overload, signal "slow down" upstream instead of an unbounded buffer: bounded channels and queues absorb spikes and block or shed once full.
Load shedding — under overload, deliberately drop excess and lower-priority work to protect the core path, returning 503/429 fast before the system collapses.
Hedged requests — after waiting around p95, send a second copy of the request to another replica and take the first answer, cutting the tail at the cost of a capped amount of extra load.
Request coalescing — collapse concurrent identical in-flight requests into one call (singleflight), protecting a hot key from a cache stampede and thundering herd.
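A minimal sketch of the deadline rule, assuming a simulated dependency (`callDependency`, `hop`, `runChain` and the `ms` shorthand are hypothetical names). The key detail is that each hop derives its per-call timeout from the incoming `ctx`, not from `context.Background()`: `context.WithTimeout` never extends a parent's deadline, so the call gets whichever is earlier, its own cap or the remaining client budget.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

const ms = time.Millisecond // shorthand for the demo

// callDependency simulates an outbound call that needs `work` to finish.
// It honours ctx: if the deadline fires first it returns ctx.Err().
func callDependency(ctx context.Context, work time.Duration) error {
	t := time.NewTimer(work)
	defer t.Stop()
	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// hop bounds one outbound call. Because the timeout is derived from the
// incoming ctx, the call gets min(perCall, remaining budget).
func hop(ctx context.Context, perCall, work time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, perCall)
	defer cancel() // free the timer as soon as the call returns
	return callDependency(ctx, work)
}

// runChain gives the request one client budget and makes two sequential
// hops; the second hop only gets whatever time the first one left.
func runChain(budget, work1, work2 time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	if err := hop(ctx, time.Second, work1); err != nil {
		return fmt.Errorf("hop 1: %w", err)
	}
	if err := hop(ctx, time.Second, work2); err != nil {
		return fmt.Errorf("hop 2: %w", err)
	}
	return nil
}

// exceeded reports whether err came from an expired deadline.
func exceeded(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	// Each hop would allow itself 1 s, but the client only has 100 ms.
	err := runChain(100*ms, 60*ms, 60*ms)
	fmt.Println(err, exceeded(err))
}
```

With a 100 ms budget the first 60 ms hop succeeds and the second one fails with `context deadline exceeded` after roughly 40 ms, even though each hop would allow itself a full second. Across process boundaries gRPC carries the context deadline to the server for you; with plain HTTP you have to pass the remaining budget along yourself.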

Common mistakes and traps

Mistake	Consequence
Measuring latency by the average	A low mean hides the tail that a significant share of users land in
Ignoring the tail under fan-out	One rare slow dependency makes the whole request's p99 slow
No timeout on an outbound call	A stuck call holds a goroutine, connection and memory until the service drowns
Resetting the deadline at each hop instead of propagating it	The chain keeps working past the client's budget, after the client has already given up
Retrying a non-idempotent operation	A retried payment charges the customer twice
Retrying without backoff and jitter	A synchronized retry storm finishes off a recovering dependency
Calling an already-downed dependency	Cascading failure: calls pile up on timeout and keep it from recovering
Not isolating resources per dependency	One misbehaving dependency eats all goroutines and connections — the whole service falls
Buffering incoming work without bound	Memory grows until OOM instead of a "slow down" signal reaching upstream
Drowning under overload while serving everyone	A slow death of the whole service instead of a fast `503`/`429` to excess requests
Sending N identical requests for a hot key	A thundering herd / cache stampede crushes the dependency with N copies of the same request
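The hot-key row is what request coalescing addresses. The sketch below is a stripped-down reimplementation of the idea behind `golang.org/x/sync/singleflight` (whose `Group.Do` also returns a `shared` flag and handles panics in the callback); here `Group`, `Do` and `hotKeyHits` are illustrative names, the value type is fixed to `string`, and the backend is simulated with a 50 ms sleep.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call is one in-flight execution that concurrent callers can wait on.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// Group collapses concurrent calls that share a key into one execution.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn at most once per key at a time. Callers arriving while that
// call is in flight wait for it and get the same result. Unlike the real
// singleflight package, this sketch does not recover panics in fn.
func (g *Group) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // piggyback on the in-flight call
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // only this goroutine hits the backend
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key) // later requests start a fresh call
	g.mu.Unlock()
	return c.val, c.err
}

// hotKeyHits fires n concurrent lookups of one key through a Group and
// returns how many times the slow backend actually ran.
func hotKeyHits(n int) int64 {
	var g Group
	var hits int64
	var wg sync.WaitGroup
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // release all goroutines at once
			g.Do("user:42", func() (string, error) {
				atomic.AddInt64(&hits, 1)
				time.Sleep(50 * time.Millisecond) // simulated cache miss
				return "profile", nil
			})
		}()
	}
	close(start)
	wg.Wait()
	return atomic.LoadInt64(&hits)
}

func main() {
	fmt.Println("backend hits for 100 concurrent requests:", hotKeyHits(100))
}
```

With 100 goroutines released at once the backend typically runs once instead of 100 times. This only deduplicates in-flight calls: once the call returns the key is forgotten, so coalescing complements a cache rather than replacing one.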

Why it matters for interviews

Resilience and latency are the backbone of the senior part of a Go system-design interview: what's tested is not knowledge of pattern names but an understanding of how a service behaves under load and partial failure. The interviewer watches whether you measure percentiles rather than the average, remember that the tail dominates under fan-out, put a timeout on every outbound call and propagate the deadline, can tell a safe idempotent retry from a dangerous one, and know when to stop calling a dead dependency, how to isolate resources, and how to degrade in a controlled way instead of collapsing.
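"Knowing when to stop calling a dead dependency" is the circuit breaker's job. This is a minimal sketch that simplifies the closed → open → half-open machine in one explicit way: it trips on a count of consecutive failures rather than on an error rate over a window, and half-open admits exactly one probe. `Breaker`, `Call`, `scenario` and the injectable `Now` clock are hypothetical names; the clock lets the demo advance time deterministically.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

func (s State) String() string { return [...]string{"closed", "open", "half-open"}[s] }

// ErrOpen is returned without touching the dependency while the circuit is open.
var ErrOpen = errors.New("circuit open")

// Breaker trips after MaxFailures consecutive failures, fails fast while
// open, and after Cooldown lets a single probe through (half-open).
type Breaker struct {
	MaxFailures int
	Cooldown    time.Duration
	Now         func() time.Time // injectable clock; nil means time.Now
	mu          sync.Mutex
	state       State
	failures    int
	openedAt    time.Time
	probing     bool
}

func (b *Breaker) now() time.Time {
	if b.Now != nil {
		return b.Now()
	}
	return time.Now()
}

// Call runs fn unless the circuit is open and records the outcome.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.state == Open && b.now().Sub(b.openedAt) >= b.Cooldown {
		b.state = HalfOpen // cooldown over: allow a probe
	}
	probe := false
	switch {
	case b.state == Open, b.state == HalfOpen && b.probing:
		b.mu.Unlock()
		return ErrOpen // fail fast locally, no call goes out
	case b.state == HalfOpen:
		b.probing, probe = true, true
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if probe {
		b.probing = false
	}
	if err != nil {
		b.failures++
		if probe || b.failures >= b.MaxFailures {
			b.state, b.openedAt = Open, b.now() // (re)open, restart cooldown
		}
		return err
	}
	b.state, b.failures = Closed, 0 // a success closes the circuit
	return nil
}

// scenario drives a breaker with a fake clock; it records the state after
// each call and how many times the dependency was really invoked.
func scenario() (states []string, calls int) {
	clock := time.Unix(0, 0)
	b := &Breaker{MaxFailures: 3, Cooldown: 10 * time.Second,
		Now: func() time.Time { return clock }}
	fail := func() error { calls++; return errors.New("dependency down") }
	ok := func() error { calls++; return nil }
	for i, fn := range []func() error{fail, fail, fail, ok, fail, ok} {
		if i >= 4 {
			clock = clock.Add(11 * time.Second) // let the cooldown elapse
		}
		b.Call(fn)
		states = append(states, b.state.String())
	}
	return states, calls
}

func main() {
	states, calls := scenario()
	fmt.Println(states, "dependency calls:", calls)
}
```

The trace goes closed, closed, open; the fourth request is rejected without invoking the dependency; after the cooldown a failed probe reopens the circuit and, after the next cooldown, a successful probe closes it. Six requests cost the dependency only five calls, and none of them pile up on a timeout.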

What they usually check:

How latency differs from throughput and concurrency and why you measure p50/p95/p99/p999 percentiles, not the average.
Why every outbound call needs a timeout and how to propagate the deadline through context.Context down the chain.
What is safe to retry, why exponential backoff with jitter is needed, and why a retry budget guards against amplification.
How a circuit breaker works (closed → open → half-open) and why failing fast stops cascading failure.
How a bulkhead isolates resources and why without it one dependency sinks the whole service.
How backpressure differs from load shedding and why the buffer must not grow without bound.
How hedged requests and request coalescing cut the tail and protect a hot key, and what they cost.
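A sketch of a hedged request under stated assumptions: exactly two replicas, a fixed `hedgeAfter` delay that in practice would track the observed p95, and a simulated `call` function. `hedged` and `race` are hypothetical helpers. Because hedging sends a duplicate, it only suits idempotent requests such as reads.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type result struct {
	replica int
	val     string
	err     error
}

// hedged sends the request to replica 0; if it has not answered within
// hedgeAfter, it sends one extra copy to replica 1 and returns whichever
// answer arrives first, success or error. Returning cancels the shared
// ctx, which stops the slower copy.
func hedged(ctx context.Context, hedgeAfter time.Duration,
	call func(ctx context.Context, replica int) (string, error)) (int, string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	results := make(chan result, 2) // buffered so the loser never blocks
	send := func(replica int) {
		go func() {
			v, err := call(ctx, replica)
			results <- result{replica, v, err}
		}()
	}

	send(0)
	timer := time.NewTimer(hedgeAfter)
	defer timer.Stop()
	select {
	case r := <-results: // primary answered before the hedge delay
		return r.replica, r.val, r.err
	case <-timer.C:
		send(1) // primary is in its tail: hedge exactly once
	case <-ctx.Done():
		return -1, "", ctx.Err()
	}
	select {
	case r := <-results:
		return r.replica, r.val, r.err
	case <-ctx.Done():
		return -1, "", ctx.Err()
	}
}

// race runs one hedged request against two simulated replicas with the
// given latencies (in ms) and returns the index of the replica that won.
func race(primaryMs, secondaryMs, hedgeAfterMs int) int {
	latency := []time.Duration{
		time.Duration(primaryMs) * time.Millisecond,
		time.Duration(secondaryMs) * time.Millisecond,
	}
	call := func(ctx context.Context, replica int) (string, error) {
		t := time.NewTimer(latency[replica])
		defer t.Stop()
		select {
		case <-t.C:
			return fmt.Sprintf("reply from replica %d", replica), nil
		case <-ctx.Done(): // canceled because the other copy won
			return "", ctx.Err()
		}
	}
	winner, _, _ := hedged(context.Background(),
		time.Duration(hedgeAfterMs)*time.Millisecond, call)
	return winner
}

func main() {
	fmt.Println("fast primary, winner:", race(10, 10, 50))
	fmt.Println("slow primary, winner:", race(500, 20, 50))
}
```

A fast primary answers before the hedge fires, so no second copy is sent; a primary stuck in its tail loses to the hedge roughly 70 ms in instead of 500 ms. If `hedgeAfter` sits at the p95, only about 5% of requests ever send a second copy, which is where the capped extra load comes from. This version returns the first answer even if it is an error; a variant could wait for the other copy when the first one fails.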

The typical wrong answer: "on an error we'll just retry, and our average is fine." That opens a discussion of how the average hides the tail, while under fan-out it is the tail that sets the p99; how a retry without backoff and jitter finishes off a recovering dependency, while a non-idempotent retry charges the customer twice; how, without a timeout and a circuit breaker, calls to a dead dependency pile up and cause cascading failure; and how under overload you must deliberately shed excess and return 503/429 fast rather than drown while serving everyone.
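A minimal retry helper that avoids both traps, assuming the caller marks retryable failures by wrapping a sentinel `ErrTransient` and states explicitly whether the operation is idempotent (`Retry`, `RetryPolicy`, `ErrTransient` and `attempts` are hypothetical names). The delay uses the "full jitter" variant of exponential backoff: a uniformly random wait between zero and the capped exponential ceiling.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// ErrTransient marks failures worth retrying (timeouts, 503s, resets).
var ErrTransient = errors.New("transient")

// RetryPolicy caps the number of attempts and the backoff ceiling.
type RetryPolicy struct {
	MaxAttempts int
	Base, Cap   time.Duration
}

// backoff returns a full-jitter delay, uniform in [0, min(Cap, Base*2^n)),
// so clients that failed together do not retry in lockstep.
func (p RetryPolicy) backoff(n int) time.Duration {
	d := p.Cap
	if n < 62 && p.Base <= p.Cap>>n { // Base*2^n <= Cap, checked without overflow
		d = p.Base << n
	}
	if d <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(d)))
}

// Retry calls fn and retries only idempotent work that failed with a
// transient error, sleeping with backoff and giving up when ctx expires.
func Retry(ctx context.Context, p RetryPolicy, idempotent bool,
	fn func(context.Context) error) error {
	for attempt := 1; ; attempt++ {
		err := fn(ctx)
		if err == nil {
			return nil
		}
		if !idempotent || !errors.Is(err, ErrTransient) || attempt >= p.MaxAttempts {
			return err // e.g. retrying a payment could charge twice
		}
		t := time.NewTimer(p.backoff(attempt - 1))
		select {
		case <-t.C:
		case <-ctx.Done():
			t.Stop()
			return ctx.Err()
		}
	}
}

// attempts runs Retry against a fake dependency that fails transiently
// `failures` times before succeeding and reports how often it was called.
func attempts(failures, maxAttempts int, idempotent bool) (int, error) {
	calls := 0
	fn := func(context.Context) error {
		calls++
		if calls <= failures {
			return fmt.Errorf("upstream 503: %w", ErrTransient)
		}
		return nil
	}
	p := RetryPolicy{MaxAttempts: maxAttempts, Base: time.Millisecond, Cap: 50 * time.Millisecond}
	err := Retry(context.Background(), p, idempotent, fn)
	return calls, err
}

func main() {
	fmt.Println(attempts(2, 5, true))  // recovers on the 3rd call
	fmt.Println(attempts(9, 3, true))  // gives up after 3 attempts
	fmt.Println(attempts(2, 5, false)) // non-idempotent: tried once
}
```

The idempotency check runs before any backoff, so a non-idempotent payment is attempted exactly once. A fleet-wide retry budget (retries limited to a fraction of total traffic) is not shown here; `MaxAttempts` only caps a single call.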

Why it matters

A service that is fast and stable on an idle test bench behaves completely differently under load and partial failure, and that is exactly what senior interviews probe. Measure averages instead of percentiles and you miss the tail that a significant share of users land in; under fan-out that tail dominates the p99 of the whole request. Skip timeouts and deadline propagation down the chain and stuck goroutines and connections pile up until the service drowns. Retry without backoff and jitter and you finish off a recovering dependency with a synchronized retry storm; retry non-idempotent work and you charge the customer twice. Fail to isolate resources, never trip the circuit on a dead dependency, and buffer without bound, and you get cascading failure and OOM. Resilience is the set of techniques that turn partial failure into controlled degradation instead of collapse.
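Resource isolation can be sketched with a bulkhead per dependency. This minimal version assumes one `Bulkhead` per downstream service (the type and the `isolation` demo are hypothetical names). A buffered channel serves as a counting semaphore, and the non-blocking `select` makes a full bulkhead reject immediately rather than queue without bound, which is also a simple form of load shedding. `golang.org/x/sync/semaphore` offers a `Weighted` semaphore with `TryAcquire` that could play the same role.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrBulkheadFull is returned at once when a dependency's slots are all busy.
var ErrBulkheadFull = errors.New("bulkhead full")

// Bulkhead caps concurrent calls to one dependency. The buffered channel is
// a semaphore; when every slot is taken the caller is rejected immediately
// instead of queueing without bound.
type Bulkhead struct{ slots chan struct{} }

func NewBulkhead(n int) *Bulkhead { return &Bulkhead{slots: make(chan struct{}, n)} }

func (b *Bulkhead) Do(fn func() error) error {
	select {
	case b.slots <- struct{}{}: // acquire a slot
		defer func() { <-b.slots }() // release it when fn returns
		return fn()
	default:
		return ErrBulkheadFull // shed: e.g. answer 503 at the edge
	}
}

// isolation ties up every slot of the "slow" dependency's bulkhead, then
// checks that new slow calls are shed while the "fast" one still works.
func isolation() (slowShed, fastOK bool) {
	slow, fast := NewBulkhead(2), NewBulkhead(2)
	hung := make(chan struct{})
	started := make(chan struct{}, 2)
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			slow.Do(func() error {
				started <- struct{}{}
				<-hung // simulate a hung dependency until released
				return nil
			})
		}()
	}
	<-started
	<-started // both slow slots are now held
	slowShed = errors.Is(slow.Do(func() error { return nil }), ErrBulkheadFull)
	fastOK = fast.Do(func() error { return nil }) == nil
	close(hung)
	wg.Wait()
	return slowShed, fastOK
}

func main() {
	fmt.Println(isolation())
}
```

With both slots of the slow dependency held by hung calls, a third call to it is rejected at once while calls to the other dependency still go through: the hang stays contained instead of eating every goroutine in the service.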