advanced

Scaling and Load Balancing

Vertical and horizontal scaling, Amdahl's, Gustafson's, and the Universal Scalability Law, load-balancing algorithms, session affinity, service discovery, and traffic splitting — how to grow a Go service under load without hitting a ceiling or killing throughput with coordination.

When a Go service stops keeping up with traffic, there are two directions to grow — give it a bigger machine or stand more copies next to it — and almost all the interest lies in what begins after "more copies": stateless instances, a load balancer that spreads requests, a registry that knows which instances are alive, and a careful rollout of a new version under traffic. While load is small, all of this looks like overhead; at scale it is exactly where load skew, failover outages, and production crashes are born.

The central trap of this topic is assuming that scaling is linear and free: "low on capacity, so add a node." It is not, and the scalability laws say so. Amdahl's law bounds speedup by the serial fraction of the work; the USL adds contention and crosstalk penalties, so beyond some number of nodes throughput stops rising and starts to fall, because the nodes contend for shared resources and spend time coordinating. The rest is engineering discipline: stateless instances (not sticky sessions), a load balancer that accounts for capacity and latency, a registry that drops unhealthy instances, and traffic splitting that lets you roll back a bad release before it takes the service down. This topic breaks scaling into layers, from choosing the direction of growth to canarying a new version.
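To make the two laws concrete, here is a minimal Go sketch of their standard formulas. The coefficients `sigma` (contention) and `kappa` (crosstalk) are made-up values for illustration, not measurements of any real service; in practice they are fitted to load-test data.

```go
package main

import (
	"fmt"
	"math"
)

// AmdahlSpeedup returns the speedup on n nodes when a fraction
// serial of the work cannot be parallelised (0 <= serial <= 1).
func AmdahlSpeedup(serial float64, n int) float64 {
	return 1 / (serial + (1-serial)/float64(n))
}

// USLThroughput returns relative throughput on n nodes under the
// Universal Scalability Law: sigma is the contention penalty and
// kappa is the crosstalk (coherency) penalty.
func USLThroughput(sigma, kappa float64, n int) float64 {
	N := float64(n)
	return N / (1 + sigma*(N-1) + kappa*N*(N-1))
}

// USLPeak returns the node count at which USL throughput peaks,
// sqrt((1-sigma)/kappa). Past this point a new node lowers throughput.
func USLPeak(sigma, kappa float64) float64 {
	return math.Sqrt((1 - sigma) / kappa)
}

func main() {
	const sigma, kappa = 0.05, 0.001 // hypothetical coefficients
	for _, n := range []int{1, 8, 32, 64, 128} {
		fmt.Printf("n=%3d amdahl=%.2f usl=%.2f\n",
			n, AmdahlSpeedup(sigma, n), USLThroughput(sigma, kappa, n))
	}
	fmt.Printf("USL peak near n=%.0f\n", USLPeak(sigma, kappa))
}
```

With these assumed coefficients the Amdahl curve flattens toward 1/sigma, while the USL curve turns downward once the node count passes the peak.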

Topic Map

Vertical versus horizontal — growth by machine versus growth by copies: vertical is simple but hits a ceiling and a single point of failure, horizontal removes the ceiling at the cost of stateless instances, a load balancer, and consistency.
Scalability laws — Amdahl bounds speedup by the serial fraction, Gustafson keeps it linear as the problem grows, and the USL adds a crosstalk penalty beyond which a new node slows you down.
Load-balancing algorithms — round-robin, weighted, least-connections, latency-based selection, and hashing by key, plus the difference between L4 and L7 balancing.
Session affinity — stateless sessions via JWT or a shared store in Redis versus sticky sessions, which skew load and break on failover.
Service discovery — a registry (Consul, etcd, Kubernetes DNS) that maps a service name to healthy instances, with health checks and the difference between client-side and server-side discovery.
Traffic splitting — a gradual rollout of a version via canary, blue-green, and percentage splits driven by feature flags, with rollback by metrics.
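As an illustration of the weighted variant from the list above, the following Go sketch implements smooth weighted round-robin, one common way to spread requests in proportion to capacity. The `Backend` and `WeightedRR` types and the weights are hypothetical, chosen only for this example.

```go
package main

import "fmt"

// Backend is a hypothetical upstream instance with a static weight
// proportional to its capacity.
type Backend struct {
	Name    string
	Weight  int
	current int // running score used by the smooth weighted algorithm
}

// WeightedRR picks backends in proportion to their weights. Every pick
// adds each backend's weight to its score, chooses the highest score,
// then subtracts the total weight from the winner.
type WeightedRR struct {
	backends []*Backend
	total    int
}

// NewWeightedRR builds a balancer over the given backends.
func NewWeightedRR(backends ...*Backend) *WeightedRR {
	w := &WeightedRR{backends: backends}
	for _, b := range backends {
		w.total += b.Weight
	}
	return w
}

// Next returns the next backend, or nil if the pool is empty.
// It is not safe for concurrent use; a real balancer would add a mutex.
func (w *WeightedRR) Next() *Backend {
	var best *Backend
	for _, b := range w.backends {
		b.current += b.Weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	if best != nil {
		best.current -= w.total
	}
	return best
}

func main() {
	lb := NewWeightedRR(
		&Backend{Name: "big", Weight: 3},
		&Backend{Name: "small", Weight: 1},
	)
	counts := map[string]int{}
	for i := 0; i < 8; i++ {
		counts[lb.Next().Name]++
	}
	fmt.Println(counts)
}
```

Plain round-robin would send half of the requests to each machine; here the larger machine receives three requests for every one sent to the smaller, and the picks are interleaved rather than sent in bursts.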

Common Mistakes and Traps

Mistake	Consequence
Treating vertical scaling as a final architecture	You hit the ceiling of the largest machine and leave a single point of failure
Treating horizontal scaling as a free addition of instances	Without stateless instances the second pod cannot find the user's session
Assuming nodes give linear speedup	Amdahl and the USL: past some node count coordination eats the gains
Ignoring the USL's crosstalk penalty	A new node does not speed you up but slows you down — throughput falls
Round-robining across machines of unequal capacity	Weak instances are overloaded, strong ones idle — load skew
Using sticky sessions while state lives in instance memory	Load is skewed, and when a pod dies the user's session is lost
Keeping session state in process memory	Instances stop being interchangeable, so horizontal scaling and failover break
Balancing onto a static list of instances	Requests go to dead instances — there is no health-aware discovery
Shipping a new version to 100% of traffic at once	A bad release takes the whole service down with no fast rollback
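The static-list mistake in the table above is easiest to see next to its fix. The Go sketch below is an in-process stand-in for a real registry such as Consul, etcd, or Kubernetes endpoints; the `Registry` type, its methods, and the addresses are hypothetical and exist only to show the idea of filtering by the last health-check result.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Registry maps a service name to instance addresses and remembers the
// last health-check result for each. It is a hypothetical stand-in for
// a real registry, not a client for one.
type Registry struct {
	mu        sync.RWMutex
	instances map[string]map[string]bool // service -> address -> healthy
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{instances: make(map[string]map[string]bool)}
}

// Report records the outcome of a health check for one instance,
// registering the instance the first time it is seen.
func (r *Registry) Report(service, addr string, healthy bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.instances[service] == nil {
		r.instances[service] = make(map[string]bool)
	}
	r.instances[service][addr] = healthy
}

// Healthy returns the addresses that passed their last check, sorted
// so callers see a stable order.
func (r *Registry) Healthy(service string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var out []string
	for addr, ok := range r.instances[service] {
		if ok {
			out = append(out, addr)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	reg := NewRegistry()
	reg.Report("orders", "10.0.0.1:8080", true)
	reg.Report("orders", "10.0.0.2:8080", true)
	reg.Report("orders", "10.0.0.2:8080", false) // failed health check
	fmt.Println(reg.Healthy("orders"))
}
```

A balancer that asks `Healthy` before every pick stops routing to the failed instance as soon as its check is reported, which a hard-coded list cannot do.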

Interview Relevance

Scaling is a staple of the system-design part of a senior-level Go interview, and the question is not whether you know the names of the algorithms but whether you understand that growth has a cost. The interviewer checks whether you distinguish vertical from horizontal scaling by the obligations each places on the code, whether you remember that adding nodes does not scale linearly (Amdahl and the USL), whether you pick a balancing algorithm for the real instance topology, and whether you keep sessions stateless so that any instance is interchangeable.

What interviewers usually check:

The difference between vertical and horizontal scaling and why the sensible order is to squeeze vertical first, then go horizontal.
What Amdahl's law, Gustafson's law, and the Universal Scalability Law say and why past some node count throughput falls.
Which load-balancing algorithms exist (round-robin, weighted, least-connections, latency-based, hashing) and how L4 differs from L7.
Why stateless sessions are preferable to sticky ones and how sticky sessions harm balance and failover.
How service discovery works through a registry with health checks and the difference between client-side and server-side discovery.
How to safely roll out a new version via canary, blue-green, and percentage splits with rollback by metrics.
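The stateless-session point in the list above can be sketched in a few lines of Go. The `SessionStore` interface, `MemoryStore`, and `Instance` are hypothetical names for this example; `MemoryStore` only stands in for a shared store such as Redis so that the sketch runs without external services.

```go
package main

import (
	"fmt"
	"sync"
)

// SessionStore is a hypothetical interface for session state that lives
// outside any single instance.
type SessionStore interface {
	Get(sessionID string) (userID string, ok bool)
	Put(sessionID, userID string)
}

// MemoryStore stands in for a shared store such as Redis.
type MemoryStore struct {
	mu   sync.RWMutex
	data map[string]string
}

// NewMemoryStore returns an empty store.
func NewMemoryStore() *MemoryStore {
	return &MemoryStore{data: make(map[string]string)}
}

// Get looks up the user bound to a session ID.
func (m *MemoryStore) Get(sessionID string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	user, ok := m.data[sessionID]
	return user, ok
}

// Put binds a session ID to a user.
func (m *MemoryStore) Put(sessionID, userID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[sessionID] = userID
}

// Instance models one replica of the service. It holds no session state
// of its own, so any replica can serve any request.
type Instance struct {
	Name  string
	Store SessionStore
}

// Login records the session in the shared store.
func (i Instance) Login(sessionID, userID string) { i.Store.Put(sessionID, userID) }

// WhoAmI resolves the session through the shared store.
func (i Instance) WhoAmI(sessionID string) string {
	if user, ok := i.Store.Get(sessionID); ok {
		return user + " (served by " + i.Name + ")"
	}
	return "anonymous (served by " + i.Name + ")"
}

func main() {
	shared := NewMemoryStore()
	a := Instance{Name: "pod-a", Store: shared}
	b := Instance{Name: "pod-b", Store: shared}
	a.Login("s1", "alice")
	fmt.Println(b.WhoAmI("s1")) // pod-b resolves a session created via pod-a
}
```

Because neither replica owns the session, the balancer is free to send each request anywhere, and losing one replica loses no sessions.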

A typical wrong answer: "scaling is just adding instances behind a load balancer." This triggers a discussion of everything the answer skips: without stateless instances the second pod won't find the session, round-robin across machines of unequal capacity skews load, Amdahl's law caps the gain from new nodes while the USL can push it to zero and below, and shipping a release to all traffic at once gives up the only cheap way to roll back.

Why it matters

Scaling sounds like "just add more instances," but in practice it runs into physics and into coordination. Vertical scaling hits the ceiling of the largest machine and leaves a single point of failure; horizontal scaling removes the ceiling but demands stateless instances, a load balancer, and distributed consistency. Amdahl's law and the USL say it plainly — at some point a new node does not speed you up, it slows you down, as crosstalk and coordination eat the gains. Whoever runs sticky sessions instead of stateless ones, round-robins across machines of unequal capacity, and ships a new version to 100% of traffic at once gets load skew, failover outages, and a production crash with no way back.
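The alternative to shipping to 100% of traffic at once is a percentage split. The Go sketch below shows one possible way to do it, by hashing a user ID into one of 100 buckets; the function names, the `user-N` IDs, and the 5% figure are assumptions made for the example, and a real rollout would usually take the percentage from a feature-flag system or the load balancer's configuration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a user ID to a stable number in [0, 100).
func bucket(userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID)) // Write on an FNV hash never returns an error
	return h.Sum32() % 100
}

// RouteTo returns "canary" for roughly canaryPercent of users and
// "stable" for the rest. A given user always lands in the same bucket,
// so raising the percentage only moves users from stable to canary.
func RouteTo(userID string, canaryPercent uint32) string {
	if bucket(userID) < canaryPercent {
		return "canary"
	}
	return "stable"
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[RouteTo(fmt.Sprintf("user-%d", i), 5)]++
	}
	fmt.Println(counts)

	// Rolling back is a configuration change: set the percentage to 0.
	fmt.Println(RouteTo("user-42", 0))
}
```

Hashing by user rather than picking at random per request keeps each user on one version for the duration of the rollout, which makes the canary's metrics easier to compare against the stable version.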
