IT Abyss

intermediate

SQL & Indexing

Indexes and SQL through a Go backend engineer's eyes — B-tree internals, PostgreSQL index types, the composite index, aggregation, window functions, joins, partitioning, and VACUUM.


A Go service is rarely complex on its own: a goroutine takes a request, queries PostgreSQL, and returns a response. The real fight for performance is in how the query is written and which indexes back it. database/sql and pgx give you direct SQL access but do not protect you from conceptual mistakes. A redundant index, the wrong column order in a composite index, COUNT(*) where COUNT(col) was needed, a lagging autovacuum: all of it compiles, passes local tests, and surfaces only at real data volume under load.

The traps here are not about SELECT syntax. Candidates treat an index as a free read accelerator and forget its cost on every INSERT. They expect a composite (a, b, c) index to speed up a filter on b alone. They put a B-tree on jsonb and wonder why it is useless. They confuse RANK with DENSE_RANK, call partitioning sharding, and believe a plain VACUUM returns space to the operating system. This topic dissects SQL and indexes layer by layer — from B-tree internals to the cleanup of dead row versions — so you answer each of these questions with a mechanism, not a memorized phrase.
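The left-prefix trap can be illustrated without a database. The sketch below is a toy model, not PostgreSQL's implementation: a sorted slice stands in for the leaf order of a composite index over (a, b, c), and the names `key`, `findByA`, and `findByB` are hypothetical. It shows why a filter on the leading column touches only a contiguous run, while a filter on `b` alone has to inspect every entry.

```go
package main

import (
	"fmt"
	"sort"
)

// key is one entry of a hypothetical composite index over (a, b, c).
type key struct{ a, b, c int }

// less orders entries the way a composite B-tree does: by a, then b, then c.
func less(x, y key) bool {
	if x.a != y.a {
		return x.a < y.a
	}
	if x.b != y.b {
		return x.b < y.b
	}
	return x.c < y.c
}

// buildIndex returns a copy of the rows in index order.
func buildIndex(rows []key) []key {
	idx := append([]key(nil), rows...)
	sort.Slice(idx, func(i, j int) bool { return less(idx[i], idx[j]) })
	return idx
}

// findByA filters on the leading column: binary search locates the first
// match, then the contiguous run is walked. visited counts the entries
// walked after the binary search.
func findByA(idx []key, a int) (matches, visited int) {
	start := sort.Search(len(idx), func(i int) bool { return idx[i].a >= a })
	for i := start; i < len(idx) && idx[i].a == a; i++ {
		matches++
		visited++
	}
	return matches, visited
}

// findByB filters on b alone: equal b values are scattered across the
// index order, so every entry has to be inspected.
func findByB(idx []key, b int) (matches, visited int) {
	for _, k := range idx {
		visited++
		if k.b == b {
			matches++
		}
	}
	return matches, visited
}

func main() {
	var rows []key
	for a := 0; a < 100; a++ {
		for b := 0; b < 10; b++ {
			rows = append(rows, key{a, b, a + b})
		}
	}
	idx := buildIndex(rows)

	m, v := findByA(idx, 42)
	fmt.Printf("a = 42: %d matches, %d entries visited\n", m, v)
	m, v = findByB(idx, 7)
	fmt.Printf("b = 7:  %d matches, %d entries visited\n", m, v)
}
```

In this toy data set the filter on `a` visits 10 entries out of 1000, while the filter on `b` visits all 1000 to find its 100 matches.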

Topic Map

Index Basics — what an index is under the hood, why B-tree by default, and the cost it charges on every write.
PostgreSQL Index Types — B-tree, Hash, GIN, GiST, SP-GiST, BRIN and the query class each is built for.
Composite Index — one B-tree over (a, b, c), the left-prefix rule, and why column order is a design decision.
SQL Aggregation — GROUP BY with COUNT/SUM, the difference between WHERE and HAVING, and the COUNT(*) vs COUNT(col) trap on a LEFT JOIN.
Window Functions — OVER (PARTITION BY ... ORDER BY ...), ranking without collapsing rows, and RANK vs DENSE_RANK.
Self-Join — joining a table to itself via two aliases for hierarchies and pairwise row comparison within one table.
Anti-Join — finding rows with no match via LEFT JOIN ... IS NULL or NOT EXISTS, and the NOT IN with NULL trap.
Table Partitioning — splitting one table into children by key with partition pruning; why this is not sharding.
VACUUM in PostgreSQL — cleaning up MVCC dead row versions, freezing XID, and bloat from a long-running transaction.
The N+1 Problem — why one query for the list plus one query per row makes 1 + N database round-trips, and how to collapse them into one with a JOIN or a batch.
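The round-trip arithmetic behind the N+1 entry can be sketched as follows. This is an illustration under stated assumptions: `fakeDB` is a hypothetical in-memory stand-in that only counts calls, and the SQL in the comments shows what each call would roughly correspond to against hypothetical `users` and `orders` tables.

```go
package main

import "fmt"

// fakeDB is a hypothetical in-memory stand-in for a database. It only
// counts round-trips so the two loading strategies can be compared.
type fakeDB struct {
	orders     map[int][]string // user ID -> order names
	roundTrips int
}

func newFakeDB() *fakeDB {
	return &fakeDB{orders: map[int][]string{
		1: {"book", "pen"},
		2: {"lamp"},
		3: nil,
	}}
}

// userIDs simulates: SELECT id FROM users
func (db *fakeDB) userIDs() []int {
	db.roundTrips++
	return []int{1, 2, 3}
}

// ordersFor simulates: SELECT name FROM orders WHERE user_id = $1
func (db *fakeDB) ordersFor(userID int) []string {
	db.roundTrips++
	return db.orders[userID]
}

// ordersForAll simulates:
// SELECT user_id, name FROM orders WHERE user_id = ANY($1)
func (db *fakeDB) ordersForAll(userIDs []int) map[int][]string {
	db.roundTrips++
	out := make(map[int][]string, len(userIDs))
	for _, id := range userIDs {
		out[id] = db.orders[id]
	}
	return out
}

// loadNPlusOne issues one query for the list and one per user.
func loadNPlusOne(db *fakeDB) map[int][]string {
	out := map[int][]string{}
	for _, id := range db.userIDs() {
		out[id] = db.ordersFor(id)
	}
	return out
}

// loadBatched issues one query for the list and one batch query.
func loadBatched(db *fakeDB) map[int][]string {
	return db.ordersForAll(db.userIDs())
}

func main() {
	slow := newFakeDB()
	loadNPlusOne(slow)
	fmt.Println("loop: ", slow.roundTrips, "round-trips")

	fast := newFakeDB()
	loadBatched(fast)
	fmt.Println("batch:", fast.roundTrips, "round-trips")
}
```

With three users the loop makes 1 + 3 = 4 round-trips and the batch makes 2. A single `JOIN` would bring that down to one, at the cost of repeating the user columns on every order row.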

Common Mistakes and Traps

Mistake	Consequence
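Treating an index as a free read accelerator	Missing the write cost: every `INSERT` and every non-HOT `UPDATE` adds an entry to each index, and entries left behind by `UPDATE`/`DELETE` must later be cleaned up by `VACUUM`
Believing any index is a `B-tree`	Mismatching the type: containment and key-existence queries on `jsonb` and arrays need `GIN`, not `B-tree`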
Using `Hash` for a range query	`Hash` serves equality only; a range falls back to a seq scan
Expecting an `(a, b, c)` index to speed up a filter on `b` alone	A B-tree can narrow the search only by a leading prefix; without a condition on `a` the planner typically falls back to a seq scan or a scan of the whole index
Putting a range column before an equality column in a composite index	After a range condition, columns to its right no longer narrow the scanned portion of the index; they are only checked entry by entry
Treating `COUNT(*)` and `COUNT(col)` as interchangeable	`COUNT(*)` counts the NULL-extended row, so after a `LEFT JOIN` it reports a false 1 instead of 0 for empty groups; `COUNT(col)` skips `NULL` and returns 0
Filtering by an aggregate in `WHERE`	No aggregate exists yet — filtering by `SUM`/`COUNT` belongs in `HAVING`
Confusing `RANK` and `DENSE_RANK`	After a tie `RANK` leaves a gap (1,1,3), `DENSE_RANK` does not (1,1,2)
Filtering by a window function in the `WHERE` of the same `SELECT`	Windows are computed after `WHERE` — wrap it in a subquery
Writing an anti-join via `NOT IN` with a subquery	One `NULL` in the subquery makes the predicate never true, so the query returns no rows; use `NOT EXISTS`
Calling partitioning sharding	Partitions stay on one server; sharding spreads data across nodes
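Believing a plain `VACUUM` returns disk space to the OS	It mostly marks space as reusable inside the table's files (at best it truncates empty pages at the end); `VACUUM FULL` rewrites the table and returns the space to the OS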
Loading related rows in a loop (N+1)	1 + N database round-trips instead of one `JOIN` or an `IN (...)` batch
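The `NOT IN` row in the table comes down to SQL's three-valued logic, which can be modelled in a few lines. The sketch below is a simulation, not a query against a real database: `tri`, `notIn`, and `notExists` are hypothetical names, and a nil pointer stands in for SQL `NULL`.

```go
package main

import "fmt"

// tri models SQL three-valued logic.
type tri int

const (
	False tri = iota
	True
	Unknown
)

// ptr returns a pointer to i; a nil *int stands in for SQL NULL.
func ptr(i int) *int { return &i }

// notIn evaluates `x NOT IN (list)` under SQL semantics for a non-NULL x:
// a match gives False, a NULL with no match gives Unknown.
func notIn(x int, list []*int) tri {
	sawNull := false
	for _, v := range list {
		if v == nil {
			sawNull = true
			continue
		}
		if *v == x {
			return False
		}
	}
	if sawNull {
		return Unknown
	}
	return True
}

// notExists evaluates NOT EXISTS (SELECT 1 FROM list WHERE v = x).
// A NULL never equals x, so it simply does not count as a match.
func notExists(x int, list []*int) bool {
	for _, v := range list {
		if v != nil && *v == x {
			return false
		}
	}
	return true
}

// rowsNotIn keeps only rows whose predicate is True, as WHERE does:
// both False and Unknown rows are dropped.
func rowsNotIn(outer []int, sub []*int) []int {
	var kept []int
	for _, x := range outer {
		if notIn(x, sub) == True {
			kept = append(kept, x)
		}
	}
	return kept
}

// rowsNotExists keeps rows that have no match in sub.
func rowsNotExists(outer []int, sub []*int) []int {
	var kept []int
	for _, x := range outer {
		if notExists(x, sub) {
			kept = append(kept, x)
		}
	}
	return kept
}

func main() {
	outer := []int{1, 2, 3}
	sub := []*int{ptr(1), nil} // the subquery returned 1 and NULL

	fmt.Println("NOT IN:    ", rowsNotIn(outer, sub))     // prints []
	fmt.Println("NOT EXISTS:", rowsNotExists(outer, sub)) // prints [2 3]
}
```

For rows 2 and 3 the `NOT IN` predicate evaluates to Unknown rather than True, so `WHERE` drops them along with row 1.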

Interview Relevance

SQL and indexes come up in almost every backend interview, and the question is not "do you know the word index" but whether you can reason about the write cost and about which queries an index speeds up and which it does not.

What interviewers check:

What an index is, why B-tree by default, and the cost it charges on writes.
Which index types PostgreSQL offers and the query class each is built for.
How a composite index works: the left-prefix rule and why column order matters.
The difference between WHERE and HAVING and the COUNT(*) vs COUNT(col) trap after a LEFT JOIN.
How a window function differs from an aggregate and how RANK differs from DENSE_RANK.
How a self-join and an anti-join are expressed and why NOT IN with NULL is dangerous.
How partitioning differs from sharding and when it is justified.
What VACUUM cleans, why a plain VACUUM generally does not return space to the OS, and how a long transaction holds back the cleanup horizon.
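The RANK and DENSE_RANK distinction from the list above is easiest to remember from a worked example. The function below is a minimal sketch that mirrors what `RANK() OVER (ORDER BY score DESC)` and `DENSE_RANK() OVER (ORDER BY score DESC)` would return for scores that are already sorted; the name `ranks` and the sample scores are assumptions made for illustration.

```go
package main

import "fmt"

// ranks computes RANK() and DENSE_RANK() for scores that are already
// sorted in descending order, mirroring OVER (ORDER BY score DESC).
func ranks(sorted []int) (rank, dense []int) {
	rank = make([]int, len(sorted))
	dense = make([]int, len(sorted))
	for i, s := range sorted {
		switch {
		case i == 0:
			rank[i], dense[i] = 1, 1
		case s == sorted[i-1]:
			// A tie: both functions repeat the previous value.
			rank[i], dense[i] = rank[i-1], dense[i-1]
		default:
			// RANK jumps to the row position, leaving a gap after ties.
			rank[i] = i + 1
			// DENSE_RANK moves to the next consecutive number.
			dense[i] = dense[i-1] + 1
		}
	}
	return rank, dense
}

func main() {
	rank, dense := ranks([]int{90, 90, 80})
	fmt.Println("RANK:      ", rank)  // prints [1 1 3]
	fmt.Println("DENSE_RANK:", dense) // prints [1 1 2]
}
```

Neither function collapses rows: three input rows give three output rows, which is the difference from a `GROUP BY` aggregate.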

A typical wrong answer: "an index is always good, the more of them the faster". That triggers a discussion of how every index is a tax on every write, how a redundant unused index only slows inserts down, and how the index type must match the query class (GIN for jsonb, BRIN for huge ordered tables) rather than putting a B-tree on everything.
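The "tax on every write" argument can be made concrete with a toy model. This sketch makes no claim about PostgreSQL internals: `table`, `index`, and the `writes` counter are hypothetical, and the model only counts how many structures one insert has to touch as indexes are added.

```go
package main

import "fmt"

// row is a hypothetical table row with three indexable columns.
type row struct{ id, userID, status int }

// index maps one column's value to positions in the heap.
type index struct {
	keyOf   func(row) int
	entries map[int][]int
}

// table is a toy model: one heap plus any number of secondary indexes.
// writes counts every structure touched by inserts.
type table struct {
	heap    []row
	indexes []index
	writes  int
}

// addIndex registers an index; in this sketch it must be called before
// any rows are inserted, because existing rows are not backfilled.
func (t *table) addIndex(keyOf func(row) int) {
	t.indexes = append(t.indexes, index{keyOf: keyOf, entries: map[int][]int{}})
}

// insert writes the row to the heap and adds one entry to every index.
func (t *table) insert(r row) {
	pos := len(t.heap)
	t.heap = append(t.heap, r)
	t.writes++ // the heap write
	for _, ix := range t.indexes {
		k := ix.keyOf(r)
		ix.entries[k] = append(ix.entries[k], pos)
		t.writes++ // one extra write per index
	}
}

// load inserts n rows into a table with the given number of indexes
// and returns the total number of structure writes.
func load(n, indexCount int) int {
	cols := []func(row) int{
		func(r row) int { return r.id },
		func(r row) int { return r.userID },
		func(r row) int { return r.status },
	}
	t := &table{}
	for i := 0; i < indexCount && i < len(cols); i++ {
		t.addIndex(cols[i])
	}
	for i := 0; i < n; i++ {
		t.insert(row{id: i, userID: i % 10, status: i % 3})
	}
	return t.writes
}

func main() {
	for _, n := range []int{0, 1, 3} {
		fmt.Printf("%d indexes: %d writes for 100 inserts\n", n, load(100, n))
	}
}
```

The count grows linearly: 100 writes with no indexes, 200 with one, 400 with three. An index that no query uses contributes to that total without giving anything back.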

Why it matters

A Go backend is almost always a thin layer over a database that it talks to through SQL. If you do not understand what an index costs on writes, why a filter on `b` alone cannot use a composite `(a, b)` index efficiently and typically falls back to a seq scan, how `COUNT(*)` overcounts after a `LEFT JOIN`, and why a plain `VACUUM` generally does not return disk space to the OS, you will write queries that pass tests and fail in production once the data grows.
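The `COUNT(*)` behaviour after a `LEFT JOIN` can be reproduced in memory. The sketch below simulates a hypothetical `users LEFT JOIN orders ON orders.user_id = users.id` followed by `GROUP BY users.id`; the types and function names are assumptions, and a nil pointer stands in for the `NULL` that the join produces for a user with no orders.

```go
package main

import "fmt"

// joinedRow is one row of the simulated LEFT JOIN. orderID is nil for
// the NULL-extended row emitted when a user has no orders.
type joinedRow struct {
	userID  int
	orderID *int
}

// leftJoin emits one row per (user, order) pair, or a single
// NULL-extended row for a user without orders.
func leftJoin(users []int, orders map[int][]int) []joinedRow {
	var out []joinedRow
	for _, u := range users {
		ids := orders[u]
		if len(ids) == 0 {
			out = append(out, joinedRow{userID: u})
			continue
		}
		for i := range ids {
			out = append(out, joinedRow{userID: u, orderID: &ids[i]})
		}
	}
	return out
}

// countStar mimics COUNT(*) per user: every joined row counts,
// including the NULL-extended one.
func countStar(rows []joinedRow) map[int]int {
	out := map[int]int{}
	for _, r := range rows {
		out[r.userID]++
	}
	return out
}

// countCol mimics COUNT(orders.id) per user: NULL values are skipped,
// but the group itself still appears with a count of 0.
func countCol(rows []joinedRow) map[int]int {
	out := map[int]int{}
	for _, r := range rows {
		if r.orderID != nil {
			out[r.userID]++
		} else {
			out[r.userID] += 0 // create the group without counting the NULL
		}
	}
	return out
}

func main() {
	users := []int{1, 2}                 // user 2 has no orders
	orders := map[int][]int{1: {10, 11}} // user 1 has two orders
	rows := leftJoin(users, orders)

	fmt.Println("user 2, COUNT(*):        ", countStar(rows)[2]) // prints 1
	fmt.Println("user 2, COUNT(orders.id):", countCol(rows)[2])  // prints 0
}
```

For user 1 both counts agree at 2; they diverge only for the group that exists solely because of the NULL-extended row.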
