Indexes in SQL Server

5 min
Advanced

An index is a data structure that improves the speed of data retrieval operations on a table.

Imagine you have a 1,000-page textbook and I ask you to find the definition of “Polymorphism”. You have two options:

Full Scan: You start on page 1, read page 2, page 3… until (hopefully) on page 750 you find it.
Index Seek: You go to the back of the book, look at the alphabetical index, search for “P”, find “Polymorphism: page 750” and go directly there.

The same happens in SQL Server. If you don’t have indexes, the engine performs a Full Table Scan (reads the entire table). If you have indexes, it goes directly to the data. The time difference can be minutes vs milliseconds.

In SQL Server, there are two fundamental types you must differentiate: Clustered and Non-Clustered.

Characteristic	Clustered	Non-Clustered
Quantity	Maximum 1 per table	Multiple per table
Physical	Physically sorts the data	Separate logical structure
Content	Contains all the row data	Contains the key + pointer
Speed	Fastest for reading	Fast, but may require a Lookup
Ideal Use	Primary Key, range columns (Dates), frequently sorted columns	Foreign keys, columns used in frequent WHERE filters

Let’s look at them in detail 👇

Clustered Index

The Clustered index is the table itself. That is, it determines the physical order in which data is stored on disk.

Think of an address book. The data itself (first and last names) is sorted alphabetically. There is no separate list; the book itself is the sorted list.

There can only be ONE Clustered Index per table. (Logically, you cannot physically order the pages of a book by Last Name and by Date of Birth at the same time).

Relationship with the Primary Key

By default, when you create a PRIMARY KEY, SQL Server automatically creates a Clustered index on that column. That’s why searches by ID are so fast.

-- Create a Clustered index explicitly (if no PK existed)
CREATE CLUSTERED INDEX IX_Clientes_Email
ON Clientes(Email);

Copied!

Choose your Primary Key carefully, it’s usually the column you search by the most (ID, Date, Code).

If you create a Clustered Index on a random column (like Estado), the entire table will be physically reordered on disk.

Non-Clustered Index

The Non-Clustered index is a structure separate from the main table. It contains the values of the indexed column and a pointer to the actual row.

Continuing with the examples, it’s the index at the back of a book. It’s a separate list sorted alphabetically that tells you: “Concept X is on page 5”. You look it up in the index and then have to go to the page (this is called a Lookup).

You can have multiple Non-Clustered indexes on a table (up to 999, although I don’t recommend it 😊).

They are ideal for columns you frequently use in WHERE or JOIN clauses but which are not the primary key.

-- Create an index to search quickly by Last Name
CREATE NONCLUSTERED INDEX IX_Clientes_Apellido
ON Clientes(Apellido);

Copied!

Now, if you run SELECT * FROM Clientes WHERE Apellido = 'García', SQL will use this index instead of reading the entire table.

Technically, an index is a data structure (usually a B-Tree or B+ Tree) that stores the values of one or more columns in a sorted order, along with pointers that tell us where to find the rest of the information.

Sometimes, going from the index to the main table (Lookup) is expensive if done many times. SQL Server allows us to “attach” additional columns to the leaf level of the Non-Clustered index so we don’t have to go fetch them from the main table. This is called a Covering Index.

-- Index that sorts by Last Name, but already "includes" the Email
CREATE NONCLUSTERED INDEX IX_Clientes_Apellido_Include_Email
ON Clientes(Apellido)
INCLUDE (Email);

Copied!

If you run:

SELECT Email FROM Clientes WHERE Apellido = 'García';

Copied!

The engine will find the last name in the index and see that the Email is already there. It doesn’t need to touch the real table! The query is instant.

Why not index everything? The Cost

If indexes are so good, why don’t we put one on every column? Because indexes are not free.

Space: They take up space on disk and in RAM.
Maintenance (The most important part): Every time you do an INSERT, UPDATE, or DELETE, SQL Server has to update the table AND ALSO all the indexes that point to that table.

A table with 20 indexes will be very fast to read (SELECT), but slow to write.

Clustered Index

Relationship with the Primary Key

Non-Clustered Index

The INCLUDE Trick (Covering Indexes)

Why not index everything? The Cost