When working with Entity Framework, one of its strengths is the ability to load related data between entities.
When we work with relational databases, it is common for entities to be related to each other (that’s why they are called “relational” 🤗).
For example, an entity Order
may be related to an entity Customer
, and in turn, a Customer
can have multiple Orders
.
When we perform a query to obtain an Order
, we may also need information about the associated Customer
.
Entity Framework allows loading these related data easily using the Include
and ThenInclude
methods.
Let’s see each of these methods 👇.
Using Include
The Include
method is used to load related data of an entity in a single query. Its syntax is straightforward:
var orders = context.Orders
.Include(o => o.Customer)
.ToList();
In this example,
- We are retrieving all
Orders
and, additionally, loading the information of the associatedCustomer
for eachOrder
. - This translates to a single SQL query that includes a
JOIN
between theOrders
andCustomers
tables.
Practical example
Let’s imagine we have the following entities:
public class Customer
{
public int CustomerId { get; set; }
public string Name { get; set; }
public ICollection<Order> Orders { get; set; }
}
public class Order
{
public int OrderId { get; set; }
public DateTime Date { get; set; }
public int CustomerId { get; set; }
public Customer Customer { get; set; }
}
If we want to retrieve all Orders
along with the information of the associated Customer
, we can do it as follows:
var orders = context.Orders
.Include(o => o.Customer)
.ToList();
This will generate a SQL query similar to:
SELECT o.*, c.*
FROM Orders o
INNER JOIN Customers c ON o.CustomerId = c.CustomerId;
Loading multiple relationships
In some cases, you may need to load multiple relationships in a single query. For example, if an Order
has a Customer
and a Product
, you can load both relationships as follows:
var orders = context.Orders
.Include(o => o.Customer)
.Include(o => o.Product)
.ToList();
Conditional loading with relationships
Of course, we can perform conditional loads by applying Where
after Include
, applying a condition to the relational table. That is, you can do,
var orders = context.Orders
.Include(o => o.Customer)
.Where(o => o.Date.Year == 2023)
.ToList();
Using nested relationships with ThenInclude
In some cases, the relationships between entities are more complex and can involve multiple levels. For loading nested related data, we use the ThenInclude
method.
- Include performs
join
relationships with the first-level table - ThenInclude performs subsequent
join
relationships with the previously loaded tables
Practical example
For example, an Order
may be related to a Customer
, and this Customer
may have an Address
.
Suppose we have an Address
entity related to Customer
:
public class Address
{
public int AddressId { get; set; }
public string Street { get; set; }
public string City { get; set; }
public int CustomerId { get; set; }
public Customer Customer { get; set; }
}
If we want to retrieve all Orders
along with the information of the Customer
and their Addresses
, we can do it as follows:
var orders = context.Orders
.Include(o => o.Customer)
.ThenInclude(c => c.Addresses)
.ToList();
This will generate a SQL query that includes a JOIN
between the Orders
, Customers
, and Addresses
tables.
Performance considerations
Relationships, and how to work with them, have always been one of the biggest creators of potential performance issues when working with databases.
With Entity Framework, just like with any other technology for connecting to a database, we also need to be mindful of this.
Include
performs an eager load of related data. That is, it retrieves the data, and all related data at once.
This avoids the problem known as N+1 queries, which means multiple queries are made to the database to obtain related data (which can significantly impact performance).
However, on the other hand, you can also end up downloading a lot of data. This also results in generating a large and slow SQL query.
In some cases, it may be more efficient to use lazy loading or explicit loading, depending on the scenario.
In general, as a rule of thumb, load everything you need, and only what you need, in the least number of queries possible when working with databases.