Seriously, not only humans can shout at you — sometimes your code does too!
When? You may ask.
The Problem Link to heading
It happens when you do something so bad in code that even the logs beg you to look at them. When I first saw the log firstResult/maxResults specified with collection fetch; applying in memory!, I ignored it — many times, I must admit. Now I feel bad about it. But what is this log even about?
It happens in the part of the code responsible for fetching a collection from the database, where entities are linked through a One-to-Many relationship.
Look at the following example:
import jakarta.persistence.*
@Entity
class Author(
@Id
@GeneratedValue
val id: Long? = null,
val name: String,
@OneToMany(mappedBy = "author", fetch = FetchType.LAZY, cascade = [CascadeType.ALL])
val books: MutableList<Book> = mutableListOf()
)
@Entity
class Book(
@Id
@GeneratedValue
val id: Long? = null,
val title: String,
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "author_id")
val author: Author
)
Related repo:
import org.springframework.data.domain.Pageable
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.data.jpa.repository.Query
import org.springframework.stereotype.Repository
@Repository
interface AuthorRepository : JpaRepository<Author, Long> {
@Query("select a from Author a left join fetch a.books order by a.id")
fun findPageWithBooks(pageable: Pageable): List<Author>
}
Query:
authorRepository.findPageWithBooks(PageRequest.of(0, 20))
Executing query above will cause firstResult/maxResults specified with collection fetch; applying in memory!.
Why it happens? Link to heading
JOIN FETCH a.books creates a Cartesian product (one author × many books → many rows). Therefore, Hibernate cannot properly limit the result to 20 authors, because 20 rows != 20 authors. That’s why it loads everything into memory and post-processes the results afterwards.
How to fix it? Link to heading
- Firstly let’s refactor our code that has one query method into two ones. It allows us to first fetch only the IDs of the authors.
- Later, we would fetch the rest part we need - authors with books only for IDs loaded from the first step.
Fixed repo:
import org.springframework.data.jpa.repository.EntityGraph
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.data.jpa.repository.Query
import org.springframework.data.jpa.repository.QueryHints
import org.springframework.data.repository.query.Param
import jakarta.persistence.QueryHint
interface AuthorRepository : JpaRepository<Author, Long> {
@Query("select a.id from Author a order by a.id")
fun findPageOfIds(pageable: Pageable): Page<Long>
@Query(
"""
select a
from Author a
left join fetch a.books
where a.id in :ids
order by a.id
"""
)
fun fetchByIdsWithBooks(@Param("ids") ids: List<Long>): List<Author>
}
Now service needs to do glueling in order to provide complete result:
import org.springframework.data.domain.PageRequest
import org.springframework.stereotype.Service
@Service
class AuthorService(private val repo: AuthorRepository) {
fun findPageWithBooks(page: Int, size: Int): List<Author> {
val ids = repo.findPageOfIds(PageRequest.of(page, size)).content
if (ids.isEmpty()) {
return emptyList()
}
return repo.fetchByIdsWithBooks(ids)
}
}
It works, because pagination happens on simple Author table and later fetching of related collection is constrained to small ID list. Since Hibernate 6, you do not longer need to hint Hibernate via @QueryHint annotation that it should not pass DISTINCT keyword to SQL due to reasons described here: https://github.com/hibernate/hibernate-orm/blob/6.0/migration-guide.adoc#distinct