What Is ORM

ORM stands for Object-Relational Mapping and it can be simply described as mapping of classes (objects) to database tables. The concept is almost as old as object-oriented programming itself. It's hard to find which library was exactly first, but some suggest that it was TOPLink for Smalltalk. Today we have dozens of different implementations for popular languages.

The idea of the ORM is on the surface. Directly querying databases, constructing objects using factories or some other patterns, writing insert, update and delete method for each real-world entity seems very ineffective and redundant. We start seeing repeating patterns that we want to refactor into reusable pieces of code. Another consideration here is that a lot of plain SQL inside code looks untidy. That can be solved by query builders, which have their own disadvantages and could be the topic of another article.

A Great Variety of Different Species

There are many popular implementations for every widely used modern language: Hibernate for Java, SQLAlchemy for Python, ActiveRecord for Ruby and so on. Even popular web frameworks have their own ORM implementations. For example, Eloquent in Laravel or Python's Django. Many implementations refer to ActiveRecord pattern. My favorite one is Peewee for Python, easy to use and intuitive to learn.

What ORMs Are Good for?

  • DRY. As I already said above, ORM removes repeating code. This is especially a case when you have lots of business entities. Anyway, this advantage is disputable.
  • Security. Not passing anything directly to query execution code greatly reduces the chances of messing with SQL injections.
  • Portability. When advocating ORM approach, people often mention possibility of quickly switching to another RDBMS. For me, it is a mere ephemeral advantage than a real one. How often do you want to switch a database engine in your project? If so, I'm pretty skeptical that you don't have to rewrite a lot of low-level SQL and other related code that ORM's are not taking care of. But yes, ORM will definitely save time in that case.
  • Testability. This together with a next point are probably the two main reasons for me to use ORMs. With ORM I can quickly create a database for a test run with just a few lines of code. What is also convenient, I can recreate database structure in exact form that was in any particular commit as all ORM definitions reside in code. This is a common pattern that I use when testing code with the state: build up a database in memory and populate it with test data using fixtures.
  • Deployment. Pushing the app that can roll out its own database schema from its own code is a modern approach when everything including infrastructure should be a code.

Why Not to Use ORM

It's an ongoing discussion about whether it worth using ORM or not. Somebody thinks it's an anti-pattern, someone considers it a good reason to learn SQL.

Probably the main reason why I wanted to write this article was a number of edge cases I'd encountered when decided to use ORM in my project. They all were not critical but made the pleasure of using relations mapping less appealing.

First, any ORM adds cognitive and technological overhead. It may or may not be an issue, depending on the experience of team members. But I bet that considering a number of different ORMs and respectively different approaches, you'll have to invest some time in learning the lib. ORM code is hard to read. For example, consider the following example of rewriting SQL into ORM code from Peewee documentation:

WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
  ), top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT SUM(total_sales) / 10 FROM regional_sales)
  )
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

And this is a ORM version of the query above:

reg_sales = (Order
             .select(Order.region,
                     fn.SUM(Order.amount).alias('total_sales'))
             .group_by(Order.region)
             .cte('regional_sales'))

top_regions = (reg_sales
               .select(reg_sales.c.region)
               .where(reg_sales.c.total_sales > (
                   reg_sales.select(fn.SUM(reg_sales.c.total_sales) / 10)))
               .cte('top_regions'))

query = (Order
         .select(Order.region,
                 Order.product,
                 fn.SUM(Order.quantity).alias('product_units'),
                 fn.SUM(Order.amount).alias('product_sales'))
         .where(Order.region.in_(top_regions.select(top_regions.c.region)))
         .group_by(Order.region, Order.product)
         .with_cte(regional_sales, top_regions))

Isn't it less readable, even if you are familar with python and peewee?

This is where vanilla SQL has obvious advantage, even if it comes with vendor specifics.

Second, the more advanced relationships you want to use, the more a probability that you end up writing native SQL. ORMs are unable to handle complex relationships and edge cases (for example, variations of N+1 problem, even some ORMs are providing a solution to a subset of it).

Third, ORMs can produce ineffective and redundant SQL. If you want to debug something and use query log for that, programmatically generated SQL will be much more difficult to read and analyze.

Conclusion

The point that saved ORMs for me was that you can always step down to a lower level and write plain SQL. You are not restricted to choice ORM over SQL or reverse. My current approach is to use ORM for applications with lots of simple entities, that would produce many simple CRUD queries. In reverse, analytical applications require more complex and fine-grained queries that's much faster to write in plain SQL.