The Lapidary Lemur

Musings from Brandon Weaver

Association Aggregates Explained

The last post covered some of the basics of aggregate commands, but left out a section explaining the more perilous aggregates of associations and more advanced querying against them.

Here are a few questions to get you thinking before we start. Given a model Foo which has_many Tags (key, value, foo_id):

  • How do we find the count of tags for every Foo?
  • How do we find a Foo with multiple matching tags? (name: ‘David Tennant’ AND color: ‘Blue’)

Suddenly ActiveRecord becomes very annoyingly complicated to use, but not to fear! We can still use SQL for all of this.

So now our application looks something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# Foo: {a: String, b: String, c: String}
# Tag: {key: String, value: Text, foo_id: Integer}
# Foo has_many Tags and Tag belongs_to a Foo

# Seeds

words   = IO.readlines('/usr/share/dict/words').flat_map { |w| w.chomp.downcase }
records = 100.times.map { |i| {a: words.sample, b: words.sample, c: words.sample} }
foos    = Foo.create(records)

tag_seeds = {
  name:  [
    'William Hartnell',
    'Patrick Troughton',
    'Jon Pertwee',
    'Tom Baker',
    'Peter Davison',
    'Colin Baker',
    'Sylvester McCoy',
    'Paul McGann',
    'Chris Eccleston',
    'David Tennant',
    'Matt Smith',
    'Peter Capaldi'
  ],
  place: %w(Tardis Gallifrey Kasterborous Earth),
  color: %w(Red Blue Yellow Green Black White Orange)
}

foos.each { |foo|
  tag_seeds.each { |key, values|
    foo.tags << Tag.create(key: key, value: values.sample)
  }
  foo.save
}

A Path Lesser Traveled

Normally when you start to look into ActiveRecord Queries, you’re going to see some code that looks something like this:

1
Foo.where(a: 1, b: 2, c: 3)

or perhaps you’ll see the normal escaped queries:

1
Foo.where('a = ? AND b = ? AND c = ?', 1, 2, 3)

…but lurking in the documentation you’ll find another way entirely that’s not so often advertised by guides, allowing us the same power as the string based conditionals with what I would argue as a lot more clear way.

1
Foo.where('a = :a AND b = :b AND c = :c', {a: 1, b: 2, c:3})

So why not just use the first variant like any sane developer, you might wonder. Put simply, because this allows us the full leverage of string conditionals with a lot more clarity. Try and do this with the hash syntax:

1
2
3
4
Foo.where(
  'a LIKE :a OR b LIKE :b AND created_at > :date AND length(:c) > 5',
  {a: 'a%', b: 'b%', c: 5, date: 20.days.ago}
)

Counting associations

Say you want the count of tags on a foo, how would you go about it?

1
Foo.joins(:tags).group('foos.id').count('tags.id')

Now the conundrum here is why do we need to use group? Let’s take a look at the generated SQL:

1
2
Foo.joins(:tags).group('foos.id').count('tags.id')
   (0.5ms)  SELECT COUNT(tags.id) AS count_tags_id, foos.id AS foos_id FROM "foos" INNER JOIN "tags" ON "tags"."foo_id" = "foos"."id" GROUP BY foos.id

In order to run a count on an association, we need to aggregate the records into groups that we’ll run the count against. Handy thing is, this allows us to do a few more… interesting things:

1
2
3
[35] pry(main)> Foo.joins(:tags).group('tags.key').count('tags.id')
   (0.6ms)  SELECT COUNT(tags.id) AS count_tags_id, tags.key AS tags_key FROM "foos" INNER JOIN "tags" ON "tags"."foo_id" = "foos"."id" GROUP BY tags.key
=> {"color"=>100, "name"=>100, "place"=>100}

You can run aggregates for different groups, not just the supposedly common case, hence why AR wants you to specify it. In the first case, we’re simply telling it to aggregate the tags based on the id of their parent.

Finding multiple matching tags

I’m going to put a disclaimer here and say that trying to play for Single Table Inheritance hacks like this will cause you a lot more harm than good. Now if your data model is not so friendly and forces you into this, it’s something worth remembering.

Aliased Inner Joins

SQL has a concept for this, but AR currently does not give us this power. Thankfully we have access to find_by_sql:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
search_tags = [
  {key: 'name', value: 'David Tennant'},
  {key: 'color', value: 'Blue'}
]

aliased_tags = search_tags.map.with_index { |tag, i| ["t#{i}", tag] }.to_h

sql_data = aliased_tags.reduce({
  sql: '', data: [], where: {}
}) { |state, (i, tag)|
  state[:sql]  << " INNER JOIN tags AS #{i} ON #{i}.key = ? "
  state[:data] << tag[:key]

  state[:where].merge!(i => {"#{i}_value" => tag[:value]})

  state
}

where_clause = sql_data[:where].reduce({
  sql_fragments: [], data: {}
}) { |state, (i, tag)|
  state[:sql_fragments] << "#{i}.value = :#{tag.keys.first}"
  state[:data].merge!(tag.symbolize_keys)
  state
}

where_sql = Foo.where(
  where_clause[:sql_fragments].join(' AND '),
  where_clause[:data]
).to_sql

select_sql, new_where_sql = where_sql.split('WHERE')
final_sql = select_sql + ' ' + sql_data[:sql] + " WHERE " + where_sql

Foo.find_by_sql([final_sql, *sql_data[:data]])

# The final SQL looks something like this:
SELECT "foos".* FROM "foos"
  INNER JOIN tags AS t0 ON t0.key = 'name'
  INNER JOIN tags AS t1 ON t1.key = 'color'
  WHERE (t0.value = 'David Tennant' AND t1.value = 'Blue')

Now note that this is most certainly not the best way to go about this, suggestions are quite welcome as to better ways to deal with this one.

What we’re doing here is in essence creating an on-the-fly single table inheritance to query against.

Admittedly a better way to do this currently eludes me, and I would recommend against using this on your own solutions.

One can use subqueries to circumvent this type of issue, but the solution will be similar if not slower.

Finishing up

As you can see, this design of ours quickly devolves into madness when querying against at the end of the article. In the next sections, I’ll be covering methods of database design to avoid these issues as much as possible.

Comments