PostgreSQL & JDBC: How to select rows matching pairs of values

December 15, 2023

Given a sequence of pairs of (id, version) in code, how do you efficiently select all records in your DB that match? You can construct a long string with SELECT … WHERE (id = 'a' AND version = 1) OR (id = 'b' AND version = 20) OR … but that’s not very efficient. If you only had a single value then, in PostgreSQL, you could use id=ANY(?) and pass in a char array. But what if you have multiple conditions/columns that must match? Unnest to the rescue!

Here is the complete solution:

(let [id+ver-pairs [["a",1], ["b",20], ["c",1]]
      ids (map first id+ver-pairs)
      versions (map last id+ver-pairs)]
   (next.jdbc/execute! conn
      ["select * from thing t join unnest(?,?) as x(id,version)
                              on (t.id = x.id and t.version = x.version)"
        (into-array ids) (into-array versions)]))

Bonus: Producing the SQL with HoneySQL:

(honey/format
  (-> (hh/select :*)
      (hh/from [:thing :t])
      (hh/join [[:unnest
                 (into-array String ["a" "b" "c"]) ; (1)
                 (into-array [1 20 1])]
                [[:'x :id :version]]] ; (2)
               [:and [:= :t.id :x.id]
                [:= :t.version :x.version]]))
  {:quoted true})

1 Notice that you may need to provide explicit type argument to into-array so that you don’t end up with an Object array. The Postgres JDBC driver can auto-convert some Java arrays to binary representations that work with PreparedStatement.setObject, but an object array is not one of them.

2 The quote in :'x ensures that the name stays as-is, i.e. x(id,version) - otherwise it would be upper-cased (even though it doesn’t really matter, since in effect the name would be unquoted and thus case-insensitive, as identifiers are by default). See Usage > Functions: “If a keyword begins with ', the function name is formatted as a SQL entity rather than being converted to uppercase and having hyphens - converted to spaces”.

Isak mentioned that he has been using json_to_recordset with an array of objects, as in join json_to_recordset(?::json) as x(id int, version int) …, though it is possible my way is more efficient.

Bonus: Leverage composites for tuple matches with a subquery

A tip from my awesome colleague Fredrik Røtnes. In my case, the ids and versions come from the client app. But what if they come elsewhere from the DB? Here we can leverage composites:

SELECT * FROM thing WHERE ((id, version)) IN (SELECT DISTINCT other_id, other_version FROM other_table)

and in Honey:

(honey/format
    (-> (hh/select :*)
        (hh/from :thing)
        (hh/where
          [:in
           (hh/columns :id :version)
           (-> (hh/select-distinct :other-id :other-version)
               (hh/from :other-table))])))

Bonus 2: Use a multi-dimensional input with `IN …` with unnest and composites

We can combine the previous tip with unnest for multi-dimensional array input:

;; Return Honey where clause, to include in `(hh/where ...)`:
(let [id+ver-pairs [["a",1], ["b",20], ["c",1]]]
  [:in
   (hh/columns :id :version)
   (-> (hh/select :input.id :input.version)
       (hh/from [[:unnest
                  (into-array String (map first id+ver-pairs))
                  (long-array (map second id+ver-pairs))]
                 [[:'input :id :version]]]))])

Tags: database performance

PostgreSQL & JDBC: How to select rows matching pairs of values

Bonus: Leverage composites for tuple matches with a subquery

Bonus 2: Use a multi-dimensional input with IN …​ with unnest and composites

Bonus 2: Use a multi-dimensional input with `IN …` with unnest and composites