Document folding over plan #125

Sean Corfield 2020-06-26 21:31:43 -07:00
parent efa37ad84f
commit 68d8f98d26
2 changed files with 13 additions and 0 deletions


@@ -2,6 +2,16 @@
This page contains various tips and tricks that make it easier to use `next.jdbc` with a variety of databases. It is mostly organized by database, but there are a few that are cross-database and those are listed first.
## Reducing and Folding with `plan`
Most of this documentation describes using `plan` specifically for reducing and notes that you can avoid the overhead of realizing rows from the `ResultSet` into Clojure data structures if your reducing function uses only functions that get column values by name. If you perform any function on the row that would require an actual hash map or a sequence, the row will be realized into a full Clojure hash map via the builder function passed in the options (or via `next.jdbc.result-set/as-maps` by default).
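For example, here is a minimal sketch of such a reduction (the datasource, table, and column names are hypothetical):

```clojure
(require '[next.jdbc :as jdbc])

;; hypothetical datasource:
(def ds (jdbc/get-datasource {:dbtype "h2" :dbname "example"}))

;; Only (:balance row) is called on each row, so no row is ever
;; realized into a full Clojure hash map during this reduction:
(reduce (fn [total row] (+ total (:balance row)))
        0
        (jdbc/plan ds ["SELECT balance FROM accounts"]))
```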
One of the benefits of reducing over `plan` is that you can stream very large result sets, very efficiently, without having the entire result set in memory (assuming your reducing function doesn't build a data structure that is too large!). See the tips below on **Streaming Result Sets**.
The result of `plan` is also foldable in the [clojure.core.reducers](https://clojure.org/reference/reducers) sense. While you could use `execute!` to produce a vector of fully-realized rows as hash maps and then fold that vector (Clojure's vectors support fork-join parallel reduce-combine), that wouldn't be possible for very large result sets. If you fold the result of `plan`, the result set will be partitioned and processed using fork-join parallel reduce-combine. Unlike reducing over `plan`, folding **does** realize each row into a Clojure data structure, and each batch is forked for reduction as soon as that many rows have been realized. By default, `fold`'s batch size is 512 but you can specify a different value in the 4-arity call. Once the entire result set has been read, the last (partial) batch is forked for reduction and then all of the reduced batches are combined.
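As an illustration, here is a sketch of folding over `plan`, reusing the hypothetical `ds` and table from the sketch above and using the 4-arity `fold` to override the default batch size (the explicit builder function is only there to keep the column keys simple and unqualified):

```clojure
(require '[clojure.core.reducers :as r]
         '[next.jdbc :as jdbc]
         '[next.jdbc.result-set :as rs])

;; Each batch of 1,024 realized rows is forked for reduction by the
;; reducing function; (+) with no arguments supplies the initial value
;; for each batch and + combines the reduced batches at the end:
(r/fold 1024
        +                                         ; combine
        (fn [total row] (+ total (:balance row))) ; reduce
        (jdbc/plan ds ["SELECT balance FROM accounts"]
                   {:builder-fn rs/as-unqualified-lower-maps}))
```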
There is no back pressure here, so if your reducing function is slow, you may end up with more of the realized result set in memory than your system can cope with. There is also currently no attempt to combine the reduced batches until the entire result set has been processed, which may also contribute to this issue.
## CLOB & BLOB SQL Types
Columns declared with the `CLOB` or `BLOB` SQL types are typically rendered into Clojure result sets as database-specific custom types but they should implement `java.sql.Clob` or `java.sql.Blob` (as appropriate). In general, you can only read the data out of those Java objects during the current transaction, which effectively means that you need to do it either inside the reduction (for `plan`) or inside the result set builder (for `execute!` or `execute-one!`). If you always treat these types the same way for all columns across the whole of your application, you could simply extend `next.jdbc.result-set/ReadableColumn` to `java.sql.Clob` (and/or `java.sql.Blob`). Here's an example for reading `CLOB` into a `String`:
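A minimal sketch of such an extension, assuming that draining the `Clob` via `slurp` of its character stream is acceptable for your data sizes:

```clojure
(require '[next.jdbc.result-set :as rs])

;; Convert every CLOB column to a String as it is read from the
;; ResultSet, by extending ReadableColumn to java.sql.Clob:
(extend-protocol rs/ReadableColumn
  java.sql.Clob
  (read-column-by-label [^java.sql.Clob v _]
    (slurp (.getCharacterStream v)))
  (read-column-by-index [^java.sql.Clob v _2 _3]
    (slurp (.getCharacterStream v))))
```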


@@ -176,6 +176,9 @@
"General SQL execution function (for working with result sets). "General SQL execution function (for working with result sets).
Returns a reducible that, when reduced, runs the SQL and yields the result.
The reducible is also foldable (in the `clojure.core.reducers` sense) but
see the **Tips & Tricks** section of the documentation for some important
caveats about that.
Can be called on a `PreparedStatement`, a `Connection`, or something that can
produce a `Connection` via a `DataSource`.