clarify reduce-over-plan requires init-value

parent 6de1175bd8
commit bc92cc027d

6 changed files with 15 additions and 8 deletions

@@ -6,6 +6,7 @@ Only accretive/fixative changes will be made from now on.
  * Address [#267](https://github.com/seancorfield/next-jdbc/issues/267) by adding the `:schema-opts` option to override the default conventions for identifying foreign keys in columns.
  * Address [#264](https://github.com/seancorfield/next-jdbc/issues/264) by letting `insert-multi!` accept empty rows (and producing an empty result vector). This improves compatibility with `clojure.java.jdbc`.
  * Address [#258](https://github.com/seancorfield/next-jdbc/issues/258) by updating all the library (driver) versions in Getting Started to match the latest versions being tested (from `deps.edn`).
+ * Attempt to clarify that when calling `reduce` on the result of `plan`, you must provide an initial value.
  * Expand examples for calling `next.jdbc.sql/find-by-keys` to show `LIKE` and `IN` clauses.
  * Update `tools.build` to 0.9.6 (and get rid of `template/pom.xml` in favor of new `:pom-data` option to `b/write-pom`).

@@ -49,7 +49,7 @@ The primary concepts behind `next.jdbc` are that you start by producing a `javax
  From a `DataSource`, either you or `next.jdbc` can create a `java.sql.Connection` via the `get-connection` function. You can specify an options hash map to `get-connection` to modify the connection that is created: `:read-only`, `:auto-commit`.

  The primary SQL execution API in `next.jdbc` is:

- * `plan` -- yields an `IReduceInit` that, when reduced, executes the SQL statement and then reduces over the `ResultSet` with as little overhead as possible.
+ * `plan` -- yields an `IReduceInit` that, when reduced with an initial value, executes the SQL statement and then reduces over the `ResultSet` with as little overhead as possible.
  * `execute!` -- executes the SQL statement and produces a vector of realized hash maps, that use qualified keywords for the column names, of the form `:<table>/<column>`. If you join across multiple tables, the qualified keywords will reflect the originating tables for each of the columns. If the SQL produces named values that do not come from an associated table, a simple, unqualified keyword will be used. The realized hash maps returned by `execute!` are `Datafiable` and thus `Navigable` (see Clojure 1.10's `datafy` and `nav` functions, and tools like [Portal](https://github.com/djblue/portal), [Reveal](https://github.com/vlaaad/reveal), and Cognitect's REBL). Alternatively, you can specify `{:builder-fn rs/as-arrays}` and produce a vector with column names followed by vectors of row values. `rs/as-maps` is the default for `:builder-fn` but there are also `rs/as-unqualified-maps` and `rs/as-unqualified-arrays` if you want unqualified `:<column>` column names (and there are also lower-case variants of all of these).
  * `execute-one!` -- executes the SQL or DDL statement and produces a single realized hash map. The realized hash map returned by `execute-one!` is `Datafiable` and thus `Navigable`.
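
Since `plan` yields an `IReduceInit`, a reduction over it must be given an initial value. A minimal sketch of that usage, assuming a `ds` datasource and an `invoice` table with a `unit_price` column (illustrative names, not from the diff):

```clojure
(require '[next.jdbc :as jdbc])

;; the 0 is the required initial value: the IReduceInit returned by plan
;; only supports reduction when an initial value is supplied
(reduce (fn [total row] (+ total (:unit_price row)))
        0
        (jdbc/plan ds ["SELECT unit_price FROM invoice"]))
```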

@@ -198,7 +198,8 @@ user=> (reduce
  14.67M
  ```

- The call to `jdbc/plan` returns an `IReduceInit` object but does not actually run the SQL. Only when the returned object is reduced is the connection obtained from the data source, the SQL executed, and the computation performed. The connection is closed automatically when the reduction is complete. The `row` in the reduction is an abstraction over the underlying (mutable) `ResultSet` object -- it is not a Clojure data structure. Because of that, you can simply access the columns via their SQL labels as shown -- you do not need to use the column-qualified name, and you do not need to worry about the database returning uppercase column names (SQL labels are not case sensitive).
+ The call to `jdbc/plan` returns an `IReduceInit` object (a "reducible collection" that requires an initial value) but does not actually run the SQL.
+ Only when the returned object is reduced is the connection obtained from the data source, the SQL executed, and the computation performed. The connection is closed automatically when the reduction is complete. The `row` in the reduction is an abstraction over the underlying (mutable) `ResultSet` object -- it is not a Clojure data structure. Because of that, you can simply access the columns via their SQL labels as shown -- you do not need to use the column-qualified name, and you do not need to worry about the database returning uppercase column names (SQL labels are not case sensitive).
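
To see why the initial value matters, a hedged sketch of the failure mode (same illustrative `ds` and `invoice` as above; the exact exception depends on the Clojure version):

```clojure
;; without an initial value, clojure.core/reduce falls back to seq-based
;; reduction, which the IReduceInit returned by plan does not support,
;; so this form throws at runtime:
#_(reduce (fn [total row] (+ total (:unit_price row)))
          (jdbc/plan ds ["SELECT unit_price FROM invoice"]))

;; transduce derives its initial value from (+), so it works directly:
(transduce (map :unit_price) + (jdbc/plan ds ["SELECT unit_price FROM invoice"]))
```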

  > Note: if you want a column name transformation to be applied here, specify `:column-fn` as an option to the `plan` call.

@@ -311,6 +312,8 @@ As of 1.1.588, two helper functions are available to make some `plan` operations
  * `next.jdbc.plan/select-one!` -- reduces over `plan` and returns part of just the first row,
  * `next.jdbc.plan/select!` -- reduces over `plan` and returns a sequence of parts of each row.

+ > Note: in both those cases, an appropriate initial value is supplied to the `reduce` (since `plan` returns an `IReduceInit` object).
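
A usage sketch for these helpers (the `ds`, table, and column names are illustrative assumptions):

```clojure
(require '[next.jdbc.plan :as plan])

;; a keyword can serve as the per-row function: returns a single value
(plan/select-one! ds :total ["SELECT SUM(unit_price) AS total FROM invoice"])

;; a vector of column names: returns those columns for every row
(plan/select! ds [:id :unit_price] ["SELECT id, unit_price FROM invoice"])
```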

  `select!` accepts a vector of column names to extract or a function to apply to each row. It is equivalent to the following:

  ```clojure
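  ;; hedged sketch, not shown in the diff: with a vector of column
  ;; names `cols`, `select!` is roughly equivalent to
  (into [] (map #(select-keys % cols)) (jdbc/plan connectable sql-params opts))
  ;; and with a function `f` applied to each row
  (into [] (map f) (jdbc/plan connectable sql-params opts))
  ```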

@@ -124,7 +124,7 @@ you can use `run!` instead of `reduce`:
  ```

  `run!` is based on `reduce` and `process-row` here takes just one argument --
- the row -- rather than the usual reducing function that takes two
+ the row -- rather than the usual reducing function that takes two.
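
A minimal sketch of the `run!` form (illustrative `ds` again; `run!` passes its own initial value of `nil` to the underlying `reduce`, so the `IReduceInit` requirement is satisfied automatically):

```clojure
;; run! is essentially (reduce (fn [_ row] (proc row)) nil coll),
;; so it works directly on the result of plan
(run! (fn [row] (println (:id row)))
      (jdbc/plan ds ["SELECT id FROM invoice"]))
```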

  The result of `plan` is also foldable in the [clojure.core.reducers](https://clojure.org/reference/reducers) sense. While you could use `execute!` to produce a vector of fully-realized rows as hash maps and then fold that vector (Clojure's vectors support fork-join parallel reduce-combine), that wouldn't be possible for very large result sets. If you fold the result of `plan`, the result set will be partitioned and processed using fork-join parallel reduce-combine. Unlike reducing over `plan`, each row **is** realized into a Clojure data structure and each batch is forked for reduction as soon as that many rows have been realized. By default, `fold`'s batch size is 512 but you can specify a different value in the 4-arity call. Once the entire result set has been read, the last (partial) batch is forked for reduction. The combining operations are forked and interleaved with the reducing operations, so the order (of forked tasks) is batch-1, batch-2, combine-1-2, batch-3, combine-1&2-3, batch-4, combine-1&2&3-4, etc. The amount of parallelization you get will depend on many factors including the number of processors, the speed of your reducing function, the speed of your combining function, and the speed with which result sets can actually be streamed from your database.
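
A hedged sketch of folding a `plan` (illustrative `ds` and `invoice`; `+` called with no arguments yields `0`, so it supplies both the per-batch initial value and the combine step):

```clojure
(require '[clojure.core.reducers :as r])

;; 4-arity fold: an explicit batch size (512 is the documented default),
;; + as the combine function, and a reducing function over realized rows
(r/fold 512 +
        (fn [total row] (+ total (:unit_price row)))
        (jdbc/plan ds ["SELECT unit_price FROM invoice"]))
```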

@@ -14,8 +14,8 @@
  * `get-connection` -- given a connectable, obtain a new `java.sql.Connection`
    from it and return that,
  * `plan` -- given a connectable and SQL + parameters or a statement,
-   return a reducible that, when reduced will execute the SQL and consume
-   the `ResultSet` produced,
+   return a reducible that, when reduced (with an initial value) will
+   execute the SQL and consume the `ResultSet` produced,
  * `execute!` -- given a connectable and SQL + parameters or a statement,
    execute the SQL, consume the `ResultSet` produced, and return a vector
    of hash maps representing the rows (@1); this can be datafied to allow

@@ -199,7 +199,10 @@
  (defn plan
    "General SQL execution function (for working with result sets).

-   Returns a reducible that, when reduced, runs the SQL and yields the result.
+   Returns a reducible that, when reduced (with an initial value), runs the
+   SQL and yields the result. `plan` returns an `IReduceInit` object so you
+   must provide an initial value when calling `reduce` on it.

    The reducible is also foldable (in the `clojure.core.reducers` sense) but
    see the **Tips & Tricks** section of the documentation for some important
    caveats about that.

@@ -38,8 +38,8 @@
  `PreparedStatement`, and `Object`, on the assumption that an `Object` can be
  turned into a `DataSource` and therefore used to get a `Connection`."
  (-execute ^clojure.lang.IReduceInit [this sql-params opts]
-   "Produce a 'reducible' that, when reduced, executes the SQL and
-   processes the rows of the `ResultSet` directly.")
+   "Produce a 'reducible' that, when reduced (with an initial value), executes
+   the SQL and processes the rows of the `ResultSet` directly.")
  (-execute-one [this sql-params opts]
    "Executes the SQL or DDL and produces the first row of the `ResultSet`
    as a fully-realized, datafiable hash map (by default).")