# xforms
More transducers and reducing functions for Clojure(script)!

*Transducers* can be classified into three groups: regular ones, higher-order ones (which accept other transducers as arguments) and aggregators (transducers which emit only one item no matter how many went in).
Aggregators generally only make sense in the context of a higher-order transducer.

In `net.cgrand.xforms`:
* regular ones: `partition` (1 arg), `reductions`, `for`, `take-last`, `drop-last`, `sort`, `sort-by`, `wrap`, `window` and `window-by-time`
* higher-order ones: `by-key`, `into-by-key`, `multiplex`, `transjuxt`, `partition` (2+ args), `time`
* aggregators: `reduce`, `into`, `without`, `transjuxt`, `last`, `count`, `avg`, `sd`, `min`, `minimum`, `max`, `maximum`, `str`

In `net.cgrand.xforms.io`:

* `sh` to use any process as a reducible collection (of stdout lines) or as a transducer (input as stdin lines, stdout lines as output).

*Reducing functions*
* in `net.cgrand.xforms.rfs`: `min`, `minimum`, `max`, `maximum`, `str`, `str!`, `avg`, `sd`, `last` and `some`.
* in `net.cgrand.xforms.io`: `line-out` and `edn-out`.

(in `net.cgrand.xforms`)

*Transducing contexts*:
* in `net.cgrand.xforms`: `transjuxt` (for performing several transductions in a single pass), `iterator` (clojure only), `into`, `without`, `count`, `str` (2 args) and `some`.
* in `net.cgrand.xforms.io`: `line-out` (3+ args) and `edn-out` (3+ args).
* in `net.cgrand.xforms.nodejs.stream`: `transformer`.
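
As a rough illustration of what a transducing context looks like in practice, the sketch below calls a few of the `net.cgrand.xforms` ones listed above with an xform and a collection. It assumes xforms is on the classpath; the `x` alias and the sample inputs are chosen only for this example.

```clj
(require '[net.cgrand.xforms :as x])

;; count the items that survive the xform, without building a collection
(x/count (filter odd?) (range 10))              ; 5

;; concatenate the transformed items into one string
(x/str (map inc) (range 5))                     ; "12345"

;; destination, xform and source, like clojure.core/into
(x/into [] (map inc) (range 3))                 ; [1 2 3]

;; several aggregations in a single pass over the input
(x/transjuxt [(x/reduce +) x/count] (range 5))  ; [10 5]
```

Each call takes the xform first and the source collection last, so the same xform can be reused across contexts.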

*Reducible views* (in `net.cgrand.xforms.io`): `lines-in` and `edn-in`.

**Note:** it should always be safe to update to the latest xforms version; short of bugfixes, breaking changes are avoided.

## Add as a dependency

For specific coordinates see the [Releases](https://github.com/cgrand/xforms/releases) page.

## Usage

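```clj
=> (require '[net.cgrand.xforms :as x])
;; the examples below also use the reducing functions namespace as rf
=> (require '[net.cgrand.xforms.rfs :as rf])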
```
`str` and `str!` are two reducing functions to build Strings and StringBuilders in linear time.
```clj
=> (quick-bench (reduce str (range 256)))
Execution time mean : 58,714946 µs
=> (quick-bench (reduce rf/str (range 256)))
Execution time mean : 11,609631 µs
```
`for` is the transducing cousin of `clojure.core/for`:
```clj
=> (quick-bench (reduce + (for [i (range 128) j (range i)] (* i j))))
Execution time mean : 514,932029 µs
=> (quick-bench (transduce (x/for [i % j (range i)] (* i j)) + 0 (range 128)))
Execution time mean : 373,814060 µs
```

You can also use `for` like `clojure.core/for`: `(x/for [i (range 128) j (range i)] (* i j))` expands to `(eduction (x/for [i % j (range i)] (* i j)) (range 128))`.

`by-key` and `reduce` are two new transducers. Here is an example usage:
```clj
;; reimplementing group-by
(defn my-group-by [kfn coll]
  (into {} (x/by-key kfn (x/reduce conj)) coll))

;; let's go transient!
(defn my-group-by [kfn coll]
  (into {} (x/by-key kfn (x/into [])) coll))

=> (quick-bench (group-by odd? (range 256)))
Execution time mean : 29,356531 µs
=> (quick-bench (my-group-by odd? (range 256)))
Execution time mean : 20,604297 µs
```

Like `by-key`, `partition` also takes a transducer as its last argument to allow further computation on each partition.
```clj
=> (sequence (x/partition 4 (x/reduce +)) (range 16))
(6 22 38 54)
```
Padding is achieved as usual:
```clj
=> (sequence (x/partition 4 4 (repeat :pad) (x/into [])) (range 9))
([0 1 2 3] [4 5 6 7] [8 :pad :pad :pad])
```
`avg` is a transducer to compute the arithmetic mean. `transjuxt` is used to perform several transductions at once.
```clj
=> (into {} (x/by-key odd? (x/transjuxt [(x/reduce +) x/avg])) (range 256))
{false [16256 127], true [16384 128]}
=> (into {} (x/by-key odd? (x/transjuxt {:sum (x/reduce +) :mean x/avg :count x/count})) (range 256))
{false {:sum 16256, :mean 127, :count 128}, true {:sum 16384, :mean 128, :count 128}}
```
`window` is a new transducer to efficiently compute a windowed accumulator:
```clj
;; sum of last 3 items
=> (sequence (x/window 3 + -) (range 16))
(0 1 3 6 9 12 15 18 21 24 27 30 33 36 39 42)
=> (def nums (repeatedly 8 #(rand-int 42)))
#'user/nums
=> nums
(11 8 32 26 6 10 37 24)
;; avg of last 4 items
=> (sequence
     (x/window 4 rf/avg #(rf/avg %1 %2 -1))
     nums)
(11 19/2 17 77/4 18 37/2 79/4 77/4)
;; min of last 3 items
=> (sequence
     (x/window 3
       ;; accumulator: sorted map from item to its number of occurrences in the window
       (fn
         ([] (sorted-map))
         ([m] (key (first m)))
         ([m x] (update m x (fnil inc 0))))
       ;; inverse: drop one occurrence of x, removing the key once its count reaches zero
       (fn [m x]
         (let [n (dec (m x))]
           (if (zero? n)
             (dissoc m x)
             (assoc m x n)))))
     nums)
(11 8 8 8 6 6 6 10)
```

## On Partitioning

Both `by-key` and `partition` take a transducer as a parameter. This transducer is used to further process each partition.

It's worth noting that all transformed outputs are subsequently interleaved. See:
```clj
=> (sequence (x/partition 2 1 identity) (range 8))
(0 1 1 2 2 3 3 4 4 5 5 6 6 7)
=> (sequence (x/by-key odd? identity) (range 8))
([false 0] [true 1] [false 2] [true 3] [false 4] [true 5] [false 6] [true 7])
```

That's why most of the time the last stage of the sub-transducer will be an aggregator like `x/reduce` or `x/into`:
```clj
=> (sequence (x/partition 2 1 (x/into [])) (range 8))
([0 1] [1 2] [2 3] [3 4] [4 5] [5 6] [6 7])
=> (sequence (x/by-key odd? (x/into [])) (range 8))
([false [0 2 4 6]] [true [1 3 5 7]])
```

## Simple examples

`(group-by kf coll)` is `(into {} (x/by-key kf (x/into [])) coll)`.

`(plumbing/map-vals f m)` is `(into {} (x/by-key (map f)) m)`.

My faithful `(reduce-by kf f init coll)` is now `(into {} (x/by-key kf (x/reduce f init)) coll)`.

`(frequencies coll)` is `(into {} (x/by-key identity x/count) coll)`.
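
The equivalences above can be checked at the REPL. The following is a small sketch, assuming xforms is on the classpath; `coll`, `xs` and the concrete key and value functions are sample choices made for this example only.

```clj
(require '[net.cgrand.xforms :as x])

(def coll (range 8))
(def xs [:a :b :a])

;; group-by
(= (group-by odd? coll)
   (into {} (x/by-key odd? (x/into [])) coll))      ; true

;; frequencies
(= (frequencies xs)
   (into {} (x/by-key identity x/count) xs))        ; true

;; map-vals: with a single argument, by-key keeps the key and transforms the value
(into {} (x/by-key (map inc)) {:a 1 :b 2})          ; {:a 2, :b 3}

;; reduce-by: sum per key, starting from 0
(into {} (x/by-key odd? (x/reduce + 0)) coll)       ; {false 12, true 16}
```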
## On key-value pairs

Clojure's `reduce-kv` is able to reduce key-value pairs without allocating vectors or map entries: the key and value are passed as the second and third arguments of the reducing function.

Xforms allows a reducing function to advertise its support for key-value pairs (3-arg arity) by implementing the `KvRfable` protocol (in practice using the `kvrf` macro).

Several xforms transducers and transducing contexts leverage `reduce-kv` and `kvrf`. When these functions are used together, pairs can be transformed without being allocated.

<table>
  <thead>
    <tr><th>fn</th><th>kvs in?</th><th>kvs out?</th></tr>
  </thead>
  <tbody>
    <tr><td><code>for</code></td><td>when first binding is a pair</td><td>when <code>body-expr</code> is a pair</td></tr>
    <tr><td><code>reduce</code></td><td>when <code>f</code> is a kvrf</td><td>no</td></tr>
    <tr><td>1-arg <code>into</code><br>(transducer)</td><td>when <code>to</code> is a map</td><td>no</td></tr>
    <tr><td>3-arg <code>into</code><br>(transducing context)</td><td>when <code>from</code> is a map</td><td>when <code>to</code> is a map</td></tr>
    <tr><td><code>by-key</code><br>(as a transducer)</td><td>when <code>kfn</code> and <code>vfn</code> are unspecified or <code>nil</code></td><td>when <code>pair</code> is <code>vector</code> or unspecified</td></tr>
    <tr><td><code>by-key</code><br>(as a transducing context on values)</td><td>no</td><td>no</td></tr>
  </tbody>
</table>

```clj
;; plain old sequences
=> (let [m (zipmap (range 1e5) (range 1e5))]
     (crit/quick-bench
       (into {}
         (for [[k v] m]
           [k (inc v)]))))
Evaluation count : 12 in 6 samples of 2 calls.
Execution time mean : 55,150081 ms
Execution time std-deviation : 1,397185 ms

;; x/for but pairs are allocated (because of into)
=> (let [m (zipmap (range 1e5) (range 1e5))]
     (crit/quick-bench
       (into {}
         (x/for [[k v] _]
           [k (inc v)])
         m)))
Evaluation count : 18 in 6 samples of 3 calls.
Execution time mean : 39,119387 ms
Execution time std-deviation : 1,456902 ms

;; x/for but no pairs are allocated (thanks to x/into)
=> (let [m (zipmap (range 1e5) (range 1e5))]
     (crit/quick-bench
       (x/into {}
         (x/for [[k v] %]
           [k (inc v)])
         m)))
Evaluation count : 24 in 6 samples of 4 calls.
Execution time mean : 24,276790 ms
Execution time std-deviation : 364,932996 µs
```
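
For a smaller, benchmark-free view of the same idea, the sketch below first uses plain `reduce-kv`, where the key and value arrive as separate arguments, and then the `x/into` plus `x/for` combination from the last benchmark. It assumes xforms is on the classpath; the map `m` is a sample input for this example only.

```clj
(require '[net.cgrand.xforms :as x])

(def m {:a 1 :b 2})

;; clojure.core/reduce-kv: the reducing function receives acc, key and value
(reduce-kv (fn [acc k v] (assoc acc k (inc v))) {} m)   ; {:a 2, :b 3}

;; same transformation written with x/for inside x/into
(x/into {} (x/for [[k v] %] [k (inc v)]) m)             ; {:a 2, :b 3}
```

Both expressions return the same map; the difference the section describes lies only in whether intermediate pairs are allocated along the way.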
## Changelog
### 0.19.3
* Add `deps.edn` to enable usage as a [git library](https://clojure.org/guides/deps_and_cli#_using_git_libraries)
* Bump `macrovich` to make Clojure and ClojureScript provided dependencies #34
* Fix reflection warnings in `xforms.io` #35 #36
* Add compatibility with [babashka](https://github.com/babashka/babashka) #42
* Fix `x/destructuring-pair?` #44 #45
* Fix `x/into` performance hit with small maps #46 #47
* Fix reflection and shadowing warnings in tests
### 0.19.2
* Fix infinity symbol causing issues with ClojureScript #31
### 0.19.0
`time` lets you measure the time spent in one transducer (excluding time spent downstream).
```clj
=> (time ; good old Clojure time
(count (into [] (comp
(x/time "mapinc" (map inc))
(x/time "filterodd" (filter odd?))) (range 1e6))))
filterodd: 61.771738 msecs
mapinc: 143.895317 msecs
"Elapsed time: 438.34291 msecs"
500000
```
The first argument can be a function that gets passed the time (in ms);
this makes it possible, for example, to log the time instead of printing it.
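
A minimal sketch of the function form, assuming xforms is on the classpath: the `timings` atom and the `"mapinc"` label are hypothetical stand-ins for whatever logging facility you actually use.

```clj
(require '[net.cgrand.xforms :as x])

;; hypothetical log sink: collects [label ms] entries instead of printing
(def timings (atom []))

(def result
  (into []
        (x/time (fn [ms] (swap! timings conj ["mapinc" ms]))
                (map inc))
        (range 1000)))

;; @timings now holds the time measured for the (map inc) stage
```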
### 0.9.5
* Short (up to 4) literal collections (or literal collections with `:unroll` metadata) in collection positions in `x/for` are unrolled.
This means that the collection is not allocated.
If it's a collection of pairs (e.g. maps), pairs themselves won't be allocated.
### 0.9.4
* Add `x/into-by-key` shorthand
### 0.7.2
* Fix transients perf issue in ClojureScript
### 0.7.1
* Works with ClojureScript (even self-hosted).
### 0.7.0
* Added 2-arg arity to `x/count` where it acts as a transducing context e.g. `(x/count (filter odd?) (range 10))`
* Preserve type hints in `x/for` (and generally with `kvrf`).
### 0.6.0
* Added `x/reductions`
* Now if the first collection expression in `x/for` is not a placeholder then `x/for` works like `clojure.core/for` but returns an eduction and performs all iterations using reduce.

## Troubleshooting xforms in a Clojurescript dev environment
If you use xforms with ClojureScript and start your figwheel REPL from Emacs, be sure to add `cider.nrepl/cider-middleware` to your figwheel's nrepl-middleware.
```
:figwheel {...
           :nrepl-middleware [cider.nrepl/cider-middleware ;; <= that middleware
                              refactor-nrepl.middleware/wrap-refactor
                              cemerick.piggieback/wrap-cljs-repl]
           ...}
```
Otherwise a strange interaction occurs and every result from your REPL evaluations is returned as a string, e.g.:
```
cljs.user> 1
"1"
cljs.user>
```
instead of:
```
cljs.user> 1
1
cljs.user>
```

## License

Copyright © 2015-2016 Christophe Grand

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.