Details:
- Nippy will continue to support thawing OLD data that was originally compressed with Snappy.
- But Nippy will no longer support freezing NEW data with Snappy.
Motivation:
- The current Snappy implementation can cause JVM crashes in some cases [1].
- The only alternative JVM implementation that seems to be safe [2] uses JNI and
so would introduce possible incompatibility issues even for folks not using Snappy.
- Nippy already moved to the superior LZ4 as its default compression scheme in v2.7.0,
more than 9 years ago.
[1] Ref. <https://github.com/airlift/aircompressor/issues/183>
[2] Ref. <https://github.com/xerial/snappy-java>
BREAKING for the very small minority of folks who use `nippy/stress-data`.
Changes:
1. Make `nippy/stress-data` a function
It's unnecessarily wasteful to generate and store all this data in the
common case where it isn't being used.
2. Make data deterministic
The stress data will now generally be stable by default between different versions
of Nippy, etc. This will help support an upcoming test for stable serialized output.
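For example (a rough sketch; the exact opts map, e.g. `:comparable?`, is an
assumption here, check the fn's docstring for the real API):

  (require '[taoensso.nippy :as nippy])

  ;; Before: `nippy/stress-data` was a var, so the data was built on ns load.
  ;; After:  it's a fn, so the data is built only when actually requested:
  (def my-stress-data (nippy/stress-data {:comparable? true})) ; Opts assumed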
Note: also considered (but ultimately rejected) the idea of a separate
`*thaw-mapfn*` opt that would operate directly on every `thaw-from-in!`
result.
This (transducer) approach is more flexible and covers the most
common use cases just fine; having both seems excessive.
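For example, a minimal sketch of the transducer approach (the `*thaw-xform*`
var name is assumed here for illustration):

  (require '[taoensso.nippy :as nippy])

  ;; Bind a transducer that is applied to elements as they're thawed,
  ;; e.g. to redact sensitive values:
  (binding [nippy/*thaw-xform*
            (map (fn [x] (if (= x :secret) :redacted x)))]
    (nippy/thaw (nippy/freeze [1 2 :secret 3])))
  ;; => [1 2 :redacted 3]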
This change affects small strings, vectors, sets, and maps so that
they use *unsigned* element counts.
Before: counts in [0, 127] use 1 byte
After: counts in [0, 255] use 1 byte
I.e. doubles the range of counts that can be stored in 1 byte.
This change saves:
- 1 byte per count in [128, 255]
Is this advantage worth the extra complexity? Probably not in isolation,
but this is a reasonable opportunity to lay the groundwork for unsigned
element counts for future types.
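To illustrate, a minimal sketch (not Nippy's actual implementation) of writing
and reading a 1-byte *unsigned* count with standard java.io streams:

  (import '[java.io ByteArrayInputStream ByteArrayOutputStream
                    DataInputStream DataOutputStream])

  (defn write-count1 [^DataOutputStream out n]
    (assert (<= 0 n 255))
    (.writeByte out n)) ; Writes the low 8 bits of n

  (defn read-count1 [^DataInputStream in]
    (.readUnsignedByte in)) ; Reads back as [0, 255] rather than [-128, 127]

  (let [baos (ByteArrayOutputStream.)]
    (write-count1 (DataOutputStream. baos) 200)
    (read-count1  (DataInputStream. (ByteArrayInputStream. (.toByteArray baos)))))
  ;; => 200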
Before:
Longs in [ -128, 127] use 1 byte
Longs in [-32768, 32767] use 2 bytes
etc.
After:
Longs in [ -255, 255] use 1 byte
Longs in [-65535, 65535] use 2 bytes
etc.
I.e. doubles the range of longs that can be stored in 1, 2, or 4 bytes.
This change saves:
- 1 byte per long in [ 128, 255], or [ -255, -129]
- 2 bytes per long in [32768, 65535], or [-65535, -32769]
- 4 bytes per long ...
Is this advantage worth the extra complexity? Probably yes, given how
common longs (and colls of longs) are in Clojure.
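To illustrate the idea (a rough sketch, not Nippy's actual wire format): the
sign can be carried by the type id, leaving the full byte(s) free for an
*unsigned* magnitude:

  ;; Encode a long in [-255, 255] as [sign magnitude], where in practice the
  ;; sign would live in the type id and the magnitude in 1 unsigned byte:
  (defn long->sign+mag1 [n]
    (assert (<= -255 n 255))
    (if (neg? n) [:neg (- n)] [:pos n]))

  (defn sign+mag1->long [[sign mag]]
    (if (= sign :neg) (- mag) mag))

  (sign+mag1->long (long->sign+mag1 -200)) ;; => -200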