Kinesis allows clients to rely on an invariant that, for a given partition key, the order of records added to the stream will be maintained. IE: given an input `pkey=x,val=1 pkey=x,val=2 pkey=x,val=3`, the values `1,2,3` will be seen in that order when processed by clients, so long as clients are careful. It does so by putting all records for a single partition key into a single shard, then maintaining ordering within that shard.
However, shards can be split and merge, to distribute load better and handle per-shard throughput limits. Kinesis does this currently by (one or many times) splitting a single shard into two or by merging two adjacent shards into one. When this occurs, Kinesis still allows for ordering consistency by detailing shard parent/child relationships within its `listShards` outputs. A split shard A will create children B and C, both with `ParentShardId=A`. A merging of shards A and B into C will create a new shard C with `ParentShardId=A,AdjacentParentShardId=B`. So long as clients fully process all records in parents (including adjacent parents) before processing the new shard, ordering will be maintained.
`kinesis-consumer` currently doesn't do this. Instead, upon the initial (and subsequent) `listShards` call, all visible shards immediately begin processing. Considering this case, where shards split, then merge, and each shard `X` contains a single record `rX`:
```
time ->
B
/ \
A D
\ /
C
```
record `rD` should be processed after both `rB` and `rC` are processed, and both `rB` and `rC` should wait for `rA` to be processed. By starting goroutines immediately, any ordering of `{rA,rB,rC,rD}` might occur within the original code.
This PR utilizes the `AllGroup` as a book-keeper of fully processed shards, with the `Consumer` calling `CloseShard` once it has finished a shard. `AllGroup` doesn't release a shard for processing until its parents have fully been processed, and the consumer just processes the shards it receives as it used to.
This PR created a new `CloseableGroup` interface rather than append to the existing `Group` interface to maintain backwards compatibility in existing code that may already implement the `Group` interface elsewhere. Different `Group` implementations don't get the ordering described above, but the default `Consumer` does.
* Include ShardID in Record passed to ScanFunc
* Update mock to explicitly use kinesis.Record
Supports change wherein consumer.Record is changed from an alias of
kinesis.Record to a composition containing it.
As we work towards introducing consumer groups to the repository we need a more generic name for the persistence layer for storing checkpoints and leases for given shards.
* Rename `checkpoint` to `store`
* Use shard broker to start processing new shards
The addition of a shard broker will allow the consumer to be notified
when new shards are added to the stream so it can consume them.
Fixes: https://github.com/harlow/kinesis-consumer/issues/36
Major changes:
```go
type ScanFunc func(r *Record) error
```
* Simplify the callback func signature by removing `ScanStatus`
* Leverage context for cancellation
* Add custom error `SkipCheckpoint` for special cases when we don't want to checkpoint
Minor changes:
* Use kinesis package constants for shard iterator types
* Move optional config to new file
See conversation on #75 for more details
Having an additional Client has added some confusion (https://github.com/harlow/kinesis-consumer/issues/45) on how to provide a
custom kinesis client. Allowing `WithClient` to accept a Kinesis client
it cleans up the interface.
Major changes:
* Remove the Client wrapper; prefer using kinesis client directly
* Change `ScanError` to `ScanStatus` as the return value isn't necessarily an error
Note: these are breaking changes, if you need last stable release please see here: https://github.com/harlow/kinesis-consumer/releases/tag/v0.2.0
- scanError.Error is removed since it is not used.
- session.New() is deprecated, The NewKinesisClient's function signature change so it can
returns the error from the NewSession().
* remove ValidateCheckpoint
* update for checkpoint can not customize retryer
* implement the scan error as in PR 44
* at least log if record processor has error
* mistakenly removed this line
* propage error up. ignore invalid state
Testing the components of the consumer where proving difficult because
the consumer code was so tightly coupled with the Kinesis client
library.
* Extract the concept of Client interface
* Create default client w/ kinesis connection
* Test with fake client to avoid round trip to kinesis