Commit graph

24 commits

Author SHA1 Message Date
Jarrad
553e2392fd
fix nil pointer dereference on AWS errors (#148)
* fix nil pointer dereference on AWS errors

* return Start errors to Scan consumer

before the previous commit e465b09, client errors panicked the
reader, so consumers would pick up sharditerator errors by virtue of
their server crashing and burning.

Now that client errors are properly returned, the behaviour of
listShards is problematic because it absorbs any client errors it gets.

The result of these two things now is that if you hit an aws error, your server will go into an
endless scan loop you can't detect and can't easily recover from.

To avoid that, listShards will now stop if it hits a client error.

---------

Co-authored-by: Jarrad Whitaker <jwhitaker 📧 swift-nav.com>
2024-06-06 08:38:16 -07:00
gram-signal
6720a01733
Maintain parent/child shard ordering across shard splits/merges. (#155)
Kinesis allows clients to rely on an invariant that, for a given partition key, the order of records added to the stream will be maintained.  IE: given an input `pkey=x,val=1  pkey=x,val=2  pkey=x,val=3`, the values `1,2,3` will be seen in that order when processed by clients, so long as clients are careful.  It does so by putting all records for a single partition key into a single shard, then maintaining ordering within that shard.

However, shards can be split and merge, to distribute load better and handle per-shard throughput limits.  Kinesis does this currently by (one or many times) splitting a single shard into two or by merging two adjacent shards into one.  When this occurs, Kinesis still allows for ordering consistency by detailing shard parent/child relationships within its `listShards` outputs.  A split shard A will create children B and C, both with `ParentShardId=A`.  A merging of shards A and B into C will create a new shard C with `ParentShardId=A,AdjacentParentShardId=B`.  So long as clients fully process all records in parents (including adjacent parents) before processing the new shard, ordering will be maintained.

`kinesis-consumer` currently doesn't do this.  Instead, upon the initial (and subsequent) `listShards` call, all visible shards immediately begin processing.  Considering this case, where shards split, then merge, and each shard `X` contains a single record `rX`:

```
time ->
  B
 / \
A   D
 \ /
  C
```

record `rD` should be processed after both `rB` and `rC` are processed, and both `rB` and `rC` should wait for `rA` to be processed.  By starting goroutines immediately, any ordering of `{rA,rB,rC,rD}` might occur within the original code.

This PR utilizes the `AllGroup` as a book-keeper of fully processed shards, with the `Consumer` calling `CloseShard` once it has finished a shard.  `AllGroup` doesn't release a shard for processing until its parents have fully been processed, and the consumer just processes the shards it receives as it used to.

This PR created a new `CloseableGroup` interface rather than append to the existing `Group` interface to maintain backwards compatibility in existing code that may already implement the `Group` interface elsewhere.  Different `Group` implementations don't get the ordering described above, but the default `Consumer` does.
2024-06-06 08:37:42 -07:00
Harlow Ward
61fa84eca6
Update to use aws-sdk-go-v2 (#141) 2021-09-21 22:00:14 -07:00
jonandrewj
a334486111
Implement a WithShardClosedHandler option for the consumer (#135) 2021-07-30 14:16:15 -07:00
Jason Tackaberry
97ffabeaa5
Include ShardID in Record passed to ScanFunc (#121)
* Include ShardID in Record passed to ScanFunc
* Update mock to explicitly use kinesis.Record

Supports change wherein consumer.Record is changed from an alias of
kinesis.Record to a composition containing it.
2020-07-30 14:18:38 -07:00
Andrew Shannon Brown
f85f25c15e Add an in-memory checkpoint to the API (#103)
* Add an in-memory checkpoint to the API

* Rename memory to store

* Rename test package to store
2019-09-08 13:13:04 -07:00
Andrew Shannon Brown
14db23eaf3 Support creating an iterator with an initial timestamp (#99)
* Allow setting initial timestamp

* Fix writing to closed channel

* Allow cancelling of request
2019-08-14 09:33:35 -07:00
Harlow Ward
35c48ef1c9
Only retry expired shard iterator errors (#95)
Fixes https://github.com/harlow/kinesis-consumer/issues/92
2019-07-30 19:48:20 -07:00
Harlow Ward
a9c97d3b93 Update examples to use Store interface 2019-07-28 21:33:19 -07:00
Harlow Ward
c72f561abd
Replace Checkpoint with Store interface (#90)
As we work towards introducing consumer groups to the repository we need a more generic name for the persistence layer for storing checkpoints and leases for given shards. 

* Rename `checkpoint` to `store`
2019-07-28 21:18:40 -07:00
Harlow Ward
97fe4e66ff
Use shard broker to monitor and process new shards (#85)
* Use shard broker to start processing new shards

The addition of a shard broker will allow the consumer to be notified
when new shards are added to the stream so it can consume them.

Fixes: https://github.com/harlow/kinesis-consumer/issues/36
2019-04-09 22:03:12 -07:00
Harlow Ward
76158d24ab
Introduce ScanFunc signature and remove ScanStatus (#77)
Major changes:

```go
type ScanFunc func(r *Record) error
```

* Simplify the callback func signature by removing `ScanStatus` 
* Leverage context for cancellation 
* Add custom error `SkipCheckpoint` for special cases when we don't want to checkpoint

Minor changes:

* Use kinesis package constants for shard iterator types
* Move optional config to new file

See conversation on #75 for more details
2019-04-07 16:29:12 -07:00
lordfarhan40
2037463c62 Fix getShardID does not return more than 100 shards (#81) 2019-02-14 20:45:32 -08:00
Harlow Ward
5688ff2820 Specify stop scan when returning scan status 2018-12-28 19:34:16 -08:00
Vincent
d3b76346f5 Break down the big function and Add tests for Scan (#65) 2018-09-03 09:59:39 -07:00
Harlow Ward
23811ec99a Add test for stopping scan 2018-07-29 10:27:01 -07:00
Harlow Ward
d6ded158bf Refactor the consumer tests
A previous PR from @vincent6767 had nicer mock Kinesis client that
simplified setting up data for the tests.

Mock client pulled from:
https://github.com/harlow/kinesis-consumer/pull/64
2018-07-29 10:13:03 -07:00
Harlow Ward
fb98fbe244
Remove the client wrapper (#58)
Having an additional Client has added some confusion (https://github.com/harlow/kinesis-consumer/issues/45) on how to provide a
custom kinesis client. Allowing `WithClient` to accept a Kinesis client
it cleans up the interface.

Major changes:

* Remove the Client wrapper; prefer using kinesis client directly
* Change `ScanError` to `ScanStatus` as the return value isn't necessarily an error

Note: these are breaking changes, if you need last stable release please see here: https://github.com/harlow/kinesis-consumer/releases/tag/v0.2.0
2018-07-28 22:53:33 -07:00
Vincent
a1239221d8 Ignore IDE files and fix code based Gometalinter (#63)
- scanError.Error is removed since it is not used.
- session.New() is deprecated, The NewKinesisClient's function signature change so it can
  returns the error from the NewSession().
2018-07-24 20:10:38 -07:00
Prometheus
e6a489c76b Scanerror signals the consumer if we should continue scanning for next record & whether to checkpoint. (#54)
* remove ValidateCheckpoint

* update for checkpoint can not customize retryer

* implement the scan error as in PR 44

* at least log if record processor has error

* mistakenly removed this line

* propage error up. ignore invalid state
2018-06-08 08:40:42 -07:00
Harlow Ward
b875bb56e7 Introduce Client Interface
Testing the components of the consumer where proving difficult because
the consumer code was so tightly coupled with the Kinesis client
library.

* Extract the concept of Client interface
* Create default client w/ kinesis connection
* Test with fake client to avoid round trip to kinesis
2017-11-26 16:00:11 -08:00
Harlow Ward
afae1bea36 Use config object for optional params
After reading notes from Peter's talk I like the idea of using a config
object where consumers of the library can override the defaults.

https://peter.bourgon.org/go-best-practices-2016/#configuration
2016-05-01 12:20:44 -07:00
Harlow Ward
3aa0f72efe Add logging when records are emitted w/ record count 2016-05-01 10:43:42 -07:00
Harlow Ward
c29698550f Add config options to Consumer
The Firehose service can take a max batch size of 500. While created the
example the need for finer grained configuration was necessary.
2016-02-09 22:31:15 -08:00