# Golang Kinesis Consumer

Kinesis consumer applications written in Go. This library is intended to be a lightweight wrapper around the Kinesis API to read records, save checkpoints (with swappable backends), and gracefully recover from service timeouts/errors.

__Alternate serverless options:__

* [Kinesis to Firehose](http://docs.aws.amazon.com/firehose/latest/dev/writing-with-kinesis-streams.html) can be used to archive data directly to S3, Redshift, or Elasticsearch without running a consumer application.

* [Process Kinesis Streams with Golang and AWS Lambda](https://medium.com/@harlow/processing-kinesis-streams-w-aws-lambda-and-golang-264efc8f979a) for serverless processing and checkpoint management.

## Installation

Get the package source:

    $ go get github.com/harlow/kinesis-consumer

## Overview

The consumer uses a handler func that accepts a Kinesis record. The `Scan` method consumes all shards concurrently and calls the callback func as it receives records from the stream.

_Important: The default Log, Counter, and Checkpoint are no-op, which means no logs, counts, or checkpoints will be emitted when scanning the stream. See the options below to override these defaults._

```go
import(
	"context"
	"flag"
	"fmt"
	"log"
	// ...

	consumer "github.com/harlow/kinesis-consumer"
)

func main() {
	var stream = flag.String("stream", "", "Stream name")
	flag.Parse()

	// consumer
	c, err := consumer.New(*stream)
	if err != nil {
		log.Fatalf("consumer error: %v", err)
	}

	// start scan
	err = c.Scan(context.TODO(), func(r *consumer.Record) consumer.ScanStatus {
		fmt.Println(string(r.Data))

		return consumer.ScanStatus{
			StopScan:       false, // true to stop scan
			SkipCheckpoint: false, // true to skip checkpoint
		}
	})
	if err != nil {
		log.Fatalf("scan error: %v", err)
	}

	// Note: If you need to aggregate based on a specific shard, the `ScanShard`
	// method should be used instead.
}
```

## Scan status

The scan func returns a `consumer.ScanStatus`. The struct allows some basic flow control:

```go
// continue scanning
return consumer.ScanStatus{}

// continue scanning, skip saving checkpoint
return consumer.ScanStatus{SkipCheckpoint: true}

// stop scanning, return nil
return consumer.ScanStatus{StopScan: true}

// stop scanning, return error
return consumer.ScanStatus{Error: err}
```

## Checkpoint

To record the progress of the consumer in the stream, we use a checkpoint to store the last sequence number the consumer has read from a particular shard. The boolean `SkipCheckpoint` field of `consumer.ScanStatus` determines whether a checkpoint is saved for the record (set it to `true` to skip). `ScanStatus` is returned by the record processing callback.

This allows consumers to re-launch and pick up at the position in the stream where they left off.

The unique identifier for a consumer is `[appName, streamName, shardID]`.

Note: The default checkpoint is no-op, which means the scan will not persist any state and the consumer will start from the beginning of the stream each time it is restarted.

To persist scan progress, choose one of the following checkpoints:

### Redis Checkpoint

The Redis checkpoint requires App Name and Stream Name:

```go
import checkpoint "github.com/harlow/kinesis-consumer/checkpoint/redis"

// redis checkpoint
ck, err := checkpoint.New(appName)
if err != nil {
	log.Fatalf("new checkpoint error: %v", err)
}
```

### DynamoDB Checkpoint

The DynamoDB checkpoint requires Table Name, App Name, and Stream Name:

```go
import checkpoint "github.com/harlow/kinesis-consumer/checkpoint/ddb"

// ddb checkpoint
ck, err := checkpoint.New(appName, tableName)
if err != nil {
	log.Fatalf("new checkpoint error: %v", err)
}

// Override the DynamoDB client if the session needs special configuration (e.g. assume role)
myDynamoDbClient := dynamodb.New(session.New(aws.NewConfig()))

// For versions of the AWS SDK that fixed config being picked up properly, the example of
// setting region should work.
// myDynamoDbClient := dynamodb.New(session.New(aws.NewConfig()), &aws.Config{
//   Region: aws.String("us-west-2"),
// })

ck, err = checkpoint.New(*app, *table, checkpoint.WithDynamoClient(myDynamoDbClient))
if err != nil {
	log.Fatalf("new checkpoint error: %v", err)
}

// Or provide your own Retryer to customize what triggers a retry inside the checkpoint.
// See code in examples.
// ck, err := checkpoint.New(*app, *table, checkpoint.WithDynamoClient(myDynamoDbClient), checkpoint.WithRetryer(&MyRetryer{}))
```

To use the DDB checkpoint we'll also need to create a table:

```
Partition key: namespace
Sort key: shard_id
```

### Postgres Checkpoint

The Postgres checkpoint requires Table Name, App Name, Stream Name, and ConnectionString:

```go
import checkpoint "github.com/harlow/kinesis-consumer/checkpoint/postgres"

// postgres checkpoint
ck, err := checkpoint.New(app, table, connStr)
if err != nil {
	log.Fatalf("new checkpoint error: %v", err)
}
```

To use the Postgres checkpoint we'll also need to create a table:

```sql
CREATE TABLE kinesis_consumer (
	namespace text NOT NULL,
	shard_id text NOT NULL,
	sequence_number numeric NOT NULL,
	CONSTRAINT kinesis_consumer_pk PRIMARY KEY (namespace, shard_id)
);
```

The table name must match the one you specify when creating the checkpoint. The primary key composed of `namespace` and `shard_id` is mandatory for the checkpoint to run without issues and to ensure data integrity.

## Options

The consumer allows the following optional overrides.

### Kinesis Client

Override the Kinesis client if there is any special config needed:

```go
// client
client := kinesis.New(session.New(aws.NewConfig()))

// consumer
c, err := consumer.New(streamName, consumer.WithClient(client))
```

### Metrics

Add an optional counter for exposing counts of checkpoints and records processed:

```go
// counter
counter := expvar.NewMap("counters")

// consumer
c, err := consumer.New(streamName, consumer.WithCounter(counter))
```

The [expvar package](https://golang.org/pkg/expvar/) will display consumer counts:

```
"counters": {
  "checkpoints": 3,
  "records": 13005
},
```

### Logging

Logging supports the basic built-in logging library or a third-party one, so long as it implements the Logger interface.

For example, to use the built-in logging package, we wrap it with a `myLogger` structure.

```go
// A myLogger provides a minimalistic logger satisfying the Logger interface.
type myLogger struct {
	logger *log.Logger
}

// Log logs the parameters to the stdlib logger. See log.Println.
func (l *myLogger) Log(args ...interface{}) {
	l.logger.Println(args...)
}
```

The package defaults to `ioutil.Discard`, so all logs are swallowed. This can be customized with the preferred logging strategy:

```go
// logger
logger := &myLogger{
	logger: log.New(os.Stdout, "consumer-example: ", log.LstdFlags),
}

// consumer
c, err := consumer.New(streamName, consumer.WithLogger(logger))
```

To use a more complicated logging library, e.g. apex log:

```go
// alog is "github.com/apex/log"; text is "github.com/apex/log/handlers/text"
type myLogger struct {
	logger *alog.Logger
}

// Log joins the arguments into a single info-level message.
func (l *myLogger) Log(args ...interface{}) {
	l.logger.Info(fmt.Sprint(args...))
}

func main() {
	log := &myLogger{
		logger: &alog.Logger{
			Handler: text.New(os.Stderr),
			Level:   alog.DebugLevel,
		},
	}
}
```

### Opentracing

To enable integration with Opentracing, Checkpoint and Consumer are now required to accept a context as the first parameter. The context object wraps the tracing context and must be passed down to the other layers.
Another change, which should be invisible to users, is that all AWS SDK for Go calls now use the `WithContext` variants. For example, if the codebase was using `GetID(...)`, it now uses `GetIDWithContext(ctx, ...)`. This is done so we can link the spans created for AWS calls to spans created upstream within application code.

## Contributing

Please see [CONTRIBUTING.md] for more information. Thank you, [contributors]!

[LICENSE]: /MIT-LICENSE
[CONTRIBUTING.md]: /CONTRIBUTING.md

## License

Copyright (c) 2015 Harlow Ward. It is free software, and may
be redistributed under the terms specified in the [LICENSE] file.

[contributors]: https://github.com/harlow/kinesis-connectors/graphs/contributors

> [www.hward.com](http://www.hward.com)  · 
> GitHub [@harlow](https://github.com/harlow)  · 
> Twitter [@harlow_ward](https://twitter.com/harlow_ward)