vmware-go-kcl-v2

Author	SHA1	Message	Date
Tao Jiang	c02b7a85d4	Update unit tests Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-22 22:16:06 -06:00
Tao Jiang	86cc5a1a64	Update the library reference path to new repo Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-21 13:49:47 -06:00
Fabiano Graças	f9ced84cbd	improve gofmt	2021-12-20 21:21:15 -06:00
Fabiano Graças	7538535bff	remove debug code	2021-12-20 21:21:15 -06:00
Fabiano Graças	a44513ef08	add parameter names in order to serve as suggestions and ignore explicitly below to avoid lint msgs.	2021-12-20 21:21:15 -06:00
Fabiano Graças	97c6633ea0	migrate to aws-sdk-go-v2	2021-12-20 21:21:15 -06:00
Fabiano Graças	0c204685a9	improve comments	2021-12-20 21:21:15 -06:00
Fabiano Graças	6372087bc3	removed due to the new error handling https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md#error-handling	2021-12-20 21:21:15 -06:00
Fabiano Arruda	7af9290557	Upgrade golang 1.17 (#98 ) * upgrade to golang 1.17 Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> # Conflicts: # go.mod # go.sum * improve after shell lint Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> * improve after upgrade docker image (used by the build system) Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> * remove not needed variable Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> * apply fixes after security scan (hmake test) Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> * add missing package after merge with latest master branch code. Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> * improve docker layer Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> * upgrade packages Signed-off-by: Fabiano Graças <fabiano.gracas@faro.com> Co-authored-by: Fabiano Graças <fabiano.gracas@faro.com>	2021-12-20 21:21:15 -06:00
Luca Rinaldi	0094ef5a69	improve log event (#93 ) * improve log event Signed-off-by: lucarin91 <lucarin@protonmail.com> * use %+v in template string Signed-off-by: lucarin91 <lucarin@protonmail.com>	2021-12-20 21:21:15 -06:00
Connor McKelvey	7de4607b71	Add support for lease stealing (#78 ) Fixes #4 Signed-off-by: Connor McKelvey <connormckelvey@gmail.com> Signed-off-by: Ali Hobbs <alisuehobbs@gmail.com> Co-authored-by: Ali Hobbs <alisuehobbs@gmail.com> Co-authored-by: Ali Hobbs <alisuehobbs@gmail.com>	2021-12-20 21:21:15 -06:00
Ilia Cimpoes	4a642bfa2f	Use application name as default enhanced fan-out consumer name (#91 ) * Use ApplicationName as default for EnhancedFanOutConsumerName Signed-off-by: Ilia Cimpoes <ilia.cimpoes@ellation.com> * Add tests Signed-off-by: Ilia Cimpoes <ilia.cimpoes@ellation.com>	2021-12-20 21:21:15 -06:00
Ilia Cimpoes	ddcc2d0f95	Support enhanced fan-out feature (#90 ) * Implement enhanced fan-out consumer Signed-off-by: Ilia Cimpoes <ilia.cimpoes@ellation.com> * Add test cases Signed-off-by: Ilia Cimpoes <ilia.cimpoes@ellation.com> * Small adjustments in fan-out consumer Signed-off-by: Ilia Cimpoes <ilia.cimpoes@ellation.com>	2021-12-20 21:21:15 -06:00
Aurélien Rainone	909d1774a3	Add context to ErrLeaseNotAcquired (#87 ) * clientlibrary/checkpoint: convert ErrLeaseAcquired to struct Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * clientlibrary/checkpoint: add context to ErrLeaseNotAcquired Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Use errors.As to check for ErrLeaseNotAcquired error Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	adb264717b	Fix naming convention (#85 ) Minor fix on constant naming convention. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	1044485392	Support Kinesis aggregation format (#84 ) Add support for Kinesis aggregation format to consume records published by KPL. Note: the current implementation needs to checkpoint the whole batch of de-aggregated records instead of just a portion of them. Add cache entry and exit time. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	6ff3cd1b15	Fix retry logic for dynamodb (#83 ) Adding min/max retry and throttle delay for the retryer. Also, increase the max retries to 10, which is in line with the dynamodb default retry count. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	f1982602ff	Fix data race during checkpointing (#82 ) Make sure shard is locked during checkpointing. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	499e9cf1be	Update aws go sdk and tests (#81 ) Update aws go sdk to the latest. Also, update integration tests by publishing data using both PutRecord and PutRecords. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
wgerges-discovery	384482169c	Refactor `getShardIDs` (#70 ) * Refactor * Use `nextToken` parameter as string. Use `nextToken` parameter as string instead of pointer to match the original code base. * Log the last shard token when failing. * Use aws.StringValue to get the string pointer value. Co-authored-by: Wesam Gerges <wesam.gerges.discovery@gmail.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	5dd53bf731	Add nil check before shutdown (#68 ) Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Kevin Burns	8f0d7bc8d8	Reduce log noise from found shards in worker event loop (#66 ) Signed-off-by: Kev Burns <kevburnsjr@gmail.com>	2021-12-20 21:21:15 -06:00
Aurélien Rainone	f1935bc0ff	Fix potentially delayed shutdown on shard sync (#64 ) Pull-request #62 wrongly introduced an increased delay on shutdown. Before #62 the `stop` channel could be triggered while waiting for `syncShard` milliseconds, so the function could return as soon as `stop` was received. However #62 changed this behavior by sleeping in the default case: `stop` couldn't be handled right away anymore. Instead it was handled after a whole new loop, potentially delaying shutdown by minutes (up to syncShard * 1.5 ms). This commit fixes that. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	df60778d89	Re-org code for adding jittered delay for syncShard (#63 ) Minor update for the previous commit by removing duplicated code. No functional change. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Aurélien Rainone	43a936cab3	Issue 61/add shard sync jitter (#62 ) * Add a random number generator to Worker Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Add random jitter to the worker shard sync sleep Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Add random jitter in case syncShard fails Fixes #61 Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>	2021-12-20 21:21:15 -06:00
dferstay	a35f4960a8	Make Worker.Shutdown() synchronous (#58 ) Previously, a WaitGroup was used to track executing ShardConsumers and prevent Worker.Shutdown() from returning until all ShardConsumers had completed. Unfortunately, it was possible for Shutdown() to race with the eventLoop(), leading to a situation where Worker.Shutdown() returns while a ShardConsumer is still executing. Now, we increment the WaitGroup to keep track of the eventLoop() as well as the ShardConsumers. This prevents shutdown from returning until all background go-routines have completed. Signed-off-by: Daniel Ferstay <dferstay@splunk.com>	2021-12-20 21:21:15 -06:00
Aurélien Rainone	6c9e594751	Make the lease refresh period configurable (#56 ) * Add LeaseRefreshSpanMillis in configuration For certain use cases of KCL the hard-coded value of 5s, representing the time span before the end of a lease timeout in which the current owner gets to renew its own lease, is not sufficient. When the time taken by ProcessRecords is higher than 5s, the lease gets lost and the shard may end up with another worker. This commit adds a new configuration value, that defaults to 5s, to let the user set this value to their own needs. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Slight code simplification Or readability improvement Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	9ca9d901ca	Fix error in publishing cloud watch metrics (#55 ) Reported at: https://github.com/vmware/vmware-go-kcl/issues/54 The input params are not used to set monitor service in cloudwatch Init function. The empty appName, streamName and workerID cause PutMetricData to fail with error string "Error in publishing cloudwatch metrics. Error: InvalidParameter...". Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Aurélien Rainone	c9793728a3	Fix 'get records time' metric (#53 ) The time sent to `metrics.MonitoringService.RecordGetRecordsTime` was not the time taken by GetRecords, it was the time taken by `GetRecords` and `ProcessRecords` added together. Fixes #51 Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	eb56e3b1d7	Fix broken tests (#50 ) Fix some broken unit and integ tests introduced by last commit. Tests: 1. hmake test 2. Run integration test on Goland IDE and make sure all pass. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Aurélien Rainone	21980a54e3	Expose monitoring service (#49 ) * Remove MonitoringConfiguration and export no-op service MonitoringConfiguration is not needed anymore as the user directly implements its monitoring service or uses one of the default constructors. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Provide a constructor for CloudWatchMonitoringService Unexport all fields Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Provide a constructor to PrometheusMonitoringService Unexport fields Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Remove all CloudWatch specific-stuff from config package Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * NewWorker accepts a metrics.MonitoringService Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Fix tests Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Add WithMonitoringService to config Instead of having an additional parameter to NewWorker so that the user can provide its own MonitoringService, WithMonitoringService is added to the configuration. This is much cleaner and remains in-line with the rest of the current API. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Fix tests after introduction of WithMonitoringService Also, fix tests that should have been fixed in earlier commits. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Move Prometheus into its own package Also rename it to prometheus.MonitoringService to not have to repeat Prometheus twice when using. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Move CloudWatch metrics into its own package Also rename it to cloudwatch.MonitoringService to not have to repeat Cloudwatch twice when using. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com> * Remove references to Cloudwatch in comments Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	971d748195	Fix missing init position with AT_TIMESTAMP (#44 ) AT_TIMESTAMP starts from the record at or after the specified server-side Timestamp. However, the implementation was missing. The bug was not noticed until recently because most users never use this feature. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:15 -06:00
Tao Jiang	0d91fbd443	Add generic logger support (#43 ) * Add generic logger support The current KCL has tight coupling with logrus and it causes issues for customers who use a different logging system such as zap log. The issue has been opened via: https://github.com/vmware/vmware-go-kcl/issues/27 This change creates a logger interface that abstracts over logrus and zap log. It makes it easy to add support for other logging systems in the future. The work is based on: https://www.mountedthoughts.com/golang-logger-interface/ Some updates are made in order to make the logging system easily injectable and add more unit tests. Tested against real kinesis and dynamodb as well. Signed-off-by: Tao Jiang <taoj@vmware.com> * Add lumberjack configuration options to have fine grained control Update the file log configuration by adding most of the lumberjack configuration to avoid hardcoded default values. Let the user specify the values because log retention and rotation are very important for prod environments. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Aurélien Rainone	c8a5aa1891	Fix possible deadlock with getRecords in eventLoop (#42 ) A waitgroup should always be incremented before the creation of the goroutine which decrements it (through Done) or there is the potential for deadlock. That was not the case since the wg.Add was performed after the `go getRecords() ` line. Also, since there's only one path leading to the wg.Done in getRecords, I moved wg.Done out of the getRecords function and placed it alongside the goroutine creation, thus totally removing the need to pass the waitgroup pointer to the sc instance, this led to the removal of the `waitGroup` field from the `ShardConsumer` struct. This has been tested in production and didn't create any problem. Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>	2021-12-20 21:21:14 -06:00
Tao Jiang	4f79203f44	Get rid of unused skipTableCheck (#39 )	2021-12-20 21:21:14 -06:00
Tao Jiang	46fea317de	Release shard lease after shutdown (#31 ) * Release shard lease after shutdown Currently, only the locally cached shard info is removed when a worker loses the lease. The info inside the checkpointer (dynamoDB) is not removed. This causes the lease to be held until lease expiration, and it might take too long for the shard to be ready for another worker to grab. This change releases the lease in the checkpointer immediately. The user needs to ensure appropriate checkpointing before returning from the Shutdown callback. Test: updated unit test and integration test to ensure only the shard owner has been wiped out and the checkpoint information is left intact. Signed-off-by: Tao Jiang <taoj@vmware.com> * Add code coverage reporting Add code coverage reporting for unit test. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Tao Jiang	ac8d341cb1	Revert "Remove shard info in checkpointer (#29 )" (#30 ) This reverts commit 7e382e90d5d9eb30ed38cc1ab452336860f48b57.	2021-12-20 21:21:14 -06:00
Tao Jiang	8369884952	Remove shard info in checkpointer (#29 ) Currently, only the locally cached shard info is removed when a worker loses the lease. The info inside the checkpointer (dynamoDB) is not removed. This causes the lease to be held until lease expiration, and it might take too long for the shard to be ready for another worker to grab. This change releases the lease in the checkpointer immediately. The user needs to ensure appropriate checkpointing before returning from the Shutdown callback. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Tao Jiang	fa0bbc42fe	Update worker to let it inject checkpointer and kinesis (#28 ) * Update worker to let it inject checkpointer and kinesis Add two functions to inject checkpointer and kinesis for custom implementation or adding mocks for unit tests. This change also removes worker_custom.go since it is no longer needed. Test: Update the integration tests to cover newly added functions. Signed-off-by: Tao Jiang <taoj@vmware.com> * Fix typo on the test function Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Tao Jiang	250bb2e9ff	Use AWS built-in retry logic and refactor tests (#24 ) Update the unit test and move integration test under test folder. Update retry logic by switching to AWS's default retry. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Tao Jiang	6df520b343	Remove signal handling from event loop (#20 ) Take signal handling out of event loop. Also, make the worker Shutdown idempotent and update tests. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Tao Jiang	2ca82c25ca	Add support for providing custom checkpointer (#17 ) * Add credential configuration for resources Add credentials for Kinesis, DynamoDB and Cloudwatch. See the worker_test.go to see how to use it. Signed-off-by: Tao Jiang <taoj@vmware.com> * Add support for providing custom checkpointer Provide a new constructor for adding checkpointer instead of always using default dynamodb checkpointer. The next step is to abstract out the Kinesis into a generic stream API and this will be a bigger change and will be addressed in a different PR. Test: Use the new constructor to inject dynamodb checkpointer and run the existing tests. Signed-off-by: Tao Jiang <taoj@vmware.com> * Add support for providing custom checkpointer Provide a new constructor for adding checkpointer instead of always using default dynamodb checkpointer. The next step is to abstract out the Kinesis into a generic stream API and this will be a bigger change and will be addressed in a different PR. Fix checkfmt error. Test: Use the new constructor to inject dynamodb checkpointer and run the existing tests. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Tao Jiang	c634c75ebc	Add credential configuration for resources (#14 ) Add credentials for Kinesis, DynamoDB and Cloudwatch. See the worker_test.go to see how to use it. Signed-off-by: Tao Jiang <taoj@vmware.com>	2021-12-20 21:21:14 -06:00
Tim Studd	cd343cca09	Add configuration options for AWS service endpoints (#5 ) * Add configuration options for AWS service endpoints Signed-off-by: Timothy Studd <tim@goguardian.com> * Fix KCL naming consistency issue Signed-off-by: Timothy Studd <tim@goguardian.com>	2021-12-20 21:20:13 -06:00
Tao Jiang	03685b2b19	Fix type conversion error Fix the compile issue of type conversion. int --> float64.	2021-12-20 21:20:13 -06:00
Tao Jiang	6a1a7b7da6	Fix the exponential backoff Fix the calculation of exponential backoff. ^ is the XOR operator in golang. Replaced it with math.Exp2().	2021-12-20 21:20:13 -06:00
Tao Jiang	d6b5196b55	Update import path when switching github Update import path in files when switching to github.	2021-12-20 21:19:26 -06:00
Tao Jiang	10e8ebb3ff	KCL: Fix KCL stops processing when Kinesis Internal Error Currently, KCL doesn't release the shard when returning on error, which means the worker cannot get any shard because it already holds the maximum number of shards. This change makes sure the shard is released on return. Update the log message. Test: Integration test by forcing error on reading shard to simulate Kinesis Internal error and make sure the KCL will not stop processing. Jira CNA-1995 Change-Id: Iac91579634a5023ab5ed73c6af89e4ff1a9af564	2021-12-20 21:16:38 -06:00
Tao Jiang	3163d31f28	KCL: KCL should ignore deleted parent shard After a few days of shard splitting, the parent shard will be deleted by Kinesis system. KCL should ignore the error caused by the deleted parent shard and move on. Test: Manually split shard on kcl-test stream in photon-infra account. Currently, shard3 is the parent shard of shard 4 and 5. Shard 3 has a parent shard 0 which has been deleted already. Verified the test can run and does not get stuck waiting for the parent shard. Jira CNA-2089 Change-Id: I15ed0db70ff9836313c22ccabf934a2a69379248	2021-12-20 21:16:38 -06:00
Tao Jiang	9addbb57f0	KCL: Fix random number generator Fix the random number generator by adding seed. https://stackoverflow.com/questions/12321133/golang-random-number-generator-how-to-seed-properly Jira CNA-1119 Change-Id: Idfe23d84f31a47dcf43c8025632ff6f115614d34	2021-12-20 21:16:38 -06:00


56 commits