25 commits

Author SHA1 Message Date
Tao Jiang
499e9cf1be Update aws go sdk and tests (#81)
Update aws go sdk to the latest. Also, update
integration tests by publishing data using both
PutRecord and PutRecords.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:15 -06:00
wgerges-discovery
384482169c Refactor getShardIDs (#70)
* Refactor

* Use `nextToken` parameter as string.

Use the `nextToken` parameter as a string instead of a pointer to match the original code base.

* Log the last shard token when failing.

* Use aws.StringValue to get the string pointer value.

Co-authored-by: Wesam Gerges <wesam.gerges.discovery@gmail.com>
2021-12-20 21:21:15 -06:00
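A minimal sketch of the pagination pattern this refactor describes, assuming a simplified setup: `page`, `lister`, and the fake in `main` are hypothetical stand-ins rather than the library's actual types, and `stringValue` is a local helper that mirrors the nil-safe behaviour of `aws.StringValue` so the example runs without the AWS SDK.

```go
package main

import "fmt"

// page is a hypothetical, simplified stand-in for one ListShards response.
type page struct {
	ShardIDs  []string
	NextToken *string
}

// lister fetches one page; an empty token asks for the first page.
type lister func(nextToken string) (page, error)

// stringValue is a local stand-in for aws.StringValue: nil becomes "".
func stringValue(p *string) string {
	if p == nil {
		return ""
	}
	return *p
}

// getShardIDs follows next tokens until none is returned. The token is
// kept as a plain string and is included in the error on failure.
func getShardIDs(list lister) ([]string, error) {
	var ids []string
	token := ""
	for {
		p, err := list(token)
		if err != nil {
			return nil, fmt.Errorf("listing shards failed at token %q: %w", token, err)
		}
		ids = append(ids, p.ShardIDs...)
		token = stringValue(p.NextToken)
		if token == "" {
			return ids, nil
		}
	}
}

func main() {
	next := "t1"
	fake := func(token string) (page, error) {
		if token == "" {
			return page{ShardIDs: []string{"shard-0", "shard-1"}, NextToken: &next}, nil
		}
		return page{ShardIDs: []string{"shard-2"}}, nil
	}
	ids, err := getShardIDs(fake)
	fmt.Println(ids, err)
}
```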
Tao Jiang
5dd53bf731 Add nil check before shutdown (#68)
Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:15 -06:00
Kevin Burns
8f0d7bc8d8 Reduce log noise from found shards in worker event loop (#66)
Signed-off-by: Kev Burns <kevburnsjr@gmail.com>
2021-12-20 21:21:15 -06:00
Aurélien Rainone
f1935bc0ff Fix potentially delayed shutdown on shard sync (#64)
Pull request #62 wrongly introduced an increased delay on
shutdown.

Before #62 the `stop` channel could be triggered while waiting for
`syncShard` milliseconds, so the function could return as soon as
`stop` was received.

However, #62 changed this behavior by sleeping in the default case:
`stop` couldn't be handled right away anymore. Instead it was
handled after a whole new loop, potentially delaying shutdown by
minutes (up to `syncShard` * 1.5 ms).

This commit fixes that.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>
2021-12-20 21:21:15 -06:00
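The behavior described in this message (being able to react to `stop` while waiting out the sync interval) can be sketched with a `select` over a timer and the stop channel. This is an illustration under assumptions, not the code from the commit: `waitOrStop` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// waitOrStop waits for d, but returns true immediately if stop is closed
// (or receives a value) first. It returns false when the full wait elapsed.
func waitOrStop(stop <-chan struct{}, d time.Duration) bool {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-stop:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	stop := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(stop)
	}()

	start := time.Now()
	stopped := waitOrStop(stop, time.Minute)
	// The wait is cut short as soon as stop is closed.
	fmt.Println("stopped early:", stopped, "after", time.Since(start).Round(time.Millisecond))
}
```

By contrast, a plain `time.Sleep` in a `default` branch cannot be interrupted, which is the delay the message describes.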
Tao Jiang
df60778d89 Re-org code for adding jittered delay for syncShard (#63)
Minor update for the previous commit by removing duplicated code.
No functional change.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:15 -06:00
Aurélien Rainone
43a936cab3 Issue 61/add shard sync jitter (#62)
* Add a random number generator to Worker

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Add random jitter to the worker shard sync sleep

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Add random jitter in case syncShard fails

Fixes #61

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>
2021-12-20 21:21:15 -06:00
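One plausible way to add random jitter to a sync sleep is sketched below. The exact formula is an assumption chosen to give a delay between 0.5x and 1.5x of the base interval; `jitteredDelay` and `newRNG` are hypothetical helper names, not identifiers taken from the commit.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// newRNG returns a random number generator owned by the caller, so each
// worker can keep its own source instead of sharing the global one.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// jitteredDelay returns a delay in [base/2, base/2+base) milliseconds,
// i.e. roughly 0.5x to 1.5x of the configured interval.
func jitteredDelay(rng *rand.Rand, baseMillis int) time.Duration {
	if baseMillis <= 0 {
		return 0
	}
	ms := baseMillis/2 + rng.Intn(baseMillis)
	return time.Duration(ms) * time.Millisecond
}

func main() {
	rng := newRNG(time.Now().UnixNano())
	for i := 0; i < 3; i++ {
		fmt.Println(jitteredDelay(rng, 5000))
	}
}
```

Spreading the sleep this way keeps many workers that started together from syncing shards at the same instant.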
dferstay
a35f4960a8 Make Worker.Shutdown() synchronous (#58)
Previously, a WaitGroup was used to track executing ShardConsumers
and prevent Worker.Shutdown() from returning until all ShardConsumers
had completed.  Unfortunately, it was possible for Shutdown() to race
with the eventLoop(), leading to a situation where Worker.Shutdown()
returns while a ShardConsumer is still executing.

Now, we increment the WaitGroup to keep track of the eventLoop() as well
as the ShardConsumers.  This prevents shutdown from returning until all
background go-routines have completed.

Signed-off-by: Daniel Ferstay <dferstay@splunk.com>
2021-12-20 21:21:15 -06:00
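A simplified sketch of the idea in this message: the WaitGroup counts the event loop itself as well as the consumers it spawns, so `Shutdown` cannot return while either is still running. The `worker` type here is a hypothetical reduction for illustration and does not reproduce the library's Worker.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// worker is a hypothetical, stripped-down stand-in for the real Worker.
type worker struct {
	stop    chan struct{}
	wg      sync.WaitGroup
	running int32 // number of live background goroutines
}

func newWorker() *worker { return &worker{stop: make(chan struct{})} }

// Start registers the event loop with the WaitGroup before launching it.
func (w *worker) Start(consumers int) {
	w.wg.Add(1)
	atomic.AddInt32(&w.running, 1)
	go w.eventLoop(consumers)
}

func (w *worker) eventLoop(consumers int) {
	defer w.done()
	for i := 0; i < consumers; i++ {
		// Safe to Add here: the event loop's own count is still held,
		// so the counter cannot reach zero underneath Shutdown.
		w.wg.Add(1)
		atomic.AddInt32(&w.running, 1)
		go func() {
			defer w.done()
			<-w.stop
			time.Sleep(5 * time.Millisecond) // simulate consumer cleanup
		}()
	}
	<-w.stop
}

func (w *worker) done() {
	atomic.AddInt32(&w.running, -1)
	w.wg.Done()
}

// Running reports how many background goroutines are still alive.
func (w *worker) Running() int32 { return atomic.LoadInt32(&w.running) }

// Shutdown signals stop and blocks until every goroutine has finished.
func (w *worker) Shutdown() {
	close(w.stop)
	w.wg.Wait()
}

func main() {
	w := newWorker()
	w.Start(3)
	w.Shutdown()
	fmt.Println("goroutines still running after Shutdown:", w.Running())
}
```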
Tao Jiang
eb56e3b1d7 Fix broken tests (#50)
Fix some broken unit and integration tests introduced by the last commit.

Tests:
1. hmake test
2. Run the integration tests in the GoLand IDE and make sure they all pass.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:15 -06:00
Aurélien Rainone
21980a54e3 Expose monitoring service (#49)
* Remove MonitoringConfiguration and export no-op service

MonitoringConfiguration is not needed anymore as the user directly
implements its monitoring service or uses one of the default constructors.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Provide a constructor for CloudWatchMonitoringService

Unexport all fields

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Provide a constructor to PrometheusMonitoringService

Unexport fields

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Remove all CloudWatch specific-stuff from config package

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* NewWorker accepts a metrics.MonitoringService

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Fix tests

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Add WithMonitoringService to config

Instead of having an additional parameter to NewWorker so that the
user can provide its own MonitoringService, WithMonitoringService
is added to the configuration. This is much cleaner and remains
in-line with the rest of the current API.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Fix tests after introduction of WithMonitoringService

Also, fix tests that should have been fixed in earlier commits.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Move Prometheus into its own package

Also rename it to prometheus.MonitoringService to not have to repeat
Prometheus twice when using.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Move CloudWatch metrics into its own package

Also rename it to cloudwatch.MonitoringService to not have to repeat
Cloudwatch twice when using.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>

* Remove references to Cloudwatch in comments

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>
2021-12-20 21:21:15 -06:00
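The configuration-based injection described in this entry might look roughly like the sketch below. All types are simplified, hypothetical stand-ins: the one-method `MonitoringService` interface, `Config`, `Worker`, and `countingService` are assumptions made for illustration and do not reproduce the package's real API. Only the names `WithMonitoringService`, `NewWorker`, and the no-op default come from the message above.

```go
package main

import "fmt"

// MonitoringService is a hypothetical one-method version of the interface.
type MonitoringService interface {
	IncrRecordsProcessed(shard string, count int)
}

// noopService is the default: it silently drops every metric.
type noopService struct{}

func (noopService) IncrRecordsProcessed(string, int) {}

// countingService is a toy user-supplied implementation.
type countingService struct{ total int }

func (c *countingService) IncrRecordsProcessed(_ string, n int) { c.total += n }

// Config is a hypothetical stand-in for the worker configuration.
type Config struct {
	monitoring MonitoringService
}

// NewConfig starts with the no-op service so monitoring is optional.
func NewConfig() *Config { return &Config{monitoring: noopService{}} }

// WithMonitoringService follows the builder style: set and return the config.
func (c *Config) WithMonitoringService(m MonitoringService) *Config {
	c.monitoring = m
	return c
}

// Worker reads its monitoring service from the config, not from an
// extra constructor parameter.
type Worker struct{ cfg *Config }

func NewWorker(cfg *Config) *Worker { return &Worker{cfg: cfg} }

func (w *Worker) process(shard string, records int) {
	w.cfg.monitoring.IncrRecordsProcessed(shard, records)
}

func main() {
	svc := &countingService{}
	w := NewWorker(NewConfig().WithMonitoringService(svc))
	w.process("shard-0", 5)
	w.process("shard-1", 2)
	fmt.Println("records processed:", svc.total)
}
```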
Tao Jiang
0d91fbd443 Add generic logger support (#43)
* Add generic logger support

The current KCL is tightly coupled with logrus, which causes
issues for customers who want to use a different logging system such as zap.
The issue has been opened via:
https://github.com/vmware/vmware-go-kcl/issues/27

This change creates a logger interface that abstracts over
logrus and zap. It makes it easy to add support for other
logging systems in the future. The work is based on:
https://www.mountedthoughts.com/golang-logger-interface/

Some updates are made to make the logging system easily
injectable and to add more unit tests.

Tested against real Kinesis and DynamoDB as well.

Signed-off-by: Tao Jiang <taoj@vmware.com>

* Add lumberjack configuration options to have fine grained control

Update the file log configuration by adding most of the lumberjack
configuration options to avoid hardcoded default values. Let the user specify
the values because log retention and rotation are very important
for production environments.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
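A minimal sketch of the kind of logger abstraction this entry describes. The two-method `Logger` interface and both implementations are assumptions for illustration; the real interface is likely larger, and real adapters would wrap logrus or zap rather than the standard library.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// Logger is a hypothetical, reduced logging interface. Library code
// depends only on this, never on a concrete logging package.
type Logger interface {
	Infof(format string, args ...interface{})
	Errorf(format string, args ...interface{})
}

// stdLogger adapts the standard library logger to the interface.
type stdLogger struct{ l *log.Logger }

func (s stdLogger) Infof(format string, args ...interface{}) {
	s.l.Printf("INFO "+format, args...)
}

func (s stdLogger) Errorf(format string, args ...interface{}) {
	s.l.Printf("ERROR "+format, args...)
}

// memoryLogger records lines in memory, which is handy in unit tests.
type memoryLogger struct{ lines []string }

func (m *memoryLogger) Infof(format string, args ...interface{}) {
	m.lines = append(m.lines, "INFO "+fmt.Sprintf(format, args...))
}

func (m *memoryLogger) Errorf(format string, args ...interface{}) {
	m.lines = append(m.lines, "ERROR "+fmt.Sprintf(format, args...))
}

// syncShards stands in for library code that receives an injected logger.
func syncShards(logger Logger, found int) {
	logger.Infof("found %d shards", found)
}

func main() {
	syncShards(stdLogger{l: log.New(os.Stderr, "", log.LstdFlags)}, 4)

	m := &memoryLogger{}
	syncShards(m, 4)
	fmt.Println(m.lines)
}
```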
Aurélien Rainone
c8a5aa1891 Fix possible deadlock with getRecords in eventLoop (#42)
A waitgroup should always be incremented before the creation of the
goroutine which decrements it (through Done) or there is the
potential for deadlock.
That was not the case since the wg.Add was performed after the
`go getRecords() ` line.

Also, since there's only one path leading to the wg.Done in getRecords,
I moved wg.Done out of the getRecords function and placed it
alongside the goroutine creation, thus totally removing the need to
pass the waitgroup pointer to the sc instance. This led to the
removal of the `waitGroup` field from the `ShardConsumer` struct.

This has been tested in production and didn't create any problem.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>
2021-12-20 21:21:14 -06:00
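The ordering rule from this message, sketched in isolation: `wg.Add` runs before the `go` statement, and `wg.Done` sits next to the goroutine creation instead of inside the called function. `fetchAll` and the `getRecords` callback are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll runs getRecords for each shard concurrently and returns only
// after every call has finished.
func fetchAll(shards []string, getRecords func(shard string)) {
	var wg sync.WaitGroup
	for _, s := range shards {
		wg.Add(1) // increment before the goroutine exists
		go func(shard string) {
			defer wg.Done() // Done lives beside the goroutine creation
			getRecords(shard)
		}(s)
	}
	wg.Wait()
}

func main() {
	done := make(chan string, 3)
	fetchAll([]string{"shard-0", "shard-1", "shard-2"}, func(s string) {
		done <- s
	})
	fmt.Println("completed calls:", len(done))
}
```

Because `getRecords` no longer touches the WaitGroup, it does not need a pointer to it, which matches the field removal described above.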
Tao Jiang
fa0bbc42fe Update worker to let it inject checkpointer and kinesis (#28)
* Update worker to let it inject checkpointer and kinesis

Add two functions to inject a checkpointer and Kinesis client for custom
implementations or for adding mocks in unit tests.

This change also removes worker_custom.go since it is no longer
needed.

Test:
  Update the integration tests to cover newly added functions.

Signed-off-by: Tao Jiang <taoj@vmware.com>

* Fix typo on the test function

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
Tao Jiang
250bb2e9ff Use AWS built-in retry logic and refactor tests (#24)
Update the unit test and move integration test under test folder.
Update retry logic by switching to AWS's default retry.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
Tao Jiang
6df520b343 Remove signal handling from event loop (#20)
Take signal handling out of the event loop. Also, make the worker
Shutdown idempotent and update tests.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
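One common way to make a shutdown method idempotent is `sync.Once`, sketched here. This is an assumption about technique, not a description of how the commit implements it; the `worker` type is a hypothetical reduction.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical, stripped-down stand-in for the real Worker.
type worker struct {
	stop   chan struct{}
	once   sync.Once
	closes int // how many times the stop channel was actually closed
}

func newWorker() *worker { return &worker{stop: make(chan struct{})} }

// Shutdown may be called any number of times; the channel is closed only
// once, so repeated calls cannot panic with "close of closed channel".
func (w *worker) Shutdown() {
	w.once.Do(func() {
		close(w.stop)
		w.closes++
	})
}

// stopped reports whether the stop channel has been closed.
func (w *worker) stopped() bool {
	select {
	case <-w.stop:
		return true
	default:
		return false
	}
}

func main() {
	w := newWorker()
	w.Shutdown()
	w.Shutdown() // second call is a no-op
	fmt.Println("stopped:", w.stopped(), "closes:", w.closes)
}
```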
Tao Jiang
2ca82c25ca Add support for providing custom checkpointer (#17)
* Add credential configuration for resources

Add credentials for Kinesis, DynamoDB and Cloudwatch. See the worker_test.go
to see how to use it.

Signed-off-by: Tao Jiang <taoj@vmware.com>

* Add support for providing custom checkpointer

Provide a new constructor for adding a checkpointer instead of always using
the default DynamoDB checkpointer.

The next step is to abstract out Kinesis into a generic stream API.
This will be a bigger change and will be addressed in a different PR.

Test:
  Use the new constructor to inject the DynamoDB checkpointer and run the existing
  tests.

Signed-off-by: Tao Jiang <taoj@vmware.com>

* Add support for providing custom checkpointer

Provide a new constructor for adding a checkpointer instead of always using
the default DynamoDB checkpointer.

The next step is to abstract out Kinesis into a generic stream API.
This will be a bigger change and will be addressed in a different PR.

Fix checkfmt error.

Test:
  Use the new constructor to inject the DynamoDB checkpointer and run the existing
  tests.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
Tao Jiang
c634c75ebc Add credential configuration for resources (#14)
Add credentials for Kinesis, DynamoDB and Cloudwatch. See the worker_test.go
to see how to use it.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
Tim Studd
cd343cca09 Add configuration options for AWS service endpoints (#5)
* Add configuration options for AWS service endpoints

Signed-off-by: Timothy Studd <tim@goguardian.com>

* Fix KCL naming consistency issue

Signed-off-by: Timothy Studd <tim@goguardian.com>
2021-12-20 21:20:13 -06:00
Tao Jiang
d6b5196b55 Update import path when switching github
Update import path in files when switching to github.
2021-12-20 21:19:26 -06:00
Tao Jiang
10e8ebb3ff KCL: Fix KCL stops processing when Kinesis Internal Error
Currently, KCL doesn't release the shard when returning on error,
which means the worker cannot get any new shard because it already
holds the maximum number of shards. This change makes sure the
shard is released on return.

Update the log message.

Test:
Integration test by forcing error on reading shard to
simulate Kinesis Internal error and make sure the KCL
will not stop processing.

Jira CNA-1995

Change-Id: Iac91579634a5023ab5ed73c6af89e4ff1a9af564
2021-12-20 21:16:38 -06:00
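The failure mode and fix described in this message can be illustrated with a deferred release. Everything here is a hypothetical simplification: `leases`, `consume`, and the error values are illustrative names, and the real lease handling goes through a checkpointer rather than an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errKinesisInternal = errors.New("kinesis internal error")
	errAtCapacity      = errors.New("worker already holds the maximum number of shards")
)

// leases is a toy in-memory record of which shards this worker holds.
type leases struct {
	held map[string]bool
	max  int
}

func (l *leases) acquire(shard string) error {
	if len(l.held) >= l.max {
		return errAtCapacity
	}
	l.held[shard] = true
	return nil
}

func (l *leases) release(shard string) { delete(l.held, shard) }

// consume takes a lease, reads from the shard, and always releases the
// lease on return, including when read fails.
func consume(l *leases, shard string, read func() error) error {
	if err := l.acquire(shard); err != nil {
		return err
	}
	defer l.release(shard)
	return read()
}

func main() {
	l := &leases{held: map[string]bool{}, max: 1}

	err := consume(l, "shard-0", func() error { return errKinesisInternal })
	fmt.Println("first attempt:", err, "| held leases:", len(l.held))

	// Because the lease was released, the worker can pick up a shard again.
	err = consume(l, "shard-1", func() error { return nil })
	fmt.Println("second attempt:", err)
}
```

Without the deferred release, the failed read would leave the lease in place and the second attempt would fail with the capacity error, which is the stall the message describes.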
Tao Jiang
22de13ef8a Go-KCL: Update security scan
gas is now gosec. Need to update the security scan and fix
security issues as needed.

No functional change.

Jira CNA-2022

Change-Id: I36f2a204114f3f13e2ed05579c04a9c89f528f9a
2021-12-20 21:16:38 -06:00
Tao Jiang
47daa9d5f0 KCL: Update copyright and permission
All source should be prepared in a manner that reflects
comments that VMware would be comfortable sharing with
the public.

Documentation only. No functional change.

Update the license to MIT to be consistent with approved
OSSTP product tracking ticket:
https://osstp.vmware.com/oss/#/upstreamcontrib/project/1101391

Jira CNA-1117

Change-Id: I3fe31f10db954887481e3b21ccd20ec8e39c5996
2021-12-20 21:16:27 -06:00
Tao Jiang
e2a945d824 KCL: Stuck on processing after kinesis shard splitting
Kinesis processing gets stuck after a shard split. The
reason is that the app doesn't do the mandatory checkpoint.

KCL document states:
// When the value of {@link ShutdownInput#getShutdownReason()} is
// {@link com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason#TERMINATE} it is required that you
// checkpoint. Failure to do so will result in an IllegalArgumentException, and the KCL no longer making progress.

Also, fix shard leasing to prevent one host from taking more shards than
its configuration allows.

Jira CNA-1701

Change-Id: Icbdacaf347c7a67b5793647ad05ff93cca629741
2021-12-20 21:15:25 -06:00
Tao Jiang
85c04db6b4 KCL: Fix the way in returning error
Fix a bug in shard sync that removed shard info.

Jira ID: CNA-612

Change-Id: Ibaf55fffa39b793abbfe3bd57999e5d37f82a52f
2021-12-20 21:15:25 -06:00
Long Zhou
2b9301cd47 Flatten directory structure
cascade-kinesis-client will be used as a submodule of other projects,
so it should not have "src/vmware.com/cascade-kinesis-client" in
its path. To build this project locally, please manually create
the parent folders.

Change-Id: I8844e6a0e32aae65b28496915d8507e9fb1058c6
2021-12-20 21:15:15 -06:00
Renamed from src/vmware.com/cascade-kinesis-client/clientlibrary/worker/worker.go