
14 commits

Author SHA1 Message Date
Aurélien Rainone
c8a5aa1891 Fix possible deadlock with getRecords in eventLoop (#42)
A WaitGroup must always be incremented before the creation of the
goroutine that decrements it (through Done); otherwise there is a
potential for deadlock.
That was not the case here, since wg.Add was performed after the
`go getRecords()` line.

Also, since there's only one path leading to the wg.Done in getRecords,
I moved wg.Done out of the getRecords function and placed it
alongside the goroutine creation. This removes the need to pass the
WaitGroup pointer to the sc instance, which led to the removal of the
`waitGroup` field from the `ShardConsumer` struct.

This has been tested in production and didn't cause any problems.

Signed-off-by: Aurélien Rainone <aurelien.rainone@gmail.com>
2021-12-20 21:21:14 -06:00
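Below is a minimal sketch of the ordering rule described in c8a5aa1891. It uses hypothetical names (`startConsumers`, and a simplified `getRecords` that sends to a channel) and is not the library's actual eventLoop. `wg.Add(1)` runs before the `go` statement, and `wg.Done` is deferred in the goroutine launched right next to it, so `getRecords` no longer needs the WaitGroup at all.

```go
package main

import (
	"fmt"
	"sync"
)

// getRecords is a hypothetical stand-in for the consumer's record loop.
// Note that it no longer receives or touches a WaitGroup.
func getRecords(shardID string, out chan<- string) {
	out <- "processed " + shardID
}

// startConsumers launches one getRecords goroutine per shard and waits for
// all of them. wg.Add(1) runs before the `go` statement, so wg.Wait cannot
// see a zero counter while a goroutine is still pending, and wg.Done sits
// next to the goroutine creation instead of inside getRecords.
func startConsumers(shardIDs []string) []string {
	var wg sync.WaitGroup
	out := make(chan string, len(shardIDs)) // buffered: senders never block

	for _, id := range shardIDs {
		wg.Add(1) // increment BEFORE creating the goroutine
		go func(id string) {
			defer wg.Done() // the only decrement, paired with the Add above
			getRecords(id, out)
		}(id)
	}

	wg.Wait()
	close(out)

	results := make([]string, 0, len(shardIDs))
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(len(startConsumers([]string{"shard-0", "shard-1", "shard-2"})))
}
```

If `wg.Add(1)` is moved after the `go` statement, the race returns: the goroutine's `Done` can run before the matching `Add`.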
Tao Jiang
46fea317de Release shard lease after shutdown (#31)
* Release shard lease after shutdown

Currently, only the locally cached shard info is removed when a worker loses the
lease. The info inside the checkpointer (DynamoDB) is not removed. As a result,
the lease is held until it expires, and the shard may take too long to become
available for another worker to grab. This change releases the lease
in the checkpointer immediately.

The user needs to ensure appropriate checkpointing before returning from the
Shutdown callback.

Test:
  Updated the unit test and integration test to ensure that only the shard owner
is wiped out and the checkpoint information is left intact.

Signed-off-by: Tao Jiang <taoj@vmware.com>

* Add code coverage reporting

Add code coverage reporting for unit tests.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
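The lease release in 46fea317de can be illustrated with an in-memory stand-in for the DynamoDB table. Everything in this sketch (`memCheckpointer`, `leaseRecord`, `releaseLease`) is hypothetical. It only demonstrates the rule from the test note: clear the owner and keep the checkpoint.

```go
package main

import (
	"fmt"
	"sync"
)

// leaseRecord is a hypothetical stand-in for one row of the checkpoint table.
type leaseRecord struct {
	Owner      string // worker currently holding the lease ("" = free)
	Checkpoint string // last checkpointed sequence number
}

// memCheckpointer is a hypothetical in-memory stand-in for the DynamoDB checkpointer.
type memCheckpointer struct {
	mu   sync.Mutex
	rows map[string]*leaseRecord
}

// releaseLease clears only the owner so another worker can grab the shard
// right away instead of waiting for the lease to expire. The checkpoint is
// left intact, which is why the Shutdown callback must checkpoint first.
func (c *memCheckpointer) releaseLease(shardID, workerID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	row, ok := c.rows[shardID]
	if !ok || row.Owner != workerID {
		return false // nothing to release, or another worker owns it
	}
	row.Owner = ""
	return true
}

func main() {
	cp := &memCheckpointer{rows: map[string]*leaseRecord{
		"shard-0": {Owner: "worker-1", Checkpoint: "4960"},
	}}
	fmt.Println(cp.releaseLease("shard-0", "worker-1"), cp.rows["shard-0"].Checkpoint)
}
```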
Tao Jiang
ac8d341cb1 Revert "Remove shard info in checkpointer (#29)" (#30)
This reverts commit 7e382e90d5d9eb30ed38cc1ab452336860f48b57.
2021-12-20 21:21:14 -06:00
Tao Jiang
8369884952 Remove shard info in checkpointer (#29)
Currently, only the locally cached shard info is removed when a worker loses the
lease. The info inside the checkpointer (DynamoDB) is not removed. As a result,
the lease is held until it expires, and the shard may take too long to become
available for another worker to grab. This change releases the lease
in the checkpointer immediately.

The user needs to ensure appropriate checkpointing before returning from the
Shutdown callback.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
Tao Jiang
2ca82c25ca Add support for providing custom checkpointer (#17)
* Add credential configuration for resources

Add credentials for Kinesis, DynamoDB and CloudWatch. See worker_test.go
for how to use them.

Signed-off-by: Tao Jiang <taoj@vmware.com>

* Add support for providing custom checkpointer

Provide a new constructor for injecting a checkpointer instead of always using
the default DynamoDB checkpointer.

The next step is to abstract Kinesis out into a generic stream API; this will
be a bigger change and will be addressed in a different PR.

Test:
  Use the new constructor to inject the DynamoDB checkpointer and run the existing
  tests.

Signed-off-by: Tao Jiang <taoj@vmware.com>

* Add support for providing custom checkpointer

Provide a new constructor for injecting a checkpointer instead of always using
the default DynamoDB checkpointer.

The next step is to abstract Kinesis out into a generic stream API; this will
be a bigger change and will be addressed in a different PR.

Fix checkfmt error.

Test:
  Use the new constructor to inject the DynamoDB checkpointer and run the existing
  tests.

Signed-off-by: Tao Jiang <taoj@vmware.com>
2021-12-20 21:21:14 -06:00
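Below is a sketch of the constructor injection described in 2ca82c25ca. The names are hypothetical: `Checkpointer` is reduced to a single method, and `NewCustomWorker` only approximates the real constructor's name and parameters. The default constructor delegates to the injectable one, so existing callers keep the DynamoDB behaviour.

```go
package main

import "fmt"

// Checkpointer is a hypothetical, trimmed-down interface that a custom
// checkpointer would implement.
type Checkpointer interface {
	Name() string
}

// dynamoCheckpointer stands in for the default DynamoDB checkpointer.
type dynamoCheckpointer struct{}

func (dynamoCheckpointer) Name() string { return "dynamodb" }

// Worker keeps whichever checkpointer it was built with.
type Worker struct {
	checkpointer Checkpointer
}

// NewWorker preserves the old behaviour: always the default checkpointer.
func NewWorker() *Worker {
	return NewCustomWorker(dynamoCheckpointer{})
}

// NewCustomWorker is the additional constructor: the caller injects the
// checkpointer (for example a DynamoDB one in tests, or another backend).
func NewCustomWorker(cp Checkpointer) *Worker {
	return &Worker{checkpointer: cp}
}

// memoryCheckpointer is a hypothetical alternative backend.
type memoryCheckpointer struct{}

func (memoryCheckpointer) Name() string { return "memory" }

func main() {
	fmt.Println(NewWorker().checkpointer.Name())
	fmt.Println(NewCustomWorker(memoryCheckpointer{}).checkpointer.Name())
}
```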
Tao Jiang
03685b2b19 Fix type conversion error
Fix a compile error in a type conversion (int --> float64).
2021-12-20 21:20:13 -06:00
Tao Jiang
6a1a7b7da6 Fix the exponential backoff
Fix the calculation of exponential backoff. In Go, ^ is the bitwise XOR
operator, not exponentiation. Replaced it with math.Exp2().
2021-12-20 21:20:13 -06:00
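Here is a small demonstration of the bug fixed in 6a1a7b7da6. `xorBackoff` and `expBackoff` are illustrative names, not the library's functions.

```go
package main

import (
	"fmt"
	"math"
)

// xorBackoff reproduces the bug: ^ is bitwise XOR in Go, so 2 ^ n is not a
// power of two (for example 2 ^ 3 == 1 and 2 ^ 4 == 6).
func xorBackoff(n int) int {
	return 2 ^ n
}

// expBackoff computes 2 to the power n; math.Exp2 takes a float64.
func expBackoff(n int) float64 {
	return math.Exp2(float64(n))
}

func main() {
	for n := 1; n <= 4; n++ {
		fmt.Printf("n=%d  XOR=%d  Exp2=%g\n", n, xorBackoff(n), expBackoff(n))
	}
}
```

For whole-number exponents, a shift such as `time.Duration(1) << n` is another option that avoids floating point.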
Tao Jiang
d6b5196b55 Update import path when switching to GitHub
Update the import paths in all files after switching to GitHub.
2021-12-20 21:19:26 -06:00
Tao Jiang
10e8ebb3ff KCL: Fix KCL stopping processing on Kinesis internal error
Currently, KCL doesn't release the shard when returning on error,
which leaves the worker unable to get any shard because it already
holds the maximum number of shards. This change makes sure the
shard is released on return.

Update the log message.

Test:
Integration test: force an error when reading a shard to
simulate a Kinesis internal error, and make sure KCL
does not stop processing.

Jira CNA-1995

Change-Id: Iac91579634a5023ab5ed73c6af89e4ff1a9af564
2021-12-20 21:16:38 -06:00
Tao Jiang
3163d31f28 KCL: KCL should ignore deleted parent shard
A few days after a shard split, the parent shard is deleted by
the Kinesis system. KCL should ignore the error caused by the
deleted parent shard and move on.

Test:
Manually split shards on the kcl-test stream in the photon-infra account.
Currently, shard 3 is the parent shard of shards 4 and 5. Shard 3
has a parent, shard 0, which has already been deleted. Verified
that the test runs and does not get stuck waiting for the parent shard.

Jira CNA-2089

Change-Id: I15ed0db70ff9836313c22ccabf934a2a69379248
2021-12-20 21:16:38 -06:00
Tao Jiang
22de13ef8a Go-KCL: Update security scan
gas is now gosec. Update the security scan and fix
security issues as needed.

No functional change.

Jira CNA-2022

Change-Id: I36f2a204114f3f13e2ed05579c04a9c89f528f9a
2021-12-20 21:16:38 -06:00
Tao Jiang
47daa9d5f0 KCL: Update copyright and permission
All source should be prepared in a manner that reflects
comments that VMware would be comfortable sharing with
the public.

Documentation only. No functional change.

Update the license to MIT to be consistent with approved
OSSTP product tracking ticket:
https://osstp.vmware.com/oss/#/upstreamcontrib/project/1101391

Jira CNA-1117

Change-Id: I3fe31f10db954887481e3b21ccd20ec8e39c5996
2021-12-20 21:16:27 -06:00
Tao Jiang
48fd4dd51c KCL: remove panic in shard consumer
There are various reasons a shard iterator may expire,
such as not enough data in the shard, or processing
taking more than 5 minutes so that the shard
iterator is not refreshed often enough.

This change removes log.Fatal, which terminates the whole
process. Like a panic inside a goroutine, it brings down the
whole app. Therefore, just log the error and exit the goroutine
instead.

Jira ID: CNA-1072

Change-Id: I34a8d9af7258f3ea75465e2245bbc25c2fafee35
2021-12-20 21:15:25 -06:00
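The next sketch shows the behaviour 48fd4dd51c aims for, using hypothetical names (`runConsumers`, `errIteratorExpired`). In Go's standard `log` package, `log.Fatal` calls `os.Exit(1)`, so a single failing shard would stop the whole process. In this sketch, the failing goroutine logs the error and returns, and the other goroutines keep running.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

// errIteratorExpired is a hypothetical error for an expired shard iterator.
var errIteratorExpired = errors.New("shard iterator expired")

// runConsumers starts one goroutine per shard. On error a goroutine logs
// and exits instead of calling log.Fatal, which would end the process and
// every other consumer with it. It returns how many consumers failed.
func runConsumers(shards map[string]func() error) int {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	for id, getRecords := range shards {
		wg.Add(1)
		go func(id string, getRecords func() error) {
			defer wg.Done()
			if err := getRecords(); err != nil {
				log.Printf("shard %s: %v; exiting this consumer only", id, err)
				mu.Lock()
				failed++
				mu.Unlock()
			}
		}(id, getRecords)
	}
	wg.Wait()
	return failed
}

func main() {
	failed := runConsumers(map[string]func() error{
		"shard-0": func() error { return errIteratorExpired },
		"shard-1": func() error { return nil },
	})
	fmt.Println("failed consumers:", failed, "- process still running")
}
```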
Long Zhou
2b9301cd47 Flatten directory structure
cascade-kinesis-client will be used as a submodule of other projects,
so it should not have "src/vmware.com/cascade-kinesis-client" in
its path. To build this project locally, please manually create
the parent folders.

Change-Id: I8844e6a0e32aae65b28496915d8507e9fb1058c6
2021-12-20 21:15:15 -06:00
Renamed from src/vmware.com/cascade-kinesis-client/clientlibrary/worker/shard-consumer.go