Documentation: language review touch-ups.
parent 013aeba953
commit 888c596f88
7 changed files with 16 additions and 16 deletions
Binary file not shown (before: 26 KiB, after: 28 KiB).
Binary file not shown (before: 70 KiB, after: 70 KiB).
Binary file not shown (before: 35 KiB, after: 37 KiB).
@@ -18,7 +18,7 @@ However, the worker that is processing a lease may change since [leases may be "

 To persist metadata about lease state (e.g., [last read checkpoint, current assigned worker][kcl-concepts]), KCL creates a lease table in [DynamoDB][dynamodb].
 Each KCL application will have its own distinct lease table that includes the application name.
-More information, including schema, is provided at [KCL LeaseTable][kcl-leasetable].
+More information, including schema, is provided at [KCL Lease Table][kcl-leasetable].

 ## Lease Assignment
@@ -45,7 +45,7 @@ Leases follow a relatively simple, progressive state machine:

 Excluding `SHARD_END`, these phases are illustrative of KCL logic and are not explicitly codified.

-1. `DISCOVERY`: KCL [shard-syncing](#shard-syncing) identifies new shards.
+1. `DISCOVERY`: KCL [shard syncing](#shard-syncing) identifies new shards.
 Discovered shards may result from:
 * First time starting KCL with an empty lease table.
 * Stream operations (i.e., split, merge) that create child shards.
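The `DISCOVERY` phase in the hunk above can be illustrated with a minimal sketch. It assumes discovery reduces to a set difference: the shard IDs a stream lists, minus the shard IDs that already have a lease. The class, the method, and the shard IDs are hypothetical. Real KCL shard syncing (`HierarchicalShardSyncer`) also accounts for parent/child shards and the initial position, and this sketch ignores both.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: treats discovery as "listed shards minus leased shards".
// It ignores the parent/child shard hierarchy and initial-position handling.
final class DiscoverySketch {

    /** Returns listed shard IDs that have no lease yet, in listing order. */
    static Set<String> discoverNewShards(List<String> listedShardIds, Set<String> leasedShardIds) {
        Set<String> discovered = new LinkedHashSet<>(listedShardIds);
        discovered.removeAll(leasedShardIds);
        return discovered;
    }

    public static void main(String[] args) {
        // First start with an empty lease table: every listed shard is discovered.
        System.out.println(discoverNewShards(List.of("shardId-0", "shardId-1"), Set.of()));
        // prints [shardId-0, shardId-1]

        // A split of shardId-0 created two child shards that have no leases yet.
        System.out.println(discoverNewShards(
                List.of("shardId-0", "shardId-1", "shardId-2", "shardId-3"),
                Set.of("shardId-0", "shardId-1")));
        // prints [shardId-2, shardId-3]
    }
}
```

The two calls correspond to the two discovery sources listed in the hunk: an empty lease table, and stream operations that create child shards.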
@@ -62,7 +62,7 @@ Discovered shards may result from:
 * Lease deletion will not occur until after its child lease(s) enter `PROCESSING`.
 * This tombstone helps KCL ensure durability and convergence for all discovered leases.
 * For more information, see [LeaseCleanupManager#cleanupLeaseForCompletedShard(...)][lease-cleanup-manager-cleanupleaseforcompletedshard][^fixed-commit-footnote]
-* [Deletion is configurable][lease-cleanup-config] yet recommended to minimize I/O of lease table scans.
+* [Deletion is configurable][lease-cleanup-config], yet recommended to minimize I/O of lease table scans.

 ### Shard Syncing
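The lease-deletion rule in the hunk above can be expressed as a small predicate. Under that rule, the lease for a completed shard is kept as a tombstone until its child leases enter `PROCESSING`. The sketch rests on stated assumptions: the `LeasePhase` enum is hypothetical and lists only the phases visible in this diff, with declaration order standing in for progression. It is not `LeaseCleanupManager#cleanupLeaseForCompletedShard(...)`.

```java
import java.util.List;

// Hypothetical sketch of the tombstone rule: a lease whose shard reached SHARD_END
// is kept until every child lease has entered PROCESSING (or a later phase).
final class LeaseDeletionSketch {

    // Only the phases visible in this diff; declaration order models progression.
    enum LeasePhase { DISCOVERY, PROCESSING, SHARD_END }

    static boolean isEligibleForDeletion(LeasePhase parent, List<LeasePhase> children) {
        // Assumption: with no known child leases, keep the parent lease as a tombstone.
        if (parent != LeasePhase.SHARD_END || children.isEmpty()) {
            return false;
        }
        // "Entered PROCESSING" is read as "reached PROCESSING or a later phase".
        return children.stream().allMatch(c -> c.compareTo(LeasePhase.PROCESSING) >= 0);
    }

    public static void main(String[] args) {
        // One child is still in DISCOVERY: keep the parent lease.
        System.out.println(isEligibleForDeletion(LeasePhase.SHARD_END,
                List.of(LeasePhase.DISCOVERY, LeasePhase.PROCESSING))); // prints false
        // Both children are being processed: the parent lease may be cleaned up.
        System.out.println(isEligibleForDeletion(LeasePhase.SHARD_END,
                List.of(LeasePhase.PROCESSING, LeasePhase.PROCESSING))); // prints true
    }
}
```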
@@ -73,7 +73,7 @@ A shard sync is not guaranteed to identify new shards (e.g., KCL has already dis

 The following diagram is an abridged sequence diagram of key classes that initialize the shard sync workflow:
@@ -81,11 +81,11 @@ Finally, Scheduler starts the PeriodicShardSyncManager which schedules itself to
 The following diagram outlines the key classes involved in the shard sync workflow:

 For more information, here are the links to KCL code:
@@ -117,10 +117,10 @@ Assuming leases `(4, 5, 7)` already exist, the leases created for an initial pos
 * `TRIM_HORIZON` creates `(0, 1)` to resolve the gap starting from the `TRIM_HORIZON`
 * `AT_TIMESTAMP(epoch=200)` creates `(0, 1)` to resolve the gap leading into epoch 200

-#### Avoiding a Shard-Sync
+#### Avoiding a Shard Sync

 To reduce Kinesis Data Streams API calls, KCL will attempt to avoid unnecessary shard syncs.
-For example, if the discovered shards cover the entire partition range then a shard-sync is unlikely to yield a material difference.
+For example, if the discovered shards cover the entire partition range then a shard sync is unlikely to yield a material difference.
 For more information, see
 [PeriodicShardSyncManager#checkForShardSync(...)][periodic-shard-sync-manager-checkforshardsync])[^fixed-commit-footnote].
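The hunk above says a shard sync is unlikely to help when the discovered shards cover the entire partition range. As a sketch of that idea, the check below tests whether a set of shard hash key ranges covers the full Kinesis hash key space, 0 through 2^128 - 1, without gaps. It assumes each range is inclusive and that overlapping ranges (for example, a parent shard and its children) are acceptable. The class and type names are hypothetical, and this is not the logic of `PeriodicShardSyncManager#checkForShardSync(...)`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: do the known shards' hash key ranges cover the whole
// Kinesis hash key space [0, 2^128 - 1] with no gaps? Overlaps are tolerated.
final class HashRangeCoverageSketch {

    static final BigInteger MAX_HASH_KEY = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

    /** Inclusive hash key range of one shard (hypothetical stand-in type). */
    static final class HashKeyRange {
        final BigInteger start;
        final BigInteger end;

        HashKeyRange(BigInteger start, BigInteger end) {
            this.start = start;
            this.end = end;
        }
    }

    static boolean coversEntireHashKeySpace(List<HashKeyRange> ranges) {
        List<HashKeyRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparing((HashKeyRange r) -> r.start));
        BigInteger coveredUpTo = BigInteger.ONE.negate(); // -1: nothing covered yet
        for (HashKeyRange r : sorted) {
            if (r.start.compareTo(coveredUpTo.add(BigInteger.ONE)) > 0) {
                return false; // gap: no range covers coveredUpTo + 1
            }
            coveredUpTo = coveredUpTo.max(r.end);
        }
        return coveredUpTo.compareTo(MAX_HASH_KEY) >= 0;
    }

    public static void main(String[] args) {
        BigInteger half = BigInteger.ONE.shiftLeft(127);
        // Two shards splitting the space evenly: full coverage.
        List<HashKeyRange> complete = List.of(
                new HashKeyRange(BigInteger.ZERO, half.subtract(BigInteger.ONE)),
                new HashKeyRange(half, MAX_HASH_KEY));
        // Hash key (half - 1) is not covered by any shard.
        List<HashKeyRange> withGap = List.of(
                new HashKeyRange(BigInteger.ZERO, half.subtract(BigInteger.valueOf(2))),
                new HashKeyRange(half, MAX_HASH_KEY));
        System.out.println(coversEntireHashKeySpace(complete)); // prints true
        System.out.println(coversEntireHashKeySpace(withGap));  // prints false
    }
}
```

Tracking the highest covered key, rather than requiring each range to start exactly one past the previous end, lets the check accept overlapping parent and child ranges.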
@@ -132,7 +132,7 @@ This operation only accounts for lease assignments and does not factor in I/O lo
 For example, leases that are equally-distributed across KCL are not guaranteed to have equal I/O distribution.

 ![Sequence diagram of the KCL Lease Taking workflow.
-Participants include the LeaseCoordinator, LeaseTaker, LeaseRefresher, and Lease Table (DDB).
+Participants include the LeaseCoordinator, LeaseTaker, LeaseRefresher, and Lease Table (DynamoDB).
 LeaseRefresher is leveraged to acquire the leases from the lease table.
 LeaseTaker identifies which leases are eligible for taking/stealing.
 All taken/stolen leases are passed through LeaseRefresher to update the lease table.
@@ -148,8 +148,8 @@ Stolen leases are randomly selected from whichever worker has the most leases.
 The maximum number of leases to steal on each loop is configured via [maxLeasesToStealAtOneTime][max-leases-to-steal-config].

 Customers should consider the following trade-offs when configuring the lease-taking cadence:
-1. `LeaseRefresher` invokes a DDB `scan` against the lease table which has a cost proportional to the number of leases.
-1. Frequent balancing may cause high lease turn-over which incurs DDB `write` costs, and potentially redundant work for stolen leases.
+1. `LeaseRefresher` invokes a DynamoDB `scan` against the lease table which has a cost proportional to the number of leases.
+1. Frequent balancing may cause high lease turn-over which incurs DynamoDB `write` costs, and potentially redundant work for stolen leases.
 1. Low `maxLeasesToStealAtOneTime` may increase the time to fully (re)assign leases after an impactful event (e.g., deployment, host failure).

 # Additional Reading
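The stealing rule in this hunk's header can be sketched as follows: stolen leases are randomly selected from whichever worker has the most leases, capped by `maxLeasesToStealAtOneTime`. The sketch assumes an in-memory map from worker ID to lease keys. Worker IDs, lease keys, and the method are hypothetical. KCL's `LeaseTaker` applies further conditions before it steals anything, and those are not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of the stealing rule: choose the worker (other than this one)
// that holds the most leases, then randomly pick up to maxLeasesToStealAtOneTime
// of its leases. KCL's LeaseTaker applies further conditions not modeled here.
final class LeaseStealingSketch {

    static List<String> chooseLeasesToSteal(Map<String, List<String>> leasesByWorker,
                                            String thisWorker,
                                            int maxLeasesToStealAtOneTime,
                                            Random random) {
        String mostLoadedWorker = null;
        int mostLeases = 0;
        for (Map.Entry<String, List<String>> entry : leasesByWorker.entrySet()) {
            int count = entry.getValue().size();
            if (!entry.getKey().equals(thisWorker) && count > mostLeases) {
                mostLoadedWorker = entry.getKey();
                mostLeases = count;
            }
        }
        if (mostLoadedWorker == null) {
            return List.of(); // no other worker holds any lease
        }
        List<String> candidates = new ArrayList<>(leasesByWorker.get(mostLoadedWorker));
        Collections.shuffle(candidates, random); // random selection among its leases
        int n = Math.min(Math.max(0, maxLeasesToStealAtOneTime), candidates.size());
        return new ArrayList<>(candidates.subList(0, n));
    }

    public static void main(String[] args) {
        Map<String, List<String>> leasesByWorker = Map.of(
                "worker-a", List.of("lease-1", "lease-2", "lease-3", "lease-4"),
                "worker-b", List.of("lease-5"),
                "worker-c", List.of());
        // worker-c holds nothing; it steals at most 2 leases, all from worker-a.
        System.out.println(chooseLeasesToSteal(leasesByWorker, "worker-c", 2, new Random(42)));
    }
}
```

A low cap slows rebalancing after an event such as a deployment, which matches the third trade-off above. A high cap increases lease turnover and the associated write costs.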
@@ -8,7 +8,7 @@ title KCL Shard Syncing Initialization (Abridged)
 participant Scheduler as S
 participant LeaseCoordinator as LC
 participant PeriodShardSyncManager as PSS
-participant "Lease Table (DDB)" as DDB
+participant "Lease Table\n(DynamoDB)" as DDB

 alt on initialization
 S->S: create PeriodicShardSyncManager(\n ..., leaseRefresher, leasesRecoveryAuditorExecutionFrequencyMillis, ...)
@@ -11,7 +11,7 @@ participant ShardDetector as SD
 participant HierarchicalShardSyncer as HSS
 participant LeaseRefresher as LR
 participant LeaseSynchronizer as LS
-participant "Lease Table (DDB)" as DDB
+participant "Lease Table\n(DynamoDB)" as DDB

 loop every leasesRecoveryAuditorExecutionFrequencyMillis
 opt if worker is not leader
@@ -8,7 +8,7 @@ title KCL Lease Taking
 participant LeaseCoordinator as LC
 participant LeaseTaker as LT
 participant LeaseRefresher as LR
-participant "Lease Table (DDB)" as DDB
+participant "Lease Table\n(DynamoDB)" as DDB

 loop every 2*(leaseDurationMillis + epsilonMillis)
 LC->LT: takeLeases()
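The loop period in the lease-taking diagram, `2*(leaseDurationMillis + epsilonMillis)`, relates directly to the first trade-off listed in the documentation diff: each lease-taking pass scans the lease table. The worked example below uses illustrative values of 10,000 ms and 25 ms, which are not necessarily KCL defaults. It assumes one lease table `scan` per loop iteration on each worker.

```java
// Worked example of the lease-taking loop period from the diagram:
// 2 * (leaseDurationMillis + epsilonMillis). Input values are illustrative only.
final class LeaseTakingCadenceSketch {

    static long takingIntervalMillis(long leaseDurationMillis, long epsilonMillis) {
        return 2 * (leaseDurationMillis + epsilonMillis);
    }

    /** Whole loop iterations per hour; assumes one lease table scan per iteration. */
    static long scansPerHour(long intervalMillis) {
        return 3_600_000L / intervalMillis;
    }

    public static void main(String[] args) {
        long interval = takingIntervalMillis(10_000L, 25L);
        System.out.println(interval);               // prints 20050
        System.out.println(scansPerHour(interval)); // prints 179
    }
}
```

Under these assumptions, each worker performs roughly 179 full lease table scans per hour. Because scan cost grows with the number of leases, a shorter lease duration multiplies that cost.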