* Add diagnostic events for logging visibility
* Refactor logging diagnostics into main Scheduler loop
* Refactor log timing and level and change privacies
* Revert ExecutorStateEvent to accept ExecutorService input type
* Minor style and messaging fixes
* Fix failing unit test
* Refactor diagnostic events to use factory for testing
* Fix constructor overloading for testing
* Refactor DiagnosticEventHandler to no args constructor
* Added configuration to ignore a number of ReadTimeouts before printing warnings.
Messaging now directs customer to configure the SDK with appropriate timeouts based on their processing model.
Warning messages from ShardConsumer now specify that the KCL will reattempt to subscribe to the stream as needed.
Added configuraiton to Lifecycle configuration to enable ignoring a number of ReadTimeouts before printing warning messages.
* Removed functional tests that are now replicated as unit tests
* Refactored after review of Pull Request
Marked original ShardConsumerSubscriber constructor as deprecated
Renamed tests to be more descriptive
* Updated Default Value injection to ShardConsumerSubscriber
* Refactored based on PR comments
* Removed Chained Constructor from test class
* Added comments to tests to make then easier to understand
* Seperating coding of each log suppression test
* Advance version to 2.1.3-SNAPSHOT
* Added timeouts for Kinesis and DynamoDB calls
Added a timeout to prevent an issue where the Kinesis or DynamoDB call
never completes.
For Kinesis call to GetRecords the timeout defaults to 30 seconds, and can be configured
on the PollingConfig.
For DynamoDB and Kinesis (when calling ListShards) the timeout
defaults to 60 seconds and can be configured on LeaseManagementConfig.
The KinesisDataFetcher uses the AWSExceptionManager to translate
execution exceptions into the expected exceptions. Currently if the
exception is unexpected the exception will be wrapped in a
RuntimeException before being returned. We depend on SdkExceptions
being the right type where we handle them upstream so we add a
configuration for SdkException which should ensure handling works as
expected.
If the Scheduler loses its lease for a shard it will attempt to
shutdown the ShardConsumer processing that shard. When shutting down
the ShardConsumer acquires a lock on `this` and makes the necessary
state changes.
This becomes an issue if the ShardConsumer is currently processing a
batch of records as processing of the records is done under the
general `this` lock.
When these two things combine the Scheduler can become stuck waiting
on the record processing to complete.
To fix this the ShardConsumer will now use a specific lock on shutdown
state changes to prevent the Scheduler from becoming blocked.
Allow the shutdown state change future to acquire the lock
When the ShardConsumer is being shutdown we create a future for the
state change originally the future needed to acquire the lock before
attempting to create the future task. This changes it to acquire the
lock while running on another thread, and complete the shutdown then.
* Changed Prefetch to catch `SdkException` instead of `SdkClientException`
* With SDK 2.x service exceptions are of the type `SdkServiceException`
* Adding `*.iml` to .gitignore
* Started to play around with restarting from last processed
After a failure the KCL would instead restart from what the
ShardConsumer says it last processed.
* Extracted the InternalSubscriber to its own class
Extracted the InternalSubscriber to ShardConsumerSubscriber to make
testing easier. Added tests for the ShardConsumerSubscriber that
verifies error handling and other components of the class.
Added tests that verify the restart from behavior.
* Moved the ProcessRecordsInputMatcher to its own class
* Initial changes to for restarting of the PrefetchRecordsPublisher
* Remove code coverage configuration
* Switched to using explicit locks to deal with blocking queue
When the blocking queue is full it would normally enter into a fully
parked state, but would continue to hold the lock.
This changes the process to only block for a second when attempting to
enqueue a response, and if it doesn't succeed check to see if it's
been reset before attempting again.
* Changed locking around the restart, and how fetcher gets updated
Changed the locking around the restart to use a reader/writer lock
instead of single lock with a yield.
Changed how the fetcher is reset to not restart from an
advanceIteratorTo which would retrieve a new shard iterator. Instead
the resetIterator method takes both the iterator to start from, the
last accepted sequence number, and the initial position.
* Changed test to ensure that PositionResetException is thrown
Changed the test to wait for the queue to reach capacity before
restarting the PrefetchRecordsPublisher. This should mostly ensure
that calling restartFrom will trigger a throw of a
PositionResetException.
Added @VisibleFortest on the queue since it was already being used in testing.
* Move to snapshot
* Ensure that only one thread can be sending data at a time
In the test the TestPublisher is accessed from two threads: the test
thread, and the dispatch thread. Both have the possibility of calling
send() under certain conditions. This changes it so that only one of
the threads can actively be sending data at a time.
TestPublisher#requested was changed to volatile to ensure that calling
cancel can correctly set it to zero.
* Block the test until the blocking thread is in control
This test is somewhat of an odd case as it intends to test what
happens when nothing is dispatched to the ShardConsumerSubcriber for
some amount of time, but data is queued for dispatch. To do this we
block the single thread of the executor with a lock to ensure that
items pile up in the queue so that should the restart work incorrectly
we will see lost data.
The AWS SDK depends on being able to find which region it should
operate in. In many cases this is either explicitly set or retrieved
from a variety of sources. For these test cases we don't actually
care what the region. To ensure that the tests operate as expected the
region is set before the test runs, and reset upon completion.
* Introducing MultiLangDaemon support for Enhanced Fan-Out.
* MultiLangDaemon now supports the following command line options.
* `--properties-file`: Properties file that the KCL should use to set up the Scheduler.
* `--log-configuration`: logback.xml that the KCL should use for logging.
* Updated AWS SDK dependency to 2.2.0.
* MultiLangDaemon now uses logback for logging.
* Remove a possible deadlock on polling queue fill
Adding new items to the receive queue for the PrefetchRecordsPublisher
when at capacity would deadlock retrievals as it was already holding
a lock on this.
The method addArrivedRecordsInput did not need to be synchronized on
this as it didn't change any of the protected
state (requestedResponses). There is a call to drainQueueForRequests
immediately after the addArrivedRecordsInput that will ensure newly
arrived data is dispatched.
This fixes#448
* Small fix on the reasoning comment
* Adjust the test to act more like the ShardConsumer
The ShardConsuemr, which is the principal user of the
PrefetchRecordsPublisher, uses RxJava to consume from publisher. This
test uses RxJava to consume, and notifies the test thread once
MAX_ITEMS * 3 have been received. This ensures that we cycle through
the queue at least 3 times.
* Removed the upper limit on the retrievals
The way RxJava's request management makes it possible that more
requests than we might expect can happen.
Added the SequenceNumberValidator that will be used in the checkpoint
process to ensure that the sequence number is valid for the shard
being checkpointed.
* Run multiple instance of scheduler on one JVM
* handling creation of shardSyncer in DynamoDBLeaseManagementFactory and LeaseManagementConfig
* remove multi-threading unit test and do some small refactorings
* refectoring
* deprecate ShardSyncer and use HierarchichalShardSyncer instead; change the order for metricsFactory and HierarchichalShardSyncer in ShardConsumerArgument
* fix typos and use mock object of shardSyncer
* delete improper comments
* fix comments
* remove duplicated comments
Reverted 3 commits:
Revert "Change version number to 2.0.3-experimental"
Revert: 54c171dc2a.
Revert "Experimental support for sequence number validation in the publisher (#401)"
Revert: 592499f7bc.
Revert "Support Validating Records are From to the Expected Shard (#400)"
Revert: 01f5db8049.
* This feature enables customers to perform actions on DynamoDB lease tables once created and in the active state
* Introducing TableCreatorCallback for DynamoDB lease management
* Introducing DoesNothingTableCreatorCallback
* Intoducing TableCreatorCallback config in LeaseManagementConfig, with DoesNothingTableCreatorCallback as the default
* Introducing TableCreatorCallbackInput object.
* Updating the javadoc
* Handling ReadTimeouts gracefully
* Emitting logging messages at DEBUG level for retryable exceptions
* Introducing SubscribeToShardRetryableException
* Addressing comments
* Making private ThrowableCategory class static
* Creating static instances for acquiretimeout and readtimeout categories
* Cleaned up imports
* Renamed and moved SubscribeToShardRetryableException to RetryableRetrievalException
* Renamed UNKNOWN exception type to Other
* Moved sequence number validation to an experimental feature
Moved the sequence number validation to become an experimental feature
that can be removed in the future.
Added an annotation for experimental features.
* Delete merge conflict again?
* Add some reminder that this stuff is experimental
* Added a reason field, and some reasons
Added a reason value to the annotation, and updated two of the unusual places.