Commit graph

307 commits

Author SHA1 Message Date
ashwing
f6dec3e579 Fix to prevent data loss and stuck shards in the event of failed records delivery in Polling readers (#603)
* Fix to prevent data loss and stuck shards in the event of failed records delivery.

* Review comment fixes

* Access specifiers fix
2019-09-03 09:20:34 -07:00
Cory-Bradshaw
85d31c91f1
Merge pull request #599 from ashwing/v2.2.3-snapshot
Creating snapshot version for 2.2.3 release
2019-08-20 12:01:59 -07:00
Ashwin Giridharan
20cce175be Creating snapshot version for 2.2.3 release 2019-08-20 11:52:25 -07:00
ashwing
b8331f76e3 KCL 2.2.2 release (#598)
* KCL 2.2.2 release

* Addressing release notes feedback
2019-08-19 16:27:34 -07:00
ashwing
a17d14527a Preventing duplicate delivery due to unacknowledged event while completing the subscription (#596)
* Preventing duplicate delivery due to unacknowledged event while completing the subscription

* Refactored clearRecordsDeliveryQueue logic and added comments

* Code refactoring as per review comments

* Nit fix

* Add logging to unexpected subscription state scenario
2019-08-19 14:35:06 -07:00
ashwing
3f6afc6563 Limited threads resiliency fix durability nonblock (#573)
* Adding unit test case for record delivery validation

* Initial prototype for notification mechanism between ShardConsumerSubscriber and FanoutPublisher. The SDK Threads are made to block wait on the ack from the ShardConsumerSubscriber

* initial non blocking prototype

* Refactoring src and test

* Added unit test cases. Addressed review comments. Handled edge cases

* Minor code changes. Note that the previous commit has blocking impl of PrefetchPublisher

* Refactored the cleanup logic

* Fix for Cloudwatch exception handling and other revioew comment fixes

* Typo fix

* Removing cloudwatch fix. Will be released in a separate commit.

* Changing RejectedTaskEvent log message for the release

* Added javadoc to RecordsDeliveryAck and optimized imports

* Adding Kinesis Internal API tag for new concrete implementations
2019-08-16 14:24:19 -07:00
Micah Jaffe
c2a3f18670 Update ShardEnd checkpoint failure messaging (#591)
* Update shard end checkpoint failure messaging

* Update shard end checkpoint failure messaging
2019-08-13 13:18:52 -07:00
ashwing
161590c2ce Adding wait to CW PutMetric future calls (#584)
* Making CW publish calls as blocking to reduce the throttling. Disclosing the CW publish failures.

* Fixing uniut test cases and adding CW exception manager
2019-08-09 09:52:53 -07:00
yatins47
a150402e9c Fixing bug where initial subscription failure causes shard consumer to get stuck. (#562)
* Fixing bug where initial subscription fails cause shard consumer to get stuck.

* Adding some comments for the changes and simplifying the unit test.

* Adding unit tests for handling restart in case of rejection execution exception from executor service.
2019-07-11 07:16:04 -07:00
ashwing
9e2d6fa497 Fix for invalid ShardConsumer state transitions due to rejected executions (#560)
* Fix to prevent ShardConsumer state transition, when the source state task execution is rejected by the executor service.

* Unit test case improvements

* Optimized imports

* Removed unnecessary sleep in unit test case

* Fixing imports

* Fixing import again with wildcard removed

* Adding asserts to exception cases in SharConsumerTest
2019-07-08 16:30:27 -07:00
Micah Jaffe
b6236d8077 Update version to 2.2.2-SNAPSHOT (#565) 2019-07-05 12:49:53 -07:00
Micah Jaffe
7d8b281d24
Prepare for v2.2.1 (#561)
* Prepare for v2.2.1

* Minor messaging fixes

* Update CHANGELOG.md

* Update README.md

* Update CHANGELOG.md

* Update README.md

* Update CHANGELOG.md

* Update README.md
2019-07-01 11:49:56 -07:00
Micah Jaffe
fa72cf1517 Adding more logging around the rejected task executions at the Scheduler and RxJava layer (#559)
* Add diagnostic events for logging visibility

* Refactor logging diagnostics into main Scheduler loop

* Refactor log timing and level and change privacies

* Revert ExecutorStateEvent to accept ExecutorService input type

* Minor style and messaging fixes

* Fix failing unit test

* Refactor diagnostic events to use factory for testing

* Fix constructor overloading for testing

* Refactor DiagnosticEventHandler to no args constructor
2019-06-28 10:02:57 -07:00
Cory-Bradshaw
6159b869ed 2.2.1-SNAPSHOT Updated Snapshot versioning (#544)
* 2.2.1-SNAPSHOT Updated Snapshot versioning

* Updated 1.x versioning in README
2019-04-09 12:15:33 -07:00
Cory-Bradshaw
9a15971bde Added @RunWith to integration tests for LeaseIntegrationTests (#543) 2019-04-09 10:00:38 -07:00
Cory-Bradshaw
c8f82836b1 Preparation for v2.2.0 (#536) 2019-04-08 11:20:08 -07:00
awslankakamal
6c64055d9b Updating license to Apache License 2.0 (#523) 2019-04-05 15:25:09 -07:00
Cory-Bradshaw
2851a8b6e0 Introducing configuration for ignoring ReadTimeouts before printing warnings. (#528)
* Added configuration to ignore a number of ReadTimeouts before printing warnings.

     Messaging now directs customer to configure the SDK with appropriate timeouts based on their processing model.
     Warning messages from ShardConsumer now specify that the KCL will reattempt to subscribe to the stream as needed.
     Added configuraiton to Lifecycle configuration to enable ignoring a number of ReadTimeouts before printing warning messages.

* Removed functional tests that are now replicated as unit tests

* Refactored after review of Pull Request

Marked original ShardConsumerSubscriber constructor as deprecated
Renamed tests to be more descriptive

* Updated Default Value injection to ShardConsumerSubscriber

* Refactored based on PR comments

* Removed Chained Constructor from test class

* Added comments to tests to make then easier to understand

* Seperating coding of each log suppression test
2019-04-05 15:13:10 -07:00
Cory-Bradshaw
1bfaa90322 Updated versions to 2.1.4-SNAPSHOT (#526) 2019-03-26 11:59:10 -07:00
Justin Pfifer
a629185786 Release 2.1.3 of the Amazon Kinesis Client Library for Java (#519)
Milestone#30: https://github.com/awslabs/amazon-kinesis-client/milestone/30
* Added a message to recommend using `KinesisClientUtil` when an acquire timeout occurs in the `FanOutRecordsPublisher`.
  * PR#514: https://github.com/awslabs/amazon-kinesis-client/pull/514
* Added a sleep between retries while waiting for a newly created stream consumer to become active.
  * PR#506: https://github.com/awslabs/amazon-kinesis-client/issues/506
* Added timeouts on all futures returned from the DynamoDB and Kinesis clients.
  The timeouts can be configured by setting `LeaseManagementConfig#requestTimeout(Duration)` for DynamoDB, and `PollingConfig#kinesisRequestTimeout(Duration)` for Kinesis.
  * PR#518: https://github.com/awslabs/amazon-kinesis-client/pull/518
* Upgraded to SDK version 2.5.10.
  * PR#518: https://github.com/awslabs/amazon-kinesis-client/pull/518
* Artifacts for the Amazon Kinesis Client for Java are now signed by a new GPG key:
  pub   4096R/86368934 2019-02-14 [expires: 2020-02-14]
  uid                  Amazon Kinesis Tools <amazon-kinesis-tools@amazon.com>
2019-03-18 16:28:42 -07:00
Justin Pfifer
6a70e3db31 Added Timeouts for DynamoDB and Kinesis Calls (#518)
* Advance version to 2.1.3-SNAPSHOT

* Added timeouts for Kinesis and DynamoDB calls

Added a timeout to prevent an issue where the Kinesis or DynamoDB call
never completes.

For Kinesis call to GetRecords the timeout defaults to 30 seconds, and can be configured
on the PollingConfig.

For DynamoDB and Kinesis (when calling ListShards) the timeout
defaults to 60 seconds and can be configured on LeaseManagementConfig.
2019-03-18 10:06:59 -07:00
lbourdages
2f3907d19f Introducing sleep between DescribeStreamConsumer calls (#507)
* * Wait between each describe stream consumer retry

* ! Fix imports in test

* * Apply review: catch InterruptedException outside of while loop
2019-03-08 09:08:19 -08:00
sullis
c3b3846357 Upgrading maven-compiler-plugin to 3.8.0 (#498) 2019-03-06 15:14:50 -08:00
Justin Pfifer
6685a924d5 Added acquire timeout message, and a test. (#514)
The test doesn't verify the message, but does verify that an acquire
timeout triggers the FanOutRecordsPublisher to call logAcquireTimeoutMessage.
2019-03-06 13:18:15 -08:00
Justin Pfifer
610295eab4
Correct the date for Release 2.1.2 (#505) 2019-02-18 16:06:52 -08:00
Justin Pfifer
2ea2717ae2 Release 2.1.2 of the Amazon Kinesis Client Library (#504)
https://github.com/awslabs/amazon-kinesis-client/milestone/29
* Fixed handling of the progress detection in the `ShardConsumer` to restart from the last accepted record, instead of the last queued record.
  * https://github.com/awslabs/amazon-kinesis-client/pull/492
* Fixed handling of exceptions when using polling so that it will no longer treat `SdkException`s as an unexpected exception.
  * https://github.com/awslabs/amazon-kinesis-client/pull/497
  * https://github.com/awslabs/amazon-kinesis-client/pull/502
* Fixed a case where lease loss would block the `Scheduler` while waiting for a record processor's `processRecords` method to complete.
  * https://github.com/awslabs/amazon-kinesis-client/pull/501
2019-02-18 15:56:12 -08:00
Justin Pfifer
61f54eb64e Add SdkException to exception manager of KinesisDataFetcher (#502)
The KinesisDataFetcher uses the AWSExceptionManager to translate
execution exceptions into the expected exceptions.  Currently if the
exception is unexpected the exception will be wrapped in a
RuntimeException before being returned.  We depend on SdkExceptions
being the right type where we handle them upstream so we add a
configuration for SdkException which should ensure handling works as
expected.
2019-02-18 15:23:56 -08:00
Justin Pfifer
c053789409 Use an explicit lock for shutdown instead of the general lock (#501)
If the Scheduler loses its lease for a shard it will attempt to
shutdown the ShardConsumer processing that shard.  When shutting down
the ShardConsumer acquires a lock on `this` and makes the necessary
state changes.

This becomes an issue if the ShardConsumer is currently processing a
batch of records as processing of the records is done under the
general `this` lock.

When these two things combine the Scheduler can become stuck waiting
on the record processing to complete.

To fix this the ShardConsumer will now use a specific lock on shutdown
state changes to prevent the Scheduler from becoming blocked.

Allow the shutdown state change future to acquire the lock

When the ShardConsumer is being shutdown we create a future for the
state change originally the future needed to acquire the lock before
attempting to create the future task.  This changes it to acquire the
lock while running on another thread, and complete the shutdown then.
2019-02-15 12:05:23 -08:00
Sahil Palvia
3b3998a59e Fixing exception log messaging with Prefetching (#497)
* Changed Prefetch to catch `SdkException` instead of `SdkClientException`
  * With SDK 2.x service exceptions are of the type `SdkServiceException`
* Adding `*.iml` to .gitignore
2019-02-08 15:49:53 -08:00
Justin Pfifer
fd0cb8e98f Capability of restarting the subscription from last processed batch (#492)
* Started to play around with restarting from last processed

After a failure the KCL would instead restart from what the
ShardConsumer says it last processed.

* Extracted the InternalSubscriber to its own class

Extracted the InternalSubscriber to ShardConsumerSubscriber to make
testing easier. Added tests for the ShardConsumerSubscriber that
verifies error handling and other components of the class.

Added tests that verify the restart from behavior.

* Moved the ProcessRecordsInputMatcher to its own class

* Initial changes to for restarting of the PrefetchRecordsPublisher

* Remove code coverage configuration

* Switched to using explicit locks to deal with blocking queue

When the blocking queue is full it would normally enter into a fully
parked state, but would continue to hold the lock.

This changes the process to only block for a second when attempting to
enqueue a response, and if it doesn't succeed check to see if it's
been reset before attempting again.

* Changed locking around the restart, and how fetcher gets updated

Changed the locking around the restart to use a reader/writer lock
instead of single lock with a yield.

Changed how the fetcher is reset to not restart from an
advanceIteratorTo which would retrieve a new shard iterator.  Instead
the resetIterator method takes both the iterator to start from, the
last accepted sequence number, and the initial position.

* Changed test to ensure that PositionResetException is thrown

Changed the test to wait for the queue to reach capacity before
restarting the PrefetchRecordsPublisher.  This should mostly ensure
that calling restartFrom will trigger a throw of a
PositionResetException.

Added @VisibleFortest on the queue since it was already being used in testing.

* Move to snapshot

* Ensure that only one thread can be sending data at a time

In the test the TestPublisher is accessed from two threads: the test
thread, and the dispatch thread.  Both have the possibility of calling
send() under certain conditions.  This changes it so that only one of
the threads can actively be sending data at a time.

TestPublisher#requested was changed to volatile to ensure that calling
cancel can correctly set it to zero.

* Block the test until the blocking thread is in control

This test is somewhat of an odd case as it intends to test what
happens when nothing is dispatched to the ShardConsumerSubcriber for
some amount of time, but data is queued for dispatch.  To do this we
block the single thread of the executor with a lock to ensure that
items pile up in the queue so that should the restart work incorrectly
we will see lost data.
2019-02-07 12:50:41 -08:00
Justin Pfifer
b2751f09d5
Advance version to 2.1.2-SNAPSHOT (#496) 2019-02-07 07:59:37 -08:00
Sahil Palvia
5ff227c2c2
Release 2.1.1 (#494)
* Updating the versions to 2.1.1
* Updating release notes and CHANGELOG
* Updating the user agent version number
2019-02-06 15:03:56 -08:00
Sahil Palvia
7f5bd73c0b * Upgrading the SDK (#493)
* Changing KCL version to 2.1.1-SNAPSHOT
2019-02-06 11:41:53 -08:00
Justin Pfifer
a15713911c Set the region for test runs (#488)
The AWS SDK depends on being able to find which region it should
operate in.  In many cases this is either explicitly set or retrieved
from a variety of sources.  For these test cases we don't actually
care what the region. To ensure that the tests operate as expected the
region is set before the test runs, and reset upon completion.
2019-01-29 18:53:54 -08:00
Tony Wang
d79691cb3f fix wrong parameter order for session creds (#486) 2019-01-28 07:17:24 -08:00
jiaxul
8d9427e06c Add a WorkerState of SHUT_DOWN_STARTED (#457)
Added a new WorkerState that indicates when a shutdown has started
2019-01-24 12:53:39 -08:00
Sahil Palvia
03c15eb275
Introducing MultiLangDaemon support: (#483)
* Introducing MultiLangDaemon support for Enhanced Fan-Out.
* MultiLangDaemon now supports the following command line options.
  * `--properties-file`: Properties file that the KCL should use to set up the Scheduler.
  * `--log-configuration`: logback.xml that the KCL should use for logging.
* Updated AWS SDK dependency to 2.2.0.
* MultiLangDaemon now uses logback for logging.
2019-01-14 17:35:35 -08:00
Justin Pfifer
a05e22f782
Release 2.0.5 of the Amazon Kinesis Client for Java (#465) 2018-11-12 08:54:04 -08:00
Justin Pfifer
f52f2559ed Remove a possible deadlock on polling queue fill (#462)
* Remove a possible deadlock on polling queue fill

Adding new items to the receive queue for the PrefetchRecordsPublisher
when at capacity would deadlock retrievals as it was already holding
a lock on this.

The method addArrivedRecordsInput did not need to be synchronized on
this as it didn't change any of the protected
state (requestedResponses).  There is a call to drainQueueForRequests
immediately after the addArrivedRecordsInput that will ensure newly
arrived data is dispatched.

This fixes #448

* Small fix on the reasoning comment

* Adjust the test to act more like the ShardConsumer

The ShardConsuemr, which is the principal user of the
PrefetchRecordsPublisher, uses RxJava to consume from publisher. This
test uses RxJava to consume, and notifies the test thread once
MAX_ITEMS * 3 have been received. This ensures that we cycle through
the queue at least 3 times.

* Removed the upper limit on the retrievals

The way RxJava's request management makes it possible that more
requests than we might expect can happen.
2018-11-07 16:33:49 -08:00
Sahil Palvia
b83a32b492
Making configurations consistent in entire package (#453) 2018-10-25 10:40:35 -07:00
Sahil Palvia
9f9620354e Adding travis and codebuild badges to readme for v2.x branch (#456) 2018-10-25 08:00:16 -07:00
Sahil Palvia
6bd63f70dd
Updating BuildStatus badge (#452)
* Adding new BuildStatus badges.

* Updating to show single build status
2018-10-23 12:32:31 -07:00
Sahil Palvia
df6d25a201 Updating version to 2.0.5-SNAPSHOT (#450) 2018-10-23 08:09:59 -07:00
Sahil Palvia
eb7b3fd1bb Release 2.0.4 of the Amazon Kinesis Client for Java (#447)
Release 2.0.4 of the Amazon Kinesis Client for Java
2018-10-18 10:32:23 -07:00
Justin Pfifer
f2fb9ead0d
Added a sequence number validator to ensure safer checkpoints (#432)
Added the SequenceNumberValidator that will be used in the checkpoint
process to ensure that the sequence number is valid for the shard
being checkpointed.
2018-10-10 13:02:15 -07:00
akhani18
2609e1ce46 Add a listener to capture task execution in shardConsumer (#417)
* Add a listener to capture when tasks are executed in the ShardConsumer
2018-10-10 13:01:41 -07:00
xiaoyu meng
14c68296f0 Introducing HierarchicalShardSyncer inorder to run multiple Schedulers in a JVM (#395)
* Run multiple instance of scheduler on one JVM

* handling creation of shardSyncer in DynamoDBLeaseManagementFactory and LeaseManagementConfig

* remove multi-threading unit test and do some small refactorings

* refectoring

* deprecate ShardSyncer and use HierarchichalShardSyncer instead; change the order for metricsFactory and HierarchichalShardSyncer in ShardConsumerArgument

* fix typos and use mock object of shardSyncer

* delete improper comments

* fix comments

* remove duplicated comments
2018-10-09 17:29:59 -07:00
Sahil Palvia
854e316b83 Quick fix for shutdown race issue (#439)
* Added a synchronized lock in the initialize and shutdown methods
2018-10-09 13:30:35 -07:00
Sahil Palvia
0326e217f6
Updating version to 2.0.4-SNAPSHOT (#438) 2018-10-09 11:55:50 -07:00
shask-amazon
31ab0af901 Added an API on LeaseCoordinator and LeaseTaker to get all leases for… (#428)
* Added an API on LeaseCoordinator and LeaseTaker to get all leases for the application
2018-10-09 07:56:13 -07:00