From 4e746cb5a39df5aa833ab32ce7ba961637753a15 Mon Sep 17 00:00:00 2001 From: Abhishek Kekane Date: Fri, 20 Jan 2017 11:38:09 +0530 Subject: [PATCH] Add periodic task to clean up workflow failure Change-Id: I9d7fb601903307b54a71def2aeceb00170313317 Implements: bp add-periodic-tasks --- specs/ocata/approved/lite-specs.rst | 122 ++++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 specs/ocata/approved/lite-specs.rst diff --git a/specs/ocata/approved/lite-specs.rst b/specs/ocata/approved/lite-specs.rst new file mode 100644 index 0000000..6bdc7ac --- /dev/null +++ b/specs/ocata/approved/lite-specs.rst @@ -0,0 +1,122 @@ +================== +Masakari Spec Lite +================== + +Please keep this template section in place and add your own copy of it between +the markers. Please fill only one Spec Lite per commit. + + +------------------------- + +:link: <Link to the blueprint.> + +:problem: <What is the driver to make the change.> + +:solution: <High level description what needs to get done. For example: + "We need to add client function X.Y.Z to interact with new server + functionality Z".> + +:impacts: <All possible \*Impact flags (same as in commit messages) or 'None'.> + +Optionals (please remove this line and fill or remove the rest until End): +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +:how: <More technical details than the high level overview of `solution` + if needed.> + +:alternatives: <Any alternative approaches that might be worth of bringing + to discussion.> + +:timeline: <Estimation of the time needed to complete the work.> + +:reviewers: <If reviewers has been agreed for the functionality, list them + here.> + +:assignee: <If known, list who is going to work on the feature implementation + here> + +End of Template ++++++++++++++++ + +Add periodic task to clean up workflow failure +---------------------------------------------- + +:link: https://blueprints.launchpad.net/masakari/+spec/add-periodic-tasks + +:problem: Due to some unknown circumstances, there is a possibility few of the + notifications might get into ‘error’ status or it might remain in + ‘new’ status forever. There should be some way to retrieve such + notifications and process them to completion. + +:solution: Add a periodic task “process_unfinished_notifications’ which will + execute at regular interval as defined by the new config option + “process_unfinished_notifications_interval” in seconds. Default + value for this option will be 120 seconds, however operator can set + this interval value as per the requirement. Inside this + periodic task, it will retrieve all notifications which are in + ‘error’ or ‘new’ status and then execute recovery action workflow to + process all of them. The notifications which are in ‘new’ status + will be picked up based on a new config option + ‘retry_notification_new_status_interval’. Default value for this + option will be 60 seconds, however operator can set this interval + value as per the requirement. Each notification has ‘generated_time’ + field, if this time + retry_notification_new_status_interval value + (in seconds) is greater than or equal to the current system time, + then all such notifications in ‘new’ status will be picked up by + this periodic task. Also, the notifications in ‘error’ status will + be picked up too. + + Let’s understand the transition state of notification for different + statuses for success case: + notification current status error -> running -> finished + notification current status new-> running -> finished + If the workflow execution fails, then the transition state of + notification would be: + notification current status error -> running -> failed + notification current status new-> running -> failed + + Note: One important point to take note of is if the original + notification status is ‘new’ then it won’t be retried again if the + workflow fails to process it in the periodic task. It’s status will + be directly set to ‘failed’. The operator needs to take corrective + action for all notifications which are in ‘failed‘ state manually. + +:alternatives: Add two periodic tasks ‘process_error_notifications’ and + ‘process_queued_notifications’ to process notifications which + are in ‘error’ and ‘new’ status respectively. These periodic + tasks will be called at regular intervals as defined by two + new config options “process_error_notification_interval’ and + ‘process_queued_notifications_interval’. The logic for retrieval + of notifications which are in ‘error’ and ‘new’ statuses will be + exactly same as above solution. The only difference would be + in the notification status upon its completion as explained + below. + + Transition state of notification for different statuses for + success case. + notification current status + error (process_error_notifications) -> running -> finished + notification current status + new (process_queued_notifications) -> running -> finished + + If the workflow execution fails, then the transition state of + notification would be: + notification current status + error (process_error_notifications) -> running -> failed + notification current status + new (process_queued_notifications) -> running -> error + This means that the notification will be again eligible for + reprocessing during the next cycle of + ‘process_error_notifications’ periodic task. + +:impacts: None + +:timeline: Expected to be merged within the Ocata time frame. + +:reviewers: sam47priya@gmail.com, kajinamit@nttdata.co.jp, + tushar.vitthal.patil@gmail.com + +:assignee: Abhishek Kekane + +End of Add periodic task to clean up workflow failure ++++++++++++++++++++++++++++++++++++++++++++++++++++++