Feedback and questions regarding webhooks

In your documentation you state:

Should a configured webhook integration fail for more than 10 times without a successful response, it will be paused until a user enables it again in the Integration settings page.

I don’t think that’s a good idea. Ten failures is far too few. Instead, I would suggest pausing it automatically only if requests have failed for three days straight, or something similar. Your network and API could also have issues.

Imagine the following:
An event is triggered. The event makes an API call to your website, but your website / API currently has an issue. The request to your website fails → the hook returns "Bad request". Your website being down could therefore be enough on its own to pause the webhook. And yes, that does indeed happen. We see your API having trouble quite often, this morning for example.

Getting an email every time a webhook fails is kinda annoying. Please add an option to send these to a different inbox, or only send an email when a webhook gets automatically paused.

Your documentation also states:

awork expects a webhook request to return with a successful response within 30 seconds. Otherwise, the event will be marked as failed and retried for up to 10 times.

Does retrying actually work? I was unable to verify it.
When is a retry attempted? When the web server is not reachable at all, or whenever the response code is not 200?

Hi @thomas.poechtrager,

thanks a lot for the feedback and questions!

I will pass on your feedback about when a webhook is deactivated and about settings for when e-mails should be sent on failures.

To answer your question:
A webhook request is considered successful when the server responds with a status code in the range 200-299.
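
As a rough illustration of that rule, here is a minimal Python sketch. It combines the 200-299 range with the 30-second limit quoted from the documentation earlier in this thread. The function name, the constants, and the inclusive treatment of the 30-second boundary are assumptions made for the example, not awork's actual implementation.

```python
# Hypothetical helper: names and the inclusive timeout boundary are assumptions.
TIMEOUT_SECONDS = 30.0  # limit quoted from the documentation in this thread


def is_successful_delivery(status_code: int, elapsed_seconds: float) -> bool:
    """Return True if a webhook response would count as successful."""
    in_time = elapsed_seconds <= TIMEOUT_SECONDS
    is_2xx = 200 <= status_code <= 299
    return in_time and is_2xx


if __name__ == "__main__":
    print(is_successful_delivery(204, 0.4))   # 2xx and fast
    print(is_successful_delivery(400, 0.4))   # "Bad request" is not 2xx
    print(is_successful_delivery(200, 31.0))  # too slow
```

Under this reading, a receiving endpoint that answers with 400 because a downstream call failed would count as a failed delivery, even though the endpoint itself was reachable.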

Regarding the retry:
I will create a bug ticket for us to investigate this and inform you once the issue is resolved.

Best regards
Ian

Hi @thomas.poechtrager,

we are releasing a fix for the webhook retries tonight. Each event will then be retried up to 5 times, with delays between the retries of 5 s, 1 min, 5 min, 30 min, and 1 h. On top of that, we add a jitter of 1-30 s so that not all events are retried at the same time. Only after an event has failed 5 times do we increase the FailureCount of the webhook.
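
To make the schedule concrete, here is a small Python sketch of how the wait before each retry could be computed from the numbers above. The names `RETRY_DELAYS_S` and `retry_delay` are made up for the example, and drawing the jitter uniformly from 1-30 s is an assumption; the post does not say how the jitter is distributed.

```python
import random
from typing import Optional

# Base delays between retries in seconds: 5 s, 1 min, 5 min, 30 min, 1 h.
RETRY_DELAYS_S = [5, 60, 300, 1800, 3600]
# Jitter range in seconds, added on top of each base delay.
JITTER_MIN_S, JITTER_MAX_S = 1, 30


def retry_delay(retry_number: int, rng: Optional[random.Random] = None) -> float:
    """Return the wait in seconds before the given retry (1-based)."""
    if not 1 <= retry_number <= len(RETRY_DELAYS_S):
        raise ValueError("retry_number must be between 1 and 5")
    rng = rng or random.Random()
    # Assumption: jitter is drawn uniformly from the 1-30 s range.
    jitter = rng.uniform(JITTER_MIN_S, JITTER_MAX_S)
    return RETRY_DELAYS_S[retry_number - 1] + jitter


if __name__ == "__main__":
    for n in range(1, len(RETRY_DELAYS_S) + 1):
        print(f"retry {n}: wait about {retry_delay(n):.1f} s")
```

Ignoring jitter, the base delays add up to 5765 s, so a single event would be retried over roughly 1 h 36 min before it counts as failed.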

A webhook is deactivated after 10 consecutive failures. A successful API request resets the FailureCount.
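
The counting rule can be sketched in Python as follows. The class and method names are hypothetical, and the sketch assumes that the webhook is deactivated as soon as the count reaches 10 and stays off until a user re-enables it, as the documentation quoted at the top of the thread describes.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_FAILURES = 10  # threshold stated in the post above


@dataclass
class WebhookState:
    """Hypothetical model of a webhook's failure tracking."""
    failure_count: int = 0
    active: bool = True

    def record_event_result(self, delivered: bool) -> None:
        """Update the state after an event has succeeded or exhausted its retries."""
        if delivered:
            # A successful request resets the counter.
            self.failure_count = 0
            return
        self.failure_count += 1
        # Assumption: deactivation happens when the count reaches the threshold.
        if self.failure_count >= MAX_CONSECUTIVE_FAILURES:
            self.active = False


if __name__ == "__main__":
    hook = WebhookState()
    for _ in range(9):
        hook.record_event_result(delivered=False)
    hook.record_event_result(delivered=True)  # resets the counter
    print(hook)
```

Because only events that have used up all their retries increment the counter, a short outage on the receiving side should no longer be enough to pause a webhook under this model.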

Regarding our API performance: We are aware of the problem and have made major improvements in this area over the last 2 weeks. There were several issues with the way our database handled queries that were really hard to spot, but we finally found the underlying causes. Together with a number of other improvements, this made the current week much more stable than the previous ones. We will continue to invest heavily in performance improvements in the future.

I hope this answers your questions.
Best regards
Ian