Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn Alerts Proof of concept #203

Merged
merged 15 commits into from
Jan 22, 2025
Merged

Turn Alerts Proof of concept #203

merged 15 commits into from
Jan 22, 2025

Conversation

DevChima
Copy link
Contributor

@DevChima DevChima commented Jan 8, 2025

Purpose

We need a way to alert when turn messages are failing to send and deliver. 
We also need a way to act on these events for specific contacts, for example a specific error code starts the switch to SMS.

Solution

An app that receives Turn messages and exposes event metrics to prometheus.

Checklist

  • Build app
  • Added or updated unit tests
  • Updated readme/documentation (if necessary)

Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments/questions but looking good so far.


# Turn Alerts

TURN_ALERTS_JOURNEY_URL = env.str("TURN_ALERTS_JOURNEY_URL", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can maybe use the details that we configure per org: https://github.com/praekeltfoundation/rp-sidekick/blob/develop/sidekick/models.py#L21-L22

I think check if they are still being used elsewhere then we take it from there.

The idea with sidekick was always to allow multi-tenancy, for MC both QA and prod run in one instance with different orgs configured. This means you need the org id in the url like the interceptor.

from django.db import models


class TurnAlerts(models.Model):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the reason for saving these? I think we want to avoid that if possible

data = {"wa_id": wa_id}

response = requests.post(
settings.TURN_ALERTS_JOURNEY_URL, headers=headers, json=data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the url from the org we just need to add the correct path on to it here.

Comment on lines 18 to 23
request_count = Counter(
"turn_alerts_request_total", "Total number of requests to TurnAlertsLayerView"
)
processing_time = Histogram(
"turn_alerts_request_processing_seconds", "Processing time of requests in seconds"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're more interested in how many messages and events are coming through.

Messages grouped by fallback, direction, type
Events grouped by status, code etc

I think in the gist we group events by type as an example

Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another few comments

"fallback_channel",
"direction",
"message_type",
"message_body",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we need all these, first 3 is probably fine

Comment on lines 36 to 46
event_count = Counter(
"turn_alerts_event_count",
"Number of events",
["message_status", "error_code", "error_data"],
)

successful_event_count = Counter(
"turn_alerts_successful_event_count",
"Number of events",
["message_type", "message_status", "conversation_id"],
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need two?

Also not sure if error_data would be useful in grafana

)


processing_time = Histogram(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we need this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funtionally no but I added this just to see how the buckets thing works to split the proessing times. I can remove.

Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.

Do we need all those serializers? I don't recall the hub having all of them

from sidekick.models import Organization


class TurnAlerts(models.Model):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for saving this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I have this saved is so it can be configurable. Do we want to hardcode the journey_id etc or have it easily changed.

Copy link
Contributor Author

@DevChima DevChima Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we no longer need at least half of the serializers there since we no longer use most of the key values for prometheus. I left them there just in case we needed to capture more data. I could delete most of them if you think otherwise. @erikh360

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please remove all serializers we don't need.

Sorry I misunderstood this model. Lets chat in Slack

organization = turn_alerts.org

headers = {
"Authorization": f"Bearer {turn_alerts.hmac_secret}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HMAC secret is something we use to validate the the request came from Turn, since the endpoint doesn't have auth on.

This should be the engage token from the org model

)
def start_turn_journey(wa_id):

turn_alerts = TurnAlerts.objects.get(pk=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded ID here?

Comment on lines 88 to 89
conversation_id=conversation_id,
conversation_type=conversation_type,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of the status payload from Turn for the id and type of conversation. An example of the conversation type I see is "service". I left the ID too just in case there's an event and we need to track the conversation. Not sure what other use case would need it. @erikh360

Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step in the right direction! Few comments and questions

from sidekick.models import Organization


class TurnAlerts(models.Model):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we have it defined in models.py and here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for testing. It creates an instance of turnalerts with the configurable objects for the org. Ex, error_code.

TurnOutboundSerializer(data=request.data).is_valid(raise_exception=True)
outbound = request.data
direction = outbound["_vnd"]["v1"].get("direction")
message_type = (outbound.get("type", ""),)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a tuple?

elif webhook_type == "turn":
TurnOutboundSerializer(data=request.data).is_valid(raise_exception=True)
outbound = request.data
direction = outbound["_vnd"]["v1"].get("direction")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sometimes not available? I would assume its always there, message can only be inbound or outbound

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen it not available but I just used a get in case it happens albeit quite unlikely.

)

else:
recipient_id = statuses.get("recipient_id")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only need for the falied event that starts the journey.

event_count = Counter(
"turn_alerts_event_count",
"Number of events",
["message_status", "error_code", "conversation_id", "conversation_type"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll ever aggregate on conversation_id?

What are the possible values for conversation_type?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add fallback channel here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only conversation type I've seen so far is "service" and I'm not sure what it means. We can have fallbackchannel. I'll add that.

on_fallback_channel = request.headers.get("X-Turn-Fallback-Channel", "0") == "1"
is_turn_event = request.headers.get("X-Turn-Event", "0") == "1"

for alert in turn_alerts:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this work if we have multiple alerts configured?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on the question. I ahve at least two alerts configured locally for two separate orgs that I've been testing with.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes here you are filtering by organisation, so 1 per org will only return 1.

If you have 2 on one org, journey_id, org_error_code, engage_token and engage_url gets overwritten for each alert found.

I would rather add the error code and journey id to a dict, which you can then check in the loop where we get the error code. The engage_* doesn't have to be set in the alert loop since we're only dealing with one org at a time.

Comment on lines 84 to 90
match = re.match(
(
r"^\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]"
r"*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*$"
),
recipient_id,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks that the recipient id matches a valid SA number. If yes, start the journey task otherwise don't.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we filter on SA phone numbers anywhere else?

turn_alerts/views.py Show resolved Hide resolved
Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments, and it doesn't look like the tests have been updated since we changed the views to match ndoh-hub, I still see vendor and contact references.

on_fallback_channel = request.headers.get("X-Turn-Fallback-Channel", "0") == "1"
is_turn_event = request.headers.get("X-Turn-Event", "0") == "1"

for alert in turn_alerts:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes here you are filtering by organisation, so 1 per org will only return 1.

If you have 2 on one org, journey_id, org_error_code, engage_token and engage_url gets overwritten for each alert found.

I would rather add the error code and journey id to a dict, which you can then check in the loop where we get the error code. The engage_* doesn't have to be set in the alert loop since we're only dealing with one org at a time.

Comment on lines 84 to 90
match = re.match(
(
r"^\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]"
r"*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*$"
),
recipient_id,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we filter on SA phone numbers anywhere else?

Comment on lines 32 to 33
"conversation_id",
"conversation_type",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these two

from turn_alerts.models import TurnActions


def create_turnalerts_account(org=None, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called create_turnalerts_account but is not creating an account?

Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments, mostly on the tests

from turn_alerts.models import TurnActions


def create_turnalerts(org=None, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function name is still confusing

self.client.force_authenticate(user)

self.org = create_org()
self.turnalerts_account = create_turnalerts(org=self.org)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't being used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_org() creates the org object that is being used here:

url = reverse("turn-messages", kwargs={'org_id' : self.org.id})

self.client.force_authenticate(user)

self.org = create_org()
self.turnalerts_account = create_turnalerts(org=self.org)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't being used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above


url = reverse("turn-messages", kwargs={"org_id": self.org.id})

with patch("requests.post") as mock_post:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use responses to mock requests? or is this how we do it in the reset of sidekick?

Comment on lines 376 to 377
with patch.object(TurnActions.objects, "filter", return_value=[turn_action]):
with patch.object(Organization.objects, "get", return_value=self.org):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not just create the correct objects here? The org definitely exists so that patch doesn't look like we need it

Comment on lines 378 to 379
view = TurnAlertsLayerView()
response = view.post(request, org_id=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for doing it this way and not using the client created in the setUp() function?

@@ -226,3 +227,5 @@
)

REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")

TURN_HMAC_SECRET = os.environ.get("TURN_HMAC_SECRET", "REPLACEME")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specific to each webhook configured in turn

Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments

class TurnSecret(models.Model):
org = models.ForeignKey(Organization, on_delete=models.CASCADE)
secret = models.CharField(max_length=255)
help_text = "The secret for validating webhook requests"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this help text should be here?


def validate_signature(request, org_id, turn_secret_id):
try:
organization = Organization.objects.get(id=org_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will mean we're doing the org lookup twice, remove this and replace the line under it:
turn_secret = TurnSecret.objects.get(id=turn_secret_id, org_id=org_id)

and remove the org exception

Copy link
Collaborator

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@DevChima DevChima merged commit 97786a6 into develop Jan 22, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants