Comment by SantiagoVargas 3 months ago

Right, we ping the servers every minute. Since we charge a one-time fee, the credits expire after a year, but the service is scalable. To answer your question, I'll give you some more context:

The architecture uses scalable AWS serverless components (Lambda, SQS, DynamoDB) and is well suited to handling a large increase in monitored endpoints. The primary scaling mechanism is the automatic concurrency scaling of the Lambda functions that process messages from the SQS queues. If we scale to 10,000 endpoints, we do expect some bottlenecks that would require optimization (e.g. increasing Lambda timeouts and memory), but we'll cross that bridge when we get to it.
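A minimal Python sketch of the SQS-to-Lambda pattern described above, assuming a scheduler enqueues one JSON message per endpoint each minute and a Lambda function with an SQS trigger checks them. The function names, the `{"url": ...}` message shape, the `alert` hook and the "2xx/3xx means up" rule are hypothetical; only the 10-entry `SendMessageBatch` limit and the `batchItemFailures` partial-batch response shape come from SQS and Lambda themselves.

```python
import json
import urllib.request
from typing import Callable

# SQS SendMessageBatch accepts at most 10 entries per call.
SQS_MAX_BATCH_ENTRIES = 10


def build_send_batches(urls: list[str]) -> list[list[dict]]:
    """Group endpoint URLs into SendMessageBatch-style entry lists.

    Hypothetical message shape: {"url": ...}. Each list could be passed as
    Entries= to boto3's sqs.send_message_batch(QueueUrl=..., Entries=...).
    """
    batches = []
    for start in range(0, len(urls), SQS_MAX_BATCH_ENTRIES):
        chunk = urls[start:start + SQS_MAX_BATCH_ENTRIES]
        batches.append([
            {"Id": str(i), "MessageBody": json.dumps({"url": url})}
            for i, url in enumerate(chunk)
        ])
    return batches


def http_check(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical up/down rule: any 2xx/3xx response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        # HTTPError (4xx/5xx), URLError and socket timeouts are all OSErrors;
        # ValueError covers malformed URLs.
        return False


def handler(event: dict, context=None,
            check: Callable[[str], bool] = http_check,
            alert: Callable[[str], None] = print) -> dict:
    """Lambda entry point for an SQS event source.

    Returns the partial-batch response shape, which Lambda only honours when
    ReportBatchItemFailures is enabled on the event source mapping.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            url = json.loads(record["body"])["url"]
        except (KeyError, TypeError, ValueError):
            # Malformed message: report it so only this record is retried.
            failures.append({"itemIdentifier": record["messageId"]})
            continue
        if not check(url):
            alert(url)  # hypothetical hook, e.g. enqueue an SMS notification
    return {"batchItemFailures": failures}
```

At one check per minute, 10,000 endpoints is about 167 checks per second in aggregate (10,000 / 60), which Lambda's SQS integration can spread across concurrent invocations as the queue backlog grows. In this sketch a malformed message is reported back for retry, while a down endpoint is treated as a result to alert on rather than a processing failure.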

For the actual SMS sending, our numbers can send up to 100 texts per second.
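If alerts for many endpoints fire at once, the 100 texts/second ceiling becomes the pacing constraint. A token bucket is one common way to stay under such a limit; this sketch assumes the cap behaves like a simple sustained per-second rate, and the class and function names are hypothetical, not the author's implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: allows bursts up to `capacity` and a sustained
    `rate` of sends per second (hypothetical helper)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def seconds_to_drain(messages: int, rate_per_second: float) -> float:
    """Minimum time to send `messages` texts at a sustained rate."""
    return messages / rate_per_second
```

As a worked bound, if 10,000 endpoints went down at once and each triggered one text, sending would take at least 10,000 / 100 = 100 seconds. The injectable `clock` lets the limiter be tested without actually sleeping.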

jedberg 3 months ago

So if one hosts one's site on AWS, then your system probably isn't going to work, eh? :)

If AWS goes down, your site and mine both go down together. This was basically why PagerDuty got an early win: they never used AWS when everyone else did.

  • jjtang1 3 months ago

    Where you host is underrated. When I started building on-call for Rootly the first thing we did was build a multi-cloud setup (AWS and GCP) for honestly pretty overkill reliability. Don’t regret it one bit.

    • SantiagoVargas 3 months ago

      Just looked into Rootly; it looks great. This was an MVP launch to test the concept, but I'll look into building the multi-cloud setup. Nice to see another Western alum here.

  • SantiagoVargas 3 months ago

    Haha, not quite! We host our main startup (website + app) on AWS and have been using it internally for around 7 months; it's worked great for us so far.

    But if something crazy like the 2023 outage happens again, then you're absolutely right. You'd likely get a news alert for it, though: that's our fallback :)

    If we get enough traction, we'll look into a multi-cloud setup to mitigate that risk. For now, our goal is to notify you when your server goes down for more common reasons.
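One way to picture the multi-cloud mitigation discussed in this thread is a notification path that fails over between providers. This is a minimal sketch, assuming each provider (for example an AWS-hosted and a GCP-hosted sender) is wrapped as a callable that raises on failure; the function, exception and provider names are hypothetical.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured notification path fails (hypothetical)."""


def send_with_failover(
    message: str,
    senders: Sequence[tuple[str, Callable[[str], None]]],
) -> str:
    """Try each (name, send) pair in order and return the name of the first
    provider that succeeds. A sender signals failure by raising."""
    errors = []
    for name, send in senders:
        try:
            send(message)
            return name
        except Exception as exc:  # any provider error triggers failover
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Failover only helps if the code calling it, and the checks feeding it, are not running solely on the provider that is down, which is exactly the concern jedberg raises.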

vivzkestrel 3 months ago

Thank you for the detailed responses. So I understand that you have a Lambda function that fetches a website URL from DynamoDB and fires a request at it. Since Lambdas require a memory limit and a timeout, how much memory is each function using, and what is the timeout for a request (30s?)? Also, does each Lambda function handle a single URL, or are you doing asyncio/aiohttp stuff with a whole bunch of URLs in one go?
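The question contrasts one URL per invocation with checking many URLs concurrently. For illustration, here is a minimal sketch of the batched variant using standard-library `asyncio`, assuming an injected `fetch` coroutine that returns an HTTP status code; the 30-second default mirrors the guess in the question, and all names are hypothetical rather than the author's actual setup.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical fetch signature: takes a URL, returns an HTTP status code.
Fetch = Callable[[str], Awaitable[int]]


async def check_one(url: str, fetch: Fetch, sem: asyncio.Semaphore,
                    timeout: float) -> tuple[str, bool]:
    """Check a single URL under a per-request timeout."""
    async with sem:  # cap concurrent requests inside one invocation
        try:
            status = await asyncio.wait_for(fetch(url), timeout)
            return url, 200 <= status < 400
        except (asyncio.TimeoutError, OSError):
            # Timeouts and connection errors both count as "down".
            return url, False


async def check_many(urls: list[str], fetch: Fetch, *,
                     timeout: float = 30.0,
                     max_concurrency: int = 50) -> dict[str, bool]:
    """Check a batch of URLs concurrently and map each URL to up/down."""
    sem = asyncio.Semaphore(max_concurrency)
    results = await asyncio.gather(
        *(check_one(u, fetch, sem, timeout) for u in urls))
    return dict(results)
```

A real `fetch` might wrap `aiohttp.ClientSession.get` and return `response.status`. The trade-off: in the worst case, where every request times out, a batch takes roughly ceil(len(urls) / max_concurrency) × timeout, so the Lambda timeout would have to be set above that. One URL per invocation avoids that calculation but pays per-invocation overhead on every check.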