Comment by vivzkestrel 3 months ago

how do you determine if the server went down?

esafak 3 months ago

By checking a health endpoint. (I'm not the owner.)

SantiagoVargas 3 months ago

Correct. It requires an unauthenticated endpoint that returns a 200 response. This is usually the /health endpoint, but any endpoint we can ping works.
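
A minimal sketch of the check described above, assuming "up" means an unauthenticated GET that comes back with status 200. The names `fetch_status` and `is_up` and the 10-second timeout are illustrative assumptions, not the service's actual code.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 500).
        return err.code
    except OSError:
        # URLError and timeouts are OSError subclasses: DNS failure,
        # refused connection, timeout, and similar network errors.
        return None


def is_up(url: str, fetch: Callable[[str], Optional[int]] = fetch_status) -> bool:
    """Assumed rule: only a 200 response counts as healthy."""
    return fetch(url) == 200
```

Passing `fetch` as a parameter is only a convenience for testing the rule without touching the network; a real monitor would likely also retry before declaring an outage.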

vivzkestrel 3 months ago

OK, but how does it actually work? I get that you'll check for 500 errors by hitting multiple endpoints every x units of time. But the number of endpoints you must check keeps going up as your service grows. Today you start with 10 endpoints; six months down the line you need to check 10,000 endpoints every x units of time. How do you manage scaling this?

- SantiagoVargas 3 months ago
  
  Right, we ping the servers every minute. Since we charge a one-time fee the credits expire after a year, but the service is scalable. To answer your question I'll give you some more context:
  The architecture uses scalable AWS serverless components (Lambda, SQS, DynamoDB) and is well-suited to handle a large increase in monitored endpoints. The primary scaling mechanism is the automatic concurrency scaling of the Lambda functions that process messages from SQS queues. Should we scale to 10,000 endpoints, we do expect some bottlenecks that would require optimizing (e.g. increasing Lambda timeouts or memory), but we'll cross that bridge when we get to it.
  For the actual SMS sending, our numbers can send up to 100 SMS texts/second.
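
One way the SQS fan-out described above could look, as a hedged sketch rather than the service's real scheduler: each minute, a scheduler groups the monitored URLs into messages and sends them in batches, and Lambda concurrency scales with the queue. The 25 URLs per message and the helper names are assumptions; the limit of 10 entries per `SendMessageBatch` call is an SQS constraint.

```python
import json
from typing import Dict, Iterator, List, Sequence

SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batches(urls: List[str], urls_per_message: int = 25) -> List[List[Dict[str, str]]]:
    """Group URLs into JSON message bodies, then into SQS-sized batches."""
    bodies = [json.dumps({"urls": list(group)})
              for group in chunked(urls, urls_per_message)]
    batches = []
    for batch in chunked(bodies, SQS_MAX_BATCH):
        # Entry Ids only need to be unique within a single batch request.
        batches.append([{"Id": str(i), "MessageBody": body}
                        for i, body in enumerate(batch)])
    return batches
```

With boto3, each batch would be passed as `Entries` to `send_message_batch` on an SQS client. Under these assumed sizes, 10,000 endpoints become 400 messages sent in 40 calls per minute.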
  
  Reply View | 5 replies
  
  jedberg 3 months ago
  
  So if one hosts one's site on AWS, then your system probably isn't going to work, eh? :)
  If AWS goes down, your site and mine both go down together. This was basically why PagerDuty got an early win: they never used AWS when everyone else did.
  
  vivzkestrel 3 months ago
  
  Thank you for the detailed responses. So I understand that you have a Lambda function that reads a website URL from DynamoDB and fires a request to it. Since Lambdas require a memory limit and a timeout, how much memory is each function using, and what is the timeout for a request (30s?) Also, does each Lambda function handle a single URL, or are we doing asyncio/aiohttp stuff with a whole bunch of URLs in one go?
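
For readers wondering what the "whole bunch of URLs in one go" option might look like, here is a rough sketch using only the standard library's asyncio. It is not a description of the service: `check_many`, the `probe` callable, the concurrency cap of 50, and the 30-second timeout (taken from the guess in the question) are all assumptions. In practice `probe` could wrap an aiohttp GET and return the status code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple

# Hypothetical probe type: takes a URL, returns its HTTP status code.
Probe = Callable[[str], Awaitable[int]]


async def check_many(urls: Iterable[str], probe: Probe,
                     concurrency: int = 50,
                     timeout: float = 30.0) -> Dict[str, bool]:
    """Check many URLs concurrently; True means the probe returned 200."""
    gate = asyncio.Semaphore(concurrency)  # cap on in-flight requests

    async def check_one(url: str) -> Tuple[str, bool]:
        async with gate:
            try:
                status = await asyncio.wait_for(probe(url), timeout)
            except Exception:
                # A timeout or network error is treated as "down".
                return url, False
            return url, status == 200

    results = await asyncio.gather(*(check_one(u) for u in urls))
    return dict(results)
```

The per-request timeout keeps one slow site from consuming the function's whole execution budget, and the semaphore bounds how many sockets are open at once. Whether batching like this beats one URL per invocation depends on memory and timeout settings that the thread does not state.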
  
  Reply View | 0 replies