We accidentally burned through 200GB of proxy bandwidth in 6 hours
(blog.skyvern.com) 57 points by suchintan 8 hours ago
Hello,
A (different) proxy company owner here. This sucks! Sorry that you lost out on so much bandwidth.
Feel free to reach out to me at tim@pingproxies.com and I'd be happy to get you set up on our service and credit you with 100GB of free bandwidth to help soften the blow. I'll also be able to get you pricing a little better than you're currently on if you are interested ;)
Within the next few months we're also releasing a bunch of tools to help stop things like this happening on our residential network such as some intelligent routing logic, spend controls and a few other things.
You may also want to look into Static Residential ISP Proxies - we charge these per IP address rather than bandwidth and they often end up more economical. We work with carriers like Spectrum, Comcast & AT&T directly to get IP addresses on their networks so they look like residential connections but host them in datacenters - this way you get 99.99%+ availability, 1G+ throughput, stable IP addresses and have unlimited bandwidth.
@ everyone else in the thread; if you run a start-up and need proxies then email me - happy to credit you with 50GB free residential bandwidth + give some advice on infra if needed.
Cheers, Tim at Ping
Literally everyone says they use ethical sourcing, but I never believe that about any residential proxy service without solid proof.
Our main business is Static ISP Proxies; here we liaise directly with datacenters and carriers such as AT&T, Comcast and others to bring subnets to their network and we'll then purchase IP transit from them.
We do also have residential peer proxies available - you're right to have ethical concerns as there are bad actors out there that effectively build botnets and spread malware to get their nodes, but the industry has developed a lot over the last few years and there are numerous companies, including ourselves, which have pretty strict ethical guidelines. There are three main ways to ethically source real residential nodes:
1. Direct payment to peers for traffic sent through their devices. There are several networks like EarnApp, Honey, Pawns and others where people can sign up and earn money for bandwidth sent through their devices. We liaise with these networks to add nodes to our pool.
2. Quid pro quo with peers through providing free apps in return for the ability to route traffic through their devices. We don't currently engage in this method but we are planning on doing so within the next 12 months through a free VPN - the important thing here is that peers have to understand what they're signing up for in return for the free service - as long as you're upfront, then it is my belief that there is informed consent and it is therefore ethical; there is often a good value proposition to the customer in these cases, i.e. spend $7 a month on a paid VPN service or get a free one in return for exchanging a small amount of bandwidth which has zero marginal cost.
3. Offer an SDK to developers to monetize applications - this is pretty common and while it is similar to 2. - the ability to distribute the SDK to various developers makes it easier to get a large number of peers online. Again though, it's important that app developers provide notice of this to their users, and most reputable SDK providers have strict guidelines and mandatory screens that must be shown to end users prior to registering them as a residential proxy node.
There are also a lot of other things involved in making an ethical network - a big thing is to just signal that bad actors and criminals aren't welcome on your network. This is usually done by banning certain domains; for example, we ban all .edu and .gov domains as well as most banking/finance websites + are a member of the Internet Watch Foundation and block their listed domains. This stops bad actors from using our proxy network for evil + protects peers in the network from bad activity going through their devices.
Happy to answer any other questions if you have them :)
Huh?
> We work with carriers like Spectrum, Comcast & AT&T directly to get IP addresses on their networks so they look like residential connections but host them in datacenters - this way you get 99.99%+ availability, 1G+ throughput, stable IP addresses and have unlimited bandwidth.
How does a residential proxy work? Do people rent out their internet connections to commercial services?
Computers getting infected with malware, pre-compromised cheap internet devices from Amazon/Wish.com, and game developers monetizing "free" games by running proxies in the background.
There are usually a few layers of resellers so technically the proxy provider can throw their hands up in the air and say they are unaware of any malicious activity.
I don't think it's a cloud. It's more likely a residential proxy network, which are typically created by installing malware on users' machines.
The operators of these proxy networks want to avoid detection by both the users whose bandwidth they're stealing, and by the companies whose data is being scraped. So they want to make the bandwidth very expensive. And that expensive bandwidth in turn means that their only clients are dodgy as well. Either people looking to scrape data without consent and monetize it, or outright criminals.
I use one. I run a bot on IRC that extracts the <title> of every link posted (or downloads the image/whatever and extracts Metadata) and announces that to the channel. It has become more and more pointless to run this on a vps. Google/YouTube block the IP range, a lot of websites return the cloudflare security check, Amazon works on some days and doesn't on others... Ever since I proxy via residential proxies it just works. I'm a smooth criminal. :>
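For anyone curious what the title-extraction part of a bot like that might look like, here is a minimal sketch using only the Python standard library. The names `TitleParser` and `extract_title` are made up for illustration, and it assumes the page HTML has already been fetched (through a proxy or otherwise); it is not the commenter's actual bot.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace and newlines into single spaces.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html):
    """Return the page title, or None if the document has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title("<html><head><title> Hello\n World </title></head></html>"))
```

The hard part in practice is getting the HTML at all, which is exactly the blocking problem described above; the parsing itself is the easy bit.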
There are many reputable residential proxy networks too. There's usually a lot of vetting involved, as they don't want people running illegal activities through their network.
It's almost a necessity these days to have access to that due to how much datacenter ranges are blocked.
It's kind of surprising that a presumptively legitimate company (and YC-funded startup) would out themselves as buying black market residential proxy bandwidth, isn't it?
Their frontpage also advertises the ability to pass CAPTCHAs, whether by automation or more likely by delegating them to third-world CAPTCHA farms. If that's a major selling point for your automation service then your target market probably ranges from dubious (e.g. data scrapers trying to get around limits) to extremely dubious (e.g. ticket scalpers, spammers, click fraud, etc).
How long have you been here? It's not surprising at all. HN and YC have not demonstrated an aversion to "uh, greyhat" activity.
If it were 2000, people would be sharing their ad clicking startups.
YC has funded a looooooot of sketchy companies.
Here is more on "free VPNs":
https://www.kaspersky.com/blog/what-is-wrong-with-free-vpn-s...
Usually such proxy networks are outright criminal (even if users are not).
It’s not necessarily malware. There are services that are pretty upfront and pay cash money for residential US bandwidth. That said, naive people might be surprised when their IP starts getting blocked.
e.g. https://www.honeygain.com/ (something like 100GB = $20).
>>>and it's legal to scrape publicly available data, even if the websites hosting it try to block it
Is that something that's been fully decided? https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc. is the most relevant case I'm aware of, and it suggests it might actually be illegal (if you know you've been blocked, at least).
https://techcrunch.com/2024/01/24/court-rules-in-favor-of-a-...
This is another interesting example where it was allowed
Yeah, the author confirmed it in this thread actually:
Residential proxy service
https://smartproxy.com/proxies/residential-proxies/pricing
(may not be this service, but this is an example, and the price is consistent with their larger commitments)
Blizzard quite famously used BitTorrent to save bandwidth, dunno if they still do:
The discussion linked in the post is from 2022, and the corresponding issue has already been fixed:
https://issues.chromium.org/issues/40220332
I wonder if there is a more recent bug related to this?
I would have liked to see a bit more of 5 Whys here. It seems like a consistent lesson that startups have to learn over and over is how to manage external dependencies, and particularly the dangers of having Google as a dependency. This is new Chrom(e|ium) behavior, and it has a real cost, both for this company and for users, which may or may not be worth the ROI, but this is what happens when you have a large scale external dependency: stuff moves without your knowledge, consent, or control.
Instead of Always. Be. Closing. it should be Always. Be. Mitigating. Dependencies. for startups.
This is a great callout.
We had an internal discussion about how to manage dependencies effectively, and we made the decision to accept the risk that comes with blindly relying on Chrome for now, instead of investing heavily in mitigating that risk today.
The main motivator was for us to continue moving fast, and accept that we have a few hard dependencies in our business.
The goal is to find product market fit, then allocate time to de-risk some of these hard dependencies. If we fail to find product market fit, this may not matter at all.
I think that's a fair strategy. Strong PMF generally overcomes weak execution; the challenge is that when you have hard dependencies on entities like Google or Apple, it can easily become existential. Even if you choose to move forward with this dependency, you should establish guard rails within your system so that you catch potentially impactful shifts faster, and have a plan for mitigation. For instance, you should identify key points of integration and possible alternatives even if you choose not to migrate now, so that a future migration is better understood and can be discussed intelligently in the heat of the moment. Even internal documentation can assist as a mitigation for dependency risk.
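As one concrete example of a guard rail for this particular incident, a hard cap on metered proxy traffic would turn a silent change in browser behaviour into a loud failure. The sketch below is only an illustration: `BandwidthBudget`, `BandwidthBudgetExceeded`, and the 80% warning threshold are hypothetical names and values, not anything Skyvern or a proxy provider is known to use.

```python
class BandwidthBudgetExceeded(RuntimeError):
    """Raised when recorded traffic goes over the configured cap."""


class BandwidthBudget:
    """Tracks metered bytes against a hard cap (hypothetical guard rail)."""

    def __init__(self, limit_bytes, warn_fraction=0.8):
        self.limit_bytes = limit_bytes
        self.warn_fraction = warn_fraction
        self.used_bytes = 0

    def record(self, num_bytes):
        """Add traffic to the running total.

        Returns True once usage has reached the warning threshold, and
        raises BandwidthBudgetExceeded once it goes over the cap.
        """
        self.used_bytes += num_bytes
        if self.used_bytes > self.limit_bytes:
            raise BandwidthBudgetExceeded(
                f"used {self.used_bytes} of {self.limit_bytes} bytes"
            )
        return self.used_bytes >= self.warn_fraction * self.limit_bytes


# Example: a 10 GB budget for one scraping job.
budget = BandwidthBudget(limit_bytes=10 * 1024**3)
if budget.record(9 * 1024**3):
    print("warning: proxy bandwidth budget almost exhausted")
```

Calling `record` after every proxied response (or on a timer from the provider's usage figures) would stop a job long before it got anywhere near 200GB.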
What infrastructure is this using? Bandwidth seems pretty pricy
No kidding. AWS's notoriously expensive data transfer is only $0.09/GB. Who's charging $2.50/GB? Are they running on a cellular SIM with no data plan?
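To put the two rates side by side, here is the arithmetic for the 200GB from the title, using only the per-GB prices quoted in this comment. These are the commenter's figures, not confirmed pricing from either provider, and `transfer_cost` is just an illustrative helper.

```python
GB_USED = 200               # bandwidth burned, per the post title
RESIDENTIAL_PER_GB = 2.50   # USD/GB, the proxy rate quoted in this comment
AWS_EGRESS_PER_GB = 0.09    # USD/GB, the AWS rate quoted in this comment


def transfer_cost(gigabytes, rate_per_gb):
    """Cost in USD of moving `gigabytes` at a flat per-GB rate."""
    return gigabytes * rate_per_gb


residential = transfer_cost(GB_USED, RESIDENTIAL_PER_GB)
aws = transfer_cost(GB_USED, AWS_EGRESS_PER_GB)

print(f"residential proxy: ${residential:.2f}")
print(f"AWS egress:        ${aws:.2f}")
print(f"ratio:             {residential / aws:.1f}x")
```

At those rates the same traffic is roughly $500 on the proxy network versus roughly $18 of AWS egress.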
You guys should look into some unlimited bandwidth options. I use https://scrapingfish.com/unlimited
Haha, apologies for the language!
We use residential proxy networks when running Skyvern to help simulate real human behaviour (because that's what Skyvern is trying to do).
We run headful browser instances (meaning a real Chrome instance running with a real viewport) for the same reason!
you shouldn't be paying by the terabyte. Colocate and just pay for the maximum throughput. Far better rates
doesn't work when the sites you're scraping block the IPs/range of your server. They're using a proxy botnet that costs a premium
I'm now expecting we'll see a couple things in the next few years:
1. An explosion of residential proxy networks and other stuff to circumvent blocking of cloud IP ranges, for all the various AI scraping tools to use.
2. A corresponding explosion of countermeasures to the above. Instead of blocking suspicious IPs, maybe they get a 3GB file on their request to /scrape-target.html