Comment by shakna
When I was writing a crawler for my search engine (now offline), I found almost no crawler library that actually coped with robots files as they appear in the real world. So I ended up putting a lot of effort into writing one that complied with Amazon's and Google's rather complicated nested robots files, including respecting the cool-off periods they request.
... And then found their own crawlers can't parse their own manifests.
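For readers wondering what "respecting the cool-off periods" looks like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not the library described above, which isn't shown in this thread. The robots text, the `examplebot` name, and the helpers `load_rules` and `fetch_policy` are made up for illustration. Note that `Crawl-delay` is a non-standard directive, so not every site or parser honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/

User-agent: examplebot
Crawl-delay: 10
Disallow: /search
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def fetch_policy(parser, agent, url, default_delay=1.0):
    """Return (allowed, delay_seconds) for one agent and URL.

    default_delay is an assumed fallback for sites that set no Crawl-delay.
    """
    allowed = parser.can_fetch(agent, url)
    delay = parser.crawl_delay(agent)  # None when no delay is declared
    return allowed, (default_delay if delay is None else float(delay))


rules = load_rules(ROBOTS_TXT)
print(fetch_policy(rules, "examplebot", "https://example.com/search"))
print(fetch_policy(rules, "otherbot", "https://example.com/private/a"))
```

A crawler would sleep for the returned delay between requests to the same host and skip any URL where `allowed` is false. Real-world files are messier than this toy input, which is the point of the comment above.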
Could you link the source of your crawler library?