Comment by IshKebab
Damn surely you stop using ASCII formats before your dataset gets to 2 TB??
Damn surely you stop using ASCII formats before your dataset gets to 2 TB??
BAM format is widely used but assemblies still tend to be generated and exchanged in FASTA text. BAM is quite a big spec and I think it's fair to say that none of the simpler binary equivalents to FASTA and FASTQ have caught on yet (XKCD competing standards etc.)
Ha. it gets worse. Search engines or blacklist processors often use gigantic url lists, which are stored as plain ASCII, which is then fed into a perfect hash generator, which accesses those url's unordered. I.e. they need to create a second ordering index to access the urllist. The perfect hashing guys are mathematicians and so they don't care because their definition of a mphf (minimal perfect hash function) is just a random ordering of unique indices, but they don't care to store the ordering also. So we have ASCII and no index.