How Do You Manage a Terabyte?

This post originally appeared on the Software Carpentry website.

This question has come up a couple of times, and I'd welcome feedback from readers. Suppose you have a large, but not enormous, amount of scientific data to manage: too much to easily keep a copy on every researcher's laptop, but not enough to justify buying special-purpose storage hardware or hiring a full-time sysadmin. What do you do? Do you break it into pieces, compress them with gzip or its moral equivalent, put the chunks on a central server, and create an index so that people can download and uncompress what they need, when they need it? Or... or what? What do you do, and why?
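
To make the chunk-and-index option concrete, here is a minimal sketch in Python of what it might look like. Everything named here is an assumption for illustration, not an established tool: the function names `chunk_and_index` and `read_range`, the `index.json` layout, and the 64 MB default chunk size are all hypothetical. It also assumes the data is a single large file that can be split at arbitrary byte boundaries, which will not suit every format.

```python
import gzip
import hashlib
import json
from pathlib import Path

CHUNK_BYTES = 64 * 1024 * 1024  # hypothetical default; tune for your network and data


def chunk_and_index(source, out_dir, chunk_bytes=CHUNK_BYTES):
    """Split `source` into gzip-compressed chunks and write index.json beside them."""
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    offset = 0
    with source.open("rb") as reader:
        while True:
            block = reader.read(chunk_bytes)
            if not block:
                break
            name = f"{source.name}.{len(entries):05d}.gz"
            with gzip.open(out_dir / name, "wb") as writer:
                writer.write(block)
            # Record where this chunk sits in the original file, plus a
            # checksum of the uncompressed bytes so downloads can be verified.
            entries.append({
                "chunk": name,
                "offset": offset,
                "length": len(block),
                "sha256": hashlib.sha256(block).hexdigest(),
            })
            offset += len(block)
    index = {"source": source.name, "total_bytes": offset, "chunks": entries}
    (out_dir / "index.json").write_text(json.dumps(index, indent=2))
    return index


def read_range(out_dir, start, length):
    """Return bytes [start, start + length), touching only the chunks that overlap."""
    out_dir = Path(out_dir)
    index = json.loads((out_dir / "index.json").read_text())
    end = start + length
    pieces = []
    for entry in index["chunks"]:
        lo = entry["offset"]
        hi = lo + entry["length"]
        if hi <= start or lo >= end:
            continue  # this chunk does not overlap the requested range
        with gzip.open(out_dir / entry["chunk"], "rb") as reader:
            block = reader.read()
        if hashlib.sha256(block).hexdigest() != entry["sha256"]:
            raise ValueError(f"checksum mismatch in {entry['chunk']}")
        pieces.append(block[max(start, lo) - lo:min(end, hi) - lo])
    return b"".join(pieces)
```

In this sketch `out_dir` is a local directory; in practice it would be whatever the central server exposes, and a researcher would fetch `index.json` first, then only the chunks it points to. Whether that is actually better than the alternatives is exactly the open question.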


Last edited: 2024-04-29 at 14:57:25 UTC
