Organizing the tsunami of data that often accompanies scientific research is a daunting task for any IT or data science professional.
This challenge fell to Jay Smestad, senior director of information technology at PacBio, whose job was to curb spiraling sequencing data volumes and the costs associated with creating the data-intensive tools used to map genomic code.
Pacific Biosciences, or PacBio, based in Menlo Park, California, creates petabytes of potentially valuable data that physicians, researchers and other Ph.D.-holding employees may need in the blink of an eye.
Savings Sequence
The company, which develops genomic sequencing systems and associated test accessories, stores data used in sequencer development and other product testing for years. It runs a variety of on-premises systems, both legacy and more modern, from vendors including NetApp, Vast Data and Spectra Logic.
“At PacBio, storage is our biggest single IT cost,” says Smestad. “It’s the one we argue about and try to stay on top of.”
Smestad and his team chose the Komprise Intelligent Data Management software to automate the tiering of less frequently used data, shifting it from the expensive flash arrays needed for immediate, intensive workloads to cheaper tape storage for general archiving and deeper cold storage.
The team also uses Komprise automation tools to evaluate storage practices that keep data in the cheapest possible locations without sacrificing accessibility.
“I think our media costs are about $0.08 per gig,” he said. “It’s super cheap. We just back up everything. We never delete unless the users delete [data] themselves.”
Sinking in a data ocean
Before choosing the Komprise software, the IT team managed a handful of different storage arrays, moving data with scripts and tools written by administrators. This often made managing data across brands, operating systems and locations a tedious process, and stability was not guaranteed.
“You basically had administrators writing scripts or moving data from one file server to another,” Smestad said. “It’s a manual process. … If [IT admins] fat-finger something, it’s gone. It’s really scary stuff. People don’t like doing that job, [and] I really wanted to see [our storage] become more like a commodity.”
Complicating matters further was the lack of a unified namespace for organizing user storage and data silos, a task that Smestad and his team performed manually before adding the automation tools.
The software helps shift data around the newly established namespaces, Smestad noted, and also provides budget figures and data usage estimates. Those numbers proved helpful when his department advocated for additional technologies and upgrades.
“It was something I could present to senior management,” he said. “Putting something in a framework where you can share it with C-level people and they get it is also a really good part of the tool.”
Competition among data management software companies continues to intensify as demand for multi-cloud, non-proprietary management tools grows. Direct competitors to Komprise include StrongBox Data Solutions, Data Dynamics and Aparavi, as well as platform-specific software from storage hardware vendors, such as Dell EMC’s ClarityNow, and the open source iRODS.
Mapping the future
PacBio uses Spectra Logic’s BlackPearl storage system to store older and infrequently used data, most of which is backed up to tape, Smestad said. Data that employees haven’t accessed shifts to progressively lower tiers every six months.
“We get about two recovery requests a year,” he said. “[Our employees] generate [data] and they perform their analysis. If it’s garbage, they’ll never look at it again.”
A Vast Data flash array handles PacBio’s more pressing storage and data needs. The IT team maintains this system and moves older data off it to preserve performance and fast access speeds.
“We need to get out of those primary tiers and get that data to those cheaper tiers quickly,” he said. “We do that with Komprise.”
Smestad expects to evaluate future Komprise tools that further automate policy creation and data classification.
Tim McCarthy is a journalist living on the north coast of Massachusetts. He covers news about cloud and data storage.