Dr. Herodotos Herodotou Seminar - Multi-tiered Storage Management for Cluster Computing
- Friday, May 19, 2017 from 3:10pm to 4:00pm
- Barnard Hall
Multi-tiered Storage Management for Cluster Computing
The ever-growing data storage and I/O demands of modern large-scale data analytics are challenging current distributed storage systems. A promising trend is to exploit recent improvements in memory, storage media, and network technologies to sustain high performance at low cost. While recent work explores using memory and SSDs as a cache for local storage, or combining local with network-attached storage, no prior work has considered all storage tiers together in a distributed setting. We present a novel distributed file system that is aware of heterogeneous storage media (e.g., memory, SSDs, HDDs, NAS) with different capacities and performance characteristics. The system offers a variety of pluggable policies for automating data management to increase performance and improve cluster utilization. At the same time, the storage media are explicitly exposed to users and applications, allowing them to choose the distribution of replicas in the cluster based on their own performance and fault tolerance requirements. We analyze the new trends and challenges that led to our data-centric design choices, and discuss how those choices inspire new research opportunities for data-intensive processing systems.
Dr. Herodotos Herodotou is a tenure-track Lecturer in the Department of Electrical Engineering, Computer Engineering and Informatics at the Cyprus University of Technology, where he leads the Data Intensive Computing Research Lab. He received his Ph.D. in Computer Science from Duke University in May 2012. His Ph.D. dissertation work received the SIGMOD Jim Gray Doctoral Dissertation Award Honorable Mention as well as the Outstanding Ph.D. Dissertation Award in Computer Science at Duke. His research interests are in large-scale data processing systems and database systems. In particular, his work focuses on ease of use, manageability, and automated tuning of both centralized and distributed data-intensive computing systems. In addition, he is interested in applying database techniques in other areas such as scientific computing, bioinformatics, social computing, and maritime technologies. His work experience includes research positions at Microsoft Research, Yahoo! Labs, and Aster Data, as well as software engineering positions at Microsoft and RWD Technologies.