In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Will defend his dissertation
I/O is today among the most limiting factors for large-scale parallel applications on high-end machines. Parallel I/O provides interfaces that allow multiple processes to access the same file concurrently. MPI, the {\it de facto} standard for message passing in parallel scientific applications, includes a parallel I/O specification, introduced in version two of the standard. Features that distinguish MPI-I/O from POSIX I/O include relaxed consistency semantics, file views, collective I/O operations, and shared file pointers. However, when it comes to meeting the needs of the HPC community, existing MPI-I/O libraries are limited in terms of performance, modularity, and portability across different hardware architectures and file systems. This thesis accomplishes the following goals: it develops a novel and flexible architecture for a parallel I/O library; develops new algorithms for collective I/O operations; develops automatic selection algorithms that choose an optimal or near-optimal collective I/O algorithm and its associated parameters, such as the number of aggregators (the processes that actually perform the low-level I/O operations); and, finally, develops a static, pre-execution tuning methodology that tunes the runtime parameters and algorithms best suited to a given scenario.