H5FDdsm


The H5FDdsm project provides a Virtual File Driver for HDF5 which can be used to link two applications via a virtual file system. One application (the server/host) owns a memory buffer that may be distributed over N processes (the DSM buffer); the second application (the client) writes to HDF5 in parallel using M processes, and the data is diverted to the DSM host, where it can be read in parallel as if from disk. The file system is bypassed completely and the data is transmitted using one of several network protocols (MPI or TCP over sockets are currently supported). Note that the interface can also be used within a single application as a parallel data-staging layer; in this case no connection is required and information is exchanged between processes using MPI.


Interface Description and Specifications

The interface is built around the HDF5 file IO library using a DSM (Distributed Shared Memory) architecture. To use the interface, one needs to follow these steps:

  • Use the HDF5 API, set up the H5FDdsm driver and link the simulation to these libraries.
  • (Optional) Use a DSM configuration file to tell the simulation where to connect to.

HDF5

HDF5 is the only format supported by the library. Formats built on top of HDF5 should also be able to make use of the DSM driver, provided functions are implemented to support the selection of the driver. In other words, the code must be adapted to one of these compatible formats in order to write out data using the DSM interface. Once the simulation code has been adapted to write to disk in parallel, no additional lines of code need to be added to use the DSM.

For now, it is also necessary to compile and use a customized version of HDF5. This version adds features to the original HDF5 interface so that customized MPI virtual file drivers, such as the DSM driver we provide, can be used. The VFD-patched tarball can be downloaded from the H5FDdsm project page.

The next major release of HDF5 (1.10) will include these changes.

General Instructions for Building with H5FDdsm Support

First untar the archive and create a build directory (note that HDF5 must be built using CMake; for more information about CMake, have a look at the Download And Install CMake page):

tar -xjf hdf5-vfd-1.8.x.tar.bz2
cd hdf5-vfd-1.8.x
mkdir build
cd build
ccmake ..

Because the main objective of H5FDdsm is to re-route IO in parallel, you will need to build HDF5 with parallel IO support. For general usage, here are some recommended options:

CMAKE_INSTALL_PREFIX             /path/to/install/dir
HDF5_BUILD_CPP_LIB               OFF
HDF5_BUILD_FORTRAN               ON
HDF5_BUILD_HL_LIB                ON
HDF5_BUILD_TOOLS                 ON
HDF5_ENABLE_HSIZET               ON
HDF5_ENABLE_LARGE_FILE           ON
HDF5_ENABLE_PARALLEL             ON
HDF5_ENABLE_Z_LIB_SUPPORT        ON
MPI_COMPILER                     /path/to/bin/mpicc 

After the configuration/generation steps, run make and make install.

Platform Specific Instructions

H5FDdsm

Get and Compile H5FDdsm

The latest version of H5FDdsm can be obtained by downloading the latest release tarball or by using the git repository (more information in the SCM section).

As with HDF5, untar the archive, create a build directory and run cmake (never run cmake directly in the source directory):

tar -xjf h5fddsm-0.9.x.tar.bz2
cd h5fddsm-0.9.x
mkdir build
cd build
ccmake ..

The following options are then available; you will also need to provide the exact path of the HDF5 CMake directory, which is a subdirectory of your local HDF5 installation containing the .cmake config/target files:

H5FD_DSM_BUILD_FORTRAN           ON                                           
H5FD_DSM_BUILD_STEERING          ON (enable if the steering functionality is required)
HDF5_DIR                         /path/to/hdf5/install/dir/share/cmake/hdf5-version

Again, after the configuration/generation process, run make and make install. If you have enabled H5FD_DSM_BUILD_TESTING, you can check that everything is up and running by doing (in the build directory):

ctest . (Linux)
ctest -C <CTEST_BUILD_CONFIGURATION> (Windows)

API

Main Interface

Once HDF5 has been successfully compiled, one only needs to compile and link against the H5FDdsm library, using the following C or F90 interface:

(C) herr_t H5Pset_fapl_dsm(hid_t fapl_id, MPI_Comm intra_comm, void *local_buf_ptr, size_t local_buf_len);

(F90) h5pset_fapl_dsm_f(prp_id, comm, hdferr)
          INTEGER(HID_T) prp_id
          INTEGER comm, hdferr

Parameters:

  • hid_t fapl_id IN: File access property list identifier
  • MPI_Comm intra_comm IN: MPI communicator
  • void* local_buf_ptr IN: optional local memory buffer argument (default: NULL)
  • size_t local_buf_len IN: optional local memory buffer size argument (default: 0)

The usage is basically the same as for the MPI-IO driver (cf. http://www.hdfgroup.org/HDF5/Tutor/parallel.html), except that there is no need to set any transfer options (H5Pset_dxpl_mpio).

Typical usage within a simulation code:

#include <H5FDdsm.h>
 
int main()
{
  ...
  fapl_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_dsm(fapl_id, MPI_COMM_WORLD, NULL, 0);
  file_id = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);
  H5Pclose(fapl_id);
  ...
  H5Dcreate/write/etc
  ...
  H5Fclose(file_id);
  ...
}
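
On the receiving side, data placed in the DSM can then be read back through the same driver as if it were a file on disk. The following is a minimal sketch under the assumption that the client wrote a dataset named "/Data" of 100 doubles to a file called "dsm.h5"; these names, and the omitted setup of the host-side DSM manager (see the C++ interface below), are illustrative only.

#include <mpi.h>
#include <hdf5.h>
#include <H5FDdsm.h>

/* Read-side sketch: open the virtual file through the DSM driver and read a
   dataset back. "dsm.h5" and "/Data" are hypothetical names. */
void read_from_dsm(MPI_Comm comm)
{
  hid_t  fapl_id, file_id, dset_id;
  double data[100];   /* assumes the client wrote 100 doubles */

  fapl_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_dsm(fapl_id, comm, NULL, 0);

  file_id = H5Fopen("dsm.h5", H5F_ACC_RDONLY, fapl_id);
  H5Pclose(fapl_id);

  dset_id = H5Dopen2(file_id, "/Data", H5P_DEFAULT);
  H5Dread(dset_id, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

  H5Dclose(dset_id);
  H5Fclose(file_id);
}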

C++ Interface

To create DSM buffer objects and pass them to the library directly, one can use the following function:

 herr_t H5FD_dsm_set_manager(void *manager);

For instance:

#include <H5FDdsm.h>
 
int main()
{
  H5FDdsmManager *dsmManager = new H5FDdsmManager();
  // Manager initialization
  ...
  fapl_id = H5Pcreate(H5P_FILE_ACCESS);
  H5FD_dsm_set_manager(dsmManager);
  H5Pset_fapl_dsm(fapl_id, MPI_COMM_WORLD, NULL, 0);
  file_id = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);
  H5Pclose(fapl_id);
  ...
  H5Dcreate/write/etc
  ...
  H5Fclose(file_id);
  ...
}

Steering Interface

When H5FD_DSM_BUILD_STEERING is turned on, the steering interface can be used to send commands and data back to the sending (simulation) code. More comprehensive documentation of the API presented below can be found in the header files of the different interfaces (C/F90).

For reference, here is the list of available functions (a short usage sketch follows the list):

herr_t H5FD_dsm_steering_init(MPI_Comm comm, void *buffer);
herr_t H5FD_dsm_steering_update();
herr_t H5FD_dsm_steering_is_enabled(const char *name);
herr_t H5FD_dsm_steering_scalar_get/set(const char *name, hid_t mem_type, void *data);
herr_t H5FD_dsm_steering_vector_get/set(const char *name, hid_t mem_type, hsize_t number_of_elements, void *data);
herr_t H5FD_dsm_steering_is_set(const char *name, int *set);
herr_t H5FD_dsm_steering_begin_query();
herr_t H5FD_dsm_steering_end_query();
herr_t H5FD_dsm_steering_get_handle(const char *name, hid_t *handle);
herr_t H5FD_dsm_steering_free_handle(hid_t handle);
herr_t H5FD_dsm_steering_wait();
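
As an illustration, here is a minimal sketch of how a simulation loop might use some of these calls. The parameter name "dt", the dataset name "Pressure", the assumed return-value convention of H5FD_dsm_steering_is_enabled and the header location of the steering prototypes are assumptions for illustration only; check the C/F90 headers of your H5FDdsm build for the exact details.

#include <mpi.h>
#include <hdf5.h>
#include <H5FDdsm.h>
/* The steering prototypes may live in a separate header of the H5FDdsm
   installation (assumption); check the installed headers. */

int main(int argc, char *argv[])
{
  double dt = 0.001;    /* "dt" is a hypothetical steerable parameter */
  int    dt_is_set = 0;
  int    step;

  MPI_Init(&argc, &argv);

  /* Initialize the steering layer on the communicator used by the DSM driver;
     NULL is passed for the optional buffer argument (assumed, mirroring the
     H5Pset_fapl_dsm defaults). */
  H5FD_dsm_steering_init(MPI_COMM_WORLD, NULL);

  for (step = 0; step < 100; step++) {
    /* ... compute one time step and write results through the DSM driver ... */

    /* Fetch the latest steering commands/values sent from the host side */
    H5FD_dsm_steering_update();

    /* Only read "dt" back if the host has actually set a new value */
    H5FD_dsm_steering_is_set("dt", &dt_is_set);
    if (dt_is_set) {
      H5FD_dsm_steering_scalar_get("dt", H5T_NATIVE_DOUBLE, &dt);
    }

    /* Assumed convention: a non-negative return value means that the object
       named "Pressure" is enabled for output. */
    if (H5FD_dsm_steering_is_enabled("Pressure") >= 0) {
      /* ... write the "Pressure" dataset ... */
    }
  }

  MPI_Finalize();
  return 0;
}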

DSM Configuration

Client Configuration File

On the simulation side, the DSM must be given a few options, such as the base server destination and the transmission mode. These options are written in a .dsm_client_config file (whose path is set using the environment variable H5FD_DSM_CONFIG_PATH) and are generated automatically when the DSM is used with the ParaView ICARUS plugin. For instance:

[Comm]
DSM_COMM_SYSTEM=socket
DSM_BASE_HOST=test.staff.cscs.ch
DSM_BASE_PORT=22000
DSM_STATIC_INTERCOMM=false

DSM_COMM_SYSTEM=socket/mpi/mpi_rma/dmapp selects the inter-communication system, i.e. sockets or one of the MPI-based modes (please note that the MPI options will only work if the same MPI distribution is used for the post-processing part and the simulation part, and if the different clusters/machines/nodes share the same architecture and process managers).

DSM_BASE_HOST=host defines the main contact point used to initiate the connection between the two ends. In socket mode, host can be a machine host name or an IP address; in MPI mode, host is the connection string provided by the DSM server.

DSM_BASE_PORT=port defines the port used to initiate the connection; in MPI mode this option is ignored.
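
For comparison, an MPI-mode client configuration might look like the following sketch; the DSM_BASE_HOST value is only a placeholder for the connection string printed by the DSM server:

[Comm]
DSM_COMM_SYSTEM=mpi
DSM_BASE_HOST=<connection string provided by the DSM server>
DSM_STATIC_INTERCOMM=false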

DSM_STATIC_INTERCOMM=true/false makes the inter-communicator creation either static or dynamic (this only affects MPI communication) and is useful for platforms that do not support dynamic MPI connections. In that case, MPMD jobs can be run using the following type of command:

mpirun -np 176 Receiver : -np 3168 Sender