Reading hoomd xml and gsd files
hoomdxml reader
HOOMD-blue is a molecular simulation engine highly optimized for use on GPUs. Older version of HOOMD-Blue utilized an XML file format for defining simulation positions, masses, bonds, angles, etc. Support for this format has been deprecated since HOOMD-blue v2 and completely removed since the release of version 3. Nonetheless, it is still a convenient and descriptive file format, and the ability to read in such files is still important for the analysis (or conversion) of older simulations that rely on this legacy format. To read these legacy files (as well as the current GSD file format), and in particular, to convert to mbuild compounds, I wrote a lightweight Python library, hoomdxml_reader to handle these files.
Packages such as mdtraj provides support for the HOOMD-blue XML file format, including automated grouping of particles by connectivity, similar to the hoomdxml_reader. It also provides support for GSD files. However, masses, charges, angles and dihedrals are not parsed from the files by mdtraj (note, mdtraj will assign masses if the name of a particle in the file is an element, but does no know how to handle non-atomistic species such as a CG beads). mdanalysis also has an XML reader, but similarly, does not do a full parsing of the information in the XML.
This missing information may be necessary, especially for converting legacy XML files to the current GSD file format used by HOOMD-blue or to convert to MoSDeF’s mBuild Compounds or GMSO format (note, while mbuild compounds will not use angle or dihedral information, GMSO topologies will).
Usage of this package (see the documentation for more info) is pretty straight forward. One can simply pass an xml or gsd file to the System class, which will parse the information and populate various lists and data structures in System.
import hoomdxml_reader as hxml
system = hxml.System("example.hoomdxml")
# print the number of particles
print(system.n_particles)
# print the position, mass and charge of the first particle
print(system.xyz[0], system.masses[0], system.charges[0])
By default the code will identify all connected components and group them into molecules. The grouping is done by creating a “pattern” based on the name of each of the particles in the molecule, in the order the appear in the file.
For example, united atom pentane, composed of interaction sites named ‘CH3’ and ‘CH2’, has the “pattern”: ‘CH3CH2CH2CH2CH3’ in this scheme. This assumes that in a given simulation, identical molecules have identical ordering (which is the typical behavior). A list of all the unique patterns found with this scheme can be access via the unique_molecules dictionary. Given the set hierarchy of the input files and the fact that identical molecules typically are just copies of each other, this simple scheme is efficient and sufficient, and should work equally as well for both atomistic and non-atomistic species.
print(system.unique_molecules)
Since molecule names are not provided in the files, we can easily reassign them by passing a simple dictionary. This name would also be used when converting to an mbuild Compound, where the CH3 and CH2 interaction sites for a given molecule would be grouped into a Compound named pentane.
molecule_dict = {'CH3CH2CH2CH2CH3': 'pentane'}
system.set_molecule_name_by_dictionary(molecule_dict)
print(system.unique_molecules)
Converting to Compound is straightforward:
import hoomdxml_reader as hxml
import hoomdxml_reader.convert as convert
import mbuild as mb
#system is an instance of the hoomdxml_reader System class
system = hxml.System("example.hoomdxml")
# we will rename the molecules in the system;
# this is not necessary for the conversion but only done for clarity
molecule_dict = {'CH3CH2CH2CH2CH3': 'pentane', 'water': 'water'}
system.set_molecule_name_by_dictionary(molecule_dict)
#mb_system is an instance of the mBuild Compound class
mb_system = convert.System_to_Compound(system)
# convert just the first molecule
mb_molecule = convert.Molecule_to_Compound(system, system.molecules[0],
name=system.molecules[0].name)