Through the looking glass: Implementing KIM-API for Your Molecular Dynamics Simulator
In for penny in for pound! After giving a bare-bones example on how to create a KIM-API potable model using C++, I thought why not go deeper and have a look at how KIM-API works on the simulator side. It might help clarifying several key design decision and ideas. Goal here is simple, initialize and call a KIM-API portable model for inference, and show the key steps involved.
I will use the C API, instead of C++ API for simulator side calls as, i) it is more universal, C functions can be used in almost every language cross-call, where C++ name mangling complicates it, ii) it is nearly the same in both languages, as you will see you can very easily guess the C++ API calls from the C names. Also here we will use a dummy simulator ( a simple C/C++ main function!) but the core idea still holds.
The Knowledgebase of Interatomic Models (KIM-API) is a powerful framework that provides a standardized interface between molecular simulation codes and interatomic potentials. If you're developing a molecular dynamics simulator or computational materials science tool, integrating KIM-API gives you access to hundreds of validated interatomic models without having to implement each one individually.
KIM-API provides the API in C, C++ and Fortran natively. It also provides a Python package KIMPY for interfacing with Python based simulators (mostly used in ASE calculators). I am currently trying to port it Julia as well for providing interface with Julia based MD simulators like Molly.jl (more on it later).
Core Concepts
Before diving into implementation, let's understand the key components:
- Model: The interatomic potential implementation (e.g., Stillinger-Weber, EAM, MEAM)
- ComputeArguments: Container for input/output data (positions, forces, energy)
- Neighbor Lists: Efficient data structures for keeping tabs on particle interactions
- Units System: Consistent unit handling across models and simulators
Lets walk through the steps on how to create the models and get them to spit energy and forces. Basic flow involves 3 major steps:
- Initialize the model (calls the
model_driver_create
function we implemented in our portable model driver). - Creating the
ComputeArguments
, basically setting pointers to positions, species, etc, and memory locations to save the energy/forces or other computed properties like virials. These pointers are typically provided by the simulator. - Neighbor list set-up. I believe this is the most tricky and crucial step. As remaineder of the compute arguments are just read/write operations, neighbor lists are interactively used by the model driver (remember those
GetNeighborList
calls?). Therefore it is crucial to abstract away any arbitrary arguments in the neighbor list calls such that we can have a uniform API across all simulators, and languages. We will discuss it in more detail later.
Step 1: Model Initialization
The first step is creating and initializing a KIM model. This involves specifying the units system and model name. Supported units of Length, Energy, Charge, Temperature, and Time are strongly typed and enumerated in KIM-API to ensure error-proof model inference. Most model works with all units enumerated, by internal unit conversion mechanism (mostly by transforming the parameters); some models, like the ML ones, might raise an error when asked to initialize in units different from the ones they were trained in. This is because there is no cheap or easy way to transform the model parameters in ML models post training. You can check all the supported units in linked KIM-API docs pages.
C Function Name: KIM_Model_Create
call
int KIM_Model_Create(
int numbering, // 0 for zero-based, 1 for one-based indexing
int length_unit, // e.g., KIM_LENGTH_UNIT_A (Angstrom)
int energy_unit, // e.g., KIM_ENERGY_UNIT_eV
int charge_unit, // e.g., KIM_CHARGE_UNIT_e
int temperature_unit, // e.g., KIM_TEMPERATURE_UNIT_K
int time_unit, // e.g., KIM_TIME_UNIT_ps
const char* model_name, // e.g., "SW_StillingerWeber_1985_Si__MO_405512056662_006"
int* units_accepted, // Output: 1 if units accepted, 0 otherwise
void** model_ptr // Output: pointer to created model
);
This function creates a model instance with your specified units. The model will either accept your units or request a conversion. Always check units_accepted
to ensure compatibility.
Step 2: Creating Compute Arguments
Once the model is initialized, create a compute arguments object to hold calculation data. Initializing it is simple:
C Function Name: KIM_Model_ComputeArgumentsCreate
int KIM_Model_ComputeArgumentsCreate(
void* model,
void** compute_arguments
);
This creates a container that will hold all the data needed for calculations - particle positions, species, and output quantities like energy and forces. Before assigning those pointers, we need to check what all arguments are supported. For example, some models might be able to compute total energy, but might not compute per particle energy decomposition. Hence they will return notSupported
enum value for the Support Status.
Step 3: Checking Model Capabilities
Before setting up calculations, verify what the model supports:
C Function: KIM_ComputeArguments_GetArgumentSupportStatus
void KIM_ComputeArguments_GetArgumentSupportStatus(
void* compute_arguments,
int argument_name, // e.g., KIM_COMPUTE_ARGUMENT_NAME_partialEnergy
int* support_status // Output: KIM_SUPPORT_STATUS_required/optional/notSupported
);
Common argument names include:
KIM_COMPUTE_ARGUMENT_NAME_partialEnergy
- Total energyKIM_COMPUTE_ARGUMENT_NAME_partialForces
- Forces on particlesKIM_COMPUTE_ARGUMENT_NAME_partialParticleEnergy
- Per-particle energiesKIM_COMPUTE_ARGUMENT_NAME_partialVirial
- Virial stress tensor
Check here for the complete list.
Step 4: Setting Data Pointers
How does your model know what is asked to be computed? Simple, it checks for valid pointers in the ComputeArgument
containers, and presume that for any argument, if there exist a valid pointer, that property has to be computed. Once ensuring that the model indeed supports the property you want the model to compute, you set the pointers.
C Functions for Setting Pointers
// For integer data (particle count, species codes, contributing flags)
int KIM_ComputeArguments_SetArgumentPointerInteger(
void* compute_arguments,
int argument_name,
int* ptr
);
// For floating-point data (coordinates, forces, energy)
int KIM_ComputeArguments_SetArgumentPointerDouble(
void* compute_arguments,
int argument_name,
double* ptr
);
Essential pointers to set:
numberOfParticles
- Total particle count (including ghost atoms for PBC)particleSpeciesCodes
- Integer codes for each particle's elementcoordinates
- 3N array of particle positionsparticleContributing
- 1 for real atoms, 0 for ghost atomspartialEnergy
- Pointer to store computed energypartialForces
- 3N array to store computed forces
If you are familiar with the LAMMPS Pair style implementation, some of these pointers (for LAMMPS) are
KIM Argument | LAMMPS Ptr |
---|---|
coordinates |
atom->x |
partialEnergy |
eng_vdwl |
partialForces |
atom->f |
Similarly, for other simulators you need to find the equivalent pointers to set.
Step 5: Neighbor List Setup
The crown jewel of the KIM-API design! Let us understand the challenge first.
Each simulator comes with a high-performance neighbor list calculator. They need for different tasks like domain decomposition, fixes, constraints etc. But the model driver/model also needs the neighbor lists for computing pairwise, or angle terms etc. One solution for this could be to bundle KIM-API with its own neighbor lists (as KIMPY does for ASE), but that would mean computing the neighbor lists twice for every call. It would be really time consuming and wasteful. So KIM-API uses very clever way to reuse the simulator neighbor lists.
On the simulator side, first you need to ask for the cutoff radius or influence distance, you can get it with the following function
Getting Neighbor List Requirements
void KIM_Model_GetNeighborListPointers(
void* model,
int* number_of_neighbor_lists,
double** cutoffs,
int** will_not_request_neighbors_of_ghost_particles
);
As you can see you will get three values out of this call
number_of_neighbor_lists
, some model use multiple lists for computation, e.g. EAMcutoffs
for each neighbor lists- will the non contributing particles will also need neighbor list (e.g. used in staged graph convolutions)
Using these values you can ask your simulator to compute the neighbor lists however it seems fit. Now information form your simulator is passed to KIM-API via a callback function. You will register your callback function to KIM-API as,
Setting the Neighbor Callback
int KIM_ComputeArguments_SetCallbackPointer(
void* compute_arguments,
int callback_name, // KIM_COMPUTE_CALLBACK_NAME_GetNeighborList
int language_name, // KIM_LANGUAGE_NAME_c
void* func_ptr, // Your callback function, here: get_neighbors_callback
void* data_object // Data to pass to callback
);
And now the callback function itself. The callback function should always the the following signature (it is a must). For C++ classes, you need to declare it as static
(those pesky this
).
data_object
: This is the key to making static callbacks work with object-oriented code. It's an opaque pointer that KIM-API passes back to you unchanged. Common uses:
- C++: Pointer to your simulator class instance
- C: Pointer to a struct containing neighbor list data
- Python/Julia: Pointer to a wrapper object containing closures
- Fortran: Pointer to a derived type
data_object
basically acts like a Trojan-horse of simulator context, hidden into the KIM-API. It passes the neighbor-whatever object from your simulator to the KIM-API. Now whenever KIM-API need the neighbor list it will pass this object back to the callback function ( here we have named it get_neighbors_callback
), and this function can use the data_object
as the simulator "context". It can cast it back, ask it for the neighbor list, and pass that information back to the KIM-API model. No need of extra computation, and it works for all simulators/ languages, and what not.
Lets have a closer look at the neighbor function,
Neighbor Callback Function Signature
int get_neighbors_callback(
void* data_object, // Your context data
int number_of_neighbor_lists, // Total number of lists (for validation)
double* cutoffs, // Array of cutoff distances
int neighbor_list_index, // Which neighbor list
int particle_index, // Which particle (simulator's indexing)
int* number_of_neighbors, // OUTPUT: How many neighbors found
int** neighbors_ptr // OUTPUT: Pointer to neighbor indices array
);
Breaking down each parameter:
data_object
: The context object making a round trip back!
neighbor_list_index
: Models can request multiple neighbor lists with different cutoffs. For example, a Tersoff potential might need:
- List 0: First neighbor shell (2.7 Å)
- List 1: Second neighbor shell (3.2 Å)
Return expectations:
- Return 0 for success, non-zero for error
- The neighbor array must remain valid until the next callback or compute completes
- Indices should match your simulator's numbering (KIM handles conversions)
Implementation Examples
Here is a very dummy examples to hammer the point home of how neighbor list callback works with all the different kind of neighbor lists and simulator strategies
Strategy 1: Pre-computed Lists (Simple but Memory Intensive, e.g. LAMMPS)
typedef struct {
int** all_neighbors; // neighbors[particle][neighbor_idx]
int* num_neighbors; // count for each particle
double cutoff;
} PrecomputedList;
typedef struct {
PrecomputedList* lists; // Array of lists for different cutoffs
int num_lists;
} SimulatorData;
int get_neighbors_precomputed(void* data_object, ..., int neighbor_list_index,
int particle_index, int* number_of_neighbors,
int** neighbors_ptr) {
SimulatorData* sim = (SimulatorData*)data_object;
PrecomputedList* list = &sim->lists[neighbor_list_index];
*number_of_neighbors = list->num_neighbors[particle_index];
*neighbors_ptr = list->all_neighbors[particle_index];
return 0;
}
Strategy 2: On-Demand Computation (Memory Efficient, e.g. ASE, KIMPY)
typedef struct {
double* positions; // Particle positions
int num_particles;
double* box; // Simulation box
int* temp_neighbors; // Reusable buffer
CellList* cell_list; // Spatial data structure
} SimulatorData;
int get_neighbors_on_demand(void* data_object, ..., int neighbor_list_index,
int particle_index, int* number_of_neighbors,
int** neighbors_ptr) {
SimulatorData* sim = (SimulatorData*)data_object;
double cutoff = cutoffs[neighbor_list_index];
// Use cell list to find neighbors efficiently
int count = 0;
double* pos_i = &sim->positions[3 * particle_index];
// Iterate through nearby cells
CellIterator iter = cell_list_get_iterator(sim->cell_list, pos_i, cutoff);
while (cell_iterator_has_next(&iter)) {
int j = cell_iterator_next(&iter);
double* pos_j = &sim->positions[3 * j];
double dist_sq = distance_squared(pos_i, pos_j, sim->box);
if (dist_sq <= cutoff * cutoff) {
sim->temp_neighbors[count++] = j;
}
}
*number_of_neighbors = count;
*neighbors_ptr = sim->temp_neighbors;
return 0;
}
Strategy 3: Closure-Based (Functional Languages, as it is implemented in Julia right now)
For languages with first-class functions, you can use closures to capture the neighbor list logic:
# Julia example
struct NeighborClosure
get_neighbors::Function
storage::Vector{Vector{Int32}} # Persistent storage
end
function create_neighbor_closure(positions, box, pbc)
# Create spatial data structure
cell_list = CellList(positions, box, max_cutoff)
function get_neighbors(particle_idx, cutoff)
neighbors = Int32[]
# Find neighbors using cell_list
for j in nearby_particles(cell_list, particle_idx, cutoff)
if distance(positions[particle_idx], positions[j]) <= cutoff
push!(neighbors, j)
end
end
return neighbors
end
return get_neighbors
end
Critical Implementation Details
Memory Persistence
The Golden Rule: Neighbor data must remain valid after your callback returns!
// WRONG - Array on stack will be invalid after return
int get_neighbors_bad(void* data_object, ...) {
int local_neighbors[1000]; // Stack allocated!
// ... fill local_neighbors ...
*neighbors_ptr = local_neighbors; // BUG: Dangling pointer!
return 0;
}
// CORRECT - Use persistent storage
typedef struct {
int* neighbor_buffer; // Heap allocated
size_t buffer_size;
} SimData;
int get_neighbors_good(void* data_object, ...) {
SimData* sim = (SimData*)data_object;
// ... fill sim->neighbor_buffer ...
*neighbors_ptr = sim->neighbor_buffer; // Safe: buffer persists
return 0;
}
Handling Multiple Neighbor Lists
Models may need different cutoffs for different interaction ranges:
typedef struct {
int* storage_list0;
int* storage_list1;
int* storage_list2;
size_t max_neighbors;
} MultiListData;
int get_neighbors_multi(void* data_object, ..., int neighbor_list_index, ...) {
MultiListData* data = (MultiListData*)data_object;
int* storage;
// Select appropriate storage for this list
switch (neighbor_list_index) {
case 0: storage = data->storage_list0; break;
case 1: storage = data->storage_list1; break;
case 2: storage = data->storage_list2; break;
default: return 1; // Error: invalid index
}
// Fill storage with neighbors for requested cutoff
double cutoff = cutoffs[neighbor_list_index];
// ... neighbor finding logic ...
*neighbors_ptr = storage;
return 0;
}
Index Conversions and Ghost Particles
KIM-API is a language agnostic implementation, so it works with both 0-based and 1-based implementations. To make it easier for you, you need to select the correct numbering when you initialize the model, but still three of the most common frustration of interfacing C/C++ with Julia/Fortran are i) row vs col major ordering, ii) off by one errors!
So you need to be consistent:
// If your simulator uses 1-based indexing internally
int get_neighbors_fortran_style(void* data_object, ...,
int particle_index, ...) {
// KIM might pass 0-based or 1-based depending on model
// Your callback should use YOUR numbering consistently
MySimulator* sim = (MySimulator*)data_object;
// If KIM is configured for 0-based but you use 1-based:
int my_index = particle_index + 1; // Convert if needed
// Get neighbors using your indexing
int count = sim->get_neighbor_count(my_index);
int* my_neighbors = sim->get_neighbor_list(my_index);
// Return in your numbering - KIM handles conversions
*number_of_neighbors = count;
*neighbors_ptr = my_neighbors;
return 0;
}
Common Pitfalls and Solutions
Pitfall 1: Returning Local Arrays
// Problem: Stack allocation
int neighbors[100];
*neighbors_ptr = neighbors; // Undefined behavior!
// Solution: Use persistent storage in data_object
Pitfall 2: Not Handling Multiple Cutoffs
// Problem: Assuming single cutoff
double global_cutoff = 5.0; // Ignoring cutoffs array!
// Solution: Use the provided cutoff
double cutoff = cutoffs[neighbor_list_index];
Phew! that was long, now for the remainder of easy stuff.
Step 6: Species Mapping
KIM-API uses integer codes for species. You need to map your element symbols to KIM codes:
C Function: KIM_Model_GetSpeciesSupportAndCode
int KIM_Model_GetSpeciesSupportAndCode(
void* model,
const char* species_name, // e.g., "Si", "Fe", "O"
int* species_code
);
Convert your species array to integer codes before computation.
Step 7: Computing Properties
With everything set up, let it compute the properties,
C Function: KIM_Model_Compute
int KIM_Model_Compute(
void* model,
void* compute_arguments
);
After this call, your output arrays (energy, forces) will be populated with computed values.
Step 8: Cleanup
Don't forget to clean up allocated resources:
int KIM_Model_ComputeArgumentsDestroy(
void* model,
void** compute_arguments
);
void KIM_Model_Destroy(
void** model
);
Complete Workflow Example
To revise:
1. Create model with desired units
2. Create compute arguments
3. Check what properties the model supports
4. Generate neighbor lists based on model's cutoff
5. Map species names to KIM codes
6. Set all required pointers:
- Number of particles
- Species codes
- Positions
- Contributing flags
- Output arrays (energy, forces)
7. Set neighbor list callback
8. Call compute
9. Read results from output arrays
10. Clean up resources
Minor points to remember
For periodic systems:
- Create ghost atoms for particles near boundaries
- Include ghosts in position and species arrays
- Set contributing flag: 1 for real atoms, 0 for ghosts
- Ensure neighbor lists include appropriate ghost atoms
Always check return codes:
- 0 indicates success
- Non-zero indicates an error
- Use KIM's logging functions for debugging
A Note on the Julia Implementation
For those interested in seeing these concepts in action, I've developed kim_api.jl, a Julia package that wraps KIM-API functionality. Julia's excellent C interoperability through ccall
makes it particularly straightforward to interface with KIM-API's C functions. The package demonstrates:
- Clean abstraction over C pointers and memory management
- Type-safe species mapping
- Automatic neighbor list generation
- High-level interface that hides complexity while maintaining performance
The Julia implementation serves as both a practical tool and a reference for understanding KIM-API integration. Its readable syntax and direct mapping to C functions make it an excellent learning resource for developers working in any language.
Hope this was fun read! By following the patterns outlined in this tutorial and adapting them to your specific language and architecture, you can add robust KIM-API support to any molecular simulation code.
Resources
- KIM-API Documentation
- OpenKIM Model Repository
- Example Implementations
- kim_api.jl Julia Package - Reference implementation demonstrating the concepts in this tutorial (still under development)
- KIMPY