By Amit Gupta — Aug 25, 2025

Through the looking glass: Implementing KIM-API for Your Molecular Dynamics Simulator

In for penny in for pound! After giving a bare-bones example on how to create a KIM-API potable model using C++, I thought why not go deeper and have a look at how KIM-API works on the simulator side. It might help clarifying several key design decision and ideas. Goal here is simple, initialize and call a KIM-API portable model for inference, and show the key steps involved.

I will use the C API, instead of C++ API for simulator side calls as, i) it is more universal, C functions can be used in almost every language cross-call, where C++ name mangling complicates it, ii) it is nearly the same in both languages, as you will see you can very easily guess the C++ API calls from the C names. Also here we will use a dummy simulator ( a simple C/C++ main function!) but the core idea still holds.

The Knowledgebase of Interatomic Models (KIM-API) is a powerful framework that provides a standardized interface between molecular simulation codes and interatomic potentials. If you're developing a molecular dynamics simulator or computational materials science tool, integrating KIM-API gives you access to hundreds of validated interatomic models without having to implement each one individually.

KIM-API provides the API in C, C++ and Fortran natively. It also provides a Python package KIMPY for interfacing with Python based simulators (mostly used in ASE calculators). I am currently trying to port it Julia as well for providing interface with Julia based MD simulators like Molly.jl (more on it later).

Core Concepts

Before diving into implementation, let's understand the key components:

Model: The interatomic potential implementation (e.g., Stillinger-Weber, EAM, MEAM)
ComputeArguments: Container for input/output data (positions, forces, energy)
Neighbor Lists: Efficient data structures for keeping tabs on particle interactions
Units System: Consistent unit handling across models and simulators

Lets walk through the steps on how to create the models and get them to spit energy and forces. Basic flow involves 3 major steps:

Initialize the model (calls the model_driver_create function we implemented in our portable model driver).
Creating the ComputeArguments, basically setting pointers to positions, species, etc, and memory locations to save the energy/forces or other computed properties like virials. These pointers are typically provided by the simulator.
Neighbor list set-up. I believe this is the most tricky and crucial step. As remaineder of the compute arguments are just read/write operations, neighbor lists are interactively used by the model driver (remember those GetNeighborList calls?). Therefore it is crucial to abstract away any arbitrary arguments in the neighbor list calls such that we can have a uniform API across all simulators, and languages. We will discuss it in more detail later.

Step 1: Model Initialization

The first step is creating and initializing a KIM model. This involves specifying the units system and model name. Supported units of Length, Energy, Charge, Temperature, and Time are strongly typed and enumerated in KIM-API to ensure error-proof model inference. Most model works with all units enumerated, by internal unit conversion mechanism (mostly by transforming the parameters); some models, like the ML ones, might raise an error when asked to initialize in units different from the ones they were trained in. This is because there is no cheap or easy way to transform the model parameters in ML models post training. You can check all the supported units in linked KIM-API docs pages.

C Function Name: `KIM_Model_Create` call

int KIM_Model_Create(
    int numbering,           // 0 for zero-based, 1 for one-based indexing
    int length_unit,         // e.g., KIM_LENGTH_UNIT_A (Angstrom)
    int energy_unit,         // e.g., KIM_ENERGY_UNIT_eV
    int charge_unit,         // e.g., KIM_CHARGE_UNIT_e
    int temperature_unit,    // e.g., KIM_TEMPERATURE_UNIT_K
    int time_unit,          // e.g., KIM_TIME_UNIT_ps
    const char* model_name,  // e.g., "SW_StillingerWeber_1985_Si__MO_405512056662_006"
    int* units_accepted,     // Output: 1 if units accepted, 0 otherwise
    void** model_ptr        // Output: pointer to created model
);

This function creates a model instance with your specified units. The model will either accept your units or request a conversion. Always check units_accepted to ensure compatibility.

Step 2: Creating Compute Arguments

Once the model is initialized, create a compute arguments object to hold calculation data. Initializing it is simple:

C Function Name: `KIM_Model_ComputeArgumentsCreate`

int KIM_Model_ComputeArgumentsCreate(
    void* model,
    void** compute_arguments
);

This creates a container that will hold all the data needed for calculations - particle positions, species, and output quantities like energy and forces. Before assigning those pointers, we need to check what all arguments are supported. For example, some models might be able to compute total energy, but might not compute per particle energy decomposition. Hence they will return notSupported enum value for the Support Status.

Step 3: Checking Model Capabilities

Before setting up calculations, verify what the model supports:

C Function: `KIM_ComputeArguments_GetArgumentSupportStatus`

void KIM_ComputeArguments_GetArgumentSupportStatus(
    void* compute_arguments,
    int argument_name,    // e.g., KIM_COMPUTE_ARGUMENT_NAME_partialEnergy
    int* support_status   // Output: KIM_SUPPORT_STATUS_required/optional/notSupported
);

Common argument names include:

KIM_COMPUTE_ARGUMENT_NAME_partialEnergy - Total energy
KIM_COMPUTE_ARGUMENT_NAME_partialForces - Forces on particles
KIM_COMPUTE_ARGUMENT_NAME_partialParticleEnergy - Per-particle energies
KIM_COMPUTE_ARGUMENT_NAME_partialVirial - Virial stress tensor

Check here for the complete list.

Step 4: Setting Data Pointers

How does your model know what is asked to be computed? Simple, it checks for valid pointers in the ComputeArgument containers, and presume that for any argument, if there exist a valid pointer, that property has to be computed. Once ensuring that the model indeed supports the property you want the model to compute, you set the pointers.

C Functions for Setting Pointers

// For integer data (particle count, species codes, contributing flags)
int KIM_ComputeArguments_SetArgumentPointerInteger(
    void* compute_arguments,
    int argument_name,
    int* ptr
);

// For floating-point data (coordinates, forces, energy)
int KIM_ComputeArguments_SetArgumentPointerDouble(
    void* compute_arguments,
    int argument_name,
    double* ptr
);

Essential pointers to set:

numberOfParticles - Total particle count (including ghost atoms for PBC)
particleSpeciesCodes - Integer codes for each particle's element
coordinates - 3N array of particle positions
particleContributing - 1 for real atoms, 0 for ghost atoms
partialEnergy - Pointer to store computed energy
partialForces - 3N array to store computed forces

If you are familiar with the LAMMPS Pair style implementation, some of these pointers (for LAMMPS) are

KIM Argument	LAMMPS Ptr
`coordinates`	`atom->x`
`partialEnergy`	`eng_vdwl`
`partialForces`	`atom->f`

Similarly, for other simulators you need to find the equivalent pointers to set.

Step 5: Neighbor List Setup

The crown jewel of the KIM-API design! Let us understand the challenge first.
Each simulator comes with a high-performance neighbor list calculator. They need for different tasks like domain decomposition, fixes, constraints etc. But the model driver/model also needs the neighbor lists for computing pairwise, or angle terms etc. One solution for this could be to bundle KIM-API with its own neighbor lists (as KIMPY does for ASE), but that would mean computing the neighbor lists twice for every call. It would be really time consuming and wasteful. So KIM-API uses very clever way to reuse the simulator neighbor lists.

On the simulator side, first you need to ask for the cutoff radius or influence distance, you can get it with the following function

Getting Neighbor List Requirements

void KIM_Model_GetNeighborListPointers(
    void* model,
    int* number_of_neighbor_lists,
    double** cutoffs,
    int** will_not_request_neighbors_of_ghost_particles
);

As you can see you will get three values out of this call

number_of_neighbor_lists, some model use multiple lists for computation, e.g. EAM
cutoffs for each neighbor lists
will the non contributing particles will also need neighbor list (e.g. used in staged graph convolutions)

Using these values you can ask your simulator to compute the neighbor lists however it seems fit. Now information form your simulator is passed to KIM-API via a callback function. You will register your callback function to KIM-API as,

Setting the Neighbor Callback

int KIM_ComputeArguments_SetCallbackPointer(
    void* compute_arguments,
    int callback_name,        // KIM_COMPUTE_CALLBACK_NAME_GetNeighborList
    int language_name,        // KIM_LANGUAGE_NAME_c
    void* func_ptr,          // Your callback function, here: get_neighbors_callback
    void* data_object        // Data to pass to callback
);

And now the callback function itself. The callback function should always the the following signature (it is a must). For C++ classes, you need to declare it as static (those pesky this).

data_object: This is the key to making static callbacks work with object-oriented code. It's an opaque pointer that KIM-API passes back to you unchanged. Common uses:

C++: Pointer to your simulator class instance
C: Pointer to a struct containing neighbor list data
Python/Julia: Pointer to a wrapper object containing closures
Fortran: Pointer to a derived type

data_object basically acts like a Trojan-horse of simulator context, hidden into the KIM-API. It passes the neighbor-whatever object from your simulator to the KIM-API. Now whenever KIM-API need the neighbor list it will pass this object back to the callback function ( here we have named it get_neighbors_callback), and this function can use the data_object as the simulator "context". It can cast it back, ask it for the neighbor list, and pass that information back to the KIM-API model. No need of extra computation, and it works for all simulators/ languages, and what not.

Lets have a closer look at the neighbor function,

Neighbor Callback Function Signature

int get_neighbors_callback(
    void* data_object,           // Your context data
    int number_of_neighbor_lists, // Total number of lists (for validation)
    double* cutoffs,             // Array of cutoff distances
    int neighbor_list_index,     // Which neighbor list
    int particle_index,          // Which particle (simulator's indexing)
    int* number_of_neighbors,    // OUTPUT: How many neighbors found
    int** neighbors_ptr          // OUTPUT: Pointer to neighbor indices array
);

Breaking down each parameter:

data_object: The context object making a round trip back!

neighbor_list_index: Models can request multiple neighbor lists with different cutoffs. For example, a Tersoff potential might need:

List 0: First neighbor shell (2.7 Å)
List 1: Second neighbor shell (3.2 Å)

Return expectations:

Return 0 for success, non-zero for error
The neighbor array must remain valid until the next callback or compute completes
Indices should match your simulator's numbering (KIM handles conversions)

Implementation Examples

Here is a very dummy examples to hammer the point home of how neighbor list callback works with all the different kind of neighbor lists and simulator strategies

Strategy 1: Pre-computed Lists (Simple but Memory Intensive, e.g. LAMMPS)

typedef struct {
    int** all_neighbors;      // neighbors[particle][neighbor_idx]
    int* num_neighbors;       // count for each particle
    double cutoff;
} PrecomputedList;

typedef struct {
    PrecomputedList* lists;   // Array of lists for different cutoffs
    int num_lists;
} SimulatorData;

int get_neighbors_precomputed(void* data_object, ..., int neighbor_list_index,
                             int particle_index, int* number_of_neighbors,
                             int** neighbors_ptr) {
    SimulatorData* sim = (SimulatorData*)data_object;
    PrecomputedList* list = &sim->lists[neighbor_list_index];
    
    *number_of_neighbors = list->num_neighbors[particle_index];
    *neighbors_ptr = list->all_neighbors[particle_index];
    return 0;
}

Strategy 2: On-Demand Computation (Memory Efficient, e.g. ASE, KIMPY)

typedef struct {
    double* positions;        // Particle positions
    int num_particles;
    double* box;             // Simulation box
    int* temp_neighbors;     // Reusable buffer
    CellList* cell_list;     // Spatial data structure
} SimulatorData;

int get_neighbors_on_demand(void* data_object, ..., int neighbor_list_index,
                           int particle_index, int* number_of_neighbors,
                           int** neighbors_ptr) {
    SimulatorData* sim = (SimulatorData*)data_object;
    double cutoff = cutoffs[neighbor_list_index];
    
    // Use cell list to find neighbors efficiently
    int count = 0;
    double* pos_i = &sim->positions[3 * particle_index];
    
    // Iterate through nearby cells
    CellIterator iter = cell_list_get_iterator(sim->cell_list, pos_i, cutoff);
    while (cell_iterator_has_next(&iter)) {
        int j = cell_iterator_next(&iter);
        double* pos_j = &sim->positions[3 * j];
        
        double dist_sq = distance_squared(pos_i, pos_j, sim->box);
        if (dist_sq <= cutoff * cutoff) {
            sim->temp_neighbors[count++] = j;
        }
    }
    
    *number_of_neighbors = count;
    *neighbors_ptr = sim->temp_neighbors;
    return 0;
}

Strategy 3: Closure-Based (Functional Languages, as it is implemented in Julia right now)

For languages with first-class functions, you can use closures to capture the neighbor list logic:

# Julia example
struct NeighborClosure
    get_neighbors::Function
    storage::Vector{Vector{Int32}}  # Persistent storage
end

function create_neighbor_closure(positions, box, pbc)
    # Create spatial data structure
    cell_list = CellList(positions, box, max_cutoff)
    
    function get_neighbors(particle_idx, cutoff)
        neighbors = Int32[]
        # Find neighbors using cell_list
        for j in nearby_particles(cell_list, particle_idx, cutoff)
            if distance(positions[particle_idx], positions[j]) <= cutoff
                push!(neighbors, j)
            end
        end
        return neighbors
    end
    
    return get_neighbors
end

Critical Implementation Details

Memory Persistence

The Golden Rule: Neighbor data must remain valid after your callback returns!

// WRONG - Array on stack will be invalid after return
int get_neighbors_bad(void* data_object, ...) {
    int local_neighbors[1000];  // Stack allocated!
    // ... fill local_neighbors ...
    *neighbors_ptr = local_neighbors;  // BUG: Dangling pointer!
    return 0;
}

// CORRECT - Use persistent storage
typedef struct {
    int* neighbor_buffer;  // Heap allocated
    size_t buffer_size;
} SimData;

int get_neighbors_good(void* data_object, ...) {
    SimData* sim = (SimData*)data_object;
    // ... fill sim->neighbor_buffer ...
    *neighbors_ptr = sim->neighbor_buffer;  // Safe: buffer persists
    return 0;
}

Handling Multiple Neighbor Lists

Models may need different cutoffs for different interaction ranges:

typedef struct {
    int* storage_list0;
    int* storage_list1;
    int* storage_list2;
    size_t max_neighbors;
} MultiListData;

int get_neighbors_multi(void* data_object, ..., int neighbor_list_index, ...) {
    MultiListData* data = (MultiListData*)data_object;
    int* storage;
    
    // Select appropriate storage for this list
    switch (neighbor_list_index) {
        case 0: storage = data->storage_list0; break;
        case 1: storage = data->storage_list1; break;
        case 2: storage = data->storage_list2; break;
        default: return 1;  // Error: invalid index
    }
    
    // Fill storage with neighbors for requested cutoff
    double cutoff = cutoffs[neighbor_list_index];
    // ... neighbor finding logic ...
    
    *neighbors_ptr = storage;
    return 0;
}

Index Conversions and Ghost Particles

KIM-API is a language agnostic implementation, so it works with both 0-based and 1-based implementations. To make it easier for you, you need to select the correct numbering when you initialize the model, but still three of the most common frustration of interfacing C/C++ with Julia/Fortran are i) row vs col major ordering, ii) off by one errors!

So you need to be consistent:

// If your simulator uses 1-based indexing internally
int get_neighbors_fortran_style(void* data_object, ..., 
                               int particle_index, ...) {
    // KIM might pass 0-based or 1-based depending on model
    // Your callback should use YOUR numbering consistently
    
    MySimulator* sim = (MySimulator*)data_object;
    // If KIM is configured for 0-based but you use 1-based:
    int my_index = particle_index + 1;  // Convert if needed
    
    // Get neighbors using your indexing
    int count = sim->get_neighbor_count(my_index);
    int* my_neighbors = sim->get_neighbor_list(my_index);
    
    // Return in your numbering - KIM handles conversions
    *number_of_neighbors = count;
    *neighbors_ptr = my_neighbors;
    return 0;
}

Common Pitfalls and Solutions

Pitfall 1: Returning Local Arrays

// Problem: Stack allocation
int neighbors[100];
*neighbors_ptr = neighbors;  // Undefined behavior!

// Solution: Use persistent storage in data_object

Pitfall 2: Not Handling Multiple Cutoffs

// Problem: Assuming single cutoff
double global_cutoff = 5.0;  // Ignoring cutoffs array!

// Solution: Use the provided cutoff
double cutoff = cutoffs[neighbor_list_index];

Phew! that was long, now for the remainder of easy stuff.

Step 6: Species Mapping

KIM-API uses integer codes for species. You need to map your element symbols to KIM codes:

C Function: `KIM_Model_GetSpeciesSupportAndCode`

int KIM_Model_GetSpeciesSupportAndCode(
    void* model,
    const char* species_name,  // e.g., "Si", "Fe", "O"
    int* species_code
);

Convert your species array to integer codes before computation.

Step 7: Computing Properties

With everything set up, let it compute the properties,

C Function: `KIM_Model_Compute`

int KIM_Model_Compute(
    void* model,
    void* compute_arguments
);

After this call, your output arrays (energy, forces) will be populated with computed values.

Step 8: Cleanup

Don't forget to clean up allocated resources:

int KIM_Model_ComputeArgumentsDestroy(
    void* model,
    void** compute_arguments
);

void KIM_Model_Destroy(
    void** model
);

Complete Workflow Example

To revise:

1. Create model with desired units
2. Create compute arguments
3. Check what properties the model supports
4. Generate neighbor lists based on model's cutoff
5. Map species names to KIM codes
6. Set all required pointers:
   - Number of particles
   - Species codes
   - Positions
   - Contributing flags
   - Output arrays (energy, forces)
7. Set neighbor list callback
8. Call compute
9. Read results from output arrays
10. Clean up resources

Minor points to remember

For periodic systems:

Create ghost atoms for particles near boundaries
Include ghosts in position and species arrays
Set contributing flag: 1 for real atoms, 0 for ghosts
Ensure neighbor lists include appropriate ghost atoms

Always check return codes:

0 indicates success
Non-zero indicates an error
Use KIM's logging functions for debugging

A Note on the Julia Implementation

For those interested in seeing these concepts in action, I've developed kim_api.jl, a Julia package that wraps KIM-API functionality. Julia's excellent C interoperability through ccall makes it particularly straightforward to interface with KIM-API's C functions. The package demonstrates:

Clean abstraction over C pointers and memory management
Type-safe species mapping
Automatic neighbor list generation
High-level interface that hides complexity while maintaining performance

The Julia implementation serves as both a practical tool and a reference for understanding KIM-API integration. Its readable syntax and direct mapping to C functions make it an excellent learning resource for developers working in any language.

Hope this was fun read! By following the patterns outlined in this tutorial and adapting them to your specific language and architecture, you can add robust KIM-API support to any molecular simulation code.

Resources

KIM-API Documentation
OpenKIM Model Repository
Example Implementations
kim_api.jl Julia Package - Reference implementation demonstrating the concepts in this tutorial (still under development)
KIMPY