Poorly Written Notes

Through the looking glass: Implementing KIM-API for Your Molecular Dynamics Simulator

Amit Gupta — Mon, 25 Aug 2025 14:38:13 GMT

In for a penny, in for a pound! After giving a bare-bones example of how to create a KIM-API portable model using C++, I thought, why not go deeper and look at how KIM-API works on the simulator side? It might help clarify several key design decisions and ideas. The goal here is simple: initialize and call a KIM-API portable model for inference, and show the key steps involved.

I will use the C API instead of the C++ API for the simulator-side calls because i) it is more universal: C functions can be called from almost every language, whereas C++ name mangling complicates cross-language calls, and ii) it is nearly the same in both languages; as you will see, you can easily guess the C++ API calls from the C names. Also, here we will use a dummy simulator (a simple C/C++ main function!), but the core idea still holds.

The Knowledgebase of Interatomic Models (KIM) API, or KIM-API, is a powerful framework that provides a standardized interface between molecular simulation codes and interatomic potentials. If you're developing a molecular dynamics simulator or computational materials science tool, integrating KIM-API gives you access to hundreds of validated interatomic models without having to implement each one individually.

KIM-API provides the API in C, C++ and Fortran natively. It also provides a Python package, KIMPY, for interfacing with Python-based simulators (mostly used in ASE calculators). I am currently trying to port it to Julia as well, to provide an interface for Julia-based MD simulators like Molly.jl (more on that later).

Core Concepts

Before diving into implementation, let's understand the key components:

Model: The interatomic potential implementation (e.g., Stillinger-Weber, EAM, MEAM)
ComputeArguments: Container for input/output data (positions, forces, energy)
Neighbor Lists: Efficient data structures for keeping tabs on particle interactions
Units System: Consistent unit handling across models and simulators

Let's walk through the steps needed to create a model and get it to spit out energy and forces. The basic flow involves three major steps:

Initialize the model (this calls the model_driver_create function we implemented in our portable model driver).
Create the ComputeArguments: basically set pointers to positions, species, etc., and to the memory locations where the energy, forces, or other computed properties like virials will be saved. These pointers are typically provided by the simulator.
Set up the neighbor list. I believe this is the trickiest and most crucial step. While the remainder of the compute arguments are just read/write operations, neighbor lists are used interactively by the model driver (remember those GetNeighborList calls?). It is therefore crucial to abstract away any simulator-specific arguments in the neighbor list calls so that we can have a uniform API across all simulators and languages. We will discuss this in more detail later.

Step 1: Model Initialization

The first step is creating and initializing a KIM model. This involves specifying the units system and the model name. The supported units of length, energy, charge, temperature, and time are strongly typed and enumerated in KIM-API to ensure error-proof model inference. Most models work with all enumerated units through an internal unit conversion mechanism (mostly by transforming the parameters); some models, like the ML ones, might raise an error when asked to initialize in units different from the ones they were trained in, because there is no cheap or easy way to transform the parameters of an ML model after training. You can check all the supported units in the linked KIM-API docs pages.

C Function Name: `KIM_Model_Create` call

int KIM_Model_Create(
    int numbering,           // 0 for zero-based, 1 for one-based indexing
    int length_unit,         // e.g., KIM_LENGTH_UNIT_A (Angstrom)
    int energy_unit,         // e.g., KIM_ENERGY_UNIT_eV
    int charge_unit,         // e.g., KIM_CHARGE_UNIT_e
    int temperature_unit,    // e.g., KIM_TEMPERATURE_UNIT_K
    int time_unit,          // e.g., KIM_TIME_UNIT_ps
    const char* model_name,  // e.g., "SW_StillingerWeber_1985_Si__MO_405512056662_006"
    int* units_accepted,     // Output: 1 if units accepted, 0 otherwise
    void** model_ptr        // Output: pointer to created model
);

This function creates a model instance with your specified units. The model will either accept your units or request a conversion. Always check units_accepted to ensure compatibility.

Step 2: Creating Compute Arguments

Once the model is initialized, create a compute arguments object to hold calculation data. Initializing it is simple:

C Function Name: `KIM_Model_ComputeArgumentsCreate`

int KIM_Model_ComputeArgumentsCreate(
    void* model,
    void** compute_arguments
);

This creates a container that will hold all the data needed for calculations: particle positions, species, and output quantities like energy and forces. Before assigning those pointers, we need to check which arguments are supported. For example, some models can compute the total energy but not the per-particle energy decomposition, so they will return the notSupported enum value as the support status for that argument.

Step 3: Checking Model Capabilities

Before setting up calculations, verify what the model supports:

C Function: `KIM_ComputeArguments_GetArgumentSupportStatus`

int KIM_ComputeArguments_GetArgumentSupportStatus(
    void* compute_arguments,
    int argument_name,    // e.g., KIM_COMPUTE_ARGUMENT_NAME_partialEnergy
    int* support_status   // Output: KIM_SUPPORT_STATUS_required/optional/notSupported
);

Common argument names include:

KIM_COMPUTE_ARGUMENT_NAME_partialEnergy - Total energy
KIM_COMPUTE_ARGUMENT_NAME_partialForces - Forces on particles
KIM_COMPUTE_ARGUMENT_NAME_partialParticleEnergy - Per-particle energies
KIM_COMPUTE_ARGUMENT_NAME_partialVirial - Virial stress tensor

Check here for the complete list.

Step 4: Setting Data Pointers

How does your model know what it is asked to compute? Simple: it checks for valid pointers in the ComputeArguments container and presumes that if a valid pointer exists for an argument, that property has to be computed. Once you have ensured that the model indeed supports the property you want it to compute, you set the pointers.

C Functions for Setting Pointers

// For integer data (particle count, species codes, contributing flags)
int KIM_ComputeArguments_SetArgumentPointerInteger(
    void* compute_arguments,
    int argument_name,
    int* ptr
);

// For floating-point data (coordinates, forces, energy)
int KIM_ComputeArguments_SetArgumentPointerDouble(
    void* compute_arguments,
    int argument_name,
    double* ptr
);

Essential pointers to set:

numberOfParticles - Total particle count (including ghost atoms for PBC)
particleSpeciesCodes - Integer codes for each particle's element
coordinates - 3N array of particle positions
particleContributing - 1 for real atoms, 0 for ghost atoms
partialEnergy - Pointer to store computed energy
partialForces - 3N array to store computed forces

If you are familiar with the LAMMPS Pair style implementation, some of these pointers (for LAMMPS) are

KIM Argument	LAMMPS Ptr
`coordinates`	`atom->x`
`partialEnergy`	`eng_vdwl`
`partialForces`	`atom->f`

Similarly, for other simulators you need to find the equivalent pointers to set.
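
To make the pointer convention concrete, here is a minimal, self-contained sketch that does not call KIM-API at all. `ToyArguments` and `toy_compute` are hypothetical stand-ins for the ComputeArguments container and a model's compute routine, and the harmonic "potential" is invented purely for illustration. The only points it tries to show are that a null output pointer means "not requested" and that coordinates and forces are flat 3N arrays owned by the simulator.

```cpp
#include <cassert>

// Hypothetical stand-in for the ComputeArguments container: it only stores
// pointers into memory owned by the simulator.
struct ToyArguments {
    const int*    numberOfParticles    = nullptr;
    const int*    particleContributing = nullptr;  // 1 = real atom, 0 = ghost
    const double* coordinates          = nullptr;  // flat array: x0,y0,z0,x1,...
    double*       partialEnergy        = nullptr;  // optional output
    double*       partialForces        = nullptr;  // optional output, 3N values
};

// Hypothetical "model": a harmonic spring pulling every contributing particle
// toward the origin, E = 0.5 * k * |r|^2. An output is written only when the
// simulator supplied a pointer for it.
int toy_compute(const ToyArguments& args, double k) {
    if (!args.numberOfParticles || !args.coordinates || !args.particleContributing)
        return 1;  // required inputs are missing
    const int n = *args.numberOfParticles;
    assert(n >= 0);
    if (args.partialEnergy) *args.partialEnergy = 0.0;
    for (int i = 0; i < n; ++i) {
        const bool real = (args.particleContributing[i] == 1);
        double r2 = 0.0;
        for (int d = 0; d < 3; ++d) {
            const double x = args.coordinates[3 * i + d];
            r2 += x * x;
            if (args.partialForces)
                args.partialForces[3 * i + d] = real ? -k * x : 0.0;
        }
        if (args.partialEnergy && real) *args.partialEnergy += 0.5 * k * r2;
    }
    return 0;
}
```

Leaving `partialForces` unset in this sketch simply skips the force calculation, which is the same contract the real pointers follow.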

Step 5: Neighbor List Setup

The crown jewel of the KIM-API design! Let us understand the challenge first.
Each simulator comes with a high-performance neighbor list calculator, which it needs for different tasks like domain decomposition, fixes, constraints, etc. But the model driver/model also needs the neighbor lists for computing pairwise terms, angle terms, and so on. One solution could be to bundle KIM-API with its own neighbor list code (as KIMPY does for ASE), but that would mean computing the neighbor lists twice for every call, which would be time-consuming and wasteful. So KIM-API uses a very clever way to reuse the simulator's neighbor lists.

On the simulator side, you first need to ask for the cutoff radius or influence distance. You can get it with the following function:

Getting Neighbor List Requirements

void KIM_Model_GetNeighborListPointers(
    void* model,
    int* number_of_neighbor_lists,
    double** cutoffs,
    int** will_not_request_neighbors_of_ghost_particles
);

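As you can see, you get three values out of this call:

number_of_neighbor_lists: some models use multiple lists for computation, e.g. EAM
cutoffs: the cutoff distance for each neighbor list
will_not_request_neighbors_of_ghost_particles: one flag per list telling you whether the model will skip asking for the neighbors of non-contributing (ghost) particles. If the flag is 0, the ghost particles need neighbor lists too (e.g. for staged graph convolutions)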
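
As a rough illustration of what a simulator might do with these cutoffs, the sketch below builds one full neighbor list per cutoff by brute force. It is a self-contained toy rather than KIM-API code: `ToyNeighborList` and `build_neighbor_lists` are hypothetical names, the O(N^2) search is only sensible for tiny systems, and ghost atoms are assumed to be present in the coordinate array already.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One full neighbor list: neighbors[i] holds the indices of all particles
// within `cutoff` of particle i (zero-based, excluding i itself).
struct ToyNeighborList {
    double cutoff;
    std::vector<std::vector<int>> neighbors;
};

// Brute-force O(N^2) construction, one list per cutoff requested by the model.
// `coords` is a flat 3N array; no periodic wrapping is done here because
// ghost atoms are assumed to be included already.
std::vector<ToyNeighborList> build_neighbor_lists(const std::vector<double>& coords,
                                                  const double* cutoffs,
                                                  int number_of_neighbor_lists) {
    assert(coords.size() % 3 == 0);
    const int n = static_cast<int>(coords.size() / 3);
    std::vector<ToyNeighborList> lists;
    for (int l = 0; l < number_of_neighbor_lists; ++l) {
        ToyNeighborList list{cutoffs[l], std::vector<std::vector<int>>(n)};
        const double rc2 = cutoffs[l] * cutoffs[l];
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                if (i == j) continue;  // a particle is not its own neighbor
                double r2 = 0.0;
                for (int d = 0; d < 3; ++d) {
                    const double dx = coords[3 * i + d] - coords[3 * j + d];
                    r2 += dx * dx;
                }
                if (r2 <= rc2) list.neighbors[i].push_back(j);
            }
        }
        lists.push_back(std::move(list));
    }
    return lists;
}
```

A production simulator would replace the double loop with its own cell or Verlet lists; the shape of the result (one index array per particle per cutoff) is what matters for the callback discussed next.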

Using these values, you can ask your simulator to compute the neighbor lists however it sees fit. The information from your simulator is then passed to KIM-API via a callback function, which you register with KIM-API as follows:

Setting the Neighbor Callback

int KIM_ComputeArguments_SetCallbackPointer(
    void* compute_arguments,
    int callback_name,        // KIM_COMPUTE_CALLBACK_NAME_GetNeighborList
    int language_name,        // KIM_LANGUAGE_NAME_c
    void* func_ptr,          // Your callback function, here: get_neighbors_callback
    void* data_object        // Data to pass to callback
);

And now the callback function itself. The callback function must always have the signature shown below. For C++ classes, you need to declare it as static (that pesky this pointer).

data_object: This is the key to making static callbacks work with object-oriented code. It's an opaque pointer that KIM-API passes back to you unchanged. Common uses:

C++: Pointer to your simulator class instance
C: Pointer to a struct containing neighbor list data
Python/Julia: Pointer to a wrapper object containing closures
Fortran: Pointer to a derived type

data_object basically acts as a Trojan horse carrying the simulator context, hidden inside KIM-API. It carries the neighbor-whatever object from your simulator into KIM-API. Whenever KIM-API needs a neighbor list, it passes this object back to the callback function (here we have named it get_neighbors_callback), and the function can use data_object as the simulator "context": it can cast it back, ask it for the neighbor list, and pass that information on to the KIM-API model. No extra computation is needed, and it works for all simulators, languages, and whatnot.
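
The round trip can be sketched without KIM-API itself. In the toy below, `ToyKim` is a hypothetical stand-in for the library side (it just stores a function pointer and a `void*` and hands the pointer back on every call), and `ToySimulator` is a hypothetical simulator class. The callback uses the parameter list described in this post, with `const` added to the read-only arguments as an assumption.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using NeighborCallback = int (*)(void* data_object, int number_of_neighbor_lists,
                                 const double* cutoffs, int neighbor_list_index,
                                 int particle_index, int* number_of_neighbors,
                                 const int** neighbors_ptr);

// Hypothetical stand-in for the KIM-API side: it knows nothing about the
// simulator except a function pointer and an opaque context pointer.
struct ToyKim {
    NeighborCallback callback = nullptr;
    void* data_object = nullptr;

    // Asks the simulator for the neighbors of one particle; -1 on error.
    int count_neighbors(int particle_index) const {
        assert(callback != nullptr);
        const double cutoffs[1] = {3.0};
        int n = 0;
        const int* neighbors = nullptr;
        const int err = callback(data_object, 1, cutoffs, 0, particle_index, &n, &neighbors);
        return err ? -1 : n;
    }
};

// Hypothetical simulator that owns its (single) neighbor list.
class ToySimulator {
public:
    explicit ToySimulator(std::vector<std::vector<int>> lists) : lists_(std::move(lists)) {}

    // Static: no implicit `this`, so it converts to a plain function pointer.
    static int get_neighbors_callback(void* data_object, int, const double*,
                                      int neighbor_list_index, int particle_index,
                                      int* number_of_neighbors, const int** neighbors_ptr) {
        ToySimulator* self = static_cast<ToySimulator*>(data_object);  // context comes back
        if (neighbor_list_index != 0) return 1;
        if (particle_index < 0 || particle_index >= static_cast<int>(self->lists_.size()))
            return 1;
        const std::vector<int>& list = self->lists_[particle_index];
        *number_of_neighbors = static_cast<int>(list.size());
        *neighbors_ptr = list.data();  // owned by the simulator, outlives the call
        return 0;
    }

private:
    std::vector<std::vector<int>> lists_;
};
```

Registering amounts to storing `&ToySimulator::get_neighbors_callback` and the address of a `ToySimulator` instance; the library side never needs to know the simulator's type.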

Let's have a closer look at the neighbor callback function.

Neighbor Callback Function Signature

int get_neighbors_callback(
    void* data_object,           // Your context data
    int number_of_neighbor_lists, // Total number of lists (for validation)
    double* cutoffs,             // Array of cutoff distances
    int neighbor_list_index,     // Which neighbor list
    int particle_index,          // Which particle (simulator's indexing)
    int* number_of_neighbors,    // OUTPUT: How many neighbors found
    int** neighbors_ptr          // OUTPUT: Pointer to neighbor indices array
);

Breaking down each parameter:

data_object: The context object making a round trip back!

neighbor_list_index: Models can request multiple neighbor lists with different cutoffs. For example, a Tersoff potential might need:

List 0: First neighbor shell (2.7 Å)
List 1: Second neighbor shell (3.2 Å)

Return expectations:

Return 0 for success, non-zero for error
The neighbor array must remain valid until the next callback or compute completes
Indices should match your simulator's numbering (KIM handles conversions)

Implementation Examples

Here are some very simple dummy examples to hammer home how the neighbor list callback works with the different kinds of neighbor lists and simulator strategies.

Strategy 1: Pre-computed Lists (Simple but Memory Intensive, e.g. LAMMPS)

typedef struct {
    int** all_neighbors;      // neighbors[particle][neighbor_idx]
    int* num_neighbors;       // count for each particle
    double cutoff;
} PrecomputedList;

typedef struct {
    PrecomputedList* lists;   // Array of lists for different cutoffs
    int num_lists;
} SimulatorData;

int get_neighbors_precomputed(void* data_object, int number_of_neighbor_lists,
                             double* cutoffs, int neighbor_list_index,
                             int particle_index, int* number_of_neighbors,
                             int** neighbors_ptr) {
    SimulatorData* sim = (SimulatorData*)data_object;
    PrecomputedList* list = &sim->lists[neighbor_list_index];
    
    *number_of_neighbors = list->num_neighbors[particle_index];
    *neighbors_ptr = list->all_neighbors[particle_index];
    return 0;
}

Strategy 2: On-Demand Computation (Memory Efficient, e.g. ASE, KIMPY)

// CellList, CellIterator, and distance_squared stand in for your
// simulator's own spatial data structure and distance routine
typedef struct {
    double* positions;        // Particle positions
    int num_particles;
    double* box;             // Simulation box
    int* temp_neighbors;     // Reusable buffer
    CellList* cell_list;     // Spatial data structure
} SimulatorData;

int get_neighbors_on_demand(void* data_object, int number_of_neighbor_lists,
                           double* cutoffs, int neighbor_list_index,
                           int particle_index, int* number_of_neighbors,
                           int** neighbors_ptr) {
    SimulatorData* sim = (SimulatorData*)data_object;
    double cutoff = cutoffs[neighbor_list_index];
    
    // Use cell list to find neighbors efficiently
    int count = 0;
    double* pos_i = &sim->positions[3 * particle_index];
    
    // Iterate through nearby cells
    CellIterator iter = cell_list_get_iterator(sim->cell_list, pos_i, cutoff);
    while (cell_iterator_has_next(&iter)) {
        int j = cell_iterator_next(&iter);
        double* pos_j = &sim->positions[3 * j];
        
        double dist_sq = distance_squared(pos_i, pos_j, sim->box);
        if (j != particle_index && dist_sq <= cutoff * cutoff) {  // skip the particle itself
            sim->temp_neighbors[count++] = j;
        }
    }
    
    *number_of_neighbors = count;
    *neighbors_ptr = sim->temp_neighbors;
    return 0;
}

Strategy 3: Closure-Based (Functional Languages, as it is implemented in Julia right now)

For languages with first-class functions, you can use closures to capture the neighbor list logic:

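# Julia example (CellList, nearby_particles, and distance are placeholders)
struct NeighborClosure
    get_neighbors::Function
    storage::Vector{Vector{Int32}}  # Persistent storage, one buffer per particle
end

function create_neighbor_closure(positions, box, pbc, max_cutoff)
    # Create spatial data structure
    cell_list = CellList(positions, box, max_cutoff)
    storage = [Int32[] for _ in eachindex(positions)]
    
    function get_neighbors(particle_idx, cutoff)
        # Reuse the persistent buffer so the data outlives this call
        neighbors = storage[particle_idx]
        empty!(neighbors)
        # Find neighbors using cell_list
        for j in nearby_particles(cell_list, particle_idx, cutoff)
            j == particle_idx && continue
            if distance(positions[particle_idx], positions[j]) <= cutoff
                push!(neighbors, j)
            end
        end
        return neighbors
    end
    
    return NeighborClosure(get_neighbors, storage)
end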

Critical Implementation Details

Memory Persistence

The Golden Rule: Neighbor data must remain valid after your callback returns!

// WRONG - Array on stack will be invalid after return
int get_neighbors_bad(void* data_object, int number_of_neighbor_lists,
                      double* cutoffs, int neighbor_list_index,
                      int particle_index, int* number_of_neighbors,
                      int** neighbors_ptr) {
    int local_neighbors[1000];  // Stack allocated!
    // ... fill local_neighbors ...
    *neighbors_ptr = local_neighbors;  // BUG: Dangling pointer!
    return 0;
}

// CORRECT - Use persistent storage
typedef struct {
    int* neighbor_buffer;  // Heap allocated
    size_t buffer_size;
} SimData;

int get_neighbors_good(void* data_object, int number_of_neighbor_lists,
                       double* cutoffs, int neighbor_list_index,
                       int particle_index, int* number_of_neighbors,
                       int** neighbors_ptr) {
    SimData* sim = (SimData*)data_object;
    // ... fill sim->neighbor_buffer ...
    *neighbors_ptr = sim->neighbor_buffer;  // Safe: buffer persists
    return 0;
}

Handling Multiple Neighbor Lists

Models may need different cutoffs for different interaction ranges:

typedef struct {
    int* storage_list0;
    int* storage_list1;
    int* storage_list2;
    size_t max_neighbors;
} MultiListData;

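int get_neighbors_multi(void* data_object, int number_of_neighbor_lists,
                        double* cutoffs, int neighbor_list_index,
                        int particle_index, int* number_of_neighbors,
                        int** neighbors_ptr) {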
    MultiListData* data = (MultiListData*)data_object;
    int* storage;
    
    // Select appropriate storage for this list
    switch (neighbor_list_index) {
        case 0: storage = data->storage_list0; break;
        case 1: storage = data->storage_list1; break;
        case 2: storage = data->storage_list2; break;
        default: return 1;  // Error: invalid index
    }
    
    // Fill storage with neighbors for requested cutoff
    double cutoff = cutoffs[neighbor_list_index];
    // ... neighbor finding logic ...
    
    *neighbors_ptr = storage;
    return 0;
}

Index Conversions and Ghost Particles

KIM-API is language agnostic, so it works with both 0-based and 1-based indexing. To make this easier, you select the numbering your simulator uses when you initialize the model. Still, two of the most common frustrations of interfacing C/C++ with Julia/Fortran are i) row- vs. column-major ordering and ii) off-by-one errors!

So you need to be consistent:

// If your simulator uses 1-based indexing internally
// (MySimulator and its accessors are placeholders for your own code)
int get_neighbors_fortran_style(void* data_object, int number_of_neighbor_lists,
                               double* cutoffs, int neighbor_list_index,
                               int particle_index, int* number_of_neighbors,
                               int** neighbors_ptr) {
    // KIM passes particle_index in the numbering you declared when
    // creating the model, whatever the model uses internally
    
    MySimulator* sim = (MySimulator*)data_object;
    // If you declared 0-based numbering to KIM but store lists 1-based:
    int my_index = particle_index + 1;  // Convert if needed
    
    // Get neighbors using your indexing
    int count = sim->get_neighbor_count(my_index);
    int* my_neighbors = sim->get_neighbor_list(my_index);
    
    // The returned indices must use the numbering you declared to KIM;
    // KIM converts them to the model's numbering if the two differ
    *number_of_neighbors = count;
    *neighbors_ptr = my_neighbors;
    return 0;
}

Common Pitfalls and Solutions

Pitfall 1: Returning Local Arrays

// Problem: Stack allocation
int neighbors[100];
*neighbors_ptr = neighbors;  // Undefined behavior!

// Solution: Use persistent storage in data_object

Pitfall 2: Not Handling Multiple Cutoffs

// Problem: Assuming single cutoff
double global_cutoff = 5.0;  // Ignoring cutoffs array!

// Solution: Use the provided cutoff
double cutoff = cutoffs[neighbor_list_index];

Phew! That was long. Now for the remaining easy stuff.

Step 6: Species Mapping

KIM-API uses integer codes for species. You need to map your element symbols to KIM codes:

C Function: `KIM_Model_GetSpeciesSupportAndCode`

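int KIM_Model_GetSpeciesSupportAndCode(
    void* model,
    int species_name,            // KIM species name, e.g., from KIM_SpeciesName_FromString("Si")
    int* species_is_supported,   // Output: 1 if the model supports this species, 0 otherwise
    int* species_code            // Output: integer code the model uses for this species
);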

Convert your species array to integer codes before computation.

Step 7: Computing Properties

With everything set up, let the model compute the properties:

C Function: `KIM_Model_Compute`

int KIM_Model_Compute(
    void* model,
    void* compute_arguments
);

After this call, your output arrays (energy, forces) will be populated with computed values.

Step 8: Cleanup

Don't forget to clean up allocated resources:

int KIM_Model_ComputeArgumentsDestroy(
    void* model,
    void** compute_arguments
);

void KIM_Model_Destroy(
    void** model
);

Complete Workflow Example

To recap:

1. Create model with desired units
2. Create compute arguments
3. Check what properties the model supports
4. Generate neighbor lists based on model's cutoff
5. Map species names to KIM codes
6. Set all required pointers:
   - Number of particles
   - Species codes
   - Positions
   - Contributing flags
   - Output arrays (energy, forces)
7. Set neighbor list callback
8. Call compute
9. Read results from output arrays
10. Clean up resources

Minor points to remember

For periodic systems:

Create ghost atoms for particles near boundaries
Include ghosts in position and species arrays
Set contributing flag: 1 for real atoms, 0 for ghosts
Ensure neighbor lists include appropriate ghost atoms
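
The ghost-atom bookkeeping above can be sketched for the simplest case, an orthorhombic box that is periodic in all three directions. This is a toy under stated assumptions, not code from KIM-API or any simulator: `PaddedSystem` and `add_ghost_atoms` are hypothetical names, atoms are assumed to be wrapped into the box already, and the cutoff is assumed to be no larger than the shortest box edge so that one layer of images suffices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output of the padding step: real atoms first, then their ghost images.
struct PaddedSystem {
    std::vector<double> coordinates;  // flat 3N array
    std::vector<int> species;         // species code per particle
    std::vector<int> contributing;    // 1 = real atom, 0 = ghost
};

// Pads an orthorhombic periodic box (edge lengths box[0..2], atoms wrapped
// into [0, box[d])) with ghost images lying within `cutoff` of the box.
PaddedSystem add_ghost_atoms(const std::vector<double>& coords,
                             const std::vector<int>& species,
                             const double box[3], double cutoff) {
    assert(coords.size() == 3 * species.size());
    const std::size_t n = species.size();
    PaddedSystem out{coords, species, std::vector<int>(n, 1)};
    for (int sx = -1; sx <= 1; ++sx) {
        for (int sy = -1; sy <= 1; ++sy) {
            for (int sz = -1; sz <= 1; ++sz) {
                if (sx == 0 && sy == 0 && sz == 0) continue;  // the real cell
                const int shift[3] = {sx, sy, sz};
                for (std::size_t i = 0; i < n; ++i) {
                    double image[3];
                    bool inside_skin = true;
                    for (int d = 0; d < 3; ++d) {
                        image[d] = coords[3 * i + d] + shift[d] * box[d];
                        if (image[d] < -cutoff || image[d] >= box[d] + cutoff)
                            inside_skin = false;
                    }
                    if (!inside_skin) continue;
                    out.coordinates.insert(out.coordinates.end(), image, image + 3);
                    out.species.push_back(species[i]);
                    out.contributing.push_back(0);  // ghosts never contribute
                }
            }
        }
    }
    return out;
}
```

The padded arrays are what you would hand to the coordinates, species, and contributing pointers, with the particle count set to the padded size.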

Always check return codes:

0 indicates success
Non-zero indicates an error
Use KIM's logging functions for debugging

A Note on the Julia Implementation

For those interested in seeing these concepts in action, I've developed kim_api.jl, a Julia package that wraps KIM-API functionality. Julia's excellent C interoperability through ccall makes it particularly straightforward to interface with KIM-API's C functions. The package demonstrates:

Clean abstraction over C pointers and memory management
Type-safe species mapping
Automatic neighbor list generation
High-level interface that hides complexity while maintaining performance

The Julia implementation serves as both a practical tool and a reference for understanding KIM-API integration. Its readable syntax and direct mapping to C functions make it an excellent learning resource for developers working in any language.

Hope this was a fun read! By following the patterns outlined in this tutorial and adapting them to your specific language and architecture, you can add robust KIM-API support to any molecular simulation code.

Resources

KIM-API Documentation
OpenKIM Model Repository
Example Implementations
kim_api.jl Julia Package - Reference implementation demonstrating the concepts in this tutorial (still under development)
KIMPY

KIM-API Model Drivers from scratch I: using C++

Amit Gupta — Mon, 04 Aug 2025 14:36:11 GMT

Introduction

This is supposed to be a two- or three-part series on how to write an OpenKIM model driver using C++, C, and Fortran (Fortran if I ever get to use it!). The majority of the code, slides, etc. can be accessed here: https://github.com/ipcamit/kim-api-tutorial .

Have you ever wanted to create portable interatomic potentials that work seamlessly across different molecular dynamics simulators? The Knowledgebase of Interatomic Models (KIM) API makes this possible, but diving into model driver development can feel overwhelming, especially if you're not deeply familiar with C++.

This guide takes you from the fundamental C++ concepts you'll encounter in KIM-API code all the way to understanding a complete Lennard-Jones model driver implementation. Think of this as a friendly companion that demystifies the technical aspects while giving you practical insights into how everything fits together.

Why KIM-API?

Before we dive into the code, let's understand what problem KIM-API solves. Traditionally, if you developed an interatomic potential, you'd need to implement it separately for each simulator (LAMMPS, ASE, DL_POLY, etc.). KIM-API provides a standardized interface that allows you to write your potential once and have it work everywhere. The magic happens through a combination of smart API design and some C++ techniques we'll explore together.

What makes KIM-API particularly impressive is its multi-language support. While we're focusing on C++ in this tutorial, the same model can be called from simulators written in C, Fortran, Python, or other languages (The model driver in C post is coming out "soon" !). This language-agnostic design influences many of the technical choices we'll see, from the use of extern "C" linkage to the way functions are registered through pointers.

Prerequisites and Setup

To follow along with this tutorial, you'll need:

A Unix-like environment (preferably Linux)
A C++ compiler (g++)
CMake
Basic familiarity with C/C++ programming
(Recommended) VS Code with remote development extensions

For the easiest setup, you can use the KIM Developer Platform Docker container:

# Pull the Docker image
docker pull ghcr.io/openkim/developer-platform:latest-minimal

# Run the container
docker run -it --name kim_dev -v `pwd`:/home/openkim/tutorial ghcr.io/openkim/developer-platform:latest-minimal bash

# In VS Code, use "Attach to Running Container" to connect
cd tutorial

Essential C++ Concepts for KIM-API

Let me introduce you to five C++ concepts that appear frequently in KIM-API code. Don't worry if these seem complex at first -- we'll break each one down with examples. Each of these concepts exists for a specific reason related to KIM-API's goal of being a truly multi-language framework. If you are comfortable with C++, you can skip this section.

1. `reinterpret_cast`: The "I Know What I'm Doing" Cast

Casting usually means changing the datatype of a variable. Some type changes are easy to understand, for example floating point to integer; others might not be feasible or are disallowed. Normally, casting involves a set of rules on how to change one object into the desired one. In the float-to-int example, you might want to round to the nearest integer, or simply truncate the fractional part (the usual C++ behavior, which rounds toward zero). Which one you want depends on your use case. For complex objects (like C++ classes) this conversion is even trickier, as it involves multiple variables, allocated memory blocks, etc.

Think of reinterpret_cast as telling the compiler: "Trust me, I want to look at this memory differently." It's like having a box labeled "Books" but deciding to treat it as "General Storage" -- the contents don't change, just how you interpret them. Normally, static_cast won't allow you to do this, but reinterpret_cast just starts treating the object as you asked, like an obedient disciple.

In KIM-API, we use reinterpret_cast to convert between specific model objects and generic void* pointers. This allows the API to handle different model types uniformly while maintaining C compatibility. Why does this matter? Because C doesn't have classes or templates, it only understands basic pointers. By converting everything to void*, we create a common ground that all languages can work with.

#include <cstddef>
#include <iostream>

struct Data { 
    int a; 
    double b; 
};

int main() {
    Data myData = {10, 3.14};
    
    // Treat the Data object's memory as raw bytes
    char* bytePtr = reinterpret_cast<char*>(&myData);
    
    // We can now examine the raw memory
    std::cout << "First few bytes of Data object:\n";
    for (std::size_t i = 0; i < sizeof(Data); ++i) {
        std::cout << std::hex << (int)(unsigned char)bytePtr[i] << " ";
    }
    
    return 0;
}

Why it matters for KIM-API: The framework needs to store pointers to your model objects in a generic way that works across languages. When KIM calls your functions, you'll retrieve your model object using reinterpret_cast to convert the generic pointer back to your specific type (Adapter pattern discussed below).
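
The `void*` round trip itself can be sketched in a few lines. `ToyModelHandle` below is a hypothetical stand-in for the object KIM-API passes to your callbacks; its two methods are named after KIM-API's model-buffer accessors, but the class is a toy written for this illustration and does not use the library.

```cpp
#include <cassert>

// Hypothetical stand-in for the handle passed to model callbacks: all it
// knows about your model is one untyped pointer.
class ToyModelHandle {
public:
    void SetModelBufferPointer(void* ptr) { buffer_ = ptr; }
    void GetModelBufferPointer(void** ptr) const { *ptr = buffer_; }

private:
    void* buffer_ = nullptr;
};

struct LennardJones {
    double epsilon;
    double sigma;
};

// What a callback typically does first: recover the typed object from the
// untyped pointer stored at creation time.
double read_sigma(const ToyModelHandle& handle) {
    void* raw = nullptr;
    handle.GetModelBufferPointer(&raw);
    assert(raw != nullptr);
    LennardJones* model = reinterpret_cast<LennardJones*>(raw);
    return model->sigma;
}
```

For a plain `void*` to object-pointer conversion like this one, `static_cast` would work equally well; `reinterpret_cast` is used here only to mirror the style discussed in this section.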

2. Templates: Asking the Compiler to Write Code for You

Templates are like recipe cards where you leave some ingredients blank. When you use the recipe, you fill in the specific ingredients, and the compiler creates the actual code for you. It is an extremely powerful feature of the C++ language, where the template variables you leave are autocompleted by the compiler based on use cases.

#include <iostream>

// Function template to find the maximum of two values
template <typename T>  // T is a placeholder for any type
T maximum(T a, T b) {
    return (a > b) ? a : b;
}

int main() {
    std::cout << "Max(5, 10): " << maximum(5, 10) << std::endl;         // T becomes int
    std::cout << "Max(3.14, 2.71): " << maximum(3.14, 2.71) << std::endl; // T becomes double
    
    return 0;
}

In the above example, you did not need to provide separate integer and floating-point versions of your maximum function. The compiler saw that you first called maximum with two integers (therefore T is int) and created an int maximum(int a, int b) function for you. In the second call, the compiler deduced that T is now double, so it created a double maximum(double a, double b) function.

What if you had a call like maximum(3.14, 2), mixing a double and an int?
The compiler will fail to deduce a single type for T, and you should get the following compile-time error:

tmp.cpp:11:56: error: no matching function for call to ‘maximum(double, int)’
   11 |     std::cout << "Max(3.14, 2.71): " << maximum(3.14, 2) << std::endl; // T becomes double
      |                                                        ^
tmp.cpp:5:3: note: candidate: ‘template<class T> T maximum(T, T)’
    5 | T maximum(T a, T b) {

Advanced KIM-API Usage - Compute Dispatch: One of the most powerful uses of templates in KIM-API is the "compute dispatch" pattern. Instead of having runtime checks inside your inner loops (which can kill performance), you use templates to generate multiple versions of your compute function at compile time:

// Template parameters control what calculations are performed
template <bool doEnergy, bool doForces, bool doStress>
int ComputeImplementation(/* parameters */) {
    // Loop over particles
    for (int i = 0; i < nParticles; ++i) {
        // Calculate distance, etc.
        
        // These conditions are compile-time constants, so the compiler
        // can drop the untaken branches from each instantiation
        if (doEnergy) {
            *energy += calculateEnergy(r);
        }
        
        if (doForces) {
            calculateForces(r, forces);
        }
        
        if (doStress) {
            calculateStress(r, stress);
        }
    }
    return 0;
}

// In your main Compute function:
int Compute(/* parameters */) {
    // Determine what the simulator requested
    bool hasForces = (forces != NULL);
    bool hasEnergy = (energy != NULL);
    bool hasStress = (stress != NULL);
    
    // Dispatch to the right template instantiation
    // (template arguments are <doEnergy, doForces, doStress>)
    if (hasForces && hasEnergy && !hasStress) {
        return ComputeImplementation<true, true, false>(/* parameters */);
    } else if (hasForces && !hasEnergy && !hasStress) {
        return ComputeImplementation<false, true, false>(/* parameters */);
    }
    // ... handle all 8 combinations
    return 1;  // no combination matched
}

The beauty of this approach is that the compiler removes all the if statements inside the loops, creating highly optimized code for each specific case. You write the logic once, but get multiple optimized versions automatically.
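
For a version you can actually compile, here is a small self-contained sketch of the same dispatch idea. The "potential" is a made-up 1D chain of unit-length springs, and `compute_chain` and `compute` are hypothetical names chosen for this example; nothing here comes from a real model driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy energy on a 1D chain: E = sum over adjacent pairs of 0.5 * (dx - 1)^2.
// The bool template parameters are compile-time constants, so each
// instantiation only keeps the branches it needs. (With C++17 you could write
// `if constexpr` to make that a guarantee rather than an optimization.)
template <bool doEnergy, bool doForces>
void compute_chain(const std::vector<double>& x, double* energy, double* forces) {
    if (doEnergy) *energy = 0.0;
    if (doForces)
        for (std::size_t i = 0; i < x.size(); ++i) forces[i] = 0.0;
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        const double stretch = (x[i + 1] - x[i]) - 1.0;
        if (doEnergy) *energy += 0.5 * stretch * stretch;
        if (doForces) {
            forces[i]     += stretch;  // -dE/dx_i
            forces[i + 1] -= stretch;  // -dE/dx_{i+1}
        }
    }
}

// The runtime decision is made once, outside the particle loop.
int compute(const std::vector<double>& x, double* energy, double* forces) {
    assert(!x.empty());
    const bool hasEnergy = (energy != nullptr);
    const bool hasForces = (forces != nullptr);
    if (hasEnergy && hasForces)  compute_chain<true, true>(x, energy, forces);
    else if (hasEnergy)          compute_chain<true, false>(x, energy, forces);
    else if (hasForces)          compute_chain<false, true>(x, energy, forces);
    else return 1;  // nothing was requested
    return 0;
}
```

With two flags there are only four combinations; adding stress as a third flag doubles that, which is why real drivers often generate the dispatch table with a script or macro.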

3. `extern "C"`: Universal Language Compatibility

C++ "mangles" function names to support features like overloading. For example, process_data(int) might become something like _Z12process_datai internally. The extern "C" directive tells the compiler to use C-style naming, keeping function names unchanged.

This is absolutely critical for KIM-API because it needs to work with Fortran simulators, C-based codes, and other languages. These languages don't understand C++ name mangling -- they expect functions to have simple, predictable names.

// Without extern "C" - C++ will mangle these names
void process_data(int data) {
    volatile int x = data;
}

// With extern "C" - name stays as "process_data_for_c"
extern "C" void process_data_for_c(int data) {
    volatile int z = data;
}

Why it matters for KIM-API: The entry point function model_driver_create must have extern "C" linkage so that KIM can find it regardless of what language the simulator is written in. This is the bridge that makes multi-language support possible.

4. Static: Keep Only One Copy

The static keyword in C++ has different meanings depending on context. For class methods, it means "this function doesn't need an object instance" -- similar to Python's @staticmethod.

Understanding why KIM-API uses static functions requires thinking about function pointers across languages. In C and Fortran, you can only create pointers to regular functions, not to member functions of objects (which have an implicit this parameter, similar to Python's self). Static member functions don't have a this parameter, making them compatible with C-style function pointers.

class LennardJones {
public:
    // Static member function - no 'this' pointer
    static int Compute(/* parameters */) {
        // Can't access non-static member variables here
        // Must retrieve the object from KIM's storage
        return 0;
    }
    
    // Regular member function - has implicit 'this' pointer
    int RegularCompute(/* parameters */) {
        // Conceptually: int RegularCompute(LennardJones* this, /* parameters */)
        // Can access member variables through 'this'
        // But can't be used as a C-style function pointer!
        return 0;
    }
};

Why it matters for KIM-API: KIM needs to store pointers to your functions in a way that works across all languages. Static member functions can be treated as regular C functions, making them perfect for this purpose.

5. PIMPL Pattern: Hide the Implementation Details

PIMPL (Pointer to Implementation) is like having a public reception desk that handles all requests while the actual work happens in a private office behind the scenes. This pattern separates the interface from the implementation. Many C++ KIM-API drivers follow this pattern, so it is useful to know if you want to read the implementations of various drivers.

Important Note: The example we'll walk through is a minimal implementation that does NOT use PIMPL. We're keeping things simple to focus on the core concepts. In production model drivers, you might see PIMPL used to keep the public interface stable while allowing the implementation to change.
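For readers who have not met PIMPL before, here is a minimal, generic sketch of the pattern in a single file. It is not taken from any KIM driver; the class name Potential and its members are assumptions made for illustration. In a real project the first part would live in the header and the second in the .cpp file.

```cpp
#include <cassert>
#include <memory>

// --- what would live in the public header ---
class Potential
{
 public:
  Potential(double epsilon, double sigma);
  ~Potential();  // defined below, where Impl is a complete type
  double Energy(double r) const;

 private:
  class Impl;                   // forward declaration only
  std::unique_ptr<Impl> impl_;  // pointer to the hidden implementation
};

// --- what would live in the .cpp file ---
class Potential::Impl
{
 public:
  Impl(double epsilon, double sigma) : epsilon_(epsilon), sigma_(sigma) {}

  // Lennard-Jones 12-6 pair energy at separation r.
  double Energy(double r) const
  {
    assert(r > 0.0);
    double const s2 = (sigma_ * sigma_) / (r * r);
    double const s6 = s2 * s2 * s2;
    return 4.0 * epsilon_ * (s6 * s6 - s6);
  }

 private:
  double epsilon_;
  double sigma_;
};

Potential::Potential(double epsilon, double sigma)
    : impl_(new Impl(epsilon, sigma))
{
}
Potential::~Potential() = default;
double Potential::Energy(double r) const { return impl_->Energy(r); }
```

Code that includes only the header part sees a class with one pointer member, so the private data inside Impl can change without touching the public interface.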

KIM-API's Multi-Language Design Philosophy

Before we dive into the code, let's understand how KIM-API's design choices enable multi-language support:

Zero-Based vs One-Based Indexing: Different languages have different conventions. C and C++ use zero-based arrays, while Fortran traditionally uses one-based arrays. KIM-API lets you specify which convention your model uses through SetModelNumbering().
Function Pointers Instead of Virtual Functions: Virtual functions are a C++ feature that doesn't translate to C or Fortran. By using function pointers registered through the API, KIM maintains language neutrality.
Opaque Pointers: The void* pointer approach means that each language only needs to understand basic pointer types, not complex C++ objects.
Explicit Memory Management: While modern C++ might use smart pointers, KIM-API uses explicit new and delete to maintain compatibility with C-style memory management that all languages understand.

Understanding the KIM-API Model Driver Structure

Now that we've covered the C++ basics and understood the multi-language design philosophy, let's see how a KIM-API model driver fits together. A model driver consists of:

Implementation (the driver): The physics and algorithms
Parameters (the model): Specific values like epsilon and sigma for Lennard-Jones
Compute Arguments: The interface with the simulator

The basic flow looks like this:

Simulator (any language) → KIM-API → Your Model Driver → Calculations → Results back to Simulator

Walking Through a Lennard-Jones Implementation

Let's examine a complete, working Lennard-Jones model driver. This is a minimal implementation designed for clarity -- it doesn't use advanced patterns like PIMPL or compute dispatch.

The Header File (MyLJ.hpp)

#ifndef LJ_HPP_
#define LJ_HPP_

#include <string>  // std::string is used for the species member below

#include "KIM_ModelDriverHeaders.hpp"

// Entry point function with C linkage
extern "C" {
int model_driver_create(KIM::ModelDriverCreate * const modelDriverCreate,
                        KIM::LengthUnit const requestedLengthUnit,
                        KIM::EnergyUnit const requestedEnergyUnit,
                        KIM::ChargeUnit const requestedChargeUnit,
                        KIM::TemperatureUnit const requestedTemperatureUnit,
                        KIM::TimeUnit const requestedTimeUnit);
}

class LennardJones612
{
 public:
  // Constructor and destructor
  LennardJones612(KIM::ModelDriverCreate * const modelDriverCreate,
                  KIM::LengthUnit const requestedLengthUnit,
                  KIM::EnergyUnit const requestedEnergyUnit,
                  KIM::ChargeUnit const requestedChargeUnit,
                  KIM::TemperatureUnit const requestedTemperatureUnit,
                  KIM::TimeUnit const requestedTimeUnit,
                  int * const ier);
  ~LennardJones612();

  // Static member functions for KIM callbacks
  static int Destroy(KIM::ModelDestroy * const modelDestroy);
  static int Refresh(KIM::ModelRefresh * const modelRefresh);
  static int Compute(KIM::ModelCompute const * const modelCompute,
                     KIM::ModelComputeArguments const * const modelComputeArguments);
  static int ComputeArgumentsCreate(
      KIM::ModelCompute const * const modelCompute,
      KIM::ModelComputeArgumentsCreate * const modelComputeArgumentsCreate);
  static int ComputeArgumentsDestroy(
      KIM::ModelCompute const * const modelCompute,
      KIM::ModelComputeArgumentsDestroy * const modelComputeArgumentsDestroy);

  // Model parameters - stored directly in the class (no PIMPL)
  std::string species;
  double cutoff, sigma, epsilon;
};

#endif  // LJ_HPP_

Notice how all the KIM callback functions are declared as static. This is the adapter pattern in action -- these static functions will retrieve the actual model object and forward calls to it. This design allows C and Fortran codes to call these functions through simple function pointers.

The Implementation File (MyLJ.cpp) - Key Sections

Let's walk through the implementation step by step.

1. The Entry Point

extern "C" {
// universal C like function to be called in all languages
int model_driver_create(KIM::ModelDriverCreate * const modelDriverCreate,
                        KIM::LengthUnit const requestedLengthUnit,
                        KIM::EnergyUnit const requestedEnergyUnit,
                        KIM::ChargeUnit const requestedChargeUnit,
                        KIM::TemperatureUnit const requestedTemperatureUnit,
                        KIM::TimeUnit const requestedTimeUnit)
{
    int ier;
    
    // Create our model object
    LennardJones612 * modelObject;
    // allocate a LennardJones driver object using `new`
    modelObject = new LennardJones612(modelDriverCreate,
                                      requestedLengthUnit,
                                      requestedEnergyUnit,
                                      requestedChargeUnit,
                                      requestedTemperatureUnit,
                                      requestedTimeUnit,
                                      &ier);
    // delete if failed
    if (ier != 0) {
        delete modelObject;
        return ier;
    }
    
    // Store the pointer in KIM's system
    modelDriverCreate->SetModelBufferPointer(static_cast<void *>(modelObject));
    
    return 0;
}
}

This function is the first thing KIM calls when creating your model. The extern "C" wrapper ensures that whether KIM is being called from a Fortran MD code or a C++ simulator, it can always find this function by name.
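The create/store/destroy lifecycle used by the entry point can be tried out in isolation. The sketch below mimics it with a toy storage slot; ToyRegistry, ToyDriver, toy_create and toy_destroy are hypothetical stand-ins and do not correspond to KIM-API types.

```cpp
#include <cassert>

// Hypothetical stand-in for the framework-side storage slot.
struct ToyRegistry
{
  void * buffer = nullptr;
};

class ToyDriver
{
 public:
  // Reports failure through 'ier' because constructors cannot return a value.
  ToyDriver(double cutoff, int * ier) : cutoff_(cutoff)
  {
    assert(ier != nullptr);
    *ier = (cutoff > 0.0) ? 0 : 1;
  }
  double cutoff() const { return cutoff_; }

 private:
  double cutoff_;
};

// Create: allocate, check, then hand ownership to the registry.
int toy_create(ToyRegistry * registry, double cutoff)
{
  int ier = 1;
  ToyDriver * driver = new ToyDriver(cutoff, &ier);
  if (ier != 0)
  {
    delete driver;  // nothing was stored, so clean up here
    return ier;
  }
  registry->buffer = static_cast<void *>(driver);
  return 0;
}

// Destroy: recover the typed pointer and release it exactly once.
int toy_destroy(ToyRegistry * registry)
{
  delete static_cast<ToyDriver *>(registry->buffer);
  registry->buffer = nullptr;
  return 0;
}
```

The point of the early delete is that once creation fails, nobody else holds the pointer, so the create function is the only place that can free it.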

2. The Constructor - Setting Up the Model

The constructor does several important tasks, each designed with multi-language compatibility in mind:

LennardJones612::LennardJones612(/* parameters */) {
    *ier = 0;
    
    // Step 1: Set the numbering convention (0-based vs 1-based arrays)
    // This is crucial for Fortran compatibility!
    *ier = modelDriverCreate->SetModelNumbering(KIM::NUMBERING::zeroBased);
    
    // Step 2: Define units for calculations
    // You will usually get the requested units from the simulator.
    // Some drivers, mostly the ML ones, raise an error because they cannot
    // support multiple units, whereas others transform their parameters accordingly.
    *ier = modelDriverCreate->SetUnits(requestedLengthUnit,
                                      requestedEnergyUnit,
                                      KIM::CHARGE_UNIT::unused,
                                      KIM::TEMPERATURE_UNIT::unused,
                                      KIM::TIME_UNIT::unused);
    
    // Step 3: Read parameters from file
    // ... (code to read sigma, epsilon, cutoff, species)
    
    // Step 4: Unit conversion
    // If you want your model to support unit conversion, you can get the
    // conversion constants as shown below
    double convertLength = 1.0;
    double convertEnergy = 1.0;
    *ier = KIM::ModelDriverCreate::ConvertUnit(
        fromLength, fromEnergy, fromCharge, fromTemperature, fromTime,
        requestedLengthUnit, requestedEnergyUnit, requestedChargeUnit,
        requestedTemperatureUnit, requestedTimeUnit,
        1.0, 0.0, 0.0, 0.0, 0.0,  // exponents
        &convertLength);
    *ier = KIM::ModelDriverCreate::ConvertUnit(
        fromLength, fromEnergy, fromCharge, fromTemperature, fromTime,
        requestedLengthUnit, requestedEnergyUnit, requestedChargeUnit,
        requestedTemperatureUnit, requestedTimeUnit,
        0.0, 1.0, 0.0, 0.0, 0.0,  // exponents
        &convertEnergy);
    // Here the conversion constant (last argument) =
    //   (length factor)^(first exponent) * (energy factor)^(second exponent) * ...
    
    
    // Apply conversions
    cutoff *= convertLength;
    sigma *= convertLength;
    epsilon *= convertEnergy;
    
    // Step 5: Register parameters with KIM
    // These parameters will be saved with details, and can be queried for training
    // or other purposes.
    *ier = modelDriverCreate->SetParameterPointer(
        1, &cutoff, "cutoff", "Cutoff of the LJ model");
    
    // Step 6: Configure neighbor lists
    // See the KIM-API docs for details on influence distance vs cutoff distance.
    // modelWillNotRequestNeighborsOfNoncontributingParticles_ does what it says!
    // If set to false, KIM-API will also ask the simulator to provide the neighbors
    // of non-contributing atoms.
    modelDriverCreate->SetInfluenceDistancePointer(&cutoff);
    modelDriverCreate->SetNeighborListPointers(
        1, &cutoff, &modelWillNotRequestNeighborsOfNoncontributingParticles_);
    
    // Step 7: Register callback functions
    // This static function will be called for compute.
    // You need to register the other static functions as well, for example
    // the compute-arguments create/destroy functions. See the example for more details.
    KIM::ModelComputeFunction * compute = LennardJones612::Compute;
    *ier = modelDriverCreate->SetRoutinePointer(
        KIM::MODEL_ROUTINE_NAME::Compute,
        KIM::LANGUAGE_NAME::cpp, true,
        reinterpret_cast<KIM::Function *>(compute));
}

Each step here addresses multi-language compatibility:

Numbering: Fortran arrays typically start at 1, C/C++ at 0. This setting tells KIM how to number particles and arrays.
Units: Different codes may use different unit systems. KIM handles the conversion.
Function Registration: We register static functions that can be called from any language.

3. The Compute Function - Where Physics Happens

This is the heart of your model, called repeatedly during simulations. Let me show you the complete function with all its important parts. Before that, note the ModelComputeArguments pointer. It gives access to the memory locations that hold, or will receive, the data exchanged with the simulator: coordinates, species, the contributing/non-contributing classification, energy, forces, and so on. You access these locations with the GetArgumentPointer function.

int LennardJones612::Compute(
    KIM::ModelCompute const * const modelCompute,
    KIM::ModelComputeArguments const * const modelComputeArguments) {
    
    // Retrieve our model object
    LennardJones612 * modelObject = NULL;
    modelCompute->GetModelBufferPointer(reinterpret_cast<void **>(&modelObject));
    
    // Get pointers to simulation data
    int const * numberOfParticles;
    int const * particleContributing;
    double const * coordinates;
    double * forces = NULL;
    double * energy = NULL;
    
    // Request the data we need
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::numberOfParticles, &numberOfParticles);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::particleContributing, &particleContributing);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::coordinates, &coordinates);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::partialForces, &forces);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::partialEnergy, &energy);
    
    // Initialize outputs
    if (energy != NULL) *energy = 0.0;
    if (forces != NULL) {
        for (int i = 0; i < *numberOfParticles; ++i) {
            forces[3*i + 0] = 0.0;
            forces[3*i + 1] = 0.0;
            forces[3*i + 2] = 0.0;
        }
    }
    
    // --- Retrieve Model Parameters ---
    // Get the Lennard-Jones parameters (sigma, epsilon, cutoff) from our modelObject.
    // These were read from the parameter file and converted to consistent units during construction.
    double sigma_val = modelObject->sigma;
    double epsilon_val = modelObject->epsilon;
    double cutoff_val = modelObject->cutoff;
    double cutoff_sq = cutoff_val * cutoff_val;
    
    // Main computation loop
    for (int i = 0; i < *numberOfParticles; ++i) {
        if (particleContributing[i] == 0) continue;
        
        // Get neighbors for particle i
        int numnei;
        int const * neighbors;
        modelComputeArguments->GetNeighborList(0, i, &numnei, &neighbors);
        
        // Loop over neighbors
        for (int jj = 0; jj < numnei; ++jj) {
            int j = neighbors[jj];
            
            // Skip if we've already processed this pair from j's side
            if (j < i && particleContributing[j] == 1) continue;
            
            // Calculate distance
            double dx = coordinates[3*i + 0] - coordinates[3*j + 0];
            double dy = coordinates[3*i + 1] - coordinates[3*j + 1];
            double dz = coordinates[3*i + 2] - coordinates[3*j + 2];
            double r2 = dx*dx + dy*dy + dz*dz;
            
            if (r2 > cutoff_sq) continue;
            
            // Lennard-Jones calculations
            double sigma_sq_div_r2 = (sigma_val * sigma_val) / r2;
            double sigma6_div_r6 = sigma_sq_div_r2 * sigma_sq_div_r2 * sigma_sq_div_r2;
            double sigma12_div_r12 = sigma6_div_r6 * sigma6_div_r6;
            
            double pair_energy = 4.0 * epsilon_val * (sigma12_div_r12 - sigma6_div_r6);
            double f_over_r = (24.0 * epsilon_val / r2) * 
                              (2.0 * sigma12_div_r12 - sigma6_div_r6);
            
            // Handle non-contributing (ghost) neighbors correctly:
            // only i's half of the pair energy is counted (see below),
            // so the derivative is halved to match
            if (particleContributing[j] == 0) {
                f_over_r *= 0.5;
            }
            
            // Apply forces and write them back through the forces pointer.
            // Check for nullptr first. A null pointer means the simulator
            // has not requested forces, and writing through it is undefined
            // behavior -- typically a segfault.
            if (forces != nullptr) {
                forces[3*i + 0] += f_over_r * dx;
                forces[3*i + 1] += f_over_r * dy;
                forces[3*i + 2] += f_over_r * dz;
                forces[3*j + 0] -= f_over_r * dx;
                forces[3*j + 1] -= f_over_r * dy;
                forces[3*j + 2] -= f_over_r * dz;
            }
            
            // Add energy contribution
            if (energy != nullptr) {
                if (particleContributing[j] == 1) {
                    *energy += pair_energy;
                } else {
                    *energy += 0.5 * pair_energy;  // Ghost atoms contribute half
                }
            }
        }
    }
    
    return 0;
}

The compute function demonstrates several important concepts:

Model Object Retrieval: We use reinterpret_cast to get our C++ object from KIM's generic storage
Parameter Access: We retrieve the model parameters (sigma, epsilon, cutoff) from our object
Optional Outputs: We check if forces and energy pointers are NULL -- the simulator might not need both
Ghost Atom Handling: Non-contributing particles need special treatment to avoid double-counting
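The pair arithmetic inside the loop is easy to check on its own. The sketch below pulls the same energy and force expressions into a standalone function and compares the analytic force against a central finite difference. PairResult, lj_pair and numerical_f_over_r are helper names invented here, not part of the driver.

```cpp
#include <cassert>
#include <cmath>

struct PairResult
{
  double energy;    // phi(r)
  double f_over_r;  // -(dphi/dr) / r; times (r_i - r_j) gives the force on i
};

// Same expressions as in the Compute loop, for a squared distance r2.
PairResult lj_pair(double r2, double sigma, double epsilon)
{
  assert(r2 > 0.0);
  double const s2 = (sigma * sigma) / r2;
  double const s6 = s2 * s2 * s2;
  double const s12 = s6 * s6;
  PairResult result;
  result.energy = 4.0 * epsilon * (s12 - s6);
  result.f_over_r = (24.0 * epsilon / r2) * (2.0 * s12 - s6);
  return result;
}

// Central finite difference of the energy, used only as a sanity check
// on the analytic force expression.
double numerical_f_over_r(double r, double sigma, double epsilon,
                          double h = 1.0e-6)
{
  double const e_plus = lj_pair((r + h) * (r + h), sigma, epsilon).energy;
  double const e_minus = lj_pair((r - h) * (r - h), sigma, epsilon).energy;
  return -((e_plus - e_minus) / (2.0 * h)) / r;
}
```

Two quick checks follow from the formula: the energy is zero at r = sigma, and the force vanishes at the minimum r = 2^(1/6) sigma, where the energy equals -epsilon.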

The Build Configuration (CMakeLists.txt)

This file lets KIM-API handle the CMake build process correctly. Just use it as a template.

cmake_minimum_required(VERSION 3.10)
list(APPEND CMAKE_PREFIX_PATH $ENV{KIM_API_CMAKE_PREFIX_DIR})
find_package(KIM-API 2.0 REQUIRED CONFIG)

if(NOT TARGET kim-api)
  enable_testing()
  project("${KIM_API_PROJECT_NAME}" VERSION "${KIM_API_VERSION}"
    LANGUAGES CXX C Fortran)
endif()

set(MODEL_DRIVER_NAME "MyLJ__MD_000000000000_000")

add_kim_api_model_driver_library(
  NAME                    ${MODEL_DRIVER_NAME}
  CREATE_ROUTINE_NAME     "model_driver_create"
  CREATE_ROUTINE_LANGUAGE "cpp"
)

target_sources(${MODEL_DRIVER_NAME} PRIVATE MyLJ.cpp)

Key Takeaways and Best Practices

As you develop your own model drivers, keep these points in mind:

Always check return codes: Most KIM functions return an error code. Check it!
Handle optional arguments gracefully: Not all simulators will request all possible outputs. Always check if pointers are nullptr before using them.
Think about performance: The Compute function is called millions of times. Every optimization matters. Consider learning about compute dispatch for production code.
Use the adapter pattern: Static member functions that retrieve the model object and forward to regular methods keep your code clean and compatible with KIM's C-style callbacks.
Remember multi-language users: Your model might be called from Fortran, C, or other languages. Stick to KIM's patterns to ensure compatibility.
Debug systematically: If something goes wrong, check kim.log first. Compile KIM-API in debug mode for more detailed error messages.
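On the first point: in the constructor listing earlier, *ier is assigned by each call in turn, so a later success can hide an earlier failure. A minimal sketch of the early-return alternative is given below. toy_step stands in for any KIM call that returns 0 on success, and both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a KIM call: records that it ran and
// returns 0 on success, nonzero on failure.
int toy_step(bool succeeds, std::vector<int> * log, int id)
{
  assert(log != nullptr);
  log->push_back(id);
  return succeeds ? 0 : 1;
}

// Runs three setup steps and stops at the first failure instead of
// overwriting the error code with a later success.
int toy_setup(bool ok1, bool ok2, bool ok3, std::vector<int> * log)
{
  int ier = toy_step(ok1, log, 1);
  if (ier != 0) return ier;

  ier = toy_step(ok2, log, 2);
  if (ier != 0) return ier;

  ier = toy_step(ok3, log, 3);
  return ier;
}
```

If the second step fails, the third never runs and the caller sees the nonzero code, which is the behavior you want from a driver constructor.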

Common Pitfalls and How to Avoid Them

Ghost Atoms and Contributing Particles

Remember that simulators may include "ghost" atoms for periodic boundaries. Always check the particleContributing array and handle non-contributing particles correctly. This is especially important for parallel simulations where ghost atoms ensure correct forces across processor boundaries.
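The bookkeeping is easier to see with the physics stripped out. The sketch below applies the same skip rule and half-weighting as the Compute function in this post, but with a constant energy per pair, so the result simply counts how much of each pair is attributed to the contributing particles. toy_total_energy and its arguments are illustrative names, not KIM-API calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a constant pair energy over full neighbor lists, using the same
// skip rule and half-weighting as the Compute function in this post.
double toy_total_energy(std::vector<int> const & contributing,
                        std::vector<std::vector<int> > const & neighbors,
                        double pair_energy)
{
  assert(contributing.size() == neighbors.size());
  double energy = 0.0;
  for (std::size_t i = 0; i < neighbors.size(); ++i)
  {
    if (contributing[i] == 0) continue;  // ghosts never own a sum
    for (std::size_t jj = 0; jj < neighbors[i].size(); ++jj)
    {
      std::size_t const j = static_cast<std::size_t>(neighbors[i][jj]);
      // A pair of contributing particles appears in both lists;
      // count it from the lower index only.
      if (j < i && contributing[j] == 1) continue;
      // A pair with a ghost is seen from one side only: count half.
      energy += (contributing[j] == 1) ? pair_energy : 0.5 * pair_energy;
    }
  }
  return energy;
}
```

With two contributing particles that list each other, the pair is counted once. With one contributing particle and one ghost, only half of the pair energy is counted, whichever of the two has the lower index.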

Unit Conversions

Always convert your parameters to the requested units. KIM provides conversion functions -- use them! Different simulators and research communities use different unit systems, and KIM handles this complexity for you.
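The exponent arguments in the conversion call are easier to read with a toy version of the arithmetic. The function below only reproduces the product-of-powers rule described in the constructor comments, for length and energy alone. It is not the KIM implementation, and the energy factor used in the note after it is an approximate value chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// Combined factor =
//   (length factor)^length_exponent * (energy factor)^energy_exponent.
double toy_conversion_factor(double length_factor, double energy_factor,
                             double length_exponent, double energy_exponent)
{
  assert(length_factor > 0.0 && energy_factor > 0.0);
  return std::pow(length_factor, length_exponent)
         * std::pow(energy_factor, energy_exponent);
}
```

For example, going from Angstrom to nm (factor 0.1) and from eV to kcal/mol (factor roughly 23.06), exponents (1, 0) give the factor for a length such as sigma, (0, 1) give the factor for an energy such as epsilon, and (-1, 1) give the factor for a force, which has units of energy per length.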

Array Indexing

Remember that if you set NUMBERING::zeroBased, particle indices run from 0 to N-1. If you set NUMBERING::oneBased (common for Fortran models), they run from 1 to N. Be consistent!
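A small sketch of what "be consistent" means in practice: whichever convention the indices follow, they must be shifted to a zero-based offset before touching a C++ array. The helper name and the numbering_offset parameter are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads the x coordinate of a particle whose index follows the given
// numbering. 'numbering_offset' is 0 for zero-based indices and 1 for
// one-based indices. Coordinates are stored as x0 y0 z0 x1 y1 z1 ...
double toy_x_coordinate(std::vector<double> const & coordinates,
                        int particle_index, int numbering_offset)
{
  int const zero_based = particle_index - numbering_offset;
  assert(zero_based >= 0);
  std::size_t const slot = 3 * static_cast<std::size_t>(zero_based);
  assert(slot + 2 < coordinates.size());
  return coordinates[slot];
}
```

The second particle is index 1 under zero-based numbering and index 2 under one-based numbering. Both must resolve to the same array slot.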

Beyond the Basics

Once you're comfortable with basic model drivers, you can explore advanced techniques:

Compute Dispatch: Use templates to generate optimized versions of your compute function for different combinations of outputs
PIMPL Pattern: Separate your interface from implementation for cleaner code and better encapsulation
Parameter Files: Support multiple parameter sets for different materials
Callbacks: Implement advanced features like stress tensors and virial calculations
Multi-Species Support: Extend beyond single-element systems
Parallel Optimization: Use OpenMP or vectorization in your compute loops

Final words

Once you have created this model driver, install it with

kim-api-collections-management install user <path-to-driver-folder>

In the example above, it would be something like

kim-api-collections-management install user MyLJ__MD_000000000000_000

assuming you kept all your code in a folder named MyLJ__MD_000000000000_000.

Creating KIM-API model drivers might seem daunting at first, but it's really about understanding a few key patterns and concepts. The C++ features we explored -- reinterpret_cast, templates, extern "C", static functions -- all serve specific purposes in creating a flexible, performant interface between your physics and simulators written in any language.

Remember that KIM-API's design choices all stem from its goal of true portability. Every time you see something that seems unnecessarily complex, ask yourself: "How would this work if called from Fortran?" The answer usually explains the design choice.

The example we walked through is intentionally minimal -- no PIMPL pattern, no compute dispatch, just the essentials. This is perfect for learning and for simple potentials. As your models grow more complex, you can gradually adopt more advanced patterns. Don't get too caught up in the complexity. Start with a working example, modify it step by step, and build your understanding through practice.

The beauty of KIM-API is that once you've written your model driver, it works everywhere. Your carefully crafted potential becomes a portable, reusable piece of scientific software that others can use and build upon. Whether someone calls it from a Fortran code written in the 1990s or a cutting-edge C++ simulator, your model will work seamlessly. That's the power of standardization, and you're now equipped to be part of it.

Happy coding, and welcome to the KIM community! Oh, and please submit your driver to https://openkim.org !

Additional Resources

KIM-API Documentation
OpenKIM Repository - Browse existing models for examples
KIM-API GitHub - Source code and examples
The example code from this tutorial

Appendix: Understanding Models vs Model Drivers

Now that you understand how to create a model driver, let me introduce you to a crucial concept that makes KIM-API particularly powerful: the separation between model drivers and models. This distinction might seem like unnecessary complexity at first, but it's actually a brilliant design that promotes code reuse and scientific reproducibility.

The Model-Driver Relationship

Think of this relationship like a recipe book versus actual meals. A model driver is like a recipe for "pasta with sauce" -- it describes the general process, the steps involved, and what ingredients (parameters) are needed. A model, on the other hand, is like "spaghetti carbonara" -- it's a specific instance that uses the pasta recipe with particular ingredients (bacon, eggs, parmesan, specific cooking times). Another useful analogy is classes in C++/Python: the model driver is the class, whereas the model is a particular instance of that class with the desired set of parameters.

In KIM-API terms, your Lennard-Jones model driver that we just built is the recipe. It knows how to calculate forces and energies given three parameters: sigma, epsilon, and cutoff. But it doesn't know what those values should be for any particular material. That's where models come in.

Creating a Silicon Lennard-Jones Model

Let's look at a concrete example. Suppose we want to create a Lennard-Jones model for silicon. We don't need to write any new C++ code -- we just need to tell KIM to use our existing driver with silicon-specific parameters.

Here's the model's CMakeLists.txt file:

cmake_minimum_required(VERSION 3.10)
list(APPEND CMAKE_PREFIX_PATH $ENV{KIM_API_CMAKE_PREFIX_DIR})
find_package(KIM-API 2.0 REQUIRED CONFIG)
if(NOT TARGET kim-api)
  enable_testing()
  project("${KIM_API_PROJECT_NAME}" VERSION "${KIM_API_VERSION}"
    LANGUAGES CXX C Fortran)
endif()

add_kim_api_model_library(
  NAME            "LJSi_MO_111111111110_000"
  DRIVER_NAME     "MyLJ__MD_000000000000_000"
  PARAMETER_FILES "si.param"
)

Notice how different this is from the model driver's CMakeLists.txt. Instead of add_kim_api_model_driver_library, we use add_kim_api_model_library. The key line is DRIVER_NAME -- this tells KIM which driver to use for this model. We're essentially saying, "Create a model called LJSi that uses the MyLJ driver with the parameters found in si.param."

The Parameter File

The parameter file for our silicon model (si.param) contains:

Si  7.9111800  3.1743100  1.9778000

Let me break down what each number represents based on how our driver reads the file:

Si: The chemical species (silicon)
7.9111800: The cutoff distance in Angstroms
3.1743100: The epsilon parameter in eV (the depth of the potential well)
1.9778000: The sigma parameter in Angstroms (the distance at which the potential is zero)

Remember the code in our driver's constructor that reads these values:

buffer >> species;   // Reads "Si"
buffer >> cutoff;    // Reads 7.9111800
buffer >> epsilon;   // Reads 3.1743100
buffer >> sigma;     // Reads 1.9778000

The Beauty of Parameter File Flexibility

Here's something important to understand: KIM-API places no restrictions on the format of parameter files. Your model driver decides how to read and interpret them. This flexibility means you could:

Use JSON or XML for more complex parameter sets
Include additional parameters like temperature-dependent corrections
Add comments and documentation within the parameter file
Store parameters for multiple species in a single file

For example, a more sophisticated parameter file might look like:

# Lennard-Jones parameters for various elements
# Format: Element cutoff epsilon sigma [optional: comments]
Si  7.9111800  3.1743100  1.9778000  # Fitted to crystalline silicon
Ge  8.1234500  3.2456700  2.0123400  # Fitted to germanium

Your driver would need to parse this format appropriately, perhaps skipping comment lines and handling multiple elements.
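One possible way to parse such a file is sketched below: strip everything after a '#', then read four whitespace-separated fields per line. The struct and function names are made up for this example, and the column order (cutoff, epsilon, sigma) follows the format described in this post.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct LJParams
{
  double cutoff;
  double epsilon;
  double sigma;
};

// Reads lines of the form "Element cutoff epsilon sigma [# comment]".
// Blank lines, comment-only lines and malformed lines are skipped.
std::map<std::string, LJParams> parse_lj_params(std::istream & input)
{
  std::map<std::string, LJParams> table;
  std::string line;
  while (std::getline(input, line))
  {
    // Drop a trailing comment, if any.
    std::string::size_type const hash = line.find('#');
    if (hash != std::string::npos) line.erase(hash);

    std::istringstream fields(line);
    std::string species;
    LJParams params;
    if (fields >> species >> params.cutoff >> params.epsilon >> params.sigma)
    {
      table[species] = params;
    }
  }
  return table;
}
```

Feeding it the two-element file above yields a table with entries for Si and Ge. A production driver would probably want to report malformed lines instead of silently skipping them.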

Creating Multiple Models from One Driver

The real power of this separation becomes clear when you consider creating models for different materials. With our single Lennard-Jones driver, we can create models for:

Silicon (LJSi_MO_111111111110_000)
Argon (LJAr_MO_111111111110_000)
Any other element or compound that can be approximated with Lennard-Jones

Each model would have its own parameter file (toy examples below; do not use them in actual simulations):

ar.param:

Ar  8.5000000  0.0104000  3.4000000

ne.param:

Ne  5.5000000  0.0031000  2.7400000

Without writing a single line of additional C++ code, we've created three different interatomic potentials! This is the essence of code reuse in scientific software.

The Model Identification System

You might have noticed the cryptic numbers in model names like "LJSi_MO_111111111110_000". This is KIM's systematic naming convention:

LJSi: Human-readable identifier (Lennard-Jones for Silicon)
MO: Indicates this is a Model (a Model Driver uses MD instead)
111111111110: A unique identifier (like a serial number)
000: Version number

This naming system ensures that every model in the KIM repository has a unique identifier, making scientific results reproducible. When someone publishes a paper using "LJSi_MO_111111111110_000", anyone can get exactly the same model and reproduce their results.
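To illustrate the four-part layout described above, here is a small sketch that splits a name into its fields by scanning for underscores from the right. It is not an official KIM parser. ToyKimId and parse_toy_kim_id are invented names, and the function only understands the layout shown in this post.

```cpp
#include <cassert>
#include <string>

struct ToyKimId
{
  std::string prefix;   // human-readable part, e.g. "LJSi"
  std::string type;     // "MO" or "MD"
  std::string code;     // the 12-character identifier
  std::string version;  // e.g. "000"
};

// Splits "<prefix>_<type>_<code>_<version>" from the right.
// Returns false if the name does not have that shape.
bool parse_toy_kim_id(std::string const & name, ToyKimId * out)
{
  assert(out != nullptr);
  std::string::size_type const p3 = name.rfind('_');
  if (p3 == std::string::npos || p3 == 0) return false;
  std::string::size_type const p2 = name.rfind('_', p3 - 1);
  if (p2 == std::string::npos || p2 == 0) return false;
  std::string::size_type const p1 = name.rfind('_', p2 - 1);
  if (p1 == std::string::npos || p1 == 0) return false;

  out->version = name.substr(p3 + 1);
  out->code = name.substr(p2 + 1, p3 - p2 - 1);
  out->type = name.substr(p1 + 1, p2 - p1 - 1);
  out->prefix = name.substr(0, p1);
  // Tolerate a doubled underscore before the type field.
  while (!out->prefix.empty() && out->prefix[out->prefix.size() - 1] == '_')
  {
    out->prefix.erase(out->prefix.size() - 1);
  }
  return out->type == "MO" || out->type == "MD";
}
```

Applied to "LJSi_MO_111111111110_000", this yields the prefix LJSi, the type MO, the identifier 111111111110 and the version 000.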

Why This Separation Matters

The model/driver separation embodies several important software engineering principles:

Don't Repeat Yourself (DRY): The physics implementation exists in one place (the driver), while parameters can vary across many models.
Separation of Concerns: The driver handles the physics and algorithms; the model handles the material-specific parameters.
Scalability: One driver can support hundreds of models without code duplication.
Maintainability: Bug fixes or improvements to the driver automatically benefit all models that use it.
Scientific Reproducibility: Each model has a unique identifier and fixed parameters, ensuring results can be reproduced exactly.

This design also makes it easier for different communities to collaborate. A physicist might develop a sophisticated driver implementing new theoretical insights, while materials scientists can create models by fitting parameters to their specific materials of interest. Neither group needs to understand the other's domain deeply -- the interface between driver and model provides a clean separation.

Remember, every model in the KIM repository started just like this -- someone implemented a driver, someone (possibly the same person) determined appropriate parameters, and they combined them into a model that others can now use. Your contributions, whether drivers or models, become part of this growing ecosystem of scientific software.

Setting up Clojure and SICMUtils as an absolute beginner

Amit Gupta — Mon, 22 Aug 2022 15:34:29 GMT

Clojure setup

So recently I was starting out with Clojure for the SICMUtils library, which is an excellent port of the original scmutils written in MIT-Scheme. The Clojure version provides better overall documentation and tooling than the Scheme version.

However, Clojure is a bit too eccentric for my taste and I was having a hard time getting started. Shout out to user pmonks from the Discord Clojure chatroom Discljord for being patient and helpful enough to guide me through the first steps. What follows is the conversation with him. Most of it is replicated verbatim, with a few changes in structure and some editing where required. Plus, it is really, really hard to get help on MIT-Scheme.

Bit of background

So the first thing that’s a bit odd is that you would normally use either lein or clj, but rarely both together. They’re basically competing build tools / environments.
Sort of - it’s supposed to be a more modern take on a Clojure build tool, and is published by Cognitect (the creators of Clojure), so has more “cachet”. Whether it’s “better” than Leiningen or not at this point is a matter of debate. clj just wraps Clojure in rlwrap - they are otherwise identical. So yes, for interactive REPLs, you’d usually want clj.

Structure of Clojure project

Anyway, one of the first things to understand about package management / libraries in Clojure (which is actually a limitation of the JVM) is that hot-loading libraries is a bit fraught. There are hacky ways to partially do it, but I would not recommend them unless you’re already fairly familiar with how the JVM loads code. So what we have to do instead is declare our dependencies up front, then start a REPL (and if the dependencies change, generally speaking the best bet is to quit and restart the REPL).

For clj & clojure, dependencies are expressed in a file that (by default) is called deps.edn. To declare a dependency on SICMUtils, the contents of that file would be

{:deps {sicmutils/sicmutils {:mvn/version "0.22.0"}}}

This is the same as the snippet in the clj column on the Clojars page. Clojars “knows” about the clj/clojure tools, so they provide this as a handy copypasta.

Also, the clj tool assumes source code is housed under a subdirectory called src. So the directory structure should be:

new_project/
|
+- deps.edn
|
+- src/
    |
    +- core.clj

Though if your namespace is new-project.core, that should actually be:

new_project/
|
+- deps.edn
|
+- src/
    |
    +- new_project/
        |
        +- core.clj

The core.clj file contains a simple hello world program.

(ns new_project.core)

(defn hello
  []
  (println "Hello World"))

Once you have that file in an otherwise empty directory, if you start a clj / clojure REPL in that directory, you should see it download the library (first time only) then start a REPL where that library is available.

All downloaded packages are by default kept in a “per user” cache in ~/.m2.
That means that if you use the same dependency in multiple separate projects you’re not downloading the same libraries over and over again. This is standard practice across many/most JVM-hosted languages.
Clojure simply leverages the underlying JVM ecosystem.

Running the Program

To load code from a file you would normally require that file’s namespace, then use the vars it declares from your own namespace (user in the case of the REPL).
Now to load code from a file using require, there is a naming convention for that file’s name that you have to follow (since Clojure has a particular mapping from ns symbol to file-on-disk).

As an example, if your file declares a namespace called my.cool.namespace, the file would need to be called ./my/cool/namespace.clj (yes that’s a directory structure).

Oh and one JVM oddity to watch out for - if your namespace name includes hyphens anywhere (“-”), those would be replaced with an underscore (“_”) in the filename. This is another JVM oddity that Clojure has to work around.

Once you have your core.clj named and located properly, to load and use it in the REPL, you’d just require the namespace e.g. in above case it will be

(require '[new_project.core :as cr])

vars in that namespace are then available via the cr prefix. e.g. (cr/hello)

Oh and one nice thing that Clojure can do for Clojure code, is that if your file changes you can reload it without exiting the REPL. To do that:

(require '[new_project.core :as cr] :reload-all)

So from inside your project folder you should have the final structure as:

$ tree .
.
├── deps.edn
└── src
    └── new_project
        └── core.clj

2 directories, 2 files

And the clj commandline looks like:

$ clj                               
Clojure 1.11.1
user=> (require '[new_project.core :as cr])
nil
user=> (cr/hello)
Hello World
nil
user=>

That's it! Now you can head to the SICMUtils docs and start doing what the page tells you to.

Automatic Differentiation

Amit Gupta — Fri, 04 Mar 2022 05:53:21 GMT

This is a small presentation I gave in a lab meeting, introducing the principles of automatic differentiation. Among the various AD frameworks, Autograd, Autodiff, and Enzyme were benchmarked on the Stillinger-Weber potential for Si atoms. Stillinger-Weber was chosen because most benchmarks and frameworks focus on traditional linear-algebra problems, whereas I am more interested in ML applications to scientific problems. All the code will be uploaded to GitHub soon. The presentation can be downloaded here. Sources for figures and content are given at the end.