KIM-API Model Drivers from scratch I: using C++

Introduction

This is planned as a two- or three-part series on how to write OpenKIM model drivers using C++, C, and Fortran (Fortran, if I ever get to use it!). Most of the code, slides, etc. can be found here: https://github.com/ipcamit/kim-api-tutorial.

Have you ever wanted to create portable interatomic potentials that work seamlessly across different molecular dynamics simulators? The Knowledgebase of Interatomic Models (KIM) API makes this possible, but diving into model driver development can feel overwhelming, especially if you're not deeply familiar with C++.

This guide takes you from the fundamental C++ concepts you'll encounter in KIM-API code all the way to understanding a complete Lennard-Jones model driver implementation. Think of this as a friendly companion that demystifies the technical aspects while giving you practical insights into how everything fits together.

Why KIM-API?

Before we dive into the code, let's understand what problem KIM-API solves. Traditionally, if you developed an interatomic potential, you'd need to implement it separately for each simulator (LAMMPS, ASE, DL_POLY, etc.). KIM-API provides a standardized interface that allows you to write your potential once and have it work everywhere. The magic happens through a combination of smart API design and some C++ techniques we'll explore together.

What makes KIM-API particularly impressive is its multi-language support. While we're focusing on C++ in this tutorial, the same model can be called from simulators written in C, Fortran, Python, or other languages (the post on model drivers in C is coming out "soon"!). This language-agnostic design influences many of the technical choices we'll see, from the use of extern "C" linkage to the way functions are registered through pointers.

Prerequisites and Setup

To follow along with this tutorial, you'll need:

  • A Unix-like environment (preferably Linux)
  • A C++ compiler (g++)
  • CMake
  • Basic familiarity with C/C++ programming
  • (Recommended) VS Code with remote development extensions

For the easiest setup, you can use the KIM Developer Platform Docker container:

# Pull the Docker image
docker pull ghcr.io/openkim/developer-platform:latest-minimal

# Run the container
docker run -it --name kim_dev -v `pwd`:/home/openkim/tutorial ghcr.io/openkim/developer-platform:latest-minimal bash

# In VS Code, use "Attach to Running Container" to connect
cd tutorial

Essential C++ Concepts for KIM-API

Let me introduce you to five C++ concepts that appear frequently in KIM-API code. Don't worry if these seem complex at first -- we'll break each one down with examples. Each of these concepts exists for a specific reason related to KIM-API's goal of being a truly multi-language framework. If you are already comfortable with C++, feel free to skip this section.

1. reinterpret_cast: The "I Know What I'm Doing" Cast

Casting means converting a value from one type to another. Some conversions are easy to understand, for example floating point to integer; others are not meaningful or are disallowed altogether. A cast therefore comes with rules for how the conversion is done. Even in the float-to-int case you have to decide whether to round to the nearest integer or simply discard the fractional part. C++ does the latter by default: it rounds toward zero, which agrees with the floor function only for non-negative values. Which behavior you want depends on your use case. For complex objects (like C++ classes) conversion is even trickier, because it can involve multiple member variables, allocated memory blocks, and so on.

Think of reinterpret_cast as telling the compiler: "Trust me, I want to look at this memory differently." It's like having a box labeled "Books" but deciding to treat it as "General Storage" -- the contents don't change, just how you interpret them. A static_cast won't let you convert between unrelated pointer types, but reinterpret_cast will: it simply starts treating the memory as the type you asked for, like an obedient disciple. That obedience is also the danger, because the compiler no longer checks whether the new interpretation makes sense.

In KIM-API, reinterpret_cast shows up where typed pointers meet generic ones: retrieving your model object through a void** (GetModelBufferPointer) and registering your static functions as generic KIM::Function pointers (SetRoutinePointer). This allows the API to handle different model types uniformly while maintaining C compatibility. Why does this matter? C has no classes or templates; it only understands plain pointers. By converting everything to void* (or a generic function pointer), we create a common ground that all languages can work with.

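#include <cstddef>
#include <iostream>

struct Data { 
    int a; 
    double b; 
};

int main() {
    Data myData = {10, 3.14};
    
    // Treat the Data object's memory as raw bytes
    // (viewing any object through an unsigned char* is permitted)
    unsigned char* bytePtr = reinterpret_cast<unsigned char*>(&myData);
    
    // Print every byte in hex; padding bytes between a and b hold unspecified values
    std::cout << "Bytes of Data object:\n";
    for (std::size_t i = 0; i < sizeof(Data); ++i) {
        std::cout << std::hex << static_cast<int>(bytePtr[i]) << " ";
    }
    std::cout << std::dec << "\n";  // restore decimal output
    
    return 0;
}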

Why it matters for KIM-API: The framework needs to store pointers to your model objects in a generic way that works across languages. When KIM calls your functions, you'll retrieve your model object using reinterpret_cast to convert the generic pointer back to your specific type (Adapter pattern discussed below).

2. Templates: Asking the Compiler to Write Code for You

Templates are like recipe cards where you leave some ingredients blank. When you use the recipe, you fill in the specific ingredients, and the compiler writes the actual code for you. This is an extremely powerful C++ feature: the template parameters you leave blank are deduced by the compiler from how you call the code.

#include <iostream>

// Function template to find the maximum of two values
template <typename T>  // T is a placeholder for any type
T maximum(T a, T b) {
    return (a > b) ? a : b;
}

int main() {
    std::cout << "Max(5, 10): " << maximum(5, 10) << std::endl;         // T becomes int
    std::cout << "Max(3.14, 2.71): " << maximum(3.14, 2.71) << std::endl; // T becomes double
    
    return 0;
}

In the above example, you didn't have to write separate integer and floating-point versions of maximum. In the first call both arguments are int, so the compiler deduced T = int and generated int maximum(int a, int b) for you. In the second call both arguments are double, so it generated double maximum(double a, double b).

What if you had a call like maximum(3.14, 2), i.e. maximum(double, int)?
The compiler cannot deduce a single type for T (double from the first argument, int from the second), so you get a compile-time error like this one from g++:
tmp.cpp:11:56: error: no matching function for call to ‘maximum(double, int)’
   11 |     std::cout << "Max(3.14, 2.71): " << maximum(3.14, 2) << std::endl; // T becomes double
      |                                                        ^
tmp.cpp:5:3: note: candidate: ‘template<class T> T maximum(T, T)’
    5 | T maximum(T a, T b) {

Advanced KIM-API Usage - Compute Dispatch: One of the most powerful uses of templates in KIM-API is the "compute dispatch" pattern. Instead of having runtime checks inside your inner loops (which can kill performance), you use templates to generate multiple versions of your compute function at compile time:

// Template parameters control what calculations are performed
template<bool doForces, bool doEnergy, bool doStress>
int ComputeImplementation(/* parameters */) {
    // Loop over particles
    for (int i = 0; i < nParticles; ++i) {
        // Calculate distance, etc.
        
        // doEnergy etc. are compile-time constants, so the compiler can
        // drop the untaken branches (C++17's if constexpr guarantees it)
        if (doEnergy) {
            *energy += calculateEnergy(r);
        }
        
        if (doForces) {
            calculateForces(r, forces);
        }
        
        if (doStress) {
            calculateStress(r, stress);
        }
    }
    return 0;
}

// In your main Compute function:
int Compute(/* parameters */) {
    // Determine what the simulator requested
    bool hasForces = (forces != NULL);
    bool hasEnergy = (energy != NULL);
    bool hasStress = (stress != NULL);
    
    // Dispatch to the right template instantiation
    if (hasForces && hasEnergy && !hasStress) {
        return ComputeImplementation<true, true, false>(/* parameters */);
    } else if (hasForces && !hasEnergy && !hasStress) {
        return ComputeImplementation<true, false, false>(/* parameters */);
    }
    // ... handle all 8 combinations
    return 1;  // nonzero = error; unreachable once all 8 cases are handled
}

The beauty of this approach is that each template argument is a compile-time constant, so the compiler can strip the if statements out of the loops and produce optimized code for each specific case. You write the logic once, but get multiple specialized versions automatically.

3. extern "C": Universal Language Compatibility

C++ "mangles" function names to support features like overloading. For example, process_data(int) might become something like _Z12process_datai internally. The extern "C" directive tells the compiler to use C-style naming, keeping function names unchanged.

This is absolutely critical for KIM-API because it needs to work with Fortran simulators, C-based codes, and other languages. These languages don't understand C++ name mangling -- they expect functions to have simple, predictable names.

// Without extern "C" - C++ will mangle these names
void process_data(int data) {
    volatile int x = data;
}

// With extern "C" - name stays as "process_data_for_c"
extern "C" void process_data_for_c(int data) {
    volatile int z = data;
}

Why it matters for KIM-API: The entry point function model_driver_create must have extern "C" linkage so that KIM can find it regardless of what language the simulator is written in. This is the bridge that makes multi-language support possible.

4. Static: Keep Only One Copy

The static keyword in C++ has different meanings depending on context. For class methods, it means "this function doesn't need an object instance" -- similar to Python's @staticmethod.

Understanding why KIM-API uses static functions requires thinking about function pointers across languages. In C and Fortran, you can only create pointers to regular functions, not to member functions of objects (which have an implicit this parameter, similar to Python's self). Static member functions don't have a this parameter, making them compatible with C-style function pointers.

class LennardJones {
public:
    // Static member function - no 'this' pointer
    static int Compute(/* parameters */) {
        // Can't access non-static member variables here
        // Must retrieve the object from KIM's storage
        return 0;
    }
    
    // Regular member function - has implicit 'this' pointer
    int RegularCompute(/* parameters */) {
        // Conceptually: int RegularCompute(LennardJones* this, /* parameters */)
        // Can access member variables through 'this'
        // But can't be used as a C-style function pointer!
        return 0;
    }
};

Why it matters for KIM-API: KIM needs to store pointers to your functions in a way that works across all languages. Static member functions can be treated as regular C functions, making them perfect for this purpose.

5. PIMPL Pattern: Hide the Implementation Details

PIMPL (Pointer to Implementation) is like having a public reception desk that handles all requests while the actual work happens in a private office behind the scenes. This pattern separates the interface from the implementation. Many C++ KIM-API drivers follow it, so it is useful to recognize when you read the source of existing drivers.

Important Note: The example we'll walk through is a minimal implementation that does NOT use PIMPL. We're keeping things simple to focus on the core concepts. In production model drivers, you might see PIMPL used to keep the public interface stable while allowing the implementation to change.

KIM-API's Multi-Language Design Philosophy

Before we dive into the code, let's understand how KIM-API's design choices enable multi-language support:

  1. Zero-Based vs One-Based Indexing: Different languages have different conventions. C and C++ use zero-based arrays, while Fortran traditionally uses one-based arrays. KIM-API lets you specify which convention your model uses through SetModelNumbering().
  2. Function Pointers Instead of Virtual Functions: Virtual functions are a C++ feature that doesn't translate to C or Fortran. By using function pointers registered through the API, KIM maintains language neutrality.
  3. Opaque Pointers: The void* pointer approach means that each language only needs to understand basic pointer types, not complex C++ objects.
  4. Explicit Memory Management: While modern C++ often relies on smart pointers, the KIM-API examples create the model object with an explicit new and release it with delete, because ownership is passed around as a raw void* that the C-level API can store and hand back.

Understanding the KIM-API Model Driver Structure

Now that we've covered the C++ basics and understood the multi-language design philosophy, let's see how a KIM-API model driver fits together. A model driver consists of:

  1. Implementation (the driver): The physics and algorithms
  2. Parameters (the model): Specific values like epsilon and sigma for Lennard-Jones
  3. Compute Arguments: The interface with the simulator

The basic flow looks like this:

Simulator (any language) → KIM-API → Your Model Driver → Calculations → Results back to Simulator

Walking Through a Lennard-Jones Implementation

Let's examine a complete, working Lennard-Jones model driver. This is a minimal implementation designed for clarity -- it doesn't use advanced patterns like PIMPL or compute dispatch.

The Header File (MyLJ.hpp)

#ifndef LJ_HPP_ 
#define LJ_HPP_ 

#include <string>

#include "KIM_ModelDriverHeaders.hpp"

// Entry point function with C linkage
extern "C" {
int model_driver_create(KIM::ModelDriverCreate * const modelDriverCreate,
                        KIM::LengthUnit const requestedLengthUnit,
                        KIM::EnergyUnit const requestedEnergyUnit,
                        KIM::ChargeUnit const requestedChargeUnit,
                        KIM::TemperatureUnit const requestedTemperatureUnit,
                        KIM::TimeUnit const requestedTimeUnit);
}

class LennardJones612
{
 public:
  // Constructor and destructor
  LennardJones612(KIM::ModelDriverCreate * const modelDriverCreate,
                  KIM::LengthUnit const requestedLengthUnit,
                  KIM::EnergyUnit const requestedEnergyUnit,
                  KIM::ChargeUnit const requestedChargeUnit,
                  KIM::TemperatureUnit const requestedTemperatureUnit,
                  KIM::TimeUnit const requestedTimeUnit,
                  int * const ier);
  ~LennardJones612();

  // Static member functions for KIM callbacks
  static int Destroy(KIM::ModelDestroy * const modelDestroy);
  static int Refresh(KIM::ModelRefresh * const modelRefresh);
  static int Compute(KIM::ModelCompute const * const modelCompute,
                     KIM::ModelComputeArguments const * const modelComputeArguments);
  static int ComputeArgumentsCreate(
      KIM::ModelCompute const * const modelCompute,
      KIM::ModelComputeArgumentsCreate * const modelComputeArgumentsCreate);
  static int ComputeArgumentsDestroy(
      KIM::ModelCompute const * const modelCompute,
      KIM::ModelComputeArgumentsDestroy * const modelComputeArgumentsDestroy);

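  // Model parameters - stored directly in the class (no PIMPL)
  std::string species;
  double cutoff, sigma, epsilon;
  // Flag handed to KIM in SetNeighborListPointers (see the constructor)
  int modelWillNotRequestNeighborsOfNoncontributingParticles_;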
};

#endif  // LJ_HPP_

Notice how all the KIM callback functions are declared as static. This is the adapter pattern in action: these static functions retrieve the actual model object from KIM and then either work with it directly (as the minimal Compute below does) or forward the call to a regular member function. Because they have no hidden this parameter, C and Fortran codes can call them through simple function pointers.

The Implementation File (MyLJ.cpp) - Key Sections

Let's walk through the implementation step by step.

1. The Entry Point

extern "C" {
// Universal C-style entry point that can be found from any language
int model_driver_create(KIM::ModelDriverCreate * const modelDriverCreate,
                        KIM::LengthUnit const requestedLengthUnit,
                        KIM::EnergyUnit const requestedEnergyUnit,
                        KIM::ChargeUnit const requestedChargeUnit,
                        KIM::TemperatureUnit const requestedTemperatureUnit,
                        KIM::TimeUnit const requestedTimeUnit)
{
    int ier;
    
    // Create our model object
    LennardJones612 * modelObject;
    // allocate a LennardJones driver object using `new`
    modelObject = new LennardJones612(modelDriverCreate,
                                      requestedLengthUnit,
                                      requestedEnergyUnit,
                                      requestedChargeUnit,
                                      requestedTemperatureUnit,
                                      requestedTimeUnit,
                                      &ier);
    // delete if failed
    if (ier != 0) {
        delete modelObject;
        return ier;
    }
    
    // Store the pointer in KIM's system
    modelDriverCreate->SetModelBufferPointer(static_cast<void *>(modelObject));
    
    return 0;
}
}

This function is the first thing KIM calls when creating your model. The extern "C" wrapper ensures that whether KIM is being called from a Fortran MD code or a C++ simulator, it can always find this function by name.

2. The Constructor - Setting Up the Model

The constructor does several important tasks, each designed with multi-language compatibility in mind:

LennardJones612::LennardJones612(/* parameters */) {
    *ier = 0;
    
    // Step 1: Set the numbering convention (0-based vs 1-based arrays)
    // This is crucial for Fortran compatibility!
    *ier = modelDriverCreate->SetModelNumbering(KIM::NUMBERING::zeroBased);
    if (*ier) return;
    
    // Step 2: Define units for calculations
    // Usually you simply adopt the units requested by the simulator.
    // Some drivers, mostly ML ones, cannot convert their parameters, so they
    // use fixed units (or return an error) instead; others, like this one,
    // convert their parameters to the requested units.
    *ier = modelDriverCreate->SetUnits(requestedLengthUnit,
                                      requestedEnergyUnit,
                                      KIM::CHARGE_UNIT::unused,
                                      KIM::TEMPERATURE_UNIT::unused,
                                      KIM::TIME_UNIT::unused);
    if (*ier) return;
    
    // Step 3: Read parameters from file
    // ... (code to read sigma, epsilon, cutoff, species)
    
    // Step 4: Unit conversion
    // If you want your model to support unit conversion, get the conversion
    // factors from KIM as shown below. The from* units describe the parameter
    // file (Angstrom and eV are assumed here for illustration).
    KIM::LengthUnit fromLength = KIM::LENGTH_UNIT::A;
    KIM::EnergyUnit fromEnergy = KIM::ENERGY_UNIT::eV;
    KIM::ChargeUnit fromCharge = KIM::CHARGE_UNIT::e;
    KIM::TemperatureUnit fromTemperature = KIM::TEMPERATURE_UNIT::K;
    KIM::TimeUnit fromTime = KIM::TIME_UNIT::ps;
    double convertLength = 1.0;
    double convertEnergy = 1.0;
    *ier = KIM::ModelDriverCreate::ConvertUnit(
        fromLength, fromEnergy, fromCharge, fromTemperature, fromTime,
        requestedLengthUnit, requestedEnergyUnit, requestedChargeUnit,
        requestedTemperatureUnit, requestedTimeUnit,
        1.0, 0.0, 0.0, 0.0, 0.0,  // exponents
        &convertLength);
    if (*ier) return;
    *ier = KIM::ModelDriverCreate::ConvertUnit(
        fromLength, fromEnergy, fromCharge, fromTemperature, fromTime,
        requestedLengthUnit, requestedEnergyUnit, requestedChargeUnit,
        requestedTemperatureUnit, requestedTimeUnit,
        0.0, 1.0, 0.0, 0.0, 0.0,  // exponents
        &convertEnergy);
    if (*ier) return;
    // The conversion factor (last argument) is
    //   (length factor)^(length exponent) * (energy factor)^(energy exponent) * ...
    
    // Apply conversions
    cutoff *= convertLength;
    sigma *= convertLength;
    epsilon *= convertEnergy;
    
    // Step 5: Register parameters with KIM
    // Registered parameters are published to the simulator, which can query
    // them (and, if the driver supports Refresh, modify them, e.g. for fitting).
    *ier = modelDriverCreate->SetParameterPointer(
        1, &cutoff, "cutoff", "Cutoff of the LJ model");
    if (*ier) return;
    
    // Step 6: Configure neighbor lists
    // See the KIM-API docs for details on influence distance vs cutoff distance.
    // modelWillNotRequestNeighborsOfNoncontributingParticles_ does what it says:
    // 1 means we only ask for neighbors of contributing particles; 0 would make
    // the simulator also build neighbor lists for non-contributing particles.
    modelWillNotRequestNeighborsOfNoncontributingParticles_ = 1;
    modelDriverCreate->SetInfluenceDistancePointer(&cutoff);
    modelDriverCreate->SetNeighborListPointers(
        1, &cutoff, &modelWillNotRequestNeighborsOfNoncontributingParticles_);
    
    // Step 7: Register callback functions
    // This static function will be called for compute.
    // You also need to register the other static functions (Destroy, Refresh,
    // ComputeArgumentsCreate, ComputeArgumentsDestroy). See the full example.
    KIM::ModelComputeFunction * compute = LennardJones612::Compute;
    *ier = modelDriverCreate->SetRoutinePointer(
        KIM::MODEL_ROUTINE_NAME::Compute,
        KIM::LANGUAGE_NAME::cpp, true,
        reinterpret_cast<KIM::Function *>(compute));
}

Each step here addresses multi-language compatibility:

  • Numbering: Fortran arrays typically start at 1, C/C++ at 0. This setting tells KIM which convention the model uses for particle indices.
  • Units: Different codes may use different unit systems. KIM supplies the conversion factors; the driver applies them to its parameters.
  • Function Registration: We register static functions that can be called from any language.

3. The Compute Function - Where Physics Happens

This is the heart of your model, called repeatedly during a simulation. Before we look at the complete function, notice the ModelComputeArguments pointer. It gives access to the memory locations that hold the data exchanged with the simulator: inputs such as the number of particles, coordinates, species, and the contributing/non-contributing flags, and outputs such as energy and forces. You retrieve each of these with the GetArgumentPointer function.

int LennardJones612::Compute(
    KIM::ModelCompute const * const modelCompute,
    KIM::ModelComputeArguments const * const modelComputeArguments) {
    
    // Retrieve our model object
    LennardJones612 * modelObject = NULL;
    modelCompute->GetModelBufferPointer(reinterpret_cast<void **>(&modelObject));
    
    // Get pointers to simulation data
    int const * numberOfParticles;
    int const * particleContributing;
    double const * coordinates;
    double * forces = NULL;
    double * energy = NULL;
    
    // Request the data we need
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::numberOfParticles, &numberOfParticles);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::particleContributing, &particleContributing);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::coordinates, &coordinates);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::partialForces, &forces);
    modelComputeArguments->GetArgumentPointer(
        KIM::COMPUTE_ARGUMENT_NAME::partialEnergy, &energy);
    
    // Initialize outputs
    if (energy != NULL) *energy = 0.0;
    if (forces != NULL) {
        for (int i = 0; i < *numberOfParticles; ++i) {
            forces[3*i + 0] = 0.0;
            forces[3*i + 1] = 0.0;
            forces[3*i + 2] = 0.0;
        }
    }
    
    // --- Retrieve Model Parameters ---
    // Get the Lennard-Jones parameters (sigma, epsilon, cutoff) from our modelObject.
    // These were read from the parameter file and converted to consistent units during construction.
    double sigma_val = modelObject->sigma;
    double epsilon_val = modelObject->epsilon;
    double cutoff_val = modelObject->cutoff;
    double cutoff_sq = cutoff_val * cutoff_val;
    
    // Main computation loop
    for (int i = 0; i < *numberOfParticles; ++i) {
        if (particleContributing[i] == 0) continue;
        
        // Get neighbors for particle i
        int numnei;
        int const * neighbors;
        modelComputeArguments->GetNeighborList(0, i, &numnei, &neighbors);
        
        // Loop over neighbors
        for (int jj = 0; jj < numnei; ++jj) {
            int j = neighbors[jj];
            
            // Skip if we've already processed this pair from j's side
            if (j < i && particleContributing[j] == 1) continue;
            
            // Calculate distance
            double dx = coordinates[3*i + 0] - coordinates[3*j + 0];
            double dy = coordinates[3*i + 1] - coordinates[3*j + 1];
            double dz = coordinates[3*i + 2] - coordinates[3*j + 2];
            double r2 = dx*dx + dy*dy + dz*dz;
            
            if (r2 > cutoff_sq) continue;
            
            // Lennard-Jones calculations
            double sigma_sq_div_r2 = (sigma_val * sigma_val) / r2;
            double sigma6_div_r6 = sigma_sq_div_r2 * sigma_sq_div_r2 * sigma_sq_div_r2;
            double sigma12_div_r12 = sigma6_div_r6 * sigma6_div_r6;
            
            double pair_energy = 4.0 * epsilon_val * (sigma12_div_r12 - sigma6_div_r6);
            double f_over_r = (24.0 * epsilon_val / r2) * 
                              (2.0 * sigma12_div_r12 - sigma6_div_r6);
            
            // Handle ghost atoms correctly: only i's half of the pair energy
            // counts, so the force (its derivative) is halved as well
            if (particleContributing[j] == 0) {
                f_over_r *= 0.5;
            }
            
            // Apply forces. Check for nullptr first: a null pointer means the
            // simulator did not request forces, and writing through it would
            // be undefined behavior (typically a segfault).
            if (forces != nullptr) {
                forces[3*i + 0] += f_over_r * dx;
                forces[3*i + 1] += f_over_r * dy;
                forces[3*i + 2] += f_over_r * dz;
                forces[3*j + 0] -= f_over_r * dx;
                forces[3*j + 1] -= f_over_r * dy;
                forces[3*j + 2] -= f_over_r * dz;
            }
            
            // Add energy contribution
            if (energy != nullptr) {
                if (particleContributing[j] == 1) {
                    *energy += pair_energy;
                } else {
                    *energy += 0.5 * pair_energy;  // Ghost atoms contribute half
                }
            }
        }
    }
    
    return 0;
}

The compute function demonstrates several important concepts:

  1. Model Object Retrieval: We use reinterpret_cast to get our C++ object from KIM's generic storage
  2. Parameter Access: We retrieve the model parameters (sigma, epsilon, cutoff) from our object
  3. Optional Outputs: We check if forces and energy pointers are NULL -- the simulator might not need both
  4. Ghost Atom Handling: Non-contributing particles need special treatment to avoid double-counting

The Build Configuration (CMakeLists.txt)

This file hooks your driver into KIM-API's CMake build and install machinery. Use it as a template; typically only MODEL_DRIVER_NAME and the list of source files need to change.

cmake_minimum_required(VERSION 3.10)
list(APPEND CMAKE_PREFIX_PATH $ENV{KIM_API_CMAKE_PREFIX_DIR})
find_package(KIM-API 2.0 REQUIRED CONFIG)

if(NOT TARGET kim-api)
  enable_testing()
  project("${KIM_API_PROJECT_NAME}" VERSION "${KIM_API_VERSION}"
    LANGUAGES CXX C Fortran)
endif()

set(MODEL_DRIVER_NAME "MyLJ__MD_000000000000_000")

add_kim_api_model_driver_library(
  NAME                    ${MODEL_DRIVER_NAME}
  CREATE_ROUTINE_NAME     "model_driver_create"
  CREATE_ROUTINE_LANGUAGE "cpp"
)

target_sources(${MODEL_DRIVER_NAME} PRIVATE MyLJ.cpp)

Key Takeaways and Best Practices

As you develop your own model drivers, keep these points in mind:

  1. Always check return codes: Most KIM functions return an error code (nonzero means failure). Check it!
  2. Handle optional arguments gracefully: Not all simulators will request all possible outputs. Always check if pointers are nullptr before using them.
  3. Think about performance: The Compute function is called at every step, often millions of times over a simulation. Every optimization matters. Consider learning about compute dispatch for production code.
  4. Use the adapter pattern: Static member functions that retrieve the model object and forward to regular methods keep your code clean and compatible with KIM's C-style callbacks.
  5. Remember multi-language users: Your model might be called from Fortran, C, or other languages. Stick to KIM's patterns to ensure compatibility.
  6. Debug systematically: If something goes wrong, check kim.log first. Compile KIM-API in debug mode for more detailed error messages.

Common Pitfalls and How to Avoid Them

Ghost Atoms and Contributing Particles

Remember that simulators may include "ghost" atoms for periodic boundaries. Always check the particleContributing array and handle non-contributing particles correctly. This is especially important for parallel simulations where ghost atoms ensure correct forces across processor boundaries.

Unit Conversions

Always convert your parameters to the requested units. KIM provides conversion functions, so use them! Different simulators use different unit systems, and KIM handles this complexity for you.

Array Indexing

Remember that if you set NUMBERING::zeroBased, particle indices run from 0 to N-1. If you set NUMBERING::oneBased (common for Fortran models), they run from 1 to N. Be consistent!

Beyond the Basics

Once you're comfortable with basic model drivers, you can explore advanced techniques:

  • Compute Dispatch: Use templates to generate optimized versions of your compute function for different combinations of outputs
  • PIMPL Pattern: Separate your interface from implementation for cleaner code and better encapsulation
  • Parameter Files: Support multiple parameter sets for different materials
  • Virial and Callbacks: Compute the virial (which simulators use to obtain the stress tensor) and support KIM callbacks such as process_dEdr
  • Multi-Species Support: Extend beyond single-element systems
  • Parallel Optimization: Use OpenMP or vectorization in your compute loops

Final words

Once you have created this model driver, install it with

kim-api-collections-management install user <path/to/your/driver/folder>

For the example above, that would be

kim-api-collections-management install user MyLJ__MD_000000000000_000

assuming you kept all your code in a folder named MyLJ__MD_000000000000_000.

Creating KIM-API model drivers might seem daunting at first, but it's really about understanding a few key patterns and concepts. The C++ features we explored -- reinterpret_cast, templates, extern "C", static functions -- all serve specific purposes in creating a flexible, performant interface between your physics and simulators written in any language.

Remember that KIM-API's design choices all stem from its goal of true portability. Every time you see something that seems unnecessarily complex, ask yourself: "How would this work if called from Fortran?" The answer usually explains the design choice.

The example we walked through is intentionally minimal -- no PIMPL pattern, no compute dispatch, just the essentials. This is perfect for learning and for simple potentials. As your models grow more complex, you can gradually adopt more advanced patterns. Don't get too caught up in the complexity. Start with a working example, modify it step by step, and build your understanding through practice.

The beauty of KIM-API is that once you've written your model driver, it works everywhere. Your carefully crafted potential becomes a portable, reusable piece of scientific software that others can use and build upon. Whether someone calls it from a Fortran code written in the 1990s or a cutting-edge C++ simulator, your model will work seamlessly. That's the power of standardization, and you're now equipped to be part of it.

Happy coding, and welcome to the KIM community! Oh, and please submit your driver to https://openkim.org!


Appendix: Understanding Models vs Model Drivers

Now that you understand how to create a model driver, let me introduce you to a crucial concept that makes KIM-API particularly powerful: the separation between model drivers and models. This distinction might seem like unnecessary complexity at first, but it's actually a brilliant design that promotes code reuse and scientific reproducibility.

The Model-Driver Relationship

Think of this relationship like a recipe versus an actual meal. A model driver is like a recipe for "pasta with sauce": it describes the general process, the steps involved, and which ingredients (parameters) are needed. A model, on the other hand, is like "spaghetti carbonara": a specific dish that follows the pasta recipe with particular ingredients (bacon, eggs, parmesan, specific cooking times). Another useful analogy is a class in C++ or Python: the model driver is the class, whereas the model is a particular instance of that class with a specific set of parameters.

In KIM-API terms, your Lennard-Jones model driver that we just built is the recipe. It knows how to calculate forces and energies given three parameters: sigma, epsilon, and cutoff. But it doesn't know what those values should be for any particular material. That's where models come in.

Creating a Silicon Lennard-Jones Model

Let's look at a concrete example. Suppose we want to create a Lennard-Jones model for silicon. We don't need to write any new C++ code -- we just need to tell KIM to use our existing driver with silicon-specific parameters.

Here's the model's CMakeLists.txt file:

cmake_minimum_required(VERSION 3.10)
list(APPEND CMAKE_PREFIX_PATH $ENV{KIM_API_CMAKE_PREFIX_DIR})
find_package(KIM-API 2.0 REQUIRED CONFIG)
if(NOT TARGET kim-api)
  enable_testing()
  project("${KIM_API_PROJECT_NAME}" VERSION "${KIM_API_VERSION}"
    LANGUAGES CXX C Fortran)
endif()

add_kim_api_model_library(
  NAME            "LJSi_MO_111111111110_000"
  DRIVER_NAME     "MyLJ_MD_111111111111_000"
  PARAMETER_FILES "si.param"
)

Notice how this differs from the model driver's CMakeLists.txt: instead of add_kim_api_model_driver_library, we use add_kim_api_model_library. The key line is DRIVER_NAME, which tells KIM which driver this model uses. It must match, character for character, the NAME given in the driver's own add_kim_api_model_driver_library call. We're essentially saying, "Create a model called LJSi that uses the MyLJ driver with the parameters found in si.param."

The Parameter File

The parameter file for our silicon model (si.param) contains:

Si  7.9111800  3.1743100  1.9778000

Let me break down what each number represents based on how our driver reads the file:

  • Si: The chemical species (silicon)
  • 7.9111800: The cutoff distance in Angstroms
  • 3.1743100: The epsilon parameter in eV (the depth of the potential well)
  • 1.9778000: The sigma parameter in Angstroms (the distance at which the potential is zero)

Remember the code in our driver's constructor that reads these values:

buffer >> species;   // Reads "Si"
buffer >> cutoff;    // Reads 7.9111800
buffer >> epsilon;   // Reads 3.1743100
buffer >> sigma;     // Reads 1.9778000

The Beauty of Parameter File Flexibility

Here's something important to understand: KIM-API places no restrictions on the format of parameter files. Your model driver decides how to read and interpret them. This flexibility means you could:

  • Use JSON or XML for more complex parameter sets
  • Include additional parameters like temperature-dependent corrections
  • Add comments and documentation within the parameter file
  • Store parameters for multiple species in a single file

For example, a more sophisticated parameter file (values for illustration only) might look like:

# Lennard-Jones parameters for various elements
# Format: Element cutoff epsilon sigma [optional: comments]
Si  7.9111800  3.1743100  1.9778000  # Fitted to crystalline silicon
Ge  8.1234500  3.2456700  2.0123400  # Fitted to germanium

Your driver would need to parse this format appropriately, perhaps skipping comment lines and handling multiple elements.

Creating Multiple Models from One Driver

The real power of this separation becomes clear when you consider creating models for different materials. With our single Lennard-Jones driver, we can create models for:

  • Silicon (LJSi_MO_111111111110_000)
  • Argon (LJAr_MO_111111111112_000)
  • Neon (LJNe_MO_111111111113_000)
  • Any other element or compound that can be approximated with Lennard-Jones

Each model would have its own parameter file (toy values below; do not use them in actual simulations):

ar.param:

Ar  8.5000000  0.0104000  3.4000000

ne.param:

Ne  5.5000000  0.0031000  2.7400000

Without writing a single line of additional C++ code, we've created three different interatomic models (silicon, argon, and neon), each needing only a parameter file and its own short CMakeLists.txt. This is the essence of code reuse in scientific software.

The Model Identification System

You might have noticed the cryptic numbers in model names like "LJSi_MO_111111111110_000". This is KIM's systematic naming convention:

  • LJSi: Human-readable identifier (Lennard-Jones for Silicon)
  • MO: The item type, here a Model (a Model Driver uses MD instead)
  • 111111111110: A unique 12-digit identifier (like a serial number)
  • 000: A three-digit version number

This naming system ensures that every model in the KIM repository has a unique identifier, making scientific results reproducible. When someone publishes a paper using "LJSi_MO_111111111110_000", anyone can get exactly the same model and reproduce their results.

Why This Separation Matters

The model/driver separation embodies several important software engineering principles:

  1. Don't Repeat Yourself (DRY): The physics implementation exists in one place (the driver), while parameters can vary across many models.
  2. Separation of Concerns: The driver handles the physics and algorithms; the model handles the material-specific parameters.
  3. Scalability: One driver can support hundreds of models without code duplication.
  4. Maintainability: Bug fixes or improvements to the driver automatically benefit all models that use it.
  5. Scientific Reproducibility: Each model has a unique identifier and fixed parameters, ensuring results can be reproduced exactly.

This design also makes it easier for different communities to collaborate. A physicist might develop a sophisticated driver implementing new theoretical insights, while materials scientists can create models by fitting parameters to their specific materials of interest. Neither group needs to understand the other's domain deeply -- the interface between driver and model provides a clean separation.

Remember, every model in the KIM repository started just like this -- someone implemented a driver, someone (possibly the same person) determined appropriate parameters, and they combined them into a model that others can now use. Your contributions, whether drivers or models, become part of this growing ecosystem of scientific software.