Converter: Multi-Format Structure Conversion and Filtering

Description & Scientific Utility

The FlexCryst-Converter is the core structural utility for implementing the data mining methodologies described in our recent publications. It automates the preparation of the large datasets required for machine learning in crystallography.

The module supports high-fidelity conversion between five formats:
*.cif, *.cssr, *.mol2, *.pdb, and *.poly (DL_POLY).
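
As a rough illustration of the format list above, the following Python sketch maps the five file extensions to format labels and enumerates the possible source/target pairs. It is not the Converter's own code: the names `SUPPORTED_FORMATS`, `detect_format`, and `conversion_pairs` are hypothetical, and the sketch assumes that each of the five formats can serve as both input and output.

```python
from pathlib import Path

# Extensions taken from the list above. The labels on the right are
# illustrative assumptions, not identifiers used by the Converter itself.
SUPPORTED_FORMATS = {
    ".cif": "CIF",
    ".cssr": "CSSR",
    ".mol2": "MOL2",
    ".pdb": "PDB",
    ".poly": "DL_POLY",
}


def detect_format(filename):
    """Return the format label for a supported file name, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported structure format: {filename!r}")
    return SUPPORTED_FORMATS[suffix]


def conversion_pairs():
    """List every ordered (source, target) pair among the five formats."""
    labels = list(SUPPORTED_FORMATS.values())
    return [(src, dst) for src in labels for dst in labels if src != dst]
```

Under the stated assumption, five formats give 5 × 4 = 20 ordered conversion directions, which is what `len(conversion_pairs())` returns.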

Strict Filter: AI-Compliance Standard

Data integrity is paramount when training a Neural Network Potential (NNP). The Strict Filter applies the "Cleanser" logic, which identifies 132,381 inconsistent records in the CSD v5.46.

By screening for Z' inconsistency and forbidden interatomic contacts (1.89 – 2.23 Å), this module ensures that empirical potentials are extracted only from physically validated crystal packings.
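
The contact screen can be pictured with a minimal Python sketch. Only the 1.89 – 2.23 Å window comes from the text above; everything else is an assumption made for illustration. In particular, the sketch works on plain Cartesian coordinates, ignores periodic images and symmetry, and expects the caller to supply the covalently bonded pairs that should be skipped. The function name `forbidden_contacts` is hypothetical and is not part of the Strict Filter's interface.

```python
import math
from itertools import combinations

# Forbidden distance window in Å, as quoted in the text.
FORBIDDEN_MIN = 1.89
FORBIDDEN_MAX = 2.23


def forbidden_contacts(coords, bonded=frozenset()):
    """Return (i, j, distance) for non-bonded atom pairs inside the window.

    coords: sequence of (x, y, z) Cartesian positions in Å.
    bonded: set of (i, j) index pairs with i < j to exclude (assumption:
            covalent bonds are known in advance and are not contacts).
    """
    hits = []
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in bonded:
            continue
        distance = math.dist(coords[i], coords[j])
        if FORBIDDEN_MIN <= distance <= FORBIDDEN_MAX:
            hits.append((i, j, distance))
    return hits
```

A structure would be rejected under this sketch whenever the returned list is non-empty. A production filter would also need to generate symmetry-equivalent and lattice-translated atoms before measuring distances.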

Graphical User Interface

Graphical user interface of FlexCryst Converter.

Workflow: Force Field Development Pipeline

The development of a general force field by machine learning follows a rigorous 5-step pipeline. The FlexCryst-Converter acts as the essential gatekeeper for data quality:

  1. Extraction: Retrieve structures from the CSD using ConQuest.
  2. Filtering: Cleanse data with the FlexCryst-Converter (Strict Filter).
  3. Optimization: Refine the force field with FlexCryst-Optimization.
  4. Validation: Verify accuracy using FlexCryst-Score and FlexCryst-Prediction.
  5. Applications: Co-crystal screening, solubility prediction, and thermodynamic analysis.

Figure 2: The 5-step methodological pipeline for force field engineering.
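
The gatekeeper role of the filtering step can be sketched in Python as follows. This is an illustration under stated assumptions, not the FlexCryst implementation. The stage names are copied from the workflow list, while the `gatekeep` function, the record layout, and the check names are hypothetical.

```python
# Stage names copied from the workflow list; the tools behind them are not modeled.
STAGES = ("Extraction", "Filtering", "Optimization", "Validation", "Applications")


def gatekeep(records, checks):
    """Split records into accepted ones and rejected ones.

    records: iterable of arbitrary record objects (hypothetical layout).
    checks:  dict mapping a check name to a predicate that returns True
             when the record passes.
    Returns (accepted, rejected), where rejected holds
    (record, [names of failed checks]) tuples.
    """
    accepted, rejected = [], []
    for record in records:
        failed = [name for name, passes in checks.items() if not passes(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical records and checks, for illustration only.
example_records = [
    {"id": "A", "z_prime_consistent": True, "forbidden_contacts": 0},
    {"id": "B", "z_prime_consistent": False, "forbidden_contacts": 2},
]
example_checks = {
    "z_prime": lambda r: r["z_prime_consistent"],
    "contacts": lambda r: r["forbidden_contacts"] == 0,
}
accepted, rejected = gatekeep(example_records, example_checks)
```

Only the accepted records would move on to the optimization step. Keeping the failed check names alongside each rejected record makes it possible to report rejections by category.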