Participant systems will be evaluated on the official test dataset using automated metrics that measure both the syntactic (structural) and the semantic fidelity of the generated programs to the reference solutions.
Task A evaluates the ability of a system to generate IEC 61131-3 Structured Text (ST) programs from natural language descriptions.
| Metric | Description |
|---|---|
| Cosine Similarity (CS) | Cosine similarity between embedding vectors of the generated and reference code, used as a measure of their semantic similarity. |
| Structural Match Score (SMS) | Measures the similarity of conditions, assignments, timers, loops, and control-flow structures between the generated and reference ST code. |
| Program Dependence Graph Similarity (PDG) | Evaluates the similarity of data and control dependencies between generated and reference programs using Program Dependence Graphs. |
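The official implementations of CS and SMS are not described here, so the following is only a minimal Python sketch under stated assumptions. It assumes the CS embedding vectors have already been produced by some code-embedding model (the model is not specified above). It also approximates SMS with a hypothetical regex-based count of ST constructs (conditions, loops, timers, assignments), scored as the multiset overlap of the counts. The names `cosine_similarity`, `ST_PATTERNS`, `count_constructs`, and `structural_match` are illustrative and are not part of any official evaluation kit.

```python
import math
import re
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


# Hypothetical construct categories for Structured Text. The official SMS
# may use a real ST parser and different categories or weights.
# \b keeps END_IF, END_FOR, etc. from matching, since "_" is a word character.
ST_PATTERNS = {
    "condition": r"\b(?:IF|ELSIF|CASE)\b",
    "loop": r"\b(?:FOR|WHILE|REPEAT)\b",
    "timer": r"\b(?:TON|TOF|TP)\b",
    "assignment": r":=",
}


def count_constructs(code):
    """Count occurrences of each construct category (comments are not stripped)."""
    return Counter({name: len(re.findall(pattern, code, flags=re.IGNORECASE))
                    for name, pattern in ST_PATTERNS.items()})


def structural_match(generated, reference):
    """Multiset overlap of construct counts: sum of minima over sum of maxima."""
    g, r = count_constructs(generated), count_constructs(reference)
    overlap = sum(min(g[k], r[k]) for k in ST_PATTERNS)
    total = sum(max(g[k], r[k]) for k in ST_PATTERNS)
    return overlap / total if total else 1.0  # two construct-free programs match


generated = "IF a THEN x := 1; END_IF;"
reference = "IF a THEN x := 1; y := 2; END_IF;"
print(round(structural_match(generated, reference), 3))  # 0.667
```

A regex count is only the simplest way to illustrate the idea. A real structural comparison would more likely parse the ST code into an AST and compare matched nodes, since a regex cannot, for example, tell the `:=` in a `FOR` header from an ordinary assignment.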
Task B evaluates the ability of a system to generate Rust programs from natural language descriptions.
| Metric | Description |
|---|---|
| Cosine Similarity (CS) | Cosine similarity between embedding vectors of the generated and reference code, used as a measure of their semantic similarity. |
| Structural Match Score (SMS) | Measures the similarity of conditions, assignments, timers, loops, and control-flow structures between the generated and reference Rust code. |
| Program Dependence Graph Similarity (PDG) | Evaluates the similarity of data and control dependencies between generated and reference programs using Program Dependence Graphs. |
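To illustrate what PDG similarity compares, here is a minimal Python sketch. It assumes a PDG is represented as a set of labeled edges `(source, target, kind)`, with `kind` either `"data"` or `"control"`, and that similarity is the mean Jaccard index over the two edge kinds. Both the representation and the scoring formula are assumptions; the official metric may align nodes differently, normalize identifiers, or use a graph-matching score. The hypothetical `Stmt`, `build_pdg`, and `pdg_similarity` helpers handle only straight-line code with single-level guards, and the example uses Rust statements as node labels.

```python
from typing import NamedTuple, Optional


class Stmt(NamedTuple):
    """One statement; `label` (its normalized text) serves as the node identity."""
    label: str
    defs: frozenset = frozenset()  # variables written by the statement
    uses: frozenset = frozenset()  # variables read by the statement
    guard: Optional[str] = None    # label of the controlling condition, if any


def build_pdg(stmts):
    """Return the set of (source, target, kind) dependence edges.

    Simplification: every definition is treated as reaching later uses, even
    when it sits under a guard; a full PDG would run reaching-definitions
    analysis over the control-flow graph.
    """
    edges = set()
    last_def = {}  # variable -> label of its most recent definition
    for s in stmts:
        for var in s.uses:
            if var in last_def:
                edges.add((last_def[var], s.label, "data"))
        if s.guard is not None:
            edges.add((s.guard, s.label, "control"))
        for var in s.defs:
            last_def[var] = s.label
    return edges


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pdg_similarity(gen_edges, ref_edges):
    """Mean Jaccard index over data-dependence and control-dependence edges."""
    scores = []
    for kind in ("data", "control"):
        g = {e for e in gen_edges if e[2] == kind}
        r = {e for e in ref_edges if e[2] == kind}
        scores.append(jaccard(g, r))
    return sum(scores) / len(scores)


# Reference: read a value, raise an alarm above a limit, then log the alarm.
reference = [
    Stmt("let t = read_temp()", defs=frozenset({"t"})),
    Stmt("if t > LIMIT", uses=frozenset({"t"})),
    Stmt("alarm = true", defs=frozenset({"alarm"}), guard="if t > LIMIT"),
    Stmt("log(alarm)", uses=frozenset({"alarm"})),
]
# Generated: identical except that the logging statement is missing.
generated = reference[:3]

print(pdg_similarity(build_pdg(generated), build_pdg(reference)))  # 0.75
```

Because nodes are identified by their statement text, a generated program that merely renames variables would score poorly in this sketch; normalizing identifiers before comparison is one way to address that.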
The final ranking of participating teams will be based on their performance in both tasks (Task A and Task B).