Writing a ST compiler

RocketTester · Aug 10, 2012

Peter,
I am shooting from the hip and out of my league, but offer a bit. I just reviewed some of my code in VB.net and TwinCAT (CodeSys variant).

In VB.net, you define a structure with:
Structure StructureName
Dim element As vartype
...
End Structure

You can then declare variables to be that structure type, and they can be arrays. Help says you can declare a structure within a structure (haven't tried). It also says structures can include procedures (subroutines), similar to classes (above my pay grade).

In TwinCAT, you define structures in the "Data Types" tab. Below is one of mine:
TYPE TraceHistory :
STRUCT
TseqTime : LREAL;
event : UINT;
CHjmp : USINT;
level : REAL;
END_STRUCT
END_TYPE

You can then define variables to be of that structure type, and they can be arrays. I don't know if you can put a structure within a structure. The variables can be declared local to the POU by putting them at the top of the program block as:
PROGRAM Sequence
Var
TraceCurr : TraceHistory;
END_VAR

or under "Global Variables" in the "Resources" tab:
VAR_GLOBAL
Trace : ARRAY [1..Ntrace] OF TraceHistory;
END_VAR

Variable declarations are the same whether using ST, LD, or any of the IEC61131-3 languages, though the syntax is like ST.

In sum, TwinCAT variable declarations appear similar to VB.net, but the nomenclature varies. As I recall, structures were declared in VB6 using Type ... End Type. I am clueless about what all this means at the compiler level.

Glad you are developing an ST compiler. We use several of your Delta Motion controllers and several engineers have started writing code for them in Delta's current language. Having more standard PLC programming would be very helpful and ST is best for engineers with no prior exposure to PLC's.

Peter Nachtwey · Aug 11, 2012

RocketTester said:
You can then declare variables to be that structure type, and they can be arrays. Help says you can declare a structure within a structure (haven't tried). It also says structures can include procedures (subroutines), similar to classes (above my pay grade).

In TwinCAT, you define structures in the "Data Types" tab. Below is one of mine:
TYPE TraceHistory :
STRUCT
TseqTime : LREAL;
event : UINT;
CHjmp : USINT;
level : REAL;
END_STRUCT
END_TYPE

Yes, that is how you do it. It must be done in a TYPE..END_TYPE block, just as you have shown. STRUCT and END_STRUCT are not used within a VAR..END_VAR block. In the VAR..END_VAR block you use the type name, such as TraceHistory, like this:

VAR
MyHistory : TraceHistory;
END_VAR

You can then define variables to be of that structure type, and they can be arrays. I don't know if you can put a structure within a structure.

Yes, you can, but you must use TYPE..END_TYPE blocks to define the structures.

Variable declarations are the same whether using ST, LD, or any of the IEC61131-3 languages, though the syntax is like ST.

Yes, defining and declaring variables is most of the compiler.

Glad you are developing an ST compiler.

Actually, our current language is a subset of ST.

We use several of your Delta Motion controllers and several engineers have started writing code for them in Delta's current language. Having more standard PLC programming would be very helpful and ST is best for engineers with no prior exposure to PLCs.

Currently we support a limited subset of ST that is forced into steps, like SFC. Our initial goal was to make coding the RMC easy. We felt that simply showing a new user a blank screen would be too much. The new ST compiler is meant to be faster and is aimed at the advanced user who doesn't mind starting from a blank screen and is comfortable programming in Notepad++. Notepad++ is based on Scintilla, and so is the RMCTools editor.

The key thing we must always maintain is the ability to do what the PLCs cannot. We must be able to run code synchronously with our motion loop. That is why we count the microseconds per step. PLC scans are not deterministic.

seppoalanen · Aug 11, 2012

Peter Nachtwey said:
.. C#, Python, Pascal, VB? C# or other languages .....

C# is like "raped" Java.
Java works in Windows, Linux and Mac as well.
That's why I'm using Java with NetBeans; both are free.

james_plc · Sep 3, 2012

Hi,
Does anyone know who uses Beckhoff technology in Alberta Canada?

Thanks in advance!

Peter Nachtwey · Sep 3, 2012

I think Java is a little heavy handed, same with ST

seppoalanen said:
C# is like "raped" Java.
Java works in Windows, Linux and Mac as well.
That's why I'm using Java with NetBeans; both are free.

I like NetBeans too, but it is very wordy compared to C or even C#. I think of C# as a more pragmatic version of Java. We tried converting parts of our software from C to C#, but C# was too slow.

Update:

My ST compiler is coming along nicely. I have type checking now and it is as strict as the specification. I am wondering how to do the type conversions now. There are about 30 types, and covering every combination would require over 800 type conversion functions. I am limiting the type conversions so that there are about 50. I am not allowing any type conversion that does two things at once, such as changing the size and changing from unsigned to signed.

Sometimes a person will need to use two type conversions to get what they want, but I think this is OK because I have found a problem with converting a USINT to a DINT. One can get two results from a USINT_TO_DINT. Does USINT_TO_DINT zero extend the USINT to a UDINT first and then convert the UDINT to a DINT, or does it first convert the USINT to a SINT and then sign extend the SINT to a DINT? It makes a difference when the most significant bit of the USINT is set. My fix is to not have a USINT_TO_DINT. I will make the user choose which conversion he wants to do first, which will most likely be
UDINT_TO_DINT(USINT_TO_UDINT(X));
The alternative is
SINT_TO_DINT(USINT_TO_SINT(X));
This is very wordy and I am not a fan of wordy languages but it will keep people out of trouble and reduce the number of conversion functions I need to support.
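
To make the ambiguity concrete, here is a minimal Python sketch. It is only an illustration, not code from the compiler: the helper functions are hypothetical stand-ins named after the ST conversions, and it assumes that a same-size signed/unsigned conversion simply reinterprets the bit pattern as two's complement.

```python
def usint_to_udint(x: int) -> int:
    """Zero-extend an 8-bit unsigned value to 32 bits (value is unchanged)."""
    assert 0 <= x <= 0xFF
    return x

def udint_to_dint(x: int) -> int:
    """Reinterpret a 32-bit unsigned bit pattern as signed two's complement."""
    assert 0 <= x <= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x

def usint_to_sint(x: int) -> int:
    """Reinterpret an 8-bit unsigned bit pattern as signed two's complement."""
    assert 0 <= x <= 0xFF
    return x - 0x100 if x & 0x80 else x

def sint_to_dint(x: int) -> int:
    """Sign-extend an 8-bit signed value to 32 bits (value is unchanged)."""
    assert -128 <= x <= 127
    return x

x = 200  # 0xC8, most significant bit set
zero_extended = udint_to_dint(usint_to_udint(x))
sign_extended = sint_to_dint(usint_to_sint(x))
print(zero_extended, sign_extended)  # 200 -56
```

For values below 128 both paths agree, so the two orderings only diverge when the top bit of the USINT is set.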

The way I did type checking not only reports the error well but it also makes it easy for the code generator part to know whether a multiply should be signed or unsigned. The same goes for other operators.

I have nice error reporting. I report the exact row(s) and column(s) where an error occurred.

Peter Nachtwey · Sep 20, 2012

Type conversions are done. There are about 70 of them. It works well because when I get to an operator I know what data types are being used. For instance, one can't use a floating point add on integer data. CPUs usually have many different types of ADD: one for each size, and certainly separate ones for REALs and non-REALs.

Now to something new.

What do you do when you want to allocate a DWORD variable on top of a DINT variable?
In the IEC specification one must locate both the DWORD variable and the DINT variable on a direct variable like %MD4. What I don't like about this is that absolute addresses MUST be used.
There is no way to allocate the DWORD variable on the DINT variable directly.
One cannot do this:

Code:

VAR
    I : DINT;
    W AT I : DWORD;
END_VAR

instead one must do this:

Code:

VAR
    I AT %MD4 : DINT;
    W AT %MD4 : DWORD;
END_VAR

This isn't very useful. How do the other compilers get around this problem? It requires an extension.

I know this isn't part of the specification, but do any of the other ST compilers support UNIONs? Then I could write:

Code:

VAR
    U : UNION
        I : DINT;
        W : DWORD;
    END_UNION;
END_VAR

Then I could access U as a DINT, U.I, or as a DWORD, U.W. Then I don't care where U is located.
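
The overlay idea can be tried out in Python with the standard `ctypes.Union` class. This is just an analogy for the proposed ST syntax, not something any ST compiler does: `c_int32` and `c_uint32` are assumed stand-ins for DINT and DWORD, and the class name `U` simply mirrors the variable above.

```python
import ctypes

class U(ctypes.Union):
    # Both members share the same 4 bytes of storage.
    _fields_ = [("I", ctypes.c_int32),   # stands in for DINT
                ("W", ctypes.c_uint32)]  # stands in for DWORD

u = U()
u.I = -1
print(hex(u.W))          # 0xffffffff
print(ctypes.sizeof(U))  # 4
```

Writing through one member and reading through the other reinterprets the same bits, and no absolute address is involved anywhere.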

Here is another thing that drove me nuts for a while. This is not legal:

Code:

PROGRAM TEST
    VAR
         I AT %MW4 : DINT;
         J : DINT;
    END_VAR
END_PROGRAM

Directly located variables and ordinary variables can't be in the same VAR..END_VAR block. It must be done this way:

Code:

PROGRAM TEST
    VAR
         I AT %MW4 : DINT;
    END_VAR
    VAR
         J : DINT;
    END_VAR

END_PROGRAM

The directly located variables must not be combined in the normal variables' VAR..END_VAR block.

I know I have seen bit arrays located on DWORDs in some threads. I can't see where that is legal according to the IEC specification unless direct variables are used.

ndzied1 · Sep 20, 2012

CoDeSys Just does a UNION Anyway

At the bottom is a snippet from the CoDeSys 3.5 Help File. They just created "extensions" for things they wanted to include in their version of the language. I have never read the IEC spec, so I don't know if there is any official verbiage on extensions, but since CoDeSys is one of the largest IEC61131 language kernels, if not the largest, perhaps you could shoot for CoDeSys compliant instead of IEC61131 compliant? (I know that is not what you want, but just a thought.)

Also, they do not put a restriction on you to separate declarations with and without absolute addresses. Here is a declaration block from a working CoDeSys 2.3.9 program

Code:

VAR

    sTest:    STRING;
    bTest:    BYTE;

    I: INT;

    rString        AT %MW0:     STRING;
    aBytes         AT %MW0:    ARRAY [0..31] OF BYTE;

    MCS_STX        AT %MB0:    SINT;
    MCS_LEN        AT %MB1:    SINT;
    MCS_ADR        AT %MB2:    SINT;
    MCS_NUM        AT %MB3:    SINT;
    MCS_FEA        AT %MD4:    DINT;
    MCS_FEB        AT %MD8:    DINT;
    MCS_FEC        AT %MD12:    DINT;

    MCS_NFED       AT %MD16:    DINT;
    MCS_NFEE       AT %MD20:    DINT;
    MCS_NFEF       AT %MD24:    DINT;

    MCS_TEMP       AT %MW28:    INT;
    MCS_CRC        AT %MB30:    USINT;
    MCS_ETX        AT %MB31:    USINT;


(* These values are used to create the string *)
    NEW_FEA:    DINT := 7;
    NEW_FEB:    DINT := 3;
    NEW_FEC:    DINT := 17;
    NEW_NFED:    DINT := 10;
    NEW_NFEE:    DINT := 123;
    NEW_NFEF:    DINT := 1021;
    NEW_TEMP:    SINT := 26;
    NEW_LEN:    SINT := 28;
    NEW_ADR:    SINT := 42;
    NEW_NUM:    SINT := 3;

    xPB_MakeString:    BOOL;

    xPB_ParseString:    BOOL;

    PB_MakeString:    R_TRIG;

    CRC_TEMP:    USINT;

(* These values are set by the function Block *)
    fbNEW_FEA:    DINT;
    fbNEW_FEB:    DINT;
    fbNEW_FEC:    DINT;
    fbNEW_NFED:    DINT;
    fbNEW_NFEE:    DINT;
    fbNEW_NFEF:    DINT;
    fbNEW_TEMP:    INT;

    MyParseFB:ParseString;

END_VAR

In CoDeSys 2.3 there was no UNION and you had to use the method of declaring different types at the same address.

Peter Nachtwey · Sep 20, 2012

Norm, the code for setting up a string or message example is ugly.
Who generated the list of extensions to the standard? Are those NEW extensions from the IEC?

JesperMP · Sep 20, 2012

Just for information, in STEP7 SCL you can actually do this

Code:

VAR 
  I : DINT; 
  W AT I : DWORD;
END_VAR

I find it very odd if you really must tie an AT to an absolute address. That severely limits the usability of AT.
I use AT quite a bit in STEP7 SCL.

ndzied1 · Sep 20, 2012

Peter Nachtwey said:
Norm, the code for setting up a string or message example is ugly.

Yes

Peter Nachtwey said:
Who generated the list of extensions to the standard? Are those NEW extensions from the IEC?

The list is a straight copy from the CoDeSys Help file so I believe they did that on their own. You can download the software for free from their site. You just don't get any target hardware files to allow you to download to a device without getting the software from the device manufacturer.

TurpoUrpo · Sep 20, 2012

I think the cleanest option is just to extend what AT can do.

Peter Nachtwey · Sep 20, 2012

ndzied1 said:
Yes

The list is a straight copy from the CoDeSys Help file so I believe they did that on their own.

Good to know. I think the UNION is a good option and it is easy to implement.
References and BIT are also good but a low priority.
I don't want to implement pointers. Those are dangerous in the wrong hands and I can tell by how many problems people have with pointers on this forum it wouldn't be wise to implement something that causes more headaches and phone support calls.

BTW, here is an example of what I can do now.

Code:

PROGRAM TEST
	VAR
		Y,A,B,C,D,E,F,T: REAL;
		G: WORD;
		I,J : INT;
	END_VAR

	CASE I OF
	0: y:=a+b*t+c*t**2.0+d*t**3.0+e*t**4.0+f*t**5.0;
	1: y:=a+(b+(c+(d+(e+f*t)*t)*t)*t)*t;
	2: y:=a+t*(b+t*(c+t*(d+t*(e+t*f))));
	3: y:=((((f*t+e)*t+d)*t+c)*t+b)*t+a;
	ELSE
		Y:=0.0;
		J := 2**3;
	END_CASE;
END_PROGRAM

Here is the output. The output is a pseudo stack oriented assembly language similar to how we do it now. Eventually, I will have to rewrite the code generation part to output machine code instead of assembly language. I was testing how the compiler would handle various forms of nesting.

Code:

	LD	INT# I	; offset=34
	ST	TEMP
	LD	ANY_INT# 0
	LD	TEMP
	EQ
	JMPCN	L001
	LD	REAL# A	; offset=4
	LD	REAL# B	; offset=8
	LD	REAL# T	; offset=28
	MUL	; REAL
	ADD	; REAL
	LD	REAL# C	; offset=12
	LD	REAL# T	; offset=28
	LD	ANY_REAL# 2.000000
	EXPT	; REAL
	MUL	; REAL
	ADD	; REAL
	LD	REAL# D	; offset=16
	LD	REAL# T	; offset=28
	LD	ANY_REAL# 3.000000
	EXPT	; REAL
	MUL	; REAL
	ADD	; REAL
	LD	REAL# E	; offset=20
	LD	REAL# T	; offset=28
	LD	ANY_REAL# 4.000000
	EXPT	; REAL
	MUL	; REAL
	ADD	; REAL
	LD	REAL# F	; offset=24
	LD	REAL# T	; offset=28
	LD	ANY_REAL# 5.000000
	EXPT	; REAL
	MUL	; REAL
	ADD	; REAL
	ST	REAL# Y	; offset=0

	JMP	L000
L001:
	LD	ANY_INT# 1
	LD	TEMP
	EQ
	JMPCN	L002
	LD	REAL# A	; offset=4
	LD	REAL# B	; offset=8
	LD	REAL# C	; offset=12
	LD	REAL# D	; offset=16
	LD	REAL# E	; offset=20
	LD	REAL# F	; offset=24
	LD	REAL# T	; offset=28
	MUL	; REAL
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	ADD	; REAL
	ST	REAL# Y	; offset=0

	JMP	L000
L002:
	LD	ANY_INT# 2
	LD	TEMP
	EQ
	JMPCN	L003
	LD	REAL# A	; offset=4
	LD	REAL# T	; offset=28
	LD	REAL# B	; offset=8
	LD	REAL# T	; offset=28
	LD	REAL# C	; offset=12
	LD	REAL# T	; offset=28
	LD	REAL# D	; offset=16
	LD	REAL# T	; offset=28
	LD	REAL# E	; offset=20
	LD	REAL# T	; offset=28
	LD	REAL# F	; offset=24
	MUL	; REAL
	ADD	; REAL
	MUL	; REAL
	ADD	; REAL
	MUL	; REAL
	ADD	; REAL
	MUL	; REAL
	ADD	; REAL
	MUL	; REAL
	ADD	; REAL
	ST	REAL# Y	; offset=0

	JMP	L000
L003:
	LD	ANY_INT# 3
	LD	TEMP
	EQ
	JMPCN	L004
	LD	REAL# F	; offset=24
	LD	REAL# T	; offset=28
	MUL	; REAL
	LD	REAL# E	; offset=20
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	LD	REAL# D	; offset=16
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	LD	REAL# C	; offset=12
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	LD	REAL# B	; offset=8
	ADD	; REAL
	LD	REAL# T	; offset=28
	MUL	; REAL
	LD	REAL# A	; offset=4
	ADD	; REAL
	ST	REAL# Y	; offset=0

	JMP	L000
L004:
	LD	ANY_REAL# 0.000000
	ST	REAL# Y	; offset=0

	LD	ANY_INT# 2
	LD	ANY_INT# 3
	EXPT	; ANY_INT
	ST	INT# J	; offset=36

L000:

The next challenge is to implement inline functions like ABS, SHR etc. These functions are too short to make a subroutine call. It is best, fastest, if inline code is generated.

Notice that the exponents must match the type of the base number. This is wasteful. 2^3 usually requires a call to a subroutine, but I plan to check whether the exponent is an integer and, if so, use simple multiplies.
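
One way such a check could look is sketched below in Python. It is not taken from the compiler: `expand_power` and its `max_mults` limit are hypothetical, and the emitted mnemonics are a simplified form of the listing above (no type tags or offsets).

```python
def expand_power(base: str, exponent: float, max_mults: int = 8):
    """Return stack code for base**exponent using only MUL, or None.

    None means "fall back to the general EXPT routine".
    """
    if not float(exponent).is_integer():
        return None                 # fractional exponent: keep EXPT
    n = int(exponent)
    if n < 1 or n - 1 > max_mults:
        return None                 # zero, negative or too large: keep EXPT
    code = [f"LD {base}"]
    for _ in range(n - 1):
        code += [f"LD {base}", "MUL"]   # multiply the running product by base
    return code

print(expand_power("T", 3.0))  # ['LD T', 'LD T', 'MUL', 'LD T', 'MUL']
print(expand_power("T", 2.5))  # None
```

This simple version uses n - 1 multiplies; repeated squaring would need fewer for large exponents, at the cost of a temporary.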

Thomas_v2 · Sep 20, 2012

Peter Nachtwey said:
I don't want to implement pointers. Those are dangerous in the wrong hands and I can tell by how many problems people have with pointers on this forum it wouldn't be wise to implement something that causes more headaches and phone support calls.

The dangerous thing about pointers is mostly pointer arithmetic. The Siemens SCL AT view is a kind of safe pointer, because you can't do pointer arithmetic with it, and it checks that the AT view and the variable it 'points' to have the same size.
I think you can also achieve this with the POINTER data type as it is used in CoDeSys. If you don't allow pointer arithmetic and check that the data sizes are equal, you get a kind of 'safe pointer' mechanism.

btw:
Do you use any kind of intermediate language in the compile process?

My 'private' SCL compiler builds an abstract syntax tree (AST) as its intermediate language. All type conversions and code optimizations are done on this tree. All nodes also hold their position (line, column) in the source code so that informative error messages can be generated.

As an example, SCL code like this:

Code:

FUNCTION testfunc : VOID

VAR_TEMP
        Y,A,B,C,D,E,F,T: REAL;
        I,J : INT;
END_VAR

BEGIN
i := 123;
a := 5.6;
b := 21;
y:= ( a + i ) * b;

END_FUNCTION

will generate a (simplified) AST like this:

Code:

+--POU name: testfunc
+--VarTemp adr: 0 name: Y typ: REAL
+--VarTemp adr: 4 name: A typ: REAL
+--VarTemp adr: 8 name: B typ: REAL
+--VarTemp adr: 12 name: C typ: REAL
+--VarTemp adr: 16 name: D typ: REAL
+--VarTemp adr: 20 name: E typ: REAL
+--VarTemp adr: 24 name: F typ: REAL
+--VarTemp adr: 28 name: T typ: REAL
+--VarTemp adr: 32 name: I typ: INT
+--VarTemp adr: 34 name: J typ: INT
+--Block
+--(
+--Assignment
|  +--left=
|  |  +--Ident
|  |  |  +--VarTemp adr: 32 name: I typ: INT
|  +--expr= TypeOut: INT
|  |  +--ValCon Wert: 123 Typ: INT
+--Assignment
|  +--left=
|  |  +--Ident
|  |  |  +--VarTemp adr: 4 name: A typ: REAL
|  +--expr= TypeOut: REAL
|  |  +--ValCon Wert: 5.6 Typ: REAL
+--Assignment
|  +--left=
|  |  +--Ident
|  |  |  +--VarTemp adr: 8 name: B typ: REAL
|  +--expr= TypeOut: REAL
|  |  +--TypeConvExpr TypeOut: REAL
|  |  |  |  +--ValCon Wert: 21 Typ: INT
+--Assignment
|  +--left=
|  |  +--Ident
|  |  |  +--VarTemp adr: 0 name: Y typ: REAL
|  +--expr= TypeOut: REAL
|  |  +--BinExpr TypeOut: REAL
|  |  |  +--left = TypeOut: REAL
|  |  |  |  +--BinExpr TypeOut: REAL
|  |  |  |  |  +--left = TypeOut: REAL
|  |  |  |  |  |  +--Ident
|  |  |  |  |  |  |  +--VarTemp adr: 4 name: A typ: REAL
|  |  |  |  |  +--op   =ADD
|  |  |  |  |  +--right= TypeOut: REAL
|  |  |  |  |  |  +--TypeConvExpr TypeOut: REAL
|  |  |  |  |  |  |  |  +--Ident
|  |  |  |  |  |  |  |  |  +--VarTemp adr: 32 name: I typ: INT
|  |  |  +--op   =MUL
|  |  |  +--right= TypeOut: REAL
|  |  |  |  +--Ident
|  |  |  |  |  +--VarTemp adr: 8 name: B typ: REAL
+--)

Implicit type conversions are inserted into the AST during parsing.
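
A rough Python sketch of how such conversion nodes could be inserted is shown below. It is not the SCL compiler's code: the tuple node layout, the `RANK` promotion order and the helper names are all assumptions made for illustration, and it runs as a separate pass over a finished tree rather than during parsing.

```python
RANK = {"INT": 0, "DINT": 1, "REAL": 2}  # assumed promotion order

def type_of(node):
    kind = node[0]
    if kind in ("var", "const"):
        return node[2]
    if kind == "conv":
        return node[1]
    return node[4]            # annotated "bin" nodes carry their result type last

def convert(node, target):
    """Wrap node in a conversion node unless it already has the target type."""
    return node if type_of(node) == target else ("conv", target, node)

def annotate(node):
    """Return a copy of the tree with conversions inserted and types filled in."""
    if node[0] != "bin":
        return node
    _, op, left, right = node[:4]
    left, right = annotate(left), annotate(right)
    result = max(type_of(left), type_of(right), key=RANK.get)
    return ("bin", op, convert(left, result), convert(right, result), result)

# y := (a + i) * b  with a, b : REAL and i : INT
tree = ("bin", "MUL",
        ("bin", "ADD", ("var", "A", "REAL"), ("var", "I", "INT")),
        ("var", "B", "REAL"))
typed = annotate(tree)
print(typed)
```

As in the tree above, only the INT operand `I` ends up wrapped in a conversion to REAL; the REAL operands are left untouched.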

With this intermediate code it's relatively easy to change to another target CPU architecture.

When the target architecture is a stack machine, code generation walks from the root node down to the leaf nodes.
When the target architecture is an accumulator machine (like an S7), I go from the leaf nodes up to the root node (and insert temporary variables when necessary).

Peter Nachtwey · Sep 20, 2012

Thomas, I am using Flex and Bison

Are you a compiler writer? The AST code you show is something that would only make sense to a compiler writer. I am a controls engineer who knew little about compiler writing, but I have always wanted to write a compiler. Now I have my chance.

I have something that looks like your AST code. I call it a parse tree. I haven't thought about the parse tree as intermediate code. I have a subroutine that is recursive that walks the parse tree to generate code. Inside this subroutine is the equivalent of a switch or case statement. There is one case for each type of node in the parse tree. I am sure you are doing something similar. I intend to change the code in each of these cases eventually. I would like to start with an interpreter and when I have all the higher level functions debugged I will then change the code generator to output machine code.
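
The shape of such a recursive walker can be sketched in a few lines of Python. This is an illustration only, not the actual Flex/Bison-based code: the tuple node layout and the function name `gen` are hypothetical, and the output uses simplified mnemonics in the style of the earlier listing.

```python
def gen(node, out):
    """Post-order walk: emit code for the operands first, then the operator."""
    kind = node[0]
    if kind == "var":                 # ("var", name)
        out.append(f"LD {node[1]}")
    elif kind == "const":             # ("const", value)
        out.append(f"LD {node[1]}")
    elif kind == "bin":               # ("bin", op, left, right)
        gen(node[2], out)
        gen(node[3], out)
        out.append(node[1])
    elif kind == "assign":            # ("assign", name, expr)
        gen(node[2], out)
        out.append(f"ST {node[1]}")
    else:
        raise ValueError(f"unknown node type: {kind}")
    return out

# y := (f*t + e)*t + d
tree = ("assign", "Y",
        ("bin", "ADD",
         ("bin", "MUL",
          ("bin", "ADD", ("bin", "MUL", ("var", "F"), ("var", "T")), ("var", "E")),
          ("var", "T")),
         ("var", "D")))
code = gen(tree, [])
print("\n".join(code))
```

Swapping the body of each branch is all it takes to retarget the walker, for example from emitting text to interpreting directly or emitting machine code.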

I have another similar recursive subroutine for TYPE..END_TYPE and another for the various VAR..END_VAR symbol declarations.

Yes, it is relatively easy to change the code generator part if one is not optimizing too much. Right now I am not worrying about optimizing, but eventually I will make a stack frame and have a simple method of keeping the most recently used variables in registers. This will avoid having to load T from memory over and over again as in the example above. However, the CPU I am using has a cache, so there really shouldn't be much of a problem anyway.

I could do implicit type conversions, but I am not doing them because they aren't part of the IEC specification. However, I really hate having to write all those INT_TO_DINT and DINT_TO_REAL conversions because they make the code look messy and more complicated than it needs to be. I may say #$%^ it and do the type conversions. Most type conversions don't do anything anyway, except the conversions between REALs and other formats.

I have never written a compiler before or taken classes on how to do it. I am learning on the fly and having fun, and the compiler seems to work. So far the compiler is very clean and I am using the exact names used in the specification with little deviation. The person who inherits this compiler will find it easy to support.

I am looking forward to the code generation and optimization part.

Thomas_v2 · Sep 21, 2012

Peter Nachtwey said:
Are you a compiler writer? The AST code you show is something that would only make sense to a compiler writer. I am a controls engineer who knew little about compiler writing, but I have always wanted to write a compiler. Now I have my chance.

No, and yes ;-)
I wrote a documentation tool for ST and reused some parts of it to write a simple compiler, just for fun. (I had a bike accident, which gave me some time to read books and websites about this. I think the standard book on compiler design is the so-called dragon book.)
It isn't a complete ST implementation. But once you have a good basic structure for the parser/compiler/code generator, the rest is largely busywork (e.g. the 70 explicit type conversions you mentioned).

Peter Nachtwey said:
I have something that looks like your AST code. I call it a parse tree. I haven't thought about the parse tree as intermediate code. I have a subroutine that is recursive that walks the parse tree to generate code. Inside this subroutine is the equivalent of a switch or case statement. There is one case for each type of node in the parse tree. I am sure you are doing something similar.

My parser builds the parse tree. Then I've got different visitors which walk through the parse tree and do things like code optimization (constant folding, for example) and code generation.
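
As an illustration of what a constant-folding pass does, here is a minimal Python sketch. It is not the visitor code from the SCL compiler: the tuple node layout, the `OPS` table and the function name `fold` are assumptions, and a plain recursive function stands in for a visitor class.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if node[0] != "bin":
        return node                       # variables and constants are leaves
    _, op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "const" and right[0] == "const" and op in OPS:
        return ("const", OPS[op](left[1], right[1]))
    return ("bin", op, left, right)

# y := a * (2 + 3)
tree = ("bin", "MUL", ("var", "A"), ("bin", "ADD", ("const", 2), ("const", 3)))
print(fold(tree))  # ('bin', 'MUL', ('var', 'A'), ('const', 5))
```

Because the pass returns a tree of the same form, it can run before code generation without the generator knowing it happened.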

To do really good code optimization, the compiler always needs to know the contents of the working registers to avoid unnecessary loading of values. But I think this can't be done on the parse tree; it needs another intermediate code. As far as I know, GCC uses three levels of internal intermediate code...
