Sphinx C-- documentation

Updated September 14, 2002. Under construction.

Partially based on a document by Peter Cellik, May 1995.
I used Peter's document as a starting point, but ended up totally rewriting it. There are several paragraphs remaining that can be traced back to Peter's document, and due acknowledgement is given here.

I am using C-- to develop applications for the MenuetOS operating system, so the main thrust of this page is 32-bit FLAT memory coding. However, the compiler itself runs at the DOS commandline, and can produce output for DOS 16- or 32-bit or for Windows 32-bit apps or DLLs. The official Sphinx C-- project page has example code, libraries, etc: http://www.sheker.chat.ru/index_e.htm. My own C-- introductory page is: http://www.goosee.com/cmm/. My MenuetOS page is: http://www.goosee.com/menuetos/.

Note that Michael Sheker's C-- "readme" file has been translated to English, and is included with the full package downloadable from his site. It is in places difficult to read, due to the fact that it is a translation, and also it is basically a historical record of changes introduced with each version. It is useful as a reference in conjunction with this document. I am sifting through it and have extracted the most important information to this page.

Michael has provided me with assistance. He has supplied some more translated files, and I have incorporated most of them into this document by rewriting them in what seems to me to be readable English. Due acknowledgement of that source is also given here.

This document is based on my very limited experience with C--, and input is welcome.
Please use Mozilla Composer if you would like to contribute to updating this page! (see www.goosee.com/best/).
You might like to add text in some other color, such as red, prior to me vetting it and making it black.
-- email any updates to me. Find my email address at: http://www.goosee.com/bkauler/.


Contents

Introduction
Identifiers and symbols
Constants
Data types
Expressions
Declaring functions and macros
Conditional statements
Arrays
Structures
Pointers
Other syntax
Internal macros
Inline assembly
Directives and commandline options
Appendices
    How to install C--
    Further documentation


Introduction

C-- is kind of halfway between C and assembly language. It is designed to offer the benefits of a high-level language, with the efficiency and flexibility of low-level coding. C-- allows you to slide up and down as you wish between these two levels.


Identifiers and symbols

Identifier naming

C-- identifiers must start with either an underscore (_) or an upper or lower case letter. They may then be followed by any combination of underscores, upper or lower case letters, or numerical digits (0 to 9). The total length of an identifier may not exceed 64 characters.

Some examples of valid C-- identifiers are:

_DOG
CoW
loony12
HowdYBoys_AND_Girls
WOW___
x

Some examples of invalid C-- identifiers are:

12bogus      /* cannot start an identifier with a numerical digit */
y_es sir     /* spaces not allowed */
the-end      /* hyphens not allowed */
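
The naming rules above can be checked mechanically. The following is a minimal sketch in standard C (not C--), written only to illustrate the rules; the name cmm_is_identifier is hypothetical and is not part of the compiler. It does not check against the reserved words listed further down.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Maximum identifier length stated in the text above. */
#define CMM_MAX_IDENT 64

/* Hypothetical helper: returns 1 if s obeys the naming rules, 0 otherwise. */
int cmm_is_identifier(const char *s)
{
    size_t len, i;

    assert(s != NULL);
    len = strlen(s);
    if (len == 0 || len > CMM_MAX_IDENT)
        return 0;
    /* First character: underscore or letter only. */
    if (s[0] != '_' && !isalpha((unsigned char)s[0]))
        return 0;
    /* Remaining characters: underscore, letter or digit. */
    for (i = 1; i < len; i++) {
        if (s[i] != '_' && !isalnum((unsigned char)s[i]))
            return 0;
    }
    return 1;
}
```

With this sketch, "_DOG" and "loony12" are accepted, while "12bogus", "y_es sir" and "the-end" are rejected.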

Symbols

SYMBOL | FUNCTION                 | EXAMPLE
------------------------------------------------------------------------------
/*     | start comment block      | /* comment */
*/     | end comment block        | /* comment */
       |                          |
//     | comment to end of line   | // comment
       |                          |
=      | assignment               | AX = 12;
+      | addition                 | AX = BX + 12;
-      | subtraction              | house = dog - church;
*      | multiplication           | x = y * z;
/      | division                 | x1 = dog / legs;
&      | bitwise AND              | polution = stupid & pointless;
|      | bitwise inclusive OR     | yes = i | mabe;
^      | bitwise exclusive OR     | snap = got ^ power;
<<     | bit shift left           | x = y << z;
>>     | bit shift right          | x = y >> z;
       |                          |
+=     | addition                 | fox += 12;    // fox = fox + 12;
-=     | subtraction              | cow -= BX;    // cow = cow - BX;
*=     | multiplication           | cow *= dog;   // cow = cow * dog;
/=     | division                 | cow /= dog;   // cow = cow / dog;
&=     | bitwise AND              | p &= q;       // p = p & q;
|=     | bitwise inclusive OR     | p |= z;       // p = p | z;
^=     | bitwise exclusive OR     | u ^= s;       // u = u ^ s;
<<=    | bit shift left           | x <<= z;      // x = x << z
>>=    | bit shift right          | x >>= z;      // x = x >> z
       |                          |
><     | swap                     | x >< y;       /* exchange values of x and y */
       |                          |
==     | equal to                 | IF(AX == 12)
>      | greater than             | IF(junk > BOGUS)
<      | less than                | if( x < y )
>=     | greater than or equal to | if(AX >= 12)
<=     | less than or equal to    | IF(BL <= CH)
!=     | not equal to             | IF(girl != boy)
<>     | different than           | IF(cat <> dog)  /* same function as != */
       |                          |
@      | insert code              | @ COLDBOOT();   /* insert COLDBOOT code */
:      | dynamic procedure        | : functionname ()   // declare functionname
$      | assembly operation       | $ PUSH AX       /* push AX onto stack */
#      | offset address of        | loc = #cow;     /* loc = address of cow */
#      | compiler directive       | #define cow dog;
!      | NOT operator             | !x_var;  if(!xflag)
...    | any number of parameters | void proc(...);
::     | allowing of visibility   | ::var=0;
       |                          |
~      |                          | This symbol is currently unused.

Reserved words

The following C-- reserved identifiers cannot be used as general identifiers, because they are already defined or reserved by the compiler (the list is for C-- v0.238).
This list can be obtained from the C-- compiler at any time by running it with the /WORDS command line option:

BREAK CARRYFLAG CASE CONTINUE ELSE
EXTRACT FALSE FOR FROM GOTO
IF LOOPNZ MINUSFLAG NOTCARRYFLAG NOTOVERFLOW
NOTZEROFLAG OVERFLOW PLUSFLAG RETURN SWITCH
TRUE WHILE ZEROFLAG
__CODEPTR__ __COMPILER__ __DATAPTR__ __DATESTR__ __DATE__
__DAY__ __FILE__ __HOUR__ __LINE__ __MINUTE__
__MONTH__ __POSTPTR__ __SECOND__ __TIME__ __VER1__
__VER2__ __WEEKDAY__ __YEAR__
_export asm break byte case
cdecl char continue default do
dword else enum extern far
fastcall float for goto if
inline int interrupt long loop
loopnz pascal return short signed
sizeof static stdcall struct switch
union unsigned void while word
ESCHAR ESBYTE ESINT ESWORD ESLONG
ESDWORD ESFLOAT
CSCHAR CSBYTE CSINT CSWORD CSLONG
CSDWORD CSFLOAT
SSCHAR SSBYTE SSINT SSWORD SSLONG
SSDWORD SSFLOAT
DSCHAR DSBYTE DSINT DSWORD DSLONG
DSDWORD DSFLOAT
FSCHAR FSBYTE FSINT FSWORD FSLONG
FSDWORD FSFLOAT
GSCHAR GSBYTE GSINT GSWORD GSLONG
GSDWORD GSFLOAT
AX CX DX BX SP BP SI DI
EAX ECX EDX EBX ESP EBP ESI EDI
AL CL DL BL AH CH DH BH
ES CS SS DS FS GS ST(0) ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7)
ST
st(0) st(1) st(2) st(3) st(4) st(5) st(6) st(7)
st

If you compile with the "/ia" option on the command line (or "#pragma option ia" in the source code), which allows assembly instructions to be used without the "asm" or "$" keywords, then the names of all assembly instructions become reserved words.

Note, register names have to be upper-case when explicitly used in high-level C-- statements. However, for inline asm code, upper or lower-case may be used. This means that all of the following are also reserved identifiers within assembly language instructions:

ax   cx   dx   bx   sp   bp   si   di
eax  ecx  edx  ebx  esp  ebp  esi  edi
al   cl   dl   bl   ah   ch   dh   bh
es   cs   ss   ds   fs   gs
DR0   DR1   DR2   DR3   DR4   DR5   DR6   DR7
CR0   CR1   CR2   CR3   CR4   CR5   CR6   CR7
TR0   TR1   TR2   TR3   TR4   TR5   TR6   TR7
MM0   MM1   MM2   MM3   MM4   MM5   MM6   MM7
XMM0  XMM1  XMM2  XMM3  XMM4  XMM5  XMM6  XMM7
dr0   dr1   dr2   dr3   dr4   dr5   dr6   dr7
cr0   cr1   cr2   cr3   cr4   cr5   cr6   cr7
tr0   tr1   tr2   tr3   tr4   tr5   tr6   tr7
mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7
xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7

Automatic registers

When writing library procedures it may seem necessary to provide separate variants of a procedure for 16-bit and 32-bit modes, which differ from each other only in whether they use the 16-bit or the 32-bit registers. Instead, it is possible to write just one procedure, using the following syntax for the registers:
  (E)AX = 0;
The compiler will use AX when compiling 16-bit code, or EAX when compiling 32-bit code.
Use of the automatic registers simplifies library files and makes their meaning clearer.

Predetermined identifiers

The compiler defines the following identifiers, depending on the mode of compilation:
__TLS__     compiling for Windows (w32, w32c, dll).
__DLL__     compiling a dll.
__CONSOLE__ compiling a Windows console application.
__WIN32__   compiling a GUI application.
__FLAT__    compiling 32-bit code.
__MSDOS__   compiling 16-bit code.
__TINY__    the tiny memory model is used in 16-bit mode.
__SMALL__   the small memory model is used in 16-bit mode.
__DOS32__   compiling 32-bit code for DOS (d32).
__COM__     compiling a com-file.
__SYS__     compiling a sys-file.
__ROM__     compiling a rom-file.
__OBJ__     compiling an obj-file.
__TEXE__    compiling an exe-file of the tiny model.
__EXE__     compiling an exe-file of the small model.
codesize    compiling with optimization for code size.
speed       compiling with optimization for code speed.
cpu         the type of processor being compiled for:
     0 - 8086
     1 - 80186
     2 - 80286
     3 - 80386
     4 - 80486
     5 - Pentium
     6 - Pentium MMX
     7 - Pentium II
These identifiers can be tested with the directives "#ifdef" or "#ifndef". The identifier "cpu" can also be used with comparison operators, as shown:
#ifdef cpu > 3   //if the processor type is above 80386

Constants

Numerical Constants

Numerical constants in decimal (base 10) or hexadecimal (base 16) are written the same way as in C. To express a numerical constant in binary (base 2) notation, precede the sequence of 1's and 0's with 0b, with no spaces in between. To express a numerical constant in octal (base 8) notation, precede the sequence of octal digits (0 to 7) with 0o, with no spaces.

Some examples:

    0b11111111     // binary.      same as 255 decimal.
    0x00F          // hexadecimal. same as 15 decimal.
    0o10           // octal.       same as 8 decimal.

This notation may be used both in normal high-level expressions and in inline assembly instructions. Also, for those familiar with traditional assemblers such as MASM, hexadecimal numbers may be written with a trailing "h" or "H", for example:

    000Fh
    127AH

Note that it is possible to define a constant as being of a particular datatype by means of the "L", "U" and "F" postfixes; however, the compiler currently ignores this information. Upper or lowercase letters may be used. Some examples:

#define DEF1 1023L    //"long" (signed 32-bit integer) value.
#define DEF2 2561Lu   //"unsigned long" (unsigned 32-bit integer).
#define DEF3 3.02F    //"float" (32-bit floating point).
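
To make the integer notations above concrete, here is a small sketch in standard C (not C--) that converts one constant, given as text, to its value. The name cmm_parse_constant and its helpers are hypothetical. The sketch assumes that plain digits are decimal, it does not handle the "L", "U" and "F" postfixes or floating point, and it does not check for overflow; the real compiler may differ in such details.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one digit character, or -1 if it is not a digit in any base up to 16. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Accumulates n digits of s in the given base. Returns 1 on success, 0 on error. */
static int parse_digits(const char *s, size_t n, unsigned base, unsigned long *out)
{
    unsigned long value = 0;
    size_t i;

    if (n == 0) return 0;
    for (i = 0; i < n; i++) {
        int d = digit_value(s[i]);
        if (d < 0 || (unsigned)d >= base) return 0;
        value = value * base + (unsigned long)d;
    }
    *out = value;
    return 1;
}

/* Hypothetical helper: parses 0b..., 0o..., 0x..., ...h and plain decimal. */
int cmm_parse_constant(const char *s, unsigned long *out)
{
    size_t len;

    assert(s != NULL && out != NULL);
    len = strlen(s);
    /* MASM style: hex digits followed by h or H, e.g. 127AH. */
    if (len >= 2 && (s[len - 1] == 'h' || s[len - 1] == 'H'))
        return parse_digits(s, len - 1, 16, out);
    if (len > 2 && s[0] == '0') {
        if (s[1] == 'x' || s[1] == 'X') return parse_digits(s + 2, len - 2, 16, out);
        if (s[1] == 'b') return parse_digits(s + 2, len - 2, 2, out);
        if (s[1] == 'o') return parse_digits(s + 2, len - 2, 8, out);
    }
    return parse_digits(s, len, 10, out);
}
```

The trailing "h" form is tested first because a constant such as 0BEh starts with the characters "0B" yet is hexadecimal, not binary.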

Character Constants

Single character constants are, as in C, enclosed in single quotes ('). Also as in C, special characters are expressed by a backslash (\) followed by the key letter or letters. The special characters supported are:

    '\a'    /* beep (same as in C) */
    '\b'    /* backspace (same as C) */
    '\f'    /* form feed (same as C) */
    '\l'    /* line feed */
    '\n'    /* carriage return (in C called "newline") */
    '\r'    /* carriage return (same as C) */
    '\t'    /* horizontal tab */
    '\x??'  /* ASCII character formed from the ?? which would be two
               hexadecimal digits for the character value */
    '\???'  /* ASCII character formed from the ??? which would be three
               decimal digits for the character value */

Any other character following a backslash is simply accepted as itself. This allows the single quote to be written as '\'', since '' is the NULL character. Note that NULL is the numeric value zero.

Multiple character constants are also supported by C--. Some examples of multiple character constants are:

    'ab'
    'the'
    'this is large'

There is no limit to the number of characters in a character constant, but only the last 4 characters are significant, as this is the maximum that can be stored in a 32-bit variable. For example, 'this is large' would be equivalent to 'arge'.
C-- treats all character constants as a numeric value of the ASCII value of the character. For multiple character constants, the first character is the most significant, thus the value for 'ab' is 'a'*256+'b'.
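
The value of a multiple character constant can be modelled as below. This is a sketch in standard C (not C--), and the name cmm_char_constant is hypothetical. It takes the characters between the quotes, with any escapes already resolved, and shifts them into a 32-bit value one byte at a time, so that anything before the last four characters falls off the top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: numeric value of the character constant whose
   text (without the surrounding quotes) is in chars. */
uint32_t cmm_char_constant(const char *chars)
{
    uint32_t value = 0;
    size_t i, len;

    assert(chars != NULL);
    len = strlen(chars);
    for (i = 0; i < len; i++) {
        /* The first character ends up most significant; bits shifted
           beyond 32 are lost, so only the last four characters survive. */
        value = (uint32_t)(value << 8) | (uint8_t)chars[i];
    }
    return value;
}
```

For example, cmm_char_constant("ab") gives 'a'*256+'b', and "this is large" gives the same value as "arge".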

String Constants

String constants, as in C, are enclosed in double quotes ("). Special characters are expressed within strings in the same way as in character constants, with the exception of \n, which inserts both a carriage return and a line feed.

The current maximum length of string constants is 2048 characters including the 0 terminator (NULL), thus a maximum of 2047 characters.

Constant Expressions

A constant expression is a single numerical constant, or a list of numerical constants linked together by operators, that is evaluated at compile time to a single constant value.
Like all expressions in C--, constant expressions are always evaluated from left to right, regardless of the operators involved! This is quite different from most other languages, and care must be taken to remember that 2 + 3 * 2 = 10 and not 8. In other words, there is no such thing as "operator precedence".

Some examples of constant expressions are:

45 & 1 + 3          // equals 4
14 - 1 / 2          // equals 6
1 * 2 * 3 / 2 + 4   // equals 7

The above examples are integers, which is why "14-1/2" gives an answer of "6".
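
Strict left-to-right evaluation of integer expressions is easy to model. The sketch below is standard C (not C--), with hypothetical names; it folds a list of operands and single-character operators in the order written, which is how the results quoted above come about. Shift operators are left out to keep the operator table to one character each.

```c
#include <assert.h>
#include <stddef.h>

/* Applies one binary operator. Division assumes a non-zero right operand. */
static long apply_op(long left, char op, long right)
{
    switch (op) {
    case '+': return left + right;
    case '-': return left - right;
    case '*': return left * right;
    case '/': return left / right;
    case '&': return left & right;
    case '|': return left | right;
    case '^': return left ^ right;
    default:  assert(!"unsupported operator"); return 0;
    }
}

/* Hypothetical helper: evaluates operands[0] ops[0] operands[1] ops[1] ...
   strictly from left to right. ops must hold count-1 operator characters. */
long eval_left_to_right(const long *operands, const char *ops, size_t count)
{
    long acc;
    size_t i;

    assert(operands != NULL && ops != NULL && count > 0);
    acc = operands[0];
    for (i = 1; i < count; i++)
        acc = apply_op(acc, ops[i - 1], operands[i]);
    return acc;
}
```

Feeding it the operands 2, 3, 2 with the operators "+*" gives 10, where a C compiler applying operator precedence to 2 + 3 * 2 would give 8.
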
However, C-- supports the "float" datatype. The compiler treats a constant as floating point if it has a fractional component. For example:

float y;      //define 32-bit floating-point variable.
y=80+0.375;

This will load the 32-bit value "0x42A0C000" into "y", which is the floating point representation of the value "80.375".
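
The bit pattern quoted above can be checked independently. In binary, 80.375 is 1.010000011 x 2^6, so the sign bit is 0, the exponent field is 6+127 = 133 (10000101 binary) and the fraction field starts 010000011, which assembles to 0x42A0C000. The following sketch in standard C (not C--) reads the raw bits of a float; it assumes the machine uses IEEE 754 single precision, as x86 does, and the name float_bits is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw 32-bit pattern of a float. */
uint32_t float_bits(float value)
{
    uint32_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);   /* copy the bytes, no numeric conversion */
    return bits;
}
```

Calling float_bits(80.0f + 0.375f) returns 0x42A0C000 on such a machine.
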
These are acceptable formats for float constants:
   0.98
   -15.75
   3.14e2
   1.234567E-20
"E" means "exponent", so the second-last example is actually 3.14*(10^2). The compiler will automatically recognise any of these as floating point numbers.


Data types

Types of variables

There are eight memory variable types in C--: byte, word, dword, char, short, int, long, and float. The following table shows the size and range of the variable types:

NAME   | SIZE    | VALUE RANGE               | VALUE RANGE
       | (bytes) | (decimal)                 | (hex)
----------------------------------------------------------------------------
byte   | 1       | 0 to 255                  | 0x00 to 0xFF
word   | 2       | 0 to 65535                | 0x0000 to 0xFFFF
dword  | 4       | 0 to 4294967295           | 0x00000000 to 0xFFFFFFFF
char   | 1       | -128 to 127               | 0x80 to 0x7F
short  | 2       | -32768 to 32767           | 0x8000 to 0x7FFF
long   | 4       | -2147483648 to 2147483647 | 0x80000000 to 0x7FFFFFFF
float  | 4       | +/-1.17E-38 to +/-3.40E38 |

Equivalents:

byte   | unsigned char
int    | short or long    <<<CAREFUL HERE!
word   | unsigned short
dword  | unsigned long

Note that by default the compiler aligns 16-bit variables on even addresses. In the example below, "vchar" is padded by an extra byte so that "vshort" can be on an even address. 32-bit variables align on address multiples of 4, so in the example "vshort" is padded with an extra 16 bits because "vint" (32-bit) follows it. This results in less compact code.

cpuspeed.c-- 22: char vchar=0;
00000036 0000 db 0,0
cpuspeed.c-- 23: short vshort=0;
00000038 00000000 dw 0,0
cpuspeed.c-- 24: int vint=0;
0000003C 00000000 dd 0
cpuspeed.c-- 25: long vlong=0;
00000040 00000000 dd 0
cpuspeed.c-- 26: word vword=0;
00000044 0000 dw 0
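
The addresses in the listing above follow from rounding each variable's address up to a multiple of its alignment. A minimal sketch in standard C (not C--), with the hypothetical name align_up:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of align.
   align must be a power of two (2 for 16-bit, 4 for 32-bit variables). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

In the listing, "vchar" occupies address 36h, so the next free address is 37h; align_up(0x37, 2) gives 38h, where "vshort" is placed. After "vshort" the next free address is 3Ah, and align_up(0x3A, 4) gives 3Ch, the address of "vint".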

Warning about 'int'

Michael has decided to follow standard C practice, in which the size of "int" is whatever the default register size is. For example, for DOS 16-bit applications, "int" will be 16 bits. For 32-bit coding, "int" will be 32-bits. "int" is a signed integer number, and for 32-bit coding it will have the same meaning as "long".

If you don't like this chameleon-like nature of "int", then don't use it. You can use "short" for 16-bit signed and "long" for 32-bit signed, and of course "word" and "dword" for unsigned 16-bit and 32-bit.
In fact, there is another reason for avoiding use of "int" -- it is also an instruction, and if you use the "/ia" commandline option, all assembly language instructions become C-- keywords. Hence "int" is now a keyword meaning the interrupt instruction and must not be used as a datatype declaration.

Declaring variables

The syntax for declaring variables is as follows:

variable-type identifier;

Where 'variable-type' is any one of char, byte, short, int, word, long, dword or float. Several identifiers may be declared of the same type as follows:

variable-type identifier1, identifier2, ... , identifierN;

One dimensional arrays may be declared as follows:

variable-type identifier[elements];

Where 'elements' is a constant expression giving the number of entries of that variable type in the array.

Initialized arrays can be declared without stating the number of elements. The array is then created with as many elements as there are initializers:
variable-type identifier [] = {const1, const2};
These syntaxes are allowed for string arrays:
char msg1[]="this is a string";
char msg1="this is a string";

Some examples of global declarations:

byte  i,j;       /* declare i and j to be of type byte */
word  ss[10];    /* declare ss to be an array of 10 words */
short h,x[27];   /* declare h to be of type short and declare x to
                    be an array of 27 shorts */
long  zz = 0;    /* declare zz to be of type long and
                    assign it the value 0 */

Compilation and scope of variables

Global variables

The scope of a global variable is everything below the point where it is declared. That is, it is visible to the entire program below it. You cannot reference a variable that is declared below the code that references it -- this will give a compiler error, as the compiler does not know about any variables declared after they are accessed.

Note however, that the compiler does allow forward referencing of code labels.

Apart from scope, it is useful to know just where in the output runtime executable file the variable is compiled to. This is irrelevant from the programming point of view, but may have some relevance if the structure of the executable file needs to be understood.
Initialised global variables compile (instantiate) where they are declared, whereas uninitialised global variables are compiled at the end of the file. This example illustrates:
char smsg1="CPU speed:";    //compiles right here. Note, square brackets not needed.
dword cpuspeed;             //compiles at end of file.

void draw_window(void)
{
  sys_window_redraw(1);
  sys_draw_window(100<<16+300,100<<16+200,0x001111CC,0x8099BBFF,0x00FFFFFF);
  sys_write_text(8<<16+8,0xFFFFFF,"C-- CPU speed example for MenuetOS",34);
  sys_draw_button(281<<16+12,5<<16+12,1,0x5599CC);
  sys_write_number(4<<16,cpuspeed,25<<16+40,0xFFFF80);
  sys_window_redraw(2);
}
//the function "sys_write_text()" passes address of a string. The string is
//compiled immediately after the function.

dword test1=0;   //compiles right here.
dword test2;     //compiles at end of file.

void main(void)
{
  dword test3=0;                       //temporary, created on stack.
  cpuspeed=sys_service(5,0)/1000000;   //get CPU speed.
  draw_window();
  while(1)
  {
    switch(sys_wait_event())
    {
      case 1:
        draw_window();
        continue;
      case 2:
        /*sys_get_key();*/
        continue;
      case 3:
        if(sys_get_button_id()==1) sys_exit_process();
        continue;
    }
  }
}
However, it is possible to force all global variables to compile in the place in which they are defined. This is done by:
# initallvar TRUE
Or the "/iv" option on the command line.

Local variables

All variables declared inside a function are local to that function and only exist while the function is executing. They are created temporarily on the stack.
They may also be initialised -- see the example "test3" above.

C-- supports the "static" prefix, which makes a local variable permanent, that is, it is not created on the stack. The syntax is:
static dword zz;
static long xx=0;
"static" can also be used for global objects (variables, structures, procedures). Such objects are visible only in the file in which they are declared, which allows their names to be used for other purposes in other files.

Dynamic variables

Dynamic variables are only compiled into an application if they are actually referenced by the code. I don't know if this applies to local variables.
Here are global dynamic variables:
 : dword zz;       //the colon prefix makes it dynamic.
 : struct RECT xx;
Of what use are these? ... possibly in header files (include files) or library files. You may also have dynamic procedures (see below).
One further point about dynamic variables -- if compiled into the executable, they will be somewhere at the end of the file, after any dynamic procedures.

Expressions

Constant expressions are introduced above. Now we look at expressions involving constants, variables, and registers.
In particular, there is the most important question about the mixing of explicit register names within a C-- expression -- isn't this risky?

Use of registers in C-- expressions

This code is valid:
    EAX=y+100<<7;
    EBX=EAX<<2;
    offset1=EAX+EBX+x;
What is particularly interesting about this is that assigning the result of the first statement to EAX means that a temporary variable doesn't have to be defined.
In the third line, notice how registers are mixed with a normal variable, "x", and the result is assigned to the variable "offset1".

EAX, EDX

I experimented with a C statement that has no explicit registers in it:
long y,x=5;
void main(void) {
long time=1;
y=100+90*cos(time*1.5+2.0)/x;
The mathematical statement compiles to:
cpuspeed.c-- 52: y=100+90*cos(time*1.5+2.0)/x;
00000131 DB45FC fild [ebp-4]
00000134 D80DB8010000 fmul [1B8h]
0000013A D805BC010000 fadd [1BCh]
00000140 D9FF fcos
00000142 50 push eax
00000143 DB1C24 fistp [esp]
00000146 58 pop eax
00000147 69C0BE000000 imul eax,eax,0BEh
0000014D 99 cwd
0000014E F73D04010000 idiv dword ptr [104h]
00000154 A3D0010000 mov [1D0h],eax
Notice something very important -- only the EAX register is used (altered).
So, what would happen if we put an explicit EAX in the statement? let's do it:
cpuspeed.c-- 52: y=100+90*EAX*cos(time*1.5+2.0)/x;
00000131 69C0BE000000 imul eax,eax,0BEh
00000137 50 push eax
00000138 DB45FC fild [ebp-4]
0000013B D80DBC010000 fmul [1BCh]
00000141 D805C0010000 fadd [1C0h]
00000147 D9FF fcos
00000149 50 push eax
0000014A DB1C24 fistp [esp]
0000014D 58 pop eax
0000014E 5A pop edx
0000014F F7EA imul edx
00000151 99 cwd
00000152 F73D04010000 idiv dword ptr [104h]
00000158 A3D4010000 mov [1D4h],eax
Wow, this compiler is clever! It is smart enough to avoid a clash, even though I deliberately placed the "EAX" in a position in the statement that could have upset the computation. But no, the very first line performs "190*EAX", then pushes it on the stack, and gets it back with the "pop edx".

Now I'm going to torture the compiler further, by including "EDX" in the statement, in a position calculated to do most damage:
cpuspeed.c-- 52: y=100+90*EAX*cos(time*1.5+2.0)*EDX/x;
00000131 69C0BE000000 imul eax,eax,0BEh
00000137 50 push eax
00000138 DB45FC fild [ebp-4]
0000013B D80DBC010000 fmul [1BCh]
00000141 D805C0010000 fadd [1C0h]
00000147 D9FF fcos
00000149 50 push eax
0000014A DB1C24 fistp [esp]
0000014D 58 pop eax
0000014E 5A pop edx
0000014F F7EA imul edx
00000151 F7EA imul edx
00000153 99 cwd
00000154 F73D04010000 idiv dword ptr [104h]
0000015A A3D4010000 mov [1D4h],eax
Bingo! I broke it!
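The instruction at address 151 is wrong, as it uses the value of EDX computed by the instruction at 14F. That instruction, "imul edx", performs a signed-integer multiply of EAX*EDX and leaves the 64-bit result in the register pair EDX:EAX, so the value I originally had in EDX is long gone by the time the second "imul edx" executes.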

ECX

What about ECX? It would seem that the C-- "loop" control is likely to use ECX, however, it doesn't:
cpuspeed.c-- 52: loop(count) y=100+90*cos(time*1.5+2.0)/x;
00000138 DB45F8 fild [ebp-8]
0000013B D80DC4010000 fmul [1C4h]
00000141 D805C8010000 fadd [1C8h]
00000147 D9FF fcos
00000149 50 push eax
0000014A DB1C24 fistp [esp]
0000014D 58 pop eax
0000014E 69C0BE000000 imul eax,eax,0BEh
00000154 99 cwd
00000155 F73D04010000 idiv dword ptr [104h]
0000015B A3DC010000 mov [1DCh],eax
00000160 FF4DFC dec dword ptr [ebp-4]
00000163 75D3 jne 138h
However, I found that if I use ECX explicitly:
  loop(ECX) y=100+90*cos(time*1.5+2.0)/x;
then the compiler does use the assembly language "loop" instruction. So, the loop will exit with ECX=0.

EBX, ESI, EDI

What we still haven't considered though, are the ESI, EBX and EDI registers. These are likely to be used when pointer or array arithmetic is involved. For example:
cpuspeed.c-- 55: EBX=*z+EAX<<2/x;
00000178 8B1D18020000 mov ebx,[218h]
0000017E 8B1B mov ebx,[ebx]
00000180 01C3 add ebx,eax
00000182 B102 mov cl,2
00000184 D3E3 shl ebx,cl
00000186 93 xchg ebx,eax
00000187 31D2 xor edx,edx
00000189 F73504010000 div dword ptr [104h]
0000018F 93 xchg ebx,eax
And another example:
cpuspeed.c-- 53: *z=100+90*cos(time*1.5+2.0)/x;
00000138 DB45F8 fild [ebp-8]
0000013B D80DC0010000 fmul [1C0h]
00000141 D805C4010000 fadd [1C4h]
00000147 D9FF fcos
00000149 50 push eax
0000014A DB1C24 fistp [esp]
0000014D 58 pop eax
0000014E 69C0BE000000 imul eax,eax,0BEh
00000154 99 cwd
00000155 F73D04010000 idiv dword ptr [104h]
0000015B 8B1DDC010000 mov ebx,[1DCh]
00000161 8903 mov [ebx],eax
Yes, it uses EBX. So, can we throw a spanner in the works?
Note, that bit of code with "fistp [esp]" looks weird, but in fact it is clever. The "fist/fistp" instruction cannot store a value direct to a register, only to a memory location, so the three instructions "push eax, fistp [esp], pop eax" achieve a store from the FPU to EAX.
I was able to break the statement using EBX very easily:
cpuspeed.c-- 55: EBX=*z+EBX+EAX<<2/x; //stuffs up the value in EBX!
Can we get the compiler to use ESI or EDI? It may have to do so to handle multidimensional arrays or multiple pointers in the same statement. For example, let's try two pointers, "z" and "zz":
cpuspeed.c-- 57: EBX=*z+*zz;
00000138 8B1DB8010000 mov ebx,[1B8h]
0000013E 8B1B mov ebx,[ebx]
00000140 8B35BC010000 mov esi,[1BCh]
00000146 031E add ebx,[esi]
Push this further, with three pointers:
cpuspeed.c-- 58: EBX=*z+*zz+*zzz;
00000138 8B1DC0010000 mov ebx,[1C0h]
0000013E 8B1B mov ebx,[ebx]
00000140 8B35C4010000 mov esi,[1C4h]
00000146 031E add ebx,[esi]
00000148 8B35C8010000 mov esi,[1C8h]
0000014E 031E add ebx,[esi]
So far I can't get the compiler to use EDI. Also, it tries to avoid using ESI -- it only does so above because I have put EBX in the statement explicitly. If I have this:
cpuspeed.c-- 58: y=*z+*zz+*zzz;
00000138 8B05C8010000 mov eax,[1C8h]
0000013E 8B00 mov eax,[eax]
00000140 8B1DCC010000 mov ebx,[1CCh]
00000146 0303 add eax,[ebx]
00000148 8B1DD0010000 mov ebx,[1D0h]
0000014E 0303 add eax,[ebx]
00000150 A3C4010000 mov [1C4h],eax
...you see, only using EAX and EBX.

However, there are expressions that do use ESI and EDI even though I have not explicitly used EBX. For example:
struct test {short a;char b[8];long c;} rr[5];
dword i,k;

cpuspeed.c-- 71: rr[i].b[k] >< rr[i+1].b[k+2];
00000127 8B352C020000 mov esi,[22Ch]
0000012D 8B3D28020000 mov edi,[228h]
00000133 6BFF0E imul edi,edi,0Eh
00000136 8A843EE4010000 mov al,[esi+edi+1E4h]
0000013D 8B3D2C020000 mov edi,[22Ch]
00000143 83C702 add edi,2
00000146 8B1528020000 mov edx,[228h]
0000014C 42 inc edx
0000014D 6BD20E imul edx,edx,0Eh
00000150 868417E4010000 xchg [edi+edx+1E4h],al
00000157 88843EE4010000 mov [esi+edi+1E4h],al
The "><" operator exchanges the contents of its two operands. Here we have ESI, EDI, EAX, and EDX used.
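
The multiplier 0Eh (14) in the "imul" instructions above equals the sum of the member sizes of the structure, 2+8+4, which suggests that the structure is stored without padding. Under that assumption, the address arithmetic for an element such as rr[i].b[k] can be sketched in standard C (not C--) as follows. The names are hypothetical, and how the compiler folds the base address of "rr" and the offset of "b" into the constant 1E4h is not shown in the listing, so it is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Member sizes taken from the declaration above, assuming no padding:
   short a (2 bytes), char b[8] (8 bytes), long c (4 bytes). */
enum {
    SIZE_A = 2,
    SIZE_B = 8,
    SIZE_C = 4,
    ELEMENT_SIZE = SIZE_A + SIZE_B + SIZE_C   /* 14, i.e. 0Eh */
};

/* Hypothetical helper: byte offset of rr[i].b[k] from the start of rr. */
size_t rr_b_offset(size_t i, size_t k)
{
    assert(k < SIZE_B);
    return i * ELEMENT_SIZE + SIZE_A + k;
}
```

For example, rr[1].b[2] lies 1*14 + 2 + 2 = 18 bytes from the start of the array.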

So, can we make any general recommendations about using registers explicitly in C-- expressions? Yes: don't.
The compiler tries to use only EAX; failing that it will also use EDX, then EBX, and finally it will resort to ESI and then EDI. The compiler only seems to apply intelligence to avoid a clash when I explicitly use EAX.

Therefore, it would seem safe to explicitly use EAX and ECX, though you can expect their values to be destroyed by the end of execution of the expression.
For the vast majority of expressions ESI and especially EDI won't be used, which means that you can explicitly use them and they will come out of the expression unchanged.
Do not explicitly use EBX and EDX as a clash is highly likely.

I can't guarantee this, so let me know if you discover any exceptions.
Of course, if functions or macros are called in the expression, it depends on what the function/macro does with registers.

Coming back to an earlier example:
    EAX=y+100<<7;       //do NOT insert another statement between...
    EBX=EAX<<2;         //do NOT always expect EAX to be preserved!
    offset1=EAX+EBX+x;  //AVOID using EBX in an expression!
There is no problem here because a value assigned to a register is read immediately in the next statement. Obviously, inserting a statement between the second and last line may cause a problem. However, this fragment of code, that I got out of an existing program, is fraught with danger.
Notice that the first line assigns a value to EAX, and it is accessed in the last statement -- DANGER!
In the last line, EBX is used in the expression on the right-side -- DANGER!

If you study this section, you will understand conditions under which various registers can safely be explicitly used. For example, as long as statements don't explicitly use ECX for a counter in a "loop" type of operation, ECX can be explicitly used and will be preserved.

However, if you are concerned about this issue and want to be sure that your code with explicit use of registers is "ok", the compiler can be told to generate warnings about all use of registers. To make these warnings visible, place the "/w" option on the command line, or put this in the source code:
#warning TRUE
To redirect these warnings from the screen to a file, use this option on the command line:
wf=file_name

Conditional Expressions

Conditional expressions are expressions which are used for generating a 'yes' or 'no' for 'if' statements and 'do {} while' loops.

There are two types of conditional expressions, simple and complex.

Simple Conditional Expressions

Simple conditional expressions are a single token or expression that will be taken as a 'yes' if the calculated value is non-zero, or a 'no' if the calculated value is zero.

Complex Conditional Expressions

Complex conditional expressions are of the following form:

    ( leftside compare_op rightside )

Where:

    "leftside"   is any AL/AX/EAX or constant expression. The expression
                 type will be determined by the first token (register or
                 variable): the default is "word" for 16-bit code, "dword" for
                 32-bit code. If another type is desired, the keyword
                 "byte", "char", "int", "word", "long", "dword" or "float"
                 can precede the expression to specify its type.
    "compare_op" is any one of "==", "!=", "<>", "<", ">", "<=", or ">=".
    "rightside"  is any single register, variable or constant expression.

Some examples of valid complex conditional expressions:

    ( x+y > z )
    (int CX*DX <= 12*3 )
    (byte first*second+hold == cnumber )

Some examples of invalid complex conditional expressions:

    ( x+y >= x-y )  //rightside is not a single token or constant expression.
    ( z = y )       //"==", not "=" must be used.

Datatype override of registers

In standard C there is the concept of casting in which the compiler is explicitly told to treat a variable as a datatype other than what it is defined as or as will be treated by default -- this may involve an actual conversion of the data to the new type. However, here we are concerned not with changing the data but in how it is interpreted by the compiler. As the data in the variable is itself not changed, just its interpretation, I have called this "datatype override".

You can see above that the "leftside" of a conditional expression may have the datatype qualifiers or overrides "byte", "char", "int", "word", "long", "dword" or "float". However, this section considers the particular problem of explicit use of registers.

With C--, as registers can be used as though they are variables in a high-level statement, there may be a problem, as the compiler does not know what datatype they contain.
The default is that the compiler treats them as unsigned integer numbers, of size determined by the register. For example, AX is unsigned 16-bit integer. However, it is possible to override this, as these examples show:
float f = 1.0;

void PROC ()
{
  IF (f < signed ECX)   // ECX holds a signed number.
  IF (unsigned EBX > f) // EBX holds an unsigned number.
  IF (f == float EAX)   // EAX holds a number in float format. WARNING: EAX is destroyed.
}
The allowed modifiers are 'signed', 'unsigned' and 'float'.

Note that in standard C, casting has the syntax "if(x==(float)zz)", where the cast is enclosed in brackets. This syntax, and generic casting, is not supported in C--.

Datatype conversion

The C-- compiler will convert values to the appropriate target datatype.
For example, where the variable "a" is type float, "j" is type long and "i" is type short, consider this:
cpuspeed.c-- 59: i=a+j;
0000011F 50 push eax //convert "a" to integer.
00000120 D905C0010000 fld [1C0h] // /
00000126 DB1C24 fistp [esp] // /
00000129 58 pop eax // /
0000012A 660305BC010000 add ax,[1BCh] //evaluation done in short.
00000131 66A3B8010000 mov [1B8h],ax
What you can see here is that the values on the right side are converted to short datatype before the calculation is performed, then the result assigned to variable "i".

Another example is this statement, where "z" is unsigned short:
z = a+j;
WARNING:
This will generate the same code as above, and no sign conversion takes place. Thus, if the result of evaluating the expression on the right side is a negative number, its bit pattern will be assigned unchanged to "z", where it reads as a large positive number, which is incorrect. This is because signed numbers are in two's complement notation, in which the non-negative values have the same bit patterns as the unsigned values, but the negative values do not.
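The effect described in this warning can be reproduced in standard C (not C--), which uses the same two's complement representation on x86. The following is only an illustrative sketch; the function name "reinterpret_as_unsigned" is made up for the example and is not part of C--.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the 16 bits of a signed value into an
   unsigned variable without any conversion, which models what the
   C-- assignment to "z" is described as doing. */
uint16_t reinterpret_as_unsigned(int16_t value)
{
    uint16_t bits;
    memcpy(&bits, &value, sizeof bits);  /* same bit pattern, new interpretation */
    return bits;
}
```

A non-negative result such as 5 is stored as 5, whereas a result of -1 ends up in the unsigned variable as 65535.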

The conversion of values to the same datatype as the target (leftside) variable prior to evaluation may not be satisfactory. You may want the calculation on the rightside to be evaluated at a higher level of precision and/or to make use of the fractional component of floating point.
C-- allows you to specify the datatype that the elements of the rightside will be converted to before the calculation takes place. This example illustrates, where "a" is float, "j" is long and "i" is short:
cpuspeed.c-- 59: i= float a+j;
0000011F D905BC010000 fld [1BCh]
00000125 DA05B8010000 fiadd [1B8h]
0000012B DF1DB4010000 fistpw [1B4h]
00000131 9B fwait
In this example, the parameters on the rightside are converted to float format prior to calculation. Note that the generated asm code achieves this by loading both "a" and "j" into the FPU which does all internal calculations in floating point format.

Assignment to a register

Explicitly named registers in a C-- statement are treated by the compiler as unsigned integers. The section on conditional expressions explains how this default may be overridden (see above).
The above notes on datatype conversion also apply to a statement in which a register is the target (leftside). For example:
EAX = a*j;  //a is float, j is long.
Variables "a" and "j" will both be converted to long values prior to calculation, but as they are being treated as unsigned, an unsigned multiply instruction "mul" will be generated by the compiler.
On the other hand:
EAX = long a*j;  //a is float, j is long.
This will cause the compiler to treat the computation on the rightside as signed, so an "imul" instruction will be generated. HOWEVER, as per the warning above, no actual sign-conversion is done, so the result of the calculation on the rightside is simply assigned as-is to the leftside.

Multiple assignment

If you need to assign the same value to several variables:
  var1 = 0;
  var2 = 0;
  var3 = 0;
this can be written in a short form:
  var1 = var2 = var3 = 0;
As well as being briefer as source code, this format will also generate more compact asm code.


Declaring functions and macros

General

Parameters may be passed to a function via the stack, which we call a stack procedure/function, or via registers, which we call a register procedure/function.

Parameters for stack procedures, if any, may be of any type (specified by 'byte', 'char', 'word', 'int', 'dword' or 'long'). Parameters are passed using a Pascal-like calling convention, that is, the first parameter is pushed first and the second parameter is pushed second, and so on. The Pascal-calling convention does not support variable number of parameters, so you have to be sure to pass the proper number of parameters to a stack procedure.

The following example stack procedure returns the sum as a 'word' of all its parameters, which are of different types:

    word add_them_all (short a,b; long c,d)
    word x;
    dword y;
    {
      return( a+b+c+d );
    }
This is what the stack looks like inside the function "add_them_all()", once its stack frame has been set up:
    EBP -  6   Local variable "y".
    EBP -  2   Local variable "x".
    EBP +  0   Saved EBP.
    EBP +  4   Return address.
    EBP +  8   Rightmost passed parameter, "d".
    EBP + 12   Parameter "c".
    EBP + 16   Parameter "b".
    EBP + 20   Parameter "a".
In the case of 32-bit code, the compiler passes all function parameters as 32-bit quantities, regardless of datatype. The reason that EBP is used here, is because the compiler places this asm code at the beginning of procedures:
    push ebp       //save EBP.
    mov ebp,esp    //use EBP to access stack frame.
    sub esp,value  //value=number of bytes allocated for local variables.
Then, at the end of the procedure, the compiler places:
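    mov esp,ebp    //restore ESP, discarding the local variables "x" and "y".
    pop ebp        //restore EBP.
    ret 16         //return, removing the four 32-bit parameters.
For the Pascal calling convention, the parameters are removed by the operand of the "ret" instruction, which in this example removes 16 bytes (four 32-bit parameters) from the stack on return. The local variables have already been discarded at that point, by restoring ESP from EBP.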
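The parameter offsets in the stack layout above follow a simple rule, which can be sketched in standard C (not C--). The function name "param_offset_from_ebp" is hypothetical, and the sketch assumes 32-bit code in which every parameter occupies 4 bytes, as stated above.

```c
/* Hypothetical helper: offset from EBP of parameter number `index`
   (0 = leftmost) in a Pascal-convention stack procedure that takes
   `count` parameters. Assumes the saved EBP sits at EBP+0 and the
   return address at EBP+4, as in the layout shown above. */
int param_offset_from_ebp(int index, int count)
{
    const int first_param_offset = 8;  /* skip saved EBP and return address */
    const int param_size = 4;          /* every parameter is passed as 32-bit */

    /* The leftmost parameter is pushed first, so it lies furthest from EBP. */
    return first_param_offset + param_size * (count - 1 - index);
}
```

For "add_them_all()", which has four parameters, index 0 ("a") gives 20 and index 3 ("d") gives 8, matching the layout.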

The datatypes of passed parameters may be declared in a shorthand form. For example:
void afunc(word a, b, c; dword d, e; byte f);                 //shorthand format.
void afunc(word a, word b, word c, dword d, dword e, byte f); //full format.
A single datatype declaration "word" applies to all comma-separated parameters, until a semicolon denotes a change of datatype. There are two parameters of type "dword" and finally "f" is of type "byte".

The parameters (if any) for a register procedure are passed via registers. Register procedures have a maximum of 6 parameters. The registers used if the parameters are of type 'int' or 'word', in order, are AX, BX, CX, DX, DI, and SI. The first four parameters can also be of the type 'char' or 'byte', in this case AL, BL, CL and DL are used respectively. Any of the six parameters can be of type 'long' or 'dword', in which case EAX, EBX, ECX, EDX, EDI, or ESI would be used.

A macro is an inline register procedure. The keyword "inline" is described below.

Return values from functions are returned via registers. The table below shows which register is used for each return type:

    return type  |  register returned in
    ----------------------------------------
    byte         |  AL
    word         |  AX
    dword        |  EAX
    char         |  AL
    short        |  AX
    long         |  EAX
The easiest way to return a value from a function is to use the "return()" command, but the appropriate register can also be assigned the required return value instead. For example, the following two functions return the same value:

    dword proc_one ()
    {
      return( 42 );
    }

    dword proc_two ()
    {
      EAX = 42;
    }
Strictly speaking, a procedure does not return a value, whereas a function does (in C-- this value is returned in the EAX register). However, I do not bother with this distinction and use the words procedure and function interchangeably.

It is also possible to return a flag. This example returns the Carry flag:
long CARRYFLAG fopen(); 	  //the declaration of the procedure.

if(fopen()) { code here } //evaluates returned Carry flag TRUE/FALSE.
if(handle=fopen()) { code here } //still evaluates Carry flag.
The conditional expression will evaluate as true if the function has returned with the Carry flag set.
We are accustomed to the return value in EAX being evaluated, so how do we know which it is going to be, EAX or Carry flag? Any explicit use of a comparison operator will cause the compiler to revert to a comparison based on the returned value in EAX. For example:
    if(fopen() == 5) { code here }  //reverts to return value.
You can also use OVERFLOW and ZEROFLAG flags.

Calling convention

Michael introduced a mechanism for specifying the calling convention, in this general format:
rettype modif procname ();
where "modif" can be "pascal", "cdecl", "stdcall", or "fastcall".
Calling convention
    Explanation

cdecl
    This type of call is the default for the language C. The parameters of the procedure are transferred (pushed onto the stack) in the order right-to-left. The parameters are cleared from the stack by the caller, after the return from the procedure. This way of calling procedures is very convenient for procedures with a variable number of parameters, but is less compact.
pascal
    This type of call assumes that the parameters are transferred (pushed) in the order left-to-right, that is, in the order in which they are written down in the program. The procedure itself removes the parameters from the stack, by use of the "ret n" instruction. This type of call is more compact than 'cdecl'.
stdcall
    This type of call is a hybrid of the first two. The parameters are transferred to the procedure in the order right-to-left (reverse of how they are written). The parameters are removed from the stack by the procedure itself.
fastcall / register
    For this type of call, parameters are passed via registers. Thus no parameters need be released from the stack. A maximum of six parameters may be passed, using registers EAX, EBX, ECX, EDX, EDI, and ESI. 16-bit parameters can be passed in AX, BX, CX and DX. 8-bit parameters can be passed in AL, BL, CL and DL.

Default calling conventions

"register" -- any function whose name is entirely in upper-case letters is by default "fastcall" (or "register"), even though the general default convention is "pascal".
"pascal" -- any function whose name contains one or more lower-case letters.
"stdcall" -- if the commandline has "/w32", "/w32c" or "/DLL" (a MS Windows application or DLL).

Variable number of parameters

The standard C calling convention allows this. This is how to declare a function with a variable number of parameters:
void cdecl printf (word,...);
This example has at least one passed parameter, of datatype "word", but may have more.

Inline stack functions

This defines an inline procedure:
inline void sys_write_text(dword EBX, ECX, EDX, ESI){
 EAX = 4;
 $int 0x40;
}
The keyword "inline" achieves this.
The generated inline code for this function call is (cpuspeed is an application, msys is the system library file):
cpuspeed.c-- 26: sys_write_text(8<<16+8,0xFFFFFF,"C-- CPU speed example for MenuetOS",34);
0000005F B808000800 mov eax,80008h
00000064 50 push eax
00000065 68FFFFFF00 push 0FFFFFFh
0000006A B8E0000000 mov eax,0E0h
0000006F 50 push eax
00000070 6A22 push 22h
msys.h-- 99: EAX = 4;
00000072 B804000000 mov eax,4
msys.h-- 100: $int 0x40;
00000077 CD40 int 40h
00000079 83C410 add esp,10h
If this looks a bit useless, that's because it is. Despite registers being specified in the function definition, they aren't used -- they are just treated as dummy names (which is logical, and in keeping with standard C).
So, it will be far less confusing to give meaningful names to the dummy parameters, as for example:
inline void sys_write_text(dword xystart, color, #message, length){
 EAX = 4;
 $int 0x40;
}
Note that it is allowed to have an explicit "return;" statement in an inline function. The compiler is intelligent enough to realise that it should not generate a "ret" instruction, just remove local stack parameters if required.

Inline register functions

This defines an inline procedure that actually does pass parameters via registers:
inline fastcall void sys_write_text(dword EBX, ECX, EDX, ESI){
 EAX = 4;
 $int 0x40;
}
The keyword "fastcall" causes the passed parameters to be loaded into EBX, ECX, EDX, and ESI (in this example).
Note, you can place the modifier "fastcall" between the returntype and the function name.
The generated inline code for this function call:
cpuspeed.c-- 26: sys_write_text(8<<16+8,0xFFFFFF,"C-- CPU speed example for MenuetOS",34);
0000005F BB08000800 mov ebx,80008h
00000064 B9FFFFFF00 mov ecx,0FFFFFFh
00000069 BAE0000000 mov edx,0E0h
0000006E BE22000000 mov esi,22h
msys.h-- 99: EAX = 4;
00000073 B804000000 mov eax,4
msys.h-- 100: $int 0x40;
00000078 CD40 int 40h
Personally, I don't like the choice of the name "fastcall" -- it would be more appropriate to use the keyword "register". You can do this by a simple #define:
#define register fastcall
Inline procedures are dynamic, that is, only inserted in the code when needed.

Non-inline register functions

So, it follows that:
fastcall void sys_write_text(dword EBX, ECX, EDX, ESI){
 EAX = 4;
 $int 0x40;
}
must pass parameters in registers, but the function itself is not compiled inline. The compiled code will look like this:
cpuspeed.c-- 26: sys_write_text(8<<16+8,0xFFFFFF,"C-- CPU speed example for MenuetOS",34);
00000067 BB08000800 mov ebx,80008h
0000006C B9FFFFFF00 mov ecx,0FFFFFFh
00000071 BAE4000000 mov edx,0E4h
00000076 BE22000000 mov esi,22h
0000007B E8A4FFFFFF call 24h
where the actual function is compiled elsewhere:
msys.h-- 99: EAX = 4;
00000024 B804000000 mov eax,4
msys.h-- 100: $int 0x40;
00000029 CD40 int 40h
0000002B C3 ret
This example is not a dynamic function, as it is always compiled into the program.

Non-inline stack functions

So, it follows that:
void sys_write_text(dword xystart, color, #message, length){
 EAX = 4;
 $int 0x40;
}
is a conventional non-inline function, in which parameters are passed via the stack.
Note that C-- uses the PASCAL calling convention by default, in which the leftmost parameter is pushed first, and parameters are removed from the stack before the function returns.

Dynamic non-inline functions

Non-inline functions are normally compiled into the final program regardless of whether they are actually called or not.
A colon ":" before the function definition makes it dynamic:
 : sys_write_text(dword xystart, color, #message, length){
 EAX = 4;
 $int 0x40;
}
that is, the function will only be compiled into the program if called. Note that the colon is allowed to be on column one.
A non-inline register function is also allowed to have this colon prefix.

Interrupt Procedures

Interrupt procedures, that is, procedures which are used as handlers for interrupts, are defined in the following manner:

    interrupt procedure_name ()
    {
      // put code here
    }
Interrupt procedures do not automatically preserve any registers, and no registers are modified before the interrupt gains control, so it is your responsibility to 'push' and 'pop' registers to save and restore them.
For example:
    interrupt safe_handle ()
    {
      $PUSHAD //save all 32-bit registers.
      /* do your thing here */
      $POPAD
    }

Conditional statements

'if' and 'else'

Selection statements, better known as 'if' statements, are similar to those in C. C-- has two selection statements, 'if' and 'IF'. 'if' does a near jump, and 'IF' does a short jump. 'IF' executes faster, and can save up to 3 bytes in code size, but can only jump over 127 bytes of code.

Selection statements, as in C, can be followed by either a single command, or a block of many commands enclosed within '{' and '}'. C-- selection statements are restricted to C-- conditional expressions (as described in the section on conditional expressions above).

If more than 127 bytes of code follow an 'IF' statement, the compiler will issue the following error message:

	IF jump distance too far, use if.
This can be simply remedied by changing the offending 'IF' statement to 'if'.

'else' and 'ELSE' statements are used just like the 'else' command in C, except that 'ELSE' has the same 127 byte jump restriction as 'IF'. 'else' generates 1 more byte of code than 'ELSE'.

'IF' and 'else', and 'if' and 'ELSE' may be mixed freely, such as the following example:

    if( x == 2 )
      WRITESTR("Two");
    ELSE{
      WRITESTR("not two.");
      printmorestuff();
    }
If more than 127 bytes of code follow an 'ELSE' statement, the compiler will issue the following error message:

	ELSE jump distance too far, use else.
Simply change the 'ELSE' statement to 'else' to correct the error.

'do {} while' loops

'do {} while' loops repeat a block of code while a certain conditional expression remains true. The block of code will be executed at least once. An example of a 'do {} while' loop that loops five times follows:

    count = 0;
    do {
      count++;
      WRITEWORD(count);
      WRITELN();
    } while (count < 5);
The conditional expression in the 'do {} while' statement must conform to the same rules as 'IF' and 'if' statements.

'loop' loops

'loop' loops repeat a block of code while the specified variable or register is not zero. At the end of each execution of the block of code, the given variable or register is decremented by one, then tested against zero. If it is not equal to zero, the block of code is executed again, and the process repeats. An example of a 'loop' loop using the variable "count" as the loop counter:

    count = 5;
    loop( count )
    {
      WRITEWORD(count);
      WRITELN();
    }
Use of the register CX for small code block loops will yield the greatest code size efficiency for a 'loop', for the loop will be implemented by the use of the machine language 'LOOP' command.

If the loop counter is zero before starting the 'loop' command, the loop will be executed the maximum number of times for the range of the variable: 256 times for an 8-bit (byte or char) counter, 65536 times for a 16-bit (word or int) counter, and 4294967296 times for a 32-bit (dword or long) counter. For example, the following loop will execute 256 times:

    BH = 0;
    loop( BH )
    {
    }
If no loop counter is given, the loop will loop forever. The following example will write *'s to the screen forever:

    loop()
      WRITE('*');
The programmer may use and/or change the value of the loop counter variable within the loop. For example, the following loop will only execute 3 times:

    CX = 1000;
    loop( CX )
    {
      IF( CX > 3 )
        CX = 3;
    }
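The iteration counts quoted in this section follow from the "run the block, decrement, then test" rule. The following standard C sketch (not C--) models that rule as I understand it from the description; the function names are hypothetical.

```c
#include <stdint.h>

/* Models a 'loop' with an 8-bit counter: run the body, decrement the
   counter, and repeat while the counter is non-zero. Returns the number
   of times the body ran. */
unsigned count_iterations_byte(uint8_t counter)
{
    unsigned iterations = 0;
    do {
        iterations++;   /* stands in for the loop body */
        counter--;      /* an 8-bit counter wraps from 0 to 255 */
    } while (counter != 0);
    return iterations;
}

/* Models the last example, where the body clamps a 16-bit counter to 3. */
unsigned count_iterations_clamped(uint16_t counter)
{
    unsigned iterations = 0;
    do {
        iterations++;
        if (counter > 3)
            counter = 3;
        counter--;
    } while (counter != 0);
    return iterations;
}
```

Starting the 8-bit model at 5 gives five iterations, starting it at 0 gives 256, and starting the clamped model at 1000 gives 3.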

'for' loop

The standard C for loop is supported. Also FOR is allowed, for jumps within plus or minus 127 memory locations.
I found this code in an example application:
    FOR(ECX=0;ECX<200;ECX++){
      EAX=ECX<<7;
      EBX=EAX<<2;
      EDI=EAX+EBX;
      FOR(EDX=0;EDX<320;EDX++,EDI++){
        ESI=EDI+offset1;
        c1=flowers[ESI];
        ESI=EDI+offset2;
        c2=flowers[ESI];
        ESI=EDI+offset3;
        c3=flowers[ESI];
        EAX=ECX<<6;
        EBX=EAX<<2;
        ESI=EAX+EBX+EDX;
        screen[ESI]=c1+c2+c3;
      }
    }
The limited gotos "break", "continue", "BREAK" and "CONTINUE" are allowed; the upper-case forms are for short (+/-127) jumps.

'switch' case

The standard C switch case conditional works, as shown in this example:
   switch(sys_wait_event())
   {
     case 1:
       draw_window();
       continue; //go back up and re-evaluate switch.
     case 2:
       /*sys_get_key();*/
       break; //exit from switch.
     case 3:
       if(sys_get_button_id()==1) sys_exit_process();
       continue;
   }
The limited gotos "break", "continue", "BREAK" and "CONTINUE" are allowed; the upper-case forms are for short (+/-127) jumps.

Arrays

Array indexing

The 1996 version of C--, managed by Peter Cellik, required that arrays be referenced by the byte-offset from the start of the array, regardless of the datatype.
For example:
dword array[10];
dword i;

EBX=4;
array[EBX]=0; //*byte* offset.
array[4]=0; //*byte* offset.
However, Michael Sheker has improved things so arrays can also be referenced by an index determined by datatype. For example:
  i=1;
  array[i]=0; //index determined by datatype.
This last example also accesses the second element of the array. Here you can see what gets compiled:
cpuspeed.c-- 66: EBX=4;
0000011F BB04000000 mov ebx,4
cpuspeed.c-- 67: array[EBX]=0;
00000124 C783C401000000000000 mov dword ptr [ebx+1C4h],0
cpuspeed.c-- 68: array[4]=0;
0000012E C705C801000000000000 mov dword ptr [1C8h],0
cpuspeed.c-- 69: i=1;
00000138 C705EC01000001000000 mov dword ptr [1ECh],1
cpuspeed.c-- 70: array[i]=0;
00000142 8B35EC010000 mov esi,[1ECh]
00000148 C704B5C401000000000000 mov dword ptr [1C4h+esi*4],0
The array starts at address 1C4h, and you can see that the first two cases compute a byte offset from start of the array. However, by using a variable as the index, the variable correctly indexes the array, as in normal C.
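The difference between the two addressing styles can be modelled in standard C (not C--). In this sketch the two function names are made up for illustration: the first writes at a byte offset, as the listing shows for "array[EBX]" and "array[4]", and the second writes at an element index scaled by the element size, as the listing shows for "array[i]".

```c
#include <stdint.h>
#include <string.h>

/* Stores `value` at a *byte* offset from the start of the array. */
void store_at_byte_offset(uint32_t *array, size_t byte_offset, uint32_t value)
{
    memcpy((unsigned char *)array + byte_offset, &value, sizeof value);
}

/* Stores `value` at an element index; the compiler scales the index by
   the element size (the "*4" seen in the listing for a dword array). */
void store_at_index(uint32_t *array, size_t index, uint32_t value)
{
    array[index] = value;
}
```

For an array of 32-bit elements, a byte offset of 4 and an element index of 1 both address the second element.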

To access an array by byte-offset, you can have these formats, where "index" is a numerical value:

    variable[index]
    variable[index+EBX+ESI]
    variable[index+EBX+EDI]
    variable[index+EBP+ESI]
    variable[index+EBP+EDI]
    variable[index+ESI]
    variable[index+EDI]
    variable[index+EBP]
    variable[index+EBX]
Note that nesting of arrays is allowed, and it will generate the correct index as determined by datatype, as this example shows:
cpuspeed.c-- 69: buf[array[i]]=0;
0000013A 8B35C0010000 mov esi,[1C0h]
00000140 8B34B514020000 mov esi,[214h+esi*4]
00000147 C704B5C401000000000000 mov dword ptr [1C4h+esi*4],0
In the above example, both "buf[]" and "array[]" are defined with datatype "dword". You can see the "*4" in both cases.

The dichotomy between byte-offset addressing of arrays and the correct indexing by datatype, as in C, is a worry. Michael decided to introduce this:
  array[*i] = 0;   //here variable i will contain an absolute byte-displacement in
                   //the array, instead of the number of an element (index).
...hmmm.

Arrays may be fields of structures, and there may be arrays of structures. This is covered in the section "Structures" below.

Structures

Structure syntax

This is the format for defining a structure:
  struct <tag> {<fields>};
where
  <tag> is the name of the structure definition (not an instantiation),
  <fields> are the definitions of the fields of the structure.
These are the two allowed formats for instantiating a structure:
  struct [<tag>] {<fields>} <name>[,<name>...];
  [struct] <tag> <name> [, <name>...];
where
  [ ] square brackets denote optional,
  <name> is the name of an instantiation of the structure.
The first format both defines and instantiates a structure. Here are examples of the first format:
struct rect {long x; long y;} myrect;
struct {long x; long y;} myrect;
struct {long x; long y;} rect1, rect2; //two structures instantiated.
Note that the <tag>, being the name of the structure definition, is optional if all instantiations are declared within this format.

The second format defines the structure separately from the instantiations. Here are examples of the second format:
struct rect {long x; long y;};         //definition of a structure.
struct rect myrect; //instantiation of rect.
struct FileInfo{dword read,firstBlock,qnBlockRead,retPtr,Work; byte filedir;};
struct FileInfo myfile; //instantiation of FileInfo.
Note that fields of a structure are not aligned in any way -- only the start of an instantiated structure may be aligned, if the compiler has alignment turned on.

Here is another example, to show how arrays may be used with structures:
struct test {
  int a;
  char b [8];
  long c;
} rr, ff [4];
In this example there is an instantiation of the structure "test" with the name "rr", and an array of four structures with the name "ff".
The structure "test" may also be instantiated by the second format, as for example:
struct test dd;
Furthermore, it is permissible to leave off the "struct" keyword:
  test dd;

Initialisation of structures at the declaration

Structures cannot be initialised when defined, only when instantiated. Furthermore, only global structures can be initialised when instantiated. C-- supports several ways of initialising structures at their declaration (instantiation):

1. One value
       struct test dd = 2;
In this example the memory area of the structure "dd" is filled with the value 2. By default the value is 8-bit unless there is a datatype override, so in this example every byte of the structure will be assigned the value 2.

2. Array of values
       struct test dd = {1,2,,6};
In this example the first field of structure "dd" is assigned the value 1, the second field the value 2, and the fourth field the value 6. Skipped and uninitialised fields will be assigned the value zero.

3. FROM command
       struct test dd = FROM "file.dat";
In this example the contents of the file "file.dat" are loaded at the place where the structure "dd" is located at compilation (instantiation). If the file size is more than the size of the structure, the superfluous bytes will overflow into the code of the program. If the file size is less than the size of the structure, the missing bytes of the structure will be filled with zero.

4. EXTRACT command
       struct test dd = EXTRACT "file.dat",24,10;
In this example a fragment of the file "file.dat", of length 10 bytes from offset 24, is inserted at the place where the structure "dd" is located at compilation. The missing bytes will be filled with zero.

Initialization of structures at execution of the program.

Values can be assigned to fields of a structure during execution. Examples:
struct test {short a;char b[8];long c;};
void proc() {
struct test aa[5], rr;
short i;
  aa[0] = 0x78; //1. all memory filled with 0x78.
  aa[0] = 0x12345678; //2. ditto
  aa[i] = int 0x12345678; //3. all memory filled with 0x5678.
  aa = long 0x12345678; //4. entire array filled with 0x12345678.
  rr = i; //5. 16-bit value fills the memory.
}
In the first example, the memory occupied by the first structure of the array (of 5 structures) will be filled with the byte 0x78 (a byte fill is the default). In the second example, the higher part of the value is ignored, so the result is the same.
In the third example, the memory occupied by the (i + 1)th structure of the array will be filled with the word value 0x5678.
In the fourth example, the memory occupied by all 5 structures in the array will be filled with the 32-bit long value 0x12345678.
In the fifth example, the memory occupied by structure "rr" will be filled with the contents of the 16-bit variable i.

It is possible also to copy contents of one structure to another. For example:
  rr = aa [2];
The contents of the third structure of the array of structures "aa" will be copied to structure "rr".

Structure access

Accessing arrays of structures and arrays in structures

Here are examples:
struct test {short a;char b[8];long c;} rr[5];
rr.a = 1;
rr.b[i] = 2;
rr[i].c = 3;
rr[j].b[i] = 4;
Note:
For operations in which a field of a structure is an array and the array index is a variable, the compiler can use the registers ESI and EDI, and in some situations (for example: rr[i].b[j] >< rr[i+1].b[j+2]) will also use the register EDX.

Address of structure

It is possible to obtain the address of a structure and any fields of a structure. Here are examples:
struct bb        // tag of the structure.
{
  word b;     // the first field.
  dword c;      // the second field.
} ss;           // instantiation.
void proc ()
{
  EAX=#ss.b;   // get address of field "b" in structure "ss".
  EAX=#bb.b;  // get offset of the same field in a tag "bb".
}
There is an important difference between the above two examples. The statement "EAX=#ss.b;" gets the absolute address of the field, whereas the statement "EAX=#bb.b;" only gets the offset of the field from the start of the structure.
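The same distinction between an absolute address and an offset exists in standard C, where it can be sketched with "offsetof". This is C rather than C--, the helper names are hypothetical, and field "c" is used here by analogy with the "#ss.b" and "#bb.b" statements above. Note that a C compiler may insert padding between the fields, which the text says C-- does not do.

```c
#include <stddef.h>
#include <stdint.h>

struct bb {
    uint16_t b;   /* the first field */
    uint32_t c;   /* the second field */
};

struct bb ss;     /* instantiation */

/* Offset of field "c" from the start of any struct bb,
   comparable to taking "#" of a field through the tag. */
size_t offset_of_c(void)
{
    return offsetof(struct bb, c);
}

/* Absolute address of field "c" in the instance ss,
   comparable to taking "#" of a field through the instance. */
uintptr_t address_of_c(void)
{
    return (uintptr_t)&ss.c;
}
```

The absolute address of a field is the address of the instance plus the offset of the field within the structure.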

Nested structures

When declaring a structure tag, it is possible to use the tags of other structures declared beforehand. An example of nested structures:
struct RGB
{
  byte Red;
  byte Green;
  byte Blue;
  byte Reserved;
};

struct BMPINFO
{
  struct BMPHEADER header; //the description of this structure is omitted here.
  struct RGB color [256]; //array of RGB structures.
} info; //instantiation.
Let's assume you need to read the contents of field "Red" in element 10 of the "color" array. It can be written like this:
  AL = info.color[10].Red;
But there is one limitation on the use of nested structures in C--: no more than one variable may be used as an index in a single reference. Let's explain it with an example:
struct ABC {
  int a;
  int b;
  int c;
};
struct {
  struct ABC first [4]; // 4 copies of structure ABC
  int d;
} second [4];

int i, k;
void proc ()
{
  AX=second[i].first[k].a; //an error as variables used in two places.
  AX=second[2].first[k].a; //this syntax is allowable.
  AX=second[i].first[3].a; //allowable.
}

Bit fields of structures

Bit fields of structures are used to save memory, as they allow values to be packed densely, and to give convenient access to the registers of peripherals, in which the various bits can have independent functionality.

The declaration of a bit field has the following syntax:
<type> [<identifier>] : <constant>;
Here is an example:
struct test {byte a:1;b:2;c:1;d:1;long vv;};
struct test rr=0;
A bit field consists of some number of bits, which is set by the numerical expression <constant>. The value should be a positive whole number and must not exceed the number of bits in <type>.
In C-- bit fields can contain only unsigned values.
Arrays of bit fields and pointers to bit fields are not allowed.
Here are examples of access:
cpuspeed.c-- 73: EAX=rr.a;
0000012D A104010000 mov eax,[104h]
00000132 83E001 and eax,1

cpuspeed.c-- 74: EAX=rr.b;
00000135 A104010000 mov eax,[104h]
0000013A 83E006 and eax,6
0000013D D1E8 shr eax,1

cpuspeed.c-- 75: EAX=rr.c;
0000013F A104010000 mov eax,[104h]
00000144 83E008 and eax,8
00000147 C1E803 shr eax,3

cpuspeed.c-- 75: rr.c=1;
0000013F 800D0401000008 or byte ptr [104h],8
As you can see from the generated asm code, the bit fields "a", "b", "c" and "d" are compacted to all fit in one "byte" memory location.
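The mask-and-shift pattern in the generated code can be written out by hand in standard C (not C--). In this sketch the masks and shift counts are taken from the listing above, while the function and constant names are made up for illustration.

```c
#include <stdint.h>

/* Bit positions taken from the listing above:
   a = bit 0 (1 bit), b = bits 1-2 (2 bits), c = bit 3 (1 bit). */
enum { A_MASK = 0x01, B_MASK = 0x06, B_SHIFT = 1, C_MASK = 0x08, C_SHIFT = 3 };

/* Reading a field: mask off the other bits, then shift down to bit 0. */
unsigned get_a(uint8_t storage) { return storage & A_MASK; }
unsigned get_b(uint8_t storage) { return (storage & B_MASK) >> B_SHIFT; }
unsigned get_c(uint8_t storage) { return (storage & C_MASK) >> C_SHIFT; }

/* Equivalent of "rr.c=1;": set the single bit with an OR. */
uint8_t set_c(uint8_t storage) { return (uint8_t)(storage | C_MASK); }
```

Setting "c" leaves the neighbouring fields untouched, because the OR only changes bit 3 of the shared byte.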

<identifier> names a bit field, and is optional. An unnamed bit field skips the corresponding number of bits before the next field of the structure is placed. An unnamed bit field of size zero has a special meaning: it guarantees that the storage for the following bit field begins on a boundary of the type given for the unnamed bit field. That is, the following bit field is aligned to 8, 16 or 32 bits.

In C-- all bit fields are packed one after another, irrespective of the type boundaries of their identifiers. If the field that follows is not a bit field, the remaining bits up to the next byte boundary are left unused. The maximum size of a bit field is 32 bits for type dword/long, 16 bits for type word/short and 8 bits for type byte/char. Bit fields can be used in a 'union'. 'Sizeof' applied to a bit field returns the size of the field in bits. When a bit field is read, its contents are extended into the register as an unsigned integer.

The declaration of procedures in structures

C-- supports the declaration of procedures in structures, which is similar to the concept of classes in C++. That is, such a procedure becomes a method of the class. An example:
struct Point           //the declaration of the class.
{
     int x; //data items
     int y; // of the class of a type Point.
     void SetX (int); //the declaration of methods
     void SetY (int); // of the Point class.
};

void Point::SetX (int _x) //definition of the procedure of the Point class.
{
    IF((_x>=0)&&(_x<=MAX_X)) x=_x;
//The variables x, y are the members of this class and consequently access
//to them from procedures of the same class is carried out directly.
}

void main ()
Point p; //structure p instantiated in the stack.
{
  p.y = p.x = 0;
  p.SetX(1);
}
The call to "p.SetX(1);" also transfers the address of the structure (class) via the stack to the called procedure (method). This extra parameter is not specified explicitly in the source statement. In the procedure this address is available through the parameter variable "this".
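The hidden address parameter can be made visible by writing the same method in plain C (not C--), where the address of the structure has to be passed explicitly. This is only a sketch: the name "Point_SetX", the parameter name "self" and the value chosen for MAX_X are assumptions made for the example.

```c
#define MAX_X 639   /* assumed limit; the example above does not define MAX_X */

struct Point {
    int x;
    int y;
};

/* Plain-C equivalent of Point::SetX. The address of the structure arrives
   as an explicit first parameter, here named self, which plays the role
   of the implicit "this" in the C-- method. */
void Point_SetX(struct Point *self, int new_x)
{
    if (new_x >= 0 && new_x <= MAX_X)
        self->x = new_x;
}
```

The C-- call "p.SetX(1);" then corresponds to "Point_SetX(&p, 1);" in this model.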

If the procedure is declarated with the "static" keyword, the variable "this" is not transferred to the procedure and not available for use in the procedure.

A procedure declared in a structure can be dynamic. For this, its definition must begin with a colon ':' (as for ordinary dynamic procedures). However, such a dynamic procedure cannot be used as a macro.

Inheriting

C-- supports both single and multiple inheritance. The declaration of a structure with inheritance has the following syntax:
struct Derived: Base1, Base2... Basen
{
  long x0;
};
The number of base structures is not limited.

With multiple inheritance, a structure can inherit two or more copies of the same base structure. This creates an ambiguity. An example:
struct A
{
  long x, y;
  ...
};

struct B: A //the structure 'B' inherits 'A'.
{
  ...
};

struct C: A //the structure 'C' inherits 'A'.
{
  ...
};

struct D: B, C //the structure 'D' inherits 'B' and 'C'.
{
  ...
};

void main ()
D d; //structure 'D' instantiated as 'd' on stack.
{
  d.x = 0;
}
In this example the structure "D" inherits two copies of structure "A", so it contains two fields named "x". When C++ compilers encounter "d.x=0", they produce an error message.
C-- does not produce an error message; by default it accesses the "x" field from the last base structure that has a field "x". To access "x" in the first base structure, it is necessary to use this syntax:
  d.B::x=0;
From this it follows that:
  d.x=0;
and
  d.C::x=0;
are equivalent.
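One way to picture this is to model the base structures in plain C (not C--) as nested members. The member names a, b and c and the two function names are invented for this sketch, and the nesting is an assumption made for illustration, not a statement about how C-- arranges memory.

```c
struct A { long x, y; };
struct B { struct A a; };               /* B inherits A */
struct C { struct A a; };               /* C inherits A */
struct D { struct B b; struct C c; };   /* D inherits B and C: two copies of A */

/* Models "d.x = v;" and "d.C::x = v;": the x of the last base. */
void D_set_x(struct D *d, long v)   { d->c.a.x = v; }

/* Models "d.B::x = v;": the x of the first base. */
void D_set_B_x(struct D *d, long v) { d->b.a.x = v; }
```

The two copies of "x" are independent: writing one leaves the other unchanged.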

The use of pointers to access structures is described in the section "Pointers" below.

Pointers

This section gives a general introduction. The use of pointers is not as complete as is usual with C compilers.

Here are some examples of use of pointers in C--:
char *string [4] = {"string1", "string2", "string3", 0}; // a pointer array
char *str = "string4";

main ()
int i;
char *tstr;
{
    FOR (i = 0; string [i] != 0; i++) {
       WRITESTR (string [i]);
       WRITELN ();
    }
    FOR (tstr = str; byte *tstr != 0; tstr++) {
       WRITE (byte *tstr);
    }
}
Pointers can be passed as parameters to procedures.
Pointers can also be used in structures.
It is allowed to have a pointer to a pointer.

A pointer can point to a procedure, and this is the required syntax:
void (*ptr)(); // the declaration of a pointer to a procedure
In this example, the function returns void, i.e. nothing.

The compiler does not perform any type checking when assignment is made to a pointer. For example:
char *z;
z = #main;
Although "z" is defined as a pointer to char datatype, the next line is assigning the address of a function to it. The compiler will not object, which may cause an error in your program if you are not careful. On the other hand, it gives you total freedom to assign whatever you want to pointers without being hassled by the compiler.

Pointers to structures

However, usage of pointers to reference structures is different from C as C-- does not have the "->" operator.
Nor would the compiler allow me to declare a pointer as follows:
struct FileInfo *pmyfile;  //unacceptable format.
Well, even if I can declare a pointer, I can't use the "->" operator, so how do I use a pointer to reference a structure? From example code, it seems that this is a situation where we must use an explicit register, as for example:
cpuspeed.c-- 73: ESI.FileInfo.read=0;
0000013E C70600000000 mov dword ptr [esi],0
Note the syntax of the source code, "ESI.FileInfo.read=0;".

By default, we are accessing the data segment, even though in the FLAT memory model we aren't normally concerned with segments. However, say that we use the ES or FS segments to address some area of physical memory, so we want the alternative segment to be encoded into the instruction. Take ES for example:
//this time I've initialised the elements when instantiating...
cpuspeed.c-- 56: struct FileInfo myfile={0,0,0,0,0,0};
00000104 000000000000000000000000 dd 0,0,0
00000110 0000000000000000 dd 0,0
00000118 00 db 0

//now, pretend that "myfile" is in the ES (extra segment):
cpuspeed.c-- 74: ESDWORD[#myfile.read]=0;
0000015A 26C7050401000000000000 mov dword ptr es:[104h],0

cpuspeed.c-- 75: ESDWORD[myfile.read]=0;
00000165 8B3504010000 mov esi,[104h]
0000016B 26C70600000000 mov dword ptr es:[esi],0
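... I tried it with and without the "#" ...interesting. With the "#", the address of "myfile.read" (104h) is encoded directly into the instruction; without it, the contents of that location are first loaded into ESI and then used as the address.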



Other syntax

Jump labels

Jump labels are used for labeling code locations for use with an inline assembly jump instruction, or at the high-level by the C "goto". Note, the capital letters "GOTO" are also allowed, for short (+/-127) jumps.

There are two types of jump labels, global and local. Global labels, as the name suggests, are labels which are 'visible' from anywhere in the program. Local labels are only 'visible' within their own procedure block and will be undefined outside the block.

Labels are defined by an identifier followed by a colon. If the identifier used contains one or more lower case letters, it is a global jump label, otherwise it is a local jump label.

Global jump labels must not be used within inline procedures, only local labels may be used. This is important to remember, for inline procedures are inserted inline maybe in multiple places, at compile time.
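The naming rule for labels can be stated as a small check. The sketch below is plain C rather than C--, and the function name is_global_label is made up for illustration; it only restates the rule from the text and says nothing about how the compiler is implemented.

```c
#include <ctype.h>

/* Returns 1 if the identifier would be a global jump label under the
   rule above (it contains at least one lower case letter), else 0. */
int is_global_label(const char *name)
{
    for (; *name != '\0'; name++) {
        if (islower((unsigned char)*name))
            return 1;
    }
    return 0;
}
```

Under this rule "Retry" and "newnum" would be global labels, while "RETRY" and "LOOP_1" would be local.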

Swap Operator

C-- has an operator not found in C or most other languages, the swap operator. The swap operator swaps two values. The symbol is '><'. The variables on either side of the swap operator must be of the same size: 8 bit and 8 bit, 16 bit and 16 bit, or 32 bit and 32 bit. Some examples follow:

    AX >< BX;      // store the value of BX in AX and the value of AX in BX
    CH >< BL;      // swap the values of CH and BL
    dog >< cat;    // swap the values of the variable dog and the variable cat
    counter >< CX; // swap the values of counter and CX
If a swap is between two 8 bit memory variables, AL will be destroyed. If a swap is between two 16 bit memory variables, AX will be destroyed. If a swap is between 32 bit memory variables, EAX will be destroyed. In all other cases, such as a memory variable and a register, all register values will be preserved.
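The register note can be pictured with a plain C sketch (not C--) of a swap between two 8 bit memory variables. The temporary plays the role of AL. The actual instruction sequence the compiler emits is not shown in this document, so treat this as an assumption about the general idea only; the name swap8 is invented here.

```c
/* Swap two 8 bit memory variables through a temporary, the way a
   memory-to-memory swap needs a scratch register (AL in the text). */
void swap8(unsigned char *a, unsigned char *b)
{
    unsigned char tmp = *a;   /* scratch value, standing in for AL */
    *a = *b;
    *b = tmp;
}
```

When one operand is already a register, no scratch value is needed, which fits the statement that all registers are preserved in that case.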

Neg Operator

C-- supports a quick syntax for toggling the sign of a variable, the Neg operator. By placing a '-' in front of a memory variable or register, followed by a ';', the sign of the memory variable or register will be toggled. Some examples:

    -AX;     // same as 'AX = -AX;' but faster.
    -tree;   // same as 'tree = -tree;' but faster.
    -BH;     // toggle the sign of BH.

NOT Operator

C-- supports a quick syntax for applying a logical NOT to a variable, the NOT operator. By placing a '!' in front of a memory variable or register, followed by a ';', the value of the memory variable or register will be changed to the logical NOT of its current value. As the equivalent XOR forms in the examples show, every bit of the value is inverted. Some examples:

    !AX;     // same as 'AX ^= 0xFFFF;' but faster.
    !node;   // change the value of 'node' to its logical NOT.
    !CL;     // same as 'CL ^= 0xFF' but faster.
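The XOR equivalence given in the comments can be checked in plain C (not C--). The helper names not16 and not8 are invented here; the point is only that XOR with all ones inverts every bit of the value.

```c
#include <stdint.h>

/* Models "!AX;": every bit of a 16 bit value is inverted. */
uint16_t not16(uint16_t v) { return (uint16_t)(v ^ 0xFFFFu); }

/* Models "!CL;": every bit of an 8 bit value is inverted. */
uint8_t not8(uint8_t v) { return (uint8_t)(v ^ 0xFFu); }
```

For example, not16(0x00FF) gives 0xFF00, and applying either function twice returns the original value.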

Special Conditional Expressions

C-- supports six special conditional expressions:

    CARRYFLAG
    NOTCARRYFLAG
    OVERFLOW
    NOTOVERFLOW
    ZEROFLAG
    NOTZEROFLAG
These can be used in place of any normal conditional expressions. If for example you wish to execute a block of code only if the carry flag is set, then you would use the following code sequence:

    IF( CARRYFLAG )
    {
        // do some stuff here
    }
If you wish to continuously execute a block of code until the overflow flag is set, you would use something like the following section of code:

    do {
        // do your thing in here
    } while( NOTOVERFLOW );

'sizeof'

The "sizeof" operator gives the size of the memory occupied by an object or type. The format is:
 sizeof (<name of a type>)
The result is the size of memory in bytes. The operator can be applied to variables, registers, variable types, structures, text strings and files.

Examples:
  sizeof ("Test")               //result=5. includes terminating zero.
  char a = "Test";
  sizeof (a)                    //result=1. because "a" is char datatype.
  sizeof (file "filename.dat")  //result= size of the file.
  sizeof (func1)                //returns size of the procedure.
  sizeof (ss.bb)                //returns the size of "bb" member in structure "ss".
  sizeof (FileInfo)             //returns the size of structure "FileInfo".
Example of usage:
  z = sizeof (FileInfo.read);
In the case of obtaining the size of a procedure, it must have been defined earlier in the file. If it is a dynamic procedure, the size of zero will be returned.
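The string and char results listed in the examples match ordinary C, which can serve as a cross-check. The snippet below is plain C, not C--, and the function names are chosen for this sketch; the file and procedure forms of sizeof have no C counterpart and are not modelled.

```c
#include <stddef.h>

/* A string literal occupies its characters plus the terminating zero. */
size_t size_of_test_string(void)
{
    return sizeof("Test");   /* 4 characters + 1 terminator = 5 */
}

/* A single char variable occupies one byte. */
size_t size_of_char_var(void)
{
    char a = 'T';
    return sizeof(a);        /* 1 */
}
```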

Unions

Unions allow different variables to share a common memory.
The memory allocation is determined by the largest datatype in the union.
An example:
union
{
  dword regEAX;
  word regAX;
  byte regAL;
}; // have declared 3 variables located on same physical address.

void test ()
{
    regEAX = 0x2C;
    BL = regAL; //in the register BL there will be a value 0x2C.
}
Variables of various types, arrays, string variables and structures can all be combined in a union. Unions can be global or local, and can also be placed inside structures (although a union inside a structure cannot itself contain structures). Global unions can be initialized or uninitialized. To obtain an initialized union, only its first member needs to be initialized. If the first member is not initialized but a following member is, the compiler reports an error.
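The regEAX/regAL overlay from the example can be tried in plain C (not C--). The type name Regs and the function low_byte are invented for this sketch, and the result noted in the comment assumes a little-endian machine such as the x86, where the lowest byte of a dword is stored first.

```c
#include <stdint.h>

/* Three members sharing one storage location, as in the example above. */
union Regs {
    uint32_t regEAX;
    uint16_t regAX;
    uint8_t  regAL;
};

/* Store a 32 bit value and read back its first byte through the union. */
uint8_t low_byte(uint32_t value)
{
    union Regs r;
    r.regEAX = value;
    return r.regAL;   /* 0x2C for value 0x2C on a little-endian machine */
}
```

The size of the union is that of its largest member, here four bytes.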


Internal macros

These are macros that are built into the compiler, not in some external library. There are two groups: those for using the FPU (Floating Point Unit) mathematical instructions, and those for accessing the I/O ports.

Here are the FPU macros:
atan(x);     //calculate the arctangent of number x.
atan2(x,y);  //calculate the arctangent of the ratio x/y.
cos(x);      //return the cosine of angle x.
exp(x);      //return the exponential of number x (raises e, the base
             //of natural logarithms, to the power x).
fabs(x);     //calculate the absolute value of number x.
log(x);      //calculate the natural logarithm of number x.
log10(x);    //calculate the decimal logarithm of number x.
sin(x);      //return the sine of angle x.
sqrt(x);     //calculate the square root of number x.
tan(x);      //return the tangent of angle x.

You can pass and return either floating point ("float") or signed integer ("long") values to and from these macros. Note that the FPU internally does everything in floating point, but the load and store instructions can convert integer numbers.
Any parameter that is an angle must be in radians (not degrees).
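If your angles are in degrees, they need converting first. Here is a minimal sketch of the conversion in plain C (not C--); the name deg_to_rad and the constant PI are assumptions of this sketch, not part of the compiler.

```c
#define PI 3.14159265358979323846

/* Convert an angle in degrees to radians: radians = degrees * pi / 180. */
double deg_to_rad(double degrees)
{
    return degrees * PI / 180.0;
}
```

So an angle of 180 degrees would be passed to sin() or cos() as pi radians.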

Here are the port I/O macros:
inp(port)      //read one byte from a port
inportb(port)  //read one byte from a port
inport(port)   //read a word from a port
inportd(port)  //read a double word from a port
/*port - the "port" parameter is optional. If the value
  is not given, the generated instruction will be "in al,dx".
  If the port value is less than 256, the generated
  instruction will be "in al,port".*/

outp(val,port)     //writes byte "val" to a port
outportb(val,port) //writes byte "val" to a port
outport(val,port)  //writes word "val" to a port
outportd(val,port) //writes double word to a port
/*val  - the value to be written.
  port - the word with the address of the port; it is optional.
  If the value is not given, the generated instruction will
  be "out dx,al".
  If the port value is less than 256, the instruction
  will be "out port,al".*/

Inline assembly

C-- inline assembly supports all of the 8088/8086 assembly codes, plus the 80286, 80386, 80486 and Pentium to Pentium III enhanced instructions.

Here is some example inline assembly:

//at the C level, a variable declared:
dword cpuspeed;
 
$mov eax,cpuspeed //loads contents of "cpuspeed" (square brackets not allowed).
$mov edi,#smsg1+16 //the "#" means immediate-mode (address-of "smsg1").
$mov ecx,5
$newnum: //labels are allowed.
$xor edx,edx
$mov ebx,10
$div ebx
$add dl,48
$mov DSBYTE[edi],dl //keyword "DSBYTE" is required.
$sub edi,1
$loop newnum

Note that all instructions start with the "$" inline assembly specifier.
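For readers more at home in C, the example loop converts the value to five decimal digits, written right to left. Here is a plain C sketch (not C--) of the same steps; the function name put_5_digits and its parameters are invented for illustration, with "end" standing for the offset 16 used in the example.

```c
/* Write the five least significant decimal digits of value into
   buf[end-4] .. buf[end], right to left, as the asm loop does. */
void put_5_digits(char *buf, int end, unsigned long value)
{
    int count;
    for (count = 5; count > 0; count--) {      /* mov ecx,5 ... loop */
        buf[end] = (char)('0' + value % 10);   /* div ebx; add dl,48; store */
        value /= 10;                           /* quotient stays in eax */
        end--;                                 /* sub edi,1 */
    }
}
```

Leading positions are filled with the character '0' when the value has fewer than five digits.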

There is a problem for those familiar with NASM and FASM, as the instruction "mov eax,cpuspeed" loads the contents of variable "cpuspeed". You are not allowed to put square brackets to clarify this.
If you want to load the address of the variable, you put "mov eax,#cpuspeed".
Basically, the assembler syntax follows that of MASM (Microsoft assembler).

It is not allowed to have just "mov [edi],dl". You have got to specify which segment and the size, hence "DSBYTE" prefix.
Even though we are using the FLAT memory model with MenuetOS, in which CS=DS=SS (that is, point to the start of the virtual memory address space allocated for the program), the instruction still has encoded into it which segment it is accessing.

You are allowed to have a block of inline assembly. That is:

  asm {
    //asm code here.
  }

is allowed. For this you use the "asm" keyword. I found that "$ {   }" doesn't work.

Now for the ultimate integration of asm and C. If you put the following option in your source file:

#pragma option ia     //"$" and "asm" keywords not required.

Then asm code does not have to be in asm blocks. That is, "$" and "asm { }" are not required. For example:

dword cpuspeed;
void main(void)
{
cpuspeed=sys_service(5,0)/1000000; //a C statement.
push eax //wow, no asm block.
mov eax,cpuspeed
pop eax; xor eax,eax
draw_window(); //back to normal C.
}

I have just put in some random code to illustrate. You throw in asm instructions just like C statements, and you can even use the standard ";" terminator if you wish, as I've done to place more than one instruction per line.

Some keywords may clash. For example, "int" is both an instruction and a datatype. However, C-- is in many cases able to distinguish between the different usage of "int" by the context in which it is placed.
To avoid a conflict between "int" used in header (include) files and its redefinition as an instruction mnemonic, it is prudent to place the "#pragma option ia" line after the "#include" lines.
Also, this example is a clash:

EAX = int 1 + a;

To clarify that inline asm is basically MASM-compatible, here are examples:

    $jmp  short place1   //"short" keyword, +/-127 locations.
    $mov  eax,cpuspeed   //load contents of variable.
    $mov  eax,#cpuspeed  //load address of variable.


Directives and command line options

Commandline arguments

You can find out the commandline arguments by typing "c--" at the DOS prompt and hitting return without any parameters.
This will be displayed:
SPHINX C-- Compiler   Version 0.238   Jun 03 2002
USAGE: C-- [options] [FILE_NAME.INI] [SOURCE_FILE_NAME]

C-- COMPILER OPTIONS

OPTIMIZATION
/OC           optimize for code size          /DE         enable temporary expansion variable
/OS           optimize for speed              /OST        enable optimization string
/ON           enable optimization number      /AP[=n]     align start procedure
/UST          use startup code for variables  /AC[=n]     align start cycles

CODE GENERATION
/2            80286 code optimizations        /SA=####    start code address
/3            80386 code optimizations        /AL=##      set value insert byte
/4            80486 code optimizations        /WFA        fast call API procedures
/5            pentium code optimizations      /IV         initial all variables
/A            enable address alignment        /SUV=####   start address variables

PREPROCESSOR
/IP=<path>    include file path               /IA         assembly instructions as identifier
/D=<idname>   defined identifier              /CRI-       not check include file on repeated
/MIF=<file>   main input file                 /IND=<name> import name from dll

LINKING
/AT           insert ATEXIT support block     /STM        startup code in main procedure
/ARGC         insert parse command line       /NS         disable stub
/P            insert parse command line       /S=#####    set stack size
/C            insert CTRL<C> ignoring code    /WIB=#####  set image base address
/R            insert resize memory block      /WFU        add Fix Up table (for Windows32)
/ENV          insert variable with environ    /WMB        create windows mono block
/J0           disable initial jump to main()  /WS=<name>  set name stub file for win32
/J1           initial jump to main() short    /WBSS       set post data in bss section
/J2           initial jump to main() near     /WO         call API procedures on ordinals
/STUB= <name> set name stub file              /CPA        clear post area
/DOS4GW       file running with DOS4GW

OUTPUT FILES
/TEXE         DOS EXE file (model TINY)       /D32        EXE file (32bit code for DOS)
/EXE          DOS EXE file (model SMALL)      /W32        EXE for Windows32 GUI
/OBJ          OBJ output file                 /W32C       EXE for Windows32 console
/SOBJ         slave OBJ output file           /DLL        DLL for Windows32
/SYM          COM file symbiosis              /DBG        creation debug information
/SYS          device (SYS) file               /LST        creation assembly listing

MISCELLANEOUS
/HELP /H /?   help, this info                 /WORDS      list of C-- reserved words
/W            enable warning                  /LAI        list of assembler instructions
/WF=<file>    direct warnings to a file       /ME         display my name and my address
/MER=##       set maximum number errors       /X          disable SPHINXC-- header in output
/NW=##        disable select warning

Compiler directives

Many of these command line options can be overridden by compiler directives in the source file. Compiler directives start with the character "?" or "#". The "#pragma option" directive may be used to specify commandline arguments.
For example, this is what I place at the beginning of a program intended for MenuetOS:
#startaddress 0
#code32 TRUE
#pragma option X //disable sphinx header in output file.
#pragma option LST //generate asm listing file.
//#pragma option OC //optimisation for code size.(BK:makes run file bigger!!!)
//#pragma option 4 //for i80486.
//#pragma option A //align data from parity address.
#pragma option J0 //disable initial jump to main().
#resize 0 //disable mem. resizing code at start of output file.

# directive
Explanation
#includepath
Same as '/IP' on commandline. Tells the compiler where to look for files specified by the "#include" directive (see below). Example:
#includepath C:\progra~2\c--\inc
#include
This can be in two forms. Firstly:
#include "windows.h--"
The compiler first tries to open the file in the current directory. If the file is not there, it tries the directory specified by the "#includepath" directive. If "#includepath" is not given or the file is not in that directory, it tries the directory given by the "/ip=path" option on the command line. If that option is not given or the file is not in the indicated directory, it tries the directory indicated in the file "C--.INI" by the "ip=" command (note, "c--.ini" is placed in the current directory). If that command is not given or the file is not there, it tries the directory specified by the C-- environment variable. Failing all that, a last attempt is made to open the file in the directory in which the C-- compiler itself resides. Wow!
Secondly:
#include <windows.h-->
The search for the included file is made in the opposite order to the above, except that the current directory is not searched.
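The two search orders can be summarised as a small lookup. This is a plain C sketch (not C--) of the description only: the function name search_location is invented, and reading "opposite" as a strict reversal of the quoted-form order is an assumption of this sketch.

```c
#include <stddef.h>

/* Locations tried, in order, for the #include "file" form. */
static const char *const QUOTED_ORDER[] = {
    "current directory",
    "#includepath directive",
    "/ip=path command-line option",
    "ip= entry in C--.INI",
    "C-- environment variable",
    "directory of the compiler",
};
enum { QUOTED_COUNT = sizeof QUOTED_ORDER / sizeof QUOTED_ORDER[0] };

/* Returns the i-th location tried (i counts from 0), or NULL when the
   list is exhausted. A non-zero angle selects the #include <file> form:
   reversed order, with the current directory left out. */
const char *search_location(int angle, size_t i)
{
    if (!angle)
        return i < QUOTED_COUNT ? QUOTED_ORDER[i] : NULL;
    if (i >= QUOTED_COUNT - 1)
        return NULL;
    return QUOTED_ORDER[QUOTED_COUNT - 1 - i];
}
```

Under these assumptions the angle-bracket form starts with the compiler's own directory and ends with the "#includepath" directory.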
#inline
I'm not 100% sure how this works, but here is the translated explanation:
Sometimes, even when optimization for code size is enabled, it is necessary for procedures to be inserted into the code rather than called. The directive "#inline TRUE" is provided for this purpose. Likewise, with "#inline FALSE" it is possible, when optimizing for speed, to have procedures called rather than inserted.
It is important to remember that the state of the #inline directive changes automatically when the optimization mode changes. When optimization for speed is selected, #inline is set to TRUE, and when the mode is changed to optimization for code size, it is set to FALSE. Therefore apply the #inline directive only after changing the optimization mode.
One more change in the compiler: the directives that change the optimization mode, #codesize and #speed, and the #inline directive, when declared inside a procedure, apply only to the rest of that procedure, i.e. they become local. For the changes to be global, these directives must be declared outside the body of a procedure.




more on this one day maybe, when I've figured it out
-- see "Further documentation" links below.

'C--.INI' file

The "c--.ini" file is intended to set the compile options of the compiler, as an alternative to setting options on the commandline or via "#pragma option" in the source file.
Syntax is the same as for the commandline, but without the leading "/" or "-". If the "c--.ini" file is located in the directory specified by the environment variable ("set c--=<path>"), or if this variable is not defined and the "c--.ini" file resides in the same directory as the compiler c--.exe, then its parameters apply to all compiled programs.
If the file c--.ini is located in the current directory, parameters are read only from this file and apply only to the current project.

Example "C--.INI" file:
R-
X
3  ;comments are allowed, preceded by a ";".
os
The ini-file can have any name (but the extension must be ".ini"). The name of this file, with its extension, must be passed to the compiler on the command line. The file c--.ini is processed automatically before the file named on the command line is loaded.
Thus, a "*.ini" file can be used much like a makefile: in it you can specify the name of the main compiled unit and, if necessary, any customization of the compilation.

Assembly language instructions

You can find out what assembly language instructions are supported by the compiler by typing "c-- /lai". For version 0.238 I got this:
SPHINX C-- Compiler   Version 0.238   Jun 03 2002
LIST OF SUPPORTED ASSEMBLER INSTRUCTIONS:
AAA AAD AAM AAS ADC ADD ADDPS ADDSS
ADRSIZE AND ANDNPS ANDPS ARPL
BOUND BSF BSR BSWAP BT BTC BTR BTS
CALL CALLF CBW CDQ CLC CLD CLI CLTS
CMC CMOVA CMOVAE CMOVB CMOVBE CMOVC CMOVE CMOVG
CMOVGE CMOVL CMOVLE CMOVNA CMOVNAE CMOVNB CMOVNBE CMOVNC
CMOVNE CMOVNG CMOVNGE CMOVNL CMOVNLE CMOVNO CMOVNP CMOVNS
CMOVNZ CMOVO CMOVP CMOVPE CMOVPO CMOVS CMOVZ CMP
CMPPS CMPSB CMPSD CMPSS CMPSW CMPXCHG CMPXCHG8B COMISS
CPUID CVTPI2PS CVTPS2PI CVTSI2SS CVTSS2SI CVTTPS2PI CVTTSS2SI CWD
CWDE
DAA DAS DB DD DEC DIV DIVPS DIVSS
DW
EMMS EMMX ENTER
F2XM1 FABS FADD FADDP FBLD FBSTP FCHS FCLEX
FCMOVB FCMOVBE FCMOVE FCMOVNB FCMOVNBE FCMOVNE FCMOVNU FCMOVU
FCOM FCOMI FCOMIP FCOMP FCOMPP FCOS FDECSTP FDISI
FDIV FDIVP FDIVR FDIVRP FENI FFREE FIADD FICOM
FICOMP FIDIV FIDIVR FILD FILDQ FIMUL FINCSTP FINIT
FIST FISTP FISUB FISUBR FLD FLD1 FLDCW FLDENV
FLDL2E FLDL2T FLDLG2 FLDLN2 FLDPI FLDZ FMUL FMULP
FNCLEX FNDISI FNENI FNINIT FNOP FNSAVE FNSETPM FNSTCW
FNSTENV FNSTSW FPATAN FPREM FPREM1 FPTAN FRNDINT FRSTOR
FSAVE FSCALE FSETPM FSIN FSINCOS FSQRT FST FSTCW
FSTENV FSTP FSTSW FSUB FSUBP FSUBR FSUBRP FTST
FUCOM FUCOMI FUCOMIP FUCOMP FUCOMPP FWAIT FXAM FXCH
FXRSTOR FXSAVE FXTRACT FYL2X FYL2XP1
HALT HLT
IDIV IMUL IN INC INSB INSD INSW INT
INTO INVD INVLPD INVLPG IRET IRETD
JA JAE JB JBE JC JCXZ JE JECXZ
JG JGE JL JLE JMP JMPF JMPN JMPS
JNA JNAE JNB JNBE JNC JNE JNG JNGE
JNL JNLE JNO JNP JNS JNZ JO JP
JPE JPO JS JZ
LAHF LAR LDMXCSR LDS LEA LEAVE LES LFS
LGDT LGS LIDT LLDT LMSW LOADALL LOCK LODSB
LODSD LODSW LOOP LOOPD LOOPE LOOPNE LOOPNZ LOOPW
LOOPZ LSL LSS LTR
MASKMOVQ MAXPS MAXSS MINPS MINSS MOV MOVAPS MOVD
MOVHLPS MOVHPS MOVLHPS MOVLPS MOVMSKPS MOVNTPS MOVNTQ MOVQ
MOVSB MOVSD MOVSS MOVSW MOVSX MOVUPS MOVZX MUL
MULPS MULSS NEG NOP NOT
OPSIZE OR ORPS OUT OUTSB OUTSD OUTSW
PACKSSDW PACKSSWB PACKUS PACKUSWB PADDB PADDD PADDSB PADDSW
PADDUSB PADDUSW PADDW PAND PANDN PAVGB PAVGW PCMPEQB
PCMPEQD PCMPEQW PCMPGTB PCMPGTD PCMPGTW PEXTRW PINSRW PMADD
PMADDWD PMAXSW PMAXUB PMINSW PMINUB PMOVMSKB PMULH PMULHUW
PMULHW PMULL PMULLW POP POPA POPAD POPF POPFD
POR PREFETCHNTA PREFETCHT0 PREFETCHT1 PREFETCHT2 PSADBW PSHUFW
PSLLD PSLLQ PSLLW PSRAD PSRAW PSRLD PSRLQ PSRLW
PSUBB PSUBD PSUBSB PSUBSW PSUBUSB PSUBUSW PSUBW PUNPCKHBW
PUNPCKHDQ PUNPCKHWD PUNPCKLBW PUNPCKLDQ PUNPCKLWD PUSH PUSHA
PUSHAD PUSHF PUSHFD PXOR
RCL RCPPS RCPSS RCR RDMSR RDPMC RDTSC REP
REPE REPNE REPNZ REPZ RET RETF ROL ROR
RSM RSQRTPS RSQRTSS
SAHF SAL SAR SBB SBC SCASB SCASD SCASW
SETA SETAE SETALC SETB SETBE SETC SETE SETG
SETGE SETL SETLE SETNA SETNAE SETNB SETNBE SETNC
SETNE SETNG SETNGE SETNL SETNLE SETNO SETNP SETNS
SETNZ SETO SETP SETPE SETPO SETS SETZ SFENCE
SGDT SHL SHLD SHR SHRD SHUFPS SIDT SLDT
SMSW SQRTPS SQRTSS STC STD STI STMXCSR STOSB
STOSD STOSW STR SUB SUBPS SUBSS SYSENTER SYSEXIT
TEST
UCOMISS UD2 UNPCKHPS UNPCKLPS
VERR VERW
WAIT WBINVD WRMSR
XADD XCHG XLAT XLATB XOR XORPS


Appendices

How to install C-- on your computer.

Installing C-- on your computer is very simple. Let's assume that you have decided to install C-- on the C: drive.
  1. Create the C-- folder on the C: drive (for example with the command "MD C--").
  2. Copy the library files (files with the *.H-- extension) into this folder.
  3. Copy the files C--.EXE, MAINLIB.LDP and STARTUP.H-- into the same folder (the file C--.INI is not necessary).
  4. In the "autoexec.bat" file, add an environment variable: "SET C--=C:\C--"
  5. The library files can be placed in another folder, but then a line must be added to the file "c--.ini": "ip=c:\path_to_lib"
If the compiler is located in the working directory, the environment variable for C-- is not mandatory.

For further information on installation, see my example Windows application at http://www.goosee.com/cmm/.

Further documentation

Michael has provided me with documents translated to English, that require some rewording and inclusion in this document. For completeness, you may access them as-is:
code32.txt    32-bit programming.
comstr.txt    Commandline parameters.
comstrd.txt   Optimisation of numerical expressions.
directin.txt  Conditional compilation.
ifloop.txt    Conditional instructions.
import.txt    FROM and EXTRACT.
index.txt     Index addressing.
instr.txt     Compiler directives.
label.txt     Labels of transition (jump labels).
output.txt    Output files.
sintc.txt     Special operators.
sys.txt       Compilation of device drivers.


(c) Barry Kauler 2002. May be distributed, in original form only. This copyright notice and link must be retained.
http://www.goosee.com/explorer/