Discussion:
[OpenWatcom] building a COM file without pulling in the Watcom standard library
Mateusz Viste
2023-07-31 15:28:58 UTC
Hello all,

I am experimenting with OpenWatcom, trying to compile a simple C
file into a COM executable. The tricky part is that I'd like to avoid
pulling in Watcom's standard library and startup code in the process.

I have this TEST.C file:

void main(void) {
    static char *hello = "Hello$";
    _asm {
        mov ah, 9
        mov dx, hello
        int 0x21
    }
}

I compile it into an object file and pass it to wlink using these commands:

wcc -0 -ms -od -s -d0 -zl -zls test.c
wlink @TEST.LNK

I filled TEST.LNK with the following directives:

FORMAT DOS COM
FILE test.obj
OPTION NODEFAULTLIBS
NAME TEST.COM
OPTION START=main_

I do get "something" out of this, but the binary file does not work
properly. The disassembled COM file looks like this:

00000000 53 push bx
00000001 51 push cx
00000002 52 push dx
00000003 56 push si
00000004 57 push di
00000005 B409 mov ah,0x9
00000007 8B160C00 mov dx,[0xc] <-- this should be 0x11C
0000000B CD21 int 0x21
0000000D 5F pop di
0000000E 5E pop si
0000000F 5A pop dx
00000010 59 pop cx
00000011 5B pop bx
00000012 C3 ret
00000013 004865 add [bx+si+0x65],cl
00000016 6C insb
00000017 6C insb
00000018 6F outsw
00000019 2400 and al,0x0
0000001B 0004 add [si],al <-- this should be 0x114
0000001D 00 db 0x00

It appears that the COM file is not being originated at offset 0x100,
despite the "FORMAT DOS COM" wlink directive. It's also not
0-originated, so I am not sure how the offsets are calculated exactly.
Once I fix them with a hex editor, the executable works.
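The hand fix described here can be sketched in C. Both wrong values are off by the same amount, 0x110 (0x11C - 0x00C and 0x114 - 0x004), i.e. the 0x100 COM origin plus another 0x10. The helper names are hypothetical, and the byte offsets (0x09 for the displacement of the `mov dx,[0xc]` that starts at 0x07, 0x1C for the stored pointer) are read off the disassembly above.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit word (x86 byte order) at buf[off]. */
static uint16_t read_le16(const uint8_t *buf, size_t off)
{
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/* Hypothetical helper: add delta to the 16-bit word at buf[off],
   wrapping modulo 0x10000 like a near offset does. */
static void patch_le16_add(uint8_t *buf, size_t off, uint16_t delta)
{
    uint16_t v = (uint16_t)(read_le16(buf, off) + delta);
    buf[off] = (uint8_t)(v & 0xFFu);
    buf[off + 1] = (uint8_t)(v >> 8);
}
```

Applying `patch_le16_add(img, 0x09, 0x110)` and `patch_le16_add(img, 0x1C, 0x110)` to the 30-byte image turns 000Ch into 011Ch and 0004h into 0114h.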

What am I missing here?

Mateusz
JJ
2023-08-01 11:38:45 UTC
Post by Mateusz Viste
Hello all,
I am experimenting with OpenWatcom, trying to compile a simple C
file into a COM executable. The tricky part is that I'd like to avoid
pulling in watcom's standard library and startup code in the process.
void main(void) {
static char *hello = "Hello$";
_asm {
mov ah, 9
mov dx, hello
int 0x21
}
}
From the assembler's perspective, `hello` in `mov dx, hello` means the
content of the variable, not its address. You'll need to use the `offset`
operator, i.e.:

mov dx, offset hello
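The two forms can be contrasted with a toy C model that treats DS as a 64 KiB byte array. `Segment`, `mov_dx_mem` and `mov_dx_offset` are hypothetical names, and the addresses are the ones the first post expected (a `char *` variable at 0x11C pointing to the string at 0x114). Because `hello` is itself a pointer, the memory form yields the string's address, whereas `offset hello` yields the address of the pointer variable.

```c
#include <stdint.h>

/* Toy model of the data segment: DS:0000..DS:FFFF as a byte array. */
typedef struct {
    uint8_t bytes[65536];
} Segment;

/* "mov dx, hello": memory operand, DX receives the word stored AT hello. */
static uint16_t mov_dx_mem(const Segment *ds, uint16_t addr)
{
    uint16_t next = (uint16_t)(addr + 1u);  /* offsets wrap within 64 KiB */
    return (uint16_t)(ds->bytes[addr] | (ds->bytes[next] << 8));
}

/* "mov dx, offset hello": immediate, DX receives hello's address itself. */
static uint16_t mov_dx_offset(uint16_t addr)
{
    return addr;
}
```

With the pointer bytes 14h 01h stored at 11Ch, `mov_dx_mem` returns 0114h and `mov_dx_offset` returns 011Ch.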
Post by Mateusz Viste
FORMAT DOS COM
FILE test.obj
OPTION NODEFAULTLIBS
NAME TEST.COM
OPTION START=main_
I'm not familiar with the Watcom linker, but the linker user's guide says to use
these:

system com
option map
name app_name
file obj1, obj2, ...
library lib1, lib2, ...

https://open-watcom.github.io/open-watcom-v2-wikidocs/lguide.pdf

Section "2.2.2 Linking 16-bit x86 DOS .COM Executable Files". Page 8.
Post by Mateusz Viste
I do get "something" out of this, but the binary file does not work
00000000 53 push bx
00000001 51 push cx
00000002 52 push dx
00000003 56 push si
00000004 57 push di
00000005 B409 mov ah,0x9
00000007 8B160C00 mov dx,[0xc] <-- this should be 0x11C
0000000B CD21 int 0x21
0000000D 5F pop di
0000000E 5E pop si
0000000F 5A pop dx
00000010 59 pop cx
00000011 5B pop bx
00000012 C3 ret
00000013 004865 add [bx+si+0x65],cl
00000016 6C insb
00000017 6C insb
00000018 6F outsw
00000019 2400 and al,0x0
0000001B 0004 add [si],al <-- this should be 0x114
0000001D 00 db 0x00
It appears that the COM file is not being originated at offset 0x100,
despite the "FORMAT DOS COM" wlink directive. It's also not
0-originated, so I am not sure how the offsets are calculated exactly.
Once I fix them with a hex editor, the executable works.
If you disassemble a binary with a blind disassembler (one that doesn't know
the binary's file format or platform), all file bytes will be treated as code,
and addresses will start at zero or at the disassembler's predefined address.

Also check the compiled binary. Make sure it doesn't start with "MZ", which
marks an EXE binary; if it does, you have an EXE binary named as a COM file. In
that case, if the code, data, and stack segments are all the same (usually a
tiny memory model module), try a tool like EXE2BIN to extract only the EXE
body (excluding the header).
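The "MZ" check is easy to automate. A minimal C sketch; `has_exe_signature` is a hypothetical name, and it also accepts the byte-swapped "ZM" form, which DOS treats as an EXE signature as well:

```c
#include <stdio.h>

/* Return 1 if the file starts with an EXE signature ("MZ" or "ZM"),
   0 if it does not (or is shorter than 2 bytes), -1 if it can't be opened. */
static int has_exe_signature(const char *path)
{
    unsigned char sig[2];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(sig, 1, sizeof sig, f);
    fclose(f);
    if (n < sizeof sig)
        return 0;  /* too short to carry an EXE header */
    return (sig[0] == 'M' && sig[1] == 'Z') ||
           (sig[0] == 'Z' && sig[1] == 'M');
}
```

For the file in the first post, which begins with 53h (`push bx`), this returns 0.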
Mateusz Viste
2023-08-01 12:50:39 UTC
Post by JJ
Post by Mateusz Viste
void main(void) {
static char *hello = "Hello$";
_asm {
mov ah, 9
mov dx, hello
int 0x21
}
}
The `hello` from `mov dx, hello` in assembly's perspective, means the
content of the variable. Not its address.
Sure, but the variable at hand is a pointer, so what I am effectively
interested in is where it points to (i.e. its content). The variable
itself acts only as a convenient label.
Post by JJ
Post by Mateusz Viste
FORMAT DOS COM
FILE test.obj
OPTION NODEFAULTLIBS
NAME TEST.COM
OPTION START=main_
I'm not familiar with Watcom linker, but the linker user's guide says
system com
Yes, and the program I posted does work all right when linked as
"system com", but a side effect of "system com" is that it forces the
import of Watcom's startup code and symbols. "format dos com" does not
(but yields a non-working binary where I have to fix addresses by hand
with a hex editor).
Post by JJ
If you disassemble a binary using a blind disassembler (which doesn't
know the binary file format and platform), all file bytes will be
treated as code, and will start at zero or at the disassembler
application's predefined address.
This is a COM file, so it's raw code. The only data here is the "hello"
string, but it's placed at the end of the binary so it's easy to spot.
Post by JJ
Also check the compiled binary. Make sure it doesn't start with "MZ",
I posted the exact, full content of the binary. It's a COM file, no MZ.

Earlier today I stumbled upon an interesting stackoverflow discussion,
where Peter Szabo was trying to achieve something very similar to what
I am doing now, and Michael Petch provided some deep insight into the
matter:
https://stackoverflow.com/questions/62473231/small-model-dos-exe-compiled-and-linked-by-openwatcom-crashes

My understanding is that wlink needs the startup code to figure out how
to lay out the program's memory. Things seem to be much more convoluted
than I expected, even for building a simple COM image.

Peter Szabo ended up creating a specialized tool to solve the problem:
https://github.com/pts/dosmc/blob/master/dosmc.dir/dosmc.pl

This is not a road I am willing to take, since I was simply (naively,
perhaps) looking for a way of building minimalist COM files using the C
language, hoping that Open Watcom would be able to do so, if properly
instructed. I think now that I might be using the wrong tool for the
job.

Mateusz
T. Ment
2023-08-01 14:23:21 UTC
Post by Mateusz Viste
looking for a way of building minimalist COM files using the C
language, hoping that Open Watcom would be able
Turbo C tiny model maybe. Never needed one myself though. IDK.
Alexei A. Frounze
2023-08-03 03:28:34 UTC
Post by Mateusz Viste
Hello all,
I am experimenting with OpenWatcom, trying to compile a simple C
file into a COM executable. The tricky part is that I'd like to avoid
pulling in watcom's standard library and startup code in the process.
void main(void) {
static char *hello = "Hello$";
_asm {
mov ah, 9
mov dx, hello
int 0x21
}
}
wcc -0 -ms -od -s -d0 -zl -zls test.c
FORMAT DOS COM
FILE test.obj
OPTION NODEFAULTLIBS
NAME TEST.COM
OPTION START=main_
I do get "something" out of this, but the binary file does not work
It appears that the COM file is not being originated at offset 0x100,
despite the "FORMAT DOS COM" wlink directive. It's also not
0-originated, so I am not sure how the offsets are calculated exactly.
Once I fix them with a hex editor, the executable works.
What am I missing here?
If you make your own startup code with proper
----8<----
org 100h
_cstart_:
...
end _cstart_
----8<----
It may just work.

Alex
Mateusz Viste
2023-08-03 08:21:32 UTC
Post by Alexei A. Frounze
If you make your own startup code with proper
----8<----
org 100h
...
end _cstart_
----8<----
It may just work.
Is org 100h really required in this context? Isn't it the job of the
linker to compute proper addresses?

I tried nonetheless, but nasm does not understand the "org" directive
when using the -f obj target. It's apparently only valid for -f bin.

Mateusz
Alexei A. Frounze
2023-08-04 02:41:07 UTC
Post by Mateusz Viste
Post by Alexei A. Frounze
If you make your own startup code with proper
----8<----
org 100h
...
end _cstart_
----8<----
It may just work.
Is org 100h really required in this context? Isn't it the job of the
linker to compute proper addresses?
Perhaps, but that's how the OBJ/OMF format has worked for years
in TASM, MASM, WASM.
Post by Mateusz Viste
I tried nonetheless, but nasm does not understand the "org" directive
when using the -f obj target. It's apparently only valid for -f bin.
Why not just use WASM if you're already using WCC/WCL/WLINK/etc?

Anyhow, if you dig into your beloved NASM's nasmdoc.txt, you'll find this:
----8<----
8.2.2 Using the `obj' Format To Generate `.COM' Files

If you are writing a `.COM' program as more than one module, you may
wish to assemble several `.OBJ' files and link them together into a
`.COM' program. You can do this, provided you have a linker capable
of outputting `.COM' files directly (TLINK does this), or
alternatively a converter program such as `EXE2BIN' to transform the
`.EXE' file output from the linker into a `.COM' file.

If you do this, you need to take care of several things:

(*) The first object file containing code should start its code
segment with a line like `RESB 100h'. This is to ensure that the
code begins at offset `100h' relative to the beginning of the
code segment, so that the linker or converter program does not
have to adjust address references within the file when
generating the `.COM' file. Other assemblers use an `ORG'
directive for this purpose, but `ORG' in NASM is a format-
specific directive to the `bin' output format, and does not mean
the same thing as it does in MASM-compatible assemblers.
----8<----
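A conceptual C sketch of why the `RESB 100h` trick works, assuming the simplest possible converter: the first 100h bytes of the code segment are only reserved space, so every address the linker computes already includes the +100h, and producing the COM file amounts to dropping that reserved prefix (roughly what an EXE2BIN-style conversion does for such an image). `com_payload` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define COM_ORIGIN 0x100u

/* Hypothetical converter step: image[] is the linked code segment, laid
   out from offset 0 and starting with RESB 100h / ORG 100h filler. The COM
   file is everything after that filler; no address needs fixing because
   the linker already computed them with the +100h included.
   Returns NULL (leaving *out_len untouched) if the image is too short. */
static const uint8_t *com_payload(const uint8_t *image, size_t image_len,
                                  size_t *out_len)
{
    if (image_len < COM_ORIGIN)
        return NULL;
    *out_len = image_len - COM_ORIGIN;
    return image + COM_ORIGIN;
}
```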

HTH,
Alex
Mateusz Viste
2023-08-04 12:03:02 UTC
Post by Alexei A. Frounze
Post by Mateusz Viste
Is org 100h really required in this context? Isn't it the job of
the linker to compute proper addresses?
Perhaps, but that's how the OBJ/OMF format has worked for years
in TASM, MASM, WASM.
I didn't know that, I naively assumed the linker would recompute all
addresses.
Post by Alexei A. Frounze
The first object file containing code should start its code
segment with a line like `RESB 100h'. This is to ensure that
the code begins at offset `100h' relative to the beginning
of the code segment, so that the linker or converter program
does not have to adjust address references within the file
Thanks for your research, it does explain quite a few things. For
starters, it means that there is no chance to end up with working
machine code without inserting some kind of startup code that would
enforce the 0x100 address of the entry point (and probably perform some
other magic, like setting up segments to whatever expectations the C
compiler has...).

Before trying to (ab)use Open Watcom I shortly played with SmlrC. It
looked very promising, but its inline assembly wasn't resolving C
symbols (i.e. asm "mov ax, myvar" was literally writing "mov ax,
myvar" into the resulting asm file instead of resolving the address of
the myvar short int).

I also glanced at Dave Dunfield's micro C, but found it even more
limited. And then there were a couple of other things I checked, but
they were mostly limited to a subset of K&R C.

Are there any alternatives that could generate a tiny (no
ugly startup code or libc calls) COM file from ANSI C code with
inline assembly?

The only one I still have to check out is DOSMC by Peter Szabo,
but sadly it does not work in DOS.

Mateusz
R.Wieser
2023-08-03 08:39:49 UTC
Mateusz,
...
Post by Mateusz Viste
OPTION START=main_
A COM file *always* starts at 0x0100. Maybe this directive interferes with
it?
Post by Mateusz Viste
00000007 8B160C00 mov dx,[0xc] <-- this should be 0x11C
the 0xC is almost the offset from that command's address to the string (off
by one). IOW, it looks like the resolving (by "wlink") didn't quite kick
in. Maybe the linker needs to be told that it is converting a COM style
program too?
Post by Mateusz Viste
0000001B 0004 add [si],al <-- this should be 0x114
AFAIKS everything from address 0x1A is beyond your code/program. IOW, no
idea what the remark is about.

Regards,
Rudy Wieser
Mateusz Viste
2023-08-03 10:07:34 UTC
Post by R.Wieser
Post by Mateusz Viste
OPTION START=main_
A COM file *always* starts at 0x0100. Maybe this directive
interferes with it ?
Without "OPTION START" the result is exactly the same, with the only
difference that wlink complains about "no starting address found".
Post by R.Wieser
Post by Mateusz Viste
00000007 8B160C00 mov dx,[0xc] <-- this should be 0x11C
the 0xC is almost the offset from that commands address to the string
(off by one). IOW, it looks like the resolving (by "wlink") didn't
quite kick in. Maybe the linker needs to be told that it is
converting a COM style program too ?
The only wlink options I found in this context are "FORMAT DOS COM" and
"SYSTEM COM". The former is not computing addresses properly (as
shown in this thread) and the latter forces watcom's startup code to be
pulled in, resulting in a kilobyte of bloat.

Looking at the DOSMC tool from Peter Szabo, it looks like there are
quite a few hoops to jump through: https://github.com/pts/dosmc
I was probably a bit naive to think that there would be a ready-to-go
wlink switch that would generate working COM files without the Watcom
startup bloat.
Post by R.Wieser
Post by Mateusz Viste
0000001B 0004 add [si],al <-- this should be 0x114
AFAIKS everything from address 0x1A is beyond your code/program.
IOW, no idea what the remark is about.
You are correct that 0x1B is beyond code, but it is not beyond data.
My understanding is that the "04 00" value starting at 1Ch is a near
pointer that is supposed to be loaded by mov dx,[0x0c] (ie. "load DX
with the value at memory location 0x0C"). Both the MOV and the pointer
are badly addressed though, that is why I needed to fix them both by
hand to get a working executable. How exactly the values 0x000C and 0x0004
were computed, I have no idea.

Mateusz
R.Wieser
2023-08-03 12:35:29 UTC
Mateusz,
Post by Mateusz Viste
Post by R.Wieser
A COM file *always* starts at 0x0100. Maybe this directive
interferes with it ?
Without "OPTION START" the result is exactly the same, with the only
difference that wlink complains about "no starting address found".
That strengthens my gut feeling that the linker program /also/ needs to be
told what kind of executable to generate - its faulty 0xC offset in the COM
program being a result of not exactly knowing what to do.
Post by Mateusz Viste
Post by R.Wieser
Post by Mateusz Viste
0000001B 0004 add [si],al <-- this should be 0x114
AFAIKS everything from address 0x1A is beyond your code/program.
IOW, no idea what the remark is about.
You are correct that 0x1B is beyond code, but it is not beyond data.
My understanding is that the "04 00" value starting at 1Ch is a near
pointer that is supposed to be loaded by mov dx,[0x0c]
Ackkk.... I overlooked that you are doing an indirect load. :-\

Hmmm. In that case your program seems to expect DS to be one paragraph (0x10
bytes) beyond the CS segment. Makes some sense, giving the data segment as
much free space as possible.

It's still strange for a pure COM file, though. No idea how to determine that
DS-to-CS offset (it might be an internally generated label).
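This reading can be checked mechanically: assume DS sits one paragraph (10h bytes) past the first byte of the image, follow the `mov dx,[000Ch]`, and see whether DX ends up pointing at the string. The array is transcribed from the first post's disassembly (the two lone zero bytes are presumed to be alignment filler); `DS_BASE_IN_FILE`, `le16_at` and `load_dx` are hypothetical names, and 10h is the assumption being tested, not something the wlink documentation states.

```c
#include <stddef.h>
#include <stdint.h>

/* The 30-byte TEST.COM from the first post, transcribed from its listing. */
static const uint8_t test_com[30] = {
    0x53, 0x51, 0x52, 0x56, 0x57,        /* 00h: push bx, cx, dx, si, di  */
    0xB4, 0x09,                          /* 05h: mov ah,9                 */
    0x8B, 0x16, 0x0C, 0x00,              /* 07h: mov dx,[000Ch]           */
    0xCD, 0x21,                          /* 0Bh: int 21h                  */
    0x5F, 0x5E, 0x5A, 0x59, 0x5B, 0xC3,  /* 0Dh: pop di..bx, ret          */
    0x00,                                /* 13h: presumably padding       */
    'H', 'e', 'l', 'l', 'o', '$', 0x00,  /* 14h: string literal           */
    0x00,                                /* 1Bh: presumably padding       */
    0x04, 0x00                           /* 1Ch: static pointer = 0004h   */
};

/* Assumption under test: DS:off corresponds to file offset 10h + off. */
#define DS_BASE_IN_FILE 0x10u

/* Little-endian 16-bit word at p[off]. */
static uint16_t le16_at(const uint8_t *p, size_t off)
{
    return (uint16_t)(p[off] | (p[off + 1] << 8));
}

/* Emulate "mov dx,[disp]": fetch disp from the instruction bytes,
   then load the word found at DS:disp. */
static uint16_t load_dx(const uint8_t *img, size_t disp_file_off)
{
    uint16_t disp = le16_at(img, disp_file_off);
    return le16_at(img, DS_BASE_IN_FILE + disp);
}
```

Under that assumption `load_dx(test_com, 0x09)` returns 0004h, and DS:0004h lands on file offset 14h, the start of `Hello$`. The linker's numbers are therefore self-consistent; they just presume a DS value that a bare COM program never sets up.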
Post by Mateusz Viste
How the values 0x000C and 0x0004 have been computed exactly, this
I have no idea.
See above. :-)

Regards,
Rudy Wieser
R.Wieser
2023-08-04 06:31:49 UTC
Mateusz,

Something I overlooked/ignored while trying to figure out those wonky
offsets:

[quote=me]
A COM file *always* starts at 0x0100.
[/quote]
Post by Mateusz Viste
00000000 53 push bx
...

Either your disassembler is doing something funny, or it really thinks your
program starts at 0x0000 (and not 0x0100) ...

Could you add a command like "lea ax,main" and see which address "main" gets
translated to? It should of course show 0x0100. If it does not, then
the linker didn't generate a COM style file to begin with.

Also, have you checked the binary contents of your (supposed) .COM file ?
If it starts with "MZ" ...

Regards,
Rudy Wieser
Mateusz Viste
2023-08-04 12:03:24 UTC
Post by R.Wieser
Could you add a command like "lea ax,main" and see which address
"main" gets translated to? It should of course show 0x0100. If it
does not, then the linker didn't generate a COM style file to
begin with.
Here it is:

00000005 8D060000 lea ax,[0x0]

Not unexpectedly, main() starts at offset 0 because the linker does not
compute the addresses with an extra +0x100. Which is the whole issue.
Post by R.Wieser
Also, have you checked the binary contents of your (supposed) .COM
file ? If it starts with "MZ" ...
The disassembly I posted in the initial message truly is the entirety
of the generated file. It starts with push bx. No "MZ" nor any other
header.

It appears that without startup code, wlink simply won't generate a
proper COM. One can then either rely on the (huge! 1K) startup provided
by Watcom, by using the "SYSTEM COM" wlink directive, or hand-craft
one's own startup code that mimics whatever the original startup needs
to set up. In other words, I take it there is no easy way to achieve what
I was looking for.

Mateusz
R.Wieser
2023-08-04 14:29:24 UTC
Mateusz,
Post by Mateusz Viste
It appears that without startup code, wlink simply won't generate
a proper COM.
I did a quick DDG search for "OpenWatcom create COM style file", got
https://stackoverflow.com/questions/46408334/com-executables-with-open-watcom
and noticed the response by "BlackJack".

From there I did another search for "OpenWatcom set model tiny", and from
the DDG result (https://github.com/open-watcom/open-watcom-v2/issues/275)
and your initial post I noticed that you are compiling with the "-ms" switch
(small memory model), which is incompatible with a COM style executable.
Try "-mt" (tiny memory model) instead.

Regards,
Rudy Wieser
Mateusz Viste
2023-08-04 14:56:38 UTC
Post by R.Wieser
I noticed that you are compiling with the "-ms" switch
(small memory model), which is incompatible with a COM style
executable. Try "-mt" (tiny memory model) instead.
"-ms" is the proper switch for compiling object files for COM
executables. In fact, the wcc compiler doesn't even understand -mt.

"-mt" is only a convenience switch for wcl (Watcom's "compile & link"
tool) so it knows that after executing wcc -ms it has to pass the
"SYSTEM COM" option to wlink.

Building a COM itself is well documented and hence easy to achieve. The
problem here is that I was trying to make Open Watcom build a tiny (as
in "very small") COM file by avoiding Watcom's libc and startup code,
i.e. passing "OPTION NODEFAULTLIBS" to wlink. Then the COM file indeed
becomes very small, but it also ceases working, as the generated code
seems to expect to be executed within an environment prepared by
Watcom's startup routines.

Most probably my expectations towards Open Watcom were too high. It is
an awesome tool, but it's simply not designed to build minimalist COM
files without major hackery.

Such hackery has been done by Peter Szabo (aka pts). I just tested
his DOSMC tool, and it compiled this program:

void main(void) {
    static char *hello = "Hello$";
    _asm {
        lea ax, main
        mov ah, 9
        mov dx, hello
        int 0x21
    }
}

Into this:

00000000 E80400 call word 0x7
00000003 B44C mov ah,0x4c
00000005 CD21 int 0x21
00000007 53 push bx
00000008 51 push cx
00000009 52 push dx
0000000A 56 push si
0000000B 57 push di
0000000C 8D060701 lea ax,[0x107]
00000010 B409 mov ah,0x9
00000012 8B162501 mov dx,[0x125]
00000016 CD21 int 0x21
00000018 5F pop di
00000019 5E pop si
0000001A 5A pop dx
0000001B 59 pop cx
0000001C 5B pop bx
0000001D C3 ret
0000001E 48 dec ax
0000001F 656C gs insb
00000021 6C insb
00000022 6F outsw
00000023 2400 and al,0x0
00000025 1E push ds
00000026 01 db 0x01

Works perfectly, at least on this simple test example.
Too bad DOSMC is a Perl, Linux-only tool.

https://github.com/pts/dosmc
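In the DOSMC build every absolute reference equals file offset + 100h, which a few assertions over the listed bytes confirm. The array is transcribed from the disassembly above, and `le16_at` is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>

#define COM_ORIGIN 0x100u

/* The 39-byte DOSMC-built COM file, transcribed from the listing. */
static const uint8_t dosmc_com[] = {
    0xE8, 0x04, 0x00,                    /* 00h: call 0007h               */
    0xB4, 0x4C, 0xCD, 0x21,              /* 03h: mov ah,4Ch / int 21h     */
    0x53, 0x51, 0x52, 0x56, 0x57,        /* 07h: main: push bx..di        */
    0x8D, 0x06, 0x07, 0x01,              /* 0Ch: lea ax,[0107h]           */
    0xB4, 0x09,                          /* 10h: mov ah,9                 */
    0x8B, 0x16, 0x25, 0x01,              /* 12h: mov dx,[0125h]           */
    0xCD, 0x21,                          /* 16h: int 21h                  */
    0x5F, 0x5E, 0x5A, 0x59, 0x5B, 0xC3,  /* 18h: pop di..bx, ret          */
    'H', 'e', 'l', 'l', 'o', '$', 0x00,  /* 1Eh: string literal           */
    0x1E, 0x01                           /* 25h: pointer = 011Eh          */
};

/* Little-endian 16-bit word at p[off]. */
static uint16_t le16_at(const uint8_t *p, size_t off)
{
    return (uint16_t)(p[off] | (p[off + 1] << 8));
}
```

The `lea` operand (0107h), the `mov` operand (0125h) and the stored pointer (011Eh) all point 100h past the file offsets of `main`, the pointer variable and the string. The relative `call` at 00h needs no such adjustment, since its displacement (0004h) is position-independent.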


Mateusz
R.Wieser
2023-08-04 17:02:28 UTC
Mateusz,
Post by Mateusz Viste
Post by R.Wieser
Try "-mt" (tiny memory model) instead.
"-ms" is the proper switch for compiling object files for COM
executables. In fact, the wcc compiler doesn't even understand -mt.
That's too bad. At least you can't say I didn't try. :-)
Post by Mateusz Viste
Building a COM itself is well documented and hence easy to achieve.
The problem here is that I was trying to make Open Watcom build a tiny
(as in "very small") COM file by avoiding Watcom's libc and startup code,
I read your first message describing that. Don't worry.
Post by Mateusz Viste
Then the COM file indeed becomes very small, but it also ceases working,
as the generated code seems to expect to be executed within an environment
prepared by Watcom's startup routines.
Strange. Being able to specify a DOS COM output, but not actually getting
it. :-\

Regards,
Rudy Wieser
Mateusz Viste
2023-08-04 19:18:03 UTC
Post by R.Wieser
Thats too bad. At least you can't say I didn't try. :-)
And your kind effort is very much appreciated. :)
Post by R.Wieser
Strange. Being able to specify a DOS COM output, but not actually
getting it. :-\
Indeed. It's COM all right, but only as long as one links to the
Watcom-supplied startup library (or equivalent). If not, then all bets
are off.

Mateusz
Mateusz Viste
2023-11-09 23:19:40 UTC
Post by Mateusz Viste
It appears that the COM file is not being originated at offset 0x100,
despite the "FORMAT DOS COM" wlink directive. It's also not
0-originated, so I am not sure how the offsets are calculated exactly.
Once I fix them with a hex editor, the executable works.
What am I missing here?
Hello all,

I talked with Bernd Böckmann today and I was surprised to learn that
he tackled this very same problem recently. He was, however, far more
successful than me and kindly shared the piece of information that I
have missed all along.

Bernd said:
"Because in tiny memory model the code is in the same segment as the
data, the linker must be told to merge these segments to a single one
while linking, otherwise the addresses are messed up. This is done by
the GROUP directive in startup.asm, which includes _TEXT (as opposed to
the .EXE version)."
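The effect of the GROUP directive can be sketched as two ways of computing the same offset, under the assumption (consistent with the dumps in this thread) that offsets are measured from the paragraph where the group starts. Without `_TEXT` in DGROUP, the group starts at the first data byte; with `_TEXT` included and `org 100h` at its top, the group's offset 0 lies 100h bytes before the first file byte, i.e. at the PSP, which is exactly where DOS points CS and DS. The function names are hypothetical, and the example file offsets are those of the original TEST.COM (string at 14h, pointer at 1Ch).

```c
#include <stdint.h>

#define COM_ORIGIN 0x100u

/* DGROUP = data segments only: offsets count from the paragraph holding
   the first data byte, which is not where DOS points DS in a COM file. */
static uint16_t offset_data_only(uint32_t file_off, uint32_t first_data_off)
{
    return (uint16_t)(file_off - (first_data_off & ~(uint32_t)0xF));
}

/* DGROUP includes _TEXT, which starts with org 100h: offsets count from
   the PSP, so they equal file offset + 100h and match CS = DS at run time. */
static uint16_t offset_with_text(uint32_t file_off)
{
    return (uint16_t)(file_off + COM_ORIGIN);
}
```

With those file offsets, the first function reproduces the broken 0004h/000Ch, and the second gives 0114h/011Ch, the values the first post had to patch in by hand.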

The need for custom startup code was already hinted at in this thread by
Alexei A. Frounze, and I did attempt to create such startup code back then,
but the necessity of grouping segments was lost on me.

Bernd provided me with a working example of his startup code. With this
new bit of information I was able to adapt my proof of concept project
- and this time, it works! The resulting executable size is 45 bytes.
I am pasting here below all the files for posterity.

Mateusz


--- HELLO.LNK ---------------------------------------------

name hello
system dos com
option map
option nodefaultlibs
file startup
file hello

--- HELLO.C -----------------------------------------------

void main(void) {
    char *hello = "Hello$";
    _asm {
        mov ah, 9
        mov dx, hello
        int 0x21
    }
}

--- STARTUP.ASM -------------------------------------------

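.8086

; Code and data share one group, so every offset is computed from the
; start of _TEXT (i.e. from the PSP, thanks to org 100h), which is where
; DOS points CS and DS when it starts a COM program.
dgroup group _TEXT,_DATA,CONST,CONST2,_BSS

extrn "C",main : near

; public _cstart_, _small_code_, __STK
public _cstart_, _small_code_

_TEXT segment word public 'CODE'
        org 100h                ; COM code starts after the 256-byte PSP

_small_code_ label near         ; symbol expected by small-code wcc objects

_cstart_:                       ; entry point, named by "end _cstart_"
        call main
        mov ah, 4ch             ; DOS terminate; AL (exit code) is whatever
        int 21h                 ; main() happened to leave there

; Stack overflow checking routine is absent. Remember to compile your
; programs with the -s option to avoid referencing __STK
;__STK:
; ret

; Empty declarations of the data segments listed in the group above.
_DATA segment word public 'DATA'
_DATA ends

CONST segment word public 'DATA'
CONST ends

CONST2 segment word public 'DATA'
CONST2 ends

_BSS segment word public 'BSS'
_BSS ends

_TEXT ends

end _cstart_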

--- BUILD.BAT ---------------------------------------------

wasm startup.asm
wcc -os -zl -ms -s -bt=dos hello.c
wlink @hello.lnk

-----------------------------------------------------------
