Reusing code, reverse engineering and collaboration

Published on 30 June 2022 by Andrew Owen (16 minutes)

In this article I’m going to talk about code reuse, reverse engineering and the importance of collaboration. The example I’m going to use is my own hobbyist computer, which evolved into the Chloe 280SE. For simplicity’s sake, I’m going to lay it all out in chronological order, starting in the 1960s. This is going to be a long one. You might want to grab a cup of coffee and a box of donuts.

In 1964 at Dartmouth College in the United States, Hungarian-American John George Kemeny and Thomas Eugene Kurtz developed the BASIC (Beginner’s All-purpose Symbolic Instruction Code) programming language. Unlike existing high-level languages that were aimed at computer scientists and mathematicians, BASIC was designed to be used by everyone. That’s why its instruction set is based on English.

In July 1975 Microsoft released a version of BASIC for the MITS Altair 8800 hobbyist computer. It was written by Paul Allen and Bill Gates on an Intel 8080 emulator running on a DEC PDP-10. By then, Kemeny and Kurtz had addressed the main criticisms of BASIC: that it lacked structure and encouraged bad programming habits. But Microsoft BASIC was inspired by DEC’s BASIC-PLUS, which was essentially Dartmouth BASIC with the addition of the MID$ and LEFT$ functions for string slicing.

“The less fortunate BASICs picked up bad habits and vulgar language. We would no sooner think of using street BASIC than we would think of using FORTRAN.” (Kemeny & Kurtz)

Early versions of Microsoft BASIC supported only integer math, but Monte Davidoff was convinced that floating-point was possible, and this led to the creation of Microsoft Binary Format (MBF). This version was subsequently supplied with the majority of 8-bit microcomputers, and made Gates and Allen their first fortune. Notably, it was used in every Commodore 8-bit computer from the PET onward after Jack Tramiel negotiated a fixed-fee permanent license that didn’t require a Microsoft on-screen credit.

In 1978 when the ANSI Standard for Minimal BASIC (X3.60-1978) was launched, it was based mainly on the Microsoft version. In England the following year, Clive Sinclair put Jim Westwood in charge of creating the ZX80 microcomputer. Inspired by his son’s enjoyment of the TRS-80, Sinclair wanted to create something much more affordable. This precluded licensing Microsoft BASIC, and instead Sinclair turned to John Grant at Nine Tiles.

Grant didn’t stand to make much money on the deal, but he thought the project was worthwhile. He wrote the 4K integer BASIC in Zilog’s own assembler on its ZDS1 development system. While looking for ways to save memory space, he devised the keyword entry system, removing the need for a tokenizer. He also included on-entry syntax checking, a feature shared with Atari BASIC (also written in 1979) but absent from most other versions.

Even before the ZX80’s launch, Sinclair was already planning its successor, the ZX81 (also known as the TS1000). Ferranti had developed an early precursor to the field programmable gate array (FPGA) called the uncommitted logic array (ULA). Westwood replaced almost all the circuitry of the ZX80 with a single ULA. This almost halved the cost of the machine. Besides the ULA, the only chips on the circuit board were the CPU, RAM and ROM, making it a precursor of SoCs (system on a chip).

Sinclair again went to Grant for the BASIC, who in turn contracted Cambridge mathematician Steve Vickers to write it. Sinclair gave him an extra 4K to work with and the requirement that the math package be improved. Vickers retained much of Grant’s code, only making modifications where strictly necessary. In place of MID$ and LEFT$ he enabled the user to select a range of the string with s$(f TO l), where f is the position of the first character and l is the position of the last character in the sub-string of s$.
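To make the difference between the two slicing styles concrete, here is a minimal Python sketch. It assumes that positions are 1-based and inclusive at both ends, and that omitting f or l means "from the start" or "to the end". The function names are my own for illustration, and unlike a real BASIC this sketch silently clamps out-of-range positions instead of reporting an error.

```python
def sinclair_slice(s, first=None, last=None):
    """Emulate s$(first TO last): 1-based, inclusive at both ends."""
    if first is None:
        first = 1            # s$( TO last) starts at the first character
    if last is None:
        last = len(s)        # s$(first TO ) runs to the end of the string
    return s[first - 1:last]


def left(s, n):
    """LEFT$(s$, n) expressed as s$( TO n)."""
    return sinclair_slice(s, None, n)


def mid(s, start, length):
    """MID$(s$, start, length) expressed as s$(start TO start+length-1)."""
    return sinclair_slice(s, start, start + length - 1)
```

With this mapping, `mid("SINCLAIR", 4, 3)` and `sinclair_slice("SINCLAIR", 4, 6)` select the same three characters, which is why one notation can stand in for the other.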

For the floating point routines, Vickers used a format almost identical to the 40-bit version of MBF. But he added the ability to store integer numbers in the range -65535 to 65535. The user doesn’t have to specify variable types. And it’s faster than BASICs that only store numbers as floating-point, such as Commodore’s.
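As a rough illustration of how a five-byte number with a built-in integer form can work, here is a Python sketch of a decoder. The byte layout is an assumption based on how the format is commonly documented for the ZX Spectrum: an exponent byte biased by 128 followed by a 32-bit mantissa whose top bit holds the sign, with an exponent byte of zero marking the integer form (sign byte, low byte, high byte, zero). Treat it as a sketch, not a description of the ROM routines.

```python
def decode_number(raw):
    """Decode a 5-byte value under the layout assumed above."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    exponent, b1, b2, b3, b4 = raw
    if exponent == 0:
        # Assumed integer form: 00, sign byte, low byte, high byte, 00.
        value = b2 | (b3 << 8)
        if b1 == 0xFF:
            value -= 0x10000  # negatives are stored in two's complement
        return value
    # Floating-point form: the top bit of b1 is the sign. The leading
    # mantissa bit is always 1, so it is implied rather than stored.
    sign = -1.0 if b1 & 0x80 else 1.0
    mantissa = ((b1 | 0x80) << 24) | (b2 << 16) | (b3 << 8) | b4
    # The mantissa is a fraction in [0.5, 1), hence the extra 32.
    return sign * mantissa * 2.0 ** (exponent - 128 - 32)
```

Under these assumptions the value ten can be held either as the integer form `00 00 0A 00 00` or as the floating-point form `84 20 00 00 00`. The integer form is what lets simple arithmetic skip the floating-point routines.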

Despite some early production issues, the ZX81 went on to sell over 1.5 million units. As a result, and partly to try to win the contract to design a computer for the BBC, Sinclair decided to create a color version. Westwood was working on other projects, so the task of designing the hardware went to Richard Altwasser. This time, Vickers was given 16K for the BASIC. They worked closely with industrial designer Rick Dickinson because the keyboard requirements were closely linked to the BASIC.

Grant and Vickers would’ve liked to start over, but there wasn’t time. So the ZX81 BASIC was used as the starting point, with the addition of color, graphics and sound commands. Vickers further improved the math package. And because the hardware design was incomplete, he created a hardware abstraction layer of channels and streams. But three months before launch, Vickers and Altwasser both resigned to form the Cantab company to produce the Jupiter ACE (essentially a ZX81 with Forth in place of BASIC).

The ZX Spectrum was launched in April 1982. But work on the RS232, networking and microcassette storage system was unfinished, and the BASIC was incomplete. Sinclair packaged this functionality in a hardware add-on called the Interface 1. It had its own firmware written by Ian Logan (who had taken Frank O’Hara’s disassembly of the ZX81 BASIC as a starting point and written a disassembly of the ZX Spectrum BASIC).

There was more than 1K of free space in the BASIC ROM. Using Logan’s disassembly, hackers with EEPROM burners were able to create custom versions of the ROM. The most common addition was a tokenizer. This removed the need to memorize key combinations or refer to the keyboard to enter keywords. Many of these tokenizers supported abbreviations using a period ( . ), another feature shared with Atari BASIC.
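The idea behind such a tokenizer can be sketched in a few lines of Python. Everything here is hypothetical: the keyword table is a tiny subset, the token values are placeholders, not the bytes used by any real ROM, and the sketch ignores string literals and REM text, which a real tokenizer has to leave alone. It also only accepts an abbreviation when exactly one keyword matches, which is one possible design choice among several.

```python
# Hypothetical keyword table: the token values are placeholders only.
KEYWORDS = {"PRINT": 0x80, "PAUSE": 0x81, "GOTO": 0x82, "REM": 0x83}


def expand(word):
    """Return the full keyword for a word, or None if it is not one."""
    word = word.upper()
    if word.endswith("."):
        stem = word[:-1]
        matches = [k for k in KEYWORDS if k.startswith(stem)]
        # Accept only a non-empty, unambiguous abbreviation.
        return matches[0] if stem and len(matches) == 1 else None
    return word if word in KEYWORDS else None


def tokenize(line):
    """Replace keywords with single-byte tokens; pass other text through."""
    out = bytearray()
    for i, word in enumerate(line.split(" ")):
        if i:
            out.append(ord(" "))      # restore the separating space
        keyword = expand(word)
        if keyword:
            out.append(KEYWORDS[keyword])
        else:
            out.extend(word.encode("ascii"))
    return bytes(out)
```

In this sketch `PR.` expands to PRINT, while `P.` is left untouched because it could mean either PRINT or PAUSE.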

In 1983, Timex introduced the TS2068, its version of the ZX Spectrum, with improved sound and graphics hardware and an expanded 24K BASIC (with input from Vickers). In 1984 Timex Portugal introduced the TC2048, a cut-down version of the TS2068 without the sound chip and using the older 16K BASIC. In 1985, Investronica launched a version of the ZX Spectrum with 128K of RAM, the same sound chip as the TS2068 and an expanded 32K BASIC (written by Martin Brennan, Kevin Males and someone with the initials AT). Then in 1986, Unimor and Polbrit launched a machine in Poland derived from the Portuguese version of the TS2068.

In 1986, Sinclair’s company ran out of money, and it was sold to Amstrad. During due diligence, it was discovered that Sinclair had never paid Nine Tiles and Amstrad had to buy the rights to the ZX Spectrum BASIC. Nine Tiles retained the rights to the earlier versions.

As a result, during the development of the SAM Coupé, MGT approached Nine Tiles about licensing Vickers’ math package, but the asking price was too high. Andrew Wright wrote SAM BASIC by starting with the ZX Spectrum BASIC ROM and replacing one module at a time until he had an entirely new program that was broadly backwards compatible.

Fast-forward to 1999. By this time, there had been literally hundreds of ZX Spectrum clones created, particularly in the former Soviet nations. There were probably almost as many ZX Spectrum compatibles out there as Commodore 64s (the biggest selling 8-bit computer of all time). Discussion on Usenet indicated that there was an appetite for a modestly enhanced ZX Spectrum compatible machine that could run as much existing software as possible, and I got to thinking how this might be achieved.

In Poland, Jarek Adamski had already created a modification for the Timex models that enabled them to run software written for the Investronica machine, then the most popular model. I came up with the idea of connecting an additional 128K of RAM to the Timex’s memory management unit (MMU). In the TS2068, the MMU connected to the additional 8K of BASIC and a 64K cartridge port. In the TC2048, it didn’t connect to anything. Being able to have 32K of RAM in the area normally occupied by the ROM enabled the machine to emulate earlier systems such as the ZX80, ZX81, Jupiter ACE and TS2068.

The sound chip was fitted and connected to both sets of ports used by the Timex and Investronica models. We called it the ZX Spectrum SE. The prototype used a version of the 32K BASIC, modified for improved backwards compatibility with the original ZX Spectrum model. I also came up with a palette system to increase the available colors from 16 to 256, but implementing it in the prototype would’ve required replacing the Timex version of the ULA, which at the time was undocumented.

Having an actual prototype was enough to get the machine supported in the FUSE emulator. With the hardware done, I started work on the BASIC. Initially, this was a case of combining all my favorite patches from publicly available modified ROMs. These included Ian Collier’s extended LIST command, Jonathan Graham Hartson’s hexadecimal support, Jiri LeMac’s tokenizer, Milan Pikula’s improved cursor handling, many features by Slavomír Lábsky, and many publicly documented bug fixes.

Fast-forward again to 2008 and Chris Smith was developing his own ZX Spectrum clone in discrete logic. He was friendly with the boss of Datel, a company that’s best known for producing cheat cartridges for microcomputers and consoles. Datel laser sliced and scanned the ULA chip for Smith, enabling him to completely reverse engineer it. He wrote up his findings in a book called How to Design a Microcomputer, which I edited. He also had the idea to create a plug-in replacement ULA based around a CPLD or FPGA and informed me that there would be room to support my palette enhancement.

I gathered together a team of enthusiasts to help me thrash out the specification for what would become ULAplus on Google Wave. It included hardware designers, emulator authors and end users. I’ll go into more detail on ULAplus in a future article on the Chloe 280SE hardware, but it was originally implemented by Alessandro Dorigatti as an FPGA core for the Turbo Chameleon 64 cartridge for the Commodore 64 in 2009.

In 2010 I created version 2.0 of SE Basic with support for ULAplus including the PALETTE command. By now I was using a cross-assembler instead of directly patching the original ROM, but it was still based on proprietary copyrighted code. It was licensed for use in emulation, but it couldn’t be used in hardware (current owner Comcast maintains this arrangement). It became clear a replacement firmware was required.

Matt Westcott had independently started coding the Open82 project, an open source replacement firmware for the ZX Spectrum. I wondered if Grant and Wright might be open to relicensing the ZX81 and SAM BASICs under an open source license. So I asked them and they said yes. Combining those routines with the non-derivative code in SE Basic and Open82, and using the cassette routines from a SAM Coupé peripheral called The Messenger got me very close to a fully working open source replacement firmware, but I was still missing some key routines.

Under UK law, which was applicable to ZX Spectrum BASIC at the time, it was legal to reverse engineer software for the purpose of interoperability without requiring a clean room (in a clean room, those who do the reverse engineering can’t be familiar with the code). The Z80 CPU provides multiple instructions that can create a different binary but achieve the same result (for example you can replace LD A, H; OR L with LD A, L; OR H). This is how The Messenger was created and so I reasoned it could be used on the missing code. The caveat was that it wasn’t legal to provide documentation, which meant that the source would remain uncommented.
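The LD/OR example can be checked exhaustively with a small model. This Python sketch covers only the accumulator and the documented flags that OR affects. The function names are mine and the opcode bytes are quoted from the standard Z80 instruction encoding. It is an illustration of the equivalence, not part of any real toolchain.

```python
# Standard Z80 encodings: the two sequences assemble to different bytes.
SEQUENCE_A = bytes([0x7C, 0xB5])   # LD A, H ; OR L
SEQUENCE_B = bytes([0x7D, 0xB4])   # LD A, L ; OR H


def or_flags(a):
    """Documented flags after OR: S, Z and P/V from the result; H, N, C cleared."""
    parity_even = bin(a).count("1") % 2 == 0
    return {"S": bool(a & 0x80), "Z": a == 0, "PV": parity_even,
            "H": False, "N": False, "C": False}


def ld_a_h_or_l(h, l):
    a = h          # LD A, H
    a |= l         # OR L
    return a, or_flags(a)


def ld_a_l_or_h(h, l):
    a = l          # LD A, L
    a |= h         # OR H
    return a, or_flags(a)


def sequences_equivalent():
    """Compare both sequences for every possible value of the HL pair."""
    return all(ld_a_h_or_l(h, l) == ld_a_l_or_h(h, l)
               for h in range(256) for l in range(256))
```

Because OR is commutative, the accumulator and flags end up identical for all 65,536 values of HL even though the bytes in the ROM differ. This sequence is commonly used to test whether HL is zero.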

In April 2011 OpenSE BASIC 3.0 was released with the approval of everyone whose code was included. This went on to be included in Debian’s main distribution, which enabled FUSE to also be included. At the end of the year I forked the code to create SE Basic IV, which specifically targeted the ZX Spectrum SE hardware. It was a 32K ROM with two separate versions of BASIC: the standard version and an 80 column version using the Timex high resolution video mode. Swapping between the two was accomplished using a MODE command.

In Spain in 2013 a team of ZX Spectrum enthusiasts got together to design an FPGA-based clone (the ZX-Uno). Initially it aimed to support only the original models. But I persuaded Miguel Angel Rodríguez Jódar to add the ZX Spectrum SE specification with ULAplus. When the crowdfunded version was released in 2016 it included OpenSE BASIC 3 and SE Basic IV, making it the first legal unlicensed ZX Spectrum clone.

In 2017 a licensed ZX Spectrum clone that was similar to but incompatible with the ZX-Uno was successfully crowdfunded on Kickstarter. At that point I realized that the ZX Spectrum SE’s window of opportunity had closed, and if I was going to carry on with the project, it needed a new direction. And thus the Chloe 280SE was conceived as an FPGA core for the ZX-Uno that would make the best possible use of the hardware.

There was a problem. The ZX-Uno supported an SD card interface, but the only operating system available for it was closed source. This made developing a new BASIC interpreter for it a huge headache. The solution was to reverse engineer the OS kernel, using the same techniques originally used for SE Basic. And thus I created UnoDOS 3 (named for the host hardware and a play on words in Spanish).

With the OS problem solved, in 2019 I finally got the ZX-Uno that I’d had since 2014 assembled, and I could begin developing SE Basic 4.2. In case you’re wondering, version 4.1 was a failed big bang attempt at refactoring the code. That approach was a mistake and so I started over with OpenSE Basic 3.

As an aside, in the Chloe, the first 16K of BASIC and the first 8K of the OS are stored in ROM. The remaining 7K of BASIC and 4K of OS are copied to RAM from a 16K boot ROM that initializes the system during a cold restart. The framebuffer is shared with main RAM, but in normal operation it is paged out. The area that used to hold the framebuffer now holds the last 7K of BASIC, including the filesystem commands that can’t be placed in the lower 16K of ROM (because they page in the OS ROM and RAM in that area). Some of the benefits of this approach are that the boot ROM can check for firmware updates, and the OS ROM doesn’t need to load anything from disk.

I had started using GitHub when I began development of SE Basic IV, but now I leaned heavily into it. What had begun as a personal project in 1999 became a portfolio piece, demonstrating modern development practices including APIs, cross-compilation, DevOps, docs as code, emulation, IDEs, JSON resources, localization, and so on.

With the infrastructure in place I got to work on the code. I stripped it back to something resembling ZX81 BASIC, with no sound, graphics, printer or filesystem commands. With backwards compatibility no longer a concern, I decided to make the syntax as close as possible to Microsoft BASIC, on the basis that that’s what most people would be familiar with.

I did about a year of solid work on the project before I got the developer equivalent of writer’s block. Most of the core functionality was done, but the graphics and sound commands were still missing, and the code I had needed cleaning up. I needed a change of pace.

So I reached out to Translation Commons, with which I had a previous relationship as a volunteer editor, for help with localization. Thanks to a band of volunteer translators I was able to increase the coverage to over 20 languages. Now with the aid of machine translation, the number is closer to 50.

Meanwhile, Daniel Nagy had been working on his own ZX Spectrum firmware for several years. We’d been bouncing ideas off each other, and he’d used a few ideas from SE Basic IV in his project, but as yet he hadn’t contributed. He was eager, but he needed a Linux build environment.

For years I’d been using Simon Brattel’s Zeus assembler, mainly because he was always very good about feature requests. However, Apple had dropped support for 32-bit binaries, effectively killing my ability to run Windows apps like Zeus under WINE, so I was already looking for an alternative. Fortunately I discovered RASM, whose author Edouard Berge is equally amenable to feature requests.

One of the other things I did while I was blocked was work on the user documentation, basing it on Rob Hagemans’ docs for his version of PC-BASIC, which are licensed under Creative Commons. This provided an invaluable reference for how new commands were supposed to function, as Daniel was unfamiliar with the Microsoft dialect.

At the beginning of 2022, in the space of a month, Daniel made a huge contribution to the project, and I got my enthusiasm back. He converted the logic from Boolean to bitwise, added missing math functions, added MID$ and LEFT$ string slicing (while retaining Vickers’ method), changed the order of precedence, fixed the detokenizer, and added long variable name support.
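The switch from Boolean to bitwise logic changes what AND and OR return, which a short Python sketch can show. The Boolean rules below are my understanding of how Sinclair BASIC is documented to behave for numbers (AND yields its left operand or 0, OR yields 1 or its left operand). The bitwise versions simply use Python’s integer operators and make no assumption about the operand width SE Basic uses.

```python
def boolean_and(x, y):
    """Sinclair-style numeric AND (assumed): x if y is non-zero, else 0."""
    return x if y != 0 else 0


def boolean_or(x, y):
    """Sinclair-style numeric OR (assumed): 1 if y is non-zero, else x."""
    return 1 if y != 0 else x


def bitwise_and(x, y):
    """Microsoft-style AND: combine the operands bit by bit."""
    return x & y


def bitwise_or(x, y):
    """Microsoft-style OR: combine the operands bit by bit."""
    return x | y
```

For example, 6 AND 3 gives 6 under the Boolean rule but 2 under the bitwise rule, so a program that relies on one behavior can produce different results under the other.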

Just having someone else to work with on the code gave my own productivity a huge boost. In the next release I refactored the filesystem code including channel and stream based file access, added a compatibility pre-parser, fixed macro support, and wrote a series of GitHub actions to automate build and deployment.

In the same release, Daniel added IF…THEN…ELSE branching and WHILE…WEND loops. And in the one after that I added a public API for the OS and an online help system based on a subset of Markdown and HTML. In the next release we’re going to tackle the sound and graphics.

And that’s the partial story of the Chloe firmware so far. There are many people who I haven’t named in this article who gave up their time testing, writing tools, or supporting the project in other ways. I’m grateful to all of them and I hope the project stands as a testament to the value of code reuse, reverse engineering and, above all, collaboration.

Update

I released the final public beta of the BASIC in July 2023 and moved on to working on the high level operating system functions. One of the final pieces in the puzzle was writing a 40-bit floating point to 32-bit unsigned integer converter for use in random file access. And then the original development hardware died. But I have new hardware now. SE Basic IV is now its own unique dialect of BASIC. Its features include:

  • 40 column (16 color) and 80 column (2 color) paletted video modes.
  • Always-on expression evaluation (use variables as filenames).
  • Application package format with support for turning BASIC programs into apps.
  • Automatic typing of integer and floating point numbers.
  • Bitwise logic (AND, NOT, OR, XOR).
  • Built-in help system.
  • Choice of Microsoft (LEFT$, MID$ and RIGHT$) or Sinclair (TO) string slicing.
  • Composable characters (supports Vietnamese).
  • Disk-based file system (no tapes).
  • Double-byte memory manipulation (DEEK, DOKE).
  • Error handling (ON ERROR…, TRACE).
  • Flow control (IF…THEN…ELSE, WHILE…WEND).
  • Full random file access from BASIC (OPEN, CLOSE, SEEK).
  • Full-size keyboard support (DEL, HOME, END and so on).
  • Graphics commands in 40 column mode (CIRCLE, DRAW, PLOT).
  • Localization of character sets, error messages, and keyboard layouts.
  • Long variable names.
  • Motorola style number entry (% for binary, @ for octal, $ for hexadecimal).
  • Super BREAK (non-maskable interrupt).
  • On-entry syntax checking.
  • PLAY command with 6-channel PSG and MIDI support.
  • Recursive user-defined functions.
  • Token abbreviation and shortcuts (& for AND, ~ for NOT, | for OR, ? for PRINT, ' for REM).
  • Undo NEW (OLD).
  • User-defined hardware channels.
  • User-defined character sets (256 characters).
  • User-defined macros.
  • User-defined screen modes.
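
The 40-bit floating point to 32-bit unsigned integer conversion mentioned in the update can be sketched with integer shifts alone. This Python version is an illustration, not the routine used in the firmware. It assumes the five-byte layout commonly documented for the ZX Spectrum (exponent biased by 128, sign in the top mantissa bit, exponent byte of zero for the small integer form), truncates toward zero, and rejects negative or oversized values. The function name is hypothetical.

```python
def float40_to_u32(raw):
    """Convert a 5-byte number to an unsigned 32-bit integer (truncating)."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    exponent, b1, b2, b3, b4 = raw
    if exponent == 0:
        # Assumed integer form: 00, sign byte, low byte, high byte, 00.
        if b1 != 0:
            raise ValueError("negative value")
        return b2 | (b3 << 8)
    if b1 & 0x80:
        raise ValueError("negative value")
    e = exponent - 128            # value = 0.mantissa * 2**e
    if e <= 0:
        return 0                  # magnitude below 1 truncates to 0
    if e > 32:
        raise OverflowError("does not fit in 32 bits")
    # Restore the implied leading 1, then shift out the fractional bits.
    mantissa = ((b1 | 0x80) << 24) | (b2 << 16) | (b3 << 8) | b4
    return mantissa >> (32 - e)
```

A file position such as 65536, which is too large for the small integer form, would arrive as `91 00 00 00 00` under these assumptions and come out as 0x10000.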