2024-02-23

GISA - Getman Instruction Set Architecture

NOT FINAL

Why another ISA?

To have one that can be written like Protracker MODs: the opcode, registers, and immediates are clearly apparent from the machine code without the need for a disassembler, and the machine code can be written directly without the need for an assembler, meaning it can be easily cross-checked. Back in 2018 I had to use all 3 attempts at the exam on MIPS assembly with pipelining, branch prediction, hazard detection and stalling, so I'll make my own ISA where I don't have to deal with this, and make the scheduler on my own later.

Also I don't like RISC-V being primarily little endian and missing an official 16-bit variant. Little endian, when notated in hexadecimal, actually comes out mixed endian, so it's like reading Kharoshthi or Tamil. Network byte order is big endian, therefore a big endian ISA is better suited for devices connected to the Internet, the so-called IoT. Microcontrollers don't need to be 32-bit, and while the 8 bit segment is already served by the well-known 6502 and Z80, the 16 bit segment is somewhat lacking in widely known architectures: mostly PIC, AVR, some TI chips, expanded 8-bit designs like 65816 and Z180, and proprietary console chips.

On the other side of the bit count, RISC-V is reluctant to fully specify its 128-bit variant, and is entertaining the idea of having near and far pointers again, after all that 80286 segment nonsense. In GISA, everything is the same amount of bits, so the floats would also be 128 bit, which is useful.

Furthermore, you need to pay membership fees to the RISC-V foundation to be able to have your products boast about using RISC-V, just like with Khronos Group and Vulkan. It might be cheaper to just copy LoongArch like Loongson did with MIPS. I prefer to call things what they really are, be it someone's trademark or not.

There are no privilege levels, there is no MMU, everything runs with full access to everything, just like YHWH/Gospodin/Allah intended, and just like it was on the Amiga and DOS. This is a better approach for real-time embedded systems than having to deal with some address translation and prolonged context switching. Remember the connection to music making. Note that as per hadith, only "vain entertainment" music is haram, and MODs are nerdy music.

Eventually I plan to write a B compiler for word-address variants, tentatively called HornyB or HoneyB, and write my own TempleOS or Oberon, tentatively called MosqueTOS, but it may end up more like SerenityOS because POSIX. The byte-address variants get a C compiler and a port of Plan9/Inferno, MINIX 1/2 or uClinux, and may thus run something actually useful. All BSDs ever since 386BSD require an MMU. Porting HelenOS, MINIX 3, Mach/Hurd or other microkernel OSs to this noMMU architecture could probably net me an honorary PhD from MatFyz, VU or MIT, but RTOSes would be more fun and demoscene-y, and Selbstgesamtkunstwerke yield more to monolithic kernels. Should the compilers be too difficult, an assembly port of Workbench/AmigaOS or BASIC may be a possibility.

Lest I be accused of creating a bytecode instead of an ISA, most bytecodes are stack-based and intended to be emitted by high-level object-oriented language compilers, whereas GISA has full access to entire memory and is made to be written directly, just like the olden days.


Different variants

Variants with different bit widths aren't binary compatible. Everything in them is that bit size: registers, instructions, and address space. There can be some jumpers for emulation of smaller sizes, but that's a waste of silicon.

The B variants differ from the regular variants by having byte-addressable registers and memory, with a byte select mask appended to every register number. This dramatically reduces the number of registers and the byte size of the address space, but permits better string manipulation and struct packing, and allows more data types than just int, uint, and float. It would make sense to make these variants little endian, but that would be confusing.


GISA-8 - 4 opcodes, 3 registers, 256 B address space (W/B is redundant)

GISA-16W - 16 opcodes, 16/31 registers, 64k 16-bit words address space (128 kB)

GISA-16B - 16 opcodes, 3 registers, 64 kB address space

GISA-32W - 256 opcodes, 255 registers, 4G 32-bit words address space (16 GB)

GISA-32B - 256 opcodes, 16/31 registers, 4 GB address space

GISA-64W - 65536 opcodes, 65535 registers, 16E 64-bit word address space (128 EB)

GISA-64B - 65536 opcodes, 256/511 registers, 16 EB address space

GISA-128W - 4G opcodes, 4G-1 regs, 268435456Q 128-bit w addr space (4294967296 QB)

GISA-128B - 4G opcodes, 65536/131071 regs, 268435456 QB address space
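
Most of the numbers in the list above follow from the bit width alone. Here is a rough Python sketch of that arithmetic, assuming that the opcode field and each register field take a quarter of the instruction word, and that the B variants split the register field evenly between register number and byte select mask. The function name and dictionary keys are made up for illustration, and the register count includes the zero register and ignores the optional alternate file.

```python
def variant_summary(bits, byte_addressed):
    """Derive the headline numbers of a GISA variant from its bit width."""
    field_bits = bits // 4                      # opcode field and each register field
    unit_bytes = 1 if byte_addressed else bits // 8
    # Assumption: B variants spend half of the register field on the byte mask.
    register_bits = field_bits // 2 if byte_addressed else field_bits
    return {
        "opcodes": 2 ** field_bits,
        "register_numbers": 2 ** register_bits,  # includes the zero register
        "address_space_bytes": (2 ** bits) * unit_bytes,
    }


for bits in (16, 32, 64, 128):
    for byte_addressed in (False, True):
        name = f"GISA-{bits}{'B' if byte_addressed else 'W'}"
        print(name, variant_summary(bits, byte_addressed))
```

GISA-8 is left out of the loop because its 2-bit register field has no room for a byte mask.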


I prefer 16W over 16B and 64B over 64W, but I am quite split on 32W and 32B. RISC-V has both RV32E and RV32I defined. Some may question the presence of 8 and 16 bit variants, but their paedagogical value is greater than that of the possibly too large 32 and 64 bit variants.

Each architecture can be harvardized, that is, the program counter points to a completely different XOM, maybe on an EEPROM. This seems secure, like W^X, and also doubles the total memory size, but smells of console DRM, and gets rid of the fun that self-modifying code provides. There is still a way to extract the contents from within using an instruction recovery attack: https://arxiv.org/pdf/1909.05771.pdf . The bootstrap process of the von Neumann version is to blit the code into RAM and set the program counter to the entry point.

Once I come up with a way to generate 4 milliards of instructions, I'll specify 128W and 128B, but I don't think there will ever be a need to have more than 16 EB of address space at home, and datacenters won't be using some weird guy's embedded ISA.



GISA-8

This one is extremely simple, and could be fun for curious elementary school kids to implement in Minecraft, assuming they are being taught proper discrete mathematics instead of how to be walking calculators. It's not very capable for a load-store architecture; possibly try an OISC instead, which means dumping the registers. However, if I can write a brainfuck interpreter for it, it is linearly bound Turing complete. Good luck fitting the program into 256 B and not overwriting it. Self-modifying code is needed.

Each instruction word is 8 bits and has 4 fields, each of 2 bits. The 1st field is always the opcode. The instruction types are RRI only. There are only 4 opcodes, so this is a FISC. Exceptionally, binary notation is preferred.

Opcodes:

00: AI  dest  src  imm   Add Immediate (signed)  dest = src + imm
01: LA  dest  src  imm   Load from Address       dest = mem[src+imm]
10: SA  dest  src  imm   Store to Address        mem[dest] = src + imm
11: BN  dest  src  imm   Branch if not Zero      pc += (!src) ? dest+imm : 0

Due to there only being 2 bits for the immediate, only the values -2, -1, 0, and 1 are possible. After executing the branch (BN), the program counter still gets incremented, so you can skip over an instruction with 1.
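
To make the field layout concrete, here is a minimal Python sketch of a decoder. It assumes the four 2-bit fields are packed from the most significant bits down in the order opcode, dest, src, imm, and that the immediate is two's complement. The source only states that the opcode comes first, so the rest of the bit order is my assumption.

```python
MNEMONICS = ("AI", "LA", "SA", "BN")


def sign_extend_2bit(value):
    """Interpret a 2-bit field as two's complement: 0, 1, -2, -1."""
    return value - 4 if value & 0b10 else value


def decode(word):
    """Split an 8-bit GISA-8 word into (mnemonic, dest, src, imm)."""
    opcode = (word >> 6) & 0b11
    dest = (word >> 4) & 0b11
    src = (word >> 2) & 0b11
    imm = sign_extend_2bit(word & 0b11)
    return MNEMONICS[opcode], dest, src, imm


print(decode(0b00_01_01_11))  # AI r1 r1 -1
```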

Registers:

00: r0, zero
01: r1, sp
10: r2, s1
11: r3, s2

There is also a separate program counter, which physically occupies the space where the zero register would be, but isn't connected to the usual read and write circuitry.

There is no interrupt, you have to poll the I/O.
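
A single-step emulator fits in a few lines, which may help with cross-checking hand-written machine code. This is only a sketch under several assumptions of mine: r0 reads as zero and ignores writes, all arithmetic wraps to 8 bits, the fields are packed opcode, dest, src, imm from the top bits down, and BN is taken when src is nonzero (following the mnemonic; the `!src` in the table reads the other way, and the condition is not settled).

```python
def step(regs, mem, pc):
    """Execute the instruction at mem[pc] and return the new program counter."""
    word = mem[pc]
    opcode = (word >> 6) & 0b11
    dest = (word >> 4) & 0b11
    src = (word >> 2) & 0b11
    imm = word & 0b11
    if imm & 0b10:                  # sign-extend the 2-bit immediate to -2..1
        imm -= 4

    def read(reg):
        return 0 if reg == 0 else regs[reg]   # r0 always reads as zero

    def write(reg, value):
        if reg != 0:                          # writes to r0 are dropped
            regs[reg] = value & 0xFF

    offset = 0
    if opcode == 0b00:              # AI: dest = src + imm
        write(dest, read(src) + imm)
    elif opcode == 0b01:            # LA: dest = mem[src + imm]
        write(dest, mem[(read(src) + imm) & 0xFF])
    elif opcode == 0b10:            # SA: mem[dest] = src + imm
        mem[read(dest)] = (read(src) + imm) & 0xFF
    else:                           # BN: assumed taken when src is nonzero
        if read(src) != 0:
            offset = read(dest) + imm
    return (pc + 1 + offset) & 0xFF  # pc is incremented even after a branch
```

Here `regs` is a list of four integers and `mem` a 256-byte `bytearray`; a Harvardized variant would fetch `word` from a separate program memory instead.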

GISA-8 is much more of a Zachtronics puzzle than an actually usable architecture, and is provided mainly for completeness and paedagogy, illustrating the principles of larger GISAs.


Brainfuck interpreter for GISA-8

This is to prove that GISA-8 is linearly bound Turing complete. P" would be enough, but we can do some I/O at least. This is assuming a Harvardized version where the interpreter code itself is in a separate 256 B XOM addressed by the program counter.

Memory map:

00000000 ~ 01111111: Brainfuck Memory (128 B)
10000000 ~ 11110111: Brainfuck Program (120 B)
11111000: Program length; Counter; Code pointer; Data pointer 
11111100: Keyboard Input Data
11111101: Keyboard Input Advance Signal 
11111110: Terminal Output Data
11111111: Terminal Output Advance Signal

Register map:

00: ze      zero
01: mp      brainfuck memory pointer
10: ac      accumulator
11: pp      brainfuck program pointer
--: pc      machine program counter

The Brainfuck program can be self-modifying. Due to extreme space constraints, the syntax may have to be extended to allow numbers to be used for long runs of symbols, so instead of 50 pluses which would take 50 B, you would write "50+", which takes just 3 B. Most sample programs tend to fit within 120 B, though.
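
The run-length extension can be prototyped on a host machine before committing it to the interpreter. A small Python sketch, assuming that a decimal count applies to the single command that follows it (the function name is hypothetical, and the exact syntax is not fixed anywhere above):

```python
import re

# A decimal count followed by exactly one Brainfuck command.
RUN = re.compile(r"(\d+)([-+<>\[\].,])")


def expand_runs(source):
    """Expand counted commands such as "50+" into plain Brainfuck."""
    return RUN.sub(lambda m: m.group(2) * int(m.group(1)), source)


print(expand_runs("3+>2-."))  # +++>--.
```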

Brainfuck command interpretation in assembly:

+   LA ac mp 0 ; SA mp ac 1
-   LA ac mp 0 ; SA mp ac -1
>   AI mp mp 1
<   AI mp mp -1
[   LA ac mp 0 ; BN ze ac 7 ; (ascertain next ] location, put in ac) ; AI pp ac 0
]   LA ac mp 0 ; BZ ze ac 7 ; (ascertain prev [ location, put in ac) ; AI pp ac 0
,   AI ac ze -2 ; AI ac ac -2 ; SA mp ac 0 ; AI ac ac 1 ; SA ac ze 1 ; SA ac ze 0 
.   AI ac ze -2 ; SA ac mp 0 ; AI ac ac 1 ; SA ac ze 1 ; SA ac ze 0

(init)   LA mp ze 0 ; LA pp ...  (obtain value 128 somehow)
(next)   AI pp pp 1 ; LA ac pp 0 ; BN ze ac 1; BZ ze ze -1; (loop upon reaching \0)

The [ and ] would be much easier when implementing a compiler instead of an interpreter, which would have figured out the jump addresses at compile time. It's also annoying that there's no space for logic, and I haven't decided upon which condition to branch.
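
What a compiler would do ahead of time for [ and ] is the usual stack-based bracket matching. A Python sketch of that precomputation, run on the host rather than on GISA-8 (the function name is hypothetical):

```python
def build_jump_table(program):
    """Map the position of every [ to its matching ] and vice versa."""
    stack = []
    jumps = {}
    for position, command in enumerate(program):
        if command == "[":
            stack.append(position)
        elif command == "]":
            if not stack:
                raise ValueError(f"unmatched ] at {position}")
            opening = stack.pop()
            jumps[opening] = position
            jumps[position] = opening
    if stack:
        raise ValueError(f"unmatched [ at {stack[-1]}")
    return jumps


print(build_jump_table("+[-[>]<]"))
```

With such a table the interpreter no longer has to scan for the matching bracket at run time, at the cost of storing one address per bracket.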



GISA-16W

This width conveniently comes out as exactly 1 hexadecimal digit for opcode and registers. This makes it highly resemble a Protracker channel, just missing the note column.

There's some VLIW potential of programming 4 cores independently. A possible nickname for GISA-16W-VLIW4 is PTISA, a pun on Interslavic "ptica", GISA-16W-VLIW8 could be OCTISA after Octalyzer, GISA-16W-VLIW16 could be STISA after Scream Tracker 3, and GISA-16W-VLIW32 could be MTISA after Multitracker, for cores sitting on many I/O devices, as there would be only 8 kB per core. This is intended for controlling stage hardware without MIDI.

I'm undecided on how to indicate hexadecimal - $AB like 6502, ABh like Intel, 0xAB like AT&T, or xAB like laziness.

Effects Opcodes:

0: ADD      dest src1 src2     Add
1: AND      dest src1 src2     And
2: OR       dest src1 src2     Or
3: SUB      dest src1 src2     Subtract (compare)
4: ADDI     dest src  imm      Add immediate
5: SLAI     dest src  imm      Shift left arithmetical immediate
6: SLLI     dest src  imm      Shift left logical immediate
7: SRI      dest src  imm      Shift right immediate
8: XOR      dest src1 src2     Exclusive or
9: XAND     dest src1 src2     Exclusive and (if and only if, exclusive not or)
A: LA       dest imm           Load address
B: SA       src  imm           Store address
C: BEZ      dest imm           Branch on equals zero
D: BNZ      dest imm           Branch on not zero
E: J        dest imm           Jump reg+imm
F: JAL      dest imm           Jump and link

The operations are inspired by school MIPS lessons. I'm not sure how much of a Turing tarpit this selection is. I may need to fit interrupts somewhere, but they are basically a jump to a table offset, which says to jump to a handler.
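
Since every field is one hexadecimal digit, assembling by hand is just writing the digits next to each other. A Python sketch of that, assuming the field order from the opcode table (opcode, dest, then either two more nibbles or one 8-bit immediate). Immediates are passed as raw unsigned field values, because their signedness isn't specified above, and the `encode` helper is hypothetical.

```python
# Opcode numbers copied from the table; the grouping by format is my reading of it.
THREE_NIBBLE = {"ADD": 0x0, "AND": 0x1, "OR": 0x2, "SUB": 0x3,
                "ADDI": 0x4, "SLAI": 0x5, "SLLI": 0x6, "SRI": 0x7,
                "XOR": 0x8, "XAND": 0x9}
REG_IMM8 = {"LA": 0xA, "SA": 0xB, "BEZ": 0xC, "BNZ": 0xD, "J": 0xE, "JAL": 0xF}


def encode(mnemonic, *fields):
    """Return the 16-bit instruction word as four hexadecimal digits."""
    if mnemonic in THREE_NIBBLE:
        dest, src, last = fields
        if not all(0 <= f <= 0xF for f in fields):
            raise ValueError("each field must fit in one nibble")
        word = THREE_NIBBLE[mnemonic] << 12 | dest << 8 | src << 4 | last
    elif mnemonic in REG_IMM8:
        reg, imm = fields
        if not (0 <= reg <= 0xF and 0 <= imm <= 0xFF):
            raise ValueError("register must fit in 4 bits, immediate in 8")
        word = REG_IMM8[mnemonic] << 12 | reg << 8 | imm
    else:
        raise ValueError(f"unknown mnemonic {mnemonic}")
    return f"{word:04X}"


print(encode("ADD", 3, 1, 2))   # 0312
print(encode("LA", 2, 0x40))    # A240
```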

Note to self: TIS-100 has MOV, ADD, SUB, NEG, JMP, JEZ, JNZ, JGZ, JLZ, JRO.


Registers:

0: r0, zero
1: r1, sp
2: r2
3: r3
4: r4
5: r5
6: r6
7: r7
8: r8
9: r9
A: r10
B: r11
C: r12
D: r13
E: r14
F: r15

Because there is no byte addressability, the address space spans 128 kB instead of the usual 64 kB. This lends itself to UTF-16 instead of UTF-8, and the bit wastage isn't that high because 42069 (A455h) is a perfect 16-bit number.

The regular von Neumann version may be a good place for porting Wozmon, as those 256 B of dense 6502 code wouldn't fit in the address space of even Harvardized GISA-8 while having to express that with just 4 different types of instructions.



GISA-16B

There is no byte addressable VLIW version. 64 kB of RAM would only fit 8k VLIWs for 4core, or 1k VLIWs for 32core.

The register field is broken up into 2 evenly sized parts. The higher crumb determines the register, and the lower crumb represents a byte select mask. This leaves us with 4 registers.

There are 2 bytes in 16 bits, and so the bytemask is 2 bits.

00: no byte selected / the other register file (O)

01: low byte selected (L)

10: high byte selected (H)

11: all bytes selected (X)

Being able to select no bytes removes the need for a hardwired zero register, however there are now as many ways to specify zero as there are registers. So optionally the empty mask can instead select another set of registers, this time without byte subdivisions, for addresses, integers, UTF-16 and such, with one of them kept as the only zero. This increases the number of registers to 8, at the expense of unnecessary complexity for a 16-bit processor, unlike the plain and simple sixteen 16-bit-only registers.

Intel's 4 general purpose registers come to mind here, so I'll use their letters.

Registers:

0: AO, r0, zero
1: AL, r4l
2: AH, r4h
3: AX, r4
4: BO, SI, r5 *
5: BL, r1l
6: BH, r1h
7: BX, r1, sp
8: CO, DI, r6 *
9: CL, r2l
A: CH, r2h
B: CX, r2
C: DO, BP, r7 *
D: DL, r3l
E: DH, r3h
F: DX, r3

* optional register, act as r0 if not implemented, or maybe halt and catch fire
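
The letter names in the table are regular enough to compute. A tiny Python sketch that splits a register nibble into its two crumbs and rebuilds the Intel-style name (the function name is hypothetical):

```python
def register_name(nibble):
    """Name a GISA-16B register nibble: higher crumb = register, lower crumb = mask."""
    register = "ABCD"[(nibble >> 2) & 0b11]
    mask = "OLHX"[nibble & 0b11]        # none/other file, low, high, all
    return register + mask


print([register_name(n) for n in range(16)])
```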


Wozmon for GISA-16B

Wozmon is a memory editor and monitor for the 6502 that Wozniak of Apple managed to cram into a single page, that is into 256 B. We only have 256 B of memory address space on GISA-8, and none of that instruction compactness, so it wouldn't fit there.

Porting of this needs to find a way around the lack of interrupts. It will finally use von Neumann memory architecture. The 6502 had 3 8-bit registers, here we have at least 4 16-bit ones.



GISA-32W

This is the smallest GISA to have float support, as the number of opcodes started to allow it. The float registers are separate from integer registers. There is now a full byte for each opcode and register. I would consider this to be the base ISA, other widths are mostly derivations.

Following the tracker inspiration, GISA-32W-VLIW32 could be called FTISA (ptica with a lisp, after FastTracker 2), GISA-32W-VLIW64 ITISA after Impulse Tracker, and GISA-32W-VLIW128 MPTISA after (Open)ModPlug Tracker. Having 128 cores with just 16 GB of memory would still be quite silly (128 MB per core), but at least you can race condition the entire memory.

There was this unreadable opcode diagram I made in 2019:


May need some adjusting and rethinking, mainly regarding unsigned.


Opcodes, column is the 1st nibble, row is the 2nd nibble:

     0    1   2   3    4   5   6   7   8   9   A   B    C    D    E    F

0

1

2

3

4

5

6

7

8

9

A

B

C

D

E

F


The instruction format is in the least significant bits of the opcode, as we the people don't see ADD as too different from ADDI, and unlike RISC-V, which is for machines, GISA is for people, at least those like me.


Registers:

00: r0, zero
01: r1, sp
02: r2
etc.
FF: r255


Floating point registers are a separate file. They can be combined to make vectors, but that's more of a compiler, scheduler and thermals thing. Naturally only IEEE single precision is supported.

The main programming language for GISA-32W is B. The singular word type can be interpreted as a uint32, int32, float32, char32, or a pointer to such. Most importantly, pointers in B are in words, so *(0xFFFFFFFF) points to the last word, which would be *(0x3FFFFFFFC) with read size 4 on a byte addressable architecture. If you somehow want to use C, you get only int, unsigned, float, wchar. Software packing of 16-bit and 8-bit types is possible, but then the pointers would have to be in bytes, so only a quarter of memory would be reachable.
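
The word versus byte pointer arithmetic is easy to check in Python, whose integers don't overflow. The names below are made up for the example:

```python
WORD_BYTES = 4              # GISA-32W words are 32 bits wide
ADDRESS_BITS = 32


def word_to_byte_address(word_address):
    """Byte address of the first byte of a word on a byte-addressed machine."""
    return word_address * WORD_BYTES


last_word = 2 ** ADDRESS_BITS - 1
print(hex(word_to_byte_address(last_word)))     # 0x3fffffffc

# Fraction of the word-addressed memory that a 32-bit byte pointer can reach.
total_bytes = 2 ** ADDRESS_BITS * WORD_BYTES
reachable_fraction = 2 ** ADDRESS_BITS / total_bytes
print(reachable_fraction)                       # 0.25
```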


Here it may make sense to have some control and status registers, which could even switch between GISA-W and GISA-B, as well as emulate GISA-16 and GISA-8.


GISA-32B

The byte for specifying a register is split into 2 nibbles, therefore there are 16 registers with 16 possible modes of access using an arguably INTERCAL inspired select mask:

0: 0000: zero or alternate register file
1: 0001: the lowermost byte -> 8 b
2: 0010: lower middle byte -> 8 b
3: 0011: lower half or lower word -> 16 b
4: 0100: upper middle byte -> 8 b
5: 0101: upper middle and lowermost byte -> 16 b
6: 0110: upper middle and lower middle byte -> 16 b
7: 0111: all but the uppermost byte -> 24 b
8: 1000: the uppermost byte -> 8 b
9: 1001: the uppermost and lowermost byte -> 16 b
A: 1010: the uppermost and lower middle byte -> 16 b
B: 1011: all but upper middle byte -> 24 b
C: 1100: upper half or upper word -> 16 b
D: 1101: all but lower middle byte -> 24 b
E: 1110: all but the lowermost byte -> 24 b
F: 1111: all bytes -> 32 b

The discontinuous selects can be slid together, or continuous selects can be pulled apart, using a different select mask on the destination register. If the number of bytes selected in the destination mismatches the number of bytes selected in the source, the result depends on endianness, but GISA should be big-endian. There are too many little endian ISAs already. This is basically connecting parts of registers to parts of the bus to make lesser sized reads.
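
One possible reading of sliding together and pulling apart, sketched in Python: `gather` packs the selected source bytes next to each other, and `scatter` spreads packed bytes over the byte lanes selected in the destination. Both names, and the choice to keep the bytes in big-endian order while aligning them at the low end, are my assumptions and not part of the specification.

```python
def gather(value, mask):
    """Slide the bytes selected by mask together, uppermost selected byte first."""
    packed = 0
    for lane in reversed(range(4)):         # lane 3 is the uppermost byte
        if mask & (1 << lane):
            packed = (packed << 8) | ((value >> (8 * lane)) & 0xFF)
    return packed


def scatter(packed, mask):
    """Pull packed bytes apart into the lanes selected by mask."""
    value = 0
    for lane in range(4):                   # fill from the lowermost lane up
        if mask & (1 << lane):
            value |= (packed & 0xFF) << (8 * lane)
            packed >>= 8
    return value


pair = gather(0x11223344, 0b1001)           # uppermost and lowermost byte
print(hex(pair))                            # 0x1144
print(hex(scatter(pair, 0b0110)))           # 0x114400
```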

This allows for some interesting string manipulation:

DB 400h "AZOV"
DB 404h "KRYM"
LI r2mf, *400h
LI r3mf, *404h
ADD r3mf, r3mc, r2m3  ; "KROV"
SA r3mf, 408h

Whereas on 32W you would need to make 2 ANDs and 1 OR instead of 1 ADD, and consume 1 more register.
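
The same splice can be checked on a host machine. A Python sketch, assuming big-endian byte order and that bit 0 of the mask selects the lowermost byte as in the mask table (`select_bytes` is a made-up helper, not an instruction):

```python
def select_bytes(value, mask):
    """Keep only the bytes of a 32-bit value whose mask bit is set."""
    kept = 0
    for lane in range(4):                   # lane 0 is the lowermost byte
        if mask & (1 << lane):
            kept |= value & (0xFF << (8 * lane))
    return kept


azov = int.from_bytes(b"AZOV", "big")
krym = int.from_bytes(b"KRYM", "big")

# Upper half of KRYM (mask C) plus lower half of AZOV (mask 3).
spliced = select_bytes(krym, 0xC) + select_bytes(azov, 0x3)
print(spliced.to_bytes(4, "big"))           # b'KROV'
```

The addition cannot carry between the halves because each operand is zero wherever the other one has data, which is why a plain ADD is enough here.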

It also makes it possible to implement 3 byte or 24 bit integers directly in hardware, and use the leftover byte for some char or bool, or to implement some rudimentary vectors.

The byte selector also works on floating point registers, but casting is not so straightforward. 8-bit and 24-bit floats are handled as truncations of 16-bit and 32-bit floats, until I come up with a useful format or control registers to change bit allocations to your liking.

The C types would be:

char - 8 bits

short short float - 8 bits (compiler hack)

short short - 16 bits (compiler hack)

short float - 16 bits (an old C++ proposal)

short - 24 bits

int - 32 bits

float - 32 bits

wchar - 32 bits

long - 64 bits (paired registers)

double - 64 bits (float-float or software)

long long - 128 bits (4 registers)


GISA-64W

This is in fact more of a CISC architecture, because there are 65536 opcodes, which is around the number on x86 with all those extensions. Some of the 16 bits in the opcode could correspond directly to ALU controls, and others could clearly encode the instruction format. Collecting all the instructions on all the CISC CPUs and adapting their format to RISC would be just like collecting roots for GeSeL.

For the intents of vector instructions, the register file is all zero registers beyond FFFF.

The memory addressed is 128 EiB or 2^67 B. This is about the maximum I expect consumer devices to ever reach. That many silicon atoms for perfectly miniaturized memory cells would fill an entire room.

There is no VLIW because no tracker implements more than 256 effects. But if I finally finish ReTeMuS (Renderer/Redactor of Text Music Sequences, also called vitracker, MT637, GSS etc.), then there may be a 256core version, that is only 256T VLIWs in RAM.


GISA-64B

Much more preferable, since no one is willing to devote the silicon space to 65536 64-bit registers (8 Mb). 256 of them is much more acceptable, with 256 different byte masks.

This makes it possible to directly implement also 5 byte (40 bit), 6 byte (48 bit), and 7 byte (56 bit) integers and floats, in addition to 3 byte (24 bit) integers and floats from GISA-32B. For floats, the default behavior is to truncate the mantissa of the next higher IEEE standard, however there are many bespoke extended precision float formats, notably the 40-bit one used by BASIC.
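
Truncating the mantissa can be tried out with Python's `struct` module: pack an IEEE double in big-endian order and keep only the leading bytes. This is a sketch of the default behavior described above, not of any bespoke 40-bit format, and the helper names are made up.

```python
import math
import struct


def truncate_double(x, nbytes):
    """Keep the nbytes most significant bytes of a big-endian IEEE double."""
    return struct.pack(">d", x)[:nbytes]


def expand_to_double(raw):
    """Pad a truncated double with zero mantissa bytes and decode it."""
    return struct.unpack(">d", raw.ljust(8, b"\x00"))[0]


forty_bit_pi = truncate_double(math.pi, 5)   # sign, 11 exponent bits, 28 mantissa bits
print(forty_bit_pi.hex())
print(expand_to_double(forty_bit_pi), math.pi)
```

Because the sign and exponent live in the leading bytes, the truncated value keeps the full range of a double and only loses precision.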

As usual, there can be another set of non byte-addressable registers for when accessing them with a zero mask, bringing the total to 512. To reiterate, float registers have their separate file, that too can have alternate file for zero select mask, so there can be 1024 total registers.

There is still no MMU, so porting TempleOS is made easier. This is not meant to run regular operating systems, where the kernel controls hardware-assisted memory management. User is considered smart enough to know where they are writing to or where they are reading from, like some MMIO device, and also when to not cause a race condition.

The C types would be:

char - 8 bits

short short float - 8 bits (compiler hack)

short short - 16 bits (compiler hack)

short float - 16 bits (an old C++ proposal)

short - 24 bits

int - 32 bits

float - 32 bits

wchar - 32 bits

short short short long - 40 bits (compiler hack)

short short short double - 40 bits (compiler hack)

short short long - 48 bits (compiler hack)

short short double - 48 bits (compiler hack)

short long - 56 bits (compiler hack)

short double - 56 bits (compiler hack)

long - 64 bits

double - 64 bits

long long - 128 bits (paired registers)

long double - 128 bits (double-double or software)

long long long - 256 bits (4 registers, compiler hack)


GISA-128W

There are 32 bits for the opcode, so each of them can be directly assigned to a control somewhere down the instruction decoder and ALU config. This can create many "undocumented" instructions, like with the 6502. I'll just need to decide on the topology. Also you won't need any cache with 4 milliard registers.

The address space without byte addressability is humongous: 2^132 B, with a Harvardized program counter even 2^133 B. There are no SI prefixes for that; the not yet approved QiB is only 2^100 B, so 3 more prefixes are needed. Don't you ever come at me with this not being enough. There are barely enough silicon atoms in the entire Earth for a milliard of such memory kits, less than 1 per 8 people.


GISA-128B

4G registers is so many it could be construed as direct 32-bit addressing of the memory. Therefore it's highly advisable to reduce it to 65536 with 16 individually addressable bytes or their combinations. And yes, there is the option to have another 65536 non-byte addressable registers if that's too little now.

In addition to GISA-64B, there can be 9 to 15 byte (72 to 120 step 8 bits) integer and float types alike. A specific consideration should be made for the 80-bit extended precision most notably known from the Intel 8087, the 72 bit float being a truncation thereof.

The C types are the same as GISA-64B, but there are additionally these hacks:

short short short short short short short long long - 72 bits

short short short short short short short long double - 72 bits

short short short short short short long long - 80 bits

short short short short short short long double - 80 bits

short short short short short long long - 88 bits

short short short short short long double - 88 bits

short short short short long long - 96 bits

short short short short long double - 96 bits

short short short long long - 104 bits

short short short long double - 104 bits

short short long long - 112 bits

short short long double - 112 bits

short long long - 120 bits

short long double - 120 bits

long long double - 256 bits (quadruple-quadruple or software)

long long long long - 512 bits (4 registers)

But using int###_t and float###_t is probably better. The compiler is to manage the register splits and masks, but it may be a good idea to hint it with proper declaration order.


There is unlikely to ever be a need for an address space bigger than 128-bit, since there are not enough silicon atoms in the Earth to make this much memory. Or would you like to mmap all the Internet? If you have read everything up to here, you could very well extrapolate GISA-256W and GISA-256B, but there would be nothing new, everything would just be wider with more dead space. You would get octuple precision floating point, though.



