Saturday, November 17, 2018
A PDF copy of this essay can be found here.
This is something that's a little bit out of the ordinary for my blog - something more philosophical than technical - but it's something that's been weighing on my mind for some time now. I've discussed the ideas I present here with many of my friends, and they understood (though not all necessarily agreed!), so maybe the internet at large would be interested as well. I think that the growing homogenization of the internet through sites like Facebook and YouTube, as well as the growing use of SSO (Single Sign-On) solutions to log in to a variety of smaller sites with a single account, has had a chilling effect on the honesty of internet-based discourse; more importantly, it threatens to prevent our complete selves from being expressed to the world, which could prove extraordinarily harmful in the long run.
Monday, April 30, 2018
Schego Part 4, Or The Heap, Linked Lists, and Other Scary Subsystems
This is the second half (see the first half here) of the giant blog update on Schego I wrote. I concentrated so hard on actually coding, I forgot to update the blog as I went! Hopefully I'll be able to split my time more evenly between documentation and coding, but on with the show; we're going to talk about a couple different new additions to Schego that most interpreted programming languages will have to implement in one way or another.
First up - and the subsystem that everything else in this blog post depends on in one way or another - is the new heap allocator, based on the buddy memory allocation technique. It's fairly venerable (I believe the Linux kernel uses a modified version for kmalloc) and also pretty easy to implement, with Schego's implementation spanning a hair under 200 lines as of this writing.
I'm gonna skimp on justifying the need for a heap allocator since I feel it's kinda obvious (how many non-embedded programs have you written that just used the stack?), but I do think it's worth examining why I'd use the buddy allocator in particular. It's certainly not the only allocator on the block, but it's definitely the most popular one; when looking around for preexisting allocators to study, the buddy allocator was far and away the most covered. I think the advantage of having one or more reference implementations is difficult to overstate; when you're unsure whether you're "doing it right," having something beyond a white paper is pretty handy. You could argue that if you're about to write a memory allocator you should be able to implement it based solely on a white paper, but the whole point of this series is to teach, right? So let's do some teaching, especially since prose explanations of memory allocation algorithms are pretty scarce at this point.
The buddy system at its core is pretty intuitive: the basic idea is to split the heap up into stretches of memory called blocks that we can then place our variables into. Since we want to handle differently-sized objects - say, a single 8-bit value versus a 32-element array of 8-bit values - we can have differently-sized blocks. And because referring to blocks by how many bits or bytes they take up is kind of unwieldy, let's use an order system to specify how large our blocks are. We start with some base size that we define as order 0 (an order 0 block in Schego is currently 32 bytes, for instance), and each order beyond that is twice as big as the one preceding it; in other words, order n has a length of (order 0 length) * 2^n. In Schego, the order calculation looks something like this right now:
func (h *VMHeap) OrderFor(requestedBytes uint64) uint8 {
	if requestedBytes <= blockSize {
		return 0
	}
	// TODO: handle requests past maxOrder gracefully
	// round up so the returned order is always big enough to hold the request;
	// this is ugly, but kinda fast, and less stupid than the old solution
	order := uint8(math.Ceil(math.Log2(float64(requestedBytes)) - math.Log2(float64(blockSize))))
	return order
}
By the way, the math in the OrderFor function is just a simplification of the order calculation I mentioned earlier so we can solve for n directly - don't worry that it looks a bit different.
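To make that concrete with a quick worked example: with Schego's 32-byte order 0 blocks, a request for 100 bytes gives log2(100) - log2(32) ≈ 6.64 - 5 = 1.64, which rounds up to order 2 - a 128-byte block, the smallest one that actually fits (an order 1 block would only be 64 bytes).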
Anyway, now we have a way to refer to chunks of memory, but how do we actually allocate objects? First, we use OrderFor to figure out the smallest order of block capable of holding the requested number of bytes. If there's already a free block of that order in the heap, we're done - we simply return the address of that block. But what if only bigger blocks are available? This will have to happen at least once, too, as the heap starts its life as one giant block.
In that case, we have to subdivide one of the bigger blocks down the middle, and simply repeat the process on one of the halves until we have a block of the size the caller wants. It requires a bit of bookkeeping in practice, but on paper it's fairly straightforward. As a nice bonus, we've created a bunch of other free blocks in the process that should fit the bill if the caller requests allocation of similarly-sized objects in the future, saving us a bit of work. (If only smaller blocks are available, we've run out of memory and need to tell the caller that's happened; here we only deal with the case where larger blocks are available.) Schego implements the splitting process like so:
func (h *VMHeap) CreateBlock(order uint8) {
	// find the smallest order that we can pull from
	freeOrder := order + 1
	for h.NoFreeBlocksFor(freeOrder) {
		freeOrder += 1
	}
	// repeatedly split blocks until we get one (technically, two) of the order we originally wanted
	for freeOrder > order {
		blockAddress := h.GetFreeBlock(freeOrder)
		h.SplitBlock(blockAddress, freeOrder)
		freeOrder -= 1
	}
}
So that's allocation handled, but how do we deallocate objects? If you followed the block subdivision part of allocation, you might have noticed that blocks of the same size get created in pairs. Sometimes one of those blocks then gets subdivided further, but this idea of pairs - or buddies - is where the buddy allocation system gets its name. After freeing a single block (which is as easy as putting its address back on the list of free blocks), we check whether its buddy exists and is free, too. If so, we merge the two blocks back together into a single larger block, and repeat the same buddy check/merge process until no free buddy can be found. This keeps the heap from fragmenting and wasting space, and generally ensures there are only barely more blocks than necessary to cover all objects in the heap (each time we request a block, we're potentially creating 2 blocks of that size, remember?). In Schego, the merge process is done like this:
func (h *VMHeap) MergeWithBuddy(address uint64, order uint8) {
	buddyAddress := h.GetBuddyAddress(address, order)
	// figure out which address is lower and delete the other block;
	// take the lower address for the new merged block
	var newAddress uint64
	if buddyAddress < address {
		newAddress = buddyAddress
		delete(h.blockMap, address)
	} else {
		newAddress = address
		delete(h.blockMap, buddyAddress)
	}
	buddyIndex := h.GetUnusedBlockIndex(buddyAddress, order)
	h.RemoveBlockFromUnused(buddyIndex, order)
	blockIndex := h.GetUnusedBlockIndex(address, order)
	h.RemoveBlockFromUnused(blockIndex, order)
	h.blockMap[newAddress] = order + 1
	h.unusedBlocks[order+1] = append(h.unusedBlocks[order+1], newAddress)
	// recurse if we still have potential merging left undone
	if h.HasBuddy(newAddress, order+1) {
		h.MergeWithBuddy(newAddress, order+1)
	}
}
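One helper doing quiet heavy lifting there is GetBuddyAddress, which isn't shown above. In a textbook buddy allocator it boils down to a single XOR, since a block and its buddy differ only in the bit corresponding to their size. Here's a minimal sketch of that idea - a hypothetical helper, not Schego's actual code, assuming addresses are byte offsets from the start of the heap and blockSize is the order 0 size:

func buddyAddressOf(address uint64, order uint8) uint64 {
	// hypothetical sketch: buddies of a given order sit (blockSize << order)
	// bytes apart, so XORing with that size flips between the two
	return address ^ (blockSize << order)
}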
After all that, our top-level Allocate() and Free() functions are as simple as this:
func (h *VMHeap) Allocate(numBytes uint64) uint64 {
	order := h.OrderFor(numBytes)
	if h.NoFreeBlocksFor(order) {
		h.CreateBlock(order)
	}
	blockAddress := h.GetFreeBlock(order)
	// GetFreeBlock always returns the first/0th free block,
	// so remove that one
	h.RemoveBlockFromUnused(0, order)
	return blockAddress
}

func (h *VMHeap) Free(address uint64) {
	order := h.blockMap[address]
	// add the newly freed block back to the list of unused blocks;
	// MergeWithBuddy will take care of removing it if need be due to merging
	h.unusedBlocks[order] = append(h.unusedBlocks[order], address)
	if h.HasBuddy(address, order) {
		h.MergeWithBuddy(address, order)
	}
}
The only real downside I'm aware of with the buddy allocator is that block sizes take a bit of fine-tuning: if blocks are too big, a lot of space gets wasted. Going back to Schego, every time an 8-byte integer is allocated, the remaining 24 bytes of its 32-byte block are currently wasted. On the other hand, smaller block sizes mean more merging upon deallocation, which is a performance hit. I haven't yet looked at how the heap allocators in other virtual machines handle this, but I plan on doing so shortly.
That was a lot to take in, I know, but once we have our heap built and tested, the most we'll have to interact with it is to use its Allocate() and Free() functions - the rest is all under the hood. So all that said, let's examine the next piece of the puzzle - strings.
Strings aren't a difficult concept. They're just a bunch of bytes in memory all lined up in a row that get treated as letters and such - most of you reading this could probably have implemented basic string functionality in Schego pretty easily, given a heap allocator. The R7RS Scheme spec throws a mild curveball at us, however, in that proper string support also means supporting UTF-8.
Pretty much all of you are familiar with the ASCII character encoding that older languages like C support out of the box - one byte per character, all that good stuff. Most of you have probably heard of UTF-8 (and definitely used it - you're reading UTF-8-encoded text right now), but maybe never had to think about writing explicit support for it in your applications; many programming languages today, like Python 3, offer built-in UTF-8 support in their strings.
Going over the UTF-8 specification, it's fairly simple, but for those of you coming from ASCII, you now have to deal with the fact that you're no longer guaranteed every character will be some fixed number of bytes: all ASCII characters have the same exact length and value in UTF-8, but all non-ASCII characters take up more than one byte. In technical terms, UTF-8 is what is known as a variable-width character encoding.
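To make that concrete, here's a tiny standalone Go snippet - not part of Schego, just an illustration - that prints the raw bytes behind a few characters of increasing width:

package main

import "fmt"

func main() {
	// one character each, but 1, 2, 3, and 4 bytes respectively in UTF-8
	for _, s := range []string{"A", "é", "€", "😀"} {
		fmt.Printf("%q -> % x (%d bytes)\n", s, []byte(s), len(s))
	}
}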
This requires only a little bit of planning on our end, though, since UTF-8 makes it pretty easy to figure out how large each character is from its first byte. If the high bit is 0, we've got a 1-byte ASCII character (or the null character) on our hands. Otherwise, we check the first four bits: 0xC means our character will be 2 bytes long, and 0xE, 3 bytes. Currently Schego assumes that for any other value the character will be 4 bytes long - which will always be the case in properly-formed UTF-8 text, where those four bits would be 0xF - but could pose a problem for ensuring no invalid characters slip through the cracks. Anyway, here's how Schego currently reads UTF-8 text out of the program and pushes it onto the stack:
utfBytes := make([]byte, 0)
for {
	firstByte := v.ReadBytes(1)[0]
	if firstByte&0x80 == 0 || firstByte == 0 { // ASCII or null byte
		utfBytes = append(utfBytes, firstByte)
		// null?
		if firstByte == 0 {
			break
		} else {
			continue
		}
	}
	// with UTF-8, the most significant bits tell us
	// how many bytes to read, so construct some simple bitmasks
	// to check
	// see: https://en.wikipedia.org/wiki/UTF-8#Description
	// essentially, check the upper four bits of the first byte,
	// with 1111 meaning read 4 bytes total, 1110 3 bytes, and
	// 1100, 2 bytes
	var codepointLength int
	upperFourBits := firstByte >> 4
	if upperFourBits == 0xC {
		codepointLength = 2
	} else if upperFourBits == 0xE {
		codepointLength = 3
	} else {
		// naively assume 4-byte codepoint length,
		// could be dangerous, should probably be replaced
		// at some point in the future
		codepointLength = 4
	}
	// we've already read one byte (firstByte)
	numBytes := codepointLength - 1
	extraBytes := v.ReadBytes(numBytes)
	utfRune := append([]byte{firstByte}, extraBytes...)
	// append the finished character
	utfBytes = append(utfBytes, utfRune...)
}
v.Stack.PushString(utfBytes)
That's actually the entire current implementation of the opcode to push a string onto the stack. Not that bad, huh? Putting strings on the heap isn't too much extra work. All we really have to do is ensure there's enough space to store the string in the heap, store the length along with it, and maybe reallocate space if what we currently allocated isn't enough:
mnemonic := string(v.ReadBytes(2))
address := v.mnemonicMap[mnemonic]
strBytes := v.Stack.PopString()
var strBuffer bytes.Buffer
numBytes, _ := strBuffer.Write(strBytes)
numBytes64 := uint64(numBytes)
var allocatedBytes uint64
binary.Read(v.Heap.Read(8, address), binary.LittleEndian, &allocatedBytes)
if numBytes64 > allocatedBytes {
	v.Heap.Free(address)
	newAddress := v.Heap.Allocate(8 + numBytes64)
	v.mnemonicMap[mnemonic] = newAddress
	intBuffer := bytes.NewBuffer(make([]byte, 0))
	binary.Write(intBuffer, binary.LittleEndian, &numBytes64)
	v.Heap.Write(intBuffer, newAddress)
	// offset by 8 to avoid writing over the int we just wrote
	v.Heap.Write(&strBuffer, newAddress+8)
} else {
	v.Heap.Write(&strBuffer, address+8)
}
This is also a good time to introduce Schego's reference mnemonics into the picture, which are just 2 arbitrary bytes that can refer to some point in memory. You can think of them like variable names or pointers - the heap allocator decides what specific address in memory they point to, but you can always refer to that address with the name.
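Under the hood this is just a map lookup keyed on those 2 bytes. A minimal, hypothetical sketch of the idea (the real map lives on Schego's VM state as mnemonicMap, and the address below is made up):

package main

import "fmt"

func main() {
	mnemonicMap := map[string]uint64{}
	mnemonic := string([]byte{0xBE, 0xEF}) // the two mnemonic bytes from the opcode stream
	mnemonicMap[mnemonic] = 0x40           // pretend the allocator handed back address 0x40
	fmt.Printf("0x%X\n", mnemonicMap[mnemonic]) // the name always gets us back to the address
}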
That about does it for strings. There are some test cases showing off the UTF-8 support, but you can go look at the code yourself if you're curious - they don't exactly tell you anything you don't already know at this point. So let's keep moving, to linked lists.
Schego, like a good little Scheme implementation, provides bytecode-level support for linked lists. Under the hood, a single linked list node in Schego is pretty straightforward, being composed of just 3 unsigned integers: the first stores the number of bytes allocated for the data held in the node, the second stores the address of the data itself, and the last stores the address of the next node in the list. This basically means we just pop 3 integers at a time inside the linked list-related opcodes, so as long as our underlying code is solid (implementing one of these opcodes revealed a bug in the addi instruction that caused an extra byte to be prepended to every sum - ouch) and we always push/pop in the correct order, the resulting implementation is pretty boring. For instance, this is the implementation of what's known as car (Contents of Address Register - long story), or in other words, the function to retrieve the data stored at a linked list node and put it onto the stack:
cell := v.Stack.PopCell()
numBytes := cell[0]
dataAddress := cell[1]
cellData := v.Heap.Read(numBytes, dataAddress).Bytes()
for i := uint64(0); i < numBytes; i++ {
	v.Stack.PushByte(cellData[numBytes-i-1])
}
// hack to get VMStack.lenLastPushed to play nicely with hcar
// should probably be replaced with a dedicated method on
// VMStack in the future (PushCellData?)
v.Stack.lenLastPushed = numBytes
Not that involved, right? Let's close out the article with the test case that shows how all the list opcodes are meant to be used together:
func TestListSum(t *testing.T) {
	// create a list containing the values 1, 2, and 3,
	// and sum them up
	// mnemonics:
	// 0xBEEF - 1st list cell/list head
	// 0xDEAD - 2nd list cell
	// 0xFACE - 3rd list cell
	// 0xACED - sum counter
	opcodes := []byte{
		0x25,       // hnewl
		0xBE, 0xEF, // reference mnemonic - 0xBEEF
		0x06, // cons
		0x03, // pushi
		0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 1
		0x4B,       // hscar
		0x0D,       // hstorel
		0xBE, 0xEF, // reference mnemonic - 0xBEEF
		0x25,       // hnewl
		0xDE, 0xAD, // reference mnemonic - 0xDEAD
		0x06, // cons
		0x03, // pushi
		0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 2
		0x4B,       // hscar
		0x0D,       // hstorel
		0xDE, 0xAD, // reference mnemonic - 0xDEAD
		0x19,       // hloadl
		0xBE, 0xEF, // reference mnemonic - 0xBEEF
		0x4D,       // hscdr
		0xDE, 0xAD, // reference mnemonic - 0xDEAD
		0x0D,       // hstorel
		0xBE, 0xEF, // reference mnemonic - 0xBEEF
		0x25,       // hnewl
		0xFA, 0xCE, // reference mnemonic - 0xFACE
		0x06, // cons
		0x03, // pushi
		0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 3
		0x4B,       // hscar
		0x0D,       // hstorel
		0xFA, 0xCE, // reference mnemonic - 0xFACE
		0x19,       // hloadl
		0xDE, 0xAD, // reference mnemonic - 0xDEAD
		0x4D,       // hscdr
		0xFA, 0xCE, // reference mnemonic - 0xFACE
		0x0D,       // hstorel
		0xDE, 0xAD, // reference mnemonic - 0xDEAD
		0x22,       // hnewi
		0xAC, 0xED, // reference mnemonic - 0xACED
		// zero out our allocated memory, as we are not
		// currently guaranteed any new memory will be zeroed
		0x03, // pushi
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0
		0x0A,       // hstorei
		0xAC, 0xED, // reference mnemonic - 0xACED
		0x19,       // hloadl
		0xBE, 0xEF, // reference mnemonic - 0xBEEF
		// use dup/hcdr to leave the list cells on the stack
		// in reverse sequential order (3-2-1 from top to bottom, and not 1-2-3)
		0x07, // dup
		0x49, // hcdr
		0x07, // dup
		0x49, // hcdr
		0x47,       // hcar
		0x16,       // hloadi
		0xAC, 0xED, // reference mnemonic - 0xACED
		0x36,       // addi
		0x0A,       // hstorei
		0xAC, 0xED, // reference mnemonic - 0xACED
		0x47,       // hcar
		0x16,       // hloadi
		0xAC, 0xED, // reference mnemonic - 0xACED
		0x36,       // addi
		0x0A,       // hstorei
		0xAC, 0xED, // reference mnemonic - 0xACED
		0x47,       // hcar
		0x16,       // hloadi
		0xAC, 0xED, // reference mnemonic - 0xACED
		0x36, // addi
		0x43, // syscall
		0x03, // print integer
		0x03, // pushi
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0
		0x43, // syscall
		0x06, // exit
	}
	console := DummyConsole{}
	RunVM(opcodes, &console)
	if console.consoleOutput != "6" {
		t.Error("Incorrect output, got: ", console.consoleOutput)
	}
}
Pretty lengthy, I know, but only because of the tedium involved in writing out a linked list at the bytecode level. If you're already familiar with Scheme/Lisp, or just functional programming in general, the general flow of the program should make a fair amount of sense. But that about does it for this installment. As usual, the code is on GitHub if you want to take a deeper look.
Wednesday, April 25, 2018
Schego Part 3, Or Virtual Machines Aren't Really That Hard
Since the previous installment in the Schego saga, a lot's happened - so much, in fact, that I've split the next update into 2 separate parts: the one you're reading now, and one that will follow very shortly. Life has a way of getting in the way; you get so absorbed with "important" stuff that you forget the blog series you promised to update on a reasonably sane basis. On the upside, those of you paying attention will notice that I've still been updating the actual codebase itself with some pretty exciting features, and I'll do my best to go over some of them in this entry, in which we'll be looking at stack-based virtual machines. I'm thinking most of you will be familiar with the concept of a virtual machine, but perhaps haven't coded one before, so I'm not going to delve into what a virtual machine is. You might not have heard the term "stack-based virtual machine," though, so I'll take a slight detour to cover that first.
Essentially, there are - at a very high level - two major types of virtual machines: stack-based and register-based. A stack-based virtual machine requires that we yank any data we want to manipulate or analyze onto a stack the virtual machine makes available before doing anything with it. A register-based virtual machine, on the other hand, allows you to store the results of any calculations directly into a register (hence the name) without any stack usage. I personally find stack-based VMs a bit easier to understand, and they're arguably more widespread; both the .NET CLR and the JVM are implemented as stack-based VMs, though as a notable exception, Lua's VM is a register-based implementation. In any case, Schego's VM is a stack-based one.
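To illustrate the difference, here's roughly how evaluating 1 + 2 might look in each style. The stack-based side uses Schego-style mnemonics (pushi/addi show up for real later in this series); the register-based side is purely hypothetical:

; stack-based (Schego-style mnemonics)
pushi 1    ; push 1 onto the stack
pushi 2    ; push 2 onto the stack
addi       ; pop both, push their sum (3)

; register-based (hypothetical mnemonics)
loadi r1, 1      ; put 1 in register r1
loadi r2, 2      ; put 2 in register r2
addi r3, r1, r2  ; r3 = r1 + r2, no stack involved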
With some terminology out of the way, how do we go about implementing a VM? They're actually fairly straightforward things - we grab the next instruction to execute and simply run a different function depending on the one we grab. Slightly over-simplified, we create a giant switch-case structure and switch on the instruction. The beginning of the VMState.Step() function in Schego, which executes a single opcode, actually starts like this:
func (v *VMState) Step() {
	if v.CanStep() == false {
		// TODO: properly handle finished VM
		return
	}
	currentOpcode := v.NextOpcode()
	switch currentOpcode {
	// [opcode implementations here]
You could also handle this as a hashtable if you wanted, with the keys being the instructions and the values being function pointers to their respective implementations, but Lua uses a switch-case structure for this part of its VM, and for me, that's a sign that I probably don't stand to gain enough from switching.
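For the curious, a minimal sketch of what that hashtable alternative might look like - hypothetical, and not how Schego is actually structured, though the pushi entry mirrors the real case shown below:

// hypothetical dispatch table keyed on opcode bytes
var dispatch = map[byte]func(*VMState){
	0x03: func(v *VMState) { v.Stack.PushInt(v.ReadBytes(8)) }, // pushi
	// ...one entry per opcode...
}

func (v *VMState) stepViaTable() {
	if handler, ok := dispatch[v.NextOpcode()]; ok {
		handler(v)
	}
}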
What does an instruction look like, though? In Schego, as in most other VMs, an instruction (or opcode) is just a single byte that signifies what the following bytes in memory should be taken as. For an opcode to push an integer onto the stack, for instance, we'd expect the following bytes in memory to be the integer to push, like we have here:
case 0x03: // pushi
	// simply grab the next 8 bytes and push them
	intBytes := v.ReadBytes(8)
	v.Stack.PushInt(intBytes)
Not that hard to follow, right? Just grab the next 8 bytes in the program and treat them as a 64-bit integer to be pushed onto our stack for the rest of our program to manipulate. After that, we know the next byte should be treated as another opcode, and we can run our switch-case statement again. We can interact with the stack with a pop too, as in the case of comparing two integers:
case 0x40: // cmpi
	y := v.Stack.PopInt()
	x := v.Stack.PopInt()
	if x == y {
		v.Stack.PushByte(0)
	} else if x > y {
		v.Stack.PushByte(1)
	} else {
		v.Stack.PushByte(2)
	}
And here's an instruction we might use after comparing two integers - jne (jump if not equal):
func (v *VMState) jump() {
	addressBytes := v.ReadBytes(8)
	var address int64
	binary.Read(bytes.NewBuffer(addressBytes), binary.LittleEndian, &address)
	v.opcodeBuffer.Seek(address, io.SeekCurrent)
}

//----------------------

case 0x2D: // jne
	cmpResult := v.Stack.PopByte()
	if cmpResult != 0 {
		v.jump()
	} else {
		// skip the jump address
		v.opcodeBuffer.Seek(8, io.SeekCurrent)
	}
See how we're simply reimplementing the same logic our physical CPU might be performing for us? That's all a virtual machine is - no black magic or anything else crazy. At least for now; things will get slightly hairy later on down the line. But the basic switch-case logic detailed above is something nearly all VMs share.
Now that we've seen how individual opcodes are implemented, what might a full-blown program in our little language look like? Here's the very first test I wrote for Schego's VM - a small program that just pushes an ASCII string of "Hello, World!" onto the stack before printing it and exiting with status code 0:
type DummyConsole struct {
	consoleOutput string
}

func (d *DummyConsole) Write(line string) {
	// trim null
	d.consoleOutput = strings.TrimRight(line, "\x00")
}

func TestHelloWorld(t *testing.T) {
	opcodes := []byte{
		0x05, // pushs
		0x48, // H
		0x65, // e
		0x6C, // l
		0x6C, // l
		0x6F, // o
		0x2C, // ,
		0x20, // space
		0x57, // W
		0x6F, // o
		0x72, // r
		0x6C, // l
		0x64, // d
		0x21, // !
		0x0A, // \n
		0x00, // null
		0x43, // syscall
		0x05, // print string
		0x03, // pushi
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0
		0x43, // syscall
		0x06, // exit
	}
	console := DummyConsole{}
	retcode := RunVM(opcodes, &console)
	if retcode != 0 {
		t.Error("Expected return code of 0, got:\n", retcode)
	}
	if console.consoleOutput != "Hello, World!\n" {
		t.Error("Incorrect output, got: ", console.consoleOutput)
	}
}
The opcodes in the test are fairly decently annotated, so it should be pretty obvious what's going on. I've skipped some uninteresting stuff (like the specific implementation of Schego's stack) for the sake of keeping this article brief and interesting, and also some potentially interesting stuff (how do we push a string onto the stack?), since I intend on covering that in a later installment, especially considering how relatively technically dense the next article will be.
But you guys will have to wait just a little bit longer for that one. As always, though, the code is up on GitHub if you want to take a further look.
Monday, April 23, 2018
Sophomore year: a (rambling) retrospective
I wish there was a way to know you're in the good old days before you've actually left them.
"Gone in a flash" would be the phrase that first came to mind if you asked me to describe how my sophomore year of college went. Honestly, college in general has seemingly flown by; it doesn't seem like all that long ago I was a high school graduate, thinking of college as the "next big phase" of his life. And yet somehow, I'm on the tail end of that journey already. You've probably heard the old bit about how life seems to move faster as you get older - it certainly seems that way to me, at least.
The scenery is different compared to freshman year, if nothing else. I've traded the empty expanse of the dorm lobby for the living room of my apartment. I've got working internet without having to leave my own abode - though it was only a few hours before typing this that I got the wireless drivers working on the Debian installation on my Thinkpad. Unlike last time, I'm not sick, and I'm pretty grateful for that, considering finals week is looming. It's a lot quieter here in the living room than it was in the lobby, except for the unerring tick-tock of the wall clock mounted above the utility room door. The home theater setup my roommates and I collectively built sits against the wall opposite me - a gorgeous 4K TV and subwoofer, plus a Blu-ray player and my trusty PlayStation 2 and PlayStation 4. I wish someone would've told me how little time for gaming I would have here, though.
The outside world isn't all that's changed. Skipping class has gone from a moral affront of the worst degree to a last resort if I can't keep myself awake during prior lectures. I've noticed myself becoming much more proactive and confident when it comes to getting involved in campus activities. I'm an undergraduate researcher now - not something I ever saw myself becoming during high school or even freshman year. I got a job offer from Cisco to come work with them in Silicon Valley over the summer. I've volunteered to a great degree to get the good word out about STEM, probably reaching as many grade school children as I did my entire time in high school. There's still room for improvement, though; I told myself I would get involved in hockey but have failed to do so yet, and I've been on the fence about a workout routine for some time now. Maybe putting all that into words will finally shame me into action, or so I hope.
Honestly, with the whirlwind that sophomore year has been, I wonder what I was doing at all during my freshman year. On paper, freshman year was pretty quiet for the most part; classes, some volunteering activities here and there, not much else. Ultimately, I just think I was simply unaware of how far I could truly push myself.
There's a balance to be found between work and pleasure. Some of my fondest memories from freshman year are of doing inane things with friends, or lazing around doing nothing in particular. I remember a particular afternoon when a copy of Watchmen caught my eye in the university bookstore, and I spent several subsequent afternoons reading it on the steps in front of the plinth at the center of campus. It's a really pretty spot, still one of my favorites on campus: the pool surrounding the plinth, the row of trees adorning the linear pools further on that you can just make out if you crane your neck from the top of the steps. It wasn't there my freshman year, but immediately behind where I once read now stands the largest indoor Starbucks in the state, so there's that too.
I wouldn't trade any of those memories for the world, but I wouldn't part with any of my newfound achievements, either; both have their place. I think it does one no good to be on top of the world, but have nobody to celebrate it with. Conversely, I would almost consider my life wasted if I spent it in a purely cavalier manner - there is far too much I want to do, to see, to contribute, to burn it all away on pleasure. I don't claim to know how much of both is right for me or anyone else, mind you; I wouldn't be surprised if I found myself dissatisfied with sophomore year come junior year.
Yet, to a certain extent, I welcome that sense of discontent. I take it to mean that I am holding myself to a higher standard than I did before. At the end of the day, how much more can you ask from someone aside from steady improvement? As long as you know where you stand and where you're headed, I don't think you can go wrong.
So here's to the latter half of undergraduate life, with even more joy and achievements to be had. I hope one day I'll be able to look back on sophomore year with dissatisfaction.