Monday, April 23, 2018

Sophomore year: a (rambling) retrospective

I wish there was a way to know you're in the good old days before you've actually left them.

"Gone in a flash" would be the phrase that first came to mind if you asked me to describe how my sophomore year of college went. Honestly, college in general has seemingly flown by; it doesn't seem like all that long ago I was a high school graduate, thinking of college as the "next big phase" of his life. And yet somehow, I'm on the tail end of that journey already. You've probably heard the old bit about how life seems to move faster as you get older - it certainly seems that way to me, at least.

The scenery is different compared to freshman year, if nothing else. I've traded the empty expanse of the dorm lobby for the living room of my apartment. I've got working internet without having to leave my own abode - though it was only a few hours prior to typing this that I got the wireless drivers working on the Debian installation on my ThinkPad. Unlike last time, I'm not sick, and I'm pretty grateful for that, considering finals week is looming. It's a lot quieter here in the living room than it was in the lobby, except for the unerring tick-tock of the wall clock mounted above the utility room door. The home theater setup my roommates and I collectively built sits against the wall opposite me - a gorgeous 4K TV and subwoofer in addition to a Blu-ray player and my trusty PlayStation 2 and PlayStation 4. I wish someone would've told me how little time for gaming I would have here, though.

The outside world isn't all that's changed. Skipping class has gone from a moral affront of the worst degree to a last resort if I can't keep myself awake during prior lectures. I've noticed myself becoming much more proactive and confident when it comes to getting involved in campus activities. I'm an undergraduate researcher now - not something I ever saw myself becoming during high school or even freshman year. I got a job offer from Cisco to come work with them in Silicon Valley over the summer. I've volunteered extensively to get the good word out about STEM, probably reaching as many grade-school children as I did during my entire time in high school. There's still room for improvement, though; I told myself I would get involved in hockey but have yet to do so, and I've been on the fence about a workout routine for some time now. Maybe putting all that into words will finally shame me into action, or so I hope.

Honestly, with the whirlwind that sophomore year has been, I wonder what I was doing at all during my freshman year. On paper, freshman year was pretty quiet for the most part; classes, some volunteering activities here and there, not much else. Ultimately, I think I was simply unaware of how far I could truly push myself.

There's a balance to be found between work and pleasure. Some of my fondest memories from freshman year are of me doing inane things with friends, or lazing around doing nothing in particular. I remember a particular afternoon when a copy of Watchmen caught my eye in the university bookstore, and I spent several subsequent afternoons reading it on the steps in front of the plinth at the center of campus. It's a really pretty spot, still one of my favorites on campus: the pool surrounding the plinth, the row of trees adorning the linear pools further on that you can just make out if you crane your neck from the top of the steps. It wasn't there my freshman year, but immediately behind where I once read now stands the largest indoor Starbucks in the state, so there's that too.

I wouldn't trade any of those memories for the world, but I wouldn't part with any of my newfound achievements, either; both have their place. I think it does one no good to be on top of the world, but have nobody to celebrate it with. Conversely, I would almost consider my life wasted if I spent it in a purely cavalier manner - there is far too much I want to do, to see, to contribute, to burn it all away on pleasure. I don't claim to know how much of both is right for me or anyone else, mind you; I wouldn't be surprised if I found myself dissatisfied with sophomore year come junior year.

Yet, to a certain extent, I welcome that sense of discontent. I take it to mean that I am holding myself to a higher standard than I did before. At the end of the day, how much more can you ask from someone aside from steady improvement? As long as you know where you stand and where you're headed, I don't think you can go wrong.

So here's to the latter half of undergraduate life, with even more joy and achievements to be had. I hope one day I'll be able to look back on sophomore year with dissatisfaction.

Sunday, October 1, 2017

Schego Part 2, Or On Parsing and Other Related Stuff

Alright, part 2. And only a couple weeks later than planned to boot!

Last time, we discussed the basics of lexing - turning a string into a list of tokens - and the ins and outs of how that's done. So now, we'll move onto parsing, which is the next step in the process of turning our original source code into something the computer can understand.
The rationale behind parsing is simple, and mirrors the rationale behind lexing: it's a lot easier to reason about generating code if you're working at the statement/expression level instead of at the token level. Think about it: if you were trying to write a program from a description of what that program is supposed to do, wouldn't you rather the description say "put an if statement here" instead of "stick a semicolon here and a number there"?
Additionally - and this is probably at least two future blog posts away - it's a lot easier to perform optimizations when thinking at the statement level as well. If you have a general idea of what the original code is attempting to do, you can potentially replace it with something faster. We'll need other data structures besides what our parsing process implements, but we'll cross that bridge when we get to it.
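To make that distinction concrete, here's a toy illustration - none of these are Schego's real types, just made-up stand-ins - of the same input viewed at the token level versus the expression level:

```go
package main

import "fmt"

// toyNode is a stand-in for a real AST node type, for illustration only.
type toyNode struct {
	label    string
	children []*toyNode
}

func main() {
	// token level: five flat, context-free pieces
	tokens := []string{"(", "+", "5", "3", ")"}

	// expression level: a single addition expression with two operands -
	// much closer to "put an add expression here"
	ast := &toyNode{"add", []*toyNode{{"5", nil}, {"3", nil}}}

	fmt.Println(len(tokens), "tokens become one", ast.label, "expression")
}
```

Five meaningless-on-their-own tokens collapse into one node that actually says what the code does - that's the whole point of the exercise.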
So, how is Schego's parser built? At a high level, much the same way as any parser: We represent the program's structure with a tree, known as an AST (Abstract Syntax Tree), with each individual node adhering to this interface:

// base interface for functions needing to accept any kind of AST node
type AstNode interface {
 GetSubNodes() []AstNode
 GetType() AstNodeType
 DebugString() string
}

If you recall the first article, I didn't create any interfaces there, so why the change? There will be functions - as we'll see in a moment - that should ideally work with/return any potential combination of AST nodes; when you're parsing a language where literally everything evaluates to some kind of expression, you need flexibility without repeating yourself. Due to Go not having partially abstract classes, I had to split the definition and implementation into two separate places, with the definition being in AstNode and the baseline implementation in another struct, SExp - short for S-Expression, which is what the structure of LISP/Scheme is based off of:

// base struct that all AST node implementations build off of
type SExp struct {
 subNodes []AstNode
}

func (s *SExp) GetSubNodes() []AstNode {
 return s.subNodes
}

func (s *SExp) AddSubNode(node AstNode) {
 s.subNodes = append(s.subNodes, node)
}

Still pretty straightforward, right? Specific nodes implement GetType() and DebugString() to return their proper values, and the basic stuff that should only be written once gets written once. By the way, GetType() returns a value from an enum describing what sort of node the struct is. This isn't normally something we'll need, but for certain optimizations, including the tail-call optimization technique required for all Scheme implementations, it'll be handy to see if some node has other sub-nodes with a certain type.
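As a sketch of what that enum might look like - the exact set of values lives in Schego's source; only IfNode and the literal node types are actually confirmed by the snippets in this post - something along these lines:

```go
package main

import "fmt"

// AstNodeType tags each AST node with what kind of node it is.
// This particular list is a guess for illustration purposes.
type AstNodeType int

const (
	ProgramNode AstNodeType = iota
	IfNode
	IntNode
	FloatNode
	StringNode
	BoolNode
)

func main() {
	// each node type is just a cheap integer tag, so comparing GetType()
	// results during optimization passes costs next to nothing
	fmt.Println(IfNode == StringNode)
}
```

Since the tags are plain integers, scanning a subtree for, say, tail calls boils down to integer comparisons rather than type assertions.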
For the actual implementation of one of the nodes, here's the current implementation of if expressions:

type IfExp struct {
 SExp
}

func NewIfExp(cond AstNode, onTrue AstNode, onFalse AstNode) *IfExp {
 node := new(IfExp)
 node.AddSubNode(cond)
 node.AddSubNode(onTrue)
 node.AddSubNode(onFalse)
 return node
}

func (i IfExp) GetType() AstNodeType {
 return IfNode
}

func (i IfExp) DebugString() string {
 return "IfExp(" + i.subNodes[0].DebugString() + ", " + i.subNodes[1].DebugString() + ", " + i.subNodes[2].DebugString() + ")"
}

Nothing too out of the ordinary here. But what about literals?

type StringLiteral struct {
 SExp
 Value string
}

func NewStringLiteral(value string) *StringLiteral {
 node := new(StringLiteral)
 node.Value = value
 return node
}

func (s StringLiteral) GetType() AstNodeType {
 return StringNode
}

func (s StringLiteral) DebugString() string {
 return "\"" + s.Value + "\""
}

You might think that there's really no need for a string literal - or any other sort of literal, really - to have the ability to contain sub-nodes, an ability brought in by the SExp struct. And you'd be right; this is something of a leaky abstraction we've got going on here. For now, I think it's fine. Any code that messes with the AST after the fact should be calling GetType() in the first place, and due to the way parsing works in Schego, nothing will modify a literal after construction except the exact same code that created it from the original tokens in the first place. I'll also admit that I don't know for sure whether what I describe here is an idiomatic Go solution to the problem, but it seemed fairly intuitive to me. If there's something more elegant out there, do let me know - I'm all in favor of doing no more work than necessary!
On a side note, structuring our data like this lets us write our tests fairly easily. All we have to do is compare the AST our parser made with one made ourselves in the test, like this:

func TestParseSingleExp(t *testing.T) {
 tokens := LexExp("(+ 5 3)")
 program := ParseTokens(tokens)
 expectedProgram := NewProgram(NewAddExp(NewIntLiteral(5), NewIntLiteral(3)))
 checkProgram(program, expectedProgram, t)
}

But this article was about parsing, right? How do we actually do the parsing bit? We've covered the data structures behind the process, but haven't detailed the parsing algorithm itself. Schego uses what is known as a "recursive descent" parser; essentially, we define our "grammar" - the set of rules that describes our language - recursively. Take the if expression, for instance. We might say it's defined by the word "if," followed by a conditional expression, followed by two more expressions that will be evaluated if the conditional evaluates to true or false.
What do we define as an expression? It could be a lot of different things - a literal, a math formula, a function, another if expression (!), lots of stuff. What's important is that when we're constructing an if expression, we ask the parser to run its general expression-parsing function for the if's expressions. As long as we handle the base cases (literals/identifiers) properly, we can throw all sorts of wonky expressions at our if-parsing routine and it'll handle it just fine due to the recursion.
Additionally, the resultant code is fairly reusable inside the parser itself. If we define a function to parse, say, a list literal, we can simply call that function any time we expect a list literal, and it handles all the nitty-gritty details of validation and list-literal-specific parsing for us. Pretty much a win-win no matter how you look at it. But enough justification. What does Schego's parsing code look like?

// ParseTokens takes tokens and returns an AST (Abstract Syntax Tree) representation
func ParseTokens(tokens []*Token) *Program {
 program := NewProgram()
 currentIndex := 0
 for len(tokens)-1 >= currentIndex {
  node, _ := parseExpression(tokens, &currentIndex)
  program.AddSubNode(node)
 }
 return program
}

// accept checks to see if the current token matches a given token type, and advances if so
func accept(tokens []*Token, expectedType TokenType, currentIndex *int) bool {
 if tokens[*currentIndex].Type == expectedType {
  *currentIndex++
  return true
 }
 return false
}

// grabAccepted returns the token just before current, useful for grabbing the value of an accepted token
func grabAccepted(tokens []*Token, currentIndex *int) *Token {
 return tokens[*currentIndex-1]
}

// expect returns an error if the current token doesn't match the given type
func expect(tokens []*Token, expectedType TokenType, currentIndex *int) error {
 if len(tokens)-1 < *currentIndex {
  return errors.New("Unexpected EOF")
 } else if tokens[*currentIndex].Type != expectedType {
  return errors.New("Unexpected token")
 }
 return nil
}

func parseExpression(tokens []*Token, currentIndex *int) (AstNode, error) {
 // try literals first
 if accept(tokens, TokenIntLiteral, currentIndex) {
  literal := grabAccepted(tokens, currentIndex)
  return NewIntLiteral(bufferToInt(literal.Value)), nil
 } else if accept(tokens, TokenFloatLiteral, currentIndex) {
  literal := grabAccepted(tokens, currentIndex)
  return NewFloatLiteral(bufferToFloat(literal.Value)), nil
 } else if accept(tokens, TokenStringLiteral, currentIndex) {
  literal := grabAccepted(tokens, currentIndex)
  return NewStringLiteral(literal.Value.String()), nil
 } else if accept(tokens, TokenBoolLiteral, currentIndex) {
  literal := grabAccepted(tokens, currentIndex)
  return NewBoolLiteral(literal.Value.Bytes()[0] == 1), nil
 }
 // not a literal, attempt to parse an expression
 lparenError := expect(tokens, TokenLParen, currentIndex)
 if lparenError != nil {
  return nil, lparenError
 }
 // jump past the lparen
 *currentIndex++
 if accept(tokens, TokenOp, currentIndex) {
  // operator-parsing code here
 }
 if accept(tokens, TokenIdent, currentIndex) {
  identToken := grabAccepted(tokens, currentIndex)
  switch identToken.Value.String() {
  case "if":
   // TODO: error-handling here (and throughout the parser!)
   cond, _ := parseExpression(tokens, currentIndex)
   ifTrue, _ := parseExpression(tokens, currentIndex)
   ifFalse, _ := parseExpression(tokens, currentIndex)
   ifNode := NewIfExp(cond, ifTrue, ifFalse)
   expError := closeExp(tokens, currentIndex)
   if expError != nil {
    return nil, expError
   }
   return ifNode, nil
  }
 }
 // no matches?
 return nil, errors.New("Unexpected token")
}

// convenience function to ensure an expression is properly closed
func closeExp(tokens []*Token, currentIndex *int) error {
 rparenError := expect(tokens, TokenRParen, currentIndex)
 if rparenError != nil {
  return rparenError
 }
 *currentIndex += 1
 return nil
}

func bufferToInt(buffer bytes.Buffer) int64 {
 num, _ := binary.Varint(buffer.Bytes())
 return num
}

func bufferToFloat(buffer bytes.Buffer) float64 {
 bits := binary.LittleEndian.Uint64(buffer.Bytes())
 return math.Float64frombits(bits)
}

I've gutted some case-specific stuff like the details of parsing operators in an attempt to get the point across as succinctly as possible, but the overarching algorithm is still there. You can see the recursive calls to parseExpression present in the if-parsing code, and the literal-parsing code that forms the base cases. This basic setup is enough to get this tricky-looking test to pass:

func TestIfExp(t *testing.T) {
 tokens := LexExp("(if (> 6 5) \"true\" \"false\")")
 program := ParseTokens(tokens)
 expectedProgram := NewProgram(NewIfExp(NewGtExp(NewIntLiteral(6), NewIntLiteral(5)), NewStringLiteral("true"), NewStringLiteral("false")))
 checkProgram(program, expectedProgram, t)
}

All the if-parsing code knows is that an if node takes three expressions. It doesn't know anything about what constitutes a valid expression, or how to parse one; it leaves that up to the parseExpression() function, and it all Just Works.

As always, the code is up on GitHub if you want to take a look. We'll be looking at stack-based virtual machines next time, so we can actually do something with the AST we've generated. As I alluded to earlier, at some point in the (near? distant?) future we'll also take a look at optimization, which is something I haven't really found in any similar how-to-make-a-language series. So that'll be something to look forward to.

Thursday, August 24, 2017

Schego Part 1, or Why Implementing Scheme in Go Isn't All That Crazy

So for anyone who has been following me on GitHub closely, this article is a week or two late - apologies for that, first of all. I've been busy getting back up to speed with college and such. But on to the main event.

So, you may be wondering, what exactly is a Schego? It's my current baby, my main pet project, the first on the list of bullet points that I'd point to when people ask about what I've done. It's an R7RS Scheme implementation written in the Go programming language, with some nifty technical features planned for it down the road.

I imagine the first thing you'd ask is: why implement yet another Scheme interpreter, let alone one in Go of all languages? The answer is pretty simple and self-centered - I've been planning on learning Scheme since at least 2015, and I've taken an interest in Go as of late. Implementing a language won't familiarize you with its 3rd-party libraries, but it will give you a reasonably in-depth look at how the language's syntax is built, along with its various ins and outs. Even for a minimalist language such as Scheme, building an interpreter is no small undertaking, so the project stands to give me some reasonable experience in Go as well. Additionally, most of the relevant techniques and materials required to undertake such a project are usually taught in senior-level classes, so it would look fairly good on a sophomore's resume - at least if I did it well.

As far as differentiating it from the other Scheme implementations out there, I have two main features in mind. The first is R7RS compatibility. This is the newest version of the official Scheme specification, ratified only about 4 years ago as of this writing. Only about seven or eight other Scheme interpreters currently implement the full R7RS spec to my knowledge, so in the Scheme world, that's reasonably rare air.

The second feature is a much more notable one, even though it arguably requires less work: software transactional memory, or STM for short. It's a language implementation-level replacement for the lock-based multithreading paradigm that's usually implemented at the OS level, unless we're talking about lightweight threads that are often cooperatively scheduled. As the name might imply, it works very similarly to the transactions you might find operating in a database or at a bank: critical stretches of code are marked as atomic and are run without actually committing any of their memory writes to RAM. At the end of the block, the first atomic block to write to a shared space of memory "wins," and its changes are committed. The rest are "rolled back" and made to run again. There's a little bit more to it than that, but that's the gist of it, and what will eventually end up in Schego. Unless someone with more time and/or motivation sees this and decides to one-up me, I think I'll have the first ever STM-capable Scheme implementation, too. So that's nice.

On top of doing all this, I've planned on writing a blog post every once in a while detailing the design decisions I've made in Schego, both as a documentation/teaching tool and as a justification method - if I can't defend it in writing, it needs to be reworked. At the time of this post, the only major Schego subsystem that's currently built is the lexer, and it's almost (but not quite) finished. For the uninitiated, a lexer is a system that takes in a stream of characters and turns it into a list of tokens, which are simply small bits of data that other code later on can use. For instance, a function or variable name would be a token, a mathematical operator would be a token, and so on and so forth. For the code that needs to figure out how to actually run our original Scheme program, it's much easier to think in terms of tokens than raw characters. So let's get on with that lexer, shall we?

I think usability is important. It's better to have software that's easy to use for the end-user and tough to implement than the other way around. Even if your end-user is another programmer - if you're developing a library, for instance - or even if your end-user is yourself, usability still matters. There's no easier way to create good library-level usability than to write an interface you'd want to use yourself, which in turn is done by writing the unit tests for your code before any underlying code has been written. Test-driven development has some other advantages (such as having reasonable code coverage if you genuinely do write all your tests before writing the code that makes them succeed), but I'm mostly concerned with this one facet. So how would I want a lexer to give me tokens back?
// test lexing a single s-expression
func TestLexSingleExp(t *testing.T) {
 tokens := LexExp("(abc def ghi)")
 expectedTokens := []*Token{
  NewTokenString(TokenLParen, "("),
  NewTokenString(TokenIdent, "abc"),
  NewTokenString(TokenIdent, "def"),
  NewTokenString(TokenIdent, "ghi"),
  NewTokenString(TokenRParen, ")")}
 checkTokens(tokens, expectedTokens, t)
}
This is what I came up with - the first unit test I wrote for Schego. It's really very simplistic - a LexExp function that takes a string to be lexed for tokens, and then returns a list of Token pointers containing their type and value. A single Token looks like this, by the way:
type TokenType int

const (
 TokenNone TokenType = iota
 TokenLParen
 TokenRParen
 TokenIdent
 TokenIntLiteral
 TokenFloatLiteral
 // ...and so on for the other token types
)

type Token struct {
 Type  TokenType
 Value bytes.Buffer
}
Pretty straightforward, right? I chose a bytes.Buffer to hold the token's value since, to the best of my knowledge, there is no easier method for storing a generic bunch of bytes of uncertain type in Go. It could be my inexperience with the language showing, but this is the best method I've found.
So how does LexExp work?
// LexExp lexes an input string into Token objects. There are no possible user-facing
// errors from this process.
func LexExp(input string) []*Token {
 var tokens []*Token
 // accumulation variables for multi-character tokens such as idents and literals
 accumulating := false
 var accumulatingType TokenType
 var accumulatorBuffer bytes.Buffer
 for _, glyphRune := range input {
  glyph := string(glyphRune)
  // lparen
  if glyph == "(" {
   if accumulating == true {
    flushAccumulator(&accumulatingType, &accumulatorBuffer, &tokens)
    accumulating = false
   }
   tokens = append(tokens, NewTokenString(TokenLParen, glyph))
   // rparen
  } else if glyph == ")" {
   if accumulating == true {
    flushAccumulator(&accumulatingType, &accumulatorBuffer, &tokens)
    accumulating = false
   }
   tokens = append(tokens, NewTokenString(TokenRParen, glyph))
   // ident
  } else if unicode.IsLetter(glyphRune) {
   // were we building a number literal beforehand?
   if accumulating == true && accumulatingType != TokenIdent {
    flushAccumulator(&accumulatingType, &accumulatorBuffer, &tokens)
   }
   accumulating = true
   accumulatingType = TokenIdent
   accumulatorBuffer.WriteString(glyph)
   // we're not sure what this character is, let the parser deal with it
  } else {
   tokens = append(tokens, NewTokenString(TokenChar, glyph))
  }
 }
 // corner case if the input string ends while we're still accumulating -
 // should never happen in proper Scheme, but still...
 if accumulating == true {
  tokens = append(tokens, NewTokenRaw(accumulatingType, accumulatorBuffer))
  accumulating = false
 }
 return tokens
}
If this isn't completely valid Go code, please forgive me - it's a rough recreation of what the original function looked like after implementing the first test case. Let's go over its internals all the same. You'll notice that an extended if/else is used instead of a switch block. Especially for those of you who have read Lua's source code (a delightful read, by the way - it's a very clear and well-constructed codebase), you might be wondering why I went with an if/else and not a switch. The main reason is that R7RS is a little more complicated to lex - not parse! - than Lua, in my opinion. There's stuff like Unicode identifiers that really cannot be easily handled in a switch statement. I will admit that a Lua-style switch/case block with a default case to take care of the situations the switch couldn't is worth a look, though; it's very possible I'll change LexExp over at a later date.
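For the curious, the switch-with-default shape I'm describing might look something like this toy sketch - not Schego code, just a made-up classify helper: single-character tokens get their own cases, and the default case mops up everything the switch can't express directly, like Unicode character classes.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify is a toy stand-in for the lexer's dispatch logic.
func classify(glyphRune rune) string {
	switch glyphRune {
	case '(':
		return "lparen"
	case ')':
		return "rparen"
	default:
		// the stuff a plain switch can't handle, like Unicode letters
		if unicode.IsLetter(glyphRune) {
			return "ident"
		}
		return "char"
	}
}

func main() {
	fmt.Println(classify('('), classify('λ'), classify('!'))
}
```

Note that 'λ' lands in the ident bucket via unicode.IsLetter - exactly the kind of case that forces the default branch to do real work.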

For multi-character tokens - identifiers, numerical/string literals, and so on - we have an accumulator buffer. Essentially, if we recognize that we've started lexing what could end up being a multi-character token, we write it into the accumulator buffer instead of creating a token immediately. We then go on about our merry way accumulating (hence the name) characters in the accumulator buffer up until we find a character that cannot belong to a multi-character token, like a parenthesis. When that happens, we "flush" the accumulator, adding a new token to the token list using the accumulator buffer's contents as its value, before continuing to lex our non-multi-character token.
Lua does things a little differently. If it encounters a character that's (potentially) the start of a multi-character token, it lets each individual lexing function take care of proceeding further on down the line until it finds a suitable place to stop. For example, this is how Lua 5.3 handles numbers:
/*
** this function is quite liberal in what it accepts, as 'luaO_str2num'
** will reject ill-formed numerals.
*/
static int read_numeral (LexState *ls, SemInfo *seminfo) {
  TValue obj;
  const char *expo = "Ee";
  int first = ls->current;
  lua_assert(lisdigit(ls->current));
  save_and_next(ls);
  if (first == '0' && check_next2(ls, "xX"))  /* hexadecimal? */
    expo = "Pp";
  for (;;) {
    if (check_next2(ls, expo))  /* exponent part? */
      check_next2(ls, "-+");  /* optional exponent sign */
    if (lisxdigit(ls->current))
      save_and_next(ls);
    else if (ls->current == '.')
      save_and_next(ls);
    else break;
  }
  save(ls, '\0');
  if (luaO_str2num(luaZ_buffer(ls->buff), &obj) == 0)  /* format error? */
    lexerror(ls, "malformed number", TK_FLT);
  if (ttisinteger(&obj)) {
    seminfo->i = ivalue(&obj);
    return TK_INT;
  }
  else {
    seminfo->r = fltvalue(&obj);
    return TK_FLT;
  }
}
See the loop there? For comparison, this is what Schego's number-lexing code looks like, which admittedly doesn't handle things like exponents:
         } else if glyph == "." {
   // . is a valid character in an ident - add it to the accumulator
   // if we were building an ident
   if accumulating == true && accumulatingType == TokenIdent {
    // we can't start an ident with . - are we building a floating point literal?
   } else if chr := peek(input, index); !unicode.IsSpace(chr) && unicode.IsNumber(chr) {
    accumulating = true
    accumulatingType = TokenFloatLiteral
    // there's situations where a standalone . is valid
   } else {
    tokens = append(tokens, NewTokenString(TokenDot, glyph))
 // number literal
  } else if unicode.IsNumber(glyphRune) {
   if accumulating == true && accumulatingType == TokenIdent {
    flushAccumulator(&accumulatingType, &accumulatorBuffer, &tokens)
   accumulating = true
   // only declare that we are accumulating an int if we didn't see a . already
   if accumulatingType != TokenFloatLiteral {
    accumulatingType = TokenIntLiteral
   // we're not sure what this character is, let the parser deal with it
Something you'll notice comparing the two - or something you might've figured out from reading the leading comment on LexExp - is that I intend on not having any errors occur during the lexing process. I think malformed code is easier to catch during parsing, and that the resulting error messages are still informative enough that the culprit has enough info to correct their mistake. Is this actually the case? This is simply me theorycrafting, so we'll have to wait and see.
But getting back to numbers and lexing numbers, personally I feel my method leads to less overall boilerplate, though you do have to check the accumulator essentially every time you grab a new token. You also need functions to manipulate the accumulator, which look like this:
// bufferStringToNum takes an input buffer and converts it from a string of
// character bytes to a float/int
func bufferStringToNum(tokenType TokenType, inputBuffer bytes.Buffer) *bytes.Buffer {
 bufferString := inputBuffer.String()
 var byteBuffer [binary.MaxVarintLen64]byte
 if tokenType == TokenFloatLiteral {
  num, _ := strconv.ParseFloat(bufferString, 64)
  binary.LittleEndian.PutUint64(byteBuffer[:], math.Float64bits(num))
 } else {
  num, _ := strconv.ParseInt(bufferString, 10, 64)
  binary.PutVarint(byteBuffer[:], num)
 }
 returnBuffer := bytes.NewBuffer(byteBuffer[:])
 return returnBuffer
}

// flushAccumulator empties the contents of the given Buffer into a new Token
// and resets it and the accumulator token type. A convenience function for LexExp.
func flushAccumulator(
 accumulatorType *TokenType,
 accumulatorBuffer *bytes.Buffer,
 tokenBuffer *[]*Token) {
 if *accumulatorType == TokenFloatLiteral || *accumulatorType == TokenIntLiteral {
  convertedBuffer := bufferStringToNum(*accumulatorType, *accumulatorBuffer)
  *tokenBuffer = append(*tokenBuffer, NewTokenRaw(*accumulatorType, *convertedBuffer))
 } else {
  *tokenBuffer = append(*tokenBuffer, NewTokenRaw(*accumulatorType, *accumulatorBuffer))
 }
 accumulatorBuffer.Reset()
 *accumulatorType = TokenNone
}
Not that bad though, right? Most of this code only needs to be written once; the checks for the accumulator are the only stretches that have to be repeated throughout LexExp.

That's all for this first installment. The actual lexer is a bit longer, handling stuff like operators among other things, but the basic accumulator-based framework is still there. If you want to see what the lexer currently looks like, the code is up on GitHub, along with the complete lexer test suite. Next time we'll be taking a look at parsing and a little bit of parser theory!

Tuesday, August 23, 2016

An end, and a beginning

How lucky I am to have something that makes saying goodbye so hard.
If you've been paying attention to my blog, you know I don't delve into personal stuff often. Or at all, really. This is a reasonably big time in my life, so I thought I'd break with "tradition" - as much as a barely-active 5-year-old blog can have one, at least - and talk a little bit about myself.

For starters, I'm now a student of Computer Science at the University of Texas at Dallas. As I type this, the second day of class has come to an end, and I'm sitting in an empty residence hall lobby late at night; the only sounds I can hear are the air conditioning, my fingers on the keyboard, and the celebratory welcome banners brushing against the walls in time with the A/C. I'm still suffering from the stubborn vestiges of a cold I came down with over the past weekend that also took out several of my newfound friends and colleagues. Additionally, the internet has been out in my dorm room since I moved in, so I've had to use the lobby Wi-Fi for email and the like instead. Regardless, college life has been fun so far. Plenty of friends to hang out with, lots of campus events to go to, friendly faculty to network with - you get the idea. Contrary to popular opinion, there's more to life than what's on the internet. I've barely noticed I don't have a dorm connection except when I have to do something I can't do from my phone.

From a high school student's limited perspective, college is an interesting place. You're given enough freedom that you're basically in a loosely-padded training environment for being an adult. You're given opportunities to seriously succeed, and to seriously fail, both by your own hands. You're given the freedom to stay up at 2AM to go to Walmart to get milk, or to sit in a lobby blogging, if you want. You're free to make your own choices, and live with the consequences; there are relatively few instances where someone is looking over your shoulder, telling you what you should and shouldn't do. For someone who has potentially lived their entire life with someone else at the wheel, that kind of freedom can be a very scary feeling, and a daunting adjustment. But enough about that.

For someone who lived at home with family their whole life - like me, for example - being away from family and old friends can be a bitter pill to swallow. For almost 2 decades, I grew used to being able to see parents and siblings when I got up, to them being there when I went to bed. I grew used to having conversational partners in my siblings that I could engage whenever I wanted, who held interests similar to mine. Being away from people that you've built such powerful connections with can be a difficult ordeal, indeed. But, it's something you have to learn to cope with. You call friends and family as often as you can, and visit when possible. You find new friends that you can trust and open up to. Maybe even discover that you have family members living nearby and visit them.
Eventually, you might lose these people too. If I get a Masters out of state, and decide not to move back to the Dallas/Richardson area, this whole process of loss will repeat. And I'll start from square 0, again. I wouldn't blame anyone in my shoes for not feeling a little depressed over the whole situation, for wondering whether or not it was worth it to build all those precious connections in the first place, only to painfully tear them all down again.
Even so, I know the feelings I had at the time were real. I know that the memories of elated moments with friends and family were genuine, not fake; that they were caused by me being sincerely happy to be with them in that instant. And those memories - those fleeting, ephemeral instants in time - stick with you forever, regardless of how long your friendships last. And nobody can take them away from you. I don't want to stop making them just because I'm afraid the people involved might eventually leave me. It's a common occurrence, with an element of certainty to it: people drift away over time, and finally, die. But until the day that you yourself leave this earth, those memories will faithfully serve as a living testament that the people you cherished so dearly ever existed. And I think that's a beautiful thing.

So here's to making many more memories, with the many people I've met and will meet. To my parents, thank you. To my dear old group of friends, goodbye, I wish you all the best. And to all my fellow incoming freshmen: Congratulations.
Life is a continuous cycle of losses, you see. But in order to lose, you must first gain. And the greater the joy at your gain, the greater the sorrow will be at your loss. But as long as we have the memories of those happy times - indeed, because we have those memories - people are able to go on. 

Thursday, May 19, 2016

FuzzBuzz and the Art of Programming

Programming: It's as much an art as it is a science. The general distinction between the two lies in the realm of subjectivity - in science, while "correctness" may still be debated, nobody is under the illusion that there is more than one "correct" solution. Art, on the other hand, has no single "correct solution," though at the same time, the merits of competing solutions are generally weighed against one another, debated, and so on.

Programming is a mix of both: There are certainly "solutions" - algorithms, programs, etc. - that can be objectively proven without a shadow of a doubt to be ineffective or outright incorrect. There is also the possibility of an algorithm that solves the problem, but is objectively inferior to another, such as an algorithm that runs in O(n) time versus one that runs in O(1) time. On the other hand, there is a fair amount of subjective nuance involved in arriving at a correct solution. Design methodologies, languages, and APIs all come and go in a technological blink of an eye, and each will have its pros and cons, its proponents and its skeptics.
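As a concrete illustration of that O(n)-versus-O(1) distinction - my own invented example, not from the article, and written in modern Python 3 rather than the Python 2 used below - consider membership testing:

```python
# Two objectively correct ways to answer "is x in this collection?",
# one objectively slower: scanning a list is O(n) per lookup, while
# a set lookup is O(1) on average thanks to hashing.
items_list = list(range(100000))
items_set = set(items_list)

def contains_list(x):
    return x in items_list  # walks the list element by element

def contains_set(x):
    return x in items_set   # hashes x and jumps straight to its bucket

# Both give identical answers; only the running time differs,
# and the gap widens as the collection grows.
print(contains_list(99999), contains_set(99999))
```

Both functions are "correct" in the scientific sense; the choice between them is only a debate about efficiency, not about the answer.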

Nobody - sane, at least - involved in all this is under the impression that there is One True Way to do things. You could lock several seasoned programmers up in separate rooms, all of whom subscribe to the exact same design methodologies and use the exact same programming languages, ask them to design a program, and they would all likely emerge with completely separate, and likely equally or near-equally satisfactory, solutions. So, without further ado, let's talk about a simple programming problem, and some possible solutions to it. At the end, we'll briefly discuss which of these solutions someone might actually end up using.

The problem in question is one I've named FuzzBuzz, which was definitely not a reference to an overused programming interview question. To solve FuzzBuzz, one must write a program that considers the numbers from 1 to 100, inclusive. For each number, if the number in question is even, the program must print "Fuzz." If the number is odd, the program must print "Buzz." If the number is a multiple of 3, the program should ignore its even or odd nature and print "FuzzBuzz" instead.

This is an intentionally simplified problem, both to make it easy to come up with multiple solutions, and to make the solutions themselves clearer. By the time you finish reading this sentence, you'll probably have thought of at least one way to solve this problem, but let me go over a couple first.

First up is probably the one that came to everyone else's mind, just a simple restating of the problem's definition in code.

for x in xrange(1, 101): # range/xrange stops at the integer before the 2nd argument
    if x % 3 == 0:
        print "FuzzBuzz"
    elif x % 2 == 0:
        print "Fuzz"
    else:
        print "Buzz"

Not really a whole lot to talk about with this one; like I said earlier, it's probably the first one you thought up yourself - it was the first one I came up with, as well. Honestly, if this were a real-world problem, most people would (and probably should) stop here. But innovation never sprung from a contented mind, right?

So, for our next solution, let's limit ourselves in some way. The modulo operator is pretty central to the prior solution, so let's take that out next.

def is_multiple_of(number, mul):
    """Return True if number is multiple of mul, False otherwise.
    Okay, so here's how this works: Normally, Python - and most
    other programming languages - chops off the decimal part of the quotient
    if both terms are integers, i.e, integer division is performed. However,
    if one of the terms is a float, then the decimal part will be kept.
    Any number plus a decimal part is going to be greater than simply the number itself,
    so we compare the two. If number is in fact a multiple of mul, then there will be
    no decimal part, and both sides of the expression will be equal."""
    if float(number) / mul == number / mul:
        return True
    else:
        return False

def multiple_of_three(number):
    return is_multiple_of(number, 3)

def multiple_of_two(number):
    return is_multiple_of(number, 2)

for x in xrange(1, 101):
    if multiple_of_three(x):
        print "FuzzBuzz"
    elif multiple_of_two(x):
        print "Fuzz"
    else:
        print "Buzz"

Still pretty straightforward. Not having access to the modulo operator means we had to basically reinvent one ourselves, though. But we could go further still, couldn't we? How about no floating-point math?

def is_multiple_of(number, mul):
    """Return True if number is multiple of mul, False otherwise.
    So again we're exploiting the properties of integer division.
    For any divisor, every run of consecutive dividends that shares
    the exact same quotient under integer division begins at a
    multiple of that divisor. All we have to do is check that number
    is the first of its run. We do that by seeing if the quotient changes
    when we subtract 1. If so, number starts its run (i.e, number - 1
    belongs to the previous run), so return True.
    Otherwise, number falls somewhere later in its run, and isn't
    a multiple, so return False."""
    if number / mul > (number - 1) / mul:
        return True
    else:
        return False

def multiple_of_three(number):
    return is_multiple_of(number, 3)

def multiple_of_two(number):
    return is_multiple_of(number, 2)

for x in xrange(1, 101):
    if multiple_of_three(x):
        print "FuzzBuzz"
    elif multiple_of_two(x):
        print "Fuzz"
    else:
        print "Buzz"

Now things are getting ever-so-slightly hairy. But let's get dangerous now. How about no Boolean logic whatsoever?

from collections import deque
sequence = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])

def next_in_sequence():
    """Return next string in FuzzBuzz sequence.
    The FuzzBuzz problem follows a repeating pattern:
        * Odd
        * Even
        * Multiple of 3
        * Even
        * Odd
        * Multiple of 3
    All we have to do is keep track of where we are in the sequence.
    To accomplish this, we use a deque and a little bit of pushing and popping
    to keep from having to either pre-generate the entire sequence beforehand,
    or from having to use an iterator."""
    next = sequence.popleft()
    sequence.append(next)  # push it back so the pattern repeats
    return next

for x in xrange(1, 101):
    print next_in_sequence()

Alright, now we're talking. Admittedly, you probably saw this one coming if you were running the programs alongside reading this article - once you recognize the pattern, this solution is pretty obvious. But let's make one final push: No modulo operator, no math, no Boolean logic, and no loops.

from collections import deque
import sys

# Python allows the user to manually set the recursion limit.
# So, we exploit this by setting it to roughly the number of
# loop iterations we'd execute if we were using a loop.
# Eventually, we'll exceed this number, and Python will throw
# a RuntimeError, essentially accomplishing the same thing as
# a regular for/while loop.
sys.setrecursionlimit(101)

sequence = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])

def next_in_sequence():
    """Print next string in FuzzBuzz sequence.
    This is identical to the previous deque solution, save for
    printing out the next string instead of returning it, and the recursion bit,
    so see that solution for an explanation of what's going on here."""
    next = sequence.popleft()
    print next
    sequence.append(next)  # push it back so the pattern repeats
    next_in_sequence()

try:
    next_in_sequence()
except RuntimeError:
    # Swallow the eventual exception due to recursion taking one step too many.
    pass
I feel now is a good time to stop and discuss the various solutions, now that we have an ample amount. Like I mentioned earlier on, if this were a real-world problem, I imagine most people using the first solution. It's clear, concise, and gets its point across without needing much in the way of hand-holding. It's easy to extend if we need to tackle additional multiples, as well.

Most of the rest of the solutions are pretty silly if we examine them without the restrictions we came up with. The modulo operator is a pretty useful tool for this problem, so it makes no sense to try to work around it, or reimplement it with our own code. I could see an argument being made for the non-recursive sequence solution - some might consider it more "elegant" than the original solution, though obviously that's up for debate.

Some might also think the sequence version is faster, considering the O(1) performance time of deque.popleft() and deque.append() (collections.deque is implemented as a doubly-linked list under the hood), but at least on my machine, the initial solution is faster when testing 10,000,000 numbers, completing in about 3.1 seconds versus 3.3 seconds for the sequence version. After some fiddling around, I was able to produce a variant on the sequence solution that ran in about 2.9 seconds on the same set:

from collections import deque
sequence = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])

# Binding oft-used functions to local scope results in a speed increase
# due to decreased variable lookup time
popleft = sequence.popleft
append = sequence.append

for x in xrange(1, 101):
    # Moving the function's code to the loop body removes the function call,
    # which actually increases performance slightly
    next = popleft()
    print next
    append(next)  # push it back so the pattern repeats

By this point though, we've lost a fair amount of readability, which was one of the main reasons to use the sequence solution. On the upside, we have a reasonable increase in speed, which might be useful in a performance-critical application. On a side note, the performance of the other solutions is pretty abysmal in comparison; the modulo-less solution ran in about 8.5 seconds, whereas the float-less solution ran in about 6 seconds. As for the recursion-based solution, it segfaulted on my machine before it could actually complete the 10,000,000-iteration loop.
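The article doesn't show the harness behind those timings; if you want to reproduce the comparison yourself, a minimal sketch along these lines might do - written in modern Python 3 (the article's blocks are Python 2), with function names of my own invention, and collecting output into a list rather than printing so I/O doesn't dominate the measurement:

```python
import timeit
from collections import deque

def modulo_version(n):
    """The first solution, ported to Python 3, collecting instead of printing."""
    out = []
    for x in range(1, n + 1):
        if x % 3 == 0:
            out.append("FuzzBuzz")
        elif x % 2 == 0:
            out.append("Fuzz")
        else:
            out.append("Buzz")
    return out

def sequence_version(n):
    """The deque-based solution, ported the same way."""
    seq = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])
    out = []
    for x in range(1, n + 1):
        s = seq.popleft()
        seq.append(s)  # recycle so the pattern repeats
        out.append(s)
    return out

# Sanity check: both versions agree before we bother timing them.
assert modulo_version(100) == sequence_version(100)

# Time each over a modest run; absolute numbers will vary by machine.
for fn in (modulo_version, sequence_version):
    print(fn.__name__, round(timeit.timeit(lambda: fn(10000), number=10), 4))
```

Your machine's numbers will differ from mine, but the relative ordering should be easy to check this way.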

So, to summarize: Programming is about 50% art and 50% science; there's usually more than one demonstrably correct solution to the problem, and on top of that, there's often some fair room for debate about which of the solutions one should proceed with. We presented a simple programming problem, came up with some possible solutions for it, and discussed their pros and cons.

As an added bonus, I've set up a Github repository that contains all the solutions in this article, as well as reference output to test your own solutions with. If you come up with a solution you like, regardless of its performance or what language it's written in, send me a pull request, and I'll add it to the repo.

Thursday, September 5, 2013

Ambiguous developer intention in fighting games

This is a topic that really grinds my gears: Ambiguous intention from developers in fighting games. I believe this sort of thing is very harmful to fighting games, and I'll be using this article to explain what in particular I have against ambiguous intention, provide some examples of ambiguous intention in several fighting games, and explain how I think those games would be better off without such things.
First off, let me give a solid, simple definition of ambiguous intention. Basically, it is an instance in a fighting game where a restriction the game's mechanics impose on the players can be avoided via use of one or more mechanics/techniques the game provides. There are several issues with such a thing, in my opinion:
  1. It adds an artificial execution barrier to the game. Anyone playing the game at a competitive level will have to manually perform something that would otherwise (and should) be done automatically or not at all.
  2. It makes the developer's intent unclear - do they want the restriction to be respected or not?
  3. It makes the game seem harder or "deeper" than it actually is.
The first example of ambiguous developer intent that comes to my mind, not to mention one of the most generally-accepted in the fighting game community, is nicknamed "chicken guarding" in the vs. Capcom series. Normally in those games, during a short period after you initiate a dash, regardless of what direction it's in or whether it's in the air or not, you are unable to block. Clearly, the developer's intent is to make dashing punishable if predicted.
However, it is possible to instantly cancel a dash into a jump in most games in the vs. Capcom series, and after you do so, to instantly block. This overrides the previously-implied restriction about punishable dashes. Now, I have absolutely nothing against making vs. Capcom dashes punishable or unpunishable; what I do have a problem with is the ambiguous developer intent: Do the developers want dashes to be punishable or not?
If the developers do want them to be punishable, then disallow blocking even after the dash-canceled jump, or if they do not want them to be punishable, then allow blocking directly during a dash, but by no means should they require the players to go through an unnecessarily-difficult input to accomplish a competitively mandatory technique which may or may not have been intended in the first place!

The second example, and one that I think is particularly egregious, is L-canceling in Super Smash Bros. Melee. L-canceling is a "mechanic" - I am not entirely certain it is deserving of the title - that allows you to cut the period of time your character cannot move or attack following an aerial attack in half by pressing the shield/block button a few frames before you hit the ground.
With the previous example of chicken guarding in addition to the 3 problems with ambiguous developer intent I listed earlier in mind, it should be immediately obvious what the problem with L-canceling is. If the developers want aerial attacks to have x amount of landing lag, then they should leave it at that; if they want aerial attacks to have 0.5x amount of landing lag, then they should do so, but they should not require players to press an extra button for something that could easily be done automatically. Artificial "difficulty" like this makes a fighting game seem more difficult and technical than it really is, and oftentimes hides serious problems behind a fake execution barrier.

The last example I'm going to give is of "kara throwing" in the Street Fighter series. Kara throwing essentially involves using an attack that moves your character forward, but before the part of the attack that actually hurts the opponent is activated, you cancel the attack into a throw, giving the throw more range than it otherwise would have. The window of opportunity to cancel the attack into a throw is usually very small - normally about 1 or 2 frames (1/60th to 1/30th of a second).
Again, it should be obvious what the issue with kara throwing is. If the developers want certain character's throw range to be longer, then they should be longer by default, or if they want throws to be shorter, then they should prevent kara canceling, but they should not require players to perform a difficult input for something which was not only unintended in the first place (kara throwing was originally a glitch in Street Fighter 3, but is intentionally in Street Fighter 4), but which obviously could be solved with a much simpler solution. 

I've seen several arguments stating that mechanics like the ones I've listed are good things, the majority of which basically boil down to the opinion that the more buttons you have to press and the more complicated things are, the better. Personally, I believe the simpler things are, the more time and energy you can focus on the mental aspect of the game, which, in my opinion, is far deeper and more satisfying than any arbitrarily difficult "mechanic" could ever be.
It could also be argued that since no-one really complains about arbitrary difficulty and ambiguous intent, it is not a problem that needs to be solved. I believe there are a number of reasons this is the case, the largest being that practically all fighting games ever made have had some sort of arbitrary difficulty - the recent ones intentionally, the older ones unintentionally - which means that anyone who has played fighting games for any decent amount of time is quite used to this sort of thing by now. So yes, there is some complacency, but I think many fighting game players have merely failed to realize how deleterious ambiguous intent and arbitrary difficulty can really be, since they have had to live with it for as long as they have played fighting games. I think once more players realize how much simpler and easier to play fighting games would be without this sort of thing, the more they will warm up to the idea and let developers know about it.

I should note that I think there is a definite place for this sort of mechanic in video games; intentionally difficult single-player games like Devil May Cry and I Wanna Be The Guy thrive off this sort of thing, and many people play those sorts of games just because of all the arbitrary difficulty - I'm one of them. However, I don't think arbitrary difficulty has a place in fighting games; I'm trying to fight my opponent, not the game, and the easier it is to do whatever I should do in a given situation, the better.
I think I should also make it clear that the presence of an arbitrary mechanic does not automatically ruin a game. I enjoy playing and watching many games that have a number of arbitrary mechanics, including the games I used as examples. All I'm saying is that those games would be even better than they already are without the arbitrarily difficult mechanics - and would likely enjoy a larger userbase, too, since a (somewhat correct) public notion of fighting games is that you have to spend a lot of time practicing arbitrary stuff in training mode before you can become good at them.

And you know what? Eliminating ambiguous developer intent and arbitrary difficulty would go a long way towards eliminating that notion.

Tuesday, October 23, 2012

Super Smash Bros., Combos, and You - an article on a theoretical combo system for SSB

Super Smash Bros. has something of a mixed history as far as combos go. The original SSB64 had combos galore - most of the characters had combos that could take an as-of-yet untouched opponent and KO them without a single chance to dodge or counter - the definition of a "true" combo. Super Smash Bros. Melee significantly reduced the stun the average move does to your opponent, so combos (true combos anyway) became weaker and shorter compared to SSB64.

Now, with the latest release in the SSB series - Super Smash Bros. Brawl - true combos are practically non-existent, meaning the guaranteed damage the average character in the cast can do is typically limited to one or two moves in a row. This may not sound important, but it has several ramifications:
  • Lack of combos means mistakes aren't as costly. In the typical fighting game (SSBM and SSB64 included), 2-5 mistakes might mean a stock (or the game). SSBB, on the other hand, typically lets you get away with more than a dozen mistakes before you lose.
  • The skill ceiling is significantly lowered. SSBB pros can punish mistakes no harder than intermediate players, with the possible exception of what's called "gimping" - KO'ing your opponent with a relatively weak move, usually used away from the stage edge.
  • Comebacks become a lot harder, since you'll typically have to repeatedly use one-off attacks, while only a few attacks (many times just one well-placed one) from your opponent will cost you the game.
What I suggest is to give SSB a combo system/mechanic. I'm not going to describe what I have in mind for the system upfront - rather, I'll evolve it over the course of this article, which I think will lead to a better understanding of how a "true" combo system would significantly help SSB and alleviate the aforementioned issues Brawl has. So let's get started!

The first step is to see what other fighting games use. The combo mechanic you most often see in other games is this thing called canceling. Basically, canceling lets you cancel one move's animation frames into an entirely different move altogether. Depending upon the game, you can only cancel a move if it touches the opponent (regardless of whether or not it does damage), and for the purposes of this article, we'll assume that's the case for our combo system. Additionally, you can typically only cancel regular attacks into heavier regular attacks (think SSB's standard attack/auto combo) or special attacks.

So now we have our base mechanic. How do we apply it to SSB? Let's assume we can cancel a standard attack into a tilt attack or special attack, we can cancel a tilt attack into a smash attack or special attack, and we can cancel a smash attack into a special attack. So now, every character has this basic combo: Standard attack > tilt attack > smash attack > special attack.
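The cancel hierarchy just described is really just a lookup table; here's a purely illustrative sketch (all names are mine, and this is Python 3 pseudocode-made-runnable, not anything from an actual game):

```python
# Hypothetical encoding of the cancel hierarchy described above.
# Keys are the move currently animating; values are the moves it
# may be canceled into (only if the current move touched the opponent).
CANCELS = {
    "standard": {"tilt", "special"},
    "tilt":     {"smash", "special"},
    "smash":    {"special"},
}

def can_cancel(current_move, next_move, touched_opponent):
    """Return True if current_move may be canceled into next_move."""
    if not touched_opponent:
        return False  # whiffed moves can't be canceled in our system
    return next_move in CANCELS.get(current_move, set())

# Each link of the basic combo is a legal cancel "up" the hierarchy;
# canceling back "down" the chain (e.g. smash into tilt) is not.
print(can_cancel("standard", "tilt", True))
print(can_cancel("smash", "tilt", True))
```

Encoding the hierarchy as data like this also hints at how easy it would be to tune: adding or removing a legal cancel is a one-line change to the table.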

This is pretty good already, but this system suffers from the same issue Brawl has: There's little - if any - room for improvement over the simple combo I just mentioned. The skill ceiling issue remains basically untouched. We need to figure out a way to allow creativity within our little combo system. What can we do to fix this problem?

We have two options: We could further extend the boundary of what can or cannot be canceled, or we can come up with another method of extending combos. The first method will simply lengthen our previous combo, while the latter's benefit will be dependent upon how versatile the introduced sub-mechanic is. We'll try the latter approach, since it helps alleviate the skill ceiling issue Brawl suffers from. So without further ado, I give you: The launcher.

Simply put, a launcher is a move that, like the name implies, launches the opponent up into the air. You can also typically cancel the launcher's animation with a jump if the launcher hurts the opponent, and for our combo system, we'll also allow launchers to be canceled into a dash, which should allow for a bit more creativity as well.

We have to integrate launchers with the little cancelling hierarchy we defined earlier now. Let's insert launchers between smash attacks and special attacks, so that our standard combo now looks like: Standard attack > tilt attack > smash attack > launcher + aerial attack or special attack.

It's not much, but this combo system is already shaping up to be relatively open-ended. However, since we've been focusing on adding new features to it, we've been left with a lot of loose ends to tie up:
  • Knockback (not to mention DI) will affect the viability of a combo - i.e, an opponent might be knocked back far enough that even if you cancel, the new move will not connect. Something needs to be done about this.
  • Hitstun - the amount of time the opponent is stunned and unable to do anything after being hit with one of your moves - will probably need to be increased on average, at least relative to Brawl. Otherwise, even if you perfectly cancel a move into another move, your opponent might recover before the second move's active frames activate.
  • It's conceivable that the combo system could possibly be abused to the point of infinite combos. We obviously can't think of all possible combos, so we should put in some sort of mechanic that will prevent combos from becoming excessively long.
Let's tackle knockback first. We obviously can't take knockback completely out of the system - it's too much a part of SSB to remove it now. So I propose that the hitlag system normally reserved for dramatic, powerful attacks be applied to all moves in the game, with about 5-10 frames of hitlag per hit. Those frames will be the "cancel window" during which you can cancel to another move and be guaranteed that it will connect - no knockback will be applied until after the cancel window has passed. Launchers will have set vertical knockback to make comboing easier, similar to Falco's shine in SSBM.

For those not familiar with the hitlag (also known as "freeze frame") system, hitlag is basically a period of time after a move connects with an opponent during which the game is, for all intents and purposes, frozen - neither the attacker nor the defender can move during the hitlag period.

This doesn't address the tricky problem that is DI (Directional Influence), however. It's possible in SSBM and SSBB (SSB64 didn't have DI) to use DI to shift the direction in which you get launched and thus escape combos. Whether or not that's a good thing probably comes down to personal opinion, but DI is here to stay in all likelihood, so I'd prefer to design our theoretical combo system to incorporate DI.

The way I see it, the easiest way to incorporate DI into our combo system is to allow DI only after the cancel window on each hit has passed. This brings up a dilemma: What about multi-hit moves? I want to keep comboing as offense-oriented as possible, and thus have relatively few options - if any - to break out of a combo, so I think having a cancel window after each hit, but no knockback and DI until after the cancel window for the final hit has passed is a good idea. Launchers probably shouldn't be DI-able for similar reasons, so let's assume that we can't DI out of a launcher as well.

Now we only have to deal with infinite combos, something a lot of fighting games don't get right. Even the borderline combo-less SSBB had a few infinites here and there, so we'd better take extra caution with our combo system, since it's a lot easier to chain multiple moves back-to-back. Again, taking a look at what other fighting games typically do to alleviate this, I now introduce: hitstun deterioration.

In a nutshell, hitstun deterioration gradually lowers the amount of hitstun of each successive hit in a combo. Eventually, the hitstun of each hit will be practically 0, and the opponent will be able to easily escape the combo. It's obvious that hitstun deterioration could possibly limit legitimate non-infinite combos, as well as put characters with lots of multi-hit moves at a disadvantage, but it's probably the best defense against infinite combos out there right now.
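As a rough illustration of how hitstun deterioration might be computed - the decay rate and frame counts here are completely invented for the sake of the example, not taken from any actual game:

```python
def hitstun(base_frames, hits_so_far, decay=0.85):
    """Hypothetical hitstun deterioration: each successive hit in a
    combo stuns for a fixed fraction of what it would deal fresh,
    so stun shrinks geometrically as the combo gets longer."""
    return int(base_frames * decay ** hits_so_far)

# A move with 20 frames of base hitstun, landed repeatedly in one
# combo: the stun window shrinks toward zero, at which point the
# defender can escape and the combo naturally ends.
for hit in range(0, 12, 3):
    print("hit #%d: %d frames of stun" % (hit, hitstun(20, hit)))
```

The exact curve (geometric, linear, stepped) is a tuning decision; the point is only that stun trends toward zero as a combo lengthens, which is what makes infinites impossible.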

Let's take a quick review of what our combo system's main mechanics are (this is a good TL;DR paragraph if you skimmed over the other parts of the article):

  • Canceling. We can cancel moves into certain other moves in this order: Standard attack > tilt attack > smash attack > launcher + aerial attack or special attack.
  • Cancel windows. There's a small window of opportunity after each hit of a move (about 5-10 frames), during which you can cancel the current move into another move, as per our canceling mechanic defined earlier. No knockback is applied (and DI is impossible) until after the cancel window is over.
  • Launchers. We can perform a launcher, which is basically a move which launches the opponent into the air with set vertical knockback, to perform an aerial combo. If the launcher connects with the opponent, you can cancel the launcher into a jump or a dash. You cannot DI out of a launcher.
  • Hitstun deterioration. The hitstun of each successive hit in a combo gradually decreases to almost 0, thus effectively preventing infinite combos. Note that this doesn't prevent a zero-to-death combo, just infinite combos.

So there you have it. We evolved a theoretical combo system around SSB's unique mechanics, ironed out some minor kinks, and even introduced a few unique mechanics of our own along the way. It's probably not perfect, and might have some errors I've overlooked while I wrote this article, but it's a simple, solid combo system that should be enjoyable by newbies and professionals alike. So that said, it's time for closing statements. :)

  • Project M, a very popular mod for SSBB, includes a combo system for Lucario very similar to the one described here. The most notable differences are that the cancel window system described here is missing, and Lucario's launcher (his up smash) does not have set knockback, meaning it's better as a standalone kill move, but somewhat hard to use at times as a launcher. Hitstun deterioration is also missing, and all his combos are DI-able as far as I know, but the system is still quite similar to the one in this article.
  • It would be feasible to make a mod for SSBB that implements our theoretical combo system, but although I'm not lacking in programming skills, I'm still quite unfamiliar with modding SSBB in general, unfortunately.
  • SSB4 will be coming out sometime next year (2013), and Masahiro Sakurai (SSB's creator) has stated that the game will be taking a "new direction" compared to previous SSB titles. Namco - who develops the popular fighting game series Tekken - is heading development of SSB4, so who knows, you might see something like this in SSB4.
Thanks for taking the time to read through this giant wall of text!