Mind your grammar

Series: Starting from the top
Topic: WTF JavaScript

In Mozilla’s documentation on JavaScript they have a section on lexical grammar, and this is how they summarize it:

The source text of ECMAScript scripts gets scanned from left to right and is converted into a sequence of input elements which are tokens, control characters, line terminators, comments or white space. ECMAScript also defines certain keywords and literals and has rules for automatic insertion of semicolons to end statements.

Let’s unpack the various parts of this.

The source text of ECMAScript scripts

This just means a document (text file) that contains ECMAScript.

[the document] gets scanned from left to right

Just like you are reading the words on this page, and interpreting their meaning.

[the document] is converted into a sequence of input elements

An input element is one of the small pieces the scanner chops your code into before the compiler tries to make sense of it.

// this is an input stream
let result = 1 + 1
// white space separates the tokens, and is itself an input element

…which are tokens, control characters, line terminators, comments or white space.

Let’s define each of the above:

  • Tokens
  • Control characters
  • Line terminators
  • Comments or white space

WTF is a token?

Image of an NYC subway token Photo courtesy of a.pitch

When you write code, for the most part you aren’t using some special character set that is completely unfamiliar to you (unless your native language uses a different alphabet). You are using characters from the Latin alphabet. When you write text that’s meant for another human to read, you use the following things:

  • Letters A-Z
  • Numbers
  • Punctuation marks
  • White space

By putting combinations of letters, white space, numbers and punctuation marks together, you form sentences and paragraphs that other humans can read.

Image of a to do list Readable by another human

In the to-do list image above, you can think of each word as a token. White space separates tokens, just like it separates words.
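To make the word-to-token comparison concrete, here is a rough sketch of a tokenizer. It is not how a real JavaScript engine does it; the tokenize function and its regular expression are made up for this post, and they only understand a tiny subset of the language (names, whole numbers, a few punctuation marks, white space and single line comments).

```javascript
// Hypothetical helper: splits a tiny subset of JavaScript into tokens.
// Characters the pattern does not recognize are silently skipped.
function tokenize(source) {
  // Order matters: white space, then // comments, then names, numbers, punctuation.
  const pattern = /\s+|\/\/[^\n]*|[A-Za-z_$][A-Za-z0-9_$]*|\d+|[=+\-*\/;]/g;
  const tokens = [];
  for (const match of source.matchAll(pattern)) {
    const text = match[0];
    // White space and comments are input elements too, but they are not tokens.
    if (/^\s+$/.test(text) || text.startsWith("//")) continue;
    tokens.push(text);
  }
  return tokens;
}

console.log(tokenize("let result = 1 + 1"));
// [ 'let', 'result', '=', '1', '+', '1' ]
```

Just like the words in the to-do list, each token comes out as its own item and the white space between them is thrown away.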

Control characters

An average person writing code will probably never have to know that control characters even exist, so I will just touch on them briefly here for completeness’ sake.

There are three control characters that the JavaScript interpreter understands when it reads your code. You won’t ever see them when you are writing code; they are invisible:

  • Zero width non-joiner
  • Zero width joiner
  • Byte order mark

A zero width non-joiner is an invisible character placed between two characters to keep them from being joined together, so that you (a human) can read the text.

Image of text with and without zero width non-joiners

A zero width joiner does the opposite of a non-joiner: it brings two characters together so they are displayed in their connected form. This mostly applies to Arabic and Brahmic scripts.

A byte order mark is a mark at the very beginning of your code that tells the compiler the text is Unicode and which byte order it uses.
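If you want to convince yourself that these characters really exist, here is a small sketch that puts them into strings using their Unicode escapes. The constant names and the stripBom helper are made up for this example; they are not part of JavaScript itself.

```javascript
// The three control characters, written as Unicode escapes.
const ZWNJ = "\u200C"; // zero width non-joiner
const ZWJ = "\u200D";  // zero width joiner
const BOM = "\uFEFF";  // byte order mark

// Hypothetical helper: drop a byte order mark from the start of some text.
function stripBom(text) {
  return text.startsWith(BOM) ? text.slice(1) : text;
}

// The extra character takes up no room on screen, but it still counts.
console.log(("a" + ZWNJ + "b").length); // 3

const source = BOM + "let result = 42";
console.log(source.length);           // 16
console.log(stripBom(source).length); // 15
```

The lengths give the hidden characters away even though nothing extra shows up on screen.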

Line terminators

Line terminators are used to help with readability of the JavaScript code you write. An example of a line terminator is a new line.

Reading
like
this
is
totally
fine.

…but it is awkward to read.

// this line of code
let result = 42

// is the same as this one
let
result
=
42

The semicolon is not a line terminator, but it is closely related: it ends a statement.

JavaScript does this weird thing called automatic semicolon insertion. I say it’s weird because you don’t have to put semicolons in your code, yet most code you see uses them anyway, even though the JavaScript compiler will put them in for you.

I don’t particularly like this feature, but who the fuck am I. For what it is worth I use semicolons in my code.
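Here is a small sketch of how automatic semicolon insertion can bite you. The function names withNewline and sameLine are made up for this example. The only difference between them is a line terminator after return.

```javascript
// Hypothetical function names, used only for this illustration.
function withNewline() {
  // A semicolon is inserted right after `return`,
  // so the 42 on the next line is never returned.
  return
  42;
}

function sameLine() {
  return 42;
}

console.log(withNewline()); // undefined
console.log(sameLine());    // 42
```

Writing your own semicolons does not switch this behavior off, but it does make it easier to spot where a statement really ends.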

Writing notes

Writing code can be really difficult, and reading your own or others’ code can be even more difficult. That’s where comments come in. The compiler ignores comments completely; they are only there for the reader’s benefit.

You can write comments as single lines in your code by using two slashes: //

// Here is a single line comment
let result = 42;

You can also write comments that span several lines, using a slash and star to start the comment (/*) and a star and slash to end it (*/).

/* 
Here is a
multiline 
comment 
*/
let result = 42;