Java Lexical Structure


Lexical analysis

Lexical analysis is the process of translating a raw Unicode character stream into a sequence of tokens. The tokens are the terminal symbols of the syntactic grammar. A program that performs lexical analysis may be called a lexer, tokenizer, or scanner, though "scanner" is also a term for the first stage of a lexer. In detail, there are three steps, applied in order:

  1. Translate each Unicode escape to the corresponding Unicode character; for example, translate \u0041 to A.
  2. Recognize line terminators, splitting the stream that results from step 1 into input characters and line terminators. This step establishes the line numbers of the source code, so that error messages can report the corresponding line number when you debug your program.
  3. Split the result of step 2 into white space (including line terminators), comments, and tokens. White space and comments are discarded, and only the tokens are kept.
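Step 1 can be sketched in a few lines of Java. This is a simplified illustration, not the compiler's actual implementation: the class name `UnicodeEscapes` and the method `translate` are made up for this example. It follows the rule that a backslash begins a Unicode escape only when it is preceded by an even number of contiguous backslashes, and that any number of `u` characters may follow it before the four hexadecimal digits.

```java
class UnicodeEscapes {
    // Replaces each Unicode escape (a backslash, one or more 'u', four hex digits)
    // with the single UTF-16 code unit it denotes. Hypothetical helper for illustration.
    static String translate(String src) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // length of the backslash run just before position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            // A backslash is eligible only after an even number of backslashes.
            boolean eligible = c == '\\' && backslashes % 2 == 0;
            if (eligible && i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j + 4 > src.length() || !isHex(src, j)) {
                    throw new IllegalArgumentException("malformed Unicode escape at index " + i);
                }
                out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                i = j + 4;
                backslashes = 0; // simplification: the produced character starts a fresh run
                continue;
            }
            backslashes = (c == '\\') ? backslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // True if the four characters starting at index 'from' are all hexadecimal digits.
    private static boolean isHex(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `UnicodeEscapes.translate("\\u0041")` returns `"A"`, while an input with two backslashes before `u0041` is returned unchanged because the second backslash is not eligible to start an escape.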


The token is a central concept in compilers. Java has five kinds of tokens:

  • Identifier
  • Keyword
  • Literal
  • Separator
  • Operator
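To make the five token kinds concrete, here is a minimal tokenizer sketch. It is an assumption-laden toy, not the Java compiler's lexer: the class `MiniLexer`, its `Kind` enum, and the tiny keyword, separator, and operator tables are all invented for this example, and only decimal integer literals and `//` comments are recognized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MiniLexer {
    enum Kind { IDENTIFIER, KEYWORD, LITERAL, SEPARATOR, OPERATOR }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Deliberately tiny subsets of the real tables, for illustration only.
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("int", "if", "else", "return", "while", "class"));
    static final String SEPARATORS = "(){}[];,.";
    // Longer operators come first so that "==" is not split into "=" and "=".
    static final String[] OPERATORS =
            {"==", "!=", "<=", ">=", "&&", "||", "=", "+", "-", "*", "/", "<", ">", "!"};

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }   // white space is discarded
            if (src.startsWith("//", i)) {                       // so are end-of-line comments
                while (i < src.length() && src.charAt(i) != '\n') i++;
                continue;
            }
            int start = i;
            if (Character.isJavaIdentifierStart(c)) {
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                String word = src.substring(start, i);
                tokens.add(new Token(KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER, word));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(Kind.LITERAL, src.substring(start, i)));
            } else if (SEPARATORS.indexOf(c) >= 0) {
                i++;
                tokens.add(new Token(Kind.SEPARATOR, String.valueOf(c)));
            } else {
                String matched = null;
                for (String op : OPERATORS) {
                    if (src.startsWith(op, i)) { matched = op; break; }
                }
                if (matched == null) {
                    throw new IllegalArgumentException("unexpected character at index " + i);
                }
                i += matched.length();
                tokens.add(new Token(Kind.OPERATOR, matched));
            }
        }
        return tokens;
    }
}
```

Calling `MiniLexer.tokenize("int x = 42; // answer")` yields five tokens: a keyword, an identifier, an operator, a literal, and a separator. The white space and the comment leave no trace in the result.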

Tokens are nonterminal symbols of the lexical grammar, whose terminal symbols are characters, but they are the terminal symbols of the syntactic grammar. A parser, which analyzes the syntax of a program, takes the token stream as input and produces an abstract syntax tree (AST) as output.
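The relationship between a token stream and an AST can be illustrated with a toy recursive-descent parser. This sketch assumes a made-up grammar with only numbers, `+`, and `*`, represents tokens as plain strings, and uses hypothetical names (`MiniParser`, `Node`); it has nothing to do with how a real Java parser is organized.

```java
import java.util.List;

class MiniParser {
    // AST node: a leaf holding a number, or an operator with two children.
    static final class Node {
        final String value;
        final Node left;
        final Node right;
        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
        // Prints the tree in prefix form, e.g. (+ 1 2).
        @Override public String toString() {
            return left == null ? value : "(" + value + " " + left + " " + right + ")";
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    private MiniParser(List<String> tokens) { this.tokens = tokens; }

    // Entry point: consumes the whole token list and returns the root of the AST.
    static Node parse(List<String> tokens) {
        MiniParser parser = new MiniParser(tokens);
        Node root = parser.parseExpr();
        if (parser.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token at position " + parser.pos);
        }
        return root;
    }

    // expr := term ('+' term)*
    private Node parseExpr() {
        Node node = parseTerm();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++;
            node = new Node("+", node, parseTerm());
        }
        return node;
    }

    // term := NUMBER ('*' NUMBER)*
    private Node parseTerm() {
        Node node = parseNumber();
        while (pos < tokens.size() && tokens.get(pos).equals("*")) {
            pos++;
            node = new Node("*", node, parseNumber());
        }
        return node;
    }

    private Node parseNumber() {
        if (pos >= tokens.size() || !tokens.get(pos).matches("\\d+")) {
            throw new IllegalArgumentException("expected a number at position " + pos);
        }
        return new Node(tokens.get(pos++), null, null);
    }
}
```

Given the tokens `1`, `+`, `2`, `*`, `3`, the parser builds a tree whose printed form is `(+ 1 (* 2 3))`: the multiplication binds tighter because `parseTerm` is called from inside `parseExpr`.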



