Let-s-Build-A-Simple-Interpreter笔记-6

课程文档

原文

翻译

新的翻译

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, MUL, DIV, LPAREN, RPAREN, EOF = (
'INTEGER', 'PLUS', 'MINUS', 'MUL', 'DIV', 'LPAREN', 'RPAREN', 'EOF'
)
class Token:
def __init__(self, type, value):
self.type = type
self.value = value
def __str__(self):
"""String representation of the class instance.
Examples:
Token(INTEGER, 3)
Token(PLUS, '+')
Token(MUL, '*')
"""
return f'Token({self.type}, {repr(self.value)})'
def __repr__(self):
return self.__str__()
class Lexer:
def __init__(self, text):
# client string input, e.g. "4 + 2 * 3 - 6 / 2"
self.text = text
# self.pos is an index into self.text
self.pos = 0
self.current_char = self.text[self.pos]
def error(self):
raise Exception('Lexer error')
def advance(self):
"""Advance the `pos` pointer and set the `current_char` variable."""
self.pos += 1
if self.pos >= len(self.text):
self.current_char = None # Indicates end of input
else:
self.current_char = self.text[self.pos]
def skip_whitespace(self):
while self.current_char is not None and self.current_char.isspace():
self.advance()
def integer(self):
"""Return a (multidigit) integer consumed from the input."""
value = ''
while self.current_char is not None and self.current_char.isdigit():
value += self.current_char
self.advance()
return int(value)
def get_next_token(self):
"""Lexical analyzer (also known as scanner or tokenizer)
This method is responsible for breaking a sentence
apart into tokens. One token at a time.
"""
while self.current_char is not None:
if self.current_char.isspace():
self.skip_whitespace()
continue
elif self.current_char.isdigit():
return Token(INTEGER, self.integer())
elif self.current_char == '+':
self.advance()
return Token(PLUS, '+')
elif self.current_char == '-':
self.advance()
return Token(MINUS, '-')
elif self.current_char == '*':
self.advance()
return Token(MUL, '*')
elif self.current_char == '/':
self.advance()
return Token(DIV, '/')
elif self.current_char == '(':
self.advance()
return Token(LPAREN, '(')
elif self.current_char == ')':
self.advance()
return Token(RPAREN, ')')
else:
self.error()
return Token(EOF, None)
class Interpreter:
def __init__(self, lexer):
self.lexer = lexer
# set current token to the first token taken from the input
self.current_token = self.lexer.get_next_token()
def error(self):
raise Exception('Interpreter error')

def eat(self, token_type):
# compare the current token type with the passed token
# type and if they match then "eat" the current token
# and assign the next token to the self.current_token,
# otherwise raise an exception.
if self.current_token.type == token_type:
self.current_token = self.lexer.get_next_token()
else:
self.error()
def factor(self):
"""factor : INTEGER | LPAREN expr RPAREN"""
token = self.current_token
if token.type == INTEGER:
self.eat(INTEGER)
return token.value
elif self.current_token.type == LPAREN:
self.eat(LPAREN)
value = self.expr()
self.eat(RPAREN)
return value
def term(self):
"""term : factor ((MUL | DIV) factor)*"""
value = self.factor()

while self.current_token.type in (MUL, DIV):
token_type = self.current_token.type
if token_type == MUL:
self.eat(MUL)
value *= self.factor()
elif token_type == DIV:
self.eat(DIV)
value /= self.factor()
return value
def expr(self):
"""Arithmetic expression parser / interpreter.

calc> 7 + 3 * (10 / (12 / (3 + 1) - 1))
22

expr : term ((PLUS | MINUS) term)*
term : factor ((MUL | DIV) factor)*
factor : INTEGER | LPAREN expr RPAREN
"""
value = self.term()
while self.current_token.type in (PLUS, MINUS):
token_type = self.current_token.type
if token_type == PLUS:
self.eat(PLUS)
value += self.term()
elif token_type == MINUS:
self.eat(MINUS)
value -= self.term()
return value
def parse(self):
return self.expr()
def main():
while True:
try:
text = input('calc> ')
except EOFError:
print()
break
if len(text.strip()):
interpreter = Interpreter(Lexer(text))
print(interpreter.parse())
else:
continue
if __name__ == '__main__':
main()

笔记

这一课在加减乘除的基础上加入了括号,而括号里的内容又可以看做是一个新的加减乘除表达式,即如下语法:

1
2
3
expr   : term   ((PLUS | MINUS) term)*
term : factor ((MUL | DIV) factor)*
factor : INTEGER | LPAREN expr RPAREN

可以看出,exprterm没有任何变化,而factor则多了一种可能:(expr),因此需要作出以下修改:

  1. Lexer部分添加左右括号的返回;
  2. Interpreter现在分成了两个分支,字符类型是INTEGER时,和之前的一样,返回该字符的value;字符类型是左括号时,则返回该括号里的expr的值,并把右括号也读过去。

这样的递归语法,称之为递归下降解释器。

练习

  1. 如本文的描述,写一个你自己版本的算术表达式解释器。记住:重复是学习之母。

    试着回顾一下整个代码的思路:

    1. 代码除了main()以外,有3个class,分别是TokenLexerInterpreter

    2. Token是最基础的词法单元,有两个成员变量:typevalue,此外有初始化和输出的函数。

    3. Lexer是词法分析器,成员变量为:text(完整的语句)、pos(当前读取到语句位置的索引)、current_char(当前读取出的字符),成员函数为:error(报词法错误)、advance(往后读取一个新字符)、skip_whitespace(跳过空格)、integer(把连续的数字字符转成整数返回)、get_next_token()(返回当前词法单元)。

    4. Interpreter是语法解释器,成员变量为:lexer(词法分析器)、current_token(当前处理的词法单元),成员函数为:error(报语法错误)、eat()(往后处理下一个词法单元)、factortermexpr三个都是语法图里的内容,最后加了一个parse返回expr()

    大致就是这样了。

检查理解

这一课没有这一部分。

  • Copyrights © 2020-2024 Kun Li

请我喝杯咖啡吧~

支付宝
微信