-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathdoc.txt
152 lines (127 loc) · 4.27 KB
/
doc.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
MCTS algorithm
1. Create root node
2. Set root as current
3. If current has children:
a. Find best child according to selection algorithm
b. Set child as current
c. Go to 3
4. If current is leaf (no children):
a. If current is game over:
a. Go to 2 (?)
b. If current is not game over:
a. Find a random next move
b. Create a child node for this move
c. Make a playout from child node
d. Propagate results to each parent
e. Go to 2 (if not timeout)
Selection algorithm:
calculate k = C * sqrt(ln(current.visits))
return child with max UCB = (child.wins/child.visits) + k * 1/sqrt(visits)
Results propagation:
do:
increment current.visits
if result was a win:
increment current.wins
current = current.parent
while current is not root;
teresa_node {
teresa_node* sibling;
teresa_node* parent;
teresa_node* child;
color pl;
move move;
int wins;
int visits;
}
TODO (Problems with Teresa's algorithm)
- Each player would try to maximize his own UCB score, not that of the player currently thinking.
- Handle game-overs correctly
- Need better debugging tools w/ more accurate & understandable info
- Currently tends to favor passing twice at some point (even though this theoretically results in a 100% loss) --> why?
Pseudocode:
C := params.C;
root := params.root;
while can still think:
current := root
state := clone(state0)
// Selection
while current has children:
k := C * sqrt(ln(current.visits))
for each child:
if child.visits == 0:
UCB := Infinity
else:
UCB := (child.wins/child.visits) + k * 1/sqrt(child.visits)
find child with highest UCB for that player (randomly if tied)
current := child with highest UCB
play_move(state, current.move)
if is_game_over(state):
destroy(state)
continue;
// Expansion
list := get_legal_moves(state);
for each move in list:
create child node {parent: current, move: move, wins: 0, visits: 0}
// Simulation
current := random child node;
play_move(state, current.move)
result = play_out(state)
// Back-propagation
do:
++current.visits
if result.win:
++current.wins
current = current.parent
while current != root;
destroy(state);
TODO
Karl still fills in friendly eyes sometimes (real MCTS probably resolves this)
Sorting out go.c (low level of abstraction to high)
3 *color_char: --
0 *color_opponent: --
3 index_char: --
1 char_index: --
1 is_star_point: --
2 dot_char: is_star_point, *color_char
1 heatmap_char: --
0 *move_create: --
0 *move_destroy: --
0 *move_parse: char_index
2 *move_sprint: index_char
2 *move_print: *move_sprint
2 -change_neighbors_freedoms: --
1 -change_neighbors_freedoms_if_specific_color: --
1 stone_init: --
1 group_init: --
2 group_list_remove: --
2 group_list_insert: --
1 group_pool_borrow: group_list_remove, group_list_insert
2 group_pool_return: group_list_remove, group_list_insert
1 group_dump: *move_sprint, *color_char, *move_print
1 group_add_stone: --
1 group_merge_and_destroy_smaller: group_pool_return
2 group_kill_stones: (-change_neighbors_freedoms_if_specific_color)
1 count_territory: --
0 *state_create: --
0 *state_copy: --
0 *state_destroy: --
0 *state_print: index_char, *color_char, dot_char, *state_score
0 *state_dump: *move_print, group_dump
3 *state_score: count_territory
1 fills_in_friendly_eye: --
2 check_possible_ko: --
1 is_neighbor_enemy_dead: --
1 remove_dead_neighbor_enemy: group_kill_stones, group_pool_return
1 count_liberties: --
2 has_living_friendlies: --
1 has_dying_friendlies: --
1 create_lone_group: group_pool_borrow, group_init
1 merge_with_every_friendly: stone_init, group_add_stone, group_merge_and_destroy_smaller
2 *go_is_game_over: --
1 *go_is_move_legal: check_possible_ko, is_neighbor_enemy_dead, has_living_friendlies, (-change_neighbors_freedoms)
1 *go_get_legal_moves: *go_is_game_over, *go_is_move_legal
1 *go_play_move: check_possible_ko, (-change_neighbors_freedoms), remove_dead_neighbor_enemy, count_liberties, has_living_friendlies, has_dying_friendlies, create_lone_group, merge_with_every_friendly
1 *go_play_random_move: *state_score, fills_in_friendly_eye, *go_play_move, *go_get_legal_moves
0 *go_play_out: *go_is_game_over, *go_play_random_move, *state_score
0 *go_print_heatmap: index_char, heatmap_char, dot_char
0 *go_print_move_result: --