This is a sequel to my earlier post on re-solving the Water Jug Problem with best-first search and beam search. This time I want to show that once you have search algorithms under your belt, a little copy-and-paste is all it takes to solve something as hard as a Rubik's Cube.
That said, the method is not my own. It is DeepCubeA from the University of California, Irvine. Back when I first looked at AlphaZero I was surprised by how simple it was, and DeepCubeA is just as simple (a compliment, of course) and just as fun.
The program I wrote is on GitHub. If you have an NVIDIA GPU, try running it with the steps below (if you don't have a GPU, skip step 5).
1. If they are not installed yet, set up Python, Tensorflow, and CUDA.
2. git clone https://github.com/tail-island/rubic-cube.git
3. cd rubic-cube
4. git lfs pull
5. Delete model/cost.h5, run python train-all.py, and wait about 10 days. If you only want to see the results, skip this step.
6. Run python solve.py and watch it print a Rubik's Cube problem and its solution.
7. Open test-ui/test-ui.html in a web browser, enter the problem and solution printed in step 6, and confirm that the solution is correct.
As you can see from the previous post, best-first search and beam search themselves are very simple. The code is short, too.
Writing the evaluation function for best-first search or beam search, however, is very hard. Building an evaluation function that measures how good a shogi or Go position is, for example, is impossible for a human (at least for me). The same goes for a Rubik's Cube evaluation function: someone like me simply cannot write one. But without an evaluation function there is no best-first search and no beam search. What to do...
Well, if a human can't do it, we just make the machine do it. Let's have deep learning build the evaluation function for us.
The catch is that deep learning works by feeding in a huge number of input/answer pairs and letting the machine derive the pattern that relates input to output, so we somehow have to produce a huge number of such pairs. For Go, shogi, or ATARI games, the data apparently comes from actually playing the game and feeding back the result (AlphaZero, DQN), but for the Rubik's Cube there is a much easier way.
Think about it. What best-first search and beam search need is an evaluation function that predicts the cost to the goal. For the Rubik's Cube, that is a function that predicts "how many more turns until all six faces are solved". So the input to the network is the state of the cube, and the answer is the number of turns remaining.
So turn it around: start from the solved cube, make, say, 3 random turns, and use answer = 3 and input = the state after those 3 random turns. See? Now we can generate data without limit!
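The idea of generating labelled data by scrambling backwards from the goal can be sketched on a toy puzzle. This is only an illustration: the six-element state, the moves `A`, `A'`, `B`, `B'`, and the helper `make_sample()` are all made up for this example and are not what game.py does.

```python
import random

GOAL = (0, 1, 2, 3, 4, 5)


def _cycle(state, positions, reverse=False):
    """Cycle the values at the given positions by one step."""
    values = [state[i] for i in positions]
    values = values[-1:] + values[:-1] if reverse else values[1:] + values[:1]
    result = list(state)
    for i, value in zip(positions, values):
        result[i] = value
    return tuple(result)


# Hypothetical toy moves; a prime marks the inverse move.
MOVES = {
    "A": lambda s: _cycle(s, (0, 1, 2, 3)),
    "A'": lambda s: _cycle(s, (0, 1, 2, 3), reverse=True),
    "B": lambda s: _cycle(s, (2, 3, 4, 5)),
    "B'": lambda s: _cycle(s, (2, 3, 4, 5), reverse=True),
}
INVERSE = {"A": "A'", "A'": "A", "B": "B'", "B'": "B"}


def make_sample(step, rng=random):
    """Scramble the goal `step` times and return (input state, label)."""
    state, last = GOAL, None
    for _ in range(step):
        # Never undo the previous move immediately.
        candidates = [m for m in MOVES if last is None or m != INVERSE[last]]
        move = rng.choice(candidates)
        state, last = MOVES[move](state), move
    return state, step
```

Note that the label is only an upper bound on the true distance: a scramble of `step` random moves can sometimes be undone in fewer moves.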
So let's actually do it. First, implement the rules of the Rubik's Cube. This part is not very important, so I will skip the explanation, but using NumPy made it really easy. See game.py if you want the details.
Next comes the neural network for deep learning. I skimmed the paper, saw that it is a ResNet, and built it by copy-and-pasting from code I wrote a while ago. Here is the result.
import tensorflow as tf

from funcy import *
from game import *
from pathlib import *


def computational_graph():
    # Thin wrappers around Keras layers so that they compose nicely with funcy.
    def add():
        return tf.keras.layers.Add()

    def batch_normalization():
        return tf.keras.layers.BatchNormalization()

    def conv(filter_size, kernel_size=3):
        return tf.keras.layers.Conv2D(filter_size, kernel_size, padding='same', use_bias=False, kernel_initializer='he_normal')

    def dense(unit_size):
        return tf.keras.layers.Dense(unit_size, use_bias=False, kernel_initializer='he_normal')

    def global_average_pooling():
        return tf.keras.layers.GlobalAveragePooling2D()

    def relu():
        return tf.keras.layers.ReLU()

    ####

    # ljuxt feeds the same input to the convolution path and to identity (the skip connection); add() sums the two.
    def residual_block(width):
        return rcompose(ljuxt(rcompose(batch_normalization(),
                                       conv(width),
                                       batch_normalization(),
                                       relu(),
                                       conv(width),
                                       batch_normalization()),
                              identity),
                        add())

    W = 1024  # number of filters
    H = 4     # number of residual blocks

    return rcompose(conv(W, 1),
                    rcompose(*repeatedly(partial(residual_block, W), H)),
                    global_average_pooling(),
                    dense(1),
                    relu())  # Negative outputs seemed like a hassle, so I applied ReLU.
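The paper says no regularization was used, so I deleted the kernel_regularizer arguments. The input is small, so I also dropped the pooling, and I changed the end to dense(1), relu(). That is about it. It was mechanical work that needed no brainpower at all, and it was really easy. Oh, and the secret behind being able to define a neural network with code this simple is that Keras and functional programming are amazing.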
The input to this neural network is a 3×3×36 array. "Array" may sound difficult, but all it means is 36 monochrome images of 3×3 pixels, so it is very simple. Look at a Rubik's Cube and you will see six 3×3 faces, right? In deep learning you cannot represent red as 1 and blue as 2 (if blue really were "twice red" then red=1 and blue=2 would be fine, but a Rubik's Cube has no such relationship). Instead you need a 3×3 monochrome image just for red (1 where the sticker is red, 0 otherwise) for each of the 6 faces, another 3×3 monochrome image just for blue for each of the 6 faces, and so on. That is why the shape is 3×3×(6 faces × 6 colors) = 3×3×36. The conversion to this input format is done by the get_x() function in game.py. It is named x because Keras, the convenience library that Tensorflow adopted, has the convention of calling the input x and the output y.
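To make the 3×3×36 layout concrete, here is a minimal one-hot encoding sketch in plain Python. It assumes a cube is given as 6 faces of 3×3 color ids in the range 0 to 5 and that the channel index is `face * 6 + color`. Both are assumptions for illustration: the real get_x() in game.py may store the state and order the channels differently, and `one_hot_planes()` is a hypothetical name.

```python
def one_hot_planes(faces, color_count=6):
    """Encode 6 faces of 3x3 color ids as a 3x3x36 nested list of 0/1 values.

    Assumed channel layout: x[row][col][face * color_count + color].
    """
    channel_count = len(faces) * color_count
    x = [[[0] * channel_count for _ in range(3)] for _ in range(3)]
    for face_index, face in enumerate(faces):
        for row in range(3):
            for col in range(3):
                color = face[row][col]
                x[row][col][face_index * color_count + color] = 1
    return x


# A solved cube: face i is entirely color i.
solved = [[[color] * 3 for _ in range(3)] for color in range(6)]
x = one_hot_planes(solved)
print(len(x), len(x[0]), len(x[0][0]))
```

Each of the 9 positions has exactly 6 channels set to 1, one per face, which is what keeps the network from reading any ordering into the color ids.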
Everything is ready, so let's actually train. This too is just copy-and-paste from old code with a few modifications.
import numpy as np  # used by create_generator(); may already be re-exported by game

from random import randrange


def main():
    def create_model():
        result = tf.keras.Model(*juxt(identity, computational_graph())(tf.keras.Input(shape=(3, 3, 6 * 6))))

        result.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_absolute_error'])
        result.summary()

        return result

    def create_generator(batch_size):
        # Endlessly yields batches of (scrambled states, number of turns used to scramble them).
        while True:
            xs = []
            ys = []

            for i in range(batch_size):
                step = randrange(1, 32)  # 1 to 31 turns

                xs.append(get_x(get_random_state(step)[0]))
                ys.append(step)

            yield np.array(xs), np.array(ys)

    model_path = Path('./model/cost.h5')

    # Continue from the saved model if one exists.
    model = create_model() if not model_path.exists() else tf.keras.models.load_model(model_path)
    # fit_generator() is deprecated in later TensorFlow 2.x releases, where model.fit() accepts a generator directly.
    model.fit_generator(create_generator(1000), steps_per_epoch=1000, epochs=100)

    model_path.parent.mkdir(exist_ok=True)
    tf.keras.models.save_model(model, 'model/cost.h5')

    tf.keras.backend.clear_session()
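This time the data can be generated on the spot, so instead of the model.fit() you usually see in Keras samples, I use model.fit_generator(). The argument to fit_generator() is the return value of create_generator(), which is a generator function, so what it returns is a generator that produces the data. All that generator does is pick a random number of turns, make that many random turns, and use the resulting state as x and the number of turns as y.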
As the arguments to model.fit_generator() show, one run trains on 1,000 × 1,000 × 100 = 100 million samples. The paper says 10 billion, so this may be too few. In train-all.py I therefore repeat this process 10 times and train on 1 billion samples. On my outdated GPU (GeForce 1080 Ti) training took about 10 days, which was painful...
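The sample counts above are easy to double-check. The snippet below just redoes the arithmetic with the values from the text; the variable names are my own.

```python
batch_size = 1000        # create_generator(1000)
steps_per_epoch = 1000
epochs = 100
runs = 10                # train-all.py repeats the training 10 times

samples_per_run = batch_size * steps_per_epoch * epochs
total_samples = samples_per_run * runs
paper_samples = 10_000_000_000  # 10 billion, as stated in the paper

print(f'{samples_per_run:,} per run, {total_samples:,} in total')
print(f'the paper used {paper_samples // total_samples}x as many')
```

So even after 10 days, this training run has seen one tenth of the data the paper used.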
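Visualizing the training results gives the figure below. The horizontal axis is the label (how many turns were made) and the vertical axis is the output of the neural network. The animation steps through the result after 100 million samples, then 200 million, and so on.
The data is generated rather sloppily, so, for example, a sample made by turning the same face 3 times in the same direction is identical to a sample made by turning it once in the opposite direction. Looking at the figure, the network correctly answers 1 in that case (at 3 on the horizontal axis, the results cluster at both 3 and 1 on the vertical axis), which is wonderful. (I excluded turn-then-undo moves when generating data, so at 2 on the horizontal axis the results cluster only at 2.) Toward the right the vertical spread grows and accuracy drops, but that can't be helped: the prediction itself is hard, and on top of that the labels in the training data are not the true answers (for example, a state produced by 10 turns that can really be solved in 8). Also, when each 90-degree turn counts as one move, as in this program, any cube can apparently be solved in at most 26 moves, and the maximum on the vertical axis is right around 26, which is fascinating. Maybe the network has reached the truth of the Rubik's Cube.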
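The "three turns one way equals one turn the other way" effect can be checked on a single 3×3 face. This sketch only rotates the stickers on the face itself and ignores the strips on the adjacent faces, so it is a simplification; `rotate_cw()` and `rotate_ccw()` are hypothetical helpers, not functions from game.py.

```python
def rotate_cw(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_ccw(grid):
    """Rotate a square grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]


face = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

three_cw = rotate_cw(rotate_cw(rotate_cw(face)))
one_ccw = rotate_ccw(face)
print(three_cw == one_ccw)
```

A sample labelled 3 that was produced this way therefore has a true distance of 1, which is the kind of label noise described above.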
Now, the animation above suggests that the network keeps improving a little with every round of training, so increasing the data to 10 billion samples as in the paper might improve accuracy further. But the paper finished training in 36 hours using massive computing resources, and an ordinary person like me has no such environment... Training any longer would be painful, so let's move on with this neural network.
By the way, the reason the plot seems to hop up and down unstably in the figure above is that the bias term of the final Dense() layer changed during training, so please ignore it. I should have passed use_bias=False to Dense()... Would somebody remove the bias term and spend 10 days or so retraining it?
The DeepCubeA implementation is public, and it includes the TensorFlow model. So reverse-engineering the model should reveal the exact neural network... except that TensorFlow 2.0 on my computer could not open the model, so I gave up. Reading the code should also yield that information, but the code was so complicated that I gave up almost immediately. As a result, the structure of my neural network probably differs from the paper's. The results look plausible, so it can't be badly wrong, but would somebody look into it?
The paper takes an AlphaGo-like approach: the best neural network so far is pitted against the newly trained one, and the network is replaced when the new one wins. This implementation does not do that. There is a reason of sorts: AlphaZero, the successor to AlphaGo, changed the approach and simply keeps training a single neural network. Ah, I didn't know that, so I wrote an incorrect explanation in the past... I think it is mostly right apart from the part about the champion, but would somebody check it?
Also, in the code above, the generator returned by create_generator() turns the cube at most 31 times, whereas the paper uses 30. The reason is a simple oversight on my part. Still, I suspect 31 makes little difference.
I wrote best-first search in the previous post, so rewriting the same thing in Python should be the end of it... or so I thought, but when I looked at the paper once more to be safe, it turned out to use a customized A* that the authors call BWAS (Batch Weighted A Star). Even so, there are only two changes relative to A*, the best-first search variant that is explained on Wikipedia: Weighted and Batch. I describe those changes below.
A* (A Star) is a variant of best-first search that uses "cost so far + predicted cost to the goal" as its evaluation function, for the case where the predicted cost to the goal is guaranteed never to exceed the actual cost. Its selling point is that it always finds a shortest path. The deep learning evaluation function built here cannot guarantee that the predicted cost to the goal never exceeds the actual cost, so strictly speaking this is not A*... But the paper calls it A*, so I will too.
As described above, A* explores states in ascending order of "cost so far + predicted cost to the goal". So "a state reached in 8 steps that is predicted to be about 2 steps from the goal" and "a state reached in 2 steps that is predicted to be about 8 steps from the goal" have the same priority. But the latter, which has only advanced 2 steps, still has an enormous number of states to explore ahead of it, so it feels better to prioritize the former, which is only 2 steps away. Ignoring the latter entirely feels like going too far, though...
So let's discount the "cost so far" by a suitable amount. For example, make the evaluation function "0.5 × cost so far + predicted cost to the goal". The result may no longer be a shortest path, but the search space shrinks, so a solution turns up sooner. Since this amounts to applying a weight, it is called Weighted A*.
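The effect of the weight on the two example states can be worked through directly. `priority()` below is a hypothetical helper written for this illustration; lower values are explored first.

```python
def priority(cost_so_far, predicted_cost_to_goal, weight=1.0):
    """Weighted A* priority: weight * g + h (plain A* when weight is 1)."""
    return weight * cost_so_far + predicted_cost_to_goal


near_goal = (8, 2)      # 8 steps taken, about 2 to go
far_from_goal = (2, 8)  # 2 steps taken, about 8 to go

for weight in (1.0, 0.5):
    print(weight, priority(*near_goal, weight), priority(*far_from_goal, weight))
```

With weight 1.0 both states score 10 and tie. With weight 0.5 the state near the goal scores 6 and the other scores 9, so the near-goal state is explored first while the other stays in the queue rather than being discarded.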
Prediction with a neural network is computationally heavy and therefore slow. It does, however, parallelize well. On a GPU, running 1 prediction and running 100 predictions in parallel take almost the same time. So we want to process as much as possible in parallel... but in ordinary A*, the step that takes the next node to explore from the queue stands in the way.
The paper therefore proposes taking a specified number of states from the queue, expanding each of them to its next states, and then predicting the cost to the goal for all of those next states together in parallel. Processing things together is called a batch, hence the name Batch Weighted A*.
The batch part affects not only speed but also solution quality. Suppose, as in ordinary A*, you take the lowest-cost state from the queue, compute its next states and their predicted costs, and put them in the queue. The only rule in A* is that the next state taken is the one with the lowest cost, so a state you just added may be the one selected. With the Batch version, additions are deferred, so the state that had the second-lowest cost when the batch started is guaranteed to be selected. As a result there may be more wasted work, but the search covers a wider range.
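The difference in expansion order can be seen with a small experiment on a plain heap of costs. Everything here is a made-up illustration: the states are just numbers standing for their own cost, and `children()` is a hypothetical expansion rule.

```python
from heapq import heapify, heappop, heappush


def children(cost):
    """Hypothetical expansion: each state yields two slightly costlier states."""
    return [cost + 1, cost + 2]


def plain_order(costs, pops):
    """Ordinary best-first: push children right after every single pop."""
    queue = list(costs)
    heapify(queue)
    order = []
    for _ in range(pops):
        cost = heappop(queue)
        order.append(cost)
        for child in children(cost):
            heappush(queue, child)
    return order


def batched_order(costs, batch_size, batches):
    """Batched: pop batch_size states first, push all their children afterwards."""
    queue = list(costs)
    heapify(queue)
    order = []
    for _ in range(batches):
        batch = [heappop(queue) for _ in range(min(batch_size, len(queue)))]
        order.extend(batch)
        for cost in batch:
            for child in children(cost):
                heappush(queue, child)
    return order


print(plain_order([1, 5, 9], pops=2))
print(batched_order([1, 5, 9], batch_size=2, batches=1))
```

Starting from costs 1, 5, and 9, the plain version expands 1 and then its freshly added child 2, while the batched version expands 1 and then 5, the state that was second best when the batch started.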
So the Batch Weighted A* that DeepCubeA proposes narrows the search with Weighted, and speeds it up while widening it again with Batch. Let's implement this Batch Weighted A*.
Then again, Batch Weighted A* turns out to be absurdly simple in code. It looks like this.
from game import *
from heapq import *


def get_answer(initial_state, cost_model, n, l):  # n is the batch size and l is the weight.
    def get_next_state_and_next_answers():
        # Pop up to n states and yield every successor that is new or was reached by a shorter path.
        for _ in range(min(n, len(queue))):
            _, state, answer = heappop(queue)

            for action in ACTIONS.keys():
                next_state = get_next_state(state, action)
                next_answer = answer + (action,)

                if next_state not in visited_states or visited_states[next_state] > len(next_answer):
                    visited_states[next_state] = len(next_answer)
                    yield next_state, next_answer

    queue = [(0, initial_state, ())]
    visited_states = {initial_state: 0}

    while queue:
        expanded = tuple(get_next_state_and_next_answers())

        # Guard: zip(*...) cannot be unpacked when the batch produced no new states.
        if not expanded:
            continue

        next_states, next_answers = zip(*expanded)

        for next_state, next_answer in zip(next_states, next_answers):
            if next_state == GOAL_STATE:
                return next_answer

        # Predict the cost to the goal for the whole batch at once.
        cost_to_goals = cost_model.predict(np.array(tuple(map(get_x, next_states))), batch_size=10000).flatten()

        for next_state, next_answer, cost_to_goal in zip(next_states, next_answers, cost_to_goals):
            heappush(queue, (l * len(next_answer) + cost_to_goal, next_state, next_answer))

    return ()
Well, best-first search code was simple to begin with, and all I did was add get_next_state_and_next_answers() to express Batch and add l * to the cost calculation to express Weighted. There is no way it could become difficult.
Oh, and the cost_model argument in the code above is the trained Keras neural network; model.predict() runs the prediction. That is the part that assigns to cost_to_goals.
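To try the same control flow without a trained model, here is a self-contained toy version. It is a sketch under stated assumptions, not the cube solver: the puzzle (walk an integer to 0 using +1, -1, and -3), the `predict_batch()` stand-in for cost_model.predict(), and the function name `batch_weighted_a_star()` are all hypothetical.

```python
from heapq import heappop, heappush

ACTIONS = {"+1": 1, "-1": -1, "-3": -3}  # hypothetical toy puzzle on integers
GOAL = 0


def predict_batch(states):
    """Stand-in for cost_model.predict(): rough moves-to-goal estimates."""
    return [abs(state) / 3 for state in states]


def batch_weighted_a_star(start, n, l):
    """n is the batch size, l is the weight on the cost so far."""
    if start == GOAL:
        return ()
    queue = [(0.0, start, ())]
    visited = {start: 0}
    while queue:
        # Batch: pop up to n states and collect their successors first.
        expanded = []
        for _ in range(min(n, len(queue))):
            _, state, answer = heappop(queue)
            for action, delta in ACTIONS.items():
                next_state, next_answer = state + delta, answer + (action,)
                if next_state not in visited or visited[next_state] > len(next_answer):
                    visited[next_state] = len(next_answer)
                    expanded.append((next_state, next_answer))
        for next_state, next_answer in expanded:
            if next_state == GOAL:
                return next_answer
        # One prediction call for the whole batch, then push with weighted priority.
        costs = predict_batch([state for state, _ in expanded])
        for (next_state, next_answer), cost in zip(expanded, costs):
            heappush(queue, (l * len(next_answer) + cost, next_state, next_answer))
    return ()


answer = batch_weighted_a_star(7, n=2, l=0.6)
print(len(answer), ' '.join(answer))
```

Swapping `predict_batch()` for a real model call and the integer arithmetic for get_next_state() gives back the shape of the listing above.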
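Using this code with the parameters the paper recommends (n=10000, l=0.6), I tried a Rubik's Cube problem that needs the maximum of 26 moves. (The claim that any cube can be solved in 20 moves counts a 180-degree turn as one move; when only 90-degree turns count as one move, as in this implementation, the number is apparently 26.) It was solved beautifully in the minimum of 26 moves. On my machine (Core i5 + GeForce 1080 Ti, 16 GB of memory) it took a whole 487 seconds, though...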
import batch_weighted_a_star
import tensorflow as tf

from game import *
from time import *


def main():
    model = tf.keras.models.load_model('model/cost.h5')

    # Scramble the solved cube with the moves in the question.
    question = "U U F U U R' L F F U F' B' R L U U R U D' R L' D R' L' D D".split(' ')

    state = GOAL_STATE
    for action in question:
        state = get_next_state(state, action)

    starting_time = time()

    answer = batch_weighted_a_star.get_answer(state, model, 10000, 0.6)  # According to the paper, n=10000 and l=0.6 work well for finding optimal solutions.

    # Pad single-character moves so the question and the answer line up in columns.
    print(f'{len(answer)} steps, {time() - starting_time:6.3f} seconds')
    print(' '.join(map(lambda action: action if len(action) == 2 else action + ' ', question)))
    print(' '.join(map(lambda action: action if len(action) == 2 else action + ' ', answer )))

    tf.keras.backend.clear_session()
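The answer the program produced looks like this.
See? It solved it in the minimum of 26 moves. Then again, the paper says that even its own implementation finds the optimal solution only 60.3% of the time, so this may have been luck. In fact, on one of the 25-move problems it missed the optimal solution and came back with a 27-move answer.
With code this simple (and the deep learning part is almost entirely copy-and-paste), a GPU that is a few years old, and 10 days of idling, a program written by someone who has never once managed to solve a Rubik's Cube went and solved one. Best-first search, deep learning, and DeepCubeA, which combines the two so neatly, are wonderful.
According to the paper, the same approach also solves sliding puzzles, Sokoban, and the like. Perhaps even those hard problems whose solution would make society better still...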