What does "policy iteration" mean in Chinese?
policy iteration
Definition
策略迭代法 (policy iteration method)
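As an illustration of the term, policy iteration alternates between evaluating the current policy and improving it greedily until the policy stops changing. The following is a minimal sketch, not taken from the sources quoted in this entry: the two-state MDP `P`, the discount factor `gamma`, and the function names are all hypothetical choices made for the example.

```python
# Hypothetical toy MDP: P[state][action] = list of (probability, next_state, reward).
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to state 1 (reward 1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 2) or move to state 0 (reward 0)
]


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively compute the value function of a fixed policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(P, gamma=0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)  # arbitrary initial policy
    while True:
        V = policy_evaluation(P, policy, gamma)
        stable = True
        for s in range(len(P)):
            q = [q_value(P, V, s, a, gamma) for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration(P)
print(policy, V)
```

On this toy MDP the procedure settles on the policy `[1, 0]` (move from state 0 to state 1, then stay there), with values close to 19 and 20.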
The ACO algorithms are fitted into the framework of generalized policy iteration (GPI) in RL based on incomplete information of the Markov state. Furthermore, we show that the pheromone update in the ACS and Ant-Q algorithms is based on MC methods or on some formalistic combination of MC methods and TD methods.
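In addition, within the theoretical framework of reinforcement learning, it is shown that the AS algorithm is a reinforcement learning algorithm based on the Monte Carlo method, and that the ACS and Ant-Q algorithms are reinforcement learning algorithms that formally combine the Monte Carlo method with the temporal-difference method.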
To solve the problem that the proposed PH distribution changes the state space of the system, the value iteration algorithm for the semi-Markov decision process is improved to obtain the optimal inspection and maintenance policy.
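After the phase-type (PH) distribution is introduced into the model, the state space of the decision process changes; to obtain an optimized inspection and maintenance policy that fits the original model assumptions, an improved value iteration algorithm is proposed.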