Rollout, Policy Iteration, and Distributed Reinforcement Learning: Copyright Information
- ISBN: 9787302599388
- Barcode: 9787302599388; 978-7-302-59938-8
- Binding: standard offset-paper paperback
- Volumes: N/A
- Weight: N/A
- Category: N/A
Rollout, Policy Iteration, and Distributed Reinforcement Learning: Highlights
This book introduces readers to recent developments and applications of policy iteration in reinforcement learning, and in particular of the rollout method within distributed and multiagent frameworks. It can serve as a one-semester textbook for senior undergraduate or graduate students in artificial intelligence, systems and control science, and related fields, and as a reference for professionals engaged in related research.
Rollout, Policy Iteration, and Distributed Reinforcement Learning: Description
The book's main contents are: Chapter 1, principles of dynamic programming; Chapter 2, rollout and policy improvement; Chapter 3, specialized rollout algorithms; Chapter 4, learning values and policies; Chapter 5, infinite-horizon distributed and multiagent algorithms. The book is strongly influenced by the algorithm of AlphaZero, the groundbreaking Go program. Like AlphaZero, it builds on a core framework of policy iteration, neural-network approximation of value and policy functions, parallel and distributed computation, and techniques for simplifying the lookahead minimization, and it extends the range of problems to which these algorithms apply. Its distinctive contributions are techniques that improve the efficiency of reinforcement learning policy improvement computations in distributed and multiagent settings, a connection between the one-step policy improvement rollout method and the model predictive control (MPC) design methodology widely used in control systems, and applications of rollout to complex discrete and combinatorial optimization problems.
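To make the central idea concrete: the sketch below, which is not taken from the book, shows one-step rollout for a deterministic finite-horizon problem. The problem model is supplied through hypothetical callbacks (controls, next_state, stage_cost, terminal_cost) together with a base_heuristic policy; the rollout control at each state minimizes the current stage cost plus the cost of following the base heuristic to the end of the horizon.

```python
# Minimal sketch of one-step rollout for a deterministic finite-horizon
# problem. All problem callbacks here are hypothetical placeholders,
# not the book's notation.

def heuristic_cost(x, k, N, next_state, stage_cost, terminal_cost, base_heuristic):
    """Cost accumulated by following the base heuristic from state x at stage k."""
    total = 0.0
    for t in range(k, N):
        u = base_heuristic(x, t)          # control chosen by the base heuristic
        total += stage_cost(x, u, t)
        x = next_state(x, u, t)
    return total + terminal_cost(x)

def rollout_control(x, k, N, controls, next_state, stage_cost, terminal_cost, base_heuristic):
    """One-step lookahead: try each control now, then follow the base heuristic."""
    best_u, best_q = None, float("inf")
    for u in controls(x, k):
        q = stage_cost(x, u, k) + heuristic_cost(
            next_state(x, u, k), k + 1, N,
            next_state, stage_cost, terminal_cost, base_heuristic,
        )
        if q < best_q:
            best_u, best_q = u, q
    return best_u
```

By the policy improvement principle, the policy that applies rollout_control at every stage performs no worse than the base heuristic; the multiagent variants discussed in the book reduce the cost of the minimization by optimizing over one agent's control component at a time.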
Rollout, Policy Iteration, and Distributed Reinforcement Learning: Table of Contents
1 Exact and Approximate Dynamic Programming Principles
1.1 AlphaZero, Off-Line Training, and On-Line Play
1.2 Deterministic Dynamic Programming
1.2.1 Finite Horizon Problem Formulation
1.2.2 The Dynamic Programming Algorithm
1.2.3 Approximation in Value Space
1.3 Stochastic Dynamic Programming
1.3.1 Finite Horizon Problems
1.3.2 Approximation in Value Space for Stochastic DP
1.3.3 Infinite Horizon Problems-An Overview
1.3.4 Infinite Horizon-Approximation in Value Space
1.3.5 Infinite Horizon-Policy Iteration, Rollout, and Newton's Method
1.4 Examples, Variations, and Simplifications
1.4.1 A Few Words About Modeling
1.4.2 Problems with a Termination State
1.4.3 State Augmentation, Time Delays, Forecasts, and Uncontrollable State Components
1.4.4 Partial State Information and Belief States
1.4.5 Multiagent Problems and Multiagent Rollout
1.4.6 Problems with Unknown Parameters-Adaptive Control
1.4.7 Adaptive Control by Rollout and On-Line Replanning
1.5 Reinforcement Learning and Optimal Control-Some Terminology
1.6 Notes and Sources
2 General Principles of Approximation in Value Space
2.1 Approximation in Value and Policy Space
2.1.1 Approximation in Value Space-One-Step and Multistep Lookahead
2.1.2 Approximation in Policy Space
2.1.3 Combined Approximation in Value and Policy Space
2.2 Approaches for Value Space Approximation
2.2.1 Off-Line and On-Line Implementations
2.2.2 Model-Based and Model-Free Implementations
2.2.3 Methods for Cost-to-Go Approximation
2.2.4 Methods for Expediting the Lookahead Minimization
2.3 Deterministic Rollout and the Policy Improvement Principle
2.3.1 On-Line Rollout for Deterministic Discrete Optimization
2.3.2 Using Multiple Base Heuristics-Parallel Rollout
2.3.3 The Simplified Rollout Algorithm
2.3.4 The Fortified Rollout Algorithm
2.3.5 Rollout with Multistep Lookahead
2.3.6 Rollout with an Expert
2.3.7 Rollout with Small Stage Costs and Long Horizon-Continuous-Time Rollout
2.4 Stochastic Rollout and Monte Carlo Tree Search
2.4.1 Simulation-Based Implementation of the Rollout Algorithm
2.4.2 Monte Carlo Tree Search
2.4.3 Randomized Policy Improvement by Monte Carlo Tree Search
2.4.4 The Effect of Errors in Rollout-Variance Reduction
2.4.5 Rollout Parallelization
2.5 Rollout for Infinite-Spaces Problems-Optimization Heuristics
2.5.1 Rollout for Infinite-Spaces Deterministic Problems
2.5.2 Rollout Based on Stochastic Programming
2.6 Notes and Sources
3 Specialized Rollout Algorithms
3.1 Model Predictive Control
3.1.1 Target Tubes and Constrained Controllability
3.1.2 Model Predictive Control with Terminal Cost
3.1.3 Variants of Model Predictive Control
3.1.4 Target Tubes and State-Constrained Rollout
3.2 Multiagent Rollout
3.2.1 Asynchronous and Autonomous Multiagent Rollout
3.2.2 Multiagent Coupling Through Constraints
3.2.3 Multiagent Model Predictive Control
3.2.4 Separable and Multiarmed Bandit Problems
3.3 Constrained Rollout-Deterministic Optimal Control
3.3.1 Sequential Consistency, Sequential Improvement, and the Cost Improvement Property
3.3.2 The Fortified Rollout Algorithm and Other Variations
3.4 Constrained Rollout-Discrete Optimization
3.4.1 General Discrete Optimization Problems
3.4.2 Multidimensional Assignment
3.5 Rollout for Surrogate Dynamic Programming and Bayesian Optimization
3.6 Rollout for Minimax Control
3.7 Notes and Sources
4 Learning Values and Policies
4.1 Parametric Approximation Architectures
4.1.1 Cost Function Approximation
4.1.2 Feature-Based Architectures
4.1.3 Training of Linear and Nonlinear Architectures
4.2 Neural Networks
4.2.1 Training of Neural Networks
Rollout, Policy Iteration, and Distributed Reinforcement Learning: About the Author
Dimitri P. Bertsekas is a tenured professor at MIT, a member of the US National Academy of Engineering, and a visiting professor at the Center for Complex and Networked Systems at Tsinghua University. An internationally renowned author in electrical engineering and computer science, he has written more than a dozen widely used textbooks and monographs, including Nonlinear Programming, Network Optimization, Dynamic Programming, Convex Optimization, and Reinforcement Learning and Optimal Control.