1) multi-step Q learning

多步Q学习
1.
To solve the problem of slow update speed in Q learning, a multi-step Q learning scheduling algorithm is proposed, in which the value function is updated using information from multiple steps.
首先建立任务调度问题的目标模型,在分析Q学习算法的基础上,给出调度问题的马尔可夫决策过程描述;针对任务调度的Q学习算法更新速度慢的问题,提出一种基于多步信息更新值函数的多步Q学习调度算法。
2.
By using a multi-step information updating strategy and the Metropolis criterion from simulated annealing, a new multi-step Q learning method, called simulated annealing-based multi-step Q learning (SAMQ), is proposed to compensate for the slow update speed of standard Q learning.
针对强化学习中标准Q学习算法更新速度慢的缺点,通过引入多步信息更新策略和模拟退火中的Metropolis准则,提出了一种新颖的多步Q学习算法,称为SAMQ算法。
3.
To overcome the "curse of dimensionality", an algorithm combining parallel simulation of heuristic policies and multi-step Q learning is put forward to solve the above stochastic dynamic programming model.
通过并行启发式策略进行仿真和多步Q学习,有效解决了"维数灾难"问题,结合示例阐述了算法执行过程,说明了其可行性与可靠性。
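The example sentences above mention updating the value function from multi-step information and using the Metropolis criterion from simulated annealing. A minimal sketch of both ideas follows, assuming a tabular Q function stored in a dict keyed by (state, action). The function names, the default step size and discount factor, and the acceptance rule exp(delta / T) are illustrative assumptions, not the method of any of the cited papers.

```python
import math
import random


def n_step_target(rewards, q_next_max, gamma):
    """Discounted sum of the observed rewards plus the bootstrapped
    value gamma**n * max_a Q(s_{t+n}, a), where n = len(rewards)."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * q_next_max


def n_step_q_update(q, state, action, rewards, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward the n-step target.
    `q` is a dict; missing entries are treated as 0.0 (an assumption)."""
    q_next_max = max(q.get((next_state, a), 0.0) for a in actions)
    target = n_step_target(rewards, q_next_max, gamma)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


def metropolis_select(q, state, actions, temperature, rng=random):
    """Pick a random candidate action and accept it over the greedy
    action with probability exp((Q_candidate - Q_greedy) / T)."""
    greedy = max(actions, key=lambda a: q.get((state, a), 0.0))
    candidate = rng.choice(actions)
    delta = q.get((state, candidate), 0.0) - q.get((state, greedy), 0.0)
    # delta <= 0, so the acceptance probability lies in (0, 1].
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy
```

With a high temperature the selection is close to uniform random exploration; as the temperature is lowered it approaches the greedy choice, which is the usual motivation for an annealing schedule.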
2) Q-learning

Q学习
1.
Power supplier bidding strategies based on Q-learning algorithm;

基于Q学习算法的发电商报价策略模型
2.
Research on application of multi-agent Q-learning algorithm in multi-AUV coordination;

多智能体Q学习在多AUV协调中的应用研究
3.
Application of Agent-based Q-learning in the Traffic Flow Control of Single Intersection;

基于Q学习的Agent在单路口交通控制中的应用
3) Q-learning

Q-学习
1.
A Traffic Signal Control Method Based on Q-Learning;

基于Q-学习的交通信号控制方法
2.
Non-linear Control Based on Q-learning Algorithms;

基于Q-学习的非线性控制
3.
Research on regional cooperative multi-agent Q-learning;

局部合作多智能体Q-学习研究
4) B-Q learning

B-Q学习
1.
To address the problem of dynamic scheduling in knowledgeable manufacturing systems, the B-Q learning algorithm is proposed by drawing on the high intelligence of the knowledgeable manufacturing cell, and an adaptive scheduling control strategy is built on this algorithm.
针对知识化制造系统中的动态调度问题,结合知识化制造单元的高智能特征,提出了B-Q学习算法,并基于该算法构建了一种自适应调度控制策略。
5) Q learning

Q学习
1.
Application of improved Q learning algorithm to job shop problem;

改进的Q学习算法在作业车间调度中的应用
2.
The paper first presents an objective model of task scheduling and then, based on an analysis of the Q learning algorithm, gives a Markov decision process description of the scheduling problem.
首先建立任务调度问题的目标模型,在分析Q学习算法的基础上,给出调度问题的马尔可夫决策过程描述;针对任务调度的Q学习算法更新速度慢的问题,提出一种基于多步信息更新值函数的多步Q学习调度算法。
3.
In this paper, a behavior learning mechanism for soccer robot action selection, based on Q learning combined with case-based learning (CBL), is proposed.
提出了一种足球机器人基于Q学习与案例学习(CBL)相结合的自主学习机制。
6) Q(λ) learning

Q(λ)学习
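Supplementary material: part learning and whole learning
部分学习与整体学习
part learning and whole learning
Part learning and whole learning: in motor learning and memory learning, learning can be divided into part learning and whole learning according to how the material is handled. Part learning splits the material into several parts and learns one part at a time; whole learning covers the entire material each time. In general, whole learning is more effective than part learning. However, for complex tasks, or material whose parts have no meaningful connection to one another, part learning works better; for short tasks, or material with meaningful connections, whole learning works better. The two can also be combined, with whole learning first and part learning afterwards, or the reverse. This combined approach is called comprehensive learning and is somewhat more effective. (Written by 周国帕; reviewed by 成立夫)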