[1]吴 鹏,魏上清,董嘉鹏,等.基于 SARSA 强化学习的审判人力资源调度方法[J].计算机技术与发展,2022,32(09):82-88.[doi:10.3969/j.issn.1673-629X.2022.09.013]
 WU Peng,WEI Shang-qing,DONG Jia-peng,et al.Trial Human Resources Scheduling Method Based on SARSA Reinforcement Learning[J].Computer Technology and Development,2022,32(09):82-88.[doi:10.3969/j.issn.1673-629X.2022.09.013]

基于 SARSA 强化学习的审判人力资源调度方法

《计算机技术与发展》[ISSN:1006-6977/CN:61-1281/TN]

卷/Volume: 32
期数/Issue: 2022年09期
页码/Pages: 82-88
栏目/Column: 人工智能
出版日期/Published: 2022-09-10

文章信息/Info

Title:
Trial Human Resources Scheduling Method Based on SARSA Reinforcement Learning
文章编号:
1673-629X(2022)09-0082-07
作者:
吴 鹏 1,2,魏上清 1,董嘉鹏 1,潘 理 1,2
1. 上海交通大学 电子信息与电气工程学院,上海 200240
2. 信息内容分析技术国家工程实验室,上海 200240
Author(s):
WU Peng 1,2,WEI Shang-qing 1,DONG Jia-peng 1,PAN Li 1,2
1. School of Electronic Information and Electrical Engineering,Shanghai Jiao Tong University,Shanghai 200240,China
2. National Engineering Laboratory for Information Content Analysis Technology,Shanghai 200240,China
关键词:
强化学习;资源调度;决策优化;贪婪策略;马尔可夫决策过程
Keywords:
reinforcement learning; resource scheduling; decision optimization; greedy strategy; Markov decision process
分类号:
TP181
DOI:
10.3969/j.issn.1673-629X.2022.09.013
摘要:
为对法官员额资源进行调度优化,平衡司法资源有限和现实司法需求之间的矛盾,该文建立审判人力资源调度优化模型,提出基于强化学习的审判团队调度优化策略。基于对审判人员调度问题和场景的分析,建立以案件的平均处理时间最小化为优化目标的审判人员调度优化数学模型以及相应的约束条件。在此基础上建立宏观的司法系统排队模型,定义审判人力资源调度马尔可夫决策过程,并基于状态-动作-奖励-状态-动作(State-Action-Reward-State-Action,SARSA)算法提出动态自适应的审判人员调度强化学习算法。该算法以案件的平均处理时间为奖励,通过贪婪行为策略选择调度策略,采用时序差分更新方法在与司法系统交互的过程中学习最优调度策略。相比于传统分案方法及其他基于规则的简单启发式算法,该算法能够提高案件审判效率、优化人力资源配置。
Abstract:
In order to optimize the scheduling of judge post resources and balance the contradiction between limited judicial resources and actual judicial needs, a trial human resource scheduling optimization model is established and a trial team scheduling optimization strategy based on reinforcement learning is proposed. On the basis of an analysis of the judiciary scheduling problems and scenarios, a mathematical model of judiciary scheduling optimization, with the optimization goal of minimizing the average processing time of cases, is established together with the corresponding constraints. On this basis, a macroscopic judicial system queuing model is established, the Markov decision process of trial human resource scheduling is defined, and a dynamic adaptive reinforcement learning algorithm for judicial personnel scheduling based on SARSA (State-Action-Reward-State-Action) is proposed. The algorithm uses the average processing time of cases as the reward, selects scheduling actions through a greedy behavior policy, and uses temporal-difference updates to learn the optimal scheduling policy while interacting with the judicial system. Compared with the traditional case-division method and other simple rule-based heuristic algorithms, the proposed algorithm can improve the efficiency of case trials and optimize the allocation of human resources.
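The abstract describes the standard ingredients of SARSA: an epsilon-greedy behavior policy and a temporal-difference update driven by a reward (here, tied to average case processing time). The following is a minimal generic sketch of that update loop; the environment interface (`reset`/`step`) and the state/action encoding are hypothetical stand-ins for the paper's judicial-system queuing model, not the authors' implementation.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular SARSA. `env.reset()` -> (state, actions);
    `env.step(a)` -> (next_state, next_actions, reward, done)."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    def choose(state, actions):
        # epsilon-greedy behavior policy, as in the abstract
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state, actions = env.reset()
        action = choose(state, actions)
        done = False
        while not done:
            # in the paper's setting the reward would reflect the
            # (negative) average case processing time
            next_state, next_actions, reward, done = env.step(action)
            if done:
                target = reward
            else:
                next_action = choose(next_state, next_actions)
                # TD target uses the action actually taken (on-policy)
                target = reward + gamma * Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            if not done:
                state, action = next_state, next_action
    return Q
```

Because the TD target bootstraps from the action the behavior policy actually selects next (rather than the max over actions, as Q-learning does), SARSA learns on-policy, which is why the abstract pairs it with a greedy behavior strategy.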

相似文献/References:

[1]廖宁,刘建勋,王俊年.DPSO算法在服务网格资源调度中的应用[J].计算机技术与发展,2009,(08):104.
 LIAO Ning,LIU Jian-xun,WANG Jun-nian.Application of Discrete Particle Swarm Optimization Algorithm to Service Grid Resource Optimization Scheduling[J].Computer Technology and Development,2009,(08):104.
[2]徐慧慧,石磊,陈信.网格资源调度算法研究[J].计算机技术与发展,2009,(09):76.
 XU Hui-hui,SHI Lei,CHEN Xin.Research on Grid Resource Scheduling Algorithm[J].Computer Technology and Development,2009,(09):76.
[3]冯林,李琛,孙焘.Robocup半场防守中的一种强化学习算法[J].计算机技术与发展,2008,(01):59.
 FENG Lin,LI Chen,SUN Tao.A Reinforcement Learning Method for Robocup Soccer Half Field Defense[J].Computer Technology and Development,2008,(01):59.
[4]陈小飞,徐宏炳.基于网格的并行FFT计算研究[J].计算机技术与发展,2008,(03):67.
 CHEN Xiao-fei,XU Hong-bing.Research of Parallel FFT Computing Based on Grid[J].Computer Technology and Development,2008,(03):67.
[5]汤萍萍,王红兵.基于强化学习的Web服务组合[J].计算机技术与发展,2008,(03):142.
 TANG Ping-ping,WANG Hong-bing.Web Service Composition Based on Reinforcement Learning[J].Computer Technology and Development,2008,(03):142.
[6]储凡静,刘方爱.一种基于XML的个性化的资源需求描述机制[J].计算机技术与发展,2008,(06):67.
 CHU Fan-jing,LIU Fang-ai.Personal Resource Requirement Description Mechanism Based on XML[J].Computer Technology and Development,2008,(06):67.
[7]王朝晖,孙惠萍.图像检索中IRRL模型研究[J].计算机技术与发展,2008,(12):35.
 WANG Zhao-hui,SUN Hui-ping.Research of IRRL Model in Image Retrieval[J].Computer Technology and Development,2008,(12):35.
[8]林联明,王浩,王一雄.基于神经网络的Sarsa强化学习算法[J].计算机技术与发展,2006,(01):30.
 LIN Lian-ming,WANG Hao,WANG Yi-xiong.Sarsa Reinforcement Learning Algorithm Based on Neural Networks[J].Computer Technology and Development,2006,(01):30.
[9]姜姗,刘方爱.基于多任务拍卖的资源调度算法[J].计算机技术与发展,2006,(12):86.
 JIANG Shan,LIU Fang-ai.Resource Scheduling Algorithm Based on Multi-Job Auction[J].Computer Technology and Development,2006,(12):86.
[10]舒文迪,解福.基于信誉度效益最优的网格调度算法研究[J].计算机技术与发展,2011,(01):133.
 SHU Wen-di,XIE Fu.Research of Grid Dispatch Algorithm Based on Optimal Credit Benefit[J].Computer Technology and Development,2011,(01):133.

更新日期/Last Update: 2022-09-10