[1]吴 鹏,魏上清,董嘉鹏,等.基于 SARSA 强化学习的审判人力资源调度方法[J].计算机技术与发展,2022,32(09):82-88.[doi:10.3969/j.issn.1673-629X.2022.09.013]
 WU Peng,WEI Shang-qing,DONG Jia-peng,et al.Trial Human Resources Scheduling Method Based on SARSA Reinforcement Learning[J].Computer Technology and Development,2022,32(09):82-88.[doi:10.3969/j.issn.1673-629X.2022.09.013]

基于 SARSA 强化学习的审判人力资源调度方法

《计算机技术与发展》[ISSN:1006-6977/CN:61-1281/TN]

卷/Volume: 32
期数/Issue: 2022年09期
页码/Pages: 82-88
栏目/Column: 人工智能
出版日期/Published: 2022-09-10

文章信息/Info

Title:
Trial Human Resources Scheduling Method Based on SARSA Reinforcement Learning
文章编号:
1673-629X(2022)09-0082-07
作者:
吴 鹏 1,2,魏上清 1,董嘉鹏 1,潘 理 1,2
1. 上海交通大学 电子信息与电气工程学院,上海 200240
2. 信息内容分析技术国家工程实验室,上海 200240
Author(s):
WU Peng 1,2,WEI Shang-qing 1,DONG Jia-peng 1,PAN Li 1,2
1. School of Electronic Information and Electrical Engineering,Shanghai Jiao Tong University,Shanghai 200240,China
2. National Engineering Laboratory for Information Content Analysis Technology,Shanghai 200240,China
关键词:
强化学习;资源调度;决策优化;贪婪策略;马尔可夫决策过程
Keywords:
reinforcement learning; resource scheduling; decision optimization; greedy strategy; Markov decision process
分类号:
TP181
DOI:
10.3969/j.issn.1673-629X.2022.09.013
摘要:
为对法官员额资源进行调度优化,平衡司法资源有限和现实司法需求之间的矛盾,该文建立审判人力资源调度优化模型,提出基于强化学习的审判团队调度优化策略。基于对审判人员调度问题和场景的分析,建立以案件的平均处理时间最小化为优化目标的审判人员调度优化数学模型以及相应的约束条件。在此基础上建立宏观的司法系统排队模型,定义审判人力资源调度马尔可夫决策过程,并基于状态-动作-奖励-状态-动作(State-Action-Reward-State-Action,SARSA)算法提出动态自适应的审判人员调度强化学习算法。该算法以案件的平均处理时间为奖励,通过贪婪行为策略选择调度策略,采用时序差分更新方法在与司法系统交互的过程中学习最优调度策略。相比于传统分案方法及其他基于规则的简单启发式算法,该算法能够提高案件审判效率、优化人力资源配置。
Abstract:
In order to optimize the scheduling of judge post resources and balance the contradiction between limited judicial resources and actual judicial needs, a trial human resource scheduling optimization model is established and a trial team scheduling optimization strategy based on reinforcement learning is proposed. On the basis of an analysis of the judiciary scheduling problems and scenarios, a mathematical model of judiciary scheduling optimization, with the optimization goal of minimizing the average processing time of cases, is established together with the corresponding constraints. On this basis, a macroscopic judicial system queuing model is established, the Markov decision process of trial human resource scheduling is defined, and a dynamic adaptive reinforcement learning algorithm for judicial personnel scheduling based on SARSA (State-Action-Reward-State-Action) is proposed. The algorithm uses the average processing time of cases as the reward, selects scheduling actions through a greedy behavior policy, and uses temporal-difference updates to learn the optimal scheduling policy while interacting with the judicial system. Compared with the traditional case-division method and other simple rule-based heuristic algorithms, the proposed algorithm can improve the efficiency of case trials and optimize the allocation of human resources.
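The abstract describes the standard ingredients of SARSA: an epsilon-greedy behavior policy and a temporal-difference update driven by a reward (here, tied to average case processing time). The following is a minimal generic sketch of that update loop; the environment interface (`reset`/`step`) and the state/action encoding are hypothetical stand-ins for the paper's judicial-system queuing model, not the authors' implementation.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular SARSA. `env.reset()` -> (state, actions);
    `env.step(a)` -> (next_state, next_actions, reward, done)."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    def choose(state, actions):
        # epsilon-greedy behavior policy, as in the abstract
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state, actions = env.reset()
        action = choose(state, actions)
        done = False
        while not done:
            # in the paper's setting the reward would reflect the
            # (negative) average case processing time
            next_state, next_actions, reward, done = env.step(action)
            if done:
                target = reward
            else:
                next_action = choose(next_state, next_actions)
                # TD target uses the action actually taken (on-policy)
                target = reward + gamma * Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            if not done:
                state, action = next_state, next_action
    return Q
```

Because the TD target bootstraps from the action the behavior policy actually selects next (rather than the max over actions, as Q-learning does), SARSA learns on-policy, which is why the abstract pairs it with a greedy behavior strategy.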

相似文献/References:

[1]廖宁,刘建勋,王俊年.DPSO算法在服务网格资源调度中的应用[J].计算机技术与发展,2009,(08):104.
 LIAO Ning,LIU Jian-xun,WANG Jun-nian.Application of Discrete Particle Swarm Optimization Algorithm to Service Grid Resource Optimization Scheduling[J].Computer Technology and Development,2009,(08):104.
[2]徐慧慧,石磊,陈信.网格资源调度算法研究[J].计算机技术与发展,2009,(09):76.
 XU Hui-hui,SHI Lei,CHEN Xin.Research on Grid Resource Scheduling Algorithm[J].Computer Technology and Development,2009,(09):76.
[3]冯林,李琛,孙焘.Robocup半场防守中的一种强化学习算法[J].计算机技术与发展,2008,(01):59.
 FENG Lin,LI Chen,SUN Tao.A Reinforcement Learning Method for Robocup Soccer Half Field Defense[J].Computer Technology and Development,2008,(01):59.
[4]陈小飞,徐宏炳.基于网格的并行FFT计算研究[J].计算机技术与发展,2008,(03):67.
 CHEN Xiao-fei,XU Hong-bing.Research of Parallel FFT Computing Based on Grid[J].Computer Technology and Development,2008,(03):67.
[5]汤萍萍,王红兵.基于强化学习的Web服务组合[J].计算机技术与发展,2008,(03):142.
 TANG Ping-ping,WANG Hong-bing.Web Service Composition Based on Reinforcement Learning[J].Computer Technology and Development,2008,(03):142.
[6]储凡静,刘方爱.一种基于XML的个性化的资源需求描述机制[J].计算机技术与发展,2008,(06):67.
 CHU Fan-jing,LIU Fang-ai.Personal Resource Requirement Description Mechanism Based on XML[J].Computer Technology and Development,2008,(06):67.
[7]王朝晖,孙惠萍.图像检索中IRRL模型研究[J].计算机技术与发展,2008,(12):35.
 WANG Zhao-hui,SUN Hui-ping.Research of IRRL Model in Image Retrieval[J].Computer Technology and Development,2008,(12):35.
[8]林联明,王浩,王一雄.基于神经网络的Sarsa强化学习算法[J].计算机技术与发展,2006,(01):30.
 LIN Lian-ming,WANG Hao,WANG Yi-xiong.Sarsa Reinforcement Learning Algorithm Based on Neural Networks[J].Computer Technology and Development,2006,(01):30.
[9]姜姗,刘方爱.基于多任务拍卖的资源调度算法[J].计算机技术与发展,2006,(12):86.
 JIANG Shan,LIU Fang-ai.Resource Scheduling Algorithm Based on Multi-Job Auction[J].Computer Technology and Development,2006,(12):86.
[10]舒文迪,解福.基于信誉度效益最优的网格调度算法研究[J].计算机技术与发展,2011,(01):133.
 SHU Wen-di,XIE Fu.Research of Grid Dispatch Algorithm Based on Optimal Credit Benefit[J].Computer Technology and Development,2011,(01):133.

更新日期/Last Update: 2022-09-10