IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, ACCEPTED 17 MAY 2023. 19
[34] Z. Fu, L. Yu, and X. Niu, “Trace: Travel reinforcement recommenda-
tion based on location-aware context extraction,” ACM Trans. Knowl.
Discovery Data, vol. 16, no. 4, pp. 1–22, August 2022.
[35] Y. Sun, F. Zhuang, H. Zhu, Q. He, and H. Xiong, “Cost-effective
and interpretable job skill recommendation with deep reinforcement
learning,” in Proc. WWW, 2021, pp. 3827–3838.
[36] Y. Wang, “A hybrid recommendation for music based on reinforcement
learning,” in Proc. PAKDD, 2020, pp. 91–103.
[37] P. Wei, S. Xia, R. Chen, J. Qian, C. Li, and X. Jiang, “A deep-
reinforcement-learning-based recommender system for occupant-driven
energy optimization in commercial buildings,” IEEE Internet Things J.,
vol. 7, no. 7, pp. 6402–6413, July 2020.
[38] X. He, B. An, Y. Li, H. Chen, R. Wang, X. Wang, R. Yu, X. Li, and
Z. Wang, “Learning to collaborate in multi-module recommendation via
multi-agent reinforcement learning without communication,” in Proc.
ACM Conf. Rec. Syst., 2020, pp. 210–219.
[39] G. Ke, H.-L. Du, and Y.-C. Chen, “Cross-platform dynamic goods
recommendation system based on reinforcement learning and social
networks,” Appl. Soft Comput., vol. 104, p. 107213, June 2021.
[40] J. O, J. Lee, J. W. Lee, and B.-T. Zhang, “Adaptive stock trading with
dynamic asset allocation using reinforcement learning,” Inf. Sci., vol.
176, no. 15, pp. 2121–2147, August 2006.
[41] J. Zhang, B. Hao, B. Chen, C. Li, H. Chen, and J. Sun, “Hierarchical
reinforcement learning for course recommendation in MOOCs,” in Proc.
AAAI, 2019, pp. 435–442.
[42] Y. Lin, F. Lin, W. Zeng, J. Xiahou, L. Li, P. Wu, Y. Liu, and
C. Miao, “Hierarchical reinforcement learning with dynamic recurrent
mechanism for course recommendation,” Knowl.-Based Syst., vol. 244,
p. 108546, May 2022.
[43] L. Wang, W. Zhang, X. He, and H. Zha, “Supervised reinforcement
learning with recurrent neural network for dynamic treatment recom-
mendation,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery
Data Mining, 2018, pp. 2447–2456.
[44] Z. Zheng, C. Wang, T. Xu, D. Shen, P. Qin, X. Zhao, B. Huai, X. Wu,
and E. Chen, “Interaction-aware drug package recommendation via
policy gradient,” ACM Trans. Inf. Syst., pp. 1–32, February 2022.
[45] S. M. Shortreed, E. Laber, D. J. Lizotte, T. S. Stroup, J. Pineau, and
S. A. Murphy, “Informing sequential clinical decision-making through
reinforcement learning: an empirical study,” Mach. Learn., vol. 84,
no. 1, pp. 109–136, July 2011.
[46] M. M. Afsar, T. Crump, and B. H. Far, “Reinforcement learning based
recommender systems: A survey,” ACM Comput. Surv., pp. 1–37, June
2022.
[47] X. Chen, L. Yao, J. McAuley, G. Zhou, and X. Wang, “A survey of deep
reinforcement learning in recommender systems: A systematic review
and future directions,” arXiv preprint arXiv:2109.03540v1, 2021.
[48] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis,
“Optimal and autonomous control using reinforcement learning: A
survey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 6, pp.
2042–2062, June 2018.
[49] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction,
2nd ed. Cambridge, MA, USA: MIT Press, 2018.
[50] C. J. Watkins and P. Dayan, “Technical note: Q-learning,” Mach. Learn.,
vol. 8, no. 3, pp. 279–292, May 1992.
[51] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou,
D. Wierstra, and M. A. Riedmiller, “Playing Atari with deep reinforce-
ment learning,” arXiv preprint arXiv:1312.5602, 2013.
[52] R. J. Williams, “Simple statistical gradient-following algorithms for
connectionist reinforcement learning,” Mach. Learn., vol. 8, no. 3, pp.
229–256, May 1992.
[53] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy
gradient methods for reinforcement learning with function approxima-
tion,” in Proc. NIPS, 2000, pp. 1057–1063.
[54] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov,
“Proximal policy optimization algorithms,” arXiv preprint
arXiv:1707.06347, 2017.
[55] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust
region policy optimization,” in Proc. ICML, 2015, pp. 1889–1897.
[56] S. Levine and V. Koltun, “Guided policy search,” in Proc. ICML, 2013,
pp. 1–9.
[57] V. R. Konda and J. N. Tsitsiklis, “On actor-critic algorithms,” SIAM
J. Control Optim., vol. 42, no. 4, pp. 1143–1166, 2003.
[58] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Harley, T. P. Lillicrap,
D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep
reinforcement learning,” in Proc. ICML, 2016, pp. 1928–1937.
[59] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-
policy maximum entropy deep reinforcement learning with a stochastic
actor,” in Proc. ICML, 2018, pp. 1856–1865.
[60] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Ried-
miller, “Deterministic policy gradient algorithms,” in Proc. ICML,
2014, pp. 387–395.
[61] S. Adam, L. Busoniu, and R. Babuska, “Experience replay for real-
time reinforcement learning control,” IEEE Trans. Syst., Man, Cybern.
C, Appl. Rev., vol. 42, no. 2, pp. 201–212, March 2012.
[62] X. Xin, A. Karatzoglou, I. Arapakis, and J. M. Jose, “Self-supervised
reinforcement learning for recommender systems,” in Proc. SIGIR,
2020, pp. 931–940.
[63] F. Liu, H. Guo, X. Li, R. Tang, Y. Ye, and X. He, “End-to-end
deep reinforcement learning based recommendation with supervised
embedding,” in Proc. WSDM, 2020, pp. 384–392.
[64] N. Taghipour, A. Kardan, and S. S. Ghidary, “Usage-based web
recommendations: A reinforcement learning approach,” in Proc. ACM
Conf. Rec. Syst., 2007, pp. 113–120.
[65] Y. Zhang, C. Zhang, and X. Liu, “Dynamic scholarly collaborator
recommendation via competitive multi-agent reinforcement learning,”
in Proc. ACM Conf. Rec. Syst., 2017, pp. 331–335.
[66] T. Mahmood and F. Ricci, “Learning and adaptivity in interactive
recommender systems,” in Proc. ICEC, 2007, pp. 75–84.
[67] R. Gao, H. Xia, J. Li, D. Liu, S. Chen, and G. Chun, “Drcgr: Deep
reinforcement learning framework incorporating cnn and gan-based for
interactive recommendation,” in Proc. IEEE Int. Conf. Data Mining
(ICDM), 2019, pp. 1048–1053.
[68] Y. Lei and W. Li, “Interactive recommendation with user-specific deep
reinforcement learning,” ACM Trans. Knowl. Discovery Data, vol. 13,
no. 6, p. 61, October 2019.
[69] L. Zou, L. Xia, Z. Ding, J. Song, W. Liu, and D. Yin, “Reinforcement
learning to optimize long-term user engagement in recommender
systems,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery
Data Mining, 2019, pp. 2810–2818.
[70] L. Zou, L. Xia, P. Du, Z. Zhang, T. Bai, W. Liu, J.-Y. Nie, and D. Yin,
“Pseudo Dyna-Q: A reinforcement learning framework for interactive
recommendation,” in Proc. WSDM, 2020, pp. 816–824.
[71] S. Zhou, X. Dai, H. Chen, W. Zhang, K. Ren, R. Tang, X. He,
and Y. Yu, “Interactive recommender system via knowledge graph-
enhanced reinforcement learning,” in Proc. SIGIR, 2020, pp. 179–188.
[72] R. Zhang, T. Yu, Y. Shen, H. Jin, and C. Chen, “Text-based interactive
recommendation via constraint-augmented reinforcement learning,” in
Proc. NIPS, 2019, pp. 15214–15224.
[73] H. Chen, X. Dai, H. Cai, W. Zhang, X. Wang, R. Tang, Y. Zhang, and
Y. Yu, “Large-scale interactive recommendation with tree-structured
policy gradient,” in Proc. AAAI, 2019, pp. 3312–3320.
[74] W. Liu, F. Liu, R. Tang, B. Liao, G. Chen, and P. A. Heng, “Balancing
between accuracy and fairness for interactive recommendation with
reinforcement learning,” in Proc. Pacific-Asia Conf. Knowl. Discovery
Data Mining, 2020, pp. 155–167.
[75] T. Xiao and D. Wang, “A general offline reinforcement learning
framework for interactive recommendation,” in Proc. AAAI, 2021.
[76] T. Yu, Y. Shen, R. Zhang, X. Zeng, and H. Jin, “Vision-language
recommendation via attribute augmented multimodal reinforcement
learning,” in Proc. 27th ACM Int. Conf. Multimedia, 2019, pp. 39–47.
[77] F. Liu, R. Tang, X. Li, W. Zhang, Y. Ye, H. Chen, H. Guo, Y. Zhang,
and X. He, “State representation modeling for deep reinforcement
learning based recommendation,” Knowl.-Based Syst., vol. 205, p.
106170, October 2020.
[78] M. S. Llorente and S. E. Guerrero, “Increasing retrieval quality in
conversational recommenders,” IEEE Trans. Knowl. Data Eng., vol. 24,
no. 10, pp. 1876–1888, October 2012.
[79] T. Mahmood and F. Ricci, “Improving recommender systems with
adaptive conversational strategies,” in Proc. HT, 2009, pp. 73–82.
[80] Y. Wu, C. Macdonald, and I. Ounis, “Partially observable reinforcement
learning for dialog-based interactive recommendation,” in Proc. ACM
Conf. Rec. Syst., 2021, pp. 241–251.
[81] D. Tsumita and T. Takagi, “Dialogue-based recommender sys-
tem that flexibly mixes utterances and recommendations,” in Proc.
IEEE/WIC/ACM Int. Conf. Web Intelligence, 2019, pp. 51–58.
[82] W. Lei, G. Zhang, X. He, Y. Miao, X. Wang, L. Chen, and T.-S. Chua,
“Interactive path reasoning on graph for conversational recommenda-
tion,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data
Mining, 2020, pp. 2073–2083.